Field programmable gate array implementation of variable-bins high efficiency video coding CABAC decoder with path delay optimisation

Bibliographic Details
Published in: IET Image Processing, Vol. 13, no. 6, pp. 954–963
Main Authors: Menasri, Wahiba, Skoudarli, Abdellah, Belhadj, Aichouche, Azzaz, Mohamed Salah
Format: Journal Article
Language: English
Published: The Institution of Engineering and Technology, 01.05.2019
ISSN: 1751-9659, 1751-9667
Description
Summary: Context-based adaptive binary arithmetic coding (CABAC) is the only entropy coding mode in the latest video coding standard, High Efficiency Video Coding (HEVC). For high-resolution applications, a throughput of one bin/cycle is not sufficient, and implementing a pipelined and/or parallel CABAC decoding architecture by simply adding more stages is very challenging. Indeed, the tight data dependencies make CABAC difficult to parallelise and make it a throughput bottleneck for video decoding. Consequently, to improve CABAC decoder throughput, the authors' design uses both parallel and pipelined architectures. In this work, an algorithm-architecture adequation approach is proposed to implement a CABAC decoder on a field programmable gate array (FPGA). In particular, a new classification of 32 syntax elements is introduced to speed up the authors' solution. Furthermore, context selection and modelling for the regular (context-coded) syntax elements are studied, designed and implemented. Finally, a novel memory rearrangement technique is proposed to reduce the critical path delay required to process each binary symbol (bin). As a result, the implementation processes 2.2 bins/cycle when operated at 123.49 MHz, giving an improved throughput of 271.678 Mbins/s (2.2 bins/cycle × 123.49 MHz). The hardware architecture is coded in a hardware description language and synthesised with Xilinx ISE tools targeting the Virtex-4 platform.
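The serial data dependency described in the summary can be illustrated with the bypass branch of the HEVC arithmetic decoder: each bypass bin shifts one bitstream bit into the offset and compares it against the range, so every bin depends on the offset left by the previous one. The minimal Python sketch below assumes a plain-integer decoder state; the function names and sample values are hypothetical, and this is not the authors' variable-bins architecture. It contrasts one-bin-at-a-time decoding with decoding k bypass bins in a single division step, which is one common way multi-bin decoders exceed one bin/cycle for bypass-coded bins.

```python
def decode_bypass_serial(offset, rng, bits):
    """Decode len(bits) bypass bins one at a time (HEVC-style bypass decoding).

    offset: current arithmetic decoder offset (assumed 0 <= offset < rng)
    rng:    current range; bypass decoding never changes it
    bits:   next bitstream bits, shifted into the offset one per bin
    """
    bins = []
    for b in bits:
        offset = (offset << 1) | b  # each step needs the previous offset
        if offset >= rng:
            bins.append(1)
            offset -= rng
        else:
            bins.append(0)
    return bins, offset


def decode_bypass_batched(offset, rng, bits):
    """Decode the same k bypass bins in one step (hypothetical helper).

    Shifting k bits into the offset and dividing by rng is binary long
    division, so the k binary digits of the quotient (MSB first) are exactly
    the k serial bins, and the remainder is the updated offset.
    """
    k = len(bits)
    value = offset
    for b in bits:
        value = (value << 1) | b
    q, new_offset = divmod(value, rng)  # q < 2**k because offset < rng
    bins = [(q >> (k - 1 - i)) & 1 for i in range(k)]
    return bins, new_offset


if __name__ == "__main__":
    # Hypothetical state: range 300 (HEVC keeps the range in 256..510), offset 200
    bits = [1, 0, 1, 1]
    print(decode_bypass_serial(200, 300, bits))   # ([1, 0, 1, 0], 211)
    print(decode_bypass_batched(200, 300, bits))  # ([1, 0, 1, 0], 211)
```

The batched form relies on the range staying constant across bypass bins. Regular (context-coded) bins update both the range and the context state after every bin, which is why they remain the serial bottleneck the summary describes.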
DOI: 10.1049/iet-ipr.2018.6336