Unified process and syntax for generalized prediction in video coding/decoding

Bibliographic Details
Title: Unified process and syntax for generalized prediction in video coding/decoding
Patent Number: 12,149,731
Publication Date: November 19, 2024
Appl. No: 18/100667
Application Filed: January 24, 2023
Abstract: At least a method and an apparatus are provided for efficiently encoding or decoding video. For example, a plurality of different motion prediction modes are obtained for a current block. The current block is encoded or decoded based on a combination of the plurality of different motion prediction modes with corresponding weights, wherein the combination with the corresponding weights comprises at least two inter prediction modes, or an inter prediction mode and an intra prediction mode. Both triangle prediction and multi-hypothesis prediction are allowed to be indicated in one or more lists of possible motion vector candidates, for example in advanced motion vector prediction (AMVP) mode.
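Illustrative note: the weighted combination described in the abstract can be sketched as a per-sample blend of two prediction blocks. This is a minimal sketch, not the patented method. The function names, the integer weights expressed in eighths, the rounding offset, and the hard diagonal split are all assumptions made for illustration. A single weight for the whole block stands in for generalized bi-prediction, and a weight that changes across the diagonal stands in for triangle prediction.

```python
# Hypothetical helpers: names, the eighths scale, and the rounding are assumptions.

def blend(pred_a, pred_b, weights_a):
    """Blend two prediction blocks sample by sample.

    weights_a[y][x] is the weight of pred_a in eighths (0..8);
    pred_b receives the complementary weight 8 - w.
    """
    return [
        [(w * a + (8 - w) * b + 4) >> 3 for a, b, w in zip(row_a, row_b, row_w)]
        for row_a, row_b, row_w in zip(pred_a, pred_b, weights_a)
    ]


def uniform_weights(height, width, w):
    """One weight for the whole block (generalized bi-prediction style)."""
    return [[w] * width for _ in range(height)]


def diagonal_weights(size):
    """At least two different weights inside the block (triangle style).

    Samples above the main diagonal take pred_a, samples below take pred_b,
    and samples on the diagonal are averaged.
    """
    return [
        [8 if x > y else 0 if x < y else 4 for x in range(size)]
        for y in range(size)
    ]


pred_a = [[100, 100], [100, 100]]
pred_b = [[0, 0], [0, 0]]
uniform = blend(pred_a, pred_b, uniform_weights(2, 2, 2))
triangle = blend(pred_a, pred_b, diagonal_weights(2))
```

With these inputs, the uniform blend gives 25 at every sample, while the diagonal blend gives 50 on the diagonal, 100 above it, and 0 below it.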
Inventors: INTERDIGITAL MADISON PATENT HOLDINGS, SAS (Paris, FR)
Assignees: INTERDIGITAL MADISON PATENT HOLDINGS, SAS (Paris, FR)
Claim: 1. A method for video decoding, comprising: obtaining video data representative of an image block to be decoded; and decoding the image block from the obtained video data; wherein, in a case where bi-prediction applies to the image block with motion vector prediction, decoding comprises decoding a first flag that indicates whether weighted bi-prediction applies to the image block and, in the case where the first flag indicates weighted bi-prediction applies, decoding a second flag that indicates whether weighted bi-prediction is done by using at least two different weights inside the image block, and otherwise applying regular bi-prediction to the image block; in a case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, decoding comprises decoding a third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, decoding an index in a table of weights pairs and applying generalized bi-prediction to the image block responsive to the weights pair indexed by the index; and in a case where the third flag indicates inter prediction combined with intra prediction, decoding comprises decoding an intra direction and applying multi-hypothesis prediction to the image block responsive to the intra direction, and, otherwise, decoding a partition type and applying triangle prediction to the image block responsive to the partition type.
Claim: 2. The method of claim 1 , wherein, in the case where merge mode applies to the image block, decoding comprises decoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block decoding the third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, decoding a fourth flag that indicates whether affine merge applies and decoding merge or affine merge indices responsive to a value of the fourth flag, and wherein, in the case where third flag indicates inter prediction is combined with intra prediction, decoding an intra direction and a merge index for an inter predictor and applying multi-hypothesis prediction to the image block responsive to the intra direction and merge index and, otherwise, decoding a partition type and triangle predictors indices and applying a triangle prediction to the image block responsive to the partition type and triangle predictors indices.
Claim: 3. The method of claim 1 , wherein, in the case where skip mode applies to the image block, decoding comprises decoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, decoding triangle predictors indices and applying triangle prediction to the image block responsive to the triangle predictors indices and, otherwise, decoding a fourth flag that indicates whether affine merge applies and decoding merge or affine merge indices responsive to a value of the fourth flag.
Claim: 4. The method of claim 1 , wherein, in context adaptive binary arithmetic coding, a context for the first flag is based on neighboring blocks decoded using generalized bi-prediction, triangle prediction, or multi-hypothesis prediction, a context for the second flag is based on neighboring blocks decoded using triangle prediction or multi-hypothesis prediction, and a context for the third flag is based on neighboring blocks decoded using triangle prediction.
Claim: 5. An apparatus for video decoding, comprising one or more processors, wherein the one or more processors are configured to implement: obtaining video data representative of an image block to be decoded; and decoding the image block from the obtained video data; wherein, in case where bi-prediction applies to the image block with motion vector prediction, the decoding comprises to decoding a first flag that indicates whether weighted bi-prediction applies to the image block and, in the case where the first flag indicates weighted bi-prediction applies, decoding a second flag that indicates whether weighted bi-prediction is done by using at least two different weights inside the image block, and otherwise applying regular bi-prediction to the image block; in a case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, decoding comprises decoding a third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, decoding an index in a table of weights pairs and applying generalized bi-prediction to the image block responsive to the weights pair indexed by the index; and in a case where the third flag indicates inter prediction combined with intra prediction, decoding comprises decoding an intra direction and applying multi-hypothesis prediction to the image block responsive to the intra direction, and, otherwise, decoding a partition type and applying triangle prediction to the image block responsive to the partition type.
Claim: 6. The apparatus of claim 5 , wherein, in the case where merge mode applies to the image block, decoding comprises decoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, decoding the third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, decoding a fourth flag that indicates whether affine merge applies and decoding merge or affine merge indices responsive to a value of the fourth flag, and wherein, in the case where third flag indicates inter prediction is combined with intra prediction, decoding an intra direction and a merge index for an inter predictor and applying multi-hypothesis prediction to the image block responsive to the intra direction and merge index and, otherwise, to decoding a partition type and triangle predictors indices and applying a triangle prediction to the image block responsive to the partition type and triangle predictors indices.
Claim: 7. The apparatus of claim 5 , wherein, in the case where skip mode applies to the image block, decoding comprises decoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, decoding triangle predictors indices and applying triangle prediction to the image block responsive to the triangle predictors indices and, otherwise, decoding a fourth flag that indicates whether affine merge applies and decoding merge or affine merge indices responsive to a value of the fourth flag.
Claim: 8. The apparatus of claim 5 , wherein, in context adaptive binary arithmetic coding, a context for the first flag is based on neighboring blocks decoded using generalized bi-prediction, triangle prediction, or multi-hypothesis prediction, a context for the second flag is based on neighboring blocks decoded using triangle prediction or multi-hypothesis prediction, and a context for the third flag is based on neighboring blocks decoded using triangle prediction.
Claim: 9. A method for video encoding, comprising: obtaining an image block to be encoded; and encoding the image block into encoded data; wherein, in case where bi-prediction applies to the image block with motion vector prediction, encoding comprises encoding a first flag that indicates whether weighted bi-prediction applies to the image block and, in the case where the first flag indicates weighted bi-prediction applies, encoding a second flag that indicates whether weighted bi-prediction is done by using at least two different weights inside the image block, and otherwise applying regular bi-prediction to the image block; in a case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding comprises encoding a third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, encoding an index in a table of weights pairs and applying generalized bi-prediction to the image block responsive to the weights pair indexed by the index; and in a case where the third flag indicates inter prediction combined with intra prediction, encoding comprises encoding an intra direction and applying multi-hypothesis prediction to the image block responsive to the intra direction, and, otherwise, encoding a partition type and applying triangle prediction to the image block responsive to the partition type.
Claim: 10. The method of claim 9 , wherein, in the case where merge mode applies to the image block, encoding comprises encoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding the third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, encoding a fourth flag that indicates whether affine merge applies and encoding merge or affine merge indices responsive to a value of the fourth flag, and wherein, in the case where third flag indicates inter prediction is combined with intra prediction, encoding an intra direction and a merge index for an inter predictor and applying multi-hypothesis prediction to the image block responsive to the intra direction and merge index and, otherwise, encoding a partition type and triangle predictors indices and applying a triangle prediction to the image block responsive to the partition type and triangle predictors indices.
Claim: 11. The method of claim 9 , wherein, in the case where skip mode applies to the image block, encoding comprises encoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding triangle predictors indices and applying triangle prediction to the image block responsive to the triangle predictors indices and, otherwise, encoding a fourth flag that indicates whether affine merge applies and encoding merge or affine merge indices responsive to a value of the fourth flag.
Claim: 12. The method of claim 9 , wherein, in context adaptive binary arithmetic coding, a context for the first flag is based on neighboring blocks encoded using generalized bi-prediction, triangle prediction, or multi-hypothesis prediction, a context for the second flag is based on neighboring blocks encoded using triangle prediction or multi-hypothesis prediction, and a context for the third flag is based on neighboring blocks encoded using triangle prediction.
Claim: 13. An apparatus for video encoding, comprising one or more processors, wherein the one or more processors are configured to implement: obtaining an image block to be encoded; and encoding the image block into encoded data; wherein, in case where bi-prediction applies to the image block with motion vector prediction, the encoding comprises encoding a first flag that indicates whether weighted bi-prediction applies to the image block and, in the case where the first flag indicates weighted bi-prediction applies, encoding a second flag that indicates whether weighted bi-prediction is done by using at least two different weights inside the image block, and otherwise applying regular bi-prediction to the image block; in a case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding comprises encoding a third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, encoding an index in a table of weights pairs and applying generalized bi-prediction to the image block responsive to the weights pair indexed by the index; and in a case where the third flag indicates inter prediction combined with intra prediction, encoding comprises encoding an intra direction and applying multi-hypothesis prediction to the image block responsive to the intra direction, and, otherwise, encoding a partition type and applying triangle prediction to the image block responsive to the partition type.
Claim: 14. The apparatus of claim 13 , wherein, in the case where merge mode applies to the image block, encoding comprises encoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding the third flag that indicates whether inter prediction is combined with intra prediction and, otherwise, encoding a fourth flag that indicates whether affine merge applies and encoding merge or affine merge indices responsive to a value of the fourth flag, and wherein, in the case where third flag indicates inter prediction is combined with intra prediction, encoding an intra direction and a merge index for an inter predictor and applying multi-hypothesis prediction to the image block responsive to the intra direction and merge index and, otherwise, encoding a partition type and triangle predictors indices and applying a triangle prediction to the image block responsive to the partition type and triangle predictors indices.
Claim: 15. The apparatus of claim 13 , wherein, in the case where skip mode applies to the image block, encoding comprises encoding the second flag and, in the case where the second flag indicates weighted bi-prediction is done by using at least two different weights inside the image block, encoding triangle predictors indices and applying triangle prediction to the image block responsive to the triangle predictors indices and, otherwise, encoding a fourth flag that indicates whether affine merge applies and encoding merge or affine merge indices responsive to a value of the fourth flag.
Claim: 16. The apparatus of claim 13 , wherein, in context adaptive binary arithmetic coding, a context for the first flag is based on neighboring blocks encoded using generalized bi-prediction, triangle prediction, or multi-hypothesis prediction, a context for the second flag is based on neighboring blocks encoded using triangle prediction or multi-hypothesis prediction and a context for the third flag is based on neighboring blocks encoded using triangle prediction.
Claim: 17. The method of claim 1 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction or combining an inter bi-prediction and an intra prediction.
Claim: 18. The method of claim 2 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction, a merge index being signaled for the inter uni-prediction.
Claim: 19. The apparatus of claim 5 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction or combining an inter bi-prediction and an intra prediction.
Claim: 20. The apparatus of claim 6 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction, a merge index being signaled for the inter uni-prediction.
Claim: 21. The method of claim 9 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction or combining an inter bi-prediction and an intra prediction.
Claim: 22. The method of claim 10 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction, a merge index being signaled for the inter uni-prediction.
Claim: 23. The apparatus of claim 13 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction or combining an inter bi-prediction and an intra prediction.
Claim: 24. The apparatus of claim 14 , wherein applying multi-hypothesis prediction comprises combining an inter uni-prediction and an intra prediction, a merge index being signaled for the inter uni-prediction.
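Illustrative note: the flag hierarchy recited in claims 1 to 4 can be read as a small decision tree. The sketch below follows the order of flags given in the claims; it is not a decoder implementation. `SymbolReader` is a hypothetical stand-in for an entropy decoder that returns already-decoded symbols, and all function names, dictionary keys, and mode labels are assumptions. The claims say only that each context is "based on" neighboring blocks, so counting the neighbors that use the relevant modes is one assumed way to derive a context index.

```python
class SymbolReader:
    """Hypothetical stand-in for an entropy decoder: yields decoded symbols in order."""

    def __init__(self, symbols):
        self._symbols = iter(symbols)

    def read(self):
        return next(self._symbols)


def parse_amvp_bi(reader):
    """Claim 1: bi-prediction with motion vector prediction."""
    if not reader.read():  # first flag: weighted bi-prediction applies?
        return {"mode": "regular_bi"}
    if not reader.read():  # second flag: at least two weights inside the block?
        return {"mode": "generalized_bi", "weight_index": reader.read()}
    if reader.read():  # third flag: inter combined with intra?
        return {"mode": "multi_hypothesis", "intra_direction": reader.read()}
    return {"mode": "triangle", "partition_type": reader.read()}


def parse_merge_or_affine(reader):
    """Fourth flag (claims 2 and 3): affine merge or regular merge, then an index."""
    affine = bool(reader.read())
    return {"mode": "affine_merge" if affine else "merge", "index": reader.read()}


def parse_merge(reader):
    """Claim 2: merge mode."""
    if not reader.read():  # second flag
        return parse_merge_or_affine(reader)
    if reader.read():  # third flag
        intra_direction = reader.read()
        merge_index = reader.read()
        return {"mode": "multi_hypothesis", "intra_direction": intra_direction,
                "merge_index": merge_index}
    partition_type = reader.read()
    triangle_indices = reader.read()
    return {"mode": "triangle", "partition_type": partition_type,
            "triangle_indices": triangle_indices}


def parse_skip(reader):
    """Claim 3: skip mode."""
    if reader.read():  # second flag
        return {"mode": "triangle", "triangle_indices": reader.read()}
    return parse_merge_or_affine(reader)


# Claim 4: modes of neighboring blocks that influence each flag's context.
FIRST_FLAG_MODES = {"generalized_bi", "triangle", "multi_hypothesis"}
SECOND_FLAG_MODES = {"triangle", "multi_hypothesis"}
THIRD_FLAG_MODES = {"triangle"}


def context_index(neighbor_modes, relevant_modes):
    """Assumed derivation: count the neighbors coded with a relevant mode."""
    return sum(1 for mode in neighbor_modes if mode in relevant_modes)
```

For example, `parse_amvp_bi(SymbolReader([1, 0, 3]))` selects generalized bi-prediction with weight index 3, and `parse_skip(SymbolReader([0, 0, 4]))` falls through to regular merge with index 4.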
Patent References Cited: 20110142132 June 2011 Tourapis et al.
20160142729 May 2016 Wang et al.
20170142418 May 2017 Li et al.
20180070101 March 2018 Suzuki
20180278932 September 2018 Mukherjee
20180302642 October 2018 Schwarz
20190037213 January 2019 Hermansson
20190149821 May 2019 Moon
20220014778 January 2022 Galpin et al.
20230254502 August 2023 Zhao
102113326 June 2011
105493505 April 2016
382048 May 2021
3820148 May 2021
2003-032687 January 2003
2010/017166 February 2010
2015/010319 January 2015
2016/034058 March 2016
Other References: Chen, et al., Algorithm Description for Versatile Video Coding and Test Model 2 (VTM 2), Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG11; Editors; 11th Meeting: Ljubljana, SI; JVET-K1002-V2, Jul. 10-18, 2018, 21 pages. cited by applicant
Chen, et al., Generalized Bi-Prediction for Inter Coding, JVET-C0047, InterDigital Communications, Inc., Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 3rd Meeting: Geneva, CH, May 26-Jun. 1, 2016, 4 pages. cited by applicant
Chiang, et al., CE10.1: Combined and Multi-Hypothesis Prediction, JVET-K0257-V1, MediaTek Inc., Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting Ljubljana, Slovenia, Jul. 10-18, 2018, 6 pages. cited by applicant
Chiang, et al., CE10.1: Combined and Multi-Hypothesis Prediction for Improving AMVP Mode, Skip or Merge Mode, and Intra Mode, JVET-L0100-V3, MediaTek Inc., Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting, Macao, China, Oct. 3, 2018, 14 pages. cited by applicant
Hsu, et al., Description of Core Experiment 10: Combined and Multi-Hypothesis Prediction, JVET-L1030-V2, CE Coordinators, Joint Video Experts Team (JVET) of ITU-T SG 16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Macao, CN, Oct. 3-12, 2018, 12 pages. cited by applicant
ISO/IEC, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Video, ISO 13818-2: 1995 (E), Recommendation ITU-T H.262, International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC), International Standard 13818-2, Jul. 1995, 211 pages. cited by applicant
ITU-T, Information Technology—Generic Coding of Moving Pictures and Associated Audio Information: Systems, H.222.0, Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services-Transmission Multiplexing and Synchronization, Jun. 2012, 228 pages. cited by applicant
ITU-T, Reference Software for ITU-T H.265 High Efficiency Video Coding, Recommendation ITU-T H.265.2, Series H: Audiovisual and Multimedia Systems, Infrastructure of Audiovisual Services—Coding of Moving Video, Oct. 2014, 12 pages. cited by applicant
Liao, et al., CE10 Related: Combining Multi-Hypothesis Prediction with Triangular Prediction Unit Mode, JVET-K0148-V2, Panasonic, Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting: Ljubljana, SI, Jul. 10-18, 2018, 4 pages. cited by applicant
Liao, et al., CE10: Triangular Prediction Unit Mode (CE10.3.1 and CE10.3.2), JVET-K0144-V2, Joint Video Exploration Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 11th Meeting Ljubljana, Slovenia, Jul. 10, 2018, 6 pages. cited by applicant
Poirier, et al., CE10-Related: Multiple Prediction Unit Shapes, JVET-L0208-V1, Joint Video Experts Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting: Macao, China, Oct. 3, 2018, 6 pages. cited by applicant
Primary Examiner: Mikeska, Neil R
Attorney, Agent or Firm: Condo Roccia Koptiw LLP
Accession Number: edspgr.12149731
Database: USPTO Patent Grants