Microdosing for low bitrate video compression
Saved in:
| Title: | Microdosing for low bitrate video compression |
|---|---|
| Patent Number: | 12,010,335 |
| Publication Date: | June 11, 2024 |
| Appl. No: | 17/704,722 |
| Application Filed: | March 25, 2022 |
| Abstract: | A system includes a machine learning (ML) model-based video encoder configured to receive an uncompressed video sequence including multiple video frames, determine, from among the multiple video frames, a first video frame subset and a second video frame subset, encode the first video frame subset to produce a first compressed video frame subset, and identify a first decompression data for the first compressed video frame subset. The ML model-based video encoder is further configured to encode the second video frame subset to produce a second compressed video frame subset, and identify a second decompression data for the second compressed video frame subset. The first decompression data is specific to decoding the first compressed video frame subset but not the second compressed video frame subset, and the second decompression data is specific to decoding the second compressed video frame subset but not the first compressed video frame subset. |
| Applicants: | Disney Enterprises, Inc. (Burbank, CA, US); ETH Zürich (Eidgenössische Technische Hochschule Zürich) (Zürich, CH) |
| Assignees: | Disney Enterprises, Inc. (Burbank, CA, US) |
| Claim: | 1. A system comprising: a machine learning (ML) model-based video encoder; and an ML model-based video decoder comprising a degradation-aware block based Micro-Residual-Network (MicroRN) defined by a number of hidden channels and a number of degradation-aware blocks of the MicroRN, the MicroRN configured to decode a first compressed video frame subset using a first decompression data, and decode a second compressed video frame subset using a second decompression data, without utilizing a residual network of a generative adversarial network (GAN) trained decoder; the ML model-based video encoder configured to: receive an uncompressed video sequence including a plurality of video frames; determine, from among the plurality of video frames, a first video frame subset and a second video frame subset; encode the first video frame subset to produce the first compressed video frame subset; identify the first decompression data for the first compressed video frame subset; encode the second video frame subset to produce the second compressed video frame subset; and identify the second decompression data for the second compressed video frame subset. |
| Claim: | 2. The system of claim 1 , wherein identifying the first decompression data comprises overfitting the first decompression data during the encoding of the first video frame subset, and wherein identifying the second decompression data comprises overfitting the second decompression data during the encoding of the second video frame subset. |
| Claim: | 3. The system of claim 1 , wherein: the ML model-based video encoder is further configured to: transmit, to the ML model-based video decoder, the first compressed video frame subset, the second compressed video frame subset, the first decompression data, and the second decompression data; the ML model-based video decoder is configured to: receive the first compressed video frame subset, the second compressed video frame subset, the first second decompression data, and the second decompression data; decode the first compressed video frame subset using the first decompression data; and decode the second compressed video frame subset using the second decompression data. |
| Claim: | 4. The system of claim 3 , wherein the first decompression data is received only once for decoding of the first compressed video frame subset, and wherein the second decompression data is received only once for decoding of the second compressed video frame subset. |
| Claim: | 5. The system of claim 1 , wherein the first decompression data is specific to decoding the first compressed video frame subset but not the second compressed video frame subset, and the second decompression data is specific to decoding the second compressed video frame subset but not the first compressed video frame subset. |
| Claim: | 6. The system of claim 1 , wherein the first decompression data and the second decompression data contain only weights of the MicroRN. |
| Claim: | 7. The system of claim 1 , wherein the ML model-based video encoder comprises a High-Fidelity Compression (HiFiC) encoder, and wherein the ML model-based video decoder includes at least ten times fewer parameters than a HiFiC decoder not using the first decompression data and the second decompression data. |
| Claim: | 8. The system of claim 1 , wherein the ML model-based video encoder comprises a HiFiC encoder, and wherein the ML model-based video decoder is fifty percent faster than a HiFiC decoder not using the first decompression data and the second decompression data. |
| Claim: | 9. A method for use by a system including a machine learning (ML) model-based video encoder and an ML model-based video decoder comprising a degradation-aware block based Micro-Residual-Network (MicroRN) defined by a number of hidden channels and a number of degradation-aware blocks of the MicroRN, the MicroRN configured to decode a first compressed video frame subset using a first decompression data, and decode a second compressed video frame subset using a second decompression data, without utilizing a residual network of a generative adversarial network (GAN) trained decoder, the method comprising: receiving, by the ML model-based video encoder, an uncompressed video sequence including a plurality of video frames; determining, by the ML model-based video encoder from among the plurality of video frames, a first video frame subset and a second video frame subset; encoding, by the ML model-based video encoder, the first video frame subset to produce the first compressed video frame subset; identifying, by the ML model-based video encoder, the first decompression data for the first compressed video frame subset; encoding, by the ML model-based video encoder, the second video frame subset to produce the second compressed video frame subset; and identifying, by the ML model-based video encoder, the second decompression data for the second compressed video frame subset. |
| Claim: | 10. The method of claim 9 , wherein identifying the first decompression data comprises overfitting the first decompression data during the encoding of the first video frame subset, and wherein identifying the second decompression data comprises overfitting the second decompression data during the encoding of the second video frame subset. |
| Claim: | 11. The method of claim 9 , further comprising: transmitting, by the ML model-based video encoder, the first compressed video frame subset, second compressed video frame subset, the first decompression data, and second decompression data to an ML model-based video decoder; receiving, by the ML model-based video decoder, the first compressed video frame subset, second compressed video frame subset, the first decompression data, and second decompression data; decoding, by the ML model-based video decoder, the first compressed video frame subset using the first decompression data; and decoding, by the ML model-based video decoder, the second compressed video frame subset using the second decompression data. |
| Claim: | 12. The method of claim 11 , wherein the first decompression data is received only once for decoding of the first compressed video frame subset, and wherein the second decompression data is received only once for decoding of the second compressed video frame subset. |
| Claim: | 13. The method of claim 11 , wherein the first decompression data is specific to decoding the first compressed video frame subset but not the second compressed video frame subset, and the second decompression data is specific to decoding the second compressed video frame subset but not the first compressed video frame subset. |
| Claim: | 14. The method of claim 9 , wherein the first decompression data and the second decompression data contain only weights of the MicroRN. |
| Claim: | 15. The method of claim 9 , wherein the ML model-based video encoder comprises a High-Fidelity Compression (HiFiC) encoder, and wherein the ML model-based video decoder includes at least ten times fewer parameters than a HiFiC decoder not using the first decompression data and the second decompression data. |
| Claim: | 16. The method of claim 9 , wherein the ML model-based video encoder comprises a HiFiC encoder, and wherein the ML model-based video decoder is fifty percent faster than a HiFiC decoder not using the first decompression data and the second decompression data. |
| Patent References Cited: | 20210067808, Mar. 2021, Schroers; 20210099731, Apr. 2021, Zhai; 20220086463, Mar. 2022, Coban; 20220103839, Mar. 2022, Van Rozendaal; 2020-136884, Aug. 2020; 2020/107877, Jun. 2020 |
| Other References: | Ting-Yun Chang and Chi-Jen Lu, "TinyGAN: Distilling BigGAN for Conditional Image Generation," Institute of Information Science, Academia Sinica, Taiwan (Year: 2020). cited by examiner Eirikur Agustsson, Michael Tschannen, Fabian Mentzer, Radu Timofte, and Luc Van Gool, "Generative Adversarial Networks for Extreme Learned Image Compression," IEEE International Conference on Computer Vision (ICCV), Oct. 2019. cited by applicant Johannes Ballé, Valero Laparra, and Eero P. Simoncelli, "End-to-End Optimized Image Compression," CoRR, abs/1611.01704, 2016. cited by applicant Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston, "Variational Image Compression with a Scale Hyperprior," ICLR, 2018. cited by applicant Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, and Alexander Kolesnikov, "Knowledge Distillation: A Good Teacher Is Patient and Consistent," arXiv preprint arXiv:2106.05237, 2021. cited by applicant Andrew Brock, Jeff Donahue, and Karen Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis," CoRR, abs/1809.11096, 2018. cited by applicant Ting-Yun Chang and Chi-Jen Lu, "TinyGAN: Distilling BigGAN for Conditional Image Generation," CoRR, abs/2009.13829, 2020. cited by applicant Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer, and Christopher Schroers, "Neural Inter-Frame Compression for Video Coding," Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 6421-6429, 2019. cited by applicant Leonhard Helminger, Abdelaziz Djelouah, Markus Gross, and Christopher Schroers, "Lossy Image Compression with Normalizing Flows," arXiv preprint arXiv:2008.10486, 2020. cited by applicant Diederik P. Kingma and Jimmy Ba, "Adam: A Method for Stochastic Optimization," 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. cited by applicant Théo Ladune, Pierrick Philippe, Wassim Hamidouche, Lu Zhang, and Olivier Déforges, "Conditional Coding for Flexible Learned Video Compression," arXiv preprint arXiv:2104.07930, 2021. cited by applicant Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao, "DVC: An End-to-End Deep Video Compression Framework," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, Jun. 16-20, 2019, pp. 11006-11015. cited by applicant Théo Ladune, Pierrick Philippe, Wassim Hamidouche, Lu Zhang, and Olivier Déforges, "Optical Flow and Mode Selection for Learning-Based Video Coding," Aug. 6, 2020, 6 pgs. cited by applicant Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao, "DVC: An End-to-End Deep Video Compression Framework," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10998-11007. cited by applicant Siwei Ma, Xinfeng Zhang, Chuanmin Jia, Zhenghui Zhao, Shiqi Wang, and Shanshe Wang, "Image and Video Compression with Neural Networks: A Review," IEEE Transactions on Circuits and Systems for Video Technology, Apr. 10, 2019, 16 pgs. cited by applicant Adam Golinski, Reza Pourreza, Yang Yang, Guillaume Sautiere, and Taco S. Cohen, "Feedback Recurrent Autoencoder for Video Compression," Apr. 9, 2020, 29 pgs. cited by applicant Leonhard Helminger, Roberto Azevedo, Abdelaziz Djelouah, Markus Gross, and Christopher Schroers, "Microdosing: Knowledge Distillation for GAN-Based Compression," 2021, 13 pgs. cited by applicant Extended European Search Report dated Oct. 10, 2022 for EP Application 22165891.7. cited by applicant Abdelaziz Djelouah, Joaquim Campos, Simone Schaub-Meyer, and Christopher Schroers, "Neural Inter-Frame Compression for Video Coding," 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6420-6428. cited by applicant Fabian Mentzer, George Toderici, Michael Tschannen, and Eirikur Agustsson, "High-Fidelity Generative Image Compression," 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada, pp. 1-20. cited by applicant Ties van Rozendaal, Iris A. M. Huijben, and Taco S. Cohen, "Overfitting for Fun and Profit: Instance-Adaptive Data Compression," published as a conference paper at ICLR 2021, pp. 1-18. cited by applicant Ting-Yun Chang and Chi-Jen Lu, "TinyGAN: Distilling BigGAN for Conditional Image Generation," Institute of Information Science, Academia Sinica, Taiwan, pp. 1-21. cited by applicant Extended European Search Report dated Sep. 7, 2022 for EP Application 22165396.0, 12 pgs. cited by applicant Théo Ladune, Pierrick Philippe, Wassim Hamidouche, Lu Zhang, and Olivier Déforges, "Optical Flow and Mode Selection for Learning-Based Video Coding," MMSP 2020, IEEE 22nd International Workshop on Multimedia Signal Processing, Sep. 2020, Tampere, Finland, hal-02911680, 6 pgs. cited by applicant Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, and Zhiyong Gao, "DVC: An End-to-End Deep Video Compression Framework," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 10 pgs. cited by applicant Siwei Ma, Xinfeng Zhang, Chuanmin Jia, Zhenghui Zhao, Shiqi Wang, and Shanshe Wang, "Image and Video Compression with Neural Networks: A Review," IEEE Transactions on Circuits and Systems for Video Technology, Apr. 2019, 16 pgs. cited by applicant Ties van Rozendaal, Iris A. M. Huijben, and Taco S. Cohen, "Overfitting for Fun and Profit: Instance-Adaptive Data Compression," ICLR 2021 Conference, Jan. 2021, 18 pgs. cited by applicant Ting-Yun Chang and Chi-Jen Lu, "TinyGAN: Distilling BigGAN for Conditional Image Generation," ACCV 2020, Lecture Notes in Computer Science, vol. 12625, 16 pgs. cited by applicant JP Office Action dated Aug. 29, 2023 for JP Application 2022-062152. cited by applicant KR Office Action dated Nov. 29, 2023 for Korean Patent Application 10-2022-0041682. cited by applicant |
| Primary Examiner: | Fereja, Samuel D |
| Attorney, Agent or Firm: | Farjami & Farjami LLP |
| Document Code: | edspgr.12010335 |
| Database: | USPTO Patent Grants |
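The encoder flow in the abstract — split the sequence into frame subsets, encode each, and pair each with its own decompression data, transmitted once per subset — can be outlined as follows. This is an illustrative sketch, not the patented implementation; the function names, the fixed subset length, and the tuple stand-ins for latents and MicroRN weights are all assumptions.

```python
# Illustrative sketch of the per-subset encoding flow from the abstract.
# All names and the fixed subset length are assumptions, not the patented API.

def split_into_subsets(frames, subset_len):
    """Partition the sequence into consecutive frame subsets (e.g., shots)."""
    return [frames[i:i + subset_len] for i in range(0, len(frames), subset_len)]

def encode_sequence(frames, subset_len):
    bitstream = []
    for idx, subset in enumerate(split_into_subsets(frames, subset_len)):
        compressed = [("latent", frame) for frame in subset]  # stand-in for the ML encoder
        decompression_data = ("micro-rn-weights", idx)        # specific to this subset
        # Each subset travels with its own decompression data, sent only once.
        bitstream.append((compressed, decompression_data))
    return bitstream

stream = encode_sequence(["f0", "f1", "f2", "f3", "f4", "f5"], subset_len=3)
print(len(stream))                    # 2
print(stream[0][1] != stream[1][1])   # True: the two decompression payloads differ
```

In the claimed system the decompression data would be MicroRN weights rather than an index, but the transport structure — one payload per subset — is the same.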
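Claims 2 and 10 describe identifying the decompression data by overfitting it during encoding of each subset. A toy analogue of that step, with a single scalar gain standing in for the MicroRN weights and a hypothetical shared decoder: fit the gain by gradient descent so it corrects the shared decoder on that one subset.

```python
# Toy analogue of the overfitting step in claims 2 and 10 (not the patented
# method): fit a single per-subset gain by gradient descent so that it
# corrects a generic, subset-agnostic decoder on that subset only.

def shared_decode(latent):
    return 0.5 * latent          # stand-in for a generic shared decoder

def overfit_gain(subset, lr=0.1, steps=200):
    """Fit g so that g * shared_decode(x) ~= x for the frames in this subset."""
    g = 1.0
    for _ in range(steps):
        # Gradient of the squared reconstruction error with respect to g.
        grad = sum(2.0 * (g * shared_decode(x) - x) * shared_decode(x) for x in subset)
        g -= lr * grad / len(subset)
    return g

gain = overfit_gain([1.0, 2.0, 3.0])
print(round(gain, 3))            # 2.0 -- the gain that undoes the 0.5 scaling
```

Because the fit uses only this subset's frames, the resulting parameters are subset-specific by construction, which is what makes the tiny per-subset payload sufficient at decode time.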
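Claims 5 and 13 make the decompression data strictly subset-specific: data fitted for one subset does not correctly decode another. A minimal self-contained illustration, again with hypothetical scalar gains standing in for the subset-specific MicroRN weights:

```python
# Illustrative only (claims 5 and 13): decompression data decodes its own
# subset but not the other. A scalar gain stands in for MicroRN weights.

def decode(latent, gain):
    return gain * latent

# Per-subset decompression data, assumed already fitted during encoding.
data_a, data_b = 2.0, 4.0
compressed_a = [0.5, 1.0, 1.5]   # latents whose originals were [1.0, 2.0, 3.0]

matched = [decode(x, data_a) for x in compressed_a]
mismatched = [decode(x, data_b) for x in compressed_a]
print(matched)      # [1.0, 2.0, 3.0]  -- faithful reconstruction
print(mismatched)   # [2.0, 4.0, 6.0]  -- wrong subset's data, wrong result
```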