Object detection informed encoding

Bibliographic Details
Title: Object detection informed encoding
Patent Number: 10,205,953
Publication Date: February 12, 2019
Appl. No: 13/359377
Application Filed: January 26, 2012
Abstract: Embodiments of the present invention provide techniques for coding video data efficiently based on detection of objects within video sequences. A video coder may perform object detection on the frame and when an object is detected, develop statistics of an area of the frame in which the object is located. The video coder may compare pixels adjacent to the object location to the object's statistics and may define an object region to include pixel blocks corresponding to the object's location and pixel blocks corresponding to adjacent pixels having similar statistics as the detected object. The coder may code the video frame according to a block-based compression algorithm wherein pixel blocks of the object region are coded according to coding parameters generating relatively high quality coding and pixel blocks outside the object region are coded according to coding parameters generating relatively lower quality coding.
Inventors: Price, Douglas Scott (San Jose, CA, US); Wu, Hsi-Jung (San Jose, CA, US); Zhou, Xiaosong (Campbell, CA, US); Zhang, Dazhong (Milpitas, CA, US)
Assignees: APPLE INC. (Cupertino, CA, US)
Claim: 1. A video coding method, comprising: parsing a frame to be coded into a plurality of pixel blocks, performing object detection on the plurality of pixel blocks, and when an object is detected, defining a final object region by: defining an initial object region representing an area of the frame in which the object is located and sampling the pixel blocks in the initial object region to develop statistics of the detected object, comparing statistics of pixel blocks not in the initial object region but adjacent to the initial object region to the object statistics of the detected object based on a variable similarity threshold, for pixel blocks adjacent to at least two pixel blocks of the initial object region, lowering the variable similarity threshold for the comparison to increase a probability of admission of the at least two pixel blocks into the final object region, growing the final object region to include: (1) the pixel blocks of the initial object region, and (2) when the lowered variable similarity threshold is met, the pixel blocks adjacent to the at least two pixel blocks of the initial object region identified by the comparison as having similar statistics to the detected object statistics, reducing a quality of coding parameters of the pixel blocks outside the final object region according to each pixel block's respective distance from the final object region; defining a plurality of sub-regions of the frame outside the object region; assigning each pixel block outside the object region to a respective sub-region according to that pixel block's quality of coding parameters; and coding the frame according to a block-based compression algorithm wherein pixel blocks of the final object region are coded according to coding parameters generating relatively high quality coding and pixel blocks of each of the sub-regions are coded according to the quality of coding parameters of the pixel blocks within each sub-region.
Claim: 2. The method of claim 1, wherein the coding parameters include quantization parameters and the quantization parameters of the object region pixel blocks are generally lower than the quantization parameters of the non-object region pixel blocks.
Claim: 3. The method of claim 1, wherein the coding parameters include coding mode selections and coding mode selections of non-object region pixel blocks are set to SKIP.
Claim: 4. The method of claim 1, further comprising, prior to coding, applying a blurring filter to pixels in spatial areas outside the final object region.
Claim: 5. The method of claim 1, wherein the quality reduction is a quantization parameter adjustment.
Claim: 6. The method of claim 1, wherein the sub-regions form halos of pixel blocks around the object region.
Claim: 7. The method of claim 1, wherein the object detection is face detection.
Claim: 8. The method of claim 7, further comprising detecting whether the face exhibits a predetermined expression.
Claim: 9. The method of claim 1, further comprising adding to the final object region pixel blocks surrounded by pixel blocks in the object region.
Claim: 10. The method of claim 1, further comprising applying a blur filter to a pixel block and varying a strength of the blur filter according to an amount of estimated motion associated with the pixel block.
Claim: 11. A video coding method, comprising: parsing a frame to be coded into a plurality of pixel blocks, performing object detection on the plurality of pixel blocks, and when an object is detected, defining a final object region by: defining an initial object region representing an area of the frame in which the object is located and sampling the pixel blocks in the initial object region to develop statistics of the initial object region, comparing statistics of pixel blocks not in the initial object region but adjacent to the initial object region to the object statistics of the detected object based on a variable similarity threshold, for pixel blocks adjacent to at least two pixel blocks of the initial object region, lowering the variable similarity threshold for the comparison to increase a probability of admission of the at least two pixel blocks into the final object region, growing the final object region to include (1) the pixel blocks of the initial object region, and (2) when the lowered variable similarity threshold is met, the pixel blocks adjacent to the at least two pixel blocks of the initial object region identified by the comparison as having similar statistics to the detected object statistics, increasing quantization parameters of the pixel blocks outside the final object region according to each pixel block's respective distance from the final object region, defining a plurality of sub-regions of the frame outside the object region, assigning each pixel block outside the final object region to a respective sub-region according to that pixel block's quantization parameters, and coding the frame according to a block-based compression algorithm wherein pixel blocks of the final object region are coded according to relatively lower quantization parameters and pixel blocks of each of the sub-regions outside the final object region are coded according to generally higher quantization parameters of the pixel blocks within each sub-region.
Claim: 12. The method of claim 11, wherein the increasing of quantization parameters varies linearly based on a distance of each sub-region's distance from the object region.
Claim: 13. The method of claim 11, wherein the increasing of quantization parameters varies non-uniformly based on a distance of each sub-region's distance from the object region.
Claim: 14. The method of claim 11, wherein the sub-regions form halos of pixel blocks around the object region.
Claim: 15. The method of claim 11, further comprising, prior to coding, applying a blurring filter to pixels in spatial areas outside the final object region.
Claim: 16. The method of claim 11, wherein the object detection is face detection.
Claim: 17. A video coding method, comprising: performing face detection on a video frame to be coded, when a face is detected, sampling content of pixel blocks in an initial face polygon identified by the face detection to generate statistics of pixels in the sampled pixel blocks, comparing the statistics of the sampled pixels to content of pixels outside the initial face polygon but adjacent to the initial face polygon based on a variable similarity threshold, for pixel blocks that are adjacent to at least two pixel blocks of the initial face polygon, lowering the variable similarity threshold for the comparison to increase a probability of admission of the at least two pixel blocks into the initial face polygon, growing the initial face polygon to a final face polygon to include (1) pixels in the initial face polygon, and (2) when the lowered variable similarity threshold is met, pixel blocks adjacent to the at least two pixel blocks, filtering content of pixel blocks for which the comparison determines are not part of a face, increasing quantization parameters of the pixel blocks outside the final face polygon according to each pixel block's respective distance from the final face polygon, defining a plurality of sub-regions of the frame around the final face polygon, assigning the pixels outside the final face to a respective sub-region according to the pixels' quantization parameters, and coding the video frame according to a block-based compression algorithm that employs a quantization parameter, wherein the pixel blocks inside the final face polygon are coded with a quantization parameter having lower values for pixel blocks within the final face polygon and pixel blocks of each sub-region outside the final face polygon are coded according to generally higher quantization parameters within each sub-region.
Claim: 18. The method of claim 17, wherein the sub-regions form halos of pixel blocks around the face polygon.
Claim: 19. A coding apparatus, comprising: a coding engine to code frames of a video sequence according to predictive coding techniques applied to pixel blocks of the frames, an object detector to identify locations of objects within frames of the video sequence for an initial object region, a controller to grow the initial object region to a final object region to include: (1) pixel blocks corresponding to the initial object region, and (2) when a condition of a lowered variable similarity threshold is met, pixel blocks adjacent to at least two pixel blocks of the initial object region that share common statistics as the pixel blocks in the initial object region based on a comparison of statistics of the pixel blocks in the initial object region and the adjacent pixel blocks using a variable similarity threshold, wherein the variable similarity threshold is lowered to increase a probability of admission of the at least two pixel blocks into the final object region, and the controller to reduce a quality of coding parameters of the pixel blocks outside the final object region according to each pixel block's respective distance from the final object region, define coding regions within the frames, the coding regions including the final object region and a plurality of sub-regions formed based on the quality of coding parameters of each pixel block outside the final object region, the controller to adjust coding parameters applied by the coding engine providing relatively high quality coding for pixel blocks in the final object region and the pixel blocks in each sub-region are coded according to the quality of coding parameters of the pixel blocks within each sub-region.
Claim: 20. The apparatus of claim 19, further comprising a blur filter applying pre-processing to the video sequence prior to coding by the coding engine, the blur filter to apply blurring to frame data corresponding to the sub-regions.
Claim: 21. The apparatus of claim 20, wherein the blur filter applies increasing levels of blur for pixels at higher distances from the object region and lower levels of blur for pixels at lower distances from the object region.
Claim: 22. The apparatus of claim 20, further comprising a motion estimator, wherein the blur filter applies increasing levels of blur for pixels outside the object region for frames having higher levels of motion and lower levels of blur for pixels outside the object region for frames having lower levels of motion.
Claim: 23. The apparatus of claim 19, wherein the object detector is a face detector.
Claim: 24. The apparatus of claim 19, wherein the controller provides quantization parameter adjustments to the coding engine, including higher quantization parameter adjustments to pixel blocks of the sub-regions.
Claim: 25. The apparatus of claim 19, wherein the controller provides coding mode selections to the coding engine, including SKIP mode assignments to select pixel blocks of the sub-regions.
Claim: 26. A non-transitory computer readable medium storing program instructions that, when executed by a processor, cause the processor to: parse a frame to be coded into a plurality of pixel blocks, perform object detection on the plurality of pixel blocks, when an object is detected, define a final object region: define an initial object region representing an area of the frame in which the object is located and sample pixel blocks in the initial object region to develop statistics of the detected object, compare pixels not in the initial object region but adjacent to the initial object region to the object statistics of the detected object based on a variable similarity threshold, for pixel blocks adjacent to at least two pixel blocks of the initial object region, lower the variable similarity threshold for the comparison to increase a probability of admission of the at least two pixel blocks into the final object region, grow the final object region to include: (1) pixel blocks corresponding to the initial object region, and (2) when the lowered variable similarity threshold is met, the pixel blocks adjacent to the at least two pixel blocks of the initial object region identified by the comparison as having similar statistics as the detected object statistics, reduce a quality of coding parameters of the pixel blocks outside the final object region according to each pixel block's respective distance from the final object region, define a plurality of sub-regions of the frame outside the final object region, assign each pixel block outside the final object region to a respective sub-region according to that pixel block's quality of coding parameters, and code the frame according to a block-based compression algorithm wherein pixel blocks of the final object region are coded according to coding parameters generating relatively high quality coding and pixel blocks in each sub-region outside the final object region are coded according to the quality of coding parameters of the pixel blocks within each sub-region.
Claim: 27. The medium of claim 26, wherein the coding parameters include quantization parameters and the quantization parameters of the object region pixel blocks are generally lower than the quantization parameters of the non-object region pixel blocks.
Claim: 28. The medium of claim 26, wherein the coding parameters include coding mode selections and coding mode selections of non-object region pixel blocks are set to SKIP.
Claim: 29. The medium of claim 26, wherein the instructions further cause the processor to, prior to coding, apply a blurring filter to pixels in spatial areas outside the final object region.
Claim: 30. A non-transitory physical computer readable storage medium storing coded video data generated by an encoder that: parsed a frame to be coded into a plurality of pixel blocks, performed object detection on the frame, when an object is detected, defined a final object region by: having defined an initial object region representing an area of the frame in which the object is located and sampled the pixel blocks in the initial object region to develop statistics of the detected object, having compared statistics of pixel blocks not in the initial object region but adjacent to the initial object region to the object statistics of the detected object based on a variable similarity threshold, for pixel blocks adjacent to at least two pixel blocks of the initial object region, having lowered the variable similarity threshold for the comparison to increase a probability of admission of the at least two pixel blocks into the final object region, having grown the final object region to include: (1) the pixel blocks corresponding to the initial object region, and (2) when the lowered variable similarity threshold is met, the pixel blocks adjacent to the at least two pixel blocks of the initial object region identified by the comparison as having similar statistics to the detected object statistics, having reduced a quality of coding parameters of the pixel blocks outside the final object region according to each pixel block's respective distance from the final object region, having defined a plurality of sub-regions of the frame outside the object region, having assigned each pixel block outside the final object region to a respective sub-region according to that pixel block's quality of coding parameters, and having coded the frame according to a block-based compression algorithm wherein pixel blocks of the final object region were coded according to coding parameters generating relatively high quality coding and pixel blocks in the sub-regions outside the final object region were coded according to the quality of coding parameters of the pixel blocks within each sub-region.
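Illustrative sketch (not part of the patent record): the region-growing step recited in the independent claims can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not the patented implementation. The claims do not fix the statistic, the similarity measure, or the threshold values, so the mean block luma, the normalized similarity score, the 4-connected notion of adjacency, and the names `grow_object_region`, `base_threshold`, and `lowered_threshold` are all hypothetical choices made for illustration. The sketch also reads the claim language as admitting the candidate block that touches at least two blocks of the initial region.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (row, col) index of a pixel block; hypothetical layout


def grow_object_region(
    block_means: List[List[float]],
    initial_region: Set[Block],
    base_threshold: float = 0.90,     # assumed value, not from the patent
    lowered_threshold: float = 0.80,  # assumed value, not from the patent
) -> Set[Block]:
    """Grow an initial object region by one ring of similar adjacent blocks."""
    rows, cols = len(block_means), len(block_means[0])

    # Object statistics: here simply the average of the per-block mean luma
    # over the initial region (an assumed statistic).
    object_mean = sum(block_means[r][c] for r, c in initial_region) / len(initial_region)

    final_region = set(initial_region)
    for r in range(rows):
        for c in range(cols):
            if (r, c) in initial_region:
                continue
            # Count 4-connected neighbours that belong to the initial region.
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            touching = sum(1 for n in neighbours if n in initial_region)
            if touching == 0:
                continue  # not adjacent to the initial region
            # A candidate touching two or more region blocks gets the lowered
            # threshold, which makes admission more likely.
            threshold = lowered_threshold if touching >= 2 else base_threshold
            # Assumed similarity score in [0, 1] for 8-bit luma values.
            similarity = 1.0 - abs(block_means[r][c] - object_mean) / 255.0
            if similarity >= threshold:
                final_region.add((r, c))
    return final_region
```

With these assumed defaults, a candidate block whose mean differs from the object mean by 40 luma levels scores about 0.84. It is admitted if it touches two blocks of the initial region and rejected if it touches only one.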
Patent References Cited: 5852669 December 1998 Eleftheriadis et al.
6173069 January 2001 Daly et al.
6453069 September 2002 Matsugu
6594375 July 2003 Kato et al.
6987889 January 2006 Horowitz
7031517 April 2006 Le
7269292 September 2007 Steinberg
7310435 December 2007 Mallya
7620218 November 2009 Steinberg
8520892 August 2013 Kuleschow
8655102 February 2014 Smith
2001/0016066 August 2001 Amonou
2002/0164074 November 2002 Matsugu
2002/0172426 November 2002 Honda et al.
2003/0099386 May 2003 Schneider
2003/0128882 July 2003 Kim et al.
2004/0130546 July 2004 Porikli
2005/0058345 March 2005 Koide
2006/0204113 September 2006 Wang et al.
2007/0154096 July 2007 Cao
2007/0248164 October 2007 Zuo et al.
2008/0152245 June 2008 El-Maleh
2008/0267498 October 2008 Shaw
2009/0010328 January 2009 Pan
2009/0202169 August 2009 Hayashi
2009/0310822 December 2009 Chang
2009/0324113 December 2009 Lu et al.
2010/0124274 May 2010 Cheok et al.
2012/0114231 May 2012 Bushell
2012/0281904 November 2012 Gong
2014/0118578 May 2014 Sasaki
2007-228614 September 2007
2008-199521 August 2008
2009-005238 January 2009
10-2002-0077093 October 2002
10-2010-0002632 January 2010
10-2010-0095833 September 2010


Other References: International Search Report, dated Apr. 25, 2013, from corresponding International Patent Application No. PCT/US2013/023118 filed Jan. 25, 2013. cited by applicant
Moschetti et al., “Automatic Object Extraction and Dynamic Bitrate Allocation for Second Generation Video Coding,” Proceedings of IEEE International Conference in Lausanne, Switzerland, Aug. 26-29, 2002, IEEE, vol. 1, Aug. 26, 2002. cited by applicant
Meguro et al., “Object Extraction from Image Sequence Based on Correction of Segmented Regions in Each of the Consecutive Frames”, IMPS 2001, The Proceeding of the 6th Image Media Processing Symposium, The Institute of Electronics Information and Communication Engineers, Institute of Technical Committee on Image Engineering, pp. 21-22, Nov. 14, 2001 (English language abstract provided on p. 9 of the attached non-patent literature). cited by applicant
Assistant Examiner: Adrovel, William
Primary Examiner: Navas, Jr., Edemio
Attorney, Agent or Firm: Baker & Hostetler LLP
Document Code: edspgr.10205953
Database: USPTO Patent Grants
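Illustrative sketch (not part of the patent record): claims 11 to 14 describe raising the quantization parameter (QP) of blocks outside the object region according to their distance from it, and grouping blocks with the same QP into halo-like sub-regions. The following Python sketch shows the linear variant of claim 12 under stated assumptions. The Chebyshev block distance, the starting QP, the step size, the cap of 51 (the top of the H.264 QP range), and the name `assign_qp_halos` are hypothetical illustration choices, not values taken from the patent.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Block = Tuple[int, int]  # (row, col) index of a pixel block; hypothetical layout


def assign_qp_halos(
    rows: int,
    cols: int,
    object_region: Set[Block],
    object_qp: int = 24,  # assumed QP for the object region
    qp_step: int = 2,     # assumed QP increase per block of distance
    max_qp: int = 51,     # assumed cap (H.264-style QP range)
) -> Tuple[Dict[Block, int], Dict[int, List[Block]]]:
    """Return a per-block QP map and the sub-regions keyed by QP."""
    qp_map: Dict[Block, int] = {}
    sub_regions: Dict[int, List[Block]] = defaultdict(list)

    for r in range(rows):
        for c in range(cols):
            if (r, c) in object_region:
                qp_map[(r, c)] = object_qp  # lowest QP, highest quality
                continue
            # Chebyshev distance to the nearest object block (assumed metric);
            # blocks at equal distance form a ring, or halo, around the object.
            distance = min(
                max(abs(r - obj_r), abs(c - obj_c)) for obj_r, obj_c in object_region
            )
            qp = min(object_qp + qp_step * distance, max_qp)
            qp_map[(r, c)] = qp
            sub_regions[qp].append((r, c))  # one sub-region per QP value

    return qp_map, dict(sub_regions)
```

A non-uniform ramp, as in claim 13, could be obtained by replacing `qp_step * distance` with any other increasing function of distance.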