PixelSieve: Towards Efficient Activity Analysis From Compressed Video Streams
| Published in: | 2021 58th ACM/IEEE Design Automation Conference (DAC), pp. 811-816 |
|---|---|
| Main authors: | , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 05.12.2021 |
| Subjects: | |
| Online access: | Get full text |
| Summary: | Pixel-level data redundancy in video induces additional memory and computing overhead when neural networks are employed to mine spatiotemporal patterns, e.g., activity and event labels, from video streams. This work proposes PixelSieve to enable highly efficient CNN-based activity analysis directly from video data in compressed formats. Instead of recovering the original RGB frames from compressed video, PixelSieve utilizes the built-in metadata in compressed video streams to distill only the critical pixels that render relevant spatiotemporal features, and then conducts efficient CNN inference with the condensed inputs. PixelSieve removes the overhead of video decoding and improves the performance of CNN-based video analysis by 4.5x on average. |
| DOI: | 10.1109/DAC18074.2021.9586310 |
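
The abstract describes the general idea of keeping only the "critical" pixels indicated by metadata already present in the compressed stream before running CNN inference. The sketch below is a minimal, hypothetical illustration of that kind of pixel sieving, not the paper's actual algorithm: it assumes per-macroblock motion-vector magnitudes have already been extracted from the bitstream, and the function name `sieve_macroblocks`, the `keep_ratio` knob, and the motion-energy ranking heuristic are all illustrative assumptions.

```python
import numpy as np


def sieve_macroblocks(frame_y, mv_magnitude, block=16, keep_ratio=0.25):
    """Select the most 'active' macroblocks of a decoded luma plane.

    frame_y      : (H, W) luma plane standing in for the pixel data
    mv_magnitude : (H//block, W//block) motion-vector magnitude per macroblock,
                   assumed to come from the compressed stream's metadata
    keep_ratio   : fraction of macroblocks to keep (hypothetical knob)

    Returns a condensed (N, block, block) stack of the kept blocks plus their
    (row, col) grid coordinates, which a downstream CNN could consume.
    """
    gh, gw = mv_magnitude.shape
    n_keep = max(1, int(keep_ratio * gh * gw))

    # Rank macroblocks by motion energy; high motion is treated as a proxy
    # for relevant spatiotemporal content.
    flat_idx = np.argsort(mv_magnitude, axis=None)[::-1][:n_keep]
    rows, cols = np.unravel_index(flat_idx, (gh, gw))

    blocks = np.stack([
        frame_y[r * block:(r + 1) * block, c * block:(c + 1) * block]
        for r, c in zip(rows, cols)
    ])
    return blocks, np.stack([rows, cols], axis=1)


if __name__ == "__main__":
    # Toy usage with random data standing in for a decoded frame and its metadata.
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)
    mv = rng.random((224 // 16, 224 // 16))
    kept, coords = sieve_macroblocks(frame, mv, keep_ratio=0.25)
    print(kept.shape, coords.shape)  # (49, 16, 16) (49, 2)
```

Because only a fraction of the macroblocks is retained, the tensor handed to the CNN is correspondingly smaller, which is the source of the memory and compute savings the abstract refers to; how the paper itself selects and assembles the condensed inputs is detailed in the full text.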