Large-Scale Video Retrieval Using Image Queries

Retrieving videos from large repositories using image queries is important for many applications, such as brand monitoring or content linking. We introduce a new retrieval architecture, in which the image query can be compared directly with database videos-significantly improving retrieval scalabili...

Celý popis

Uloženo v:
Podrobná bibliografie
Vydáno v:IEEE transactions on circuits and systems for video technology Ročník 28; číslo 6; s. 1406 - 1420
Hlavní autoři: Araujo, Andre, Girod, Bernd
Médium: Journal Article
Jazyk:angličtina
Vydáno: New York IEEE 01.06.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Témata:
ISSN:1051-8215, 1558-2205
On-line přístup:Získat plný text
Tagy: Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
Popis
Shrnutí:Retrieving videos from large repositories using image queries is important for many applications, such as brand monitoring or content linking. We introduce a new retrieval architecture, in which the image query can be compared directly with database videos-significantly improving retrieval scalability compared with a baseline system that searches the database on a video frame level. Matching an image to a video is an inherently asymmetric problem. We propose an asymmetric comparison technique for Fisher vectors and systematically explore query or database items with varying amounts of clutter, showing the benefits of the proposed technique. We then propose novel video descriptors that can be compared directly with image descriptors. We start by constructing Fisher vectors for video segments, by exploring different aggregation techniques. For a database of lecture videos, such methods obtain a two orders of magnitude compression gain with respect to a frame-based scheme, with no loss in retrieval accuracy. Then, we consider the design of video descriptors, which combine Fisher embedding with hashing techniques, in a flexible framework based on Bloom filters. Large-scale experiments using three datasets show that this technique enables faster and more memory-efficient retrieval, compared with a frame-based method, with similar accuracy. The proposed techniques are further compared against pre-trained convolutional neural network features, outperforming them on three datasets by a substantial margin.
Bibliografie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1051-8215
1558-2205
DOI:10.1109/TCSVT.2017.2667710