Large-Scale Video Retrieval Using Image Queries

Bibliographic Details
Published in:	IEEE Transactions on Circuits and Systems for Video Technology, Vol. 28, no. 6, pp. 1406-1420
Main Authors:	Araujo, Andre; Girod, Bernd
Format:	Journal Article
Language:	English
Published:	New York: IEEE, 01.06.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Artificial neural networks; Bloom filter; Clutter; Datasets; Electronic mail; Fisher vector; Image segmentation; Indexes; large-scale; Queries; query-by-image; Repositories; Retrieval; Search problems; video retrieval; Visual databases; Visualization
ISSN:	1051-8215, 1558-2205

Description
Summary:	Retrieving videos from large repositories using image queries is important for many applications, such as brand monitoring or content linking. We introduce a new retrieval architecture in which the image query can be compared directly with database videos, significantly improving retrieval scalability compared with a baseline system that searches the database on a video frame level. Matching an image to a video is an inherently asymmetric problem. We propose an asymmetric comparison technique for Fisher vectors and systematically explore query or database items with varying amounts of clutter, showing the benefits of the proposed technique. We then propose novel video descriptors that can be compared directly with image descriptors. We start by constructing Fisher vectors for video segments, exploring different aggregation techniques. For a database of lecture videos, such methods obtain a compression gain of two orders of magnitude with respect to a frame-based scheme, with no loss in retrieval accuracy. Then, we consider the design of video descriptors that combine Fisher embedding with hashing techniques in a flexible framework based on Bloom filters. Large-scale experiments using three datasets show that this technique enables faster and more memory-efficient retrieval than a frame-based method, with similar accuracy. The proposed techniques are further compared against pre-trained convolutional neural network features, outperforming them on three datasets by a substantial margin.
DOI:	10.1109/TCSVT.2017.2667710