Factor graph framework for semantic video indexing

Bibliographic Details
Published in:IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, no. 1, pp. 40-52
Main Authors: Naphade, M.R., Kozintsev, I.V., Huang, T.S.
Format: Journal Article
Language:English
Published: New York, NY: IEEE, 01.01.2002
ISSN:1051-8215, 1558-2205
Description
Summary:Video query by semantic keywords is one of the most challenging research issues in video data management. To go beyond low-level similarity and access video data content by semantics, we need to bridge the gap between the low-level representation and high-level semantics. This is a difficult multimedia understanding problem. We formulate this problem as a probabilistic pattern-recognition problem for modeling semantics in terms of concepts and context. To map low-level features to high-level semantics, we propose probabilistic multimedia objects (multijects). Examples of multijects in movies include explosion, mountain, beach, outdoor, music, etc. Semantic concepts in videos interact and appear in context. To model this interaction explicitly, we propose a network of multijects (multinet). To model the multinet computationally, we propose a factor graph framework which can enforce spatio-temporal constraints. Using probabilistic models for the multijects rocks, sky, snow, water-body, and forestry/greenery, and using a factor graph as the multinet, we demonstrate the application of this framework to semantic video indexing. We demonstrate how detection performance can be significantly improved using the multinet to take inter-conceptual relationships into account. Our experiments using a large video database consisting of clips from several movies and based on a set of five semantic concepts reveal a significant improvement in detection performance by over 22%. We also show how the multinet is extended to take temporal correlation into account. By constructing a dynamic multinet, we show that the detection performance is further enhanced by as much as 12%. With this framework, we show how keyword-based query and semantic filtering are possible for a predetermined set of concepts.
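The core idea in the abstract — a factor graph over binary concept variables ("multijects") whose pairwise factors encode inter-conceptual context — can be sketched as follows. This is an illustrative toy, not the authors' implementation: the concepts, detector likelihoods, and compatibility values are invented, and posterior marginals are computed by brute-force enumeration rather than the sum-product message passing a real multinet would use on larger graphs.

```python
import itertools

# Illustrative sketch (assumed values, not from the paper): a tiny "multinet"
# over three binary concept variables. Unary factors stand in for per-concept
# multiject detector likelihoods; pairwise factors encode context.

concepts = ["snow", "water", "sky"]

# Hypothetical detector scores: likelihood of the frame evidence given the
# concept is absent (0) or present (1).
unary = {
    "snow":  {0: 0.4, 1: 0.6},
    "water": {0: 0.7, 1: 0.3},
    "sky":   {0: 0.2, 1: 0.8},
}

# Hypothetical compatibilities: snow and water rarely co-occur in a frame,
# while snow and sky often do. Missing entries default to 1.0 (neutral).
pairwise = {
    ("snow", "water"): {(1, 1): 0.2},
    ("snow", "sky"):   {(1, 1): 2.0},
}

def marginals():
    """Exact posterior marginals by enumerating all 2^n joint states."""
    post = {c: [0.0, 0.0] for c in concepts}
    Z = 0.0
    for assign in itertools.product([0, 1], repeat=len(concepts)):
        state = dict(zip(concepts, assign))
        p = 1.0
        for c in concepts:
            p *= unary[c][state[c]]
        for (a, b), table in pairwise.items():
            p *= table.get((state[a], state[b]), 1.0)
        Z += p
        for c in concepts:
            post[c][state[c]] += p
    return {c: [v / Z for v in post[c]] for c in concepts}

m = marginals()
# The snow detector alone gives 0.6; context from the highly likely "sky"
# concept pushes the posterior above that despite the water penalty.
print({c: round(m[c][1], 3) for c in concepts})
```

Brute-force enumeration is exact but exponential in the number of concepts; the factor-graph formulation matters precisely because sum-product message passing scales this kind of contextual update to many concepts, and a dynamic multinet adds temporal factors linking the same concept across consecutive frames.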
DOI:10.1109/76.981844