Multi-modal RGB–Depth–Thermal Human Body Segmentation

Bibliographic Details
Published in:	International Journal of Computer Vision, Vol. 118, no. 2, pp. 217–239
Main Authors:	Palmero, Cristina; Clapés, Albert; Bahnsen, Chris; Møgelmose, Andreas; Moeslund, Thomas B.; Escalera, Sergio
Format:	Journal Article
Language:	English
Published:	New York: Springer US; Springer; Springer Nature B.V., 01.06.2016
Subjects:	Algorithms; Artificial Intelligence; Calibration; Computer Imaging; Computer Science; Computer vision; Data mining; Datasets; Feature extraction; Human acts; Human behavior; Human body; Human subjects; Image Processing and Computer Vision; Image processing systems; Learning; Machine learning; Mathematical analysis; Pattern Recognition; Pattern Recognition and Graphics; Probabilistic methods; Probability theory; Registration; Segmentation; State of the art; Studies; Support vector machines; Vision; Vision systems; RGB; Depth; Human body segmentation; Thermal
ISSN:	0920-5691, 1573-1405

Description
Summary:	This work addresses human body segmentation from multi-modal visual cues as a first stage of automatic human behavior analysis. We propose a novel RGB–depth–thermal dataset along with a multi-modal segmentation baseline. The modalities are registered using a calibration device and a registration algorithm. Our baseline extracts regions of interest using background subtraction, partitions the foreground regions into cells, computes a set of image features on those cells using different state-of-the-art feature extraction methods, and models the distribution of the descriptors per cell using probabilistic models. A supervised learning algorithm then fuses the output likelihoods over cells in a stacked feature vector representation. Using Gaussian mixture models for the probabilistic modeling and Random Forest for the stacked learning, the baseline outperforms other state-of-the-art methods, obtaining an overlap above 75% with the manually annotated ground-truth human segmentations on the novel dataset.
DOI:	10.1007/s11263-016-0901-x
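
Note on the overlap figure: the summary reports an overlap above 75% against manually annotated ground truth but does not define the measure in this record. The sketch below assumes the common intersection-over-union (Jaccard) definition for binary masks. The function name `overlap` and the nested-list mask format are illustrative assumptions, not taken from the article.

```python
def overlap(pred, truth):
    """Intersection-over-union of two binary masks (lists of rows).

    Assumption: "overlap" means |pred AND truth| / |pred OR truth|;
    the catalog record itself does not spell out the definition.
    """
    same_shape = len(pred) == len(truth) and all(
        len(p_row) == len(t_row) for p_row, t_row in zip(pred, truth)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")

    inter = 0  # pixels labeled human in both masks
    union = 0  # pixels labeled human in at least one mask
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p, t = bool(p), bool(t)
            inter += p and t
            union += p or t

    # Two empty masks agree everywhere; treat that case as full overlap.
    return inter / union if union else 1.0


# Toy 2x3 masks: 2 pixels in common, 4 pixels in the union.
predicted = [[0, 1, 1],
             [0, 1, 0]]
annotated = [[0, 1, 0],
             [0, 1, 1]]
print(overlap(predicted, annotated))  # 0.5
```

Under this assumed definition, a score above 0.75 means that more than three quarters of the pixels marked as human in either the predicted or the annotated mask are marked in both.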