Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Bibliographic Details
Published in:	2014 IEEE Conference on Computer Vision and Pattern Recognition pp. 580 - 587
Main Authors:	Girshick, Ross; Donahue, Jeff; Darrell, Trevor; Malik, Jitendra
Format:	Conference Proceeding, Journal Article
Language:	English
Published:	IEEE 01.06.2014
Subjects:	Computer vision; Conferences; Feature extraction; Hierarchies; Neural networks; Object detection; Pattern recognition; Proposals; Support vector machines; Tasks; Training; Vectors; Visualization; Volatile organic compounds
ISSN:	1063-6919

Description
Summary:	Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also present experiments that provide insight into what the network learns, revealing a rich hierarchy of image features. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
DOI:	10.1109/CVPR.2014.81
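
Illustration:	The summary describes scoring bottom-up region proposals with CNN features. The sketch below shows only the shape of that idea and is not the authors' implementation: `toy_features` is a hypothetical placeholder standing in for the CNN, the linear scorer and its weights are invented for the example, and the two proposal boxes are hard-coded rather than produced by a real proposal method. All function and variable names are assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), with x1 and y1 exclusive
Image = List[List[float]]         # grayscale image stored as rows of pixels


def warp_region(image: Image, box: Box, size: int) -> Image:
    """Crop `box` and resize it to size x size by nearest-neighbour sampling."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    return [
        [image[y0 + (r * h) // size][x0 + (c * w) // size] for c in range(size)]
        for r in range(size)
    ]


def toy_features(patch: Image) -> List[float]:
    """Hypothetical stand-in for the CNN: mean intensity of each patch quadrant."""
    n = len(patch)
    half = n // 2
    feats = []
    for rows in (range(0, half), range(half, n)):
        for cols in (range(0, half), range(half, n)):
            vals = [patch[r][c] for r in rows for c in cols]
            feats.append(sum(vals) / len(vals))
    return feats


def score_regions(
    image: Image,
    proposals: Sequence[Box],
    extract: Callable[[Image], List[float]],
    weights: Sequence[float],
    bias: float,
    size: int = 4,
) -> List[Tuple[Box, float]]:
    """Warp each proposal to a fixed size, extract features, apply a linear scorer."""
    scored = []
    for box in proposals:
        feats = extract(warp_region(image, box, size))
        score = sum(w * f for w, f in zip(weights, feats)) + bias
        scored.append((box, score))
    return scored


# Toy 8x8 image: a bright 4x4 square in the top-left corner, dark elsewhere.
image = [[1.0 if r < 4 and c < 4 else 0.0 for c in range(8)] for r in range(8)]
proposals = [(0, 0, 4, 4), (4, 4, 8, 8)]   # hard-coded, illustrative boxes
scored = score_regions(image, proposals, toy_features, [0.25] * 4, bias=-0.5)
best_box, best_score = max(scored, key=lambda item: item[1])
```

In this toy setup the proposal covering the bright square receives the higher score. A real system would replace the placeholder extractor with a trained network and learn the scorer from labeled data, as the summary outlines.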