Multi-modal deep network for RGB-D segmentation of clothes

Bibliographic Details
Published in: Electronics Letters, Vol. 56, No. 9, pp. 432-435
Main authors: Joukovsky, B., Hu, P., Munteanu, A.
Format: Journal Article
Language: English
Published: The Institution of Engineering and Technology, 30.04.2020
ISSN: 0013-5194, 1350-911X
Online Access: Full Text
Description
Abstract: In this Letter, the authors propose a deep-learning-based method to perform semantic segmentation of clothes from RGB-D images of people. First, they present a synthetic dataset containing more than 50,000 RGB-D samples of characters in different clothing styles, featuring various poses and environments for a total of nine semantic classes. The proposed data generation pipeline allows for fast production of RGB images, depth images, and ground-truth label maps. Second, a novel multi-modal encoder-decoder convolutional network is proposed which operates on the RGB and depth modalities. Multi-modal features are merged using trained fusion modules which apply multi-scale atrous convolutions in the fusion process. The method is numerically evaluated on synthetic data and visually assessed on real-world data. The experiments demonstrate the advantages of the proposed model over existing methods.
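The abstract only summarizes the fusion design, so the following is a minimal PyTorch sketch of one plausible reading: concatenate the RGB and depth feature maps, run parallel dilated (atrous) 3x3 convolutions at several rates, and merge the multi-scale responses with a 1x1 projection. The class name AtrousFusion, the dilation rates, and the channel sizes are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class AtrousFusion(nn.Module):
    """Hypothetical fusion module: merges RGB and depth feature maps
    using parallel atrous (dilated) convolutions at multiple rates."""

    def __init__(self, channels, rates=(1, 2, 4)):
        super().__init__()
        # One dilated 3x3 branch per rate, applied to the concatenated
        # RGB+depth features; padding=rate keeps the spatial size fixed.
        self.branches = nn.ModuleList(
            nn.Conv2d(2 * channels, channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        )
        # 1x1 convolution projects the stacked multi-scale responses
        # back to a single fused feature stream.
        self.project = nn.Conv2d(len(rates) * channels, channels,
                                 kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, rgb_feat, depth_feat):
        x = torch.cat([rgb_feat, depth_feat], dim=1)
        multi_scale = torch.cat([self.relu(b(x)) for b in self.branches],
                                dim=1)
        return self.relu(self.project(multi_scale))

# Usage: fuse 64-channel feature maps from the two encoder streams.
fusion = AtrousFusion(channels=64)
fused = fusion(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
print(fused.shape)  # torch.Size([1, 64, 56, 56])
```

Using several dilation rates in parallel lets the fusion module combine the two modalities at multiple receptive-field sizes without downsampling, which is the usual motivation for atrous convolutions in segmentation networks.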
DOI: 10.1049/el.2019.4150