LeSSS: Learned Shared Semantic Spaces for Relating Multi-Modal Representations of 3D Shapes

Bibliographic details
Published in:	Computer Graphics Forum, Vol. 34, No. 5, pp. 141-151
Main authors:	Herzog, Robert; Mewes, Daniel; Wand, Michael; Guibas, Leonidas; Seidel, Hans-Peter
Format:	Journal Article
Language:	English
Published:	Oxford: Blackwell Publishing Ltd, 1 August 2015
Keywords:	multi-modal learning; semantic correspondences; collaborative filtering; 3D-shape descriptors; object retrieval; object recognition; Categories and Subject Descriptors (according to ACM CCS): I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling - Curve, surface, solid, and object representations; I.2.10 [Artificial Intelligence]: Vision and Scene Understanding; I.4.8 [Image Processing and Computer Vision]: Scene Analysis - Object recognition; Correlation; Image processing systems; Labels; Representations; Semantics; Shape; Similarity; Studies; Texts; Three dimensional; Topological manifolds
ISSN:	0167-7055, 1467-8659

Description
Abstract:	In this paper, we propose a new method for structuring multi-modal representations of shapes according to semantic relations. We learn a metric that links semantically similar objects represented in different modalities. First, 3D shapes are associated with textual labels by learning how textual attributes are related to the observed geometry. Correlations between similar labels are captured by simultaneously embedding labels and shape descriptors into a common latent space in which an inner product corresponds to similarity. The mapping is learned robustly by optimizing a rank-based loss function under a sparseness prior for the spectrum of the matrix of all classifiers. Second, we extend this framework towards relating multi-modal representations of the geometric objects. The key idea is that weak cues from shared human labels are sufficient to obtain a consistent embedding of related objects even though their representations are not directly comparable. We evaluate our method against common baseline approaches, investigate the influence of different geometric descriptors, and demonstrate a prototypical multi-modal browser that relates 3D objects with text, photographs, and 2D line sketches.
Bibliography:	ArticleID: CGF12703; Supporting Information; istex: 149E377C35E8B7B851D10E3CBAD33D6AB6B54181; ark: /67375/WNG-MGZDHLCL-B
DOI:	10.1111/cgf.12703
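
Illustrative sketch (not part of the catalog record): the abstract describes embedding labels and shape descriptors into a common latent space where an inner product measures similarity, trained with a rank-based loss under a sparseness prior on the spectrum of the classifier matrix. The Python snippet below is one minimal way such quantities might be computed. It assumes a linear projection of descriptors, a generic pairwise hinge ranking loss, and the nuclear norm as a stand-in for the spectral sparseness prior; the record does not state the paper's exact loss or optimizer, and all function and parameter names here are hypothetical.

```python
import numpy as np

def similarity(label_emb, shape_proj, descriptors):
    """Inner-product similarity between every shape and every label.

    label_emb:   (n_labels, k) latent vectors for textual labels (assumed)
    shape_proj:  (k, d) linear map from descriptor space to latent space (assumed)
    descriptors: (n_shapes, d) shape descriptors
    returns:     (n_shapes, n_labels) score matrix
    """
    latent_shapes = descriptors @ shape_proj.T   # (n_shapes, k)
    return latent_shapes @ label_emb.T           # (n_shapes, n_labels)

def pairwise_rank_loss(scores, relevant, margin=1.0):
    """Mean hinge penalty over (relevant, irrelevant) label pairs per shape.

    This is a generic ranking surrogate, not necessarily the paper's loss.
    """
    total, count = 0.0, 0
    for row, rel in zip(scores, relevant):
        pos, neg = row[rel], row[~rel]
        # Penalise each irrelevant label that scores within `margin`
        # of a relevant label for the same shape.
        hinge = np.maximum(0.0, margin - pos[:, None] + neg[None, :])
        total += hinge.sum()
        count += hinge.size
    return total / max(count, 1)

def nuclear_norm(label_emb, shape_proj):
    """Sum of singular values of W = label_emb @ shape_proj.

    W stacks one linear classifier per label; a small nuclear norm
    encourages a sparse spectrum (low rank). Assumed surrogate.
    """
    W = label_emb @ shape_proj
    return np.linalg.svd(W, compute_uv=False).sum()
```

A training objective under these assumptions would add `pairwise_rank_loss(...)` to a weighted `nuclear_norm(...)` term and minimize over `label_emb` and `shape_proj`; the optimization itself is omitted here.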