Latent Dirichlet Co-Clustering

We present a generative model for simultaneously clustering documents and terms. Our model is a four-level hierarchical Bayesian model, in which each document is modeled as a random mixture of document topics , where each topic is a distribution over some segments of the text. Each of these segments...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Sixth International Conference on Data Mining (ICDM'06) S. 542 - 551
Hauptverfasser: Shafiei, M.M., Milios, E.E.
Format: Tagungsbericht
Sprache:Englisch
Veröffentlicht: IEEE 01.12.2006
Schlagworte:
ISBN:9780769527017, 0769527019
ISSN:1550-4786
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:We present a generative model for simultaneously clustering documents and terms. Our model is a four-level hierarchical Bayesian model, in which each document is modeled as a random mixture of document topics , where each topic is a distribution over some segments of the text. Each of these segments in the document can be modeled as a mixture of word topics where each topic is a distribution over words. We present efficient approximate inference techniques based on Markov Chain Monte Carlo method and a moment-matching algorithm for empirical Bayes parameter estimation. We report results in document modeling, document and term clustering, comparing to other topic models, Clustering and Co-Clustering algorithms including latent Dirichlet allocation (LDA), model-based overlapping clustering (MOC), model-based overlapping co-clustering (MOCC) and information-theoretic co-clustering (ITCC).
ISBN:9780769527017
0769527019
ISSN:1550-4786
DOI:10.1109/ICDM.2006.94