A Stochastic Technique to Obtain Training Data for Word Segmentation

Bibliographic Details
Published in: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03, Vol. 3, pp. 283-286
Main Authors: Fukuda, Takuya; Miura, Takao
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 15 September 2009
IEEE
Series: ACM Conferences
ISBN: 0769538010, 9780769538013
Online Access: Full text
Description
Summary: Unlike Western languages, Japanese has no explicit word boundaries, which often makes Japanese documents hard to analyze. The difficulty is greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using the Markov chain Monte Carlo (MCMC) method, given a small amount of test data. In this setting, we show good results with our approach.
DOI: 10.1109/WI-IAT.2009.283