A Stochastic Technique to Obtain Training Data for Word Segmentation

Bibliographic Details
Published in:	Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03, Vol. 3, pp. 283-286
Main Authors:	Fukuda, Takuya; Miura, Takao
Format:	Conference paper
Language:	English
Publication Details:	Washington, DC, USA IEEE Computer Society 15.09.2009 IEEE
Series:	ACM Conferences
Subjects:	Computing methodologies > Artificial intelligence > Natural language processing; Computing methodologies > Machine learning; Computing methodologies > Machine learning > Learning paradigms; Computing methodologies > Modeling and simulation > Model development and analysis > Modeling methodologies; Markov Chain Monte Carlo (MCMC) method; Mathematics of computing > Probability and statistics > Probabilistic algorithms; Mathematics of computing > Probability and statistics > Probabilistic reasoning algorithms > Markov-chain Monte Carlo methods; Mathematics of computing > Probability and statistics > Probabilistic reasoning algorithms > Sequential Monte Carlo methods; Mathematics of computing > Probability and statistics > Probabilistic representations > Markov networks; Mathematics of computing > Probability and statistics > Stochastic processes; Mathematics of computing > Probability and statistics > Stochastic processes > Markov processes; Stochastic processes; Stochastic Techniques; Theory of computation > Theory and algorithms for application domains > Machine learning theory > Markov decision processes; Training data; Word Segmentation
ISBN:	0769538010, 9780769538013

Description
Summary:	Unlike Western languages, Japanese has no explicit word boundaries, which often makes Japanese documents hard to analyze. The difficulty is greater in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus using a Markov chain Monte Carlo (MCMC) method, given a small amount of test data. We show good results with our approach in this setting.
DOI:	10.1109/WI-IAT.2009.283