A Stochastic Technique to Obtain Training Data for Word Segmentation

Bibliographic details
Published in: Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03; pp. 283-286
Main authors: Fukuda, Takuya; Miura, Takao
Format: Conference proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 15 September 2009
IEEE
Series: ACM Conferences
ISBN: 0769538010, 9780769538013
Abstract Unlike Western languages, Japanese text has no explicit word boundaries, which frequently makes Japanese documents hard to analyze. The difficulty is greater still in specialized domains such as medical, mechanical, and computer science documents. In this work, we discuss how to obtain a pseudo test corpus based on the Markov chain Monte Carlo (MCMC) method, given a small amount of test data. In this setting, we show that our approach gives good results.
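The record does not describe the authors' sampler. As a hedged, generic illustration only, the sketch below shows one way MCMC can produce segmented (pseudo-labeled) text: Gibbs sampling of word-boundary variables under a unigram word model with a Dirichlet-process-style predictive probability, as used in Bayesian unsupervised word segmentation. It is not the method of Fukuda and Miura. The base measure `log_p0`, its parameters `p_end` and `n_chars`, the concentration `alpha`, and the function names `gibbs_segment`, `words_of`, and `log_num` are all hypothetical.

```python
# Generic illustration of MCMC (Gibbs) word-boundary sampling.
# NOT the method of the paper; all names and parameters are hypothetical.
import math
import random
from collections import Counter


def log_p0(word, p_end=0.5, n_chars=50):
    """Hypothetical base measure: geometric word length times uniform characters."""
    return (math.log(p_end) + (len(word) - 1) * math.log(1 - p_end)
            - len(word) * math.log(n_chars))


def log_num(count, word, alpha):
    """log(count + alpha * P0(word)), computed in log space to avoid underflow."""
    log_prior = math.log(alpha) + log_p0(word)
    if count == 0:
        return log_prior
    return math.log(count) + math.log1p(math.exp(log_prior - math.log(count)))


def words_of(sentence, bounds):
    """Split a sentence; bounds[i] is True when a word starts at index i (i >= 1)."""
    words, start = [], 0
    for i in range(1, len(sentence)):
        if bounds[i]:
            words.append(sentence[start:i])
            start = i
    words.append(sentence[start:])
    return words


def gibbs_segment(sentences, iterations=100, alpha=1.0, seed=0):
    """Gibbs-sample word boundaries under a DP-style unigram word model."""
    rng = random.Random(seed)
    # Random initial boundaries; index 0 is a placeholder and never read.
    bounds = [[False] + [rng.random() < 0.5 for _ in range(len(s) - 1)]
              for s in sentences]
    counts = Counter()
    for s, b in zip(sentences, bounds):
        counts.update(words_of(s, b))
    total = sum(counts.values())

    for _ in range(iterations):
        for s, b in zip(sentences, bounds):
            for i in range(1, len(s)):
                # Nearest boundaries l < i < r (sentence edges count as boundaries).
                l = i - 1
                while l > 0 and not b[l]:
                    l -= 1
                r = i + 1
                while r < len(s) and not b[r]:
                    r += 1
                w1, w2, w12 = s[l:i], s[i:r], s[l:r]
                # Remove the word(s) currently covering s[l:r] from the counts.
                if b[i]:
                    counts[w1] -= 1
                    counts[w2] -= 1
                    total -= 2
                else:
                    counts[w12] -= 1
                    total -= 1
                # Compare "one word w12" against "two words w1 w2".
                log_join = log_num(counts[w12], w12, alpha) - math.log(total + alpha)
                log_split = (log_num(counts[w1], w1, alpha) - math.log(total + alpha)
                             + log_num(counts[w2] + (w1 == w2), w2, alpha)
                             - math.log(total + 1 + alpha))
                p_split = 1.0 / (1.0 + math.exp(min(log_join - log_split, 700.0)))
                b[i] = rng.random() < p_split
                # Add the sampled word(s) back.
                if b[i]:
                    counts[w1] += 1
                    counts[w2] += 1
                    total += 2
                else:
                    counts[w12] += 1
                    total += 1
    return [words_of(s, b) for s, b in zip(sentences, bounds)]


if __name__ == "__main__":
    corpus = ["きょうはいいてんき", "きょうはあめ", "あしたはいいてんき"]
    for words in gibbs_segment(corpus):
        print(" / ".join(words))
```

Each sampled segmentation could then serve as pseudo-labeled data for a supervised segmenter; whether this resembles the paper's pipeline is an assumption. Scores are kept in log space because the base measure shrinks geometrically with word length and would underflow for long unsegmented strings.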
Author Miura, Takao
Fukuda, Takuya
ContentType Conference Proceeding
DOI 10.1109/WI-IAT.2009.283
Discipline Computer Science
EISBN 9781424453313
1424453313
EndPage 286
ExternalDocumentID 5285030
Genre orig-research
ISBN 0769538010
9780769538013
IsPeerReviewed false
IsScholarly false
Keywords Word Segmentation
Stochastic Techniques
Markov Chain Monte Carlo (MCMC) method
Language English
PageCount 4
PublicationCentury 2000
PublicationDate 20090915
2009-Sept.
PublicationDateYYYYMMDD 2009-09-15
2009-09-01
PublicationDecade 2000
PublicationPlace Washington, DC, USA
PublicationSeriesTitle ACM Conferences
PublicationTitle Proceedings of the 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Volume 03
PublicationTitleAbbrev WIIAT
PublicationYear 2009
Publisher IEEE Computer Society
IEEE
StartPage 283
SubjectTerms Computing methodologies -- Artificial intelligence -- Natural language processing
Computing methodologies -- Machine learning
Computing methodologies -- Machine learning -- Learning paradigms
Computing methodologies -- Modeling and simulation -- Model development and analysis -- Modeling methodologies
Markov Chain Monte Carlo (MCMC) method
Mathematics of computing -- Probability and statistics -- Probabilistic algorithms
Mathematics of computing -- Probability and statistics -- Probabilistic reasoning algorithms -- Markov-chain Monte Carlo methods
Mathematics of computing -- Probability and statistics -- Probabilistic reasoning algorithms -- Sequential Monte Carlo methods
Mathematics of computing -- Probability and statistics -- Probabilistic representations -- Markov networks
Mathematics of computing -- Probability and statistics -- Stochastic processes
Mathematics of computing -- Probability and statistics -- Stochastic processes -- Markov processes
Stochastic processes
Stochastic Techniques
Theory of computation -- Theory and algorithms for application domains -- Machine learning theory -- Markov decision processes
Training data
Word Segmentation
Title A Stochastic Technique to Obtain Training Data for Word Segmentation
URI https://ieeexplore.ieee.org/document/5285030
Volume 3