Semi-supervised topic classification for low resource languages
In this paper, we present a novel methodology for rapidly developing a topic-based document classification system for a language that has limited resources. Our approach, a hybrid one, combines supervised and unsupervised topic classification techniques. Given that access to native speakers is fairl...
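The supervised "root" topic step and the rejection of off-topic documents mentioned in the full abstract can be illustrated with a minimal sketch. It is not the authors' system: it substitutes a simple multinomial Naive Bayes classifier over bags of words, and the function names, the Laplace smoothing, and the `threshold` value are all assumptions made for illustration. The unsupervised topic discovery and clustering stages are not shown.

```python
import math
from collections import Counter, defaultdict


def train_root_classifier(labeled_docs):
    """Collect per-topic word counts from (tokens, root_topic) pairs.

    Hypothetical helper: a Naive Bayes stand-in, not the paper's model.
    """
    word_counts = defaultdict(Counter)  # topic -> word -> count
    doc_counts = Counter()              # topic -> number of training docs
    vocab = set()
    for tokens, topic in labeled_docs:
        word_counts[topic].update(tokens)
        doc_counts[topic] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def topic_posteriors(tokens, model):
    """Return P(topic | tokens) with add-one (Laplace) smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for topic in doc_counts:
        total_words = sum(word_counts[topic].values())
        score = math.log(doc_counts[topic] / total_docs)
        for tok in tokens:
            if tok in vocab:  # out-of-vocabulary words carry no evidence
                score += math.log(
                    (word_counts[topic][tok] + 1) / (total_words + len(vocab))
                )
        log_scores[topic] = score
    # Normalize in log space to avoid underflow on long documents.
    peak = max(log_scores.values())
    unnorm = {t: math.exp(s - peak) for t, s in log_scores.items()}
    z = sum(unnorm.values())
    return {t: v / z for t, v in unnorm.items()}


def classify_with_rejection(tokens, model, threshold=0.8):
    """Return the best root topic, or None when confidence is too low.

    The threshold is an assumed tuning knob: raising it rejects more
    documents, trading missed detections for fewer false alarms.
    """
    posteriors = topic_posteriors(tokens, model)
    best = max(posteriors, key=posteriors.get)
    return best if posteriors[best] >= threshold else None
```

In this sketch a document whose words are all unseen falls back to the topic priors, so with balanced training data it is rejected rather than forced into a root topic.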
| Published in: | 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5093-5096 |
|---|---|
| Main authors: | Prasad, R.; Natarajan, P.; McVeety, S.; Daben Liu |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 March 2008 |
| ISBN: | 9781424414833, 1424414830 |
| ISSN: | 1520-6149 |
| Abstract | In this paper, we present a novel methodology for rapidly developing a topic-based document classification system for a language that has limited resources. Our approach, a hybrid one, combines supervised and unsupervised topic classification techniques. Given that access to native speakers is fairly limited for low resource languages, our approach requires annotating only a few broad "root" topics in the corpus. Next, an unsupervised topic discovery (UTD) technique is used to automatically determine finer topics within the root topics. Lastly, we use a recently developed unsupervised topic clustering technique to organize the corpus into a hierarchical structure that enables browsing documents at multiple levels of granularity. Recognizing the need to reduce false alarms at runtime, we describe rejection techniques for discarding off-topic documents. |
|---|---|
| Author | Prasad, R.; Natarajan, P.; McVeety, S.; Daben Liu |
| Author_xml | 1. Daben Liu (BBN Technol., Cambridge, MA); 2. McVeety, S. (BBN Technol., Cambridge, MA); 3. Prasad, R. (BBN Technol., Cambridge, MA); 4. Natarajan, P. (BBN Technol., Cambridge, MA) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICASSP.2008.4518804 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
| Discipline | Engineering |
| EISBN | 1424414849 9781424414840 |
| EndPage | 5096 |
| ExternalDocumentID | 4518804 |
| Genre | orig-research |
| ISBN | 9781424414833 1424414830 |
| ISICitedReferencesCount | 1 |
| ISSN | 1520-6149 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 4 |
| PublicationDate | 2008-March |
| PublicationTitle | 2008 IEEE International Conference on Acoustics, Speech and Signal Processing |
| PublicationTitleAbbrev | ICASSP |
| PublicationYear | 2008 |
| Publisher | IEEE |
| StartPage | 5093 |
| SubjectTerms | Broadcasting; Hidden Markov Model; Hidden Markov models; Humans; Internet; Malay; Natural languages; off-topic rejection; Runtime; Search engines; Testing; topic clustering; Topology; unsupervised topic discovery; Web sites |
| Title | Semi-supervised topic classification for low resource languages |
| URI | https://ieeexplore.ieee.org/document/4518804 |

