Parallel Algorithms for Unsupervised Tagging
| Published in: | Transactions of the Association for Computational Linguistics Vol. 2; pp. 105 - 118 |
|---|---|
| Main authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Publisher details: | One Rogers Street, Cambridge, MA 02142-1209, USA; MIT Press; 01.07.2024; MIT Press Journals, The; The MIT Press |
| Subject: | |
| ISSN: | 2307-387X |
| Summary: | We propose a new method for unsupervised tagging that finds minimal models which are then further improved by Expectation Maximization training. In contrast to previous approaches that rely on manually specified and multi-step heuristics for model minimization, our approach is a simple greedy approximation algorithm DMLC (Distributed-Minimum-Label-Cover) that solves this objective in a single step. We extend the method and show how to efficiently parallelize the algorithm on modern parallel computing platforms while preserving approximation guarantees. The new method easily scales to large data and grammar sizes, overcoming the memory bottleneck in previous approaches. We demonstrate the power of the new algorithm by evaluating on various sequence labeling tasks: Part-of-Speech tagging for multiple languages (including low-resource languages), with complete and incomplete dictionaries, and supertagging, a complex sequence labeling task, where the grammar size alone can grow to millions of entries. Our results show that for all of these settings, our method achieves state-of-the-art scalable performance that yields high-quality tagging outputs. |
|---|---|
| Bibliography: | Volume, 2014 |
| DOI: | 10.1162/tacl_a_00169 |