A Study on Modeling Roman Numeral Analysis Progressions Using the Encoder-Decoder Transformer
| Published in: | IEEE International Conference on Electro Information Technology pp. 518 - 523 |
|---|---|
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 29.05.2025 |
| ISSN: | 2154-0373 |
| Summary: | This paper evaluates the efficacy and usage of a proposed model built on the encoder-decoder Transformer for the purposes of modeling harmonic progressions rooted in the Western tonality schema using Roman numeral analysis (RNA). A combination of the WhenInRome and Yale-Classical Archives Corpus Light corpora produced 8,934 compositions dated around the Common Practice Period, which were then preprocessed to produce a tokenization of each RNA symbol as well as a pitch-class vector corresponding to the unique constituent chord tones of each symbol, transposed to pitch-class 0. Each symbol followed a tokenization schema expressing each symbol in terms of its degree (position in key), tonality (major/minor), form (augmented/diminished), figured bass/inversions, added chord tones, and secondary dominance. The chord tokens and pitch-class vectors are then embedded, summed, and applied to the positional layer to provide the Transformer model with additional harmonic context. We find that while the encoder-decoder model shows promise in its ability to predict next tokens when given simple progressions, it remains limited in its ability to complete more complex progressions by both data sparsity and the structural complexity of RNA. This result is indicative of a) the continuing problem of a lack of systematic and rigorous training data in the field of computational musicology (CM), b) the complexity of RNA as a harmonic language and its continued lack of usage in the midst of more modern forms of communicating harmonic information, and c) the general inefficiency of the practice of generalizing models to specific tasks without large amounts of specialization, such as pretraining or heavy modifications to model architecture. We discuss these problems and suggest solutions to broadening the amount of data available in the CM domain as well as improving the quality of both the model and the data. |
|---|---|
| DOI: | 10.1109/eIT64391.2025.11103637 |
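
The summary mentions a pitch-class vector built from the unique chord tones of each RNA symbol and transposed to pitch-class 0. The record does not say how the authors computed it, so the following is only a minimal sketch of one plausible reading: a 12-element binary vector in which the key's tonic is shifted to pitch class 0. The function name `pitch_class_vector` and its `tonic` parameter are hypothetical and are not taken from the paper.

```python
def pitch_class_vector(chord_tones, tonic=0):
    """Return a 12-element 0/1 list marking the unique pitch classes of a chord.

    Assumption (not confirmed by the record): "transposed to pitch-class 0"
    is read here as shifting every tone so that the key's tonic maps to 0.
    `chord_tones` may be pitch classes (0-11) or MIDI note numbers.
    """
    vector = [0] * 12
    for tone in chord_tones:
        # Modulo 12 folds octaves together, so duplicate tones collapse.
        vector[(tone - tonic) % 12] = 1
    return vector


# V in G major: D, F#, A (pitch classes 2, 6, 9) with tonic G (7).
v_in_g = pitch_class_vector([2, 6, 9], tonic=7)

# V in C major: G, B, D (pitch classes 7, 11, 2) with tonic C (0).
v_in_c = pitch_class_vector([7, 11, 2], tonic=0)
```

Under this reading the same Roman numeral yields the same vector in every key, which is the kind of key-independent harmonic context the summary describes being added to the chord-token embeddings.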