Bibliographic Details
| Title: |
Context tree classification and clustering. |
| Authors: |
Zanin Zambom, Adriano (AUTHOR) adriano.zambom@csun.edu; Rud, Daniel (AUTHOR) |
| Source: |
Journal of Statistical Computation & Simulation. Mar2025, Vol. 95 Issue 4, p781-809. 29p. |
| Subject Terms: |
*MARKOV processes, *CLUSTER analysis (Statistics), *TARGET marketing, CLASSIFICATION algorithms, PATTERN perception, STATISTICAL models, RHYTHM |
| Abstract: |
In this manuscript, we develop clustering and classification algorithms for Context trees arising from Variable Length Markov Chains (VLMC). The Context is defined as the finite suffix of the past that is sufficient to predict the next state of the chain. Defining relevant Contexts through the VLMC fitting procedure allows the Contexts to have different lengths depending on the past itself, and the resulting set of Contexts can be described by a rooted tree. This type of parsimonious model relaxes the assumptions of higher-order Markov Chains, whose number of parameters increases exponentially with the order of the chain. Dissimilarity measures that consider both the VLMC tree structure and the transition probability distributions of Contexts are derived and integrated into the procedures. Through simulations in a variety of scenarios, the proposed algorithms are shown to outperform classical competitors in both classification and clustering, especially as the sample size of the state sequences increases. Two applications to real datasets are presented. In the first application, we develop clustering and classification methods for written texts according to rhythmic patterns. We introduce a new retrieval process for the rhythm of texts written in English by encoding the morphological structure of sentences with the building blocks of phonological words and the position of stressed syllables. Sequences of syllables in text are modelled with a stochastic process, where the choice of lexical items depends on the rhythmic characteristics of the preceding words. In the second application, we perform unsupervised clustering on click-stream data of users from an online maternity clothing store. The browsing behaviours of users from different countries can be leveraged for optimizing targeted marketing strategies, and the constructed VLMCs are used to rank weblinks of the website based on the stationary distributions of the VLMCs. [ABSTRACT FROM AUTHOR] |
|
Copyright of Journal of Statistical Computation & Simulation is the property of Taylor & Francis Ltd and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: |
Business Source Index |
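
Illustrative note: the abstract defines a Context as the finite suffix of the past that is sufficient to predict the next state. The sketch below shows one way such a lookup could work on a toy context tree. It is not the authors' implementation; the alphabet, the contexts, the probabilities, and the names `CONTEXTS`, `find_context`, and `next_state_distribution` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical toy context tree over the alphabet {"a", "b"}.
# Keys are contexts written oldest-to-newest; values are next-state
# probabilities. The numbers are made up for illustration.
CONTEXTS: Dict[Tuple[str, ...], Dict[str, float]] = {
    ("a",): {"a": 0.2, "b": 0.8},      # after "a", one symbol of past suffices
    ("a", "b"): {"a": 0.6, "b": 0.4},  # after "b", look one step further back
    ("b", "b"): {"a": 0.9, "b": 0.1},
}


def find_context(past: Sequence[str],
                 contexts: Dict[Tuple[str, ...], Dict[str, float]]) -> Tuple[str, ...]:
    """Return the suffix of `past` that is a context in the tree."""
    max_len = max(len(c) for c in contexts)
    # Try suffixes of increasing length until one matches a stored context.
    for k in range(1, min(max_len, len(past)) + 1):
        suffix = tuple(past[-k:])
        if suffix in contexts:
            return suffix
    raise ValueError("past is too short to match any context")


def next_state_distribution(past: Sequence[str]) -> Dict[str, float]:
    """Look up the next-state probabilities for the context of `past`."""
    return CONTEXTS[find_context(past, CONTEXTS)]
```

In this toy tree, three contexts cover every past of length two or more, whereas a full second-order Markov chain on the same alphabet would need four. That gap is the kind of parsimony the abstract refers to, and it widens as the order grows.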