LD-SPatt: large deviations statistics for patterns on Markov chains
Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be done through several approaches. Central limit theorem (CLT) producing Gaussian approximations are one of the most popular ones. Unfortunately, in order to find a pattern...
Gespeichert in:
| Veröffentlicht in: | Journal of computational biology Jg. 11; H. 6; S. 1023 |
|---|---|
| 1. Verfasser: | |
| Format: | Journal Article |
| Sprache: | Englisch |
| Veröffentlicht: |
United States
2004
|
| Schlagworte: | |
| ISSN: | 1066-5277 |
| Online-Zugang: | Weitere Angaben |
| Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
| Zusammenfassung: | Statistics on Markov chains are widely used for the study of patterns in biological sequences. Statistics on these models can be done through several approaches. Central limit theorem (CLT) producing Gaussian approximations are one of the most popular ones. Unfortunately, in order to find a pattern of interest, these methods have to deal with tail distribution events where CLT is especially bad. In this paper, we propose a new approach based on the large deviations theory to assess pattern statistics. We first recall theoretical results for empiric mean (level 1) as well as empiric distribution (level 2) large deviations on Markov chains. Then, we present the applications of these results focusing on numerical issues. LD-SPatt is the name of GPL software implementing these algorithms. We compare this approach to several existing ones in terms of complexity and reliability and show that the large deviations are more reliable than the Gaussian approximations in absolute values as well as in terms of ranking and are at least as reliable as compound Poisson approximations. We then finally discuss some further possible improvements and applications of this new method. |
|---|---|
| Bibliographie: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 1066-5277 |
| DOI: | 10.1089/cmb.2004.11.1023 |