LD-SPatt: large deviations statistics for patterns on Markov chains

Bibliographic Details
Published in: Journal of computational biology Vol. 11; no. 6; p. 1023
Main Author: Nuel, G
Format: Journal Article
Language: English
Published: United States 2004
ISSN: 1066-5277
Abstract Statistics on Markov chains are widely used to study patterns in biological sequences. These statistics can be computed through several approaches, of which Gaussian approximations based on the central limit theorem (CLT) are among the most popular. Unfortunately, finding a pattern of interest requires dealing with events in the tail of the distribution, where the CLT approximation is especially poor. In this paper, we propose a new approach based on large deviations theory to assess pattern statistics. We first recall theoretical results on large deviations for the empirical mean (level 1) and the empirical distribution (level 2) of Markov chains. We then present applications of these results, focusing on numerical issues. LD-SPatt is the GPL software that implements these algorithms. We compare this approach with several existing ones in terms of complexity and reliability, and show that large deviations are more reliable than Gaussian approximations, both in absolute values and in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss possible further improvements and applications of this new method.
Author Nuel, G
Author_xml – sequence: 1
  givenname: G
  surname: Nuel
  fullname: Nuel, G
  email: nuel@genopole.cnrs.fr
  organization: Laboratoire Statistique et Génome, Tour Evry 2, 523 place des terasses, 91034 Evry, France. nuel@genopole.cnrs.fr
BackLink https://www.ncbi.nlm.nih.gov/pubmed/15662195 (View this record in MEDLINE/PubMed)
CitedBy_id crossref_primary_10_1007_s11009_019_09700_0
crossref_primary_10_1186_1748_7188_1_17
crossref_primary_10_1186_1748_7188_5_15
crossref_primary_10_1017_S0021900200007403
crossref_primary_10_1186_s13015_014_0025_1
crossref_primary_10_1093_bioinformatics_bti451
crossref_primary_10_1007_s11009_007_9019_5
crossref_primary_10_1186_1748_7188_1_5
crossref_primary_10_1239_jap_1294170523
crossref_primary_10_1016_j_jda_2013_09_004
crossref_primary_10_1016_j_tcs_2012_10_019
crossref_primary_10_1089_cmb_2009_0218
crossref_primary_10_1038_nrmicro2477
ContentType Journal Article
DOI 10.1089/cmb.2004.11.1023
Discipline Biology
Mathematics
ExternalDocumentID 15662195
Genre Journal Article
ISICitedReferencesCount 16
ISSN 1066-5277
IsPeerReviewed true
IsScholarly true
Issue 6
Language English
PMID 15662195
PublicationCentury 2000
PublicationDate 2004-00-00
PublicationDateYYYYMMDD 2004-01-01
PublicationDate_xml – year: 2004
  text: 2004-00-00
PublicationDecade 2000
PublicationPlace United States
PublicationPlace_xml – name: United States
PublicationTitle Journal of computational biology
PublicationTitleAlternate J Comput Biol
PublicationYear 2004
StartPage 1023
SubjectTerms Computational Biology
Data Interpretation, Statistical
Markov Chains
Probability
Sequence Analysis, DNA - statistics & numerical data
Sequence Analysis, Protein - statistics & numerical data
Software
Title LD-SPatt: large deviations statistics for patterns on Markov chains
URI https://www.ncbi.nlm.nih.gov/pubmed/15662195
https://www.proquest.com/docview/67209294
Volume 11