LD-SPatt: large deviations statistics for patterns on Markov chains
| Published in: | Journal of computational biology, Volume 11; Issue 6; p. 1023 |
|---|---|
| Main author: | Nuel, G |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 2004 |
| Subjects: | |
| ISSN: | 1066-5277 |
| Abstract | Statistics on Markov chains are widely used to study patterns in biological sequences. Such statistics can be computed through several approaches, of which Gaussian approximations derived from the central limit theorem (CLT) are among the most popular. Unfortunately, finding a pattern of interest requires these methods to handle events in the tail of the distribution, where the CLT approximation is especially poor. In this paper, we propose a new approach to assessing pattern statistics based on large deviations theory. We first recall theoretical results on large deviations for the empirical mean (level 1) and the empirical distribution (level 2) on Markov chains. We then present applications of these results, focusing on numerical issues. LD-SPatt is GPL software implementing these algorithms. We compare this approach with several existing ones in terms of complexity and reliability, and show that large deviations are more reliable than Gaussian approximations, both in absolute values and in terms of ranking, and are at least as reliable as compound Poisson approximations. Finally, we discuss possible further improvements and applications of this new method. |
|---|---|
| Author | Nuel, G |
| Author_xml | – sequence: 1; givenname: G; surname: Nuel; fullname: Nuel, G; email: nuel@genopole.cnrs.fr; organization: Laboratoire Statistique et Génome, Tour Evry 2, 523 place des terasses, 91034 Evry, France |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/15662195 (View this record in MEDLINE/PubMed) |
| CitedBy_id | crossref_primary_10_1007_s11009_019_09700_0 crossref_primary_10_1186_1748_7188_1_17 crossref_primary_10_1186_1748_7188_5_15 crossref_primary_10_1017_S0021900200007403 crossref_primary_10_1186_s13015_014_0025_1 crossref_primary_10_1093_bioinformatics_bti451 crossref_primary_10_1007_s11009_007_9019_5 crossref_primary_10_1186_1748_7188_1_5 crossref_primary_10_1239_jap_1294170523 crossref_primary_10_1016_j_jda_2013_09_004 crossref_primary_10_1016_j_tcs_2012_10_019 crossref_primary_10_1089_cmb_2009_0218 crossref_primary_10_1038_nrmicro2477 |
| ContentType | Journal Article |
| DOI | 10.1089/cmb.2004.11.1023 |
| Discipline | Biology Mathematics |
| ExternalDocumentID | 15662195 |
| Genre | Journal Article |
| ISICitedReferencesCount | 16 |
| ISSN | 1066-5277 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 6 |
| Language | English |
| PMID | 15662195 |
| PublicationCentury | 2000 |
| PublicationDate | 2004-00-00 |
| PublicationPlace | United States |
| PublicationTitle | Journal of computational biology |
| PublicationTitleAlternate | J Comput Biol |
| PublicationYear | 2004 |
| StartPage | 1023 |
| SubjectTerms | Computational Biology; Data Interpretation, Statistical; Markov Chains; Probability; Sequence Analysis, DNA - statistics & numerical data; Sequence Analysis, Protein - statistics & numerical data; Software |
| Title | LD-SPatt: large deviations statistics for patterns on Markov chains |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/15662195 https://www.proquest.com/docview/67209294 |
| Volume | 11 |