MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry
Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, ca...
| Published in: | Analytical chemistry (Washington) Vol. 79; no. 13; p. 4870 |
|---|---|
| Main Authors: | Mo, Lijuan; Dutta, Debojyoti; Wan, Yunhu; Chen, Ting |
| Format: | Journal Article |
| Language: | English |
| Published: | United States, 01.07.2007 |
| ISSN: | 0003-2700 |
| Abstract | Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated from both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, with only 6-10 parameters to be trained, avoids the problem of overfitting and allows MSNovo to be adopted for other machines and data sets easily. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared to existing programs, MSNovo predicts peptides as well as sequence tags with a higher accuracy, which is important for those applications that search protein databases using the de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data. We also show that for high-resolution data the performance of MSNovo improves significantly. Supporting Information, executable files and data sets can be found at http://msms.usc.edu/supplementary/msnovo. |
|---|---|
| Author | Mo, Lijuan; Dutta, Debojyoti; Wan, Yunhu; Chen, Ting |
| Author_xml | 1. Mo, Lijuan (Department of Biology, Department of Mathematics, University of Southern California, Los Angeles, California 90089, USA); 2. Dutta, Debojyoti; 3. Wan, Yunhu; 4. Chen, Ting |
| BackLink | https://www.ncbi.nlm.nih.gov/pubmed/17550227 (View this record in MEDLINE/PubMed) |
| ContentType | Journal Article |
| DOI | 10.1021/ac070039n |
| Discipline | Engineering Chemistry |
| ExternalDocumentID | 17550227 |
| Genre | Research Support, Non-U.S. Gov't; Journal Article; Research Support, N.I.H., Extramural |
| GrantInformation_xml | Funder: NCRR NIH HHS; grant ID: R01-RR16522-01 |
| ISICitedReferencesCount | 71 |
| ISSN | 0003-2700 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 13 |
| Language | English |
| PMID | 17550227 |
| PublicationCentury | 2000 |
| PublicationDate | 2007-07-01 |
| PublicationPlace | United States |
| PublicationTitle | Analytical chemistry (Washington) |
| PublicationTitleAlternate | Anal Chem |
| PublicationYear | 2007 |
| StartPage | 4870 |
| SubjectTerms | Algorithms; Computational Biology; Databases, Factual; Molecular Weight; Peptides - chemistry; Reproducibility of Results; Sensitivity and Specificity; Sequence Analysis - methods; Sequence Analysis - statistics & numerical data; Tandem Mass Spectrometry - methods; Tandem Mass Spectrometry - statistics & numerical data |
| Title | MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry |
| URI | https://www.ncbi.nlm.nih.gov/pubmed/17550227; https://www.proquest.com/docview/70671381 |
| Volume | 79 |
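
The abstract describes a mass array over which a dynamic programming algorithm finds the best-scoring peptide. The following is a minimal sketch of that general idea only, under several simplifying assumptions that do not come from the paper: integer (nominal) residue masses, singly charged b and y ions only, and a toy score that counts supporting peaks in place of MSNovo's probabilistic scoring function. The names `RESIDUE_MASS`, `theoretical_peaks`, `node_score`, and `sequence` are hypothetical and chosen for illustration.

```python
# Nominal (integer) residue masses. I and Q are omitted because they share
# nominal masses with L and K. This table is an illustrative assumption.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}
PROTON, WATER = 1, 18  # nominal masses


def theoretical_peaks(peptide):
    """Singly charged b- and y-ion nominal m/z values for a peptide."""
    total = sum(RESIDUE_MASS[a] for a in peptide)
    peaks, prefix = set(), 0
    for a in peptide[:-1]:
        prefix += RESIDUE_MASS[a]
        peaks.add(prefix + PROTON)                  # b ion
        peaks.add(total - prefix + WATER + PROTON)  # y ion
    return peaks


def node_score(prefix_mass, total, peaks):
    """Toy score: how many of the b and y peaks support this prefix mass."""
    b = prefix_mass + PROTON
    y = total - prefix_mass + WATER + PROTON
    return (b in peaks) + (y in peaks)


def sequence(total, peaks):
    """Return (peptide, score) for the best path of residue mass `total`,
    or None if no combination of residues reaches that mass."""
    neg = float("-inf")
    best = [neg] * (total + 1)   # best[m]: top score of any prefix of mass m
    back = [None] * (total + 1)  # back[m]: last residue on that best prefix
    best[0] = 0
    for m in range(1, total + 1):
        # The full-length node carries no fragment evidence in this sketch.
        gain = node_score(m, total, peaks) if m < total else 0
        for aa, w in RESIDUE_MASS.items():
            if m >= w and best[m - w] != neg and best[m - w] + gain > best[m]:
                best[m] = best[m - w] + gain
                back[m] = aa
    if best[total] == neg:
        return None
    # Walk the back-pointers from the full mass down to zero.
    residues, m = [], total
    while m > 0:
        residues.append(back[m])
        m -= RESIDUE_MASS[back[m]]
    return "".join(reversed(residues)), best[total]
```

Calling `sequence(total, theoretical_peaks("PEPTK"))` with `total` set to the residue mass of `PEPTK` returns a peptide with that same total mass whose score is at least that of `PEPTK`. Ties between different peptides are possible under this toy score, which is one reason a real tool needs a more discriminating scoring function.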