MSNovo: a dynamic programming algorithm for de novo peptide sequencing via tandem mass spectrometry

Bibliographic Details
Published in: Analytical chemistry (Washington), Vol. 79, no. 13, p. 4870
Main Authors: Mo, Lijuan; Dutta, Debojyoti; Wan, Yunhu; Chen, Ting
Format: Journal Article
Language: English
Published: United States, 2007-07-01
Subjects:
ISSN: 0003-2700
Abstract: Tandem mass spectrometry (MS/MS) has become the experimental method of choice for high-throughput proteomics-based biological discovery. The two primary ways of analyzing MS/MS data are database search and de novo sequencing. In this paper, we present a new approach to peptide de novo sequencing, called MSNovo, which has the following advanced features. (1) It works on data generated from both LCQ and LTQ mass spectrometers and interprets singly, doubly, and triply charged ions. (2) It integrates a new probabilistic scoring function with a mass array-based dynamic programming algorithm. The simplicity of the scoring function, with only 6-10 parameters to be trained, avoids the problem of overfitting and allows MSNovo to be adopted for other machines and data sets easily. The mass array data structure explicitly encodes all possible peptides and allows the dynamic programming algorithm to find the best peptide. (3) Compared to existing programs, MSNovo predicts peptides as well as sequence tags with a higher accuracy, which is important for those applications that search protein databases using the de novo sequencing results. More specifically, we show that MSNovo outperforms other programs on various ESI ion trap data. We also show that for high-resolution data the performance of MSNovo improves significantly. Supporting Information, executable files and data sets can be found at http://msms.usc.edu/supplementary/msnovo.
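Illustrative note (not part of the published record): the abstract's idea of a mass array combined with dynamic programming can be sketched roughly as follows. This is a toy under stated assumptions, not MSNovo's implementation. It assumes nominal integer residue masses, singly charged b and y ions only, and a simple peak-counting score in place of the paper's probabilistic scoring function. The names `RESIDUE_MASS`, `theoretical_peaks`, `node_scores`, and `best_peptide` are hypothetical.

```python
# Nominal (integer) residue masses in Da. I/L and K/Q share a nominal mass,
# so only L and K are listed. Assumption: unit-resolution data.
RESIDUE_MASS = {
    "G": 57, "A": 71, "S": 87, "P": 97, "V": 99, "T": 101, "C": 103,
    "L": 113, "N": 114, "D": 115, "K": 128, "E": 129, "M": 131,
    "H": 137, "F": 147, "R": 156, "Y": 163, "W": 186,
}


def theoretical_peaks(peptide):
    """Return (peaks, total): nominal singly charged b/y peaks and residue mass sum."""
    total = sum(RESIDUE_MASS[aa] for aa in peptide)
    peaks, prefix = [], 0
    for aa in peptide[:-1]:
        prefix += RESIDUE_MASS[aa]
        peaks.append(prefix + 1)           # b ion: prefix mass + proton
        peaks.append(total - prefix + 19)  # y ion: suffix mass + water + proton
    return peaks, total


def node_scores(peaks, total):
    """Toy score per prefix mass: one point for each matching b or y peak."""
    peak_set = set(peaks)
    scores = [0] * (total + 1)
    for m in range(1, total):
        scores[m] = int((m + 1) in peak_set) + int((total - m + 19) in peak_set)
    return scores


def best_peptide(peaks, total):
    """Fill a mass-indexed array; best[m] is the top score of any prefix of mass m."""
    scores = node_scores(peaks, total)
    neg_inf = float("-inf")
    best = [neg_inf] * (total + 1)
    back = [None] * (total + 1)  # last residue on the best path into mass m
    best[0] = 0
    for m in range(1, total + 1):
        for aa, w in RESIDUE_MASS.items():
            if w <= m and best[m - w] != neg_inf:
                cand = best[m - w] + scores[m]
                if cand > best[m]:
                    best[m], back[m] = cand, aa
    if back[total] is None:
        return None, neg_inf  # no residue combination sums to the total mass
    seq, m = [], total
    while m > 0:  # walk the back-pointers from the full mass down to zero
        seq.append(back[m])
        m -= RESIDUE_MASS[back[m]]
    return "".join(reversed(seq)), best[total]


peaks, total = theoretical_peaks("GAVEL")
print(best_peptide(peaks, total))  # -> ('GAVEL', 8)
```

Indexing the array by mass rather than by sequence position means every peptide with the given total mass corresponds to some path through the array, so the recurrence searches all of them implicitly. Real spectra are noisy and have missing peaks, which is where a trained scoring function such as the one the abstract describes would replace the peak count used here.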
Author_xml – sequence: 1
  givenname: Lijuan
  surname: Mo
  fullname: Mo, Lijuan
  organization: Department of Biology, Department of Mathematics, University of Southern California, Los Angeles, California 90089, USA
– sequence: 2
  givenname: Debojyoti
  surname: Dutta
  fullname: Dutta, Debojyoti
– sequence: 3
  givenname: Yunhu
  surname: Wan
  fullname: Wan, Yunhu
– sequence: 4
  givenname: Ting
  surname: Chen
  fullname: Chen, Ting
BackLink https://www.ncbi.nlm.nih.gov/pubmed/17550227$$D View this record in MEDLINE/PubMed
DOI 10.1021/ac070039n
Discipline Engineering
Chemistry
ExternalDocumentID 17550227
Genre Research Support, Non-U.S. Gov't
Journal Article
Research Support, N.I.H., Extramural
GrantInformation_xml – fundername: NCRR NIH HHS
  grantid: R01-RR16522-01
IsPeerReviewed true
IsScholarly true
PMID 17550227
SubjectTerms Algorithms
Computational Biology
Databases, Factual
Molecular Weight
Peptides - chemistry
Reproducibility of Results
Sensitivity and Specificity
Sequence Analysis - methods
Sequence Analysis - statistics & numerical data
Tandem Mass Spectrometry - methods
Tandem Mass Spectrometry - statistics & numerical data
URI https://www.ncbi.nlm.nih.gov/pubmed/17550227
https://www.proquest.com/docview/70671381