Melody Extraction and Musical Onset Detection via Probabilistic Models of Framewise STFT Peak Data
| Published in: | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 4, pp. 1257-1272 |
|---|---|
| Main authors: | Thornburg, H.; Leistikow, R.J.; Berger, J. |
| Format: | Journal Article |
| Language: | English |
| Publication details: | Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 2007-05-01 |
| ISSN: | 1558-7916 |
| Abstract | We propose a probabilistic method for the joint segmentation and melody extraction for musical audio signals which arise from a monophonic score. The method operates on framewise short-time Fourier transform (STFT) peaks, enabling a computationally efficient inference of note onset, duration, and pitch attributes while retaining sufficient information for pitch determination and spectral change detection. The system explicitly models note events in terms of transient and steady-state regions as well as possible gaps between note events. In this way, the system readily distinguishes abrupt spectral changes associated with musical onsets from other abrupt change events. Additionally, the method may incorporate melodic context by modeling note-to-note dependences. The method is successfully applied to a variety of piano and violin recordings containing reverberation, effective polyphony due to legato playing style, expressive pitch variations, and background voices. While the method does not provide a sample-accurate segmentation, it facilitates the latter in subsequent processing by isolating musical onsets to frame neighborhoods and identifying possible pitch content before and after the true onset sample location. |
|---|---|
| Author | Thornburg, H. (Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ); Leistikow, R.J.; Berger, J. |
| CODEN | ITASD8 |
| ContentType | Journal Article |
| Copyright | 2007 INIST-CNRS |
| DOI | 10.1109/TASL.2006.889801 |
| Discipline | Engineering Music Applied Sciences |
| EndPage | 1272 |
| Genre | orig-research |
| ISICitedReferencesCount | 24 |
| ISSN | 1558-7916 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 4 |
| Keywords | Fourier transformation; Probabilistic approach; Segmentation; Acoustic signal; Change detection; Signal estimation; Unsteady state; Violin; pitch identification; Modeling; Steady state; Dynamic Bayesian networks; onset detection; Audio signal; music transcription; Bayes network; Localization; Musical sound; Pitch (acoustics); Musical score; Musical instrument |
| Language | English |
| License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html CC BY 4.0 |
| PageCount | 16 |
| PublicationDate | 2007-05-01 |
| PublicationPlace | Piscataway, NJ |
| PublicationTitle | IEEE Transactions on Audio, Speech, and Language Processing |
| PublicationTitleAbbrev | TASL |
| PublicationYear | 2007 |
| Publisher | IEEE Institute of Electrical and Electronics Engineers |
| StartPage | 1257 |
| SubjectTerms | Acoustics; Applied sciences; Audio signals; Bayesian methods; Cities and towns; Context modeling; Data mining; Detection, estimation, filtering, equalization, prediction; Dynamic Bayesian networks; Exact sciences and technology; Extraction; Fourier transforms; Inference; Information, signal and communications theory; Mathematical models; Music; music transcription; onset detection; Pianos; pitch identification; Probabilistic methods; Reverberation; Robustness; Segmentation; Signal and communications theory; Signal, noise; Spectra; Steady-state; Telecommunications and information theory |
| Title | Melody Extraction and Musical Onset Detection via Probabilistic Models of Framewise STFT Peak Data |
| URI | https://ieeexplore.ieee.org/document/4156219 https://www.proquest.com/docview/1671316525 https://www.proquest.com/docview/36324870 |
| Volume | 15 |
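
The abstract states that the method operates on framewise STFT peaks rather than on the full spectrum. As a rough illustration of what such a front end might look like, the sketch below extracts the strongest local maxima of the magnitude spectrum in each frame. It is not the authors' implementation, and it does not reproduce the probabilistic model described in the abstract. The function name `stft_peaks` and the parameters `frame_len`, `hop`, and `max_peaks` are hypothetical choices made for this example, as are the Hann window and the simple neighbor-comparison peak rule.

```python
import numpy as np


def stft_peaks(signal, sample_rate, frame_len=1024, hop=512, max_peaks=5):
    """Return, per frame, a list of (frequency_hz, magnitude) spectral peaks.

    Hypothetical helper for illustration only; parameter defaults are assumptions.
    """
    window = np.hanning(frame_len)
    peaks_per_frame = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        # A bin counts as a peak if it is strictly larger than both neighbors.
        interior = mag[1:-1]
        is_peak = (interior > mag[:-2]) & (interior > mag[2:])
        bins = np.nonzero(is_peak)[0] + 1
        # Keep only the strongest few peaks, then order them by frequency.
        strongest = bins[np.argsort(mag[bins])[::-1][:max_peaks]]
        strongest.sort()
        freqs = strongest * sample_rate / frame_len
        peaks_per_frame.append(list(zip(freqs.tolist(), mag[strongest].tolist())))
    return peaks_per_frame


if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz sine
    frames = stft_peaks(tone, sr)
    strongest_peak = max(frames[0], key=lambda p: p[1])
    print(len(frames), strongest_peak[0])
```

Reducing each frame to a handful of peaks keeps the data passed to later inference small, which is in the spirit of the computational efficiency the abstract mentions. Frequencies here are quantized to bin centers; a practical system would likely interpolate peak locations.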