Melody Extraction and Musical Onset Detection via Probabilistic Models of Framewise STFT Peak Data

Bibliographic details
Published in: IEEE Transactions on Audio, Speech, and Language Processing, Vol. 15, No. 4, pp. 1257-1272
Main authors: Thornburg, H., Leistikow, R.J., Berger, J.
Format: Journal Article
Language: English
Publication details: Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 1 May 2007
ISSN: 1558-7916
Abstract: We propose a probabilistic method for the joint segmentation and melody extraction for musical audio signals which arise from a monophonic score. The method operates on framewise short-time Fourier transform (STFT) peaks, enabling a computationally efficient inference of note onset, duration, and pitch attributes while retaining sufficient information for pitch determination and spectral change detection. The system explicitly models note events in terms of transient and steady-state regions as well as possible gaps between note events. In this way, the system readily distinguishes abrupt spectral changes associated with musical onsets from other abrupt change events. Additionally, the method may incorporate melodic context by modeling note-to-note dependences. The method is successfully applied to a variety of piano and violin recordings containing reverberation, effective polyphony due to legato playing style, expressive pitch variations, and background voices. While the method does not provide a sample-accurate segmentation, it facilitates the latter in subsequent processing by isolating musical onsets to frame neighborhoods and identifying possible pitch content before and after the true onset sample location.
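As a rough illustration of the input representation named in the abstract, the following Python sketch extracts framewise STFT peaks from a signal. It is not the authors' implementation: the function name `stft_peaks`, the Hann window, the frame and hop sizes, and the simple local-maximum peak picker are all assumptions made for illustration. The probabilistic inference of onsets, durations, and pitch described in the abstract is not shown.

```python
import numpy as np


def stft_peaks(signal, sample_rate, frame_len=1024, hop=512, max_peaks=5):
    """Return, for each frame, a list of (frequency_hz, magnitude) peaks.

    Hypothetical helper: all parameter names and defaults are illustrative.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Magnitude spectrum of one windowed frame.
        mag = np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        # Local maxima: bins strictly larger than both neighbours.
        is_peak = (mag[1:-1] > mag[:-2]) & (mag[1:-1] > mag[2:])
        idx = np.flatnonzero(is_peak) + 1
        # Keep the strongest few peaks, reported in ascending frequency.
        strongest = np.argsort(mag[idx])[::-1][:max_peaks]
        idx = np.sort(idx[strongest])
        frames.append([(float(freqs[i]), float(mag[i])) for i in idx])
    return frames
```

Each frame is thereby reduced to a handful of (frequency, magnitude) pairs, which is the kind of compact data a framewise model could consume instead of the full spectrum.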
Authors:
1. Thornburg, H. (Dept. of Electr. Eng., Arizona State Univ., Tempe, AZ)
2. Leistikow, R.J.
3. Berger, J.
CODEN ITASD8
Copyright 2007 INIST-CNRS
DOI: 10.1109/TASL.2006.889801
Discipline Engineering
Music
Applied Sciences
Genre orig-research
Peer reviewed: yes
Scholarly: yes
Keywords:
Fourier transformation
Probabilistic approach
Segmentation
Acoustic signal
Change detection
Signal estimation
Unsteady state
Violin
pitch identification
Modeling
Steady state
Dynamic Bayesian networks
onset detection
Audio signal
music transcription
Bayes network
Localization
Musical sound
Pitch (acoustics)
Musical score
Musical instrument
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
CC BY 4.0
Subject terms:
Acoustics
Applied sciences
Audio signals
Bayesian methods
Cities and towns
Context modeling
Data mining
Detection, estimation, filtering, equalization, prediction
Dynamic Bayesian networks
Exact sciences and technology
Extraction
Fourier transforms
Inference
Information, signal and communications theory
Mathematical models
Music
music transcription
onset detection
Pianos
pitch identification
Probabilistic methods
Reverberation
Robustness
Segmentation
Signal and communications theory
Signal, noise
Spectra
Steady-state
Telecommunications and information theory
URI https://ieeexplore.ieee.org/document/4156219
https://www.proquest.com/docview/1671316525
https://www.proquest.com/docview/36324870