Bass Accompaniment Generation Via Latent Diffusion

Bibliographic Details
Published in: Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 1166 - 1170
Main Authors: Pasini, Marco, Grachten, Maarten, Lattner, Stefan
Format: Conference Proceeding
Language: English
Published: IEEE 14.04.2024
Subjects:
ISSN:2379-190X
Abstract Automatically generating music that appropriately matches an arbitrary input track is a challenging task. We present a novel controllable system for generating single stems to accompany musical mixes of arbitrary length. At the core of our method are audio autoencoders that efficiently compress audio waveform samples into invertible latent representations, and a conditional latent diffusion model that takes as input the latent encoding of a mix and generates the latent encoding of a corresponding stem. To provide control over the timbre of generated samples, we introduce a technique to ground the latent space to a user-provided reference style during diffusion sampling. To further improve audio quality, we adapt classifier-free guidance to avoid distortions at high guidance strengths when generating an unbounded latent space. We train our model on a dataset of pairs of mixes and matching bass stems. Quantitative experiments demonstrate that, given an input mix, the proposed system can generate basslines with user-specified timbres. Our controllable conditional audio generation framework represents a significant step forward in creating generative AI tools to assist musicians in music production.
Author Pasini, Marco
Grachten, Maarten
Lattner, Stefan
Author_xml – sequence: 1
  givenname: Marco
  surname: Pasini
  fullname: Pasini, Marco
  organization: Sony Computer Science Laboratories, Paris, France
– sequence: 2
  givenname: Maarten
  surname: Grachten
  fullname: Grachten, Maarten
  organization: Sony Computer Science Laboratories, Paris, France
– sequence: 3
  givenname: Stefan
  surname: Lattner
  fullname: Lattner, Stefan
  organization: Sony Computer Science Laboratories, Paris, France
ContentType Conference Proceeding
DBID 6IE
6IH
CBEJK
RIE
RIO
DOI 10.1109/ICASSP48485.2024.10446400
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Music
EISBN 9798350344851
EISSN 2379-190X
EndPage 1170
ExternalDocumentID 10446400
Genre orig-research
GroupedDBID 23M
6IE
6IF
6IH
6IK
6IL
6IM
6IN
AAJGR
AAWTH
ABLEC
ACGFS
ADZIZ
ALMA_UNASSIGNED_HOLDINGS
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IJVOP
IPLJI
M43
OCL
RIE
RIL
RIO
RNS
IEDL.DBID RIE
ISICitedReferencesCount 2
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001285850001096&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:33:51 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 5
ParticipantIDs ieee_primary_10446400
PublicationCentury 2000
PublicationDate 2024-04-14
PublicationDateYYYYMMDD 2024-04-14
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-04-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998)
PublicationTitleAbbrev ICASSP
PublicationYear 2024
Publisher IEEE
Publisher_xml – name: IEEE
SSID ssj0008748
SourceID ieee
SourceType Publisher
StartPage 1166
SubjectTerms accompaniment
Adaptation models
Aerospace electronics
bass
Control systems
diffusion
Encoding
generation
Impedance matching
music
Production
Training
Title Bass Accompaniment Generation Via Latent Diffusion
URI https://ieeexplore.ieee.org/document/10446400
WOSCitedRecordID wos001285850001096&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+of+the+...+IEEE+International+Conference+on+Acoustics%2C+Speech+and+Signal+Processing+%281998%29&rft.atitle=Bass+Accompaniment+Generation+Via+Latent+Diffusion&rft.au=Pasini%2C+Marco&rft.au=Grachten%2C+Maarten&rft.au=Lattner%2C+Stefan&rft.date=2024-04-14&rft.pub=IEEE&rft.eissn=2379-190X&rft.spage=1166&rft.epage=1170&rft_id=info:doi/10.1109%2FICASSP48485.2024.10446400&rft.externalDocID=10446400