Voice adaptation by color-encoded frame matching as a multi-objective optimization problem for future games
Voice adaptation is an interactive speech processing technique that allows the speaker to transmit with a chosen target voice. We propose a novel method that is intended for dynamic scenarios, such as online video games, where the source speaker’s and target speaker’s data are nonaligned. This would...
| Published in: | Complex & intelligent systems, Volume 8, Issue 2, pp. 1539-1550 |
|---|---|
| Main authors: | Midtlyng, Mads; Sato, Yuji; Hosobe, Hiroshi |
| Format: | Journal Article |
| Language: | English |
| Published: | Cham: Springer International Publishing, 1 April 2022; Springer Nature B.V |
| ISSN: | 2199-4536, 2198-6053 |
| Abstract | Voice adaptation is an interactive speech processing technique that allows a speaker to transmit speech in a chosen target voice. We propose a novel method intended for dynamic scenarios, such as online video games, where the source speaker’s and target speaker’s data are nonaligned. This would greatly improve immersion and experience by letting players fully become a character, and would address privacy concerns by disguising the voice to protect against harassment. With unaligned data, traditional methods, e.g., probabilistic models, become inaccurate, while recent methods such as deep neural networks (DNN) require substantial preparation work. Common methods require multiple subjects to be trained in parallel, which constrains practicality in production environments. Our proposal trains a single subject in a nonparallel manner into a voice profile that can be used against any unknown source speaker. Prosodic data such as pitch, power and temporal structure are encoded into RGBA-colored frames, which are used in a multi-objective optimization problem to adjust interrelated features based on color likeness. Finally, frames are smoothed and adjusted before output. The method was evaluated using Mean Opinion Score, ABX, MUSHRA, Single Ease Questions and performance benchmarks with two voice profiles of varying sizes, followed by a discussion of game implementation. Results show improved adaptation quality, especially with the larger voice profile, and that the audience is positive about using such technology in future games. |
|---|---|
| Author | Sato, Yuji; Midtlyng, Mads; Hosobe, Hiroshi |
| Author_xml | 1. Midtlyng, Mads (ORCID: 0000-0002-4303-1697; email: midtlyng.madsalexander.9c@stu.hosei.ac.jp; Department of Computer Science, Hosei University); 2. Sato, Yuji (Faculty of Computer and Information Sciences, Hosei University); 3. Hosobe, Hiroshi (Faculty of Computer and Information Sciences, Hosei University) |
| ContentType | Journal Article |
| Copyright | The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.1007/s40747-021-00604-6 |
| Discipline | Engineering Mathematics |
| EISSN | 2198-6053 |
| EndPage | 1550 |
| ExternalDocumentID | 10_1007_s40747_021_00604_6 |
| GrantInformation_xml | Funder: Japan Society for the Promotion of Science; grant ID: JP19K12162; funder ID: http://dx.doi.org/10.13039/501100001691 |
| ISSN | 2199-4536 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | Video games; Multi-objective optimization problems; Voice adaptation; Color-encoding; Speech processing |
| Language | English |
| ORCID | 0000-0002-4303-1697 |
| OpenAccessLink | https://link.springer.com/10.1007/s40747-021-00604-6 |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2022-04-01 |
| PublicationDateYYYYMMDD | 2022-04-01 |
| PublicationDate_xml | – month: 04 year: 2022 text: 2022-04-01 day: 01 |
| PublicationDecade | 2020 |
| PublicationPlace | Cham |
| PublicationPlace_xml | Cham; Heidelberg |
| PublicationTitle | Complex & intelligent systems |
| PublicationTitleAbbrev | Complex Intell. Syst |
| PublicationYear | 2022 |
| Publisher | Springer International Publishing; Springer Nature B.V |
| Publisher_xml | Springer International Publishing; Springer Nature B.V |
| StartPage | 1539 |
| SubjectTerms | Adaptation; Artificial neural networks; Color matching; Complexity; Computational Intelligence; Computer & video games; Data Structures and Information Theory; Engineering; Frames (data processing); Intelligent systems; Linguistics; Methods; Multiple objective analysis; Neural networks; Optimization; Original Article; Privacy; Probabilistic models; Speech; Speech processing; Training; Voice |
| Title | Voice adaptation by color-encoded frame matching as a multi-objective optimization problem for future games |
| URI | https://link.springer.com/article/10.1007/s40747-021-00604-6 https://www.proquest.com/docview/2656974759 |
| Volume | 8 |
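
The abstract describes encoding prosodic data (pitch, power and temporal structure) into RGBA-colored frames and comparing frames by color likeness. The following is only a minimal sketch of that idea, not the authors' implementation: the `Frame` fields, the channel assignment, the normalisation ranges and the use of a single Euclidean distance are all assumptions made for illustration. The record does not specify any of them, and the paper formulates the adjustment as a multi-objective optimization problem rather than a single nearest-neighbour search.

```python
import math
from dataclasses import dataclass

# Assumed normalisation ranges; the catalog record does not give any.
PITCH_RANGE_HZ = (50.0, 500.0)
POWER_RANGE_DB = (-60.0, 0.0)
DURATION_RANGE_MS = (5.0, 50.0)


@dataclass(frozen=True)
class Frame:
    """Hypothetical container for the prosodic data of one speech frame."""
    pitch_hz: float
    power_db: float
    duration_ms: float
    voiced: bool


def _to_channel(value, low, high):
    """Clamp value to [low, high] and scale it to an 8-bit channel (0-255)."""
    clamped = min(max(value, low), high)
    return round(255 * (clamped - low) / (high - low))


def encode_rgba(frame):
    """Map a frame to an (R, G, B, A) tuple; the channel assignment is assumed."""
    return (
        _to_channel(frame.pitch_hz, *PITCH_RANGE_HZ),        # R: pitch
        _to_channel(frame.power_db, *POWER_RANGE_DB),        # G: power
        _to_channel(frame.duration_ms, *DURATION_RANGE_MS),  # B: temporal structure
        255 if frame.voiced else 0,                          # A: voiced flag
    )


def color_distance(a, b):
    """Euclidean distance between two RGBA tuples; smaller means more alike."""
    return math.dist(a, b)


def closest_profile_frame(source, profile):
    """Return the profile frame whose color is nearest to the source frame's color."""
    source_color = encode_rgba(source)
    return min(profile, key=lambda f: color_distance(source_color, encode_rgba(f)))
```

In this sketch a voice profile is simply a list of `Frame` objects, and `closest_profile_frame(source, profile)` picks the target frame most similar in color to a source frame. Treating each channel difference as a separate objective instead of summing them into one distance would be closer in spirit to the multi-objective formulation named in the abstract.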