Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders

Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Cond...

Bibliographic details
Published in: Electronics (Basel), Vol. 14, No. 11, p. 2185
Main authors: Wang, Yubo; Zhang, Gaofeng
Format: Journal Article
Language: English
Publication details: Basel: MDPI AG, 01.06.2025
Subject:
ISSN: 2079-9292
Abstract Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Conditional Variational Autoencoder (CVAE), leveraging CLIP embeddings for semantic guidance and enabling explicit attribute control, thereby reducing computational load and data dependency. Key to our approach is a specialized mapping network that bridges CLIP text–image modalities for improved fidelity and Rényi divergence for latent space regularization to foster diversity, as evidenced by richer latent representations. Experiments on CelebA demonstrate competitive generation (FID: 40.53, 42 M params, 21 FPS) with enhanced diversity. Crucially, our model also shows effective generalization to the more complex MS COCO dataset and maintains a favorable balance between visual quality and efficiency (8 FPS at 256 × 256 resolution with 54 M params). Ablation studies and component validations (detailed in appendices) confirm the efficacy of our contributions. This work offers a practical, efficient T2I solution that balances generative performance with resource constraints across different datasets and is suitable for specialized, data-limited domains.
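The abstract names Rényi divergence as the latent-space regularizer but this record does not give the formulation. As a minimal sketch, assuming a diagonal Gaussian posterior N(mu, diag(exp(logvar))) and a standard normal prior N(0, I), the closed-form Rényi divergence of order alpha can be computed as below. The function names, the choice of prior, and the default alpha = 0.5 are illustrative assumptions, not values taken from the article.

```python
import math


def renyi_to_standard_normal(mu, logvar, alpha=0.5):
    """Closed-form D_alpha( N(mu, diag(exp(logvar))) || N(0, I) ).

    Hypothetical helper: the prior, the parameterization, and alpha are
    assumptions made for illustration, not the article's settings.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    total = 0.0
    for m, lv in zip(mu, logvar):
        var = math.exp(lv)
        # Interpolated variance alpha * 1 + (1 - alpha) * var; it must be
        # positive, otherwise the divergence is infinite.
        var_alpha = alpha + (1.0 - alpha) * var
        if var_alpha <= 0:
            raise ValueError("divergence is infinite for this alpha and variance")
        total += (
            alpha * m * m / (2.0 * var_alpha)
            - math.log(var_alpha) / (2.0 * (alpha - 1.0))
            - 0.5 * lv
        )
    return total


def kl_to_standard_normal(mu, logvar):
    """Standard VAE KL term, the alpha -> 1 limit of the function above."""
    return sum(0.5 * (m * m + math.exp(lv) - 1.0 - lv) for m, lv in zip(mu, logvar))
```

For alpha below 1 this quantity is never larger than the usual KL term, so under these assumptions it penalizes the posterior less strongly than a standard VAE would, which is one plausible reading of how such a regularizer could leave more room for diverse latent codes.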
Audience Academic
Author Zhang, Gaofeng
Wang, Yubo
Author_xml – sequence: 1
  givenname: Yubo
  surname: Wang
  fullname: Wang, Yubo
– sequence: 2
  givenname: Gaofeng
  orcidid: 0009-0003-7141-1691
  surname: Zhang
  fullname: Zhang, Gaofeng
ContentType Journal Article
Copyright COPYRIGHT 2025 MDPI AG
2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/electronics14112185
Discipline Engineering
Architecture
EISSN 2079-9292
GeographicLocations Mississippi
GeographicLocations_xml – name: Mississippi
ISICitedReferencesCount 0
ISSN 2079-9292
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 11
Language English
ORCID 0009-0003-7141-1691
OpenAccessLink https://www.proquest.com/docview/3217726227?pq-origsite=%requestingapplication%
PublicationDate 2025-06-01
PublicationDateYYYYMMDD 2025-06-01
PublicationDate_xml – month: 06
  year: 2025
  text: 2025-06-01
  day: 01
PublicationDecade 2020
PublicationPlace Basel
PublicationPlace_xml – name: Basel
PublicationTitle Electronics (Basel)
PublicationYear 2025
Publisher MDPI AG
Publisher_xml – name: MDPI AG
StartPage 2185
SubjectTerms Ablation
Architecture
Datasets
Diffusion models
Effectiveness
Efficiency
Image processing
Normal distribution
Regularization
Semantics
Title Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders
URI https://www.proquest.com/docview/3217726227
Volume 14