Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders
| Published in: | Electronics (Basel) Vol. 14; no. 11; p. 2185 |
|---|---|
| Main Authors: | Wang, Yubo; Zhang, Gaofeng |
| Format: | Journal Article |
| Language: | English |
| Published: | Basel: MDPI AG, 01.06.2025 |
| ISSN: | 2079-9292 |
| Abstract | Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Conditional Variational Autoencoder (CVAE), leveraging CLIP embeddings for semantic guidance and enabling explicit attribute control, thereby reducing computational load and data dependency. Key to our approach is a specialized mapping network that bridges CLIP text–image modalities for improved fidelity and Rényi divergence for latent space regularization to foster diversity, as evidenced by richer latent representations. Experiments on CelebA demonstrate competitive generation (FID: 40.53, 42 M params, 21 FPS) with enhanced diversity. Crucially, our model also shows effective generalization to the more complex MS COCO dataset and maintains a favorable balance between visual quality and efficiency (8 FPS at 256 × 256 resolution with 54 M params). Ablation studies and component validations (detailed in appendices) confirm the efficacy of our contributions. This work offers a practical, efficient T2I solution that balances generative performance with resource constraints across different datasets and is suitable for specialized, data-limited domains. |
|---|---|
| Audience | Academic |
| Author | Wang, Yubo; Zhang, Gaofeng (ORCID: 0009-0003-7141-1691) |
| ContentType | Journal Article |
| Copyright | COPYRIGHT 2025 MDPI AG. © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.3390/electronics14112185 |
| Discipline | Engineering Architecture |
| EISSN | 2079-9292 |
| ISSN | 2079-9292 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 11 |
| Language | English |
| ORCID | 0009-0003-7141-1691 |
| PublicationDate | 2025-06-01 |
| PublicationPlace | Basel |
| PublicationTitle | Electronics (Basel) |
| PublicationYear | 2025 |
| Publisher | MDPI AG |
| StartPage | 2185 |
| SubjectTerms | Ablation; Architecture; Datasets; Diffusion models; Effectiveness; Efficiency; Image processing; Normal distribution; Regularization; Semantics |
| Title | Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders |
| URI | https://www.proquest.com/docview/3217726227 |
| Volume | 14 |
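
The abstract mentions Rényi divergence as a latent-space regularizer in place of the usual KL term of a VAE. This record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: a diagonal Gaussian posterior, a standard normal prior, and the well-known closed form for the Rényi divergence of order α between two Gaussians. The function names `kl_to_standard_normal` and `renyi_to_standard_normal` are hypothetical and are not taken from the paper.

```python
import math


def kl_to_standard_normal(mu, var):
    """KL( N(mu, diag(var)) || N(0, I) ), summed over latent dimensions."""
    return sum(0.5 * (v + m * m - 1.0 - math.log(v)) for m, v in zip(mu, var))


def renyi_to_standard_normal(mu, var, alpha):
    """Renyi divergence of order alpha from N(mu, diag(var)) to N(0, I).

    Assumption (not from the source record): diagonal Gaussian posterior and
    standard normal prior. Per dimension, with s = alpha + (1 - alpha) * var:
        D = alpha * mu^2 / (2 s) - [ln s - (1 - alpha) ln var] / (2 (alpha - 1))
    """
    if alpha <= 0.0 or alpha == 1.0:
        raise ValueError("alpha must be positive and different from 1")
    total = 0.0
    for m, v in zip(mu, var):
        mixed = alpha + (1.0 - alpha) * v  # interpolated variance, must stay positive
        if mixed <= 0.0:
            raise ValueError("divergence is not finite for this alpha and variance")
        total += 0.5 * alpha * m * m / mixed
        total -= (math.log(mixed) - (1.0 - alpha) * math.log(v)) / (2.0 * (alpha - 1.0))
    return total


if __name__ == "__main__":
    mu, var = [0.5, -1.0], [0.8, 1.5]
    for alpha in (0.5, 0.9, 0.999):
        print(alpha, renyi_to_standard_normal(mu, var, alpha))
    print("KL", kl_to_standard_normal(mu, var))
```

As α approaches 1 the value approaches the KL term, and for α below 1 the penalty is no larger than KL, which is one plausible reading of how such a regularizer could leave more room for diverse latent codes. Whether the paper uses this closed form, this prior, or a particular α is not stated in this record.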