Lightweight Text-to-Image Generation Model Based on Contrastive Language-Image Pre-Training Embeddings and Conditional Variational Autoencoders

Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Cond...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Electronics (Basel) Jg. 14; H. 11; S. 2185
Hauptverfasser: Wang, Yubo, Zhang, Gaofeng
Format: Journal Article
Sprache:Englisch
Veröffentlicht: Basel MDPI AG 01.06.2025
Schlagworte:
ISSN:2079-9292, 2079-9292
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Deploying text-to-image (T2I) models is challenging due to high computational demands, extensive data needs, and the persistent goal of enhancing generation quality and diversity, particularly on resource-constrained devices. We introduce a lightweight T2I framework that uses a dual-conditioned Conditional Variational Autoencoder (CVAE), leveraging CLIP embeddings for semantic guidance and enabling explicit attribute control, thereby reducing computational load and data dependency. Key to our approach is a specialized mapping network that bridges CLIP text–image modalities for improved fidelity and Rényi divergence for latent space regularization to foster diversity, as evidenced by richer latent representations. Experiments on CelebA demonstrate competitive generation (FID: 40.53, 42 M params, 21 FPS) with enhanced diversity. Crucially, our model also shows effective generalization to the more complex MS COCO dataset and maintains a favorable balance between visual quality and efficiency (8 FPS at 256 × 256 resolution with 54 M params). Ablation studies and component validations (detailed in appendices) confirm the efficacy of our contributions. This work offers a practical, efficient T2I solution that balances generative performance with resource constraints across different datasets and is suitable for specialized, data-limited domains.
Bibliographie:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2079-9292
2079-9292
DOI:10.3390/electronics14112185