MLLM-Based UI2Code Automation Guided by UI Layout Information

Bibliographic Details
Published in:	Proceedings of the ACM on Software Engineering, Vol. 2, no. ISSTA, pp. 1123-1145
Main Authors:	Wu, Fan; Gao, Cuiyun; Li, Shuqing; Wen, Xin-Cheng; Liao, Qing
Format:	Journal Article
Language:	English
Published:	New York, NY, USA: ACM, 22.06.2025
Subjects:	Automatic programming; Human-centered computing; Software and its engineering; User interface programming; Web App Development; Automated Software Engineering; Multimodal Large Language Models
ISSN:	2994-970X

Abstract	Converting user interfaces into code (UI2Code) is a crucial but time-consuming and labor-intensive step in website development. Automating UI2Code is essential for streamlining this task and improving development efficiency. Deep learning-based methods exist for the task; however, they rely heavily on large amounts of labeled training data and struggle to generalize to real-world, unseen web page designs. The advent of Multimodal Large Language Models (MLLMs) offers a way to alleviate this issue, but MLLMs have difficulty comprehending the complex layouts in UIs and generating accurate code that preserves the layout. To address these issues, we propose LayoutCoder, a novel MLLM-based framework that generates UI code from real-world webpage images. It includes three key modules: (1) Element Relation Construction, which captures UI layout by identifying and grouping components with similar structures; (2) UI Layout Parsing, which generates UI layout trees to guide the subsequent code generation process; and (3) Layout-Guided Code Fusion, which produces accurate code with the layout preserved. For evaluation, in addition to the popular Design2Code dataset, we build a new benchmark dataset named Snap2Code, which comprises 350 real-world websites divided into seen and unseen parts to mitigate data leakage. Extensive evaluation shows that LayoutCoder outperforms the state-of-the-art approaches. Compared with the best-performing baseline, LayoutCoder improves the BLEU score by 10.14% and the CLIP score by 3.95% on average across all datasets.
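The abstract mentions UI layout trees that guide code generation. As a rough illustration only, the Python sketch below shows one way such a tree could be represented and flattened into nested HTML containers. The names LayoutNode and fuse, the "row"/"column" directions, and the use of CSS flexbox are all assumptions made for this example; they are not taken from the paper or from the LayoutCoder repository.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LayoutNode:
    """One node of a hypothetical UI layout tree (illustrative, not LayoutCoder's type)."""
    direction: str = "column"            # how children are arranged: "row" or "column"
    children: List["LayoutNode"] = field(default_factory=list)
    snippet: Optional[str] = None        # leaf only: HTML already generated for this region


def fuse(node: LayoutNode, indent: int = 0) -> str:
    """Recursively wrap leaf snippets in flex containers that mirror the tree."""
    pad = "  " * indent
    if not node.children:
        # Leaf region: emit its snippet unchanged (empty string if none was generated).
        return f"{pad}{node.snippet or ''}"
    inner = "\n".join(fuse(child, indent + 1) for child in node.children)
    return (
        f'{pad}<div style="display:flex;flex-direction:{node.direction}">\n'
        f"{inner}\n"
        f"{pad}</div>"
    )


# Usage: a header stacked above a two-column body.
page = LayoutNode("column", [
    LayoutNode(snippet="<header>Site</header>"),
    LayoutNode("row", [
        LayoutNode(snippet="<nav>Menu</nav>"),
        LayoutNode(snippet="<main>Body</main>"),
    ]),
])
print(fuse(page))
```

In this sketch the tree alone decides the nesting and ordering of containers, so the per-region snippets can be produced independently and still end up in the intended arrangement.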
ArticleNumber	ISSTA050
Author	Li, Shuqing; Wen, Xin-Cheng; Wu, Fan; Liao, Qing; Gao, Cuiyun
Author_xml	– sequence: 1 givenname: Fan orcidid: 0009-0002-9090-1832 surname: Wu fullname: Wu, Fan email: codenobuge@163.com organization: Harbin Institute of Technology, Shenzhen, Shenzhen, China – sequence: 2 givenname: Cuiyun orcidid: 0000-0003-4774-2434 surname: Gao fullname: Gao, Cuiyun email: gaocuiyun@hit.edu.cn organization: Harbin Institute of Technology, Shenzhen, Shenzhen, China – sequence: 3 givenname: Shuqing orcidid: 0000-0001-6323-1402 surname: Li fullname: Li, Shuqing email: sqli21@cse.cuhk.edu.hk organization: Chinese University of Hong Kong, Hong Kong, China – sequence: 4 givenname: Xin-Cheng orcidid: 0000-0002-2115-9921 surname: Wen fullname: Wen, Xin-Cheng email: xiamenwxc@foxmail.com organization: Harbin Institute of Technology, Shenzhen, Shenzhen, China – sequence: 5 givenname: Qing orcidid: 0000-0003-1012-5301 surname: Liao fullname: Liao, Qing email: liaoqing@hit.edu.cn organization: Harbin Institute of Technology, Shenzhen, Shenzhen, China
Cites_doi	10.1145/3540250.3549138 10.1007/978-3-030-01246-5_41 10.1145/3289600.3290610 10.1145/3472749.3474763 10.1145/3180155.3180240 10.1109/CVPR52729.2023.01765 10.1007/s00530-021-00804-7 10.1109/TSE.2018.2844788 10.1115/DETC2024-143139 10.1145/3220134.3220135 10.1609/aaai.v36i1.19994 10.1145/3368089.3417940 10.1155/2022/4415479 10.3115/1073083.1073135 10.1109/ICCV51070.2023.00371 10.1016/j.displa.2024.102679 10.1145/3126594.3126651 10.1007/s11042-023-15108-3
ContentType	Journal Article
Copyright	Owner/Author
Copyright_xml	– notice: Owner/Author
DBID	AAYXX CITATION
DOI	10.1145/3728925
DatabaseName	CrossRef
DatabaseTitle	CrossRef
DatabaseTitleList	CrossRef
DeliveryMethod	fulltext_linktorsrc
Discipline	Computer Science
EISSN	2994-970X
EndPage	1145
ExternalDocumentID	10_1145_3728925 3728925
GrantInformation_xml	– fundername: Shenzhen Basic Research grantid: No. JCYJ20220531095214031 – fundername: CCF-Huawei Populus Grove Fund – fundername: National Key R&D Program of China grantid: No. 2022YFB3103900 – fundername: Shenzhen-Hong Kong Jointly Funded Project grantid: No. SGDX20230116091246007 – fundername: National Natural Science Foundation of China under project grantid: No. 62472126 – fundername: Natural Science Foundation of Guangdong Province grantid: No. 2023A1515011959 funderid: https://doi.org/10.13039/501100003453 – fundername: Shenzhen International Science and Technology Cooperation Project grantid: No. GJHZ20220913143008015
GroupedDBID	AAKMM ACM AEJOY AKRVB ALMA_UNASSIGNED_HOLDINGS LHSKQ M~E AAYXX CITATION
ID	FETCH-LOGICAL-a845-2d834c3112a52ee5a43c103f38be80296cb116d979d9192a4481fa420079e7eb3
ISSN	2994-970X
IngestDate	Sat Nov 29 07:43:49 EST 2025 Mon Jul 14 20:48:59 EDT 2025
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Issue	ISSTA
Keywords	Web App Development; Automated Software Engineering; Multimodal Large Language Models
Language	English
License	This work is licensed under Creative Commons Attribution International 4.0.
LinkModel	OpenURL
MergedId	FETCHMERGED-LOGICAL-a845-2d834c3112a52ee5a43c103f38be80296cb116d979d9192a4481fa420079e7eb3
ORCID	0000-0002-2115-9921 0000-0001-6323-1402 0009-0002-9090-1832 0000-0003-4774-2434 0000-0003-1012-5301
OpenAccessLink	https://dl.acm.org/doi/10.1145/3728925
PageCount	23
ParticipantIDs	crossref_primary_10_1145_3728925 acm_primary_3728925
PublicationCentury	2000
PublicationDate	20250622 2025-06-22
PublicationDateYYYYMMDD	2025-06-22
PublicationDate_xml	– month: 06 year: 2025 text: 20250622 day: 22
PublicationDecade	2020
PublicationPlace	New York, NY, USA
PublicationPlace_xml	– name: New York, NY, USA
PublicationTitle	Proceedings of the ACM on software engineering
PublicationTitleAbbrev	ACM PACMSE
PublicationYear	2025
Publisher	ACM
Publisher_xml	– name: ACM
StartPage	1123
SubjectTerms	Automatic programming Human-centered computing Software and its engineering User interface programming
SubjectTermsDisplay	Human-centered computing -- User interface programming Software and its engineering -- Automatic programming
Title	MLLM-Based UI2Code Automation Guided by UI Layout Information
URI	https://dl.acm.org/doi/10.1145/3728925
Volume	2