MLLM-Based UI2Code Automation Guided by UI Layout Information
| Published in: | Proceedings of the ACM on Software Engineering, Vol. 2, no. ISSTA, pp. 1123–1145 |
|---|---|
| Main Authors: | Wu, Fan; Gao, Cuiyun; Li, Shuqing; Wen, Xin-Cheng; Liao, Qing |
| Format: | Journal Article |
| Language: | English |
| Published: | New York, NY, USA: ACM, 22.06.2025 |
| ISSN: | 2994-970X |
| Online Access: | https://dl.acm.org/doi/10.1145/3728925 |
| Abstract | Converting user interfaces into code (UI2Code) is a crucial step in website development, but it is time-consuming and labor-intensive. Automating UI2Code streamlines this task and can improve development efficiency. Existing deep learning-based methods for the task rely heavily on large amounts of labeled training data and struggle to generalize to real-world, unseen web page designs. Multimodal Large Language Models (MLLMs) have the potential to alleviate this issue, but they struggle to comprehend the complex layouts of UIs and to generate accurate code that preserves those layouts. To address these issues, we propose LayoutCoder, a novel MLLM-based framework that generates UI code from real-world webpage images. It comprises three key modules: (1) Element Relation Construction, which captures the UI layout by identifying and grouping components with similar structures; (2) UI Layout Parsing, which generates UI layout trees to guide the subsequent code generation process; and (3) Layout-Guided Code Fusion, which produces accurate code that preserves the layout. For evaluation, in addition to the widely used Design2Code dataset, we build a new benchmark dataset named Snap2Code, which contains 350 real-world websites divided into seen and unseen parts to mitigate data leakage. Extensive evaluation shows that LayoutCoder outperforms state-of-the-art approaches: compared with the best-performing baseline, it improves the BLEU score by 10.14% and the CLIP score by 3.95% on average across all datasets. |
|---|---|
| ArticleNumber | ISSTA050 |
| Author | Li, Shuqing; Wen, Xin-Cheng; Wu, Fan; Liao, Qing; Gao, Cuiyun |
| Author_xml | (1) Wu, Fan (ORCID 0009-0002-9090-1832; codenobuge@163.com), Harbin Institute of Technology, Shenzhen, Shenzhen, China; (2) Gao, Cuiyun (ORCID 0000-0003-4774-2434; gaocuiyun@hit.edu.cn), Harbin Institute of Technology, Shenzhen, Shenzhen, China; (3) Li, Shuqing (ORCID 0000-0001-6323-1402; sqli21@cse.cuhk.edu.hk), Chinese University of Hong Kong, Hong Kong, China; (4) Wen, Xin-Cheng (ORCID 0000-0002-2115-9921; xiamenwxc@foxmail.com), Harbin Institute of Technology, Shenzhen, Shenzhen, China; (5) Liao, Qing (ORCID 0000-0003-1012-5301; liaoqing@hit.edu.cn), Harbin Institute of Technology, Shenzhen, Shenzhen, China |
| Cites_doi | 10.1145/3540250.3549138 10.1007/978-3-030-01246-5_41 10.1145/3289600.3290610 10.1145/3472749.3474763 10.1145/3180155.3180240 10.1109/CVPR52729.2023.01765 10.1007/s00530-021-00804-7 10.1109/TSE.2018.2844788 10.1115/DETC2024-143139 10.1145/3220134.3220135 10.1609/aaai.v36i1.19994 10.1145/3368089.3417940 10.1155/2022/4415479 10.3115/1073083.1073135 10.1109/ICCV51070.2023.00371 10.1016/j.displa.2024.102679 10.1145/3126594.3126651 10.1007/s11042-023-15108-3 |
| ContentType | Journal Article |
| Copyright | Owner/Author |
| DBID | AAYXX CITATION |
| DOI | 10.1145/3728925 |
| DatabaseName | CrossRef |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 2994-970X |
| EndPage | 1145 |
| ExternalDocumentID | 10_1145_3728925 3728925 |
| GrantInformation_xml | Shenzhen Basic Research (No. JCYJ20220531095214031); CCF-Huawei Populus Grove Fund; National Key R&D Program of China (No. 2022YFB3103900); Shenzhen-Hong Kong Jointly Funded Project (No. SGDX20230116091246007); National Natural Science Foundation of China (No. 62472126); Natural Science Foundation of Guangdong Province (No. 2023A1515011959; funder ID https://doi.org/10.13039/501100003453); Shenzhen International Science and Technology Cooperation Project (No. GJHZ20220913143008015) |
| GroupedDBID | AAKMM ACM AEJOY AKRVB ALMA_UNASSIGNED_HOLDINGS LHSKQ M~E AAYXX CITATION |
| ID | FETCH-LOGICAL-a845-2d834c3112a52ee5a43c103f38be80296cb116d979d9192a4481fa420079e7eb3 |
| IngestDate | Sat Nov 29 07:43:49 EST 2025; Mon Jul 14 20:48:59 EDT 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | ISSTA |
| Keywords | Web App Development; Automated Software Engineering; Multimodal Large Language Models |
| Language | English |
| License | This work is licensed under a Creative Commons Attribution 4.0 International License. |
| LinkModel | OpenURL |
| MergedId | FETCHMERGED-LOGICAL-a845-2d834c3112a52ee5a43c103f38be80296cb116d979d9192a4481fa420079e7eb3 |
| ORCID | 0000-0002-2115-9921 0000-0001-6323-1402 0009-0002-9090-1832 0000-0003-4774-2434 0000-0003-1012-5301 |
| OpenAccessLink | https://dl.acm.org/doi/10.1145/3728925 |
| PageCount | 23 |
| ParticipantIDs | crossref_primary_10_1145_3728925 acm_primary_3728925 |
| PublicationCentury | 2000 |
| PublicationDate | 2025-06-22 |
| PublicationDecade | 2020 |
| PublicationPlace | New York, NY, USA |
| PublicationTitle | Proceedings of the ACM on Software Engineering |
| PublicationTitleAbbrev | ACM PACMSE |
| PublicationYear | 2025 |
| Publisher | ACM |
| References | Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Huayu Deng, Zhongkai Wu, Vuksan Mijovic, Zijiang James Yang, Jian-Guang Lou, and Dongmei Zhang. 2023. LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA. 18403–18412. https://doi.org/10.1109/CVPR52729.2023.01765 10.1109/CVPR52729.2023.01765 André Armstrong Janino Cizotto, Rodrigo Clemente Thom de Souza, Viviana Cocco Mariani, and Leandro dos Santos Coelho. 2023. Web pages from mockup design based on convolutional neural network and class activation mapping. 82, 25 (2023), March, 38771–38797. issn:1380-7501 https://doi.org/10.1007/s11042-023-15108-3 10.1007/s11042-023-15108-3 Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, and Yang Liu. 2018. From UI design image to GUI skeleton: a neural machine translator to bootstrap mobile GUI implementation. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). Association for Computing Machinery, New York, NY, USA. 665–676. isbn:9781450356381 https://doi.org/10.1145/3180155.3180240 10.1145/3180155.3180240 Zhaoyun Jiang, Shizhao Sun, Jihua Zhu, Jian-Guang Lou, and Dongmei Zhang. 2022. Coarse-to-Fine Generative Modeling for Graphic Layouts. Proceedings of the AAAI Conference on Artificial Intelligence, 36, 1 (2022), Jun., 1096–1103. https://doi.org/10.1609/aaai.v36i1.19994 10.1609/aaai.v36i1.19994 Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum. 2022. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. arxiv:2203.03605. arxiv:2203.03605 Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, and Diyi Yang. 2024. Design2Code: How Far Are We From Automating Front-End Engineering? arXiv preprint arXiv:2403.03163. Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. 
A density-based algorithm for discovering clusters in large spatial databases with noise. KDD’96. AAAI Press, 226–231. Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything. arxiv:2304.02643. arxiv:2304.02643 Mulong Xie, Zhenchang Xing, Sidong Feng, Xiwei Xu, Liming Zhu, and Chunyang Chen. 2022. Psychologically-inspired, unsupervised inference of perceptual groups of GUI widgets from GUI images. ESEC/FSE 2022. Association for Computing Machinery, New York, NY, USA. 332–343. isbn:9781450394130 https://doi.org/10.1145/3540250.3549138 10.1145/3540250.3549138 Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST ’17). Association for Computing Machinery, New York, NY, USA. 845–854. isbn:9781450349819 https://doi.org/10.1145/3126594.3126651 10.1145/3126594.3126651 Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, and Fangyu Liu. 2023. Pix2Struct: screenshot parsing as pretraining for visual language understanding. In Proceedings of the 40th International Conference on Machine Learning (ICML’23). JMLR.org, Article 780, 20 pages. Jean-Baptiste Alayrac, Jeff Donahue, and Pauline Luc. 2024. Flamingo: a visual language model for few-shot learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22). Curran Associates Inc., Red Hook, NY, USA. Article 1723, 21 pages. isbn:9781713871088 Kevin Moran, Carlos Bernal-Cárdenas, Michael Curcio, Richard Bonett, and Denys Poshyvanyk. 2018. Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps. IEEE Transactions on Software Engineering, 46 (2018), 196–221. 
https://api.semanticscholar.org/CorpusID:3629347 OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, and Lama Ahmad. 2024. GPT-4 Technical Report. arxiv:2303.08774. arxiv:2303.08774 Tuan Anh Nguyen and Christoph Csallner. 2015. Reverse Engineering Mobile Application User Interfaces with REMAUI (T). 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 248–259. https://api.semanticscholar.org/CorpusID:7499368 Daniel Baulé, Christiane Gresse von Wangenheim, Aldo von Wangenheim, Jean C. R. Hauck, and Edson C. Vargas Júnior. 2021. Automatic code generation from sketches of mobile applications in end-user development using Deep Learning. arxiv:2103.05704. arxiv:2103.05704 Vanita Jain, Piyush Agrawal, Subham Banga, Rishabh Kapoor, and Shashwat Gulyani. 2019. Sketch2Code: Transformation of Sketches to UI in Real-time Using Deep Neural Network. arxiv:1910.08930. arxiv:1910.08930 Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, and Xixuan Song. 2023. Cogvlm: Visual expert for pretrained language models. arXiv preprint arXiv:2311.03079. Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arxiv:2301.12597. arxiv:2301.12597 Tony Beltramelli. 2018. pix2code: Generating Code from a Graphical User Interface Screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS ’18). Association for Computing Machinery, New York, NY, USA. Article 3, 6 pages. isbn:9781450358972 https://doi.org/10.1145/3220134.3220135 10.1145/3220134.3220135 Shuyu Zheng, Ziniu Hu, and Yun Ma. 2019. FaceOff: Assisting the Manifestation Design of Web Graphical User Interface. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). Association for Computing Machinery, New York, NY, USA. 774–777. 
isbn:9781450359405 https://doi.org/10.1145/3289600.3290610 10.1145/3289600.3290610 Anthropic. 2024. Claude 3.5. https://www.anthropic.com Accessed: 2024-10-30 Xu Zhong, Jianbin Tang, and Antonio Jimeno-Yepes. 2019. PubLayNet: Largest Dataset Ever for Document Layout Analysis. 2019 International Conference on Document Analysis and Recognition (ICDAR), 1015–1022. https://api.semanticscholar.org/CorpusID:201124789 Jason Wu, Xiaoyi Zhang, Jeff Nichols, and Jeffrey P Bigham. 2021. Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots. In The 34th Annual ACM Symposium on User Interface Software and Technology (UIST ’21). Association for Computing Machinery, New York, NY, USA. 470–483. isbn:9781450386357 https://doi.org/10.1145/3472749.3474763 10.1145/3472749.3474763 Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, and Tingting Zhou. 2024. Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes. arxiv:2405.04975. arxiv:2405.04975 Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu. 2019. LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators. arxiv:1901.06767. arxiv:1901.06767 Fan Wu. 2025. MLLM-Based UI2Code Automation Guided by UI Layout Information. https://github.com/ay7u1009/LayoutCoder/ Accessed: 2025-04-05 Microsoft Azure. 2018. Turn your whiteboard sketches to working code in seconds with sketch2code. https://azure.microsoft.com/en-us/blog/turn-your-whiteboard-sketches-to-working-code-in-seconds-with-sketch2code/ Accessed: 2024-10-30 Wei Zhang, Shangmin Luan, Liqin Tian, and Nima Jafari Navimipour. 2022. A Rapid Combined Model for Automatic Generating Web UI Codes. Wirel. Commun. Mob. Comput., 2022 (2022), Jan., 10 pages. issn:1530-8669 https://doi.org/10.1155/2022/4415479 10.1155/2022/4415479 Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, and Jack Clark. 2021. 
Learning transferable visual models from natural language supervision. In International conference on machine learning. 8748–8763. Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, and Devi Parikh. 2018. Graph R-CNN for Scene Graph Generation. In Proceedings of the European Conference on Computer Vision (ECCV). Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, and D. Zhang. 2023. LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 7192–7202. https://api.semanticscholar.org/CorpusID:257636725 Xiaoyi Zhang, Lilian de Greef, and Amanda Swearngin. 2021. Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels.. In CHI, Yoshifumi Kitamura, Aaron Quigley, Katherine Isbister, Takeo Igarashi, Pernille Bjørn, and Steven Mark Drucker (Eds.). ACM, 275:1–275:15. isbn:978-1-4503-8096-6 http://dblp.uni-trier.de/db/conf/chi/chi2021.html##ZhangGSWMYSNWFE21 Mulong Xie, Sidong Feng, Zhenchang Xing, Jieshan Chen, and Chunyang Chen. 2020. UIED: a hybrid tool for GUI element detection. ESEC/FSE 2020. Association for Computing Machinery, New York, NY, USA. 1655–1659. isbn:9781450370431 https://doi.org/10.1145/3368089.3417940 10.1145/3368089.3417940 Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, and Yanfang Chang. 2024. UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface. arxiv:2403.04984. arxiv:2403.04984 Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL ’02). Association for Computational Linguistics, USA. 311–318. https://doi.org/10.3115/1073083.1073135 10.3115/1073083.1073135 Ting Zhou, Yanjie Zhao, Xinyi Hou, Xiaoyu Sun, Kai Chen, and Haoyu Wang. 2024. 
Bridging Design and Development with Automated Declarative UI Code Generation. arxiv:2409.11667. arxiv:2409.11667 Hugo Laurençon, Léo Tronchon, and Victor Sanh. 202 Alayrac Jean-Baptiste (e_1_2_1_1_1) 2024 Aşıroğlu Batuhan (e_1_2_1_3_1) 2019 Wan Yuxuan (e_1_2_1_32_1) 2024 e_1_2_1_20_1 Nguyen Tuan Anh (e_1_2_1_24_1) 2015 e_1_2_1_40_1 e_1_2_1_23_1 e_1_2_1_45_1 e_1_2_1_21_1 e_1_2_1_43_1 e_1_2_1_28_1 e_1_2_1_25_1 e_1_2_1_26_1 e_1_2_1_47_1 e_1_2_1_29_1 Lee Kenton (e_1_2_1_19_1) 2023 Radford Alec (e_1_2_1_27_1) 2021 Baulé Daniel (e_1_2_1_6_1) 2021 Wang Weihan (e_1_2_1_33_1) 2023 Ester Martin (e_1_2_1_12_1) Zhang Hao (e_1_2_1_41_1) 2022 e_1_2_1_7_1 e_1_2_1_31_1 e_1_2_1_8_1 e_1_2_1_30_1 e_1_2_1_5_1 e_1_2_1_35_1 Zhang Xiaoyi (e_1_2_1_44_1) 2021; 275 e_1_2_1_4_1 e_1_2_1_34_1 e_1_2_1_10_1 e_1_2_1_2_1 e_1_2_1_11_1 e_1_2_1_16_1 Liu Haotian (e_1_2_1_22_1) 2023 e_1_2_1_39_1 Zhang Junyi (e_1_2_1_42_1) Hessel Jack (e_1_2_1_13_1) 2022 e_1_2_1_17_1 e_1_2_1_38_1 e_1_2_1_14_1 e_1_2_1_37_1 e_1_2_1_15_1 e_1_2_1_36_1 e_1_2_1_9_1 e_1_2_1_18_1 Zhong Xu (e_1_2_1_46_1) 2019 |
| References_xml | – reference: Vanita Jain, Piyush Agrawal, Subham Banga, Rishabh Kapoor, and Shashwat Gulyani. 2019. Sketch2Code: Transformation of Sketches to UI in Real-time Using Deep Neural Network. arxiv:1910.08930. arxiv:1910.08930 – reference: Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, and Xixuan Song. 2023. Cogvlm: Visual expert for pretrained language models. arXiv preprint arXiv:2311.03079. – reference: Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. 2023. Visual Instruction Tuning. ArXiv, abs/2304.08485 (2023), https://api.semanticscholar.org/CorpusID:258179774 – reference: Jean-Baptiste Alayrac, Jeff Donahue, and Pauline Luc. 2024. Flamingo: a visual language model for few-shot learning. In Proceedings of the 36th International Conference on Neural Information Processing Systems (NIPS ’22). Curran Associates Inc., Red Hook, NY, USA. Article 1723, 21 pages. isbn:9781713871088 – reference: Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL ’02). Association for Computational Linguistics, USA. 311–318. https://doi.org/10.3115/1073083.1073135 10.3115/1073083.1073135 – reference: Andy Rutledge. 2009. Gestalt Principles - 3: Proximity, Uniform Connectedness, and Good Continuation. https://andyrutledge.com/gestalt-principles-3.html Accessed: 2024-10-30 – reference: Chunyang Chen, Ting Su, Guozhu Meng, Zhenchang Xing, and Yang Liu. 2018. From UI design image to GUI skeleton: a neural machine translator to bootstrap mobile GUI implementation. In Proceedings of the 40th International Conference on Software Engineering (ICSE ’18). Association for Computing Machinery, New York, NY, USA. 665–676. 
isbn:9781450356381 https://doi.org/10.1145/3180155.3180240 10.1145/3180155.3180240 – reference: Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. 2022. CLIPScore: A Reference-free Evaluation Metric for Image Captioning. arxiv:2104.08718. arxiv:2104.08718 – reference: Alex Robinson. 2019. Sketch2code: Generating a website from a paper mockup. arxiv:1905.13750. arxiv:1905.13750 – reference: Xiaoyi Zhang, Lilian de Greef, and Amanda Swearngin. 2021. Screen Recognition: Creating Accessibility Metadata for Mobile Applications from Pixels.. In CHI, Yoshifumi Kitamura, Aaron Quigley, Katherine Isbister, Takeo Igarashi, Pernille Bjørn, and Steven Mark Drucker (Eds.). ACM, 275:1–275:15. isbn:978-1-4503-8096-6 http://dblp.uni-trier.de/db/conf/chi/chi2021.html##ZhangGSWMYSNWFE21 – reference: Shuhong Xiao, Yunnong Chen, Yaxuan Song, Liuqing Chen, Lingyun Sun, Yankun Zhen, and Yanfang Chang. 2024. UI Semantic Group Detection: Grouping UI Elements with Similar Semantics in Mobile Graphical User Interface. arxiv:2403.04984. arxiv:2403.04984 – reference: Kevin Moran, Carlos Bernal-Cárdenas, Michael Curcio, Richard Bonett, and Denys Poshyvanyk. 2018. Machine Learning-Based Prototyping of Graphical User Interfaces for Mobile Apps. IEEE Transactions on Software Engineering, 46 (2018), 196–221. https://api.semanticscholar.org/CorpusID:3629347 – reference: Anthropic. 2024. Claude 3.5. https://www.anthropic.com Accessed: 2024-10-30 – reference: Jianan Li, Jimei Yang, Aaron Hertzmann, Jianming Zhang, and Tingfa Xu. 2019. LayoutGAN: Generating Graphic Layouts with Wireframe Discriminators. arxiv:1901.06767. arxiv:1901.06767 – reference: Tuan Anh Nguyen and Christoph Csallner. 2015. Reverse Engineering Mobile Application User Interfaces with REMAUI (T). 2015 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), 248–259. 
https://api.semanticscholar.org/CorpusID:7499368 – reference: Chenglei Si, Yanzhe Zhang, Zhengyuan Yang, Ruibo Liu, and Diyi Yang. 2024. Design2Code: How Far Are We From Automating Front-End Engineering? arXiv preprint arXiv:2403.03163. – reference: Junyi Zhang, Jiaqi Guo, Shizhao Sun, Jian-Guang Lou, and D. Zhang. 2023. LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models. 2023 IEEE/CVF International Conference on Computer Vision (ICCV), 7192–7202. https://api.semanticscholar.org/CorpusID:257636725 – reference: Microsoft Azure. 2018. Turn your whiteboard sketches to working code in seconds with sketch2code. https://azure.microsoft.com/en-us/blog/turn-your-whiteboard-sketches-to-working-code-in-seconds-with-sketch2code/ Accessed: 2024-10-30 – reference: Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. 2023. Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond. arxiv:2308.12966. arxiv:2308.12966 – reference: Wei Zhang, Shangmin Luan, Liqin Tian, and Nima Jafari Navimipour. 2022. A Rapid Combined Model for Automatic Generating Web UI Codes. Wirel. Commun. Mob. Comput., 2022 (2022), Jan., 10 pages. issn:1530-8669 https://doi.org/10.1155/2022/4415479 10.1155/2022/4415479 – reference: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, and Lama Ahmad. 2024. GPT-4 Technical Report. arxiv:2303.08774. arxiv:2303.08774 – reference: Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. 1996. A density-based algorithm for discovering clusters in large spatial databases with noise. KDD’96. AAAI Press, 226–231. – reference: Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. 2023. Segment Anything. arxiv:2304.02643. 
arxiv:2304.02643 – reference: André Armstrong Janino Cizotto, Rodrigo Clemente Thom de Souza, Viviana Cocco Mariani, and Leandro dos Santos Coelho. 2023. Web pages from mockup design based on convolutional neural network and class activation mapping. 82, 25 (2023), March, 38771–38797. issn:1380-7501 https://doi.org/10.1007/s11042-023-15108-3 10.1007/s11042-023-15108-3 – reference: Zhaoyun Jiang, Jiaqi Guo, Shizhao Sun, Huayu Deng, Zhongkai Wu, Vuksan Mijovic, Zijiang James Yang, Jian-Guang Lou, and Dongmei Zhang. 2023. LayoutFormer++: Conditional Graphic Layout Generation via Constraint Serialization and Decoding Space Restriction. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE Computer Society, Los Alamitos, CA, USA. 18403–18412. https://doi.org/10.1109/CVPR52729.2023.01765 10.1109/CVPR52729.2023.01765 – reference: Hugo Laurençon, Léo Tronchon, and Victor Sanh. 2024. Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset. arxiv:2403.09029. arxiv:2403.09029 – reference: Kenton Lee, Mandar Joshi, Iulia Turc, Hexiang Hu, and Fangyu Liu. 2023. Pix2Struct: screenshot parsing as pretraining for visual language understanding. In Proceedings of the 40th International Conference on Machine Learning (ICML’23). JMLR.org, Article 780, 20 pages. – reference: Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, and Jack Clark. 2021. Learning transferable visual models from natural language supervision. In International conference on machine learning. 8748–8763. – reference: Digital Silk. 2024. How Many Websites Are There In 2024? https://www.digitalsilk.com/digital-trends/how-many-websites-are-there/ Accessed: 2024-10-31 – reference: Jianwei Yang, Jiasen Lu, Stefan Lee, Dhruv Batra, and Devi Parikh. 2018. Graph R-CNN for Scene Graph Generation. In Proceedings of the European Conference on Computer Vision (ECCV). 
– reference: Fan Wu. 2025. MLLM-Based UI2Code Automation Guided by UI Layout Information. https://github.com/ay7u1009/LayoutCoder/ Accessed: 2025-04-05 – reference: Shuhong Xiao, Yunnong Chen, Jiazhi Li, Liuqing Chen, Lingyun Sun, and Tingting Zhou. 2024. Prototype2Code: End-to-end Front-end Code Generation from UI Design Prototypes. arxiv:2405.04975. arxiv:2405.04975 – reference: Mulong Xie, Sidong Feng, Zhenchang Xing, Jieshan Chen, and Chunyang Chen. 2020. UIED: a hybrid tool for GUI element detection. ESEC/FSE 2020. Association for Computing Machinery, New York, NY, USA. 1655–1659. isbn:9781450370431 https://doi.org/10.1145/3368089.3417940 10.1145/3368089.3417940 – reference: Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. 2017. Rico: A Mobile App Dataset for Building Data-Driven Design Applications. In Proceedings of the 30th Annual ACM Symposium on User Interface Software and Technology (UIST ’17). Association for Computing Machinery, New York, NY, USA. 845–854. isbn:9781450349819 https://doi.org/10.1145/3126594.3126651 10.1145/3126594.3126651 – reference: Jason Wu, Xiaoyi Zhang, Jeff Nichols, and Jeffrey P Bigham. 2021. Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots. In The 34th Annual ACM Symposium on User Interface Software and Technology (UIST ’21). Association for Computing Machinery, New York, NY, USA. 470–483. isbn:9781450386357 https://doi.org/10.1145/3472749.3474763 10.1145/3472749.3474763 – reference: Shuyu Zheng, Ziniu Hu, and Yun Ma. 2019. FaceOff: Assisting the Manifestation Design of Web Graphical User Interface. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19). Association for Computing Machinery, New York, NY, USA. 774–777. isbn:9781450359405 https://doi.org/10.1145/3289600.3290610 10.1145/3289600.3290610 – reference: Xu Zhong, Jianbin Tang, and Antonio Jimeno-Yepes. 2019. 
PubLayNet: Largest Dataset Ever for Document Layout Analysis. 2019 International Conference on Document Analysis and Recognition (ICDAR), 1015–1022. https://api.semanticscholar.org/CorpusID:201124789 – reference: Daniel Baulé, Christiane Gresse von Wangenheim, Aldo von Wangenheim, Jean C. R. Hauck, and Edson C. Vargas Júnior. 2021. Automatic code generation from sketches of mobile applications in end-user development using Deep Learning. arxiv:2103.05704. arxiv:2103.05704 – reference: Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M. Ni, and Heung-Yeung Shum. 2022. DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection. arxiv:2203.03605. arxiv:2203.03605 – reference: Batuhan Aşıroğlu, Büşta Rümeysa Mete, Eyyüp Yıldız, Yağız Nalçakan, Alper Sezen, Mustafa Dağtekin, and Tolga Ensari. 2019. Automatic HTML code generation from mock-up images using machine learning techniques. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT). 1–4. – reference: Wen-Yin Chen, Pavol Podstreleny, Wen-Huang Cheng, Yung-Yao Chen, and Kai-Lung Hua. 2022. Code generation from a graphical user interface via attention-based encoder–decoder model. Multimedia Systems, 28, 1 (2022), 121–130. – reference: Mulong Xie, Zhenchang Xing, Sidong Feng, Xiwei Xu, Liming Zhu, and Chunyang Chen. 2022. Psychologically-inspired, unsupervised inference of perceptual groups of GUI widgets from GUI images. ESEC/FSE 2022. Association for Computing Machinery, New York, NY, USA. 332–343. isbn:9781450394130 https://doi.org/10.1145/3540250.3549138 10.1145/3540250.3549138 – reference: Ting Zhou, Yanjie Zhao, Xinyi Hou, Xiaoyu Sun, Kai Chen, and Haoyu Wang. 2024. Bridging Design and Development with Automated Declarative UI Code Generation. arxiv:2409.11667. arxiv:2409.11667 – reference: Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. 
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. arXiv:2301.12597 – reference: Zhaoyun Jiang, Shizhao Sun, Jihua Zhu, Jian-Guang Lou, and Dongmei Zhang. 2022. Coarse-to-Fine Generative Modeling for Graphic Layouts. Proceedings of the AAAI Conference on Artificial Intelligence, 36, 1 (2022), Jun., 1096–1103. https://doi.org/10.1609/aaai.v36i1.19994 – reference: Yuxuan Wan, Chaozheng Wang, Yi Dong, Wenxuan Wang, Shuqing Li, Yintong Huo, and Michael R. Lyu. 2024. Automatically Generating UI Code from Screenshot: A Divide-and-Conquer-Based Approach. arXiv:2406.16386 – reference: Tony Beltramelli. 2018. pix2code: Generating Code from a Graphical User Interface Screenshot. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS ’18). Association for Computing Machinery, New York, NY, USA. Article 3, 6 pages. isbn:9781450358972 https://doi.org/10.1145/3220134.3220135 |
| StartPage | 1123 |
| SubjectTermsDisplay | Human-centered computing -- User interface programming Software and its engineering -- Automatic programming |
| Title | MLLM-Based UI2Code Automation Guided by UI Layout Information |
| URI | https://dl.acm.org/doi/10.1145/3728925 |
| Volume | 2 |
| openUrl | atitle: MLLM-Based UI2Code Automation Guided by UI Layout Information; jtitle: Proceedings of the ACM on software engineering; au: Wu, Fan / Gao, Cuiyun / Li, Shuqing / Wen, Xin-Cheng; date: 2025-06-22; pub: ACM; eissn: 2994-970X; volume: 2; issue: ISSTA; spage: 1123; epage: 1145; doi: 10.1145/3728925 |