SeeAction: Towards Reverse Engineering How-What-Where of HCI Actions from Screencasts for UI Automation


Bibliographic Details
Published in: Proceedings of the International Conference on Software Engineering, pp. 463-475
Main Authors: Zhao, Dehai; Xing, Zhenchang; Lu, Qinghua; Xu, Xiwei; Zhu, Liming
Format: Conference Paper
Language: English
Published: IEEE, 26 April 2025
ISSN: 1558-1225
Description
Summary: UI automation is a useful technique for UI testing, bug reproduction, and robotic process automation. Recording user actions on an application enables rapid development of UI automation scripts, but existing recording techniques are intrusive, relying on OS or GUI framework accessibility support or assuming specific app implementations. Reverse engineering user actions from screencasts is non-intrusive, but a key reverse-engineering step is currently missing: recognizing human-understandable structured user actions ([command] [widget] [location]) from action screencasts. To fill this gap, we propose a deep-learning-based computer vision model that can recognize 11 commands and 11 widgets and generate location phrases from action screencasts, through joint learning and multi-task learning. We label a large dataset of 7,260 video-action pairs, which record user interactions with Word, Zoom, Firefox, Photoshop, and Windows 10 Settings. Through extensive experiments, we confirm the effectiveness and generality of our model, and demonstrate the usefulness of a screencast-to-action-script tool built upon our model for bug reproduction.
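To make the multi-task setup described in the abstract concrete, the sketch below shows one plausible arrangement in PyTorch: a shared temporal encoder over per-frame features feeds two classification heads (the "how" command and the "what" widget) and a token head for the "where" location phrase, with joint training summing the per-task losses. This is an illustrative assumption only, not the authors' actual architecture; the GRU encoder, layer sizes, and vocabulary size are all made up for the example.

# Hypothetical multi-task model for structured action recognition
# ([command] [widget] [location]); not the paper's actual design.
import torch
import torch.nn as nn

NUM_COMMANDS = 11   # 11 commands per the abstract (e.g. click, type, ...)
NUM_WIDGETS = 11    # 11 widgets per the abstract (e.g. button, menu, ...)
VOCAB_SIZE = 1000   # assumed vocabulary for location-phrase generation

class ActionRecognizer(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256):
        super().__init__()
        # Shared encoder over per-frame features (assumed to come from a
        # pretrained image backbone; only the temporal part is modeled here).
        self.temporal = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Task heads: "how" (command) and "what" (widget) as classification,
        # "where" (location phrase) as per-step token prediction.
        self.command_head = nn.Linear(hidden_dim, NUM_COMMANDS)
        self.widget_head = nn.Linear(hidden_dim, NUM_WIDGETS)
        self.location_head = nn.Linear(hidden_dim, VOCAB_SIZE)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim)
        outputs, last_hidden = self.temporal(frame_feats)
        clip_repr = last_hidden[-1]            # (batch, hidden_dim)
        command_logits = self.command_head(clip_repr)
        widget_logits = self.widget_head(clip_repr)
        # One location token per time step; a real system would more likely
        # decode the phrase autoregressively.
        location_logits = self.location_head(outputs)
        return command_logits, widget_logits, location_logits

model = ActionRecognizer()
feats = torch.randn(2, 16, 512)                # 2 clips, 16 frames each
cmd, wid, loc = model(feats)
print(cmd.shape, wid.shape, loc.shape)         # (2, 11) (2, 11) (2, 16, 1000)

The three predictions together yield a structured action in the [command] [widget] [location] form the abstract describes, for example "click [button] [top right of the dialog]" (a hypothetical output, for illustration).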
DOI: 10.1109/ICSE55347.2025.00144