Automated Testing for Service-Oriented Architecture: Leveraging Large Language Models for Enhanced Service Composition

This article explores the application of Large Language Models (LLMs), including proprietary models such as OpenAI's ChatGPT 4o and ChatGPT 4o-mini, Anthropic's Claude 3.5 Sonnet and Claude 3.7 Sonnet, and Google's Gemini 1.5 Pro, Gemini 2.0 Flash, and Gemini 2.0 Flash-Lite, as well a...

Celý popis

Uložené v:
Podrobná bibliografia
Vydané v:IEEE access Ročník 13; s. 89627 - 89640
Hlavní autori: Altin, Mahsun, Mutlu, Behcet, Kilinc, Deniz, Cakir, Altan
Médium: Journal Article
Jazyk:English
Vydavateľské údaje: Piscataway IEEE 2025
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Predmet:
ISSN:2169-3536, 2169-3536
On-line prístup:Získať plný text
Tagy: Pridať tag
Žiadne tagy, Buďte prvý, kto otaguje tento záznam!
Popis
Shrnutí:This article explores the application of Large Language Models (LLMs), including proprietary models such as OpenAI's ChatGPT 4o and ChatGPT 4o-mini, Anthropic's Claude 3.5 Sonnet and Claude 3.7 Sonnet, and Google's Gemini 1.5 Pro, Gemini 2.0 Flash, and Gemini 2.0 Flash-Lite, as well as open-source alternatives including Qwen2.5-14B-Instruct-1M, and commercially accessed models such as DeepSeek R1 and DeepSeek V3, which were tested via APIs despite having open-source variants, to automate validation and verification in Application Programming Interface (API) testing within a Service-Oriented Architecture (SOA). Our system compares internal responses from the Enuygun Web Server against third-party API outputs in both JSON and XML formats, validating critical parameters such as flight prices, baggage allowances, and seat availability. We generated 100 diverse test scenarios across varying complexities (1-4 flight results) by randomly altering request and response parameters. Experimental results show that Google Gemini 2.0 Flash achieved high accuracy (up to 99.98%) with the lowest completion time (85.34 seconds), while Qwen2.5-14B-Instruct-1M exhibited limited capability in processing complex formats. Models such as OpenAI's ChatGPT and Anthropic's Claude Sonnet models also demonstrated strong performance in single-flight validation scenarios, making them suitable for low-latency, high-precision tasks. Our findings indicate that some open-source models can offer promising cost-effective alternatives, though performance significantly varies. This integration of LLMs reduced manual workload, improved test scalability, and enabled real-time validation across large-scale datasets. As LLM technologies mature, we anticipate further advances in automation, accuracy, and efficiency in software validation systems.
Bibliografia:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:2169-3536
2169-3536
DOI:10.1109/ACCESS.2025.3571994