Automating A Test Process using LLMs
I. INTRODUCTION
Large Language Models (LLMs) are revolutionizing software engineering. In the past few years, we have witnessed the application of LLMs for assisting or automating numerous software engineering tasks like requirements engineering, software design, coding, and testing [1][2]. Software testing, in particular, is one area where LLMs have been applied with vigor. Facing ever-increasing needs for automation due to the volume and intensity of work involved, testing is rapidly benefiting from the generative capabilities of LLMs. As systematically surveyed in [3], LLMs have been applied in many testing tasks including system input generation, test case generation, test oracle generation, debugging, and program repair.

While a considerable amount of recent literature has focused on applying LLMs in narrowly scoped tasks [4] – such as specific unit tests [5][6], isolated integration tests [7], or individual verification scenarios [8][9] – few have reported on their application to automate a complete test process. Practical testing processes are a diverse mix of steps that are mechanical, creative, and anything in between [10][11]. They also involve several (teams of) engineers and tools, whose harmonious cooperation is essential to ensure the quality and cadence of testing. The challenge is only greater when testing automotive embedded systems, where software coexists with mechatronics and other physical systems. Under such heterogeneous conditions, it is not immediately apparent how one can effectively integrate LLMs into a testing process and gain efficiencies. In response to these challenges, we present a case study that (1) focuses upon a real-world test process in the automotive industry that is largely performed manually, and (2) automates it using a recipe that seamlessly combines selective use of LLMs with conventional automation.

The focus of this case study – our system under test – is SPAPI, a web server that is deployed in trucks made by a leading vehicle manufacturer. SPAPI exposes a set of REST APIs which can be used by clients to read or write selected vehicle states. For example, SPAPI exposes /speed that can be used to read the vehicle speed, and /climate that can be used to change the cabin climate. Essentially, SPAPI serves as a gateway between web clients (like apps on a tablet) on one side, and in-vehicle control and monitoring applications on the other side. More importantly for the purposes of this paper, since SPAPI enables crucial customer-facing applications, considerable effort is spent in ensuring its quality.

Testing SPAPI requires a dedicated team of 2-3 full-time engineers. As shown in Figure 1 (left), when new APIs are released, the team first reviews the API specifications. They then (2-3) consult multiple documentation sources to understand the associated vehicle states, (4-5) organize this information to determine appropriate mocks and test inputs, and (6-7) write and integrate test cases into a nightly regression suite. Finally, they assess results (8), particularly test failures, to identify valid problems. Notably, as highlighted in the figure, SPAPI's engineering spans multiple teams with overlapping responsibilities. The three core components – the server, vehicle state system, and mocking system – are developed by separate teams, while testing falls to a fourth team that must interpret

[Figure: (1) a Web API in a classic three-tier architecture (client, server, database), returning requested data in JSON/XML; (3) the SPAPI server in a test rig: test clients exchange JSON/XML with SPAPI, which connects through a gateway ECU over a CAN link to the Virtual Vehicle (VV) system holding the test virtual vehicle's status.]
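To make the read/write interface concrete, the following minimal sketch builds the body of a /climate write request. The endpoint path and the payload fields "type" and "acMode" (with its enum values) come from the paper's running example; the helper function itself and its validation are our own illustration, not SPAPI code.

```python
import json

def build_climate_request(ac_mode: str) -> str:
    """Serialize a PUT body for the /climate endpoint (hypothetical helper)."""
    allowed = ("STANDARD", "ECONOMY")  # enum values from the API specification
    if ac_mode not in allowed:
        raise ValueError(f"acMode must be one of {allowed}, got {ac_mode!r}")
    return json.dumps({"type": "Climate", "acMode": ac_mode})

print(build_climate_request("ECONOMY"))
# -> {"type": "Climate", "acMode": "ECONOMY"}
```

A client would send this string as the body of an HTTP PUT to /climate; the corresponding read endpoints, like /speed, return vehicle state in the same JSON shape.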
(a) API Specification:

    ClimateObject:
      type: object
      description: Manipulate climate settings on the truck.
      required:
        - type
      properties:
        acMode:
          type: string
          enum: ["STANDARD", "ECONOMY"]
        autoFanLevel:
          type: string
          enum: ["LOW", "NORMAL", "HIGH"]
        isAuxiliaryHeaterActivated:
          type: boolean

(b) Matching Results:

    "ClimateObject": [{
      "api_property": "acMode",
      "api_property_mappings": {
        "can_signal": "APIACModeRqst",
        "vv_state": "apiacmode_rqst"
      },
      "api_value_mappings": [
        {"api_value": "ECONOMY", "can_value": "LOW", "vv_state_value": "1"},
        {"api_value": "STANDARD", "can_value": "HIGH", "vv_state_value": "2"}
      ]
    }]

(c) Test Cases:

    API response:
      "ClimateAPIObject": {
        "type": "Climate",
        "acMode": "ECONOMY"
      }
    Virtual vehicle:
      "ClimateVVObject": {
        "apiacmode_rqst": "1"
      }

(d) Test Code:

    import pytest
    import json
    import time

    def test_put_climate(spapi_setup_teardown, api_client, vv):
        response = api_client.put(
            url="/api/climate",
            data=json.dumps({"type": "Climate", "acMode": "ECONOMY"})
        )
        # Check for correct status code
        assert response.status_code == 200
        # Assert VV attributes to verify correct behavior
        assert vv.climate_control.apiacmode_rqst == 1

(The figure also labels a Jinja templating step, the test rig, and the virtual vehicle.)

Fig. 5. Architecture and workflow of SPAPI-Tester: The pipeline largely preserves the manual process and selectively uses LLMs to automate discrete steps.
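The matching results in panel (b) are what let the generated test in panel (d) assert on virtual-vehicle state: each API value is tied to a CAN value and a VV state value. The following minimal sketch shows how such a mapping could be consumed; the dictionary is transcribed from the figure, while the lookup helper is our own illustration, not part of SPAPI-Tester.

```python
# Value mappings transcribed from Fig. 5(b).
MAPPING = {
    "api_property": "acMode",
    "api_property_mappings": {
        "can_signal": "APIACModeRqst",
        "vv_state": "apiacmode_rqst",
    },
    "api_value_mappings": [
        {"api_value": "ECONOMY", "can_value": "LOW", "vv_state_value": "1"},
        {"api_value": "STANDARD", "can_value": "HIGH", "vv_state_value": "2"},
    ],
}

def expected_vv_state(mapping: dict, api_value: str) -> tuple:
    """Return (vv_state_name, expected_value) for a given API value,
    i.e. the assertion target used in a generated test like Fig. 5(d)."""
    for m in mapping["api_value_mappings"]:
        if m["api_value"] == api_value:
            return (mapping["api_property_mappings"]["vv_state"],
                    m["vv_state_value"])
    raise KeyError(f"no mapping for API value {api_value!r}")

print(expected_vv_state(MAPPING, "ECONOMY"))
# -> ('apiacmode_rqst', '1')
```

A test generator that walks this structure can emit one assertion per API value, which is how the pipeline turns matching results into executable checks.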
TABLE IV
Performance on different types of fuzzy matching (upper part) and inconsistent units (lower part).

TABLE V
Time to generate test cases, per step (seconds). DU is document understanding; RI is retrieval information; TSG is test case generation; Run means running the test cases.
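The fuzzy matching evaluated in Table IV pairs API property names with signal and state names whose spellings differ (e.g., acMode vs. APIACModeRqst in Fig. 5). The paper's actual matcher is not reproduced here; as a rough, assumed illustration, plain lexical similarity already ranks the right candidate in this example. Only the first signal name is from the figure; the other candidates are invented.

```python
import difflib

def best_match(api_property: str, candidates: list) -> str:
    """Return the candidate lexically closest to the API property,
    ignoring case and underscores (a crude fuzzy matcher)."""
    def norm(s: str) -> str:
        return s.lower().replace("_", "")
    scored = [
        (difflib.SequenceMatcher(None, norm(api_property), norm(c)).ratio(), c)
        for c in candidates
    ]
    return max(scored)[1]

signals = ["APIACModeRqst", "AutoFanLevelRqst", "AuxHeaterActvd"]
print(best_match("acMode", signals))  # -> APIACModeRqst
```

Real signal catalogs also differ in units and abbreviations, which is why Table IV separates fuzzy name matching from inconsistent-unit handling.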