Icstw 2018 00074
Abstract: Automated test case generation promises to reduce the high effort of manually developing and maintaining test cases, to improve the effectiveness of testing, and to speed up testing cycles. Research on generating test cases has advanced over the past decades and today a wide range of techniques and tools are available, including studies showing their successful evaluation in real-world scenarios. We conducted a multi-firm research project on automated software testing that involved the application of automated test case generation approaches in industry projects. This paper provides a retrospective on the related activities. It reports on our observations and insights from applying automated test case generation in practice, identifies pitfalls and gaps in current research, and summarizes lessons learned from transferring software testing research results to industry.

Keywords: automated software testing, industry application, retrospective, lessons learned.

I. INTRODUCTION

Research on automated test case generation has considerably advanced over the last decades. Today a range of methods, techniques and tools for generating test cases are available [1][2][3]. They have been investigated and evaluated in research studies. There are also evaluations done in the context of systems from industry, reporting that generated test cases are able to find new defects that have not been detected by established testing techniques (e.g., [4][5][6][7]). These positive results and the wide availability of tool implementations encourage the adoption of automated test case generation in practice. Nevertheless, researchers are aware that “there still exists a big gap between real software application systems and practical usability of test case generation techniques proposed by research.” [1]

In this paper we describe the application of automated test case generation in the development of industrial software systems in a real-world setting. It has been part of a multi-firm research project on automated software testing with the aim to transfer results from software testing research to industry. The research project is based on a well-established, long-term collaboration with companies that have been running joint projects for several years. The transfer activities are inspired by the model for technology transfer proposed by Gorschek et al. [8]. The overall collaboration follows an evolutionary approach producing solutions for practical problems in highly-interactive cycles [9].

The outcome is considered successful from the viewpoint of the involved practitioners as well as the participating researchers. Automated test case generation has been applied for testing software components and applications with a graphical user interface (GUI). For this purpose, test adapters were introduced. Initially the adapters were used to bridge the technical gap between the automated test case generation tools and the industry applications under test, but we found that they also foster the inclusion of system-specific and domain-specific knowledge by the engineers in the otherwise system/domain-agnostic test case generation process. This approach to automated test case generation has been integrated in the tool infrastructure of the industry projects and runs as part of the automated build pipeline. It is currently being extended to other teams of the involved organizations as well. To support the transfer, we distilled a model for introducing and adopting automated test case generation from our observations and lessons learned.

Technical results and the evaluation of the approach have been published in our previous papers. In [10] we describe the challenges involved in testing GUI components and we explore the feasibility of harnessing tools for test case generation to address the problem of covering the huge number of possible interaction scenarios. In [11] we describe the evaluation of the proposed test adapter approach for GUI testing at the unit, integration and system level in different industry projects. In [12] we propose the model supporting the transition from manual testing to automated test case generation.

In this paper we conduct a retrospective on the research and technology transfer activities behind our work. The objectives of the paper are as follows.

• Describe our observations from applying automated test case generation for a large GUI-based software system.

• Provide insights into what worked and what failed in the complex, real-world setting of our industry partners.

• Identify gaps in current research on automated test case generation to foster practical applications.

• Discuss pitfalls and lessons learned from technology transfer to industry.
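The test adapter idea mentioned in this introduction can be illustrated with a minimal sketch. It is not the adapter used in the industry projects: the classes ToggleControl and ToggleControlAdapter are hypothetical, the control is a plain Java stand-in rather than a real GUI control, and the rule that two clicks restore the original state is an assumed example of domain knowledge. The sketch only shows the general shape, namely adapter methods that forward an interaction to the control and then check a post-condition, so that a generator calling these methods gets a system-specific oracle in addition to its generic one.

```java
// Hypothetical stand-in for a GUI control. In a real setup the adapter would
// send events to an actual GUI control through a GUI test framework instead.
class ToggleControl {
    private boolean selected;

    // Simulates the effect of one click event arriving at the control.
    void fireClick() {
        selected = !selected;
    }

    boolean isSelected() {
        return selected;
    }
}

// Hypothetical adapter: exposes interactions as plain methods that a test case
// generator can call, and checks domain-specific post-conditions after each one.
class ToggleControlAdapter {
    private final ToggleControl control;

    ToggleControlAdapter(ToggleControl control) {
        if (control == null) {
            throw new IllegalArgumentException("control must not be null");
        }
        this.control = control;
    }

    // Forwards a single click and checks that the selection state changed.
    void click() {
        boolean before = control.isSelected();
        control.fireClick();
        // Explicit check instead of the assert keyword, so it is active
        // even when the JVM runs without -ea.
        if (control.isSelected() == before) {
            throw new AssertionError("click() did not change the selection state");
        }
    }

    // Forwards two clicks; assumed rule for this toy control: the state is restored.
    void doubleClick() {
        boolean before = control.isSelected();
        control.fireClick();
        control.fireClick();
        if (control.isSelected() != before) {
            throw new AssertionError("doubleClick() did not restore the selection state");
        }
    }
}

class AdapterSketchDemo {
    public static void main(String[] args) {
        ToggleControlAdapter adapter = new ToggleControlAdapter(new ToggleControl());
        adapter.click();        // passes: state flips
        adapter.doubleClick();  // passes: state is restored
        System.out.println("all adapter checks passed");
    }
}
```

Because the checks live in the adapter and not in a tool-specific extension, the same class could be driven by any generator or by hand-written tests; a sequence of adapter calls that violates a check fails with an AssertionError.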
Selecting a candidate solution. First we narrowed down the list of applicable approaches. Automated GUI testing has a long history in practice [14] and has also gained increasing attention in research [15]. However, in the beginning the focus was on testing GUI controls at the level of individual units, so an approach based on unit testing frameworks was suggested rather than the use of tools designed for system-level testing and/or testing GUI applications. This would also increase the acceptance of the solution by the development teams, who were already acquainted with unit testing tools. Furthermore, we excluded approaches that required an upfront investment in modelling in order to keep the initial effort low and to reduce dependencies on key resources from industry partners.

We selected the open source test case generators Randoop [5], EvoSuite [16], and GRT [17] as candidate tools. We have experience with these tools from previous projects and we have also been in contact with their authors. For industry projects, one of the practical benefits of these tools is that they are able to generate tests requiring only Java code as input. The output consists of test cases executable with the JUnit test framework. Due to technical constraints introduced by the GUI framework in use, the only tool that could immediately be applied to the code of our industry partners was Randoop.

Adapting to GUI technology. Randoop is able to generate test cases for APIs (interfaces at code level for programming), but it is not applicable to GUIs (interfaces provided for human users). In order to interact with GUI controls it is necessary to send events representing mouse movements or clicks, keyboard input and touch interactions via the operating system’s event processing mechanism. We implemented an adapter that provides methods (e.g., doubleClick()) for sending corresponding events to the GUI controls when called. TestFX, an open source test framework for unit testing in the context of JavaFX, was included for that purpose. Hence, the overall technical solution required a combination of tools working together, i.e., Randoop, TestFX and JUnit.

Exploring the approach on a small set of GUI controls. We applied Randoop using the adapter on a small set of simple GUI controls (e.g., PushButton). These examples helped to illustrate the underlying concept and to start the discussion with the members of the development teams. In a first evaluation [10] we were already able to show that the proposed approach is capable of finding bugs in the tested GUI controls and that the effort for implementing the necessary adapter is reasonably small, since it mostly consisted of boilerplate code we were able to generate.

Including domain-specific knowledge. Randoop comes with a generic oracle in the form of built-in contracts that can be used to reveal basic programming errors. For example, it can detect if a null-pointer exception is thrown although null was never used as a parameter value. Thus, Randoop’s ability to find bugs is limited to this generic oracle. Failures specific to the particular tested system cannot be detected out of the box. Randoop provides an extension mechanism to add custom checks. However, a proprietary extension mechanism has several drawbacks. Extensions are tool-dependent and will only work with Randoop but not with other tools. Furthermore, their implementation requires detailed tool-specific knowledge. We therefore decided to enhance the adapter used for sending events to GUI controls with custom checks in the form of Java assertions. These checks assert post-conditions and invariants, for example by asserting a state change in response to an interaction. Since the adapter has no dependencies on Randoop, it can be used by different tools as well as by any other kind of test automation. The approach of extending the adapter also allowed including other system- and domain-specific knowledge. It allowed, for example, including custom test data to be passed to input parameters and it allowed controlling the generation process to produce only relevant interaction sequences.

Developing a system-specific setup. Moving to higher levels of testing, in particular to system testing, revealed another technical hurdle. The class loading concept of the applied component framework introduced technical restrictions that prohibited accessing GUI controls across process and technology boundaries. Direct access is necessary not only for executing the generated test cases but also at the time of generating tests, since Randoop exercises the system under test in the generation process. We experimented with different implementations and finally selected a solution where Randoop is wrapped into a component executed as part of the tested system itself. This solution also has the benefit that test case generation can be done from within the developer’s IDE and the tested application does not have to be deployed to a separate test environment first.

Abandoning system-level testing. Getting Randoop to work for system testing required considerably more effort than for the other test levels. The additional effort cannot be attributed entirely to test case generation. Test automation at the system level is a difficult task in general [18] and automated test case generation inherits these difficulties. Thus, independently of our results, a business decision was made to cancel the team’s efforts invested in system testing and to move the related tasks to a separate organizational unit off-site. Although we were able to provide successful results in applying test case generation at the level of system testing, the related activities were not continued.

Integrating test case generation in the build pipeline. Again, extra effort was required for solving technical issues. They stemmed from the test framework and the generated tests not running in headless mode. Next, organizational questions were brought up. Test case generation can be integrated in the build process in two ways: (a) for regression testing and (b) for finding new bugs.

(a) When used for regression testing, the test cases generated in the previous build are re-executed on the current build to identify differences in the behavior of the new version, which indicate potential bugs. This approach is prone to producing many false positives, as most of the changes are usually intended. Furthermore, re-executing the old tests requires that the public interface of the tested components has not been changed between the build runs. This prerequisite could not be guaranteed in our projects, as there were no organizational rules on keeping interfaces stable in the development phase.

(b) As an alternative we proposed to generate new test cases in each build with the objective to find new bugs based on the checks included in the adapter implementation. Thereby test
cases are re-generated from scratch for new/changed as well as old code alike. A failing generated test case indicates either a new bug or a false positive. New bugs are usually only found in new code, as the old code has been exposed to automated test case generation several times in previous build runs and the discovered bugs have been fixed over time. What remains in old code are typically only the false positives that keep reappearing in every run. If the root causes of these false positives (e.g., too restrictive checks in the adapter) are not resolved, then over time the share of new bugs among the failing generated tests decreases and the effort required for analyzing test results remains constant or even increases.

Introducing test case generation to other teams. Current work includes the knowledge transfer to other development teams of the involved industrial partners. To support them in adopting test case generation we condensed our insights and lessons learned into a staged model for the transition from manual testing to automated test case generation [12]. It includes the mandatory step of manually writing automated tests for the component/system to be tested before applying test case generation. This is based on the observation that automated test case generation is sometimes misunderstood as a way to compensate for not having any automated tests. Yet a lack of automated tests is usually accompanied by or caused by the problem of low testability (e.g., hardcoded dependencies on other components, complex and/or large input data objects are required, internal states are hard to reach) [19]. If testability issues exist, test case generation is likely to be affected and may even fail to produce useful results. Thus, as a rule of thumb, the first step towards automated test case generation is to manually write test cases. These test cases may already cover typical and important usage scenarios. However, their essential contribution is to assure testability as the foundation for the effective use of automated test case generation. The additionally generated test cases then augment the manually written test cases and cover the different interaction scenarios and combinations in depth.

Reviewing the outcome. Automated test case generation produced tests that increased coverage and detected new, previously unknown bugs in already tested components. The tests also revealed hard-to-find issues caused by bugs in the underlying JavaFX framework developed by Oracle. The advantage of the generation approach is its ability to exercise a large variety of interaction sequences, which can be achieved neither by manual testing nor by manually writing automated test cases. Hence, the technology transfer has been considered successful from the viewpoint of the industry partners as well as the participating researchers. The proposed solution addressed the needs of the involved practitioners. One of the members of the development team commented on the approach as “this is real science”. Nevertheless, the adoption of automated test case generation for components other than those included in our studies and the diffusion of the approach to other development teams happen at a very slow pace. If manually written tests are already in place for an existing component, the component is usually not touched just for the sake of adding automatically generated test cases or improving the existing tests with them. Then again, new components are often not considered critical enough to apply test case generation in addition to manually writing tests. However, the reluctance to invest more time and effort in improving testing has to be understood in a broader context. Adopting automated test case generation is considered an important task. Yet there are also many other open tasks in the backlogs of the projects, related e.g. to upgrading to new technologies, refactoring and removing technical debt, and improving the overall development process, and only a few of these tasks can be accomplished in a development cycle.

IV. DISCUSSION

A. Factors Influencing Industry Application

Throughout the entire project we encountered factors that had a positive or negative influence on the application of the proposed solution for automated test case generation. Table II provides an overview of the factors we identified in a project retrospective. The table is organized along two dimensions.

The columns represent the time dimension, split into three high-level phases related to

• Exploring the solution: In the first phase we selected candidate solutions for automated test case generation and investigated the feasibility of applying them for testing a selected set of GUI components.

• Adapting the solution: In the second phase system- and domain-specific knowledge was included in the generation process via test adapters. Furthermore, the decision was made to investigate the applicability of the solution for integration-level and system-level testing.

• Extending the solution: The final phase is characterized by integrating the solution in the build pipeline and the decision to transfer it to other development teams.

The rows describe four categories to which the influence factors are related. These categories are

• Technology: Concerns technical aspects of the solution.

• System/Environment: Influence factors associated with the system under test and the related environment.

• Business: Decisions made due to business considerations.

• Organizational policies and processes: Constraints and limitations introduced by the involved organizations.

In the early phase of exploring the solutions many positive factors (“+”) supported the application of automated test case generation. Incompatibilities with the technology stack used by the industry partners were identified in this phase, but they were resolved quickly by using Randoop as the candidate solution. The low initial effort and the quick results in terms of detected defects encouraged us to proceed with this solution.

In the later phases more obstacles were encountered (“-”) that were less technical and more organizational/business-related. The increasing number of negative influence factors matches the subjective perception of slowing progress in technology transfer over time. Adapting the solution required the integration of more system- and domain-specific knowledge in the generation process, which was not well supported by the selected tool implementation and had to be solved by extending the test adapter. Establishing the solution as part of the build pipeline and the transfer to other teams was less a technical issue than an organizational one. Tackling challenges related to organizational policies and processes is often outside the initially identified research problem and requires additional management support on the side of the companies.

B. Suggestions for Research

Incorporate system/domain-specific knowledge in test case generation. Many approaches to automated test case generation aim to detect bugs autonomously. No upfront effort on specification and modelling is required and no further human intervention is necessary when exercising the system under test. However, in order to generate useful test cases we found that system- and domain-specific knowledge has to be included in the generation process. It can be sufficient to provide small “hints” for the generator such as default values or general properties that must never be violated. This kind of information is usually obvious to human testers, but needs to be encoded to be incorporated in test case generation. Researchers implementing tools should offer approaches to enrich test case generation with additional knowledge more easily, ideally in a way that avoids tool-specific extensions. In our solution we therefore exploited the adapter concept, as it is a simple technical concept that all developers are familiar with and that is almost universally applicable.

Provide experience reports. Many software developers are enthusiastic about trying new technologies and tools. First-hand experience from practical applications is often valued more than rigorous scientific studies, as it helps to understand the underlying concepts of a proposed solution, which in turn increases the confidence in being able to apply the solution to a problem at hand. In contrast, scientific studies are suspected to suffer from publication bias, as positive results are more likely to be reported than failures. We therefore suggest facilitating technology transfer by providing experience reports [20] that describe the application of a specific method, technique or tool to solve a defined problem. The report should include an open discussion of the observed benefits as well as drawbacks plus practical advice on how to apply the described solution.

C. Lessons from Transferring Testing Research to Industry

Increasing technology readiness. It has to be highlighted that it has probably never been easier to transfer research results to practice. Industrial impact is gaining increasing attention among researchers, and major conferences acknowledge industry applications by providing dedicated industry tracks. As a result, academic research nowadays more often produces results readily prepared for application in an industry environment. An example is the trend to make research results widely available in the form of open source tools and plug-ins for popular development environments [21].

Risk of change. Gorschek et al. [8] advise preparing companies for necessary upcoming changes by “showing the people in the organization that using the new solution is more advantageous than doing business as usual.” However, companies may still be reluctant to adopt new methods and techniques, even if there is convincing evidence that their application will provide a noticeable advantage (e.g., increased effectiveness and efficiency). This observation seems counterintuitive. However, one has to bear in mind that introducing a new solution may also require changes to established processes, approaches or tool chains, which are considered reliable as they have proved to eventually lead to an expected outcome in the past. For companies committed to delivering a specified set of features at a defined deadline, cost is the only variable factor in the “magic triangle of project management”. Thus, the companies will likely avoid the risk of not being able to deliver on time (due to adopting new methods and techniques), even if their established way of working is less effective/efficient than a proposed new approach or solution.

Knowledge transfer from industry to research. Models for technology transfer often concentrate on the push of research results to industry, but they pay less attention to the necessary transfer of knowledge from industry to research. Even if the research problem is clearly defined and well understood, it is critical to also develop a thorough understanding of the industry context in which the solution should be applied. Usually, the relevant knowledge cannot be easily retrieved. It includes tacit knowledge about complex real-world systems,
their technical peculiarities, as well as the restrictions and implications derived from the organizational and business environment. Continuous effort is required to transfer such knowledge from industry to academia. In our case one of the researchers had already been involved in previous development projects conducted by our industry partners. All researchers kept close contact with the project teams and were present on-site on a regular basis. Furthermore, our collaboration model also allows practitioners to move from their work place to the lab for working together with the researchers over an extended period of the project.

V. CONCLUSIONS

Automated test case generation is an intensively studied research area offering a wide range of methods, techniques and tools plus supporting empirical evaluations. This suggests the practical application of automated test case generation in industry. So why do we not see widespread adoption yet?

Our work shows that automated test case generation can be successfully applied in real-world projects. However, we also found that there is still little reusable experience on how to incorporate test case generation approaches in the daily practice of industry projects and about what mid-term to long-term effects one can expect. Industry applications come with a complex landscape of technologies and project requirements that demand workarounds to make test case generation usable in order to realize the promised benefits. The break-even is not easily reached. Most effort invested in test automation goes into making testing work and not into performing actual testing.

Software testing research can help to improve the balance between benefits and costs of applying automated test case generation. So far a lot of research work has been devoted to the improvement of quantitative measures such as coverage percentages and number of mutants/bugs found. These improvements are mainly related to the benefits side of the balance. While this is necessary and valuable work, what we feel has been neglected are research results that help reduce the costs of applying test generation. Besides making the existing methods and tools more versatile, there is huge potential in research on topics such as generating stubs and mocks, creating test oracles, visualizing and evaluating test results, or locating faults that is applicable in combination with automated test case generation.

ACKNOWLEDGMENT

The research reported in this paper has been supported by the Austrian Research Promotion Agency FFG, the Austrian Ministry for Transport, Innovation and Technology, the Federal Ministry of Science, Research and Economy, and the Province of Upper Austria in the frame of the COMET center SCCH.

REFERENCES

[1] S. Anand, E.K. Burke, T.Y. Chen, J. Clark, M.B. Cohen, W. Grieskamp, M. Harman, M.J. Harrold, P. McMinn, et al., “An orchestrated survey of methodologies for automated software test case generation,” Journal of Systems and Software, 86(8), 2013, pp. 1978–2001.
[2] S.J. Galler and B.K. Aichernig, “Survey on test data generation tools,” International Journal on Software Tools for Technology Transfer, 16(6), 2014, pp. 727–751.
[3] L. Cseppentő and Z. Micskei, “Evaluating code-based test input generator tools,” Software Testing, Verification and Reliability, 27(6), 2017, pp. 1099–1689.
[4] T. Arts, J. Hughes, U. Norell, and H. Svensson, “Testing AUTOSAR software with QuickCheck,” Proc. of the Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), IEEE, 2015, pp. 1–4.
[5] C. Pacheco and M.D. Ernst, “Randoop: feedback-directed random testing for Java,” Companion Proc. of the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, ACM, 2007, pp. 815–816.
[6] R. Ramler, D. Winkler, and M. Schmidt, “Random test case generation and manual unit testing: Substitute or complement in retrofitting tests for legacy code?,” Proc. of the 38th Euromicro Conference on Software Engineering and Advanced Applications, IEEE, 2012, pp. 286–293.
[7] B. Robinson, M.D. Ernst, J.H. Perkins, V. Augustine, and N. Li, “Scaling up automated test generation: Automatically generating maintainable regression unit tests for programs,” Proc. of the 26th IEEE/ACM International Conference on Automated Software Engineering, IEEE, 2011, pp. 23–32.
[8] T. Gorschek, C. Wohlin, P. Carre, and S. Larsson, “A model for technology transfer in practice,” IEEE Software, 23(6), Nov.-Dec. 2006, pp. 88–95.
[9] T. Mikkonen, C. Lassenius, T. Männistö, M. Oivo, and J. Järvinen, “Continuous and collaborative technology transfer: Software engineering research with real-time industry impact,” Information and Software Technology, 95(3), 2018, pp. 34–45.
[10] C. Klammer, R. Ramler, and H. Stummer, “Harnessing Automated Test Case Generators for GUI Testing in Industry,” Proc. of the 42nd Euromicro Conference on Software Engineering and Advanced Applications (SEAA), IEEE, 2016, pp. 227–234.
[11] R. Ramler, G. Buchgeher, and C. Klammer, “Adapting automated test generation to GUI testing of industry applications,” Information and Software Technology, 93, 2018, pp. 248–263.
[12] C. Klammer and R. Ramler, “A Journey from Manual Testing to Automated Test Generation in an Industry Project,” Proc. of the IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), 2017, pp. 591–592.
[13] T. Wetzlmaier and M. Winterer, “Test automation for multi-touch user interfaces of industrial applications,” Proc. of the Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2015, pp. 1–3.
[14] D. Graham and M. Fewster, “Experiences of test automation: case studies of software test automation,” Addison-Wesley, 2012.
[15] I. Banerjee, B. Nguyen, V. Garousi, and A. Memon, “Graphical user interface (GUI) testing: Systematic mapping and repository,” Information and Software Technology, 55(10), 2013, pp. 1679–1694.
[16] G. Fraser and A. Arcuri, “EvoSuite: automatic test suite generation for object-oriented software,” Proc. of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering, ACM, 2011, pp. 416–419.
[17] L. Ma, C. Artho, C. Zhang, H. Sato, J. Gmeiner, and R. Ramler, “GRT: An automated test generator using orchestrated program analysis,” Proc. of the 30th IEEE/ACM International Conference on Automated Software Engineering (ASE), IEEE, 2015, pp. 842–847.
[18] A. Memon, “GUI testing: Pitfalls and process,” IEEE Computer, 35(8), 2002, pp. 87–88.
[19] C. Klammer and A. Kern, “Writing unit tests: It's now or never!,” Proc. of the Eighth International Conference on Software Testing, Verification and Validation Workshops (ICSTW), IEEE, 2015, pp. 1–4.
[20] A. Averbakh, “Light-Weight Experience Collection in Distributed Software Engineering,” Logos Verlag, Berlin, 2015.
[21] D. Shepherd, K. Damevski, and L. Pollock, “How and when to transfer software engineering research via extensions,” Proc. of the 37th International Conference on Software Engineering, Volume 2, 2015, pp. 239–240.