JEST_N1-version_Differential_Testing_of_Both_JavaS
Abstract—Modern programming follows the continuous integration (CI) and continuous deployment (CD) approach rather than the traditional waterfall model. Even the development of modern programming languages uses the CI/CD approach to swiftly provide new language features and to adapt to new development environments. Unlike in the conventional approach, in the modern CI/CD approach, a language specification is no longer the oracle of the language semantics because both the specification and its implementations (interpreters or compilers) can co-evolve. In this setting, both the specification and implementations may have bugs, and guaranteeing their correctness is non-trivial.
In this paper, we propose a novel N+1-version differential testing to resolve the problem. Unlike the traditional differential testing, our approach consists of four steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information. We actualize our approach for the JavaScript programming language via JEST, which performs N+1-version differential testing for modern JavaScript engines and ECMAScript, the language specification describing the syntax and semantics of JavaScript in a natural language. We evaluated JEST with four JavaScript engines that support all modern JavaScript language features and the latest version of ECMAScript (ES11, 2020). JEST automatically synthesized 1,700 programs that covered 97.78% of syntax and 87.70% of semantics from ES11. Using the assertion-injected JavaScript programs, it detected 44 engine bugs in four different engines and 27 specification bugs in ES11.

Index Terms—JavaScript, conformance test generation, mechanized specification, differential testing

I. INTRODUCTION

In Peter O'Hearn's keynote speech in ICSE 2020, he quoted the following from Mark Zuckerberg's Letter to Investors [1]:

    The Hacker Way is an approach to building that involves continuous improvement and iteration. Hackers believe that something can always be better, and that nothing is ever complete.

Indeed, modern programming follows the continuous integration (CI) and continuous deployment (CD) approach [2] rather than the traditional waterfall model. Instead of a sequential model that divides software development into several phases, each of which takes time, CI/CD amounts to a cycle of quick software development, deployment, and back to development with feedback. Even the development of programming languages uses the CI/CD approach.

Consider JavaScript, one of the most widely used programming languages for client-side and server-side programming [3] and embedded systems [4]–[6]. Various JavaScript engines provide diverse extensions to adapt to fast-changing user demands. At the same time, ECMAScript, the official specification that describes the syntax and semantics of JavaScript, has been updated annually since ECMAScript 6 (ES6, 2015) [7] to support new features in response to user demands. Such updates in both the specification and implementations in tandem make it difficult for them to be in sync.

Another example is Solidity [8], the standard smart contract programming language for the Ethereum blockchain. The Solidity language specification is continuously updated, and the Solidity compiler is also frequently released. According to Hwang and Ryu [9], the average number of days between consecutive releases from Solidity 0.1.2 to 0.5.7 is 27. In most cases, the Solidity compiler reflects updates in the specification, but even the specification is revised according to the semantics implemented in the compiler. As in JavaScript, bidirectional effects in the specification and the implementation make it hard to guarantee their correspondence.

In this approach, both the specification and the implementation may contain bugs, and guaranteeing their correctness is a challenging task. The conventional approach to building a programming language is uni-directional from a language specification to its implementation. The specification is believed to be correct and the conformance of an implementation to the specification is checked by dynamic testing. Unlike in the conventional approach, in the modern CI/CD approach, the specification may not be the oracle, because both the specification and the implementation can co-evolve.

In this paper, we propose a novel N+1-version differential testing, which enables testing of co-evolving specifications and their implementations. Differential testing [10] is a testing technique that executes N implementations of a specification concurrently for each input and detects a problem when the outputs are in disagreement. In addition to N implementations, our approach tests the specification as well using a mechanized specification. Recently, several approaches to extract syntax and semantics directly from language specifications have been presented [11]–[13]. We utilize them to bridge the gap between specifications and their implementations through conformance tests generated from mechanized specifications.
The N+1-version differential testing consists of four steps: 1) to automatically synthesize programs guided by the syntax and semantics from a given language specification, 2) to generate conformance tests by injecting assertions to the synthesized programs to check their final program states, 3) to detect bugs in the specification and implementations via executing the conformance tests on multiple implementations, and 4) to localize bugs on the specification using statistical information.

Given a language specification and N existing real-world implementations of the specification, we automatically generate a conformance test suite from the specification with assertions in each test code to make sure that the result of running the code conforms to the specification semantics. Then, we run the test suite for N implementations of the specification. Because generated tests strictly comply with the specification, they reflect specification errors as well, if any. When one of the implementations fails in running a test, the implementation may have a bug, as in differential testing. When most of the implementations fail in running a test, it is highly likely that the specification has a bug. By automatically generating a rich set of test code from the specification and running them with implementations of the specification, we can find and localize bugs either in the specification written in a natural language or in its implementations.

To show the practicality of the proposed approach, we present JEST, which is a JavaScript Engines and Specification Tester using N+1-version differential testing. We implement JEST by extending JISET [11], a JavaScript IR-based semantics extraction toolchain, to utilize the syntax and semantics automatically extracted from ECMAScript. Using the extracted syntax, our tool automatically synthesizes initial seed programs and expands the program pool by mutating specific target programs guided by semantics coverage. Then, the tool generates conformance tests by injecting assertions to synthesized programs. Finally, JEST detects and localizes bugs using execution results of the tests on N JavaScript engines. We evaluate our tool with four JavaScript engines (Google V8 [14], GraalJS [15], QuickJS [16], and Moddable XS [17]) that support all modern JavaScript language features and the latest ECMAScript (ES11, 2020).

The main contributions of this paper include the following:
• Present N+1-version differential testing, a novel solution to the new problem of co-evolving language specifications and their implementations.
• Implement N+1-version differential testing for JavaScript engines and ECMAScript as a tool called JEST. It is the first tool that automatically generates conformance tests for JavaScript engines from ECMAScript. While the coverage of Test262, the official conformance tests, is 91.61% for statements and 82.91% for branches, the coverage of the conformance tests generated by the tool is 87.70% for statements and 78.30% for branches.
• Evaluate JEST with four modern JavaScript engines and the latest ECMAScript, ES11. Using the generated conformance test suite, the tool found and localized 44 engine bugs in four different engines and 27 specification bugs in ES11.

(a) The Abstract Equality Comparison abstract algorithm in ES11

    // JavaScript engines: exception with "err"
    // ECMAScript (ES11) : result === false
    var obj = { valueOf: () => { throw "err"; } };
    var result = 42 == obj;

(b) JavaScript code using abstract equality comparison

    try {
      var obj = { valueOf: () => { throw "err"; } };
      var result = 42 == obj;
      assert(result === false);
    } catch (e) {
      assert(false);
    }

(c) JavaScript code with injected assertions

Fig. 1: Abstract algorithm in ES11 and code example using it

II. N+1-VERSION DIFFERENTIAL TESTING

This section introduces the core concept of N+1-version differential testing with a simple running example. The overall structure consists of two phases: a conformance test generation phase and a bug detection and localization phase.

A. Main Idea

Differential testing utilizes the cross-referencing oracle, which is an assumption that any discrepancies between program behaviors on the same input could be bugs. It compares the execution results of a program with the same input on N different implementations. When an implementation produces a different result from the one by the majority of the implementations, differential testing reports that the implementation may have a bug.

In contrast, N+1-version differential testing utilizes not only the cross-referencing oracle using multiple implementations but also a mechanized specification. It first generates test code from a mechanized specification, and tests N different implementations of the specification using the generated test code as in differential testing. In addition, it can detect possible bugs in the specification as well when most implementations fail for a test. In such cases, because a bug in the specification could be triggered by the test, it localizes the bug using statistical information as we explain later in this section.
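The decision rule in the Main Idea can be illustrated with a minimal sketch. The function name `classify`, the shape of the `results` object, and the 0.5 threshold are assumptions made for illustration; the paper only states that a small number of failing engines points to engine bugs and that failures in most engines point to a specification bug.

```javascript
// Hypothetical sketch: classify one conformance test from per-engine results.
// `results` maps an engine name to true (test passed) or false (test failed).
// The 0.5 threshold is an assumption for illustration, not a value from the paper.
function classify(results, threshold = 0.5) {
  const engines = Object.keys(results);
  const failed = engines.filter((name) => !results[name]);
  if (failed.length === 0) {
    return { verdict: "pass", suspects: [] };
  }
  // Most engines disagree with the specification-derived test,
  // so the specification (and hence the test) is the likely culprit.
  if (failed.length / engines.length > threshold) {
    return { verdict: "specification", suspects: ["specification"] };
  }
  // Only a few engines fail: suspect exactly those engines.
  return { verdict: "engine", suspects: failed };
}

// Example: one engine out of four fails, so that engine is the suspect.
console.log(classify({ V8: true, GraalJS: false, QuickJS: true, "Moddable XS": true }));
```

With plain differential testing only the second case (a minority of deviating engines) would be reported; the specification-derived test is what makes the first case observable.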
Fig. 2: Overall structure of N+1-version differential testing for N implementations (engines) and one language specification
5) Bug Localizer: Then, the second phase executes the conformance tests on N engines and collects their results. For each test, if a small number of engines fail, it reports potential bugs in the engines that fail the test. Otherwise, it reports potential bugs in the specification. In addition, its Bug Localizer module uses Spectrum Based Fault Localization (SBFL) [18], a localization technique utilizing the coverage and pass/fail results of test cases, to localize potential bugs.

III. N+1-VERSION DIFFERENTIAL TESTING FOR JAVASCRIPT

We actualize N+1-version differential testing for the JavaScript programming language as JEST, which uses modern JavaScript engines and ECMAScript.

A. Seed Synthesizer

JEST synthesizes seed programs using two synthesizers.

1) Non-Recursive Synthesizer: The first synthesizer aims to cover as many syntax cases as possible in two steps: 1) to find the shortest string for each non-terminal and 2) to synthesize JavaScript programs using the shortest strings. For presentation brevity, we explain simple cases like terminals and non-terminals, but the implementation supports the extended grammar of ECMAScript such as parametric non-terminals, conditional alternatives, and special terminal symbols.

The shortestStrings function in Algorithm 1 shows the first step. We modified McKenzie's algorithm [19], which finds random strings, to find the shortest string. It takes syntax reduction rules R, a set of pairs of non-terminals and alternatives, and returns a map M from non-terminals to shortest strings derivable from them. It utilizes a worklist W, a queue structure that includes syntax reduction rules affected by updated non-terminals. The function initializes the worklist W with all the syntax reduction rules R. Then, for a syntax reduction rule (A, α), it updates the map M via the update function, and propagates updated information via the propagate function. The update function checks whether a given alternative α of a non-terminal A can derive a string shorter than the current shortest one using the current map M. If possible, it stores the mapping from the non-terminal A to the newly found shortest string in M and invokes propagate. The propagate function finds all the syntax reduction rules whose alternatives contain the updated non-terminal A and inserts them into W. The shortestStrings function repeats this process until the worklist W becomes empty.

Using shortest strings derivable from non-terminals, the nonRecSynthesize function in Algorithm 2 synthesizes programs. It takes syntax reduction rules R and a start symbol S. For the first visit with a non-terminal A, the getProd function returns strings generated by getAlt with alternatives of the non-terminal A. For an already visited non-terminal A, it returns the single shortest string M[A]. The getAlt function takes a non-terminal A with an alternative α and returns a set of strings derivable from α via point-wise concatenation of strings derived by symbols of α. When the numbers of strings derived by symbols are different, it uses the shortest strings derived by symbols as default strings.

Fig. 3: The MemberExpression production in ES11

For example, Figure 3 shows a simplified MemberExpression production in ES11. For the first step, we find the shortest string for each non-terminal: () for Arguments and x for the other non-terminals. Note that we use pre-defined shortest strings for identifiers and literals such as x for identifiers and 0 for numerical literals. In the next step, we synthesize strings derivable from MemberExpression. The first alternative is a single non-terminal PrimaryExpression, which has not been visited yet. Thus, it generates all cases of PrimaryExpression. The fourth alternative consists of one terminal new and two non-terminals MemberExpression and Arguments. Because MemberExpression is already visited, it generates a single shortest string x. For the first visit of Arguments, it generates all cases: (), (x), (...x), and (x,). Note that the numbers of strings generated for symbols are different. In such cases, we use the shortest strings for the other symbols, like x for MemberExpression.

2) Built-in Function Synthesizer: JavaScript supports diverse built-in functions for primitive values and built-in objects. To synthesize JavaScript programs that invoke built-in functions, we extract the information of each built-in function from the mechanized ECMAScript. We utilize the Function.prototype.call function to invoke built-in functions to easily handle the this object in Program Mutator; we use a corresponding object or null as the this object by default. In addition, we synthesize function calls with optional and variable number of arguments and built-in constructor calls with the new keyword.

Consider the Array.prototype.indexOf function for JavaScript array objects, which has a parameter searchElement and an optional parameter fromIndex. The synthesizer generates the following calls with an array object or null as the this object:

    Array.prototype.indexOf.call(new Array(), 0);
    Array.prototype.indexOf.call(new Array(), 0, 0);
    Array.prototype.indexOf.call(null, 0);
    Array.prototype.indexOf.call(null, 0, 0);

Moreover, Array is a built-in function and a built-in constructor with a variable number of arguments. Thus, we synthesize the following six programs for Array:

    Array(); Array(0); Array(0, 0);
    new Array(); new Array(0); new Array(0, 0);

B. Target Selector

From the synthesized programs, Target Selector selects a target program to mutate to increase the semantics coverage of the program pool. Consider the Abstract Equality Comparison algorithm in Figure 1(a) again where the first step has the condition "If Type(x) is the same as Type(y)." Assume that the current pool has the following three programs:

    1 + 2; true == false; 0 == 1;

Because the latter two programs, which perform comparison, have values of the same type, the pool covers only the true branch of the condition in the algorithm. To cover its false branch, Target Selector selects any program that covers the true branch like true == false; and Program Mutator mutates it to 42 == false; for example. Then, since the mutated program covers the false branch, the pool is extended as follows:

    1 + 2; true == false; 0 == 1; 42 == false;

which now covers more steps in the algorithm. This process repeats until the semantics coverage converges.

C. Program Mutator

JEST increases the semantics coverage of the program pool by mutating programs with five mutation methods, chosen randomly.

1) Random Mutation: The first naïve method is to randomly select a statement, a declaration, or an expression in a given program and to replace it with a randomly selected one from a set of syntax trees generated by the non-recursive synthesizer. For example, it may mutate a program var x = 1 + 2; by replacing its random expression 1 with a random expression true producing var x = true + 2;.

2) Nearest Syntax Tree Mutation: The second method targets uncovered branches in abstract algorithms. When only one branch is covered by a program, it finds the nearest syntax tree in the program that reaches the branch in the algorithm, and replaces the nearest syntax tree with a random syntax tree derivable from the same syntax production. For example, consider the following JavaScript program:

    var x = "" + (1 == 2);

While it covers the true branch of the first step of Abstract Equality Comparison in Figure 1(a), because both operands are numbers, assume that no program in the program pool can cover its false branch. Then, the mutator targets this branch, finds its nearest syntax tree 1 == 2 in the program, and replaces it with a random syntax tree.

3) String Substitutions: We collect all string literals used in conditions of the algorithms in ES11 and use them for random expression substitutions. Because most string literals in the specification represent corner cases such as -0, Infinity, and NaN, they are necessary for mutation to increase the semantics coverage. For example, the semantics of the [[DefineOwnProperty]] internal method of array exotic objects depends on whether the value of its parameter P is "length" or not.
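The "length" case of [[DefineOwnProperty]] can be observed directly from JavaScript, which shows why a mutator needs that exact literal to reach the corresponding branch. The snippet below is only an illustration written for this section; the variable names are arbitrary and it is not taken from the JEST test suite.

```javascript
// Defining "length" on an array takes the array-specific branch of
// [[DefineOwnProperty]]: elements at or beyond the new length are deleted.
const truncated = [1, 2, 3];
Object.defineProperty(truncated, "length", { value: 1 });
console.log(truncated.length, 1 in truncated); // 1 false

// Any other string key, such as "size", is defined as an ordinary
// property and leaves the elements untouched.
const untouched = [1, 2, 3];
Object.defineProperty(untouched, "size", { value: 1 });
console.log(untouched.length, untouched.size); // 3 1
```

A randomly generated string is very unlikely to be exactly "length", so harvesting such literals from the specification is a cheap way to reach branches guarded by them.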
4) Object Substitutions: We also collect string literals and symbols used as arguments of object property access algorithms in ES11, randomly generate objects using them, and replace random expressions with the generated objects. Because some abstract algorithms in the specification access object properties using HasProperty, GetMethod, Get, and OrdinaryGetOwnProperty, objects with such properties are necessary for mutation to achieve high coverage. Thus, the mutator replaces a randomly selected expression in a program with a randomly generated object that has properties whose keys are from collected string literals and symbols.

5) Statement Insertion: To synthesize more complex programs, the mutator inserts random statements at the end of randomly selected blocks like top-level code and function bodies. We generate random statements using the non-recursive synthesizer with pre-defined special statements. The special statements are control diverters, which have high chances of changing execution paths, such as function calls, return, break, and throw statements. The mutator selects special statements with a higher probability than the statements randomly synthesized by the non-recursive synthesizer.

D. Assertion Injector

After generating JavaScript programs, Assertion Injector injects assertions to them using their final states as specified in ECMAScript. It first obtains the final state of a given program from the mechanized specification and injects seven kinds of assertions in the beginning of the program. To check the final state after executing all asynchronous jobs, we enclose assertions with setTimeout to wait 100 ms when a program uses asynchronous features such as Promise and async:

    ... /* a given program */
    setTimeout(() => { ... /* assertions */ }, 100)

1) Exceptions: JavaScript supports both internal exceptions like SyntaxError and TypeError and custom exceptions with the keyword throw. Note that catching such exceptions using the try-catch statement may change the program semantics. For example, the following does not throw any exception:

    var x; function x() {}

but the following:

    try { var x; function x() {} } catch (e) {}

throws SyntaxError because declarations of a variable and a function with the same name are not allowed in try-catch.

To resolve this problem, we exploit a comment in the first line of a program. If the program throws an internal exception, we tag its name in the comment. Otherwise, we tag // Throw for a custom exception and // Normal for normal termination. Using the tag in the comment, JEST checks the execution result of a program in each engine.

2) Aborts: The mechanized semantics of ECMAScript can abort due to unspecified cases. For example, consider the following JavaScript program:

    var x = 42; x++;

The postfix increment operator (++) increases the number value stored in the variable x. However, because of a typo in the Evaluation algorithm for such update expressions in ES11, the behavior of the program is not defined in ES11. To represent this situation in the conformance test, we tag Abort in the comment as follows:

    // Abort
    var x = 42; x++;

3) Variable Values: We inject assertions that compare the values of variables with expected values. To focus on variables introduced by tests, we do not check the values of pre-defined variables like built-in objects. For numbers, we distinguish -0 from +0 using division by zero because 1/-0 and 1/+0 produce negative and positive infinity values, respectively. The following example checks whether the value of x is 3:

    var x = 1 + 2;
    $assert.sameValue(x, 3);

4) Object Values: To check the equality of object values, we keep a representative path for each object. If the injector meets an object for the first time, it keeps the current path of the object as its representative path and injects assertions for the properties of the object. Otherwise, the injector adds assertions to compare the values of the objects with the current path and the representative path. In the following example:

    var x = {}, y = {}, z = { p: x, q: y };
    $assert.sameValue(z.p, x);
    $assert.sameValue(z.q, y);

because the injector meets two different new objects stored in x and y, it keeps the paths x and y. Then, the object stored in z is also a new object but its properties z.p and z.q store already visited object values. Thus, the injector inserts two assertions that check whether z.p and x have the same object value and z.q and y as well. To handle built-in objects, we store all the paths of built-in objects in advance.

5) Object Properties: Checking object properties involves checking four attributes for each property. We implement a helper $verifyProperty to check the attributes of each property for each object. For example, the following code checks the attributes of the property x.p:

    var x = { p: 42 };
    $verifyProperty(x, "p", {
      value: 42.0, writable: true,
      enumerable: true, configurable: true
    });

6) Property Keys: Since ECMAScript 2015 (ES6), the specification defines orders between property keys in objects. We check the order of property keys by Reflect.ownKeys, which takes an object and returns an array of the object's property keys. We implement a helper $assert.compareArray that takes two arrays and compares their lengths and contents. For example, the following program checks the property keys and their order of the object in x:

    var x = {[Symbol.match]: 0, p: 0, 3: 0, q: 0, 1: 0}
    $assert.compareArray(
      Reflect.ownKeys(x),
      ["1", "3", "p", "q", Symbol.match]
    );
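The helpers $assert.sameValue, $assert.compareArray, and $verifyProperty are named above but their definitions are not shown. The following is a minimal sketch of how such helpers could be written; the bodies, the internal function `isSameValue`, and the error messages are assumptions for illustration and may differ from the actual JEST harness.

```javascript
// Hypothetical helper implementations; only the helper names come from the text.
function isSameValue(a, b) {
  if (a === b) {
    // +0 === -0 is true, so compare 1/a and 1/b to tell the two zeros apart.
    return a !== 0 || 1 / a === 1 / b;
  }
  // NaN is the only value that is not equal to itself.
  return a !== a && b !== b;
}

const $assert = {
  sameValue(actual, expected) {
    if (!isSameValue(actual, expected)) {
      throw new Error(`expected ${String(expected)} but got ${String(actual)}`);
    }
  },
  // Compares lengths first, then contents element by element.
  compareArray(actual, expected) {
    if (actual.length !== expected.length) throw new Error("array lengths differ");
    actual.forEach((value, i) => $assert.sameValue(value, expected[i]));
  },
};

// Checks every attribute listed in `expected` against the own property descriptor.
function $verifyProperty(obj, key, expected) {
  const desc = Object.getOwnPropertyDescriptor(obj, key);
  if (desc === undefined) throw new Error(`missing property ${String(key)}`);
  for (const attr of Object.keys(expected)) {
    $assert.sameValue(desc[attr], expected[attr]);
  }
}

// The examples from this section pass with these definitions.
$assert.sameValue(1 + 2, 3);
$verifyProperty({ p: 42 }, "p", { value: 42.0, writable: true, enumerable: true, configurable: true });
$assert.compareArray(
  Reflect.ownKeys({ [Symbol.match]: 0, p: 0, 3: 0, q: 0, 1: 0 }),
  ["1", "3", "p", "q", Symbol.match]
);
```

The last call relies on the key order fixed since ES6: integer-like keys in ascending order, then string keys in insertion order, then symbols.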
7) Internal Methods and Slots: While internal methods and slots of JavaScript objects are generally inaccessible by users, the names in the following are accessible by indirect getters:

    Name             Indirect Getter
    [[Prototype]]    Object.getPrototypeOf(x)
    [[Extensible]]   Object.isExtensible(x)
    [[Call]]         typeof f === "function"
    [[Construct]]    Reflect.construct(function(){},[],x)

The internal slot [[Prototype]] represents the prototype object of an object, which is available by a built-in function Object.getPrototypeOf. The internal slot [[Extensible]] is also available by a built-in function Object.isExtensible. The internal methods [[Call]] and [[Construct]] represent whether a given object is a function and a constructor, respectively. Because the methods are not JavaScript values, we simply check their existence using helpers $assert.callable and $assert.constructable. For [[Call]], we use the typeof operator because it returns "function" if and only if a given value is an object with the [[Call]] method. For the [[Construct]] method, we use the Reflect.construct built-in function that checks the existence of the [[Construct]] method and invokes it. To avoid invoking [[Construct]] unintentionally, we call Reflect.construct with a dummy function function(){} as its first argument and a given object as its third argument. For example, the following code shows how the injector injects assertions for internal methods and slots:

    function f() {}
    $assert.sameValue(Object.getPrototypeOf(f),
                      Function.prototype);
    $assert.sameValue(Object.isExtensible(f), true);
    $assert.callable(f);
    $assert.constructable(f);

E. Bug Localizer

The bug detection and localization phase uses the execution results of given conformance tests on multiple JavaScript engines. If a small number of engines fail in running a specific conformance test, the engines may have bugs causing the test failure. If most engines fail for a test, the test may be incorrect, which implies a bug in the specification.

When we have a set of failed test cases that may contain bugs of an engine or a specification, we classify the test cases using their failure messages and rank the possibly buggy program elements to localize the bug. We use Spectrum Based Fault Localization (SBFL) [18], which is a ranking technique based on the likelihood of being faulty for each program element. We use the following formula called ER1b, which is one of the best SBFL formulae theoretically analyzed by Xie et al. [20]:

$$ER1_b = n_{ef} - \frac{n_{ep}}{n_{ep} + n_{np} + 1}$$

where n_ef, n_ep, n_nf, and n_np represent the number of test cases; subscripts e and n respectively denote whether a test case touches a relevant program element or not, and subscripts f and p respectively denote whether the test case failed or passed.

We use abstract algorithms of ECMAScript as program elements used for SBFL. To improve the localization accuracy, we use method-level aggregation [21]. It first calculates SBFL scores for algorithm steps and aggregates them up to algorithm-level using the highest score among those from steps of each algorithm.

IV. EVALUATION

To evaluate JEST, which performs N+1-version differential testing of JavaScript engines and their specification, we applied the tool to four JavaScript engines that fully support modern JavaScript features and the latest specification, ECMAScript 2020 (ES11, 2020). Our experiments use the following four JavaScript engines, all of which support ES11:
• V8 (v8.3) 1: An open-source high-performance engine for JavaScript and WebAssembly developed by Google [14]
• GraalJS (v20.1.0) 2: A JavaScript implementation built on GraalVM [15], which is a Java Virtual Machine (JVM) based on HotSpot/OpenJDK developed by Oracle
• QuickJS (2020-04-12) 3: A small and embedded JavaScript engine developed by Fabrice Bellard and Charlie Gordon [16]
• Moddable XS (v10.3.0) 4: A JavaScript engine at the center of the Moddable SDK [17], which is a combination of development tools and runtime software to create applications for micro-controllers

To extract a mechanized specification from ECMAScript, we utilize the tool JISET, which is a JavaScript IR-based semantics extraction toolchain, to automatically generate a JavaScript interpreter from ECMAScript. To focus on the core semantics of JavaScript, we consider only the semantics of strict mode JavaScript code that passes syntax checking including the EarlyError rules. To filter out JavaScript code that is not strict or fails syntax checking, we utilize the syntax checker of the most reliable JavaScript engine, V8. We performed our experiments on a machine equipped with 4.0GHz Intel(R) Core(TM) i7-6700k and 32GB of RAM (Samsung DDR4 2133MHz 8GB*4). We evaluated JEST with the following four research questions:
• RQ1 (Coverage of Generated Tests) Is the semantics coverage of the tests generated by JEST comparable to that of Test262, the official conformance test suite for ECMAScript, which is manually written?
• RQ2 (Accuracy of Bug Localization) Does JEST localize bug locations accurately?
• RQ3 (Bug Detection in JavaScript Engines) How many bugs of four JavaScript engines does JEST detect?
• RQ4 (Bug Detection in ECMAScript) How many bugs of ES11 does JEST detect?

1 https://v8.dev/
2 https://github.com/graalvm/graaljs#current-status
4 https://blog.moddable.com/blog/xs10/
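As background for RQ2, the ER1b ranking with method-level aggregation described for Bug Localizer in Section III-E can be sketched as follows. The data layout (a `covered` set of step identifiers of the form "Algorithm:step") and the function names are assumptions made for this illustration, not the JEST implementation.

```javascript
// ER1b score of one program element (here, one algorithm step).
// n_nf does not appear in ER1b, so it is not needed as an input.
function er1b({ nef, nep, nnp }) {
  return nef - nep / (nep + nnp + 1);
}

// tests: [{ passed: boolean, covered: Set<string> }], where each covered entry
// is a hypothetical step identifier such as "AbstractEquality:1".
function rankAlgorithms(tests) {
  const steps = new Set(tests.flatMap((t) => [...t.covered]));
  const best = new Map();
  for (const step of steps) {
    let nef = 0, nep = 0, nnp = 0;
    for (const t of tests) {
      const touched = t.covered.has(step);
      if (touched && !t.passed) nef++;
      else if (touched && t.passed) nep++;
      else if (!touched && t.passed) nnp++;
    }
    const score = er1b({ nef, nep, nnp });
    const algorithm = step.split(":")[0];
    // Method-level aggregation: keep the highest step score per algorithm.
    if (!best.has(algorithm) || score > best.get(algorithm)) {
      best.set(algorithm, score);
    }
  }
  // Most suspicious algorithm first.
  return [...best.entries()].sort((x, y) => y[1] - x[1]);
}

// Toy data: one failing test touches A and B; passing tests touch B and C.
const ranking = rankAlgorithms([
  { passed: false, covered: new Set(["A:1", "B:1"]) },
  { passed: true, covered: new Set(["B:1"]) },
  { passed: true, covered: new Set(["C:1"]) },
]);
console.log(ranking.map(([algorithm]) => algorithm)); // [ 'A', 'B', 'C' ]
```

In the toy data, A is touched only by the failing test and scores 1, B is also touched by a passing test and scores 1 - 1/3, and C is touched only by a passing test and scores -1/3, so A is ranked first.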
(a) Statement coverage    (b) Branch coverage
Fig. 4: The semantics coverage changes during the test generation phase

TABLE II: The number of engine bugs detected by JEST

    Engines       Exc  Abort  Var  Obj  Desc  Key  In  Total
    V8              0      0    0    0     0    2   0      2
    GraalJS         6      0    0    0     2    8   0     16
    QuickJS         3      0    1    0     0    2   0      6
    Moddable XS    12      0    0    0     3    5   0     20
    Total          21      0    1    0     5   17   0     44

semantics of ES11. Among 71 bugs, we excluded 7 syntax bugs and localized only 64 semantics bugs. Figure 5 shows the ranks of algorithms that caused the semantics bugs. The average rank is 3.19, and 82.8% of the algorithms causing the bugs are ranked less than 5, 93.8% less than 10, and 98.4% less than 15. Note that the location of one bug is ranked 21 because of the limitation of SBFL; its localization accuracy becomes low for a small number of failed test cases.

C. Bug Detection in JavaScript Engines

From four JavaScript engines, JEST detected 44 bugs: 2 from V8, 16 from GraalJS, 6 from QuickJS, and 20 from Moddable XS. Table II presents how many bugs each kind of assertion detected in each engine. We injected seven kinds of assertions: exceptions (Exc), aborts (Abort), variable values (Var), object values (Obj), object properties (Desc), property keys (Key), and internal methods and slots (In). The effectiveness of bug finding is different for different assertions. The Exc and Key assertions detected engine bugs the most; out of 44 bugs, the former detected 21 bugs and the latter detected 17 bugs. Desc and Var detected 5 and 1 bugs, respectively, but the other assertions did not detect any engine bugs.

The most reliable JavaScript engine is V8 because JEST found only two bugs and the bugs are due to specification bugs in ES11. Because V8 strictly follows the semantics of functions described in ES11, it also implemented wrong semantics that led to ES11-1 and ES11-2 listed in Table III. The V8 team confirmed the bugs and fixed them.

We detected 16 engine bugs in GraalJS and one of them caused an engine crash. When we apply the prefix increment operator for undefined as ++undefined, GraalJS throws java.lang.IllegalStateException. Because it crashes the engine, developers cannot even catch the exception as follows:

because the tests generated by JEST detected many semantics bugs that were not detected by other conformance tests: "Right now, we are running Test262 and the V8 and Nashorn unit test suites in our CI for every change, it might make sense to add your suite as well."

In QuickJS, JEST detected 6 engine bugs, most of which are due to corner cases of the function semantics. For example, the following code should throw a ReferenceError exception:

    function f (... { x = x }) { return x; } f()

because the variable x is not yet initialized when it tries to read the right-hand side of x = x. However, since QuickJS assumes that the initial value of x is undefined, the function call f() returns undefined. The QuickJS team confirmed our bug reports and has been fixing the bugs.

JEST found the most bugs in Moddable XS; it detected 20 bugs for various language features such as optional chains, Number.prototype.toString, iterators of Map and Set, and complex assignment patterns. Among them, optional chains are newly introduced in ES11, which shows that our approach is applicable to finding bugs in new language features. We reported all the bugs found, and the Moddable XS team has been fixing them. They showed interest in using our test suite: "As you know, it is difficult to verify changes because the language specification is so big. Test262, as great a resource as it is, is not definitive."

D. Bug Detection in ECMAScript

From the latest ECMAScript ES11, JEST detected 27 specification bugs. Table III summarizes the bugs categorized by their root causes. Among them, five categories (ES11-1 to ES11-5) were already reported and fixed in the current draft of the next ECMAScript but ES11-6 was never reported before. We reported it to TC39; they confirmed it and will fix it in the next version, ECMAScript 2021 (ES12).

ES11-1 contains 12 bugs; it is due to a wrong order between property keys of all kinds of function values such as async and generator functions, arrow functions, and classes. For example, if we define a class declaration with a name A (class A {}), three properties are defined in the function
stored in the variable A: length with a number value 0,
try { ++undefined; } catch(e) { }
prototype with an object, and name with a string "A". The
The GraalJS team has been fixing the bugs we reported and problem is the different order of their keys because of the
asked whether we plan to publish the conformance test suite, wrong order of their creation. From ECMAScript 2015 (ES6),
the order between property keys is no longer implementation-dependent but it is related to the creation order of properties. While the order of property keys in the class A should be [length, prototype, name] according to the semantics of ES11, the order is [length, name, prototype] in three engines except V8. We found that it was already reported as a specification bug; we reported it to V8 and they fixed it. This bug was created on February 7, 2019 and TC39 fixed it on April 11, 2020; the bug lasted for 429 days.

ES11-2 contains 8 bugs that are due to the missing property name of anonymous functions. Until ES5.1, anonymous functions, such as an identity arrow function x => x, had their own property name with an empty string "". While ES6 removed the name property from anonymous functions, three engines except V8 still create the name property in anonymous functions. We also found that it was reported as a specification bug and reported it to V8, and it will be fixed in V8.

The bug in ES11-3 comes from a confusion between the terms “iterator object” and “iterator record”. The algorithm ForIn/OfHeadEvaluation should return an iterator record, which is an implicit record containing only internal slots. However, in ES11, it returns an iterator object, which is a JavaScript object with some properties related to iteration. It causes a TypeError exception when executing the code for(var x in {}); according to ES11 but all engines execute the code normally without any exceptions. This bug was resolved by TC39 on April 30, 2020.

ES11-4 contains four bugs caused by a typo in a variable name in the semantics of four different update expressions: x++, x--, ++x, and --x. In each Evaluation of the four kinds of UpdateExpression, there exists a typo oldvalue in step 3 instead of oldValue declared in step 2. JEST could not execute the code x++ using the semantics of ES11 because of the typo. For this case, we directly pass the code to Bug Localizer to test whether the code is executable in real-world engines and to localize the bug. Of course, four JavaScript engines executed the update expressions without any issues and this bug was resolved by TC39 on April 23, 2020.

Two bugs in ES11-5 and ES11-6 are caused by missing handling of abrupt completions in abstract equality comparison and property definitions of object literals, respectively. The bug in ES11-5 was confirmed by TC39 and was fixed on April 28, 2020. The bug in ES11-6 was previously unreported; we reported it and received a confirmation from TC39 on August 18, 2020. The bug will be fixed in the next version, ES12.

V. RELATED WORK

Our technique is related to three research fields: differential testing, fuzzing, and fault localization.

Differential Testing: Differential testing [10] utilizes multiple implementations as cross-referencing oracles to find semantics bugs. Researchers applied this technique to various application domains such as Java Virtual Machine (JVM) implementations [22], SSL/TLS certificate validation logic [23]–[25], web applications [26], and binary lifters [27]. Moreover, NEZHA [23] introduces a guided differential testing tool with the concept of δ-diversity to efficiently find semantics bugs. However, they have a fundamental limitation that they cannot test specifications; they use only cross-referencing oracles and target potential bugs in implementations. Our N+1-version differential testing extends the idea of differential testing with not only N different implementations but also a mechanized specification to test both of them. In addition, our approach automatically generates conformance tests directly from the specification.

Fuzzing: Fuzzing is a software testing technique for detecting security vulnerabilities by generating [28]–[30] or mutating [31]–[33] test inputs. For JavaScript [34] engines, Godefroid et al. [35] presented white-box fuzzing using the JavaScript grammar, Han et al. [36] presented CodeAlchemist that generates JavaScript code snippets based on semantics-aware assembly, Wang et al. [37] presented Superion using grammar-aware greybox fuzzing, Park et al. [38] presented DIE using aspect-preserving mutation, and Lee et al. [39] presented Montage using neural network language models (NNLMs). While they focus on finding security vulnerabilities rather than semantics bugs, our N+1-version differential testing focuses on finding semantics bugs by comparing multiple implementations with the mechanized specification, which was automatically extracted from ECMAScript by JISET. Note that JEST can localize not only specification bugs in ECMAScript but also bugs in JavaScript engines indirectly using the bug locations in ECMAScript.

Fault Localization: To localize detected bugs in ECMAScript, we used Spectrum Based Fault Localization (SBFL) [18], which is a ranking technique based on the likelihood of being faulty for each program element. Tarantula [40], [41] was the first tool that supports SBFL with a simple formula and researchers have developed many formulae [42]–[45] to increase the accuracy of bug localization. Sohn and Yoo [21] introduced a novel approach for fault localization using code and change metrics via learning of SBFL formulae. While we utilize a specific formula ER1b introduced by Xie et al. [20], we believe that it is possible to improve the accuracy of bug localization by using more advanced SBFL techniques.

VI. CONCLUSION

The development of modern programming languages follows the continuous integration (CI) and continuous deployment (CD) approach to instantly support fast-changing user demands. Such continuous development makes it difficult to find semantics bugs in both the language specification and its various implementations. To alleviate this problem, we present N+1-version differential testing, which is the first technique to test both implementations and their specification in tandem. We actualized our approach for the JavaScript programming language via JEST, using four modern JavaScript engines and the latest version of ECMAScript (ES11, 2020). It automatically generated 1,700 JavaScript programs with 97.78% of syntax coverage and 87.70% of semantics coverage on ES11. JEST injected assertions into the generated JavaScript programs to convert them into conformance tests. We executed the generated
conformance tests on four engines that support ES11: V8, GraalJS, QuickJS, and Moddable XS. Using the execution results, we found 44 engine bugs (16 for GraalJS, 6 for QuickJS, 20 for Moddable XS, and 2 for V8) and 27 specification bugs. All the bugs were confirmed by TC39, the committee of ECMAScript, and the corresponding engine teams, and they will be fixed in the specification and the engines. We believe that JEST takes the first step towards co-evolution of software specifications, tests, and their implementations for CI/CD.

ACKNOWLEDGEMENTS

This work was supported by National Research Foundation of Korea (NRF) (Grants NRF-2017R1A2B3012020 and 2017M3C4A7068177).

REFERENCES

[1] (2012) Mark Zuckerberg’s Letter to Investors: ’The Hacker Way’. [Online]. Available: https://www.wired.com/2012/02/zuck-letter/
[2] (2020) What is CI/CD? Continuous integration and continuous delivery explained. [Online]. Available: https://www.infoworld.com/article/3271126/what-is-cicd-continuous-integration-and-continuous-delivery-explained.html
[3] (2020) Node.js - A JavaScript runtime built on Chrome’s V8 JavaScript engine. [Online]. Available: https://nodejs.org/
[4] (2020) Moddable - Tools to create open IoT products using standard JavaScript on low cost microcontrollers. [Online]. Available: https://www.moddable.com/
[5] (2020) Espruino - JavaScript for Microcontrollers. [Online]. Available: https://www.espruino.com/
[6] (2020) Tessel 2 - a robust IoT and robotics development platform. [Online]. Available: https://tessel.io/
[7] (2015) Standard ECMA-262 6th Edition ECMAScript 2015 Language Specification. [Online]. Available: https://ecma-international.org/ecma-262/6.0/
[8] Solidity. (2019) Official Solidity documentation. [Online]. Available: https://solidity.readthedocs.io/en/v0.5.7/
[9] S. Hwang and S. Ryu, “Gap between theory and practice: An empirical study of security patches in Solidity,” in Proceedings of the ACM/IEEE International Conference on Software Engineering, 2020.
[10] W. M. McKeeman, “Differential testing for software,” Digital Technical Journal, vol. 10, no. 1, pp. 100–107, 1998.
[11] J. Park, J. Park, S. An, and S. Ryu, “JISET: JavaScript IR-based semantics extraction toolchain,” in Proceedings of ACM International Conference on Automated Software Engineering, 2020.
[12] H. Nguyen, “Automatic extraction of x86 formal semantics from its natural language description,” Information Science, 2018.
[13] A. V. Vu and M. Ogawa, “Formal semantics extraction from natural language specifications for ARM,” in International Symposium on Formal Methods. Springer, 2019, pp. 465–483.
[14] (2020) Google’s open source high-performance JavaScript and WebAssembly engine, written in C++. [Online]. Available: https://v8.dev/
[15] (2020) A high performance implementation of the JavaScript programming language. Built on the GraalVM by Oracle Labs. [Online]. Available: https://github.com/graalvm/graaljs
[16] (2020) A small and embeddable Javascript engine by Fabrice Bellard and Charlie Gordon. [Online]. Available: https://bellard.org/quickjs/
[17] (2020) The JavaScript engine at the center of the Moddable SDK. [Online]. Available: https://github.com/Moddable-OpenSource/moddable
[18] W. E. Wong, R. Gao, Y. Li, R. Abreu, and F. Wotawa, “A survey on software fault localization,” IEEE Transactions on Software Engineering, vol. 42, no. 8, pp. 707–740, 2016.
[19] B. McKenzie, “Generating strings at random from a context free grammar,” Department of Computer Science, University of Canterbury, Tech. Rep. TR-COSC 10/97, 1997.
[20] X. Xie, T. Y. Chen, F.-C. Kuo, and B. Xu, “A theoretical analysis of the risk evaluation formulas for spectrum-based fault localization,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 22, no. 4, pp. 1–40, 2013.
[21] J. Sohn and S. Yoo, “FLUCCS: Using code and change metrics to improve fault localization,” in Proceedings of the 26th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM, 2017, pp. 273–283.
[22] Y. Chen, T. Su, C. Sun, Z. Su, and J. Zhao, “Coverage-directed differential testing of JVM implementations,” in Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, 2016, pp. 85–99.
[23] T. Petsios, A. Tang, S. Stolfo, A. D. Keromytis, and S. Jana, “NEZHA: Efficient domain-independent differential testing,” in Proceedings of IEEE Symposium on Security and Privacy, 2017, pp. 615–632.
[24] Y. Chen and Z. Su, “Guided differential testing of certificate validation in SSL/TLS implementations,” in Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 2015, pp. 793–804.
[25] M. Georgiev, S. Iyengar, S. Jana, R. Anubhai, D. Boneh, and V. Shmatikov, “The most dangerous code in the world: validating SSL certificates in non-browser software,” in Proceedings of the 2012 ACM Conference on Computer and Communications Security, 2012, pp. 38–49.
[26] P. Chapman and D. Evans, “Automated black-box detection of side-channel vulnerabilities in web applications,” in Proceedings of the 18th ACM Conference on Computer and Communications Security, 2011, pp. 263–274.
[27] S. Kim, M. Faerevaag, M. Jung, S. Jung, D. Oh, J. Lee, and S. K. Cha, “Testing intermediate representations for binary analysis,” in Proceedings of ACM International Conference on Automated Software Engineering, 2017, pp. 353–364.
[28] H. Han and S. K. Cha, “IMF: Inferred model-based fuzzer,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2345–2358.
[29] C. Holler, K. Herzig, and A. Zeller, “Fuzzing with code fragments,” in 21st USENIX Security Symposium (USENIX Security 12), 2012, pp. 445–458.
[30] X. Yang, Y. Chen, E. Eide, and J. Regehr, “Finding and understanding bugs in C compilers,” in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2011, pp. 283–294.
[31] S. K. Cha, M. Woo, and D. Brumley, “Program-adaptive mutational fuzzing,” in 2015 IEEE Symposium on Security and Privacy. IEEE, 2015, pp. 725–741.
[32] A. Rebert, S. K. Cha, T. Avgerinos, J. Foote, D. Warren, G. Grieco, and D. Brumley, “Optimizing seed selection for fuzzing,” in 23rd USENIX Security Symposium (USENIX Security 14), 2014, pp. 861–875.
[33] M. Woo, S. K. Cha, S. Gottlieb, and D. Brumley, “Scheduling black-box mutational fuzzing,” in Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, 2013, pp. 511–522.
[34] A. Wirfs-Brock and B. Eich, “JavaScript: the first 20 years,” in Proceedings of the ACM on Programming Languages, vol. 4, 2020, pp. 1–189.
[35] P. Godefroid, A. Kiezun, and M. Y. Levin, “Grammar-based whitebox fuzzing,” in Proceedings of the ACM Conference on Programming Language Design and Implementation, 2008, pp. 206–215.
[36] H. Han, D. Oh, and S. K. Cha, “CodeAlchemist: Semantics-aware code generation to find vulnerabilities in JavaScript engines,” in Proceedings of the Network and Distributed System Security Symposium, 2019.
[37] J. Wang, B. Chen, L. Wei, and Y. Liu, “Superion: Grammar-aware greybox fuzzing,” in 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). IEEE, 2019, pp. 724–735.
[38] S. Park, W. Xu, I. Yun, D. Jang, and T. Kim, “Fuzzing JavaScript engines with aspect-preserving mutation,” in 2020 IEEE Symposium on Security and Privacy (SP). IEEE, 2020, pp. 1629–1642.
[39] S. Lee, H. Han, S. K. Cha, and S. Son, “Montage: A neural network language model-guided JavaScript engine fuzzer,” 2020.
[40] J. A. Jones, M. J. Harrold, and J. Stasko, “Visualization of test information to assist fault localization,” in Proceedings of the 24th International Conference on Software Engineering. ICSE 2002. IEEE, 2002, pp. 467–477.
[41] J. A. Jones, M. J. Harrold, and J. T. Stasko, “Visualization for fault localization,” in Proceedings of ICSE 2001 Workshop on Software Visualization. Citeseer, 2001.
[42] V. Dallmeier, C. Lindig, and A. Zeller, “Lightweight bug localization with AMPLE,” in Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging, 2005, pp. 99–104.
[43] T. Janssen, R. Abreu, and A. J. van Gemund, “Zoltar: A toolset for automatic fault localization,” in 2009 IEEE/ACM International Conference on Automated Software Engineering. IEEE, 2009, pp. 662–664.
[44] L. Naish, H. J. Lee, and K. Ramamohanarao, “A model for spectra-based software diagnosis,” ACM Transactions on Software Engineering and Methodology (TOSEM), vol. 20, no. 3, pp. 1–32, 2011.
[45] W. E. Wong, Y. Qi, L. Zhao, and K.-Y. Cai, “Effective fault localization using code coverage,” in 31st Annual International Computer Software and Applications Conference (COMPSAC 2007), vol. 1. IEEE, 2007, pp. 449–456.
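
As a rough illustration of the comparison step summarized in the conclusion, the following sketch classifies one conformance test by comparing the outcome expected by the specification with the outcomes observed on N engines. It is not JEST's implementation: the function name classify, the outcome strings, and the simple majority threshold are assumptions made for this example, and the outcomes listed for engines other than QuickJS in the second case are assumed rather than taken from the evaluation.

```javascript
// Hypothetical sketch of an N+1-version comparison for a single test.
// specOutcome: outcome expected by the mechanized specification.
// engineOutcomes: map from engine name to the outcome observed on that engine.
function classify(specOutcome, engineOutcomes) {
  const engines = Object.keys(engineOutcomes);
  const failing = engines.filter((name) => engineOutcomes[name] !== specOutcome);
  if (failing.length === 0) {
    return { verdict: "pass", suspects: [] };
  }
  // Assumption: when more than half of the engines disagree with the
  // specification, the specification itself is the primary suspect.
  if (failing.length > engines.length / 2) {
    return { verdict: "specification", suspects: ["specification"] };
  }
  // Otherwise the disagreeing engines are the suspects.
  return { verdict: "engine", suspects: failing };
}

// ES11-3: ES11 prescribes a TypeError for `for(var x in {});`,
// but all four engines complete normally.
const forInCase = classify("TypeError", {
  V8: "normal",
  GraalJS: "normal",
  QuickJS: "normal",
  "Moddable XS": "normal",
});

// QuickJS bug: `function f (... { x = x }) { return x; } f()` should throw
// a ReferenceError; the other engines' outcomes are assumed here.
const restParamCase = classify("ReferenceError", {
  V8: "ReferenceError",
  GraalJS: "ReferenceError",
  QuickJS: "undefined",
  "Moddable XS": "ReferenceError",
});

console.log(forInCase.verdict);                 // specification
console.log(restParamCase.suspects.join(", ")); // QuickJS
```

With this threshold, the ES11-1 situation (three of four engines disagreeing with ES11) would also be attributed to the specification, which matches the outcome reported for that category; a real tool would still need manual confirmation, since a majority of engines can share the same bug.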