Cloud 9
CHAPTER 1
INTRODUCTION
Software testing is resource-hungry, time-consuming, labor-intensive, and prone to
human omission and error. Despite massive investments in quality assurance, serious code
defects are routinely discovered after software has been released [17], and fixing them at so late
a stage carries substantial cost [16].
Cloud9 is a cloud-based testing service that promises to make high-quality testing fast,
cheap, and practical. Cloud9 runs on compute utilities like Amazon EC2 [1], and we envision the
following three use cases: First, developers can upload their software to Cloud9 and test it
swiftly, as part of their development cycle. Second, end users can upload recently downloaded
programs or patches and test them before installing, with no upfront cost. Third, Cloud9 can
function as a quality certification service, akin to Underwriters Labs [20], by publishing official
coverage results for tested applications. In an ideal future, software companies would be required
to subject their software to quality validation on such a service, akin to mandatory crash testing
of vehicles. In the absence of such certification, software companies could be held liable for
damages resulting from bugs. For a software testing service to be viable, it must aim for maximal
levels of automation. This means the service must explore as many of the software’s execution
paths as possible without requiring a human to explicitly write test scripts. But such automation
can suffer from the tension between soundness and completeness—e.g., static analysis can be
complete on large code bases, but typically has a large number of false positives (i.e., is
unsound), while model checking is sound, but takes too long to achieve practical completeness
on real, large code bases. Of course, some level of assistance is inherently necessary to specify
the difference between correct and wrong behavior, but driving the program down execution
paths should not require human effort.
The main obstacles to scaling symbolic execution, the technique underlying such automated path exploration, are memory consumption and CPU-intensive constraint solving, both exponential in program size.
On a present day computer, it is only feasible to test programs with a few thousand lines of code;
for larger programs, typically only the shorter paths can be explored. Thus, symbolic execution is
virtually unheard of in the general software industry, because real software often has millions of
lines of code, rendering symbolic execution infeasible. Cloud9 is the first parallel symbolic
execution engine to run on large shared-nothing clusters of computers, thus harnessing their
aggregate memory and CPU resources. While parallelizing symbolic execution is a natural way
to improve the technique’s scalability, doing so in a cluster presents significant research
challenges: First, balancing execution workload among nodes becomes a complex multi-
dimensional optimization problem with several unknown inputs. Second, global coordination can
only be done infrequently, so new search strategies must be devised for exploring a program’s
paths in parallel.
CHAPTER 2
1. First, Cloud9 offers a cost-effective, flexible way to run massive test jobs with no upfront
cost. Unlike owning a private cluster, Cloud9 allows necessary machines to be
commissioned only when needed for testing, in a number suitable to the complexity of
the testing task. If a team required, e.g., 1,000 nodes for one hour every fortnight, the
corresponding yearly budget could be as low as $2,500 on EC2. This is orders of
magnitude less than the cost of acquiring and operating a private cluster of the same size.
2. Second, an automated-test service reduces the learning curve associated with test
frameworks. A standard Web service API can hide the complexity of operating an
automated test infrastructure, thus encouraging developers to use it more frequently and,
especially for new hires, to adopt thorough testing practices early on.
3. Third, running test infrastructure as a service offers high flexibility in resource allocation.
Whereas one would normally have to reserve a test cluster ahead of time, a cloud-based
service can provision resources on-demand, corresponding to the complexity of the
testing task at hand (e.g., depending on program size). It can also elastically recruit more
resources during compute-intensive phases of the tests and release them during the other
phases.
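The cost estimate in point 1 above can be checked with simple arithmetic. The sketch below assumes the $0.10/node-hour EC2 rate quoted later in the text and 26 fortnightly runs per year; it yields roughly $2,600, consistent with the order of magnitude of the figure cited above.

```python
# Back-of-the-envelope yearly cost for the fortnightly test job in point 1.
# Assumes the $0.10/node-hour EC2 rental rate mentioned later in this paper.
NODES = 1_000          # cluster size commissioned per test run
HOURS_PER_RUN = 1      # duration of each run
RUNS_PER_YEAR = 26     # one run every fortnight
RATE = 0.10            # $ per node-hour

yearly_cost = NODES * HOURS_PER_RUN * RUNS_PER_YEAR * RATE
print(f"${yearly_cost:,.0f}")  # on the order of the $2,500/year figure above
```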
Cloud9 returns to the user a set of automatically discovered input tuples that trigger
conditions specified in the test goal, together with statistical information. For example, if the test
goal was 90% line coverage, Cloud9 would produce a test suite (i.e., a set of program input
tuples, or test cases) that, in the aggregate, exercises 90% of the uploaded program.
Alternatively, the goal may be to test for crashes, in which case Cloud9 produces a set of
pathological input tuples that can be used to crash the program, prioritized based on severity.
Each such input tuple can be accompanied by a corresponding core dump and stack trace, to
speed up debugging. Cloud9 does not require any special software on the user’s side—the input
tuples serve as the most eloquent evidence of the discovered bugs.
Upon receiving the results, users may be charged for the service based on previously
agreed terms. It is essential that the pricing model capture the true value offered by Cloud9.
While compute clouds today adopt a rental model (e.g., EC2 nodes cost $0.10/hour/node), a
Cloud9 user does not derive value proportional to this cost. We favor a model in which users are
charged according to their test goal specification. For example, if the goal is a certain level of
coverage, then the user is charged a telescoping amount $x for each percentage point of
coverage. If the goal is to find crashes, then a charge of $x for each crash-inducing defect is
reasonable. In both cases, x can be proportional to program size. A good pricing model
encourages frequent use of the service, thus increasing the aggregate quality of the software on
the market.
Finally, a viable testing service must address issues related to confidentiality of both
uploaded material and corresponding test results, as well as multi-tenancy. There are also
opportunities for amortizing costs across customers, e.g., by reusing test results for frequently
used libraries and frameworks, like libc, STL, log4j, etc.
CHAPTER 3
Symbolic execution [12] offers great promise as a technique for automated testing [10,
15, 4], as it can find bugs without human assistance. Instead of running the program with regular
inputs, a symbolic execution engine executes a program with “symbolic” inputs that are
unconstrained; e.g., an integer input x is given as its value a symbol α that can take on any integer
value. When the program encounters a branch that depends on x, program state is forked to
produce two parallel executions, one following the then branch and another following the else-
branch. The symbolic values are constrained in the two clones so as to make the branch
condition evaluate to true (e.g., α<0) and false (e.g., α≥0), respectively. Execution recursively splits
into sub-executions at each relevant branch, turning an otherwise linear execution into an
execution tree (Fig. 3.1).
Symbolic execution, then, consists of the systematic exploration of this execution tree.
Each inner node is a branching decision, and each leaf is a program state that contains its own
address space, program counter, and set of constraints on program variables. When an execution
encounters a testing goal (e.g., a bug), the constraints collected from the root to the goal leaf can
be solved to produce concrete program inputs that exercise the path to the bug. Thus, symbolic
execution is substantially more efficient than exhaustive input-based testing (it analyzes the
behavior of code for entire classes of inputs at a time, without having to try each one out), and
equally complete.
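The forking-and-solving process described above can be illustrated with a toy sketch. The program under test is reduced to a list of branch predicates over a single symbolic integer input, and the "constraint solver" is a brute-force scan over a small range; both are illustrative stand-ins, not Cloud9's or Klee's actual machinery.

```python
# Toy symbolic executor: fork at every branch, collect path constraints,
# then solve each path's constraints to obtain a concrete test input.
def symbolic_execute(branches, constraints=None):
    """Return all leaves of the execution tree as lists of (predicate, outcome)."""
    constraints = constraints or []
    if not branches:
        return [constraints]                    # a leaf: one complete path
    cond, rest = branches[0], branches[1:]
    then_paths = symbolic_execute(rest, constraints + [(cond, True)])
    else_paths = symbolic_execute(rest, constraints + [(cond, False)])
    return then_paths + else_paths              # state forks into two clones

def solve(constraints, lo=-100, hi=100):
    """Brute-force stand-in for a constraint solver over a small integer range."""
    for alpha in range(lo, hi):
        if all(cond(alpha) == expected for cond, expected in constraints):
            return alpha                        # concrete input exercising the path
    return None                                 # infeasible within scanned range

# A program with two branches on symbolic input α: `if α < 0` and `if α % 2 == 0`.
paths = symbolic_execute([lambda x: x < 0, lambda x: x % 2 == 0])
inputs = [solve(p) for p in paths]              # one test case per explored path
print(len(paths), inputs)
```

Two branches yield four paths, and solving each path's constraints produces one concrete input per path, covering every branch outcome without enumerating all integers.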
3.1 Challenges
1. The path exploration work must be distributed among worker nodes without
knowing how much work each portion of the execution tree entails. The size of sub-
trees cannot be known a priori: determining the propagation of symbolic inputs to
other program variables requires executing the program first. It is precisely this
propagation that determines which branch instructions will create new execution
states, i.e., nodes in the execution tree. As it turns out, execution trees are highly
unbalanced, and statically finding a balanced partitioning of an unexpanded
execution tree reduces to the halting problem. In addition to sub-tree size, another
unknown is how much memory and CPU will be required for a given state—the
amount of work for a sub-tree is the sum of all nodes’ work. Thus, work distribution
requires (as we will see later) a dynamic load balancing technique.
2. Avoiding redundant exploration across workers is also challenging: to check whether a state has already been visited elsewhere, a distributed hashing data structure would need to be implemented, which requires special effort and also incurs some performance penalties.
In general, the methods used so far in parallel model checkers [19, 3, 13, 2, 11] do not
scale to shared-nothing clusters. They also often rely on a priori partitioning of a finite state space.
In a cloud setting, running parallel symbolic execution further requires coping with frequent
fluctuation in resource quality, availability, and cost. Machines have variable performance
characteristics, their network proximity to each other is unpredictable, and failures are frequent.
A system like Cloud9 must therefore cope with these problems in addition to the fundamental
challenges of parallel symbolic execution.
A smart exploration strategy helps find the paths leading to the requested goal sooner.
This is particularly relevant for symbolic execution trees of infinite size. The searcher can choose
any node on the unexplored horizon of the execution tree, not just the immediate descendants of
the current node.
The overall exploration is global, while Cloud9 searchers have visibility only into the
execution trees assigned to their particular workers. Thus, worker-level strategies must be
coordinated—a tightly coordinated strategy could achieve as efficient an exploration (i.e., with
as little redundant work) as a single-node symbolic execution engine. It is also possible to run
multiple instances of the runtime and searcher on the same physical machine, in which case the
strategies of the co-located searchers can see all sub-trees on that machine. But tight coupling
limits the ability of workers to function independently of each other, and would thus hurt
scalability.
In order to steer path exploration toward the global goal, Cloud9 employs several
techniques: Two-phased load balancing (§3.3) starts with an initial static split of the execution
tree and adjusts it dynamically, as exploration progresses. Replacing a single strategy with a
portfolio of strategies (§3.4) not only compensates for the limited local visibility, but can also
better exploit Cloud9’s parallelism. Finally, we employ techniques for reducing redundancy,
handling worker failures, and coping with heterogeneity (§3.5).
When Cloud9 starts, there is no load information, so it statically partitions the search
space (execution tree) among workers. Figure 3.3 illustrates an initial choice for a two-worker
Cloud9: one branch of the first branch instruction in the program is explored by worker W1 and
the other branch by worker W2.
The search space must be repartitioned on the fly, because the execution tree can become
highly imbalanced. Consider the execution tree in Figure 3.1: calling foo(α) could execute
substantially more branch instructions that depend on α than bar(α), thereby causing more new
states to be created in the corresponding sub-tree. Figure 3b illustrates one of the simplest cases
of work imbalance: worker W2 finishes exploring its sub-tree before W1, so W1 delegates to W2
the expansion of sub-tree S4.
More generally, the load balancer declares a load imbalance when the most loaded
worker W has at least x times the load of the least loaded worker w. We obtained good results by
using x = 10, which helps prevent the high overheads associated with frequent load balancing. At
this point, the load balancer instructs W and w to negotiate a way of equalizing their load. The
two workers agree on the set of states {Si} to delegate from W to w, based not only on the
number of states, but also on other locally-computed metrics, such as which states (sub-trees)
appear to have the highest constraint solving time or the highest memory footprint.
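The imbalance rule above admits a simple sketch. The worker and load representations below are simplified assumptions (load is just the number of queued states); only the x = 10 threshold comes from the text.

```python
# Sketch of the load balancer's imbalance rule: flag an imbalance when the
# most loaded worker W carries at least x = 10 times the load of the least
# loaded worker w, then have the pair split the difference.
IMBALANCE_FACTOR = 10  # the x = 10 threshold

def check_imbalance(loads):
    """loads: dict mapping worker id -> number of candidate states queued."""
    heavy = max(loads, key=loads.get)
    light = min(loads, key=loads.get)
    if loads[heavy] > 0 and loads[heavy] >= IMBALANCE_FACTOR * loads[light]:
        # number of states W should delegate to w to roughly equalize load
        transfer = (loads[heavy] - loads[light]) // 2
        return heavy, light, transfer
    return None            # below threshold: avoid frequent rebalancing overhead

print(check_imbalance({"W1": 500, "W2": 20}))   # imbalanced: delegate 240 states
print(check_imbalance({"W1": 90, "W2": 60}))    # below the 10x threshold
```

Note that a worker that has finished its sub-tree (load zero) always triggers the rule, matching the W1/W2 delegation scenario described above.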
Our bitvector encoding resembles encodings used in stateless search, first introduced
by Verisoft [9] for model checking concurrent programs.
Choosing the best candidate for delegation is governed by the CPU vs. network tradeoff:
sending bitvectors is network-efficient, but consumes CPU for reconstruction on the target
worker, while transferring states is CPU efficient, but consumes network bandwidth. State
reconstruction is cheaper for sub-trees whose roots are shallow in the execution tree. In addition,
to optimize reconstruction time, the target worker reconstructs from the deepest common
ancestor between already-explored nodes and the newly received sub-tree. Since Cloud9 uses
copy-on-write to share common memory objects between states, the longer the common prefix of
two nodes in the execution tree, the higher the memory sharing benefit will be. Finally, during
reconstruction, Cloud9 need not invoke the constraint solver, since the bitvector-encoded path is
guaranteed to be feasible.
Load balancing provides the means of connecting local strategies to the global goal. For
instance, if the goal is to obtain high coverage tests, Cloud9 searchers will assign a local score
for each state S indicating the expected coverage one might obtain by exploring S. High-score
states are moved to the head of each worker’s queue and prioritized for delegation to less loaded
workers, to be executed as soon as possible. Thus, each load balancing decision moves Cloud9
closer to the global goal.
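The scoring scheme above amounts to keeping each worker's candidate states in a priority queue ordered by local coverage score. The sketch below uses made-up scores; in Cloud9 the score would estimate the coverage expected from exploring a state.

```python
import heapq

# Sketch of per-worker state prioritization: high-score states sit at the
# head of the queue, to be explored first or delegated to less loaded workers.
class StateQueue:
    def __init__(self):
        self._heap = []                      # max-heap via negated scores

    def push(self, score, state):
        heapq.heappush(self._heap, (-score, state))

    def pop(self):
        score, state = heapq.heappop(self._heap)
        return -score, state                 # highest-score state first

q = StateQueue()
q.push(0.2, "S1")
q.push(0.9, "S4")
q.push(0.5, "S2")
top = q.pop()
print(top)   # the state with the highest expected coverage
```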
For example, a portfolio can dedicate a small number of workers to a strategy that works exceptionally well, but only for a small fraction
of programs. Running this exploration on a copy of the execution tree in parallel with a classic
strategy that bears less risk may improve the expected time of reaching the overall goal.
We found constraint solving to account for half or more of total symbolic execution time.
Some of this time goes into re-solving constraints previously solved. Thus, we are building a
distributed cache of constraint solutions, which allows workers to reuse the computation
performed by other workers.
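A minimal sketch of such a cache follows. The solver here is a hypothetical callable; the key point is canonicalizing a constraint set (order-insensitively) so that identical queries issued by different workers hit the same cache entry instead of being re-solved.

```python
# Sketch of a cache of constraint solutions shared between workers, so that
# previously solved constraint sets are not re-solved.
class SolutionCache:
    def __init__(self, solver):
        self._solver = solver   # hypothetical underlying constraint solver
        self._cache = {}
        self.hits = 0

    def solve(self, constraints):
        key = frozenset(constraints)   # order-insensitive canonical key
        if key in self._cache:
            self.hits += 1             # reuse another worker's computation
        else:
            self._cache[key] = self._solver(constraints)
        return self._cache[key]

calls = []
cache = SolutionCache(lambda cs: calls.append(cs) or "sat")
cache.solve(["a < 0", "b > 2"])
cache.solve(["b > 2", "a < 0"])        # same set, different order: cache hit
print(len(calls), cache.hits)          # one underlying solver call, one hit
```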
CHAPTER 4
INITIAL PROTOTYPE
A preliminary Cloud9 prototype runs on Amazon EC2 [1] and uses the single-node
Klee symbolic execution engine [4]. Preliminary measurements indicate that Cloud9 can achieve
substantial speedups over Klee. For our measurements, we used single-core EC2 nodes and
instructed both Klee and Cloud9 to automatically generate tests to exercise various UNIX
utilities, with the aim of maximizing test coverage, as in [4].
We measured how much faster a 16-node Cloud9 can achieve the same level of coverage
that Klee achieves in one hour. We tested a random subset of 32 UNIX utilities, with a uniform
distribution of binary sizes between the smallest utility (echo at 40 KB) and the largest one (ls at
170 KB). Figure 4.1 shows the results: speedup ranges from 2× to 250×, with an average speedup
of 47×. The speedup exceeds the 16-fold increase in computation resources, because Cloud9 not
only partitions the search across 16 nodes, but also increases the probability that a given worker
will find states with high coverage potential.
Fig. 4.2: Coverage obtained by Cloud9 and Klee, using an identical number of CPU-hours
We also compared the amount of coverage obtained for a given level of CPU usage. We
ran Klee for 16 hours on one node and Cloud9 for 1 hour on 16 nodes (Figure 4.2), thus giving
each tool 16 CPU-hours. Cloud9 outperformed Klee in 28 out of 32 cases, reconfirming the
multiplicative benefit of parallel symbolic execution.
4.1 INTRODUCTION
As virtualization achieves near-native performance, the resource requirements of applications in data centers are growing at tremendous cost. Resource management will play a key role in the next generation of virtualization. Whereas traditional computing focuses mainly on acquiring resources, cloud computing seeks to utilize resources with minimal wastage. We argue that this will require a new strategy for using constrained cloud computing resources efficiently. A key element of this work is a swapping technique for virtual machine migration that improves resource utilization. We consider memory as the resource requirement. There are two types of virtual machine movements, transfer and swap: transfer moves a job from one machine to another, while swap exchanges two jobs assigned to different machines. We use both movements, transfer and swap, in our work. Most of today's virtual machine provisioning methods swap virtual machines without considering resource optimization. This results in wasteful virtual machine swapping and resource wastage. While the focus of this paper is on resource optimization, we note that virtual machine swapping is an important component of virtual machine placement and scheduling.
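The two movement types described above can be sketched on a toy placement in which each machine has a memory capacity and each VM a memory demand. The data structures and the wastage metric below are illustrative assumptions, not the paper's actual model.

```python
# Sketch of the two VM movements: transfer (move one VM between machines)
# and swap (exchange two VMs assigned to different machines), with total
# unused memory as the wastage metric to minimize.
def wastage(placement, capacity):
    """Total unused memory across machines."""
    return sum(capacity[m] - sum(vms.values()) for m, vms in placement.items())

def transfer(placement, vm, src, dst):
    """Move one VM (job) from machine `src` to machine `dst`."""
    placement[dst][vm] = placement[src].pop(vm)

def swap(placement, vm_a, m_a, vm_b, m_b):
    """Exchange two VMs assigned to different machines."""
    transfer(placement, vm_a, m_a, m_b)
    transfer(placement, vm_b, m_b, m_a)

capacity = {"M1": 8, "M2": 8}
placement = {"M1": {"vm1": 4, "vm2": 1}, "M2": {"vm3": 3}}
swap(placement, "vm2", "M1", "vm3", "M2")   # exchange vm2 and vm3
print(placement["M1"], placement["M2"], wastage(placement, capacity))
```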
We focus in this paper on the virtual machine swapping that supports virtual machine placement for efficient resource utilization. We describe our swapping model and, through a preliminary performance evaluation, demonstrate that resources can be effectively utilized with minimal wastage. The remainder of this chapter is organized as follows: in Chapter 4.2 we describe the existing work in virtual machine scheduling and migration. In Chapter 6.1 we present the algorithm for resource utilization with VM migration and swapping. In Chapter 6.2 we provide preliminary results of the effectiveness of our algorithm through simulation, and in Chapter 6.3 we describe the future directions and preliminary conclusions that can be drawn from our work.
CHAPTER 5
RELATED WORK
To our knowledge, we are the first to parallelize symbolic execution to clusters of
computers. There has been work, however, on parallel model checking [19, 3, 2]. Nevertheless,
there are currently no model checkers that can scale to many loosely connected computers,
mainly due to the overhead of coordinating the search across multiple machines and transferring
explicit states. SPIN is a mature model checker that parallelizes and diversifies its search strategy
on a shared-memory multi-core system [3, 11]; we cannot directly apply those techniques to
shared-nothing clusters. Moreover, for the programs tested in [3, 11], the search space could be
statically partitioned a priori, which is not feasible for Cloud9.
There have been previous efforts to scale symbolic execution that do not involve
parallelization. For example, concolic testing [18] runs a program concretely, while at the same
time collecting path constraints along the explored paths; the constraints are then used to find
alternate inputs that would take the program along different paths. Another example is S2E [6],
which improves scalability by automatically executing symbolically only those parts of a system
that are of interest. Our techniques are complementary, and in our future work we intend to
combine S2E with Cloud9. In general, Cloud9 benefits from almost all single-node improvements
of symbolic execution.
CONCLUSION
This paper proposes Cloud9, a cloud-based parallel symbolic execution service. Our
work is motivated by the severe limitations of symbolic execution—memory and CPU usage—
that prevent its wide use. Cloud9 is designed to scale gracefully to large shared-nothing clusters;
by harnessing the aggregate resources of such clusters, we aim to make automated testing based
on symbolic execution feasible for large, real software systems.
Cloud9 is designed to run as a Web service, thus opening up the possibility of doing
automated testing in a pay-as-you-go manner. We believe that a cloud-based testing service can
become an essential component of software development infrastructure: it provides affordable
and effective software testing that can be provisioned on demand and be accessible to all
software developers.
REFERENCES
[1] Amazon EC2. http://aws.amazon.com/ec2.
[2] J. Barnat, L. Brim, and P. Rockai. Scalable multi-core LTL model-checking. In Intl. SPIN
Workshop, 2007.
[3] J. Barnat, L. Brim, and J. Stribna. Distributed LTL model checking in SPIN. In Intl. SPIN
Workshop, 2001.
[4] C. Cadar, D. Dunbar, and D. R. Engler. KLEE: Unassisted and automatic generation of
high-coverage tests for complex systems programs. In Symp. on Operating Systems Design
and Implementation, 2008.
[5] C. Cadar, V. Ganesh, P. M. Pawlowski, D. L. Dill, and D. R. Engler. EXE: Automatically
generating inputs of death. In Conf. on Computer and Communication Security, 2006.
[6] V. Chipounov, V. Georgescu, C. Zamfir, and G. Candea. Selective symbolic execution.
In Workshop on Hot Topics in Dependable Systems, 2009.
[7] Eucalyptus software. http://open.eucalyptus.com/.
[8] C. Flanagan and P. Godefroid. Dynamic partial-order reduction for model checking
software. SIGPLAN Not., 2005.
[9] P. Godefroid. Model checking for programming languages using Verisoft. In Symp. on
Principles of Programming Languages, 1997.
[10] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In
Conf. on Programming Language Design and Implementation, 2005.
[11] G. J. Holzmann, R. Joshi, and A. Groce. Swarm verification. In Intl. Conf. on Automated
Software Engineering, 2008.
[12] J. C. King. Symbolic execution and program testing. Communications of the ACM, 1976.
[13] R. Kumar and E. G. Mercer. Load balancing parallel explicit state model checking. In
Intl. Workshop on Parallel and Distributed Methods in Verification, 2004.
[14] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis
and transformation. In Intl. Symp. on Code Generation and Optimization, 2004.
[15] R. Majumdar and K. Sen. Hybrid concolic testing. In Intl. Conf. on Software
Engineering, 2007.