Generating Unit Tests for Floating Point Embedded Software using Compositional Dynamic Symbolic Execution

Compositional dynamic symbolic execution [1, 2] is a well-known technique for generating unit test data that achieves high branch coverage for software with no floating-point data types. However, if source code depends heavily on floating-point computations, this method is unable to achieve good code coverage. In this paper we present a method that integrates compositional dynamic symbolic execution [3] with search-based test data generation to achieve better code coverage for embedded software that depends heavily on floating-point computations. We have implemented our method as an extension of a well-known symbolic execution engine – PEX [4]. Our extension implements search-based testing as an optimization technique using the alternating variable method (AVM) [5]. We present a coverage comparison for several benchmark functions.



Dynamic symbolic execution
Static symbolic execution was first proposed by J. C. King [6]. However, in real-world scenarios this method suffers from external calls, unsolvable constraints, and functions that cannot be reasoned about symbolically. This is why dynamic symbolic execution (DSE) was proposed. DSE combines concrete and symbolic execution by providing the constraint solver with runtime values; constraints are simplified in this way and become more amenable to constraint solving. However, if source code is analyzed as a single "flat" unit, this method quickly faces the state-space explosion problem, because its complexity is exponential in the number of functions in the analyzed program.
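For example (an illustrative Python sketch, not taken from the paper's benchmarks), consider a branch guarded by a call that the solver treats as a black box. DSE executes the program concretely, observes the runtime value of the opaque sub-expression, and substitutes it into the path constraint, which makes the constraint solvable:

```python
import math

def concretize_demo(x: float) -> str:
    """A branch the solver cannot handle symbolically:
    math.sin is treated as an uninterpreted external call."""
    y = math.sin(x)
    if y > 0.9:
        return "hard branch"
    return "easy branch"

# DSE runs the program concretely, observes the runtime value of y,
# and replaces the symbolic term sin(x) with that concrete value in
# the path constraint, yielding a solvable (if approximate) constraint.
```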

Compositional dynamic symbolic execution
Compositional dynamic symbolic execution is an extension of DSE and is currently implemented in several popular tools, such as PEX [4], EXE [7], SMART [1], and CUTE [8]. The main idea behind compositional dynamic symbolic execution is to extend DSE with inter- and intra-procedural call support. Calls to internal functions are analyzed, and every function is augmented with additional metadata – function summaries. The authors of SMART [1] define a function summary φ_f for a function f as a formula of propositional logic whose propositions are constraints expressed in some theory T. φ_f can be computed by successive iterations and defined as a disjunction of formulas φ_w of the form φ_w = pre_w ∧ post_w, where pre_w is a conjunction of constraints on the inputs of f and post_w is a conjunction of constraints on the outputs of f. φ_w can be computed from the path constraint corresponding to the execution path w. If we analyze functions in this way, the number of execution paths considered by compositional dynamic symbolic execution is at most n·b (where n is the number of functions f in program P and b is the search depth bound), and is therefore linear in n·b. However, this method ends up performing random search for the floating-point constraints present in the program.
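As an illustration (a toy Python sketch, not the SMART representation), a summary for an absolute-value function can be encoded as a list of (pre_w, post_w) pairs, one per explored path w, with φ_f being their disjunction:

```python
# A toy encoding of a function summary phi_f as a list of
# (precondition, postcondition) pairs, one per explored path w.
# For f(x) = abs(x) the summary is:
#   (x >= 0 and ret == x) or (x < 0 and ret == -x)
summary_abs = [
    (lambda x: x >= 0, lambda x, ret: ret == x),   # path w1
    (lambda x: x < 0,  lambda x, ret: ret == -x),  # path w2
]

def satisfies_summary(summary, x, ret) -> bool:
    """phi_f holds if some disjunct pre_w ∧ post_w holds."""
    return any(pre(x) and post(x, ret) for pre, post in summary)
```

Once computed, such a summary can be reused at every call site of f instead of re-exploring f's paths, which is what keeps the path count linear in n·b.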

Problem formulation
In order to reach a branch with some concrete input values, a path constraint for that branch must be passed to a constraint solver. The solver must then find a counter-example for this constraint or end the search with the answer "unsatisfiable". Currently, the search for a counter-example to floating-point path constraints is implemented in one of the following ways: 1) approximating floating-point data types as real types and solving the constraint in the theory of real numbers [9] (authors' note: no SMT theory of floating-point numbers was available at the time of writing this publication), or 2) performing random search. However, both solutions have their drawbacks. The first option does not take into account the specifics of floating-point arithmetic (rounding error, normalized forms, etc.) [10]. This method can sometimes even return incorrect results: when a path constraint over a floating-point variable x is approximated as a real constraint, the constraint solver may be unable to find any solution satisfying it, even though floating-point values (represented as defined in the IEEE 754 standard [11], [12]) satisfying the original constraint do exist.
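A small example of this mismatch (our illustration, not a constraint from the original experiments): the branch condition x + 1.0 == x is unsatisfiable over the reals, so a real-arithmetic solver reports "unsatisfiable", yet many IEEE 754 doubles satisfy it because of rounding:

```python
def branch(x: float) -> bool:
    # Unreachable according to a real-arithmetic approximation,
    # but reachable under IEEE 754 double rounding.
    return x + 1.0 == x

assert not branch(1.0)   # small magnitudes behave like reals
assert branch(1e17)      # ulp(1e17) = 16, so adding 1.0 rounds away
```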

Optimization problem: reaching branches with unsolvable and floating-point path constraints
As soon as our method receives the answer "unsatisfiable" from the constraint solver and the constraint contains floating-point data types, the corresponding path constraint is forwarded to the search-based AVM sub-routine. The entire path constraint PC can be formulated as a conjunction of individual branch-statement constraints pc_i: PC = pc_1 ∧ pc_2 ∧ … ∧ pc_n. The vector x is what we are trying to find: concrete input values for the method under test (MUT). First, we initialize this vector with random values. Second, using the AVM method, we try to move the candidate solution towards the optimum solution. To do this, we must first define what the optimum is by defining an objective function, which is used to decide whether a newly generated test input is better or not. We define the objective (also known as fitness) function f as a weighted sum of branch distance [13] measures d_i over the branch statements: f(x) = Σ_i w_i · d_i(x). The algorithm's goal is to minimize f. When the algorithm starts, the weight w_i for each branch statement pc_i is initialized to 1. A weight is increased if no solution is found for a specified number of iterations. This helps the algorithm concentrate on difficult clauses, because a whole solution is found only if it satisfies all clauses pc_i.
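A minimal sketch of the AVM sub-routine (Python pseudocode of the search, not our C# implementation; the step schedule, bounds, and stopping criteria here are illustrative assumptions):

```python
def avm_minimize(fitness, x0, init_step=1.0, min_step=1e-10, max_evals=100000):
    """Alternating Variable Method: optimize one input variable at a
    time with exploratory moves in both directions, accelerate in a
    successful direction (pattern moves), and halve the step once no
    variable can be improved."""
    x = list(x0)
    best = fitness(x)
    evals = 1
    step = init_step
    while step >= min_step and evals < max_evals and best > 0.0:
        sweep_improved = False
        for i in range(len(x)):          # alternate over variables
            for d in (step, -step):      # exploratory moves
                cand = x[:]
                cand[i] += d
                fc = fitness(cand)
                evals += 1
                while fc < best:         # pattern move: accelerate
                    x, best = cand, fc
                    sweep_improved = True
                    d *= 2.0
                    cand = x[:]
                    cand[i] += d
                    fc = fitness(cand)
                    evals += 1
        if not sweep_improved:
            step /= 2.0                  # refine search granularity
    return x, best
```

For instance, minimizing a simple two-variable quadratic fitness from the origin converges to its optimum within the default budget.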
The distance function d_i measures how far the current candidate solution x is from satisfying the branch constraint pc_i. .NET languages may use various relational operators in branch conditions; we have defined a distance measure for each of these operators in Table 1.
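The measures in Table 1 follow the standard branch-distance scheme from the search-based testing literature [13]; a sketch of that standard scheme (our illustration, with K an assumed penalty constant, not necessarily the exact definitions in Table 1):

```python
K = 1.0  # penalty added when the constraint is violated at the boundary

def branch_distance(op: str, a: float, b: float) -> float:
    """Standard branch-distance measures: zero when the predicate
    `a op b` already holds, otherwise a value proportional to how far
    the operands are from satisfying it."""
    if op == "==":
        return abs(a - b)
    if op == "!=":
        return 0.0 if a != b else K
    if op == "<":
        return 0.0 if a < b else (a - b) + K
    if op == "<=":
        return 0.0 if a <= b else (a - b) + K
    if op == ">":
        return 0.0 if a > b else (b - a) + K
    if op == ">=":
        return 0.0 if a >= b else (b - a) + K
    raise ValueError(f"unsupported operator: {op}")
```

Each measure is zero exactly when its branch is taken, so the weighted sum f(x) reaches zero only on inputs that satisfy the whole path constraint.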

Evaluation
We have implemented the proposed method as an extension to the well-known dynamic symbolic execution tool PEX [4]. The extension is implemented as a custom arithmetic solver using the interface Microsoft.ExtendedReflection.Reasoning.ArithmeticSolving.IArithmeticSolver. Our extension can be enabled by simply adding the custom AVMCustomArithmeticSolver attribute to the test assembly containing the parameterized unit tests.

Evaluation subjects -benchmark functions
We have evaluated our method on several classical optimization problems [14], manually rewritten in C# for the .NET Micro Framework. Implementing the benchmark functions required several mathematical functions (such as Pow, Atan, Sqrt, Exp) that are not available in the .NET Micro Framework. For this purpose we used our extended version of the Microsoft.SPOT.Math class – "exMath".

Experimental setup
We implemented a simple parameterized test for each benchmark function. Each parameterized test takes the function inputs as arguments and contains only one "if" statement: if the global optimum of the given benchmark function is reached, the test passes; otherwise, Assert.Fail() is called. Because of the stochastic nature of the experiment, we repeated each iteration 100 times. In every iteration, for every benchmark function, we performed the following actions: generated the available test inputs, ran the generated tests, and measured block coverage. The average block coverage for each function is given in Fig. 1. In order to obtain proper results we had to limit the exploration bounds: we set the maximum exploration time for both the random and AVM methods to 1 minute.
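The parameterized tests have roughly the following shape (a Python sketch of the C# PUTs; the Rosenbrock instance and the exact-equality check are our illustrative assumptions):

```python
def rosenbrock(x: float, y: float) -> float:
    # classical benchmark function; global minimum f(1, 1) = 0
    return (1.0 - x) ** 2 + 100.0 * (y - x * x) ** 2

def parameterized_test(x: float, y: float) -> None:
    # a single "if": the hard-to-cover branch is taken only when
    # the test generator finds the global optimum
    if rosenbrock(x, y) == 0.0:
        return  # optimum reached: test passes, branch covered
    raise AssertionError("Assert.Fail(): global optimum not reached")
```

Covering the passing branch thus requires the input generator to solve the underlying floating-point optimization problem.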

Coverage increase
Fig. 1 clearly shows that our proposed method outperforms random search in terms of branch coverage for most of the benchmark functions. The only function for which average branch coverage was not increased by the AVM method was Rosenbrock. The function has a long, parabolic, flat valley, and the global minimum lies inside that valley. The AVM method finds the valley easily, but converging to the global minimum within the previously defined exploration bounds is difficult. For the Beale and Powell badly scaled functions, our proposed method was able to find a solution, but not in all iterations; this is why the average block coverage for those functions was not 100%.

Conclusions
In this article we presented a method for generating tests for embedded software that is highly dependent on floating-point expressions and calculations. We combined two well-known techniques to solve this problem: compositional dynamic symbolic execution for path exploration and search-based testing (using the alternating variable method) for complex floating-point computations. We evaluated the proposed method against a suite of benchmark functions compiled for the .NET Micro Framework. In future work we plan to implement more algorithms as custom arithmetic solvers and to compare their performance and coverage gains. Furthermore, there have been several attempts to integrate floating-point arithmetic into well-known solvers and analysis tools (such as Z3 and JPF). We plan to evaluate the same test subjects as soon as any of these solvers gains floating-point solving capabilities. Several recent studies on real-world software [15, 16] have shown that approximating floating-point types as real types is not always an issue. This is why we also plan to perform further experiments with real-world open-source projects as well as with other optimization algorithms [17].

Table 1. Distance measures for .NET branch operators