STM Unit - IV
• The first set of parallel paths is denoted by X + Y + d and the second set by U + V + W + h
+ i + j. The set of all paths in this flowgraph is
f(X + Y + d)g(U + V + W + h + i + j)k
• Because the path sum is a set union operation, it is clearly commutative and associative.
DISTRIBUTIVE LAWS:
• The product and sum operations are distributive, and the ordinary rules of multiplication
apply; that is
RULE 4: A(B+C)=AB+AC and (B+C)D=BD+CD
• Applying these rules to Figure (a) below yields
e(a+b)(c+d)f=e(ac+ad+bc+bd)f= eacf+eadf+ebcf+ebdf
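The expansion above can be checked mechanically. The following is a minimal sketch, assuming each path set is modeled as a Python set of strings; the helper name `path_product` is hypothetical and not part of the notes.

```python
from itertools import product

def path_product(*path_sets):
    """Path product: concatenate one path from each set, left to right."""
    return {"".join(parts) for parts in product(*path_sets)}

# A single link is a one-element path set; a sum such as a+b is a set union.
e, f = {"e"}, {"f"}
a_plus_b = {"a"} | {"b"}
c_plus_d = {"c"} | {"d"}

factored = path_product(e, a_plus_b, c_plus_d, f)   # e(a+b)(c+d)f
expanded = {"eacf", "eadf", "ebcf", "ebdf"}          # eacf+eadf+ebcf+ebdf

print(sorted(factored))  # ['eacf', 'eadf', 'ebcf', 'ebdf']
```

Both forms denote the same four paths, which is what Rule 4 asserts.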
ABSORPTION RULE:
• If X and Y denote the same set of paths, then the union of these sets is unchanged;
consequently,
RULE 5: X+X=X (Absorption Rule)
• For example, if X=a+aa+abc+abcd+def then X+a = X+aa = X+abc = X+abcd = X+def =
X
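Because the path sum is a set union, the absorption rule falls out of ordinary set behavior. A small sketch, assuming path sets are represented as Python sets of strings:

```python
# X = a + aa + abc + abcd + def, modeled as a set of path strings.
X = {"a", "aa", "abc", "abcd", "def"}

# Adding a path that X already contains leaves X unchanged.
unchanged = all((X | {path}) == X for path in X)

print(unchanged)     # True
print((X | X) == X)  # True: X + X = X
```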
LOOPS:
• Loops can be understood as an infinite set of parallel paths. Say that the loop consists of a
single link b. Then the set of all paths through that loop point is
b^0 + b^1 + b^2 + b^3 + b^4 + b^5 + ...
This potentially infinite sum is denoted by b* for an individual link and by X* when X is a path
expression.
• The path expression for the above figure is denoted by the notation:
ab*c = ac + abc + abbc + abbbc + ...
• Evidently, aa* = a*a = a^+ and XX* = X*X = X^+
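Since X* is an infinite sum, any concrete listing has to stop at some iteration count. The sketch below assumes a bound `max_iterations` chosen for illustration; the helper names `star` and `concat` are hypothetical.

```python
def concat(*path_sets):
    """Path product of several path sets (sets of strings)."""
    out = {""}
    for path_set in path_sets:
        out = {prefix + x for prefix in out for x in path_set}
    return out

def star(path_set, max_iterations):
    """Bounded stand-in for X*: X^0 + X^1 + ... + X^max_iterations."""
    result = {""}   # X^0 is the empty path
    power = {""}
    for _ in range(max_iterations):
        power = concat(power, path_set)
        result |= power
    return result

paths = concat({"a"}, star({"b"}, 3), {"c"})
print(sorted(paths, key=len))  # ['ac', 'abc', 'abbc', 'abbbc']

# aa* and a*a give the same set: one or more copies of a.
print(concat({"a"}, star({"a"}, 2)) == concat(star({"a"}, 2), {"a"}))  # True
```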
2. REDUCTION PROCEDURE ALGORITHM:
• This section presents a reduction procedure for converting a flow graph whose links are
labeled with names into a path expression that denotes the set of all entry/exit paths in that
flow graph. The procedure is a node-by-node removal algorithm.
The steps in Reduction Algorithm are as follows:
1. Combine all serial links by multiplying their path expressions.
2. Combine all parallel links by adding their path expressions.
3. Remove all self-loops (from any node to itself) by replacing them with a link of the
form X*, where X is the path expression of the link in that loop.
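The first three steps can be sketched as operations on expression strings. This is an illustrative sketch only: the function names are hypothetical, and the parenthesization policy is an assumption made to keep the resulting strings unambiguous.

```python
def serial(*exprs):
    """Step 1: serial links multiply, written as juxtaposition."""
    return "".join(exprs)

def parallel(*exprs):
    """Step 2: parallel links add; parentheses keep the sum grouped."""
    return "(" + "+".join(exprs) + ")"

def self_loop(expr):
    """Step 3: a self-loop labeled X is replaced by X*."""
    return f"({expr})*" if len(expr) > 1 else f"{expr}*"

print(serial("a", self_loop("b"), "c"))           # ab*c
print(serial("b", parallel("c", "gkh"), "d"))     # b(c+gkh)d
print(self_loop("bgjf"))                          # (bgjf)*
```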
STEPS 4 - 8 ARE IN THE ALGORITHM'S LOOP:
A flow graph can have many equivalent path expressions between a given pair of nodes;
that is, there are many different ways to generate the set of all paths between two nodes
without affecting the content of that set.
The appearance of the path expression depends, in general, on the order in which nodes
are removed.
• Removing the loop and then node 6 results in the following expression:
a(bgjf)*b(c+gkh)d((ilhd)*imf(bjgf)*b(c+gkh)d)*(ilhd)*e
THE PROBLEM:
The generic flow-anomaly detection problem (note: not just data-flow anomalies, but
any flow anomaly) is that of looking for a specific sequence of operations considering all
possible paths through a routine.
Let the operations be SET and RESET, denoted by s and r respectively, and we want to
know if there is a SET followed immediately by a SET or a RESET followed immediately
by a RESET (an ss or an rr sequence).
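For a single path, the check reduces to scanning that path's operation sequence for an anomalous adjacent pair. A minimal sketch, assuming the sequence is given as a string of `s` and `r` characters; the function name `find_anomalies` is hypothetical.

```python
def find_anomalies(operations, anomalous_pairs=("ss", "rr")):
    """Return (index, pair) for every anomalous adjacent pair in one path."""
    hits = []
    for i in range(len(operations) - 1):
        pair = operations[i:i + 2]
        if pair in anomalous_pairs:
            hits.append((i, pair))
    return hits

print(find_anomalies("srsr"))  # []
print(find_anomalies("srrs"))  # [(1, 'rr')]
```

This covers one path at a time; the harder part of the problem is covering all paths, which is what the path expression is for.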
Some more application examples:
1. A file can be opened (o), closed (c), read (r), or written (w). If the file is read or
written to after it's been closed, the sequence is nonsensical. Therefore, cr and cw are
anomalous. Similarly, if the file is read before it's been written, just after opening, we
THE METHOD:
Annotate each link in the graph with the appropriate operator or the null operator 1.
Simplify things to the extent possible, using the fact that a + a = a and 1^2 = 1.
You now have a regular expression that denotes all the possible sequences of operators
in that graph. You can now examine that regular expression for the sequences of
interest.
EXAMPLE: Let A, B, and C be nonempty sets of character sequences whose smallest
string is at least one character long. Let T be a two-character string. Then if
T is a substring of (i.e., if T appears within) AB^nC, then T will appear in AB^2C.
(HUANG's Theorem)
As an example, let
A = pp
B = srr
C = rp
T = ss
The theorem states that ss will appear in pp(srr)^n rp if it appears in pp(srr)^2 rp.
However, let
A = p + pp + ps
B = psr + ps(r + ps)
C = rp
T = p^4
Multiplying out the expression and simplifying shows that there is no p^4 sequence.
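Both examples can be multiplied out by machine rather than by hand. The sketch below models A, B, and C as sets of strings and expands AB^nC for small n; the helper names are hypothetical, and the range of n beyond 2 is only a brute-force sanity check, not a proof.

```python
from itertools import product

def concat(*sets):
    """Path product: one string from each set, concatenated in order."""
    return {"".join(parts) for parts in product(*sets)}

def power(strings, n):
    """B^n: n copies of B multiplied together; B^0 is the empty string."""
    return concat(*([strings] * n)) if n else {""}

def contains(strings, target):
    """True if target occurs inside any string of the set."""
    return any(target in w for w in strings)

# First example: A = pp, B = srr, C = rp, T = ss.
A1, B1, C1 = {"pp"}, {"srr"}, {"rp"}
print(contains(concat(A1, power(B1, 2), C1), "ss"))  # False

# Second example: A = p + pp + ps, B = psr + ps(r + ps), C = rp, T = pppp.
A2 = {"p", "pp", "ps"}
B2 = {"psr"} | concat({"ps"}, {"r", "ps"})   # simplifies to psr + psps
C2 = {"rp"}
print(contains(concat(A2, power(B2, 2), C2), "pppp"))  # False
```

In the first example ss is absent from pp(srr)^2 rp, so by the theorem it is absent for every n. In the second, no string in the expansion contains four consecutive p's.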
Incidentally, the above observation is an informal proof of the wisdom of looping
twice discussed in Unit 2. Because data-flow anomalies are represented by
two-character sequences, it follows from the above theorem that looping twice is what
you need to do to find such anomalies.
LIMITATIONS:
Huang's theorem can be easily generalized to cover sequences of greater length than
two characters. Beyond three characters, though, things get complex and this method
has probably reached its utilitarian limit for manual application.
There are some nice theorems for finding sequences that occur at the beginnings and
ends of strings but no nice algorithms for finding strings buried in an expression.
Static flow analysis methods can't determine whether a path is or is not achievable.
Unless the flow analysis includes symbolic execution or similar techniques, the impact
of unachievable paths will not be included in the analysis.
The flow-anomaly application, for example, doesn't tell us that there will be a flow
anomaly - it tells us that if the path is achievable, then there will be a flow anomaly.
Such analytical problems go away, of course, if you take the trouble to design routines
for which all paths are achievable.
4. APPLICATIONS:
The purpose of the node removal algorithm is to present one very generalized
concept: the path expression and a way of getting it.
Every application follows this common pattern:
1. Convert the program or graph into a path expression.
2. Identify a property of interest and derive an appropriate set of "arithmetic"
rules that characterizes the property.
3. Replace the link names by the link weights for the property of interest. The
path expression has now been converted to an expression in some algebra,
The question is not simple. Here are some ways you could ask it:
1. What is the maximum number of different paths possible?
2. What is the fewest number of paths possible?
3. How many different paths are there really?
4. What is the average number of paths?
Determining the actual number of different paths is an inherently difficult problem
because there could be unachievable paths resulting from correlated and dependent
predicates.
If we know both of these numbers (maximum and minimum number of possible paths)
we have a good idea of how complete our testing is.
Asking for "the average number of paths" is meaningless.
Label each link with a link weight that corresponds to the number of paths that link
represents.
Also mark each loop with the maximum number of times that loop can be taken. If the
answer is infinite, you might as well stop the analysis because it is clear that the
maximum number of paths will be infinite.
There are three cases of interest: parallel links, serial links, and loops.
Each link represents a single link and consequently is given a weight of "1" to start. Let's
say the outer loop will be taken exactly four times and the inner loop can be taken zero to
three times. Its path expression, with a little work, is:
Path expression: a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh
A: Annotate the flow graph by replacing each link name with the maximum number
of paths through that link (1), and note the number of times each loop can be taken.
B: Combine the first pair of parallel links outside the loop and also the pair in the outer
loop.
C: Multiply the things out and remove nodes to clear the clutter.
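The steps above can be sketched as plain arithmetic on link weights. The rules used here are an assumption, since they are not spelled out in this section: parallel weights add, serial weights multiply, and a loop contributes the sum of its body weight raised to each permitted iteration count. The inner-loop bound is read as any count from zero through three, and the function names are hypothetical.

```python
from math import prod

def serial(*weights):
    """Serial links: multiply the path counts."""
    return prod(weights)

def parallel(*weights):
    """Parallel links: add the path counts."""
    return sum(weights)

def loop(weight, min_times, max_times):
    """Loop: sum of weight**n over every permitted iteration count n."""
    return sum(weight ** n for n in range(min_times, max_times + 1))

# Every link starts with weight 1.
inner = loop(serial(1, 1), 0, 3)                             # (fi)*, 0..3 times
outer_body = serial(1, inner, 1, 1, 1, parallel(1, 1), 1)    # e(fi)*fgj(m+l)k
outer = loop(outer_body, 4, 4)                               # exactly four times

# a(b+c)d{e(fi)*fgj(m+l)k}*e(fi)*fgh
total = serial(1, parallel(1, 1), 1, outer, 1, inner, 1, 1, 1)
print(inner, outer_body, outer, total)  # 4 8 4096 32768
```

Under these assumptions the maximum is 2 x 8^4 x 4 = 32,768 paths.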
Structured code can be defined in several different ways that do not involve ad-hoc rules
such as not using GOTOs.
A structured flowgraph is one that can be reduced to a single link by successive
application of the transformations shown in the figure below.
A lower bound on the number of paths in a routine can be approximated for structured
flow graphs.
The arithmetic is as follows:
The values of the weights are the number of members in a set of paths.
EXAMPLE:
Applying the arithmetic to the earlier example gives us the identical steps until step 3 (C) as
below: