1.lecture Notes
Unit –IV
Intermediate code is used to translate the source code into the machine code. Intermediate code lies between the
high-level language and the machine language.
● If the compiler directly translates source code into the machine code without generating intermediate
code then a full native compiler is required for each new machine.
● Intermediate code keeps the analysis portion the same for all compilers, so a full compiler is not needed
for every unique machine.
● The intermediate code generator receives its input from its predecessor phase, the semantic analyzer, in
the form of an annotated syntax tree.
● With intermediate code, only the second part of the compiler, the synthesis phase, is changed according to
the target machine.
In the analysis-synthesis model of a compiler, the front end of a compiler translates a source program into an
independent intermediate code, then the back end of the compiler uses this intermediate code to generate the
target code (which can be understood by the machine). The benefits of using machine-independent intermediate
code are:
· Portability is enhanced. For example, if a compiler translates the source language directly to its target
machine language without generating intermediate code, then a full native compiler is required for each
new machine, because the compiler itself must be modified according to the machine specifications.
· Retargeting is facilitated.
· It is easier to improve the performance of the program by optimizing the intermediate code.
Fig:1 Position of intermediate code generator
If we generate machine code directly from source code, then for n target machines we will need n optimizers and
n code generators; with a machine-independent intermediate code, we need only one optimizer.
Intermediate code can be either language-specific (e.g., bytecode for Java) or language-independent
(three-address code). The following are commonly used intermediate code representations:
1. Postfix Notation:
· Also known as reverse Polish notation or suffix notation.
· In the infix notation, the operator is placed between operands, e.g., a + b. Postfix notation positions
the operator at the right end, as in ab +.
· For any postfix expressions e1 and e2 with a binary operator (+) , applying the operator yields
e1e2+.
· Postfix notation eliminates the need for parentheses, as the operator’s position and arity allow
unambiguous expression decoding.
· In postfix notation, the operator consistently follows its operands.
Example 1: The postfix representation of the expression (a + b) * c is : ab + c *
Example 2: The postfix representation of the expression (a – b) * (c + d) + (a – b) is : ab – cd + * ab – +
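The two examples above can be reproduced with a small stack-based converter. The following is a minimal sketch, not a definitive implementation: the function name to_postfix and the PRECEDENCE table are illustrative assumptions, operands are assumed to be single tokens, all operators are assumed to be left-associative, and an ASCII "-" stands in for the minus sign.

```python
# Precedence table for the binary operators handled by this sketch (assumed).
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_postfix(tokens):
    """Convert a list of infix tokens into a list of postfix tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators of higher or equal precedence (left-associative).
            while (stack and stack[-1] != "("
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            # Flush operators back to the matching "(" and discard it.
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()
        else:
            output.append(tok)  # operand goes straight to the output
    while stack:
        output.append(stack.pop())
    return output

print(" ".join(to_postfix(list("(a+b)*c"))))            # a b + c *
print(" ".join(to_postfix(list("(a-b)*(c+d)+(a-b)"))))  # a b - c d + * a b - +
```

Note that no parentheses survive in the output: the position of each operator alone fixes the order of evaluation.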
2. Three-Address Code:
· A three address statement involves a maximum of three references, consisting of two for operands
and one for the result.
· A sequence of three address statements collectively forms a three address code.
· The typical form of a three address statement is expressed as x = y op z, where x, y, and z represent
memory addresses.
· Each variable (x, y, z) in a three address statement is associated with a specific memory location.
· While a standard three address statement includes three references, there are instances where a
statement may contain fewer than three references, yet it is still categorized as a three address statement.
Example: The three address code for the expression a + b * c + d :
T1 = b * c
T2 = a + T1
T3 = T2 + d
T1, T2, T3 are temporary variables.
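As a hedged illustration, the three address code in the example can be produced by a post-order walk over an expression tree. The sketch below borrows Python's standard ast module as the parser purely for convenience; the function name three_address_code and the temporary naming scheme are assumptions made for this example.

```python
import ast

# Map Python AST operator classes to their printed symbols.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}

def three_address_code(expression):
    """Return a list of three address statements for an arithmetic expression."""
    code = []

    def visit(node):
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return str(node.value)
        if isinstance(node, ast.BinOp):
            left = visit(node.left)     # operands are translated first
            right = visit(node.right)
            temp = f"T{len(code) + 1}"  # fresh temporary for this result
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise ValueError("unsupported expression")

    visit(ast.parse(expression, mode="eval").body)
    return code

for line in three_address_code("a + b * c + d"):
    print(line)
# T1 = b * c
# T2 = a + T1
# T3 = T2 + d
```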
There are 3 ways to represent a Three-Address Code in compiler design:
i) Quadruples
ii) Triples
iii) Indirect Triples
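The difference between the three representations can be sketched with plain Python tuples, using the three address code for a + b * c + d. The field order (op, arg1, arg2, result) and the helper name to_triples are assumptions made for this illustration.

```python
# Quadruples: (op, arg1, arg2, result); every result has an explicit name.
quadruples = [
    ("*", "b", "c", "T1"),
    ("+", "a", "T1", "T2"),
    ("+", "T2", "d", "T3"),
]

def to_triples(quads):
    """Drop the result field; refer to earlier results by their position."""
    position = {}   # temporary name -> index of the triple that computes it
    triples = []
    for op, arg1, arg2, result in quads:
        a1 = position.get(arg1, arg1)
        a2 = position.get(arg2, arg2)
        triples.append((op, a1, a2))
        position[result] = len(triples) - 1
    return triples

triples = to_triples(quadruples)
print(triples)   # [('*', 'b', 'c'), ('+', 'a', 0), ('+', 1, 'd')]

# Indirect triples: a separate list of pointers into the triple table.
# Statements can be reordered by permuting this list, leaving triples untouched.
statement_list = list(range(len(triples)))
print(statement_list)   # [0, 1, 2]
```

In this sketch an integer argument in a triple means "the value computed by the triple at that index", while a string argument is an ordinary name.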
3. Syntax Tree:
· A syntax tree serves as a condensed representation of a parse tree.
· The operator and keyword nodes of the parse tree are moved up to their respective parent nodes in the
syntax tree. The internal nodes are operators and the child nodes are operands.
· Creating a syntax tree involves strategically placing parentheses within the expression. This makes it
easier to discern the sequence in which operands should be processed.
· The syntax tree not only condenses the parse tree but also offers an improved visual representation of
the program’s syntactic structure.
Example: x = (a + b * c) / (a – b * c)
Easier to implement: Intermediate code generation can simplify the code generation process by reducing the
complexity of the input code, making it easier to implement.
Intermediate representation:
High-level intermediate code is close to the source language itself. Code modifications that enhance the
performance of the source program are easy to apply at this level, but it is less suited to optimizing for the
target machine.
Low-level intermediate code is close to the target machine, which makes it suitable for register and memory
allocation, etc. It is used for machine-dependent optimizations.
Assignment Statement:
An assignment statement is a statement that is used to set a value to a variable name in a program.
Assignment statements allow a variable to hold different values during the program's lifespan. Another way of
understanding an assignment statement is that it stores a value in the memory location denoted by a variable
name.
Syntax
The symbol used in an assignment statement is called the assignment operator. The symbol is ‘=’.
Note: The assignment operator ‘=’ should never be used to test for equality; the equality operator is the double
equal sign ‘==’.
variable = expression ;
where,
variable = variable name
A few programming languages such as Java, C, C++ require a data type to be specified for the variable, so that it is
easy to allocate memory space and store those values during program execution.
Example –
int a = 50 ;
float b ;
a = 25 ;
b = 34.25f ;
In the above-given examples, Variable ‘a’ is assigned a value in the same statement as per its defined data type.
A data type is only declared for Variable ‘b’. In the 3rd line of code, Variable ‘a’ is reassigned the value 25. The
4th line of code assigns the value for Variable ‘b’.
1. Basic Form
This is one of the most common forms of Assignment Statements. Here the Variable name is defined, initialized,
and assigned a value in the same statement. This form is generally used when we want to use the Variable quite
a few times and we do not want to change its value very frequently.
int RollNo = 25 ;
printf("%d",RollNo);
Output –
25
2. Tuple Assignment
Generally, we use this form when we want to define and assign values for more than 1 variable at the same time.
This saves time and is an easy method. Note that here every individual variable has a different value assigned to
it.
a, b = 50, 100 ;
print(a) ;
print(b) ;
Output –
50
100
3. Sequence Assignment
x,y,z = 'HEY' ;
print('x = ', x) ;
print('y = ', y) ;
print('z = ', z) ;
Output –
x= H
y= E
z= Y
4. Multiple-Target Assignment
a = b = 40 ;
print(a, b) ;
Output –
40 40
5. Augmented Assignment
In this format, we use the combination of mathematical expressions and values for the Variable. Other
augmented Assignment forms are: &=, -=, **=, etc.
speed = 40 ;
speed += 10 ;
print('Speed =', speed) ;
Output –
Speed = 50
Boolean expressions:
A Boolean expression is an expression that always yields one of two values, true or false, when evaluated. If the
condition holds, it returns true; otherwise it returns false.
For example, consider the expression (5>2), i.e., 5 greater than 2. Since 5 is indeed greater than 2, the
expression yields true as its value; it is therefore a Boolean expression.
When two or more expressions are to be evaluated together, they are connected using Boolean operators, in the
form (expression 1, boolean operator, expression 2).
Boolean Operators :
Boolean operators are used to connect two or more expressions so that they are evaluated together, producing a
single answer of true or false.
Example: to check whether 5 is greater than 2 and less than 10, we write (5>2 && 5<10). Because we want both
expressions to be true, we use AND. The result of this boolean expression is true, since 5 is greater than 2 and
less than 10; a program that prints GFG! when this condition holds therefore produces GFG! as output.
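A minimal sketch of the example above, written in Python, where the operators are spelled and, or and not rather than &&, || and !. The variable names are illustrative assumptions.

```python
x = 5

both = x > 2 and x < 10     # AND: true only if both comparisons are true
either = x < 2 or x < 10    # OR: true if at least one comparison is true
negated = not (x > 2)       # NOT: inverts the truth value

if both:
    print("GFG!")           # printed, because 5 > 2 and 5 < 10

print(both, either, negated)   # True True False
```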
Switch statements
The switch statement is available in a variety of languages. Our switch-statement syntax is shown below. There's
a selection expression E to evaluate, then ‘n’ constant values C1, C2,..., Cn that the expression could take,
possibly including a default "value" that always matches the expression if no other value does.
Step 2: Determine which value in the list of situations is the same as the expression's value.
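One common way to translate a switch statement into three-address code is to evaluate the selection expression once, place the code for each case under its own label, and put the sequence of tests at the end. The sketch below assumes that scheme; the function name translate_switch and the label names (L1, Ldefault, test, next) are invented for this illustration and are not taken from the notes.

```python
def translate_switch(expr, cases, default=None):
    """Emit three-address code for a switch; cases is a list of (value, body)."""
    code = [f"t = {expr}", "goto test"]   # evaluate the selector once
    table = []                            # (case value, label) pairs
    for i, (value, body) in enumerate(cases, start=1):
        label = f"L{i}"
        table.append((value, label))
        code += [f"{label}: {body}", "goto next"]
    if default is not None:
        code += [f"Ldefault: {default}", "goto next"]
    code.append("test:")
    for value, label in table:            # find the value that matches t
        code.append(f"if t == {value} goto {label}")
    if default is not None:
        code.append("goto Ldefault")      # the default matches if nothing else does
    code.append("next:")
    return code

code = translate_switch("x + 1", [(1, "y = 10"), (2, "y = 20")], default="y = 0")
print("\n".join(code))
```

Placing the tests after the case bodies lets the translator emit each body as soon as it is seen and collect the value/label pairs for the test sequence along the way.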
Backpatching:
During the code generation phase, the compiler must generate jumps, but the target addresses required for these
jumps may not be known in a single pass. The compiler therefore emits the jumps with placeholder targets that
are replaced once the correct values are known, a process known as backpatching.
Backpatching is a technique used in compiler design to delay the assignment of addresses to code or data
structures until a later stage of the compilation process. This allows the compiler to generate code with
placeholder addresses that are later updated or "back patched" with the correct addresses once they are known.
Backpatching is commonly used in compilers for languages that support complex control structures or dynamic
memory allocation.
Backpatching can be used to generate code for boolean expressions and flow-of-control statements in a
single pass. In jumping code for boolean expressions, the synthesized attributes truelist and falselist of
nonterminal B are used to manage labels.
· B.truelist is a list of jump or conditional jump instructions into which we must insert the label to
which control goes if B is true.
· B.falselist is likewise a list of instructions that eventually receive the label to which control goes
when B is false.
When the code for B is generated, jumps to the true and false exits are left incomplete, with the label field
unfilled. These incomplete jumps are placed on the lists B.truelist and B.falselist.
Similarly, a statement S has a synthesized attribute S.nextlist, a list of jumps to the instruction immediately
following the code for S. Instructions are generated into an instruction array, with labels acting as indexes
into it. Three functions are used to manipulate the lists of jumps:
· makelist(i): Creates a new list containing only i, an index into the array of instructions, and returns a
pointer to the newly created list.
· merge(p1, p2): Concatenates the lists pointed to by p1 and p2 and returns a pointer to the concatenated
list.
· backpatch(p, i): Inserts i as the target label for each of the instructions on the list pointed to by p.
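The three functions can be sketched directly in Python, with lists of indexes standing in for pointers. This is an illustrative sketch: the instruction layout ([text, target] pairs), the helper names emit and relational, and the sample expression a < b || c < d are assumptions made for the example.

```python
instructions = []   # each entry is [text, target]; target None means "not yet known"

def emit(text):
    """Append an instruction with a blank target and return its index."""
    instructions.append([text, None])
    return len(instructions) - 1

def makelist(i):
    return [i]

def merge(p1, p2):
    return p1 + p2

def backpatch(p, i):
    for index in p:
        instructions[index][1] = i   # fill in the jump target

def relational(left, op, right):
    """Code for 'left op right': one conditional jump and one plain jump."""
    true_jump = emit(f"if {left} {op} {right} goto")
    false_jump = emit("goto")
    return makelist(true_jump), makelist(false_jump)

# Translate  a < b || c < d  in one pass.
b1_true, b1_false = relational("a", "<", "b")
m = len(instructions)               # index of the first instruction of B2
b2_true, b2_false = relational("c", "<", "d")
backpatch(b1_false, m)              # if B1 is false, go on to test B2
truelist = merge(b1_true, b2_true)  # either jump makes the whole expression true
falselist = b2_false

print(instructions)
# [['if a < b goto', None], ['goto', 2], ['if c < d goto', None], ['goto', None]]
print(truelist, falselist)          # [0, 2] [3]
```

The jumps still on truelist and falselist stay blank until the enclosing statement knows where control should go, at which point one backpatch call per list completes them.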
1. Boolean expression:
A boolean expression, named for the mathematician George Boole, is an expression that evaluates to either true
or false. Let’s look at some common language examples:
· My favorite color is blue. → true
· I am afraid of mathematics. → false
· 2 is greater than 5. → false
The most elementary programming language construct for changing the flow of control in a program is a label
and goto. When a compiler encounters a statement like goto L, it must check that there is exactly one statement
with label L in the scope of this goto statement.
Code generator converts the intermediate representation of source code into a form that can be readily executed
by the machine. A code generator is expected to generate correct code. It should be designed so that it can be
easily implemented, tested, and maintained.
Input to code generator – The input to the code generator is the intermediate code generated by the front end,
along with information in the symbol table that determines the run-time addresses of the data objects denoted by
the names in the intermediate representation. Intermediate code may be represented as quadruples, triples,
indirect triples, postfix notation, syntax trees, DAGs, etc. The code generation phase proceeds on the
assumption that the input is free from all syntactic and static semantic errors, that the necessary type checking
has taken place, and that type-conversion operators have been inserted wherever necessary.
· Target program: The target program is the output of the code generator. The output may be absolute
machine language, relocatable machine language, or assembly language.
· Absolute machine language as output has the advantage that it can be placed in a fixed
memory location and immediately executed. For example, WATFIV is a compiler that
produces absolute machine code as output.
· Relocatable machine language as output allows subprograms and subroutines to be
compiled separately. Relocatable object modules can be linked together and loaded by a linking
loader, at the added expense of linking and loading.
· Assembly language as output makes code generation easier. We can generate symbolic
instructions and use the macro facilities of assemblers in generating code, but an
additional assembly step is needed after code generation.
· Memory Management – Mapping the names in the source program to the addresses of data objects is
done by the front end and the code generator. A name in the three address statements refers to the symbol
table entry for the name. Then from the symbol table entry, a relative address can be determined for the
name.
Allocation vs Assignment:
Allocation –
Maps an unlimited namespace onto the register set of the target machine.
· Register-to-register model: Maps virtual registers to physical registers and spills the excess to memory.
· Memory-to-memory model: Maps some subset of the memory locations to a set of names that models the
physical register set.
Allocation ensures that the code will fit the target machine’s register set at each instruction.
Assignment –
Maps an allocated name set to the physical register set of the target machine.
· Assumes allocation has been done so that the code will fit into the set of physical registers.
· No more than ‘k’ values are designated into the registers, where ‘k’ is the number of physical registers.
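The split between the two steps can be illustrated with a deliberately naive sketch. It assumes the values live at one instruction are given as a list already ordered by priority, which is a simplification; real allocators decide priorities from liveness and usage information. The function names allocate and assign and the register names R0, R1, ... are hypothetical.

```python
def allocate(live_values, k):
    """Allocation: keep at most k values in registers, spill the rest to memory."""
    in_registers = live_values[:k]
    spilled = live_values[k:]
    return in_registers, spilled

def assign(in_registers, k):
    """Assignment: map the allocated names onto physical registers R0..R(k-1)."""
    assert len(in_registers) <= k, "allocation must already fit the register set"
    return {name: f"R{i}" for i, name in enumerate(in_registers)}

in_registers, spilled = allocate(["v1", "v2", "v3"], k=2)
print(in_registers, spilled)        # ['v1', 'v2'] ['v3']
print(assign(in_registers, k=2))    # {'v1': 'R0', 'v2': 'R1'}
```

Allocation answers "which values get a register at all", while assignment only picks a concrete register for values that are already known to fit.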
In the compilation process, high-level code must be transformed into low-level code, and the object code
generated must retain the exact meaning of the source code. A directed acyclic graph (DAG) is used to depict
the structure of a basic block; it helps to show the flow of values within the block and offers some degree of
optimization too.
A DAG is used in compiler design to optimize a basic block. It is constructed from three-address code.
After construction, transformations such as dead code elimination and common subexpression elimination are
applied. DAGs are useful in compilers because a topological ordering can be defined for them, which is crucial
for constructing object-level code. Transitive reduction and transitive closure are also uniquely defined for
DAGs. This is what a DAG looks like:
Now, we will discuss some characteristics of DAG.
Directed Acyclic Graph Characteristics
The following are some characteristics of DAG.
· DAG is a type of Data Structure used to represent the structure of basic blocks.
· Its main aim is to perform the transformation on basic blocks.
· The leaf nodes of the directed acyclic graph represent a unique identifier that can be a variable or a
constant.
· The non-leaf nodes represent an operator symbol.
· Moreover, the nodes are also given a string of identifiers to use as labels for the computed value.
· Transitive closure and transitive reduction are uniquely defined for a DAG.
· DAG has defined topological ordering.
Next, we will discuss the algorithm to draw a DAG.
1. Case 1: x = y op z
where x, y, and z are operands and op is an operator.
2. Case 2: x = op y
where x and y are operands and op is an operator.
3. Case 3: x = y
where x and y are operands.
Now, we will discuss the steps to draw a DAG handling the above three cases.
Steps
To draw a DAG, follow these three steps.
1. Step 1:
According to step 1,
1. If, in any of the three cases, the y operand is not defined, then create a node(y).
2. If, in case 1, the z operand is not defined, then create a node(z).
2. Step 2:
According to step 2,
1. For case 1, check whether there is a node(op) with node(y) as its left child and node(z) as its right
child. If there is no such node, create one. Let the name of this node be n.
2. For case 2, check whether there is a node(op) whose only child is node(y). If there is no such
node, create one. Let this node be n.
3. For case 3, node n will be node(y).
3. Step 3:
Delete x from the list of identifiers attached to node(x). Add x to the list of identifiers attached to the node n
found in step 2. Finally, set node(x) to n.
Now, we will consider an example of drawing a DAG.
Examples:
Example 1
Consider the following statements with their three address code-
a=b*c
d=b
e=d*c
b=e
f=b+c
g=d+f
We will construct a DAG for these six statements.
Solution- Since we have six different statements, we will start drawing a graph from the first one.
Step 1: The first statement is a = b * c. This statement lies in the first case from the three cases defined above.
So, we will draw a node(*) with its left child node(b) and right child node(c).
Step 2: The second statement is d = b. This statement lies in the third case from the three cases defined above.
Since we already have a node defining operand b, i.e., node(b), we will append d to node(b).
Step 3: The third statement is e = d * c. This statement lies in the first case from the three cases defined above.
Since d is attached to node(b), d * c is the same computation as b * c, for which we already have node(*) with
children node(b) and node(c). Thus, we append the result of d * c, i.e., e, to the node labeled a.
Step 4: The fourth statement is b = e. This statement lies in the third case from the three cases defined above.
Since e is already attached to a node in our graph, we simply append b to that node.
Step 5: The fifth statement is f = b + c. This statement lies in the first case from the three cases defined above.
We already have nodes for the operands b and c but no node representing the + (addition) operation, so we
draw a node(+) with left child node(b) and right child node(c). Note that after step 4, node(b) is the node(*)
to which b was appended.
Step 6: The sixth statement is g = d + f. This statement lies in the first case from the three cases defined above.
We already have nodes for the operands d and f, so we draw a node(+) with left child node(d) and
right child node(f).
The above graph is the final DAG representation for the given basic block.
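The construction steps can be sketched in Python and run on the six statements of Example 1. This is a minimal sketch under stated assumptions: the class name DAG, its fields, and the way leaves are labeled are choices made for the illustration, and an existing operator node is reused whenever one with the same operator and children is found.

```python
class DAG:
    def __init__(self):
        self.nodes = []      # each node: {"label", "children", "ids"}
        self.current = {}    # identifier -> index of the node holding its value
        self.index = {}      # (op, child indexes) -> node index, for reuse

    def _node_for(self, name):
        """Step 1: create a leaf for an operand that is not yet defined."""
        if name not in self.current:
            self.nodes.append({"label": name, "children": (), "ids": []})
            self.current[name] = len(self.nodes) - 1
        return self.current[name]

    def assign(self, x, y, op=None, z=None):
        ny = self._node_for(y)
        if op is None:                       # case 3: x = y
            n = ny
        else:                                # case 1 (x = y op z) or case 2 (x = op y)
            kids = (ny,) if z is None else (ny, self._node_for(z))
            key = (op, kids)
            if key not in self.index:        # step 2: reuse or create node(op)
                self.nodes.append({"label": op, "children": kids, "ids": []})
                self.index[key] = len(self.nodes) - 1
            n = self.index[key]
        # Step 3: move identifier x to node n.
        old = self.current.get(x)
        if old is not None and x in self.nodes[old]["ids"]:
            self.nodes[old]["ids"].remove(x)
        self.nodes[n]["ids"].append(x)
        self.current[x] = n

dag = DAG()
dag.assign("a", "b", "*", "c")
dag.assign("d", "b")
dag.assign("e", "d", "*", "c")
dag.assign("b", "e")
dag.assign("f", "b", "+", "c")
dag.assign("g", "d", "+", "f")

for i, node in enumerate(dag.nodes):
    print(i, node["label"], node["children"], node["ids"])
```

Running the sketch yields five nodes: the leaves for the initial values of b and c, one * node carrying the identifiers a, e and b, and two + nodes for f and g.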
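A. Redundant load and store elimination: Variables whose values are merely copied from one another are removed.
Initial code:
y = x + 5;
i = y;
z = i;
w = z * 3;
Optimized code:
y = x + 5;
w = y * 3;
// We've removed two redundant variables, i and z, whose values were just being copied from one another.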
B. Constant folding: Expressions whose values can be computed at compile time are simplified by the compiler.
Computation that would otherwise be done at run time is replaced with its result to avoid additional computation.
Initial code:
x = 2 * 3;
Optimized code:
x = 6;
C. Strength Reduction: The operators that consume higher execution time are replaced by the operators
consuming less execution time.
Initial code:
y = x * 2;
Optimized code:
y = x + x; or y = x << 1;
Initial code:
y = x / 2;
Optimized code:
y = x >> 1; // valid when x is a non-negative integer
D. Null sequences: Operations that have no effect, such as the following, are deleted.
a := a * 1;
a := a/1;
a := a - 0;
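Constant folding, strength reduction, and the removal of operations with no effect can all be expressed as rewrites of a single instruction. The sketch below assumes instructions are tuples of the form (dest, op, arg1, arg2), with Python integers for constants and strings for variable names; this encoding and the function name peephole are assumptions made for illustration.

```python
def peephole(instr):
    """Return a cheaper equivalent of one instruction, or the instruction itself."""
    dest, op, a, b = instr
    # Constant folding: both operands are known at compile time.
    if isinstance(a, int) and isinstance(b, int):
        folded = {"+": a + b, "-": a - b, "*": a * b}.get(op)
        if folded is not None:
            return (dest, "=", folded, None)
    # Strength reduction: multiplication by 2 becomes an addition.
    if op == "*" and b == 2:
        return (dest, "+", a, a)
    # Null sequences: x * 1, x / 1, x + 0 and x - 0 reduce to a plain copy.
    if (op in ("*", "/") and b == 1) or (op in ("+", "-") and b == 0):
        return (dest, "=", a, None)
    return instr

print(peephole(("x", "*", 2, 3)))     # ('x', '=', 6, None)
print(peephole(("y", "*", "x", 2)))   # ('y', '+', 'x', 'x')
print(peephole(("a", "*", "a", 1)))   # ('a', '=', 'a', None)
```

A copy of a variable to itself, as in the last result, does nothing, so a later pass can delete it outright.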
F. Deadcode Elimination: Dead code refers to portions of the program that are never executed or do not affect
the program’s observable behavior. Eliminating dead code helps improve the efficiency and performance of the
compiled program by reducing unnecessary computations and memory usage.
Initial Code:-
int Dead(void)
{
    int a=10;
    int z=50;
    int c;
    c=z*5;
    printf("%d", c);
    a=20;   /* dead: a is never read afterwards */
    return 0;
}
Optimized Code:-
int Dead(void)
{
    int a=10;
    int z=50;
    int c;
    c=z*5;
    printf("%d", c);
    return 0;
}
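For straight-line code, dead assignments can be found with a single backward pass that tracks which variables are still needed. The sketch below models the body of Dead() as (text, dest, uses) tuples; this encoding and the function name eliminate_dead_code are assumptions made for the illustration. Statements with no destination, such as the print, are treated as having side effects and are always kept.

```python
def eliminate_dead_code(block, live_out=()):
    """Remove assignments whose result is never used later in the block."""
    live = set(live_out)          # variables needed after the current point
    kept = []
    for text, dest, uses in reversed(block):
        if dest is None or dest in live:
            live.discard(dest)    # the value needed so far is defined here
            live.update(uses)     # its operands are needed before this point
            kept.append((text, dest, uses))
        # otherwise the assignment is dead and is dropped
    kept.reverse()
    return kept

block = [
    ("a = 10",    "a",  []),
    ("z = 50",    "z",  []),
    ("c = z * 5", "c",  ["z"]),
    ("print c",   None, ["c"]),
    ("a = 20",    "a",  []),
]
optimized = eliminate_dead_code(block)
print([text for text, _, _ in optimized])   # ['z = 50', 'c = z * 5', 'print c']
```

This sketch goes one step further than the optimized code shown above: because a is never read anywhere, it removes the initial a = 10 as well as a = 20.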
The advantage of generating code for a basic block from its dag representation is that from a dag we can more
easily see how to rearrange the order of the final computation sequence than we can starting from a linear
sequence of three-address statements or quadruples.
The order in which computations are done can affect the cost of resulting object code. For example, consider the
following basic block:
t1 : = a + b
t2 : = c + d
t3 : = e - t2
t4 : = t1 - t3
If the block is rearranged so that t1 is computed immediately before t4 (that is, in the order t2, t3, t1, t4),
two instructions, MOV R0 , t1 and MOV t1 , R1, are saved.
The heuristic ordering algorithm attempts to make the evaluation of a node immediately follow the evaluation of
its leftmost argument. The algorithm shown below produces the ordering in reverse.
Algorithm:
while unlisted interior nodes remain do begin
    select an unlisted node n, all of whose parents have been listed;
    list n;
    while the leftmost child m of n has no unlisted parents and is not a leaf do begin
        list m;
        n := m
    end
end
Code sequence:
t8 : = d + e
t6 : = a + b
t5 : = t6 - c
t4 : = t5 * t8
t3 : = t4 - e
t2 : = t6 + t4
t1 : = t2 * t3
This yields optimal code for the DAG on the machine, whatever the number of registers.
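The node listing algorithm can be sketched in Python. The DAG used here is reconstructed from the code sequence above (node 1 computes t1 = t2 * t3, node 2 computes t2 = t6 + t4, and so on), so its exact shape is an inference rather than something given explicitly; the dictionary encoding and the function name node_listing are likewise assumptions.

```python
def node_listing(dag):
    """dag maps each interior node to (left child, right child); other names are leaves."""
    parents = {n: set() for n in dag}
    for n, children in dag.items():
        for child in children:
            if child in dag:
                parents[child].add(n)

    listed = []

    def ready(n):
        # An unlisted node all of whose parents have been listed.
        return n not in listed and parents[n] <= set(listed)

    while len(listed) < len(dag):
        n = next(x for x in dag if ready(x))
        listed.append(n)
        m = dag[n][0]                    # leftmost child of n
        while m in dag and ready(m):     # interior node with no unlisted parents
            listed.append(m)
            n = m
            m = dag[n][0]
    return listed

dag = {
    1: (2, 3),      # t1 = t2 * t3
    2: (6, 4),      # t2 = t6 + t4
    3: (4, "e"),    # t3 = t4 - e
    4: (5, 8),      # t4 = t5 * t8
    5: (6, "c"),    # t5 = t6 - c
    6: ("a", "b"),  # t6 = a + b
    8: ("d", "e"),  # t8 = d + e
}
order = node_listing(dag)
print(order)          # [1, 2, 3, 4, 5, 6, 8]
print(order[::-1])    # [8, 6, 5, 4, 3, 2, 1]
```

Reversing the listing gives the evaluation order 8, 6, 5, 4, 3, 2, 1, which matches the code sequence t8, t6, t5, t4, t3, t2, t1.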
Resources:
1. https://www.geeksforgeeks.org/intermediate-code-generation-in-compiler-design/
2. https://www.javatpoint.com/intermediate-code
3. https://www.toppr.com/guides/computer-science/introduction-to-c/
4. https://www.geeksforgeeks.org/boolean-search/
5. https://www.naukri.com/code360/library/switch-statements
6. https://archive.nptel.ac.in/content/storage2/courses/106104072/chapter_8/8_40.htm
7. https://www.naukri.com/code360/library/backpatching
8. https://www.geeksforgeeks.org/register-allocations-in-code-generation/
9. https://www.naukri.com/code360/library/dag-representation
10. https://www.brainkart.com/article/Generating-Code-From-DAGs_8107/