2. Question Bank
Unit-4
1. Input to the code generator:
o The input to the code generator consists of the intermediate representation of the source
program, produced by the front end, together with the information in the symbol table.
o There are several choices for the intermediate representation:
a) Postfix notation
b) Syntax tree
c) Three address code
o We assume that the front end produces a low-level intermediate representation, i.e. the values
of names in it can be directly manipulated by machine instructions.
o The code generation phase requires complete, error-free intermediate code as its input.
2. Target program:
The target program is the output of the code generator. The output can be:
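a) Assembly language: It makes code generation somewhat easier, because symbolic instructions
can be emitted and an assembler then produces the machine code.
b) Relocatable machine language: It allows subprograms to be compiled separately and then
linked and loaded together.
c) Absolute machine language: It can be placed in a fixed location in memory and can be
executed immediately.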
3. Memory management
o During code generation, symbol table entries have to be mapped to actual addresses, and
labels have to be mapped to instruction addresses.
o Mapping names in the source program to addresses of data is done cooperatively by the front
end and the code generator.
o Local variables are allocated on the stack in the activation record, while global variables
are in the static area.
4. Instruction selection:
o The instruction set of the target machine should be complete and uniform.
o When the efficiency of the target program is considered, instruction speeds and machine
idioms are important factors.
o The quality of the generated code is determined by its speed and size.
Example:
1. a:= b + c
2. d:= a + e
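The effect of instruction selection on code quality can be illustrated with a minimal sketch. It assumes a hypothetical two-address target with MOV and ADD instructions and a single register R0; neither the instruction names nor the helper functions come from the text above. Translating each statement on its own reloads a value that is already in the register, and a slightly more careful selector can drop that load:

```python
# Hypothetical target (assumed for illustration): MOV src, dst and ADD src, dst.
def naive_codegen(statements):
    """Translate each 'x := y + z' on its own, ignoring neighbouring statements."""
    code = []
    for dest, left, right in statements:
        code.append(f"MOV {left}, R0")   # load the left operand
        code.append(f"ADD {right}, R0")  # add the right operand
        code.append(f"MOV R0, {dest}")   # store the result
    return code

def drop_redundant_loads(code):
    """Remove 'MOV x, R0' when it directly follows 'MOV R0, x'."""
    out = []
    for instr in code:
        if out and out[-1].startswith("MOV R0, "):
            stored = out[-1][len("MOV R0, "):]
            if instr == f"MOV {stored}, R0":
                continue  # R0 already holds this value
        out.append(instr)
    return out

program = [("a", "b", "c"), ("d", "a", "e")]  # a := b + c ; d := a + e
naive = naive_codegen(program)
better = drop_redundant_loads(naive)
print("\n".join(better))
```

The naive translation has six instructions; the second version has five, because the load of a into R0 immediately after storing R0 into a is unnecessary.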
5. Register allocation
Registers can be accessed faster than memory. Instructions involving register operands are
shorter and faster than those involving memory operands.
Register allocation: In register allocation, we select the set of variables that will reside in
registers.
Register assignment: In register assignment, we pick the specific register in which each variable
will reside.
Certain machines require even-odd pairs of registers for some operands and results.
For example:
1. D x, y
Where,
x is the even register of the even-odd register pair that holds the dividend
y is the divisor
6. Evaluation order
The efficiency of the target code can be affected by the order in which the computations are
performed. Some computation orders need fewer registers to hold intermediate results than
others.
A basic block is a sequence of consecutive statements. The flow of control enters at the beginning
of the block and leaves at the end without halting or branching (except possibly at the last
instruction of the block). The following sequence of three-address statements forms a basic block:
1. t1:= x * x
2. t2:= x * y
3. t3:= 2 * t2
4. t4:= t1 + t3
5. t5:= y * y
6. t6:= t4 + t5
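Output: A list of basic blocks, with each three-address statement in exactly one block.
Method: First identify the leaders in the code. The rules for finding leaders are as follows:
1. The first statement is a leader.
2. Any statement that is the target of a conditional or unconditional goto is a leader.
3. Any statement that immediately follows a goto or conditional goto is a leader.
For each leader, its basic block consists of the leader and all statements up to, but not
including, the next leader or the end of the program.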
Consider the following source code for dot product of two vectors a and b of length 10:
begin
    prod := 0;
    i := 1;
    do begin
        prod := prod + a[i] * b[i];
        i := i + 1;
    end
    while i <= 10
end
The three address code for the above source program is given below:
B1
(1) prod := 0
(2) i := 1
B2
(3) t1 := 4 * i
(4) t2 := a[t1]
(5) t3 := 4 * i
(6) t4 := b[t3]
(7) t5 := t2 * t4
(8) t6 := prod + t5
(9) prod := t6
(10) t7 := i + 1
(11) i := t7
(12) if i <= 10 goto (3)
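The partitioning into B1 and B2 can be reproduced with a short sketch. It assumes the standard leader rules (the first statement, any target of a goto, and any statement immediately following a goto), and it assumes that statements (1) and (2) are the initialisations prod := 0 and i := 1 from the source program. The function names are illustrative only.

```python
import re

def find_leaders(code):
    """code maps statement number -> text. Returns the sorted leader numbers."""
    numbers = sorted(code)
    leaders = {numbers[0]}                      # rule 1: first statement
    for pos, n in enumerate(numbers):
        match = re.search(r"goto \((\d+)\)", code[n])
        if match:
            leaders.add(int(match.group(1)))    # rule 2: target of a jump
            if pos + 1 < len(numbers):
                leaders.add(numbers[pos + 1])   # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from a leader up to, but not including, the next leader."""
    leaders = find_leaders(code)
    numbers = sorted(code)
    blocks = []
    for k, start in enumerate(leaders):
        end = leaders[k + 1] if k + 1 < len(leaders) else numbers[-1] + 1
        blocks.append([n for n in numbers if start <= n < end])
    return blocks

tac = {
    1: "prod := 0", 2: "i := 1",                # assumed from the source program
    3: "t1 := 4 * i", 4: "t2 := a[t1]", 5: "t3 := 4 * i", 6: "t4 := b[t3]",
    7: "t5 := t2 * t4", 8: "t6 := prod + t5", 9: "prod := t6",
    10: "t7 := i + 1", 11: "i := t7", 12: "if i <= 10 goto (3)",
}
print(find_leaders(tac))   # leaders
print(basic_blocks(tac))   # statement numbers grouped by block
```

Statement (1) is a leader by the first rule and statement (3) is a leader because it is the target of the goto in (12), so B1 holds statements (1) and (2) and B2 holds statements (3) to (12).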
A simple but effective technique for improving the target code is peephole optimization,
a method for improving the performance of the target program by examining a short
sequence of target instructions (called the peephole) and replacing these instructions by a shorter
or faster sequence, whenever possible.
The peephole is a small, moving window on the target program. The code in the peephole
need not be contiguous, although some implementations do require this. It is characteristic of
peephole optimization that each improvement may spawn opportunities for additional
improvements.
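A minimal sketch of a peephole pass follows. The instruction syntax (labels on their own line ending in a colon, unconditional jumps written as "goto L") and the two rewrite rules are assumptions chosen for illustration, not a complete optimizer. It slides a two-instruction window over the code and repeats until nothing changes, which shows how one improvement can expose another:

```python
def peephole(code):
    """Apply two simple rules over a two-instruction window until a fixed point."""
    code = list(code)              # work on a copy
    changed = True
    while changed:
        changed = False
        for i in range(len(code) - 1):
            first, second = code[i], code[i + 1]
            if first.startswith("goto ") and not second.endswith(":"):
                del code[i + 1]    # unlabeled instruction after a jump is unreachable
                changed = True
                break
            if first.startswith("goto ") and second == first[5:] + ":":
                del code[i]        # a jump to the very next instruction is useless
                changed = True
                break
    return code

before = ["goto L1", "MOV a, R0", "ADD b, R0", "L1:", "MOV R0, c"]
after = peephole(before)
print(after)
```

Removing the two unreachable instructions brings "goto L1" next to "L1:", and only then does the second rule apply and delete the jump itself.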
Answer: The stack allocation is a runtime storage management technique. The activation records
are pushed and popped as activations begin and end respectively.
Storage for the locals in each call of the procedure is contained in the activation record for that
call. Thus, locals are bound to fresh storage in each activation, because a new activation record is
pushed onto the stack when the call is made.
The sizes of variables can be determined at run time, and local variables can therefore have
different storage locations and different values in different activations. Suppose that the
register top marks the top of the stack. At run time, an activation record can be allocated and
deallocated by incrementing and decrementing top, respectively, by the size of the record.
If the procedure q has an activation record of size a, then top is incremented by a just before
the target code of q is executed. When control returns from q, top is decremented by a.
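The bookkeeping on top can be modelled with a toy sketch. The class name, the procedure names, and the record sizes (16 and 24) are all hypothetical; a real compiler emits this adjustment as machine instructions in the calling sequence rather than as method calls.

```python
class RuntimeStack:
    """Toy model: 'top' is an integer offset and a record is described only by its size."""
    def __init__(self):
        self.top = 0
        self.records = []            # (procedure name, size), innermost call last

    def call(self, name, size):
        self.records.append((name, size))
        self.top += size             # allocate the activation record
        return self.top

    def ret(self):
        name, size = self.records.pop()
        self.top -= size             # deallocate the record on return
        return self.top

stack = RuntimeStack()
stack.call("main", 16)
top_in_q = stack.call("q", 24)       # top is incremented by a = 24 before q runs
top_after_q = stack.ret()            # control returns from q, top is decremented by 24
print(top_in_q, top_after_q)
```

Each call of q gets a fresh record at whatever position top has reached at that moment, which is why locals can live at different addresses in different activations.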
The memory organization for the C program on the UNIX platform is as follows −
In C, data can be global, meaning it is allocated static storage and is available to any
procedure, or local, meaning it can be accessed only by the procedure in which it is declared.
A program consists of a list of global data declarations and procedures.
There are two pointers. The stack pointer (SP) always points to a particular position in the
activation record of the currently active procedure. The second pointer, called top, always
points to the top of the stack, i.e. the top of the activation record.
The temporaries are used for expression evaluation and are allocated above the activation record.
An activation record is a data structure that is created when a procedure or function is
invoked, and it contains the following information about the function.
An activation record in C consists of
● Actual Parameters
● Number of Arguments
● Return Address
● Return Value
● Old Stack Pointer (SP)
● Local Data in a function or procedure
Q5. Define a Directed Acyclic graph. Construct a DAG and write the sequence of
Instructions for the expression a+a*(b-c)+(b-c)*d.
Answer:
Directed Acyclic Graph :
A Directed Acyclic Graph (DAG) is used to represent the structure of a basic block, to visualize
the flow of values among its statements, and to support optimization within the block. The DAG
for a basic block is constructed from the three-address code produced by intermediate code
generation.
● Directed acyclic graphs are a type of data structure and they are used to apply
transformations to basic blocks.
● The Directed Acyclic Graph (DAG) facilitates the transformation of basic blocks.
● DAG is an efficient method for identifying common sub-expressions.
● It demonstrates how the statement’s computed value is used in subsequent statements.
Example: construction of the DAG for a+a*(b-c)+(b-c)*d :
● 1. d1 = leaf(id, entry-a)
● 2. d2 = leaf(id, entry-a) = d1
● 3. d3 = leaf(id, entry-b)
● 4. d4 = leaf(id, entry-c)
● 5. d5 = node('-', d3, d4)
● 6. d6 = node('*', d1, d5)
● 7. d7 = node('+', d1, d6)
● 8. d8 = leaf(id, entry-b) = d3
● 9. d9 = leaf(id, entry-c) = d4
● 10. d10 = node('-', d3, d4) = d5
● 11. d11 = leaf(id, entry-d)
● 12. d12 = node('*', d5, d11)
● 13. d13 = node('+', d7, d12)
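The construction and the resulting instruction sequence can be sketched as follows. This is one possible implementation, assuming the expression is supplied in postfix form and that a node is reused whenever an identical leaf or (operator, left, right) combination already exists; the function names and the temporaries t1 to t5 are illustrative.

```python
OPERATORS = {"+", "-", "*", "/"}

def build_dag(postfix):
    """Build a DAG from postfix tokens, reusing identical nodes."""
    nodes = []          # nodes[i] is ('leaf', name) or (op, left_index, right_index)
    index = {}          # node -> position in nodes, used to detect repeats
    stack = []

    def intern(node):
        if node not in index:
            index[node] = len(nodes)
            nodes.append(node)
        return index[node]

    for tok in postfix:
        if tok in OPERATORS:
            right, left = stack.pop(), stack.pop()
            stack.append(intern((tok, left, right)))
        else:
            stack.append(intern(("leaf", tok)))
    return nodes

def emit(nodes):
    """Emit one three-address instruction per interior node."""
    name, code = {}, []
    for i, node in enumerate(nodes):
        if node[0] == "leaf":
            name[i] = node[1]
        else:
            name[i] = f"t{len(code) + 1}"
            code.append(f"{name[i]} := {name[node[1]]} {node[0]} {name[node[2]]}")
    return code

# a + a * (b - c) + (b - c) * d written in postfix
postfix = "a a b c - * + b c - d * +".split()
dag = build_dag(postfix)
instructions = emit(dag)
print("\n".join(instructions))
```

The DAG has four leaves and five interior nodes. The common subexpression b - c is computed once and its result is used twice, so five instructions suffice instead of six.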
Q6. What is an activation record? Explain how it is related with run time storage
organization.
Access Link: It is used to refer to non-local data held in other activation records.
Saved Machine Status: It holds information about the state of the machine just before the
procedure is called.
Local Data: It holds the data that is local to the execution of the procedure.
Answer:
There are three different ways to represent three-address code:
● Quadruple
● Triples
● Indirect Triples
Quadruple
It is a structure that has four fields: op, arg1, arg2, and result. The operator is denoted by op,
arg1 and arg2 denote the operands, and result stores the outcome of the operation.
Quadruples break high-level language statements down into simpler parts, which facilitates
analysis and optimization during compilation.
Benefits of Quadruples
● For global optimization, it's simple to restructure code.
● Using a symbol table, one may rapidly get the value of temporary variables.
Drawbacks of Quadruples
● There are a lot of temporaries.
● Creating temporary variables adds to the time and space complexity.
Example
Convert a = -b * (c + d) into three-address code.
The following is the three-address code:
t₁ = -b
t₂ = c + d
t₃ = t₁ * t₂
a = t₃
The corresponding quadruples are:
#    Op      Arg1  Arg2  Result
(0)  uminus  b     -     t₁
(1)  +       c     d     t₂
(2)  *       t₁    t₂    t₃
(3)  =       t₃    -     a
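A minimal sketch of how the four three-address statements listed above map onto quadruples. The parser handles only the three statement shapes that occur in this example (binary operation, unary minus, plain copy), and the names Quad and to_quads are illustrative assumptions.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "op arg1 arg2 result")

def to_quads(tac_lines):
    """Parse lines such as 't3 = t1 * t2', 't1 = -b' or 'a = t3' into quadruples."""
    quads = []
    for line in tac_lines:
        result, rhs = [part.strip() for part in line.split("=", 1)]
        parts = rhs.split()
        if len(parts) == 3:                       # binary: x op y
            quads.append(Quad(parts[1], parts[0], parts[2], result))
        elif rhs.startswith("-"):                 # unary minus
            quads.append(Quad("uminus", rhs[1:].strip(), None, result))
        else:                                     # plain copy
            quads.append(Quad("=", rhs, None, result))
    return quads

tac = ["t1 = -b", "t2 = c + d", "t3 = t1 * t2", "a = t3"]
quads = to_quads(tac)
for i, q in enumerate(quads):
    print(f"({i})", q.op, q.arg1, q.arg2 or "-", q.result)
```

Because every result has its own name, a quadruple can be moved during optimization without touching the quadruples that use its value.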
Triples
Triples avoid temporary names. When the value computed by another triple is needed, a pointer to
that triple (its position number) is used instead of a temporary variable. As a result, a triple
has only three fields: op, arg1, and arg2.
Benefits of Triples
● Triples make it easier to analyze and optimize code by disassembling difficult high-level
language constructs into smaller, more manageable parts
● Triples facilitate error, data flow, and control flow analysis of code, facilitating improved
debugging and comprehension
Drawbacks of Triples
● Optimization is difficult because it often requires moving intermediate code. When a triple
is moved, all triples that refer to it must be changed as well. The symbol table entry can
be accessed directly using the pointer.
Example
Convert a = -b * (c + d) into three-address code.
The following is the three-address code:
t₁ = -b
t₂ = c + d
t₃ = t₁ * t₂
a = t₃
The corresponding triples are:
#    Op      Arg1  Arg2
(0)  uminus  b     -
(1)  +       c     d
(2)  *       (0)   (1)
(3)  =       a     (2)
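The step from quadruples to triples can be sketched as follows: every temporary is replaced by a reference to the position of the triple that computes it. The function name is illustrative, and writing the final copy as "= a (2)", with the target in arg1 and the value in arg2, is one common convention that this sketch assumes.

```python
def quads_to_triples(quads):
    """quads: list of (op, arg1, arg2, result). Temporaries become (index) references."""
    defined_at = {}                 # temporary name -> index of the triple computing it
    triples = []

    def ref(arg):
        return f"({defined_at[arg]})" if arg in defined_at else arg

    for op, arg1, arg2, result in quads:
        if op == "=":               # copy: the target goes in arg1, the value in arg2
            triples.append((op, result, ref(arg1)))
        else:
            triples.append((op, ref(arg1), ref(arg2) if arg2 else None))
            defined_at[result] = len(triples) - 1
    return triples

quads = [
    ("uminus", "b", None, "t1"),
    ("+", "c", "d", "t2"),
    ("*", "t1", "t2", "t3"),
    ("=", "t3", None, "a"),
]
triples = quads_to_triples(quads)
for i, (op, a1, a2) in enumerate(triples):
    print(f"({i})", op, a1, a2 or "-")
```

The references (0), (1) and (2) are positions in the list, which is why moving a triple forces every triple that refers to it to be renumbered.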
Indirect Triples
Benefits of Indirect Triples
● They simplify the intricate address calculations needed for nested structures,
multi-dimensional arrays, and other memory architectures
Drawbacks of Indirect Triples
● Indirect triples can increase the complexity of the intermediate representation and
optimization phases of the compiler, complicating the design and implementation of the
compiler
● Due to the additional pointer dereferencing and memory access operations required by using
indirect triples, there may be performance overhead that could slow down execution
Example
Convert a = b * – c + b * – c into three address codes.
The following is the three-address code:
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
#     Op      Arg1  Arg2
(14)  uminus  c     -
(15)  *       b     (14)
(16)  uminus  c     -
(17)  *       b     (16)
(18)  +       (15)  (17)
(19)  =       a     (18)
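What makes the triples indirect is a separate list of pointers that fixes the execution order. The sketch below assumes that list is simply (14) through (19) in order, and that triple (18) is the addition t5 = t2 + t4 from the three-address code above; the variable and function names are illustrative.

```python
triples = {
    14: ("uminus", "c", None),
    15: ("*", "b", "(14)"),
    16: ("uminus", "c", None),
    17: ("*", "b", "(16)"),
    18: ("+", "(15)", "(17)"),     # assumed from t5 = t2 + t4
    19: ("=", "a", "(18)"),
}
# Statement list: execution order, each entry pointing at a triple.
statements = [14, 15, 16, 17, 18, 19]

def listing(statements, triples):
    """Render 'position -> triple' lines in execution order."""
    return [
        f"({pos}) -> ({t}) " + " ".join(str(x) for x in triples[t] if x is not None)
        for pos, t in enumerate(statements)
    ]

# Reordering touches only the pointer list. The triples keep their numbers,
# so a reference such as (14) inside triple 15 stays valid.
statements[1], statements[2] = statements[2], statements[1]
print("\n".join(listing(statements, triples)))
```

Swapping the second and third statements is legal here because triple (16) does not depend on triple (15), and no triple had to be rewritten to do it.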
Q8. Translate the arithmetic expression a*- (b+c) into syntax tree and postfix notation.
Answer:
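One way to work this out is the sketch below. It assumes the unary minus is written as the operator uminus; the Node class and helper names are hypothetical. The syntax tree has * at the root, with a as its left child and uminus as its right child; the uminus node has the single child +, whose children are b and c. A post-order walk of that tree gives the postfix notation.

```python
class Node:
    def __init__(self, label, *children):
        self.label = label
        self.children = children

def postfix(node):
    """Post-order walk: operands first, then the operator."""
    out = []
    for child in node.children:
        out.extend(postfix(child))
    out.append(node.label)
    return out

def show(node, depth=0):
    """Return the tree as indented lines, root first."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(show(child, depth + 1))
    return lines

# a * -(b + c): '*' at the root, unary minus over the '+' subtree
tree = Node("*", Node("a"), Node("uminus", Node("+", Node("b"), Node("c"))))
print("\n".join(show(tree)))
print(" ".join(postfix(tree)))
```

The postfix form is a b c + uminus *.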
Answer: Three-address code (3AC) is a crucial concept in compiler design and has several
important applications in the compilation process:
● Intermediate Representation (IR): 3AC serves as an intermediate representation of the
source code. It simplifies the complexity of high-level language constructs into a format
that's easier to analyze, optimize, and translate into machine code. It provides a structured
representation of program semantics that retains essential information while abstracting
away from the specifics of the source language.
● Code Optimization: 3AC facilitates various optimization techniques such as constant
folding, common subexpression elimination, dead code elimination, and loop optimizations.
Because 3-address code is relatively simple and uniform, it becomes easier for compilers to
apply optimization algorithms to improve the efficiency and performance of generated code.
● Register Allocation: Register allocation is a critical optimization phase where the compiler
assigns variables and temporary values to processor registers. 3AC can be transformed into
a form suitable for register allocation algorithms, helping compilers efficiently utilize
available hardware resources.
● Code Generation: Once the code has been optimized and registers have been allocated,
3-address code can be translated into the target machine code. The simplicity and structured
nature of 3AC make code generation more manageable and enable compilers to produce efficient
and correct machine code for different target architectures.
● Target Independence: 3-address code abstracts away from the intricacies of specific
hardware architectures, allowing compilers to generate code for various target platforms
from the same intermediate representation. This level of abstraction makes it easier to port
compilers across different architectures and optimize code for multiple platforms.
The body of the loop runs 9 times. We can, however, cut down on the number of iterations if we
unroll the loop. For instance, the code might look as follows if the loop were unrolled by a
factor of 3:
for (int i = 0; i < 9; i += 3) {
    cout << "Coding Ninjas\n";
    cout << "Coding Ninjas\n";
    cout << "Coding Ninjas\n";
}
In this case, each iteration of the loop executes the body 3 times. This reduces the number of
iterations from 9 to 3, which may improve performance because the loop condition is tested and
the counter is updated only once for every three executions of the body. The result is
unchanged: the message is still printed 9 times.
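The saving can be made concrete by counting condition tests. The sketch below is a simulation rather than a benchmark; it assumes the trip count is a multiple of 3, as in the example, and counts body executions instead of printing. The function names are illustrative.

```python
def rolled(n):
    """Simulate 'for (i = 0; i < n; i += 1)' and count tests and body executions."""
    checks = bodies = 0
    i = 0
    while True:
        checks += 1              # the loop condition i < n is evaluated
        if not i < n:
            break
        bodies += 1
        i += 1
    return checks, bodies

def unrolled_by_3(n):
    """Same loop unrolled by 3. Assumes n is a multiple of 3."""
    checks = bodies = 0
    i = 0
    while True:
        checks += 1
        if not i < n:
            break
        bodies += 3              # three copies of the body per iteration
        i += 3
    return checks, bodies

print(rolled(9))
print(unrolled_by_3(9))
```

For n = 9 the body runs 9 times in both versions, but the condition is tested 10 times in the original loop and only 4 times in the unrolled one (the last test in each case is the one that fails and ends the loop).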