Experiment No 6 - DONE
THEORY:
1) Definition: Code generation can be considered the final phase of compilation. Optimization
can still be applied to the code after it has been generated, but that can be seen
as a part of the code generation phase itself. The code generated by the compiler is object
code in some lower-level programming language, for example, assembly language.
2) Directed Acyclic Graph: A Directed Acyclic Graph (DAG) is a tool that depicts the structure of
a basic block, helps to see how values flow among the statements of the block, and exposes
opportunities for optimization. A DAG makes transformations on basic blocks easy. A DAG can be
read as follows:
Leaf nodes represent identifiers, names or constants.
Interior nodes represent operators.
Interior nodes also represent the results of expressions or the identifiers/name where the
values are to be stored or assigned.
Example:
t0 = a + b
t1 = t0 + c
d = t0 + t1
[Figure: DAG with one node each for t0 = a + b, t1 = t0 + c and d = t0 + t1]
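The example above can be sketched in C as follows. This is only an illustrative sketch: the node table, its size limit, and the helper names `dag_leaf` and `dag_op` are assumptions made here, not part of the original experiment. The key idea it shows is that an identical (operator, left, right) triple reuses the existing node, which is how a DAG exposes shared values such as t0.

```c
#define MAX_NODES 32   /* hypothetical fixed capacity for this sketch */

/* One DAG node: a leaf holds a name, an interior node holds an operator. */
struct dag_node {
    char op;     /* operator for interior nodes, 0 for leaves */
    char name;   /* identifier for leaves, 0 for interior nodes */
    int left;    /* index of the left child, -1 for leaves */
    int right;   /* index of the right child, -1 for leaves */
};

struct dag {
    struct dag_node nodes[MAX_NODES];
    int count;
};

/* Append a node and return its index, or -1 if the table is full. */
static int dag_add(struct dag *g, char op, char name, int left, int right)
{
    if (g->count == MAX_NODES)
        return -1;
    g->nodes[g->count].op = op;
    g->nodes[g->count].name = name;
    g->nodes[g->count].left = left;
    g->nodes[g->count].right = right;
    return g->count++;
}

/* Return the leaf for name, creating it only if it does not exist yet. */
int dag_leaf(struct dag *g, char name)
{
    for (int i = 0; i < g->count; i++)
        if (g->nodes[i].op == 0 && g->nodes[i].name == name)
            return i;
    return dag_add(g, 0, name, -1, -1);
}

/* Return the interior node for (op, left, right), reusing a match if any. */
int dag_op(struct dag *g, char op, int left, int right)
{
    for (int i = 0; i < g->count; i++)
        if (g->nodes[i].op == op &&
            g->nodes[i].left == left && g->nodes[i].right == right)
            return i;
    return dag_add(g, op, 0, left, right);
}
```

With an empty graph (`count` set to 0), calling `dag_leaf` for a, b and c and then `dag_op` for the three statements creates six nodes; asking for a + b a second time returns the node already built for t0 instead of creating a new one.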
3) Peephole Optimization: This optimization technique works locally on the code to
transform it into optimized code. By locally, we mean a small portion of the code block at
hand. These methods can be applied to intermediate code as well as to target code. A
small group of statements is analyzed and checked for the following possible optimizations.
4) Redundant instruction elimination: At source code level, redundant statements can be
removed by the user.
At compilation level, the compiler searches for instructions that are redundant in nature. A
sequence of load and store instructions may keep the same meaning even if some of them are
removed. For example:
MOV x, R0
MOV R0, R1
We can delete the first instruction and rewrite the sequence as follows, provided R0 is not
used afterwards:
MOV x, R1
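A peephole pass for this case might look like the sketch below. The `struct instr` layout and the function name `merge_moves` are hypothetical, and the sketch assumes the intermediate register is not read again later, since it performs no liveness analysis.

```c
#include <string.h>

/* Hypothetical two-operand instruction: op src, dst */
struct instr {
    char op[4];
    char src[4];
    char dst[4];
};

/* Collapse "MOV a, r ; MOV r, b" into "MOV a, b" in place.
   Assumption: r is dead after the pair (not checked here).
   Returns the new number of instructions. */
int merge_moves(struct instr *code, int n)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        if (i + 1 < n &&
            strcmp(code[i].op, "MOV") == 0 &&
            strcmp(code[i + 1].op, "MOV") == 0 &&
            strcmp(code[i].dst, code[i + 1].src) == 0) {
            struct instr merged = code[i];
            strcpy(merged.dst, code[i + 1].dst);
            code[out++] = merged;
            i++;                      /* skip the second MOV of the pair */
        } else {
            code[out++] = code[i];
        }
    }
    return out;
}
```

Applied to the two MOV instructions from the example, the pass leaves the single instruction MOV x, R1 and reports one fewer instruction.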
5) Unreachable code: Unreachable code is a part of the program code that is never accessed
because of programming constructs. Programmers may have accidentally written a piece of
code that can never be reached.
Example:
int add_ten(int x)
{
    return x + 10;
    printf("%d", x);
}
In this code segment, the printf statement will never be executed as the program control
returns before it can execute, hence printf can be removed.
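At the instruction level, one simple way to detect such code is to drop everything that follows an unconditional transfer until the next label. The sketch below illustrates that idea only; the `struct instr` layout, the mnemonics GOTO and RET, and the function name `drop_unreachable` are assumptions, and it treats every label as a possible jump target.

```c
#include <string.h>

/* Hypothetical instruction with an optional label. */
struct instr {
    char label[4];   /* empty string when the instruction has no label */
    char op[5];
    char arg[8];
};

/* Remove instructions that follow GOTO or RET and precede the next label.
   Returns the new number of instructions. */
int drop_unreachable(struct instr *code, int n)
{
    int out = 0;
    int reachable = 1;
    for (int i = 0; i < n; i++) {
        if (code[i].label[0] != '\0')
            reachable = 1;            /* a label may be jumped to */
        if (!reachable)
            continue;                 /* skip dead instruction */
        int ends_flow = strcmp(code[i].op, "GOTO") == 0 ||
                        strcmp(code[i].op, "RET") == 0;
        code[out++] = code[i];
        if (ends_flow)
            reachable = 0;            /* nothing falls through from here */
    }
    return out;
}
```

For a sequence such as ADD, RET, CALL printf, L1: INC R1, the CALL is dropped because control never falls through the RET, while the labelled INC is kept.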
6) Flow of control optimization: There are instances in code where the program control
jumps back and forth without performing any significant task. These jumps can be removed.
Consider the following chunk of code:
...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
In this code, label L1 can be removed as it only passes the control to L2. So instead of jumping to L1 and
then to L2, the control can directly reach L2, as shown below:
...
MOV R1, R2
GOTO L2
...
L2 : INC R1
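The retargeting step above can be sketched as a small pass that follows chains of jumps. The `struct instr` layout and the names `find_label` and `thread_jumps` are hypothetical. The sketch only retargets jumps; deleting the now unused L1 : GOTO L2 line would be a separate clean-up step.

```c
#include <string.h>

/* Hypothetical instruction with an optional label. */
struct instr {
    char label[4];   /* empty string when the instruction has no label */
    char op[5];
    char arg[4];     /* jump target for GOTO, operand otherwise */
};

/* Return the index of the instruction carrying label, or -1. */
static int find_label(const struct instr *code, int n, const char *label)
{
    for (int i = 0; i < n; i++)
        if (code[i].label[0] != '\0' && strcmp(code[i].label, label) == 0)
            return i;
    return -1;
}

/* Make every GOTO point at the final destination of its jump chain. */
void thread_jumps(struct instr *code, int n)
{
    for (int i = 0; i < n; i++) {
        if (strcmp(code[i].op, "GOTO") != 0)
            continue;
        /* Bound the number of hops so a cycle of jumps cannot loop forever. */
        for (int hops = 0; hops < n; hops++) {
            int t = find_label(code, n, code[i].arg);
            if (t < 0 || t == i || strcmp(code[t].op, "GOTO") != 0)
                break;
            strcpy(code[i].arg, code[t].arg);
        }
    }
}
```

On the example, GOTO L1 becomes GOTO L2 because the instruction at L1 is itself a jump to L2.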
8) Strength reduction: Some operations consume more time and space than others. Their
‘strength’ can be reduced by replacing them with other operations that consume less time
and space, but produce the same result.
For example, x * 2 can be replaced by x << 1, which involves only one left shift. Similarly,
though a * a and a² computed through a general power routine give the same result, a * a is
much more efficient to implement.
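The multiplication case can be illustrated with a small sketch. The function names `shift_for_multiplier` and `mul_reduced` are made up for this example, and unsigned operands are assumed so that the shift is well defined.

```c
/* Return s such that x * k == x << s, or -1 if k is not a power of two. */
int shift_for_multiplier(unsigned k)
{
    if (k == 0 || (k & (k - 1)) != 0)
        return -1;                 /* zero or more than one bit set */
    int s = 0;
    while ((k >>= 1) != 0)
        s++;
    return s;
}

/* Multiply x by k, using a shift when k is a power of two. */
unsigned mul_reduced(unsigned x, unsigned k)
{
    int s = shift_for_multiplier(k);
    return s >= 0 ? x << s : x * k;
}
```

A compiler would make this decision once at compile time for a constant multiplier; the run-time check here only serves to show which multipliers qualify.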
9) Accessing machine instructions: The target machine may provide more sophisticated
instructions that can perform specific operations much more efficiently.
If the target code can use those instructions directly, that will not only improve
the quality of the code, but also yield more efficient results.
10) Code Generator: A code generator is expected to have an understanding of the target
machine’s runtime environment and its instruction set. The code generator should take the
following things into consideration to generate the code:
• Target language: The code generator has to be aware of the nature of the target language
into which the code is to be transformed. That language may provide some machine-specific
instructions to help the compiler generate the code in a more convenient way. The target
machine can have either CISC or RISC processor architecture.
• IR Type: Intermediate representation has various forms. It can be in Abstract Syntax Tree
(AST) structure, Reverse Polish Notation, or 3-address code.
• Selection of instruction: The code generator takes the Intermediate Representation as input
and converts (maps) it into the target machine’s instruction set. One representation can be
converted in many ways (by different instructions), so it becomes the responsibility of the
code generator to choose the appropriate instructions wisely.
• Ordering of instructions: Finally, the code generator decides the order in which the
instructions will be executed. It creates a schedule for the instructions to execute them.
11) Descriptors: The code generator has to track both the registers (for availability) and
addresses (location of values) while generating the code. For both of them, the following
two descriptors are used:
• Register descriptor : Register descriptor is used to inform the code generator about the
availability of registers. Register descriptor keeps track of values stored in each register.
Whenever a new register is required during code generation, this descriptor is consulted for
register availability.
• Address descriptor : Values of the names (identifiers) used in the program might be stored
at different locations while in execution. Address descriptors are used to keep track of memory
locations where the values of identifiers are stored. These locations may include CPU registers,
heaps, stacks, memory or a combination of the mentioned locations.
The code generator keeps both descriptors updated in real time. For a load statement, LD R1, x,
the code generator:
updates the Register Descriptor of R1 to show that it holds the value of x, and
updates the Address Descriptor (x) to show that one instance of x is in R1.
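The bookkeeping for a load can be sketched as below. The structure layout, the sizes, and the names `desc_init` and `desc_load` are assumptions for illustration; the sketch also simplifies matters by using single lowercase letters as names and by recording at most one register per name.

```c
#define NUM_REGS 4     /* hypothetical register count */
#define NUM_NAMES 26   /* assumption: names are single letters 'a'..'z' */

struct descriptors {
    char reg_holds[NUM_REGS];  /* register descriptor: name held, 0 if free */
    int in_reg[NUM_NAMES];     /* address descriptor: register index or -1 */
    int in_mem[NUM_NAMES];     /* address descriptor: nonzero if in memory */
};

/* Start with all registers free and every value available in memory. */
void desc_init(struct descriptors *d)
{
    for (int r = 0; r < NUM_REGS; r++)
        d->reg_holds[r] = 0;
    for (int i = 0; i < NUM_NAMES; i++) {
        d->in_reg[i] = -1;
        d->in_mem[i] = 1;
    }
}

/* Record the effect of LD Rr, x on both descriptors. */
void desc_load(struct descriptors *d, int r, char x)
{
    char old = d->reg_holds[r];
    if (old != 0 && d->in_reg[old - 'a'] == r)
        d->in_reg[old - 'a'] = -1;   /* previous value is no longer in Rr */
    d->reg_holds[r] = x;             /* register descriptor: Rr holds x */
    d->in_reg[x - 'a'] = r;          /* address descriptor: x is also in Rr */
}
```

After `desc_load(&d, 1, 'x')`, register 1 is marked as holding x, and x is recorded as being both in memory and in R1.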
12) Code Generation: A basic block consists of a sequence of three-address instructions. The code
generator takes this sequence of instructions as input.
Note: If the value of a name is found at more than one place (register, cache, or memory), the
register’s value will be preferred over the cache and main memory. Likewise, the cache’s value will
be preferred over the main memory. Main memory is given the lowest preference.
getReg: The code generator uses the getReg function to determine the status of available registers and
the location of name values. getReg works as follows:
If variable Y is already in register R, it uses that register.
Else if some register R is available, it uses that register.
Else if both the above options are not possible, it chooses a register that requires the minimal
number of load and store instructions.
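The selection rules above can be sketched as follows. The `struct reg_state` layout, the `dirty` flag, and the `needs_store` output are assumptions introduced here to approximate "minimal number of load and store instructions": a register whose value is already up to date in memory can be reused without a store.

```c
#define NUM_REGS 3   /* hypothetical register count */

struct reg_state {
    char holds[NUM_REGS];   /* name in each register, 0 if free */
    int dirty[NUM_REGS];    /* nonzero if the value is newer than memory */
};

/* Choose a register for operand y.
   *needs_store is set to 1 when the old value must be stored first. */
int get_reg(const struct reg_state *s, char y, int *needs_store)
{
    *needs_store = 0;
    for (int r = 0; r < NUM_REGS; r++)
        if (s->holds[r] == y)
            return r;               /* rule 1: y is already in a register */
    for (int r = 0; r < NUM_REGS; r++)
        if (s->holds[r] == 0)
            return r;               /* rule 2: a free register exists */
    for (int r = 0; r < NUM_REGS; r++)
        if (!s->dirty[r])
            return r;               /* rule 3: reuse without a store */
    *needs_store = 1;               /* rule 3, worst case: spill register 0 */
    return 0;
}
```

A fuller implementation would also look at the next use of each value when choosing which register to spill; this sketch always falls back to register 0.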
For an instruction x = y OP z, the code generator may perform the following actions. Let us
assume that L is the location (preferably a register) where the output of y OP z is to be saved:
1. Call getReg to decide the location L.
2. Determine the present location (register or memory) of y by consulting the Address Descriptor
of y. If y is not presently in L, then generate the following instruction to copy the value
of y to L, where y’ is the present location of y:
MOV y’, L
3. Determine the present location z’ of z using the same method used in step 2 for y and generate the
following instruction:
OP z’, L
4. If y and z have no further use, their registers can be given back to the system.
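The instruction sequence for x = y OP z can be sketched as a function that writes the generated text into a buffer. The function name `gen_binary` and its parameters are hypothetical; the locations of y and z are passed in directly rather than looked up in descriptors, to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

/* Emit code for x = y OP z into buf.
   y_loc and z_loc are the present locations of y and z, L is the result
   location. Returns the number of instructions written. */
int gen_binary(char *buf, size_t size, const char *mnemonic,
               const char *y_loc, const char *z_loc, const char *L)
{
    int count = 0;
    int n = 0;
    if (size == 0)
        return 0;
    buf[0] = '\0';
    /* Copy y into L only if it is not there already. */
    if (strcmp(y_loc, L) != 0) {
        n = snprintf(buf, size, "MOV %s, %s\n", y_loc, L);
        if (n < 0 || (size_t)n >= size)
            return count;           /* output did not fit */
        count++;
    }
    /* Apply the operation with z, leaving the result in L. */
    int m = snprintf(buf + n, size - (size_t)n, "%s %s, %s\n",
                     mnemonic, z_loc, L);
    if (m >= 0 && (size_t)m < size - (size_t)n)
        count++;
    return count;
}
```

For T = A - B with L = R0, calling `gen_binary(buf, sizeof buf, "SUB", "A", "B", "R0")` writes MOV A, R0 followed by SUB B, R0. When y is already in L, as in an addition whose first operand sits in R0, only the single ADD instruction is written.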
Other code constructs like loops and conditional statements are transformed into assembly
language in the usual way, using labels and jump instructions.
PROGRAM:
#include <stdio.h>
#include <string.h>

/* One generated instruction: mnemonic and two operands. */
struct code {
    char nemo[4];
    char op1[3];
    char op2[3];
};

/* Return the next unused register number as a digit character. */
char getreg(void)
{
    static char r = '0';
    return r++;
}

/* Store a one-letter variable name as an operand. */
void set_name(char *dst, char name)
{
    dst[0] = name;
    dst[1] = '\0';
}

/* Store register R<reg> as an operand. */
void set_reg(char *dst, char reg)
{
    dst[0] = 'R';
    dst[1] = reg;
    dst[2] = '\0';
}

/* Return the index of the address descriptor entry for name, or -1. */
int find_dis(char add_dis[][4], int n, char name)
{
    int k;
    for (k = 0; k < n; k++)
        if (add_dis[k][0] == name)
            return k;
    return -1;
}

int main(void)
{
    char stmt[4][6] = {"T=A-B", "U=A-C", "V=T+U", "W=V+U"};
    struct code c[7];
    char add_dis[4][4];   /* each entry: name, 'R', register digit, '\0' */
    char op, reg;
    int i, cp = 0, n = 0, j, k;

    for (i = 0; i <= 3; i++) {
        printf("\n%s", stmt[i]);
        op = stmt[i][3];
        switch (op) {
        case '-':
            reg = getreg();
            /* Load the first operand into a fresh register. */
            strcpy(c[cp].nemo, "MOV");
            set_name(c[cp].op1, stmt[i][2]);
            set_reg(c[cp].op2, reg);
            printf("\n%s\t%s\t%s", c[cp].nemo, c[cp].op1, c[cp].op2);
            cp++;
            /* Subtract the second operand from that register. */
            strcpy(c[cp].nemo, "SUB");
            set_name(c[cp].op1, stmt[i][4]);
            set_reg(c[cp].op2, reg);
            printf("\n%s\t%s\t%s", c[cp].nemo, c[cp].op1, c[cp].op2);
            cp++;
            /* Assign an address descriptor to the variable on the LHS of '='. */
            add_dis[n][0] = stmt[i][0];
            add_dis[n][1] = 'R';
            add_dis[n][2] = reg;
            add_dis[n][3] = '\0';
            printf("\nAddress Descriptor of %c is %s", add_dis[n][0], &add_dis[n][1]);
            n++;
            break;
        case '+':
            /* Both operands are expected to be in registers already. */
            j = find_dis(add_dis, n, stmt[i][2]);
            k = find_dis(add_dis, n, stmt[i][4]);
            if (j < 0 || k < 0) {
                printf("\nOperand is not in any register");
                break;
            }
            strcpy(c[cp].nemo, "ADD");
            set_reg(c[cp].op1, add_dis[k][2]);
            set_reg(c[cp].op2, add_dis[j][2]);
            printf("\n%s\t%s\t%s", c[cp].nemo, c[cp].op1, c[cp].op2);
            cp++;
            /* The result replaces the first operand in its register. */
            add_dis[j][0] = stmt[i][0];
            printf("\nAddress Descriptor of %c is %s", add_dis[j][0], &add_dis[j][1]);
            if (i == 3) {
                /* Store the final result back to memory. */
                strcpy(c[cp].nemo, "MOV");
                set_reg(c[cp].op1, add_dis[j][2]);
                set_name(c[cp].op2, stmt[i][0]);
                printf("\n%s\t%s\t%s", c[cp].nemo, c[cp].op1, c[cp].op2);
                cp++;
            }
            break;
        }
    }
    printf("\n");
    return 0;
}
OUTPUT: