
UNIT – IV

• Intermediate-Code Generation: Variants of
Syntax Trees, Three-Address Code, Types and
Declarations, Type Checking, Control Flow,
Backpatching, Switch-Statements
• Run-Time Environments: Storage Organization,
Stack Allocation of Space, Access to Nonlocal
Data on the Stack, Heap Management,
Introduction to Garbage Collection
Intermediate-Code Generation

• During the translation of a source program
into the object code for a target machine, a
compiler may generate a middle-level
language code, which is known as
intermediate code.
• Commonly used intermediate-code
representations: (1) syntax trees, (2) postfix
notation, (3) three-address code.
• For simplicity, we assume that a compiler front end is organized as in Fig. 6.1,
where parsing, static checking, and intermediate-code generation are done
sequentially; sometimes they can be combined and folded into parsing.
• Many of the translation schemes can be implemented during either bottom-up or
top-down parsing.
• All schemes can be implemented by creating a syntax tree and then walking the
tree.
• Static checking includes type checking, which ensures that operators are applied to
compatible operands.
• Syntax trees are high level; they depict the natural hierarchical structure of the
source program and are well suited to tasks like static type checking.
• A low-level representation is suitable for machine-dependent tasks like register
allocation and instruction selection.
• The choice or design of an intermediate representation varies from compiler
to compiler. An intermediate representation may either be an actual language or it
may consist of internal data structures that are shared by phases of the compiler.
• Variants of Syntax Trees
• 1. Directed Acyclic Graphs for Expressions
• 2. The Value-Number Method for Constructing DAG's
• Like the syntax tree for an expression, a DAG has
leaves corresponding to atomic operands and interior
nodes corresponding to operators. The difference is
that a node N in a DAG has more than one parent if N
represents a common subexpression; for example, in
a + a * (b - c) + (b - c) * d, the subexpression
b - c is represented by a single shared node.
2. The Value-Number Method for
Constructing DAG's
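A minimal sketch of the method in C (the node layout and the linear search are simplifying assumptions for illustration; a practical implementation hashes the signature <op, left, right> to find candidate nodes quickly):

/* Value-number method: nodes of the DAG are stored in an array,
 * and a node's index is its value number. */
#define MAX_NODES 1024

struct Node {
    char op;      /* operator, e.g. '+', or a leaf tag    */
    int  left;    /* value number of the left operand     */
    int  right;   /* value number of the right operand,
                     or -1 for a leaf/unary node          */
};

static struct Node nodes[MAX_NODES];
static int nnodes = 0;

/* Return the value number for <op, l, r>: if an identical node
 * already exists it is reused, so common subexpressions end up
 * sharing one DAG node. */
int value_number(char op, int l, int r)
{
    for (int i = 0; i < nnodes; i++)
        if (nodes[i].op == op && nodes[i].left == l
                              && nodes[i].right == r)
            return i;              /* found: common subexpression */
    nodes[nnodes].op = op;         /* not found: create a node    */
    nodes[nnodes].left = l;
    nodes[nnodes].right = r;
    return nnodes++;
}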
Three-Address Code
• Three-address code is a linearized representation
of a syntax tree or a DAG in which explicit names
correspond to the interior nodes of the graph.
• In three-address code, there is at most one
operator on the right side of an instruction; that
is, no built-up arithmetic expressions are
permitted.
• For example, x + y * z might be translated into
the following sequence of three-address instructions:
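    t1 = y * z
    t2 = x + t1

where t1 and t2 are compiler-generated temporary names.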
Addresses and Instructions
• Three-address code is built from two concepts:
addresses and instructions.
• An address can be one of the following: a name,
a constant, or a compiler-generated temporary.
• Instructions: the common kinds of three-address
instructions include the following (the standard list):
1. Assignment instructions of the form x = y op z
and x = op y
2. Copy instructions x = y
3. Unconditional jumps goto L
4. Conditional jumps if x goto L, ifFalse x goto L,
and if x relop y goto L
5. Procedure calls and returns, implemented using
param x, call p, n, and return y
6. Indexed copy instructions x = y[i] and x[i] = y
7. Address and pointer assignments x = &y, x = *y,
and *x = y
Representation
• The description of three-address instructions
specifies the components of each type of
instruction, but it does not specify the
representation of these instructions in a data
structure.
• In a compiler, these instructions can be
implemented as objects or as records with fields
for the operator and the operands.
• Three such representations are called
"quadruples," "triples," and "indirect triples."
Quadruple
• A quadruple has four fields, which we call op,
arg1, arg2, and result. The op field contains an
internal code for the operator.
• For instance, the three-address instruction x =
y + z is represented by placing + in op, y in
arg1, z in arg2, and x in result.
• The following are some exceptions to this rule:
1. Instructions with unary operators like x = minus y
or x = y do not use arg2.
2. Operators like param use neither arg2 nor result.
3. Conditional and unconditional jumps put the
target label in result.
Triples
• A triple has only three fields, which we call
op, arg1, and arg2.
• Using triples, we refer to the result of an
operation x op y by its position, rather than by
an explicit temporary name.
• Thus, instead of the temporary t1 in the above
figure, a triple representation would refer to
position (0). Parenthesized numbers represent
pointers into the triple structure itself.
Indirect triples
• Indirect triples consist of a listing of pointers
to triples, rather than a listing of triples
themselves.
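As a concrete illustration, the three representations might be declared in C along the following lines (the field types and array sizes are assumptions for illustration, not a fixed format):

/* Sketch of the three instruction representations in C. */
typedef const char *Addr;     /* a name, constant, or temporary  */

struct Quad {                 /* quadruple: explicit result field */
    int  op;
    Addr arg1, arg2, result;
};

struct Triple {               /* triple: the result is the        */
    int  op;                  /* instruction's own position, so   */
    Addr arg1, arg2;          /* there is no result field; an     */
};                            /* argument may instead refer to a  */
                              /* position such as (0)             */

struct Quad    quads[100];    /* instructions stored in order     */
struct Triple  triples[100];
struct Triple *indirect[100]; /* indirect triples: a list of      */
                              /* pointers that an optimizer can   */
                              /* reorder without renumbering the  */
                              /* positions used inside triples    */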
Static Single-Assignment Form
• Static single-assignment form (SSA) is an
intermediate representation that facilitates
certain code optimizations.
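• In SSA form, each variable is assigned exactly once,
and a φ-function combines definitions that reach a
join point. For example, the source fragment

    if (flag) x = -1; else x = 1;
    y = x * a;

becomes, in SSA form,

    if (flag) x1 = -1; else x2 = 1;
    x3 = φ(x1, x2);
    y1 = x3 * a;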
Types and Declarations
Type Expressions
• A type expression is either a basic type or is formed by
applying an operator called a type constructor to a type
expression.
• The sets of basic types and constructors depend on the
language being checked.
• A basic type is a type expression.
• A type name is a type expression.
• A type expression can be formed by applying the array type
constructor to a number and a type expression.
• A type expression can be formed by applying the record
type constructor to the field names and their types.
• A type expression can be formed by using the type
constructor for function types.
• Representation of type expressions:
a DAG can be used to represent a type
expression, with interior nodes for type
constructors and leaves for basic types,
type names, and type variables; more
generally, type expressions may be
represented as graphs.
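• For example, the C-style array type int[2][3], read
as "array of 2 arrays of 3 integers," is the type
expression array(2, array(3, integer)).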
Type Equivalence
• Two types are structurally equivalent if and only if
one of the following conditions is true:
• They are the same basic type.
• They are formed by applying the same constructor
to structurally equivalent types.
• One is a type name that denotes the other.
• If type names are treated as standing for
themselves, then the first two conditions in the
above definition lead to name equivalence of type
expressions.
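A small C illustration of the distinction (identifier names are illustrative):

/* Type equivalence in C: C applies structural equivalence to
 * most types, but name (tag) equivalence to structs. */
typedef int Weight;
typedef int Height;        /* Weight and Height are equivalent:
                              both are just names for int       */

struct A { double x; };
struct B { double x; };    /* structurally identical, yet A and
                              B are distinct types in C, since
                              structs are compared by name      */

void demo(void) {
    Weight w = 3;
    Height h = w;          /* fine: the same underlying type    */
    /* struct A a; struct B b = a;   would not compile          */
    (void)h;
}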
Declarations
Storage Layout for Local Names
• The SDT uses synthesized attributes type and
width for each nonterminal and two variables
t and w to pass type and width information
from a B node in a parse tree to the node for
the production C -> ε.
• In a syntax-directed definition, t and w would
be inherited attributes for C.
• From the type of a name, we can determine
the amount of storage that will be needed for
the name at run time.
• At compile time, we can use these amounts to
assign each name a relative address.
• The type and relative address are saved in the
symbol-table entry for the name.
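• For instance, assuming 8-byte floats and 4-byte ints
(the widths used in the book's running example), the
declarations

    float x;   /* type float, width 8, relative address 0 */
    int   i;   /* type int,   width 4, relative address 8 */

give each name a relative address equal to the total
width of the names declared before it.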
Translation of Expressions
• The syntax-directed definition in the above Fig.
builds up the three-address code for an
assignment statement S using attribute code for S
and attributes addr and code for an expression E.
• Attributes S.code and E.code denote the
three-address code for S and E, respectively.
Attribute E.addr denotes the address that will hold
the value of E.
• Consider the last production, E -> id, in the
syntax-directed definition
• When an expression is a single identifier, say x,
then x itself holds the value of the expression. The
semantic rules for this production define E.addr to
point to the symbol-table entry for this instance of
id.
• Let top denote the current symbol table. Function
top.get retrieves the entry when it is applied to
the string representation id.lexeme of this
instance of id.
• E.code is set to the empty string.
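• A sketch of the key rules of that syntax-directed
definition (new Temp, gen, and top.get are the book's
helper functions; || denotes code concatenation):

    S -> id = E ;  { gen(top.get(id.lexeme) '=' E.addr);
                     S.code = E.code || that instruction }
    E -> E1 + E2   { E.addr = new Temp();
                     E.code = E1.code || E2.code
                              || gen(E.addr '=' E1.addr '+' E2.addr) }
    E -> id        { E.addr = top.get(id.lexeme);  E.code = '' }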
Type Checking
• Type checking has the potential for catching
errors in programs.
• To do type checking a compiler needs to assign
a type expression to each component of the
source program.
• The compiler must then determine that these
type expressions conform to a collection of
logical rules that is called the type system for
the source language.
• A sound type system eliminates the need for
dynamic checking for type errors, because it
allows us to determine statically that these errors
cannot occur when the target program runs.
• An implementation of a language is strongly
typed if a compiler guarantees that the programs
it accepts will run without type errors.
Rules for Type Checking
• Type checking can take on two forms:
synthesis and inference.
• Type synthesis builds up the type of an
expression from the types of its
subexpressions.
• It requires names to be declared before they
are used.
• Type inference determines the type of a
language construct from the way it is used.
• Example of an explicit conversion (a cast, discussed
below): x = (float) 3 converts the integer 3 to a float.

Type Conversions
• Type conversion rules vary from language to
language.
• Conversions are classified into widening conversions,
which are intended to preserve information, and
narrowing conversions, which can lose information.
• Conversion from one type to another is said to be
implicit if it is done automatically by the compiler.
• Implicit type conversions are also called coercions.
• Conversion is said to be explicit if the programmer
must write something to cause the conversion.
• Explicit conversions are also called casts.
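For instance, in C (a minimal illustration):

#include <stdio.h>

int main(void) {
    int    i = 3;
    double d = i;          /* implicit widening (a coercion):
                              int -> double preserves the value */
    double pi = 3.14159;
    int    t = (int) pi;   /* explicit narrowing (a cast):
                              the fraction is lost, t == 3      */
    printf("%f %d\n", d, t);
    return 0;
}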
Overloading of Functions and
Operators
• An overloaded symbol has different meanings depending on
its context.
• Overloading is resolved when a unique meaning is
determined for each occurrence of a name.
• The + operator in Java denotes either string concatenation
or addition, depending on the types of its operands.
• The signature for a function consists of the function name
and the types of its arguments. Overloading of functions
can be resolved based on signatures.
• void x() { }
• void x(String s) { }
Control Flow
• Boolean expressions are composed of the
boolean operators (which we denote &&, ||,
and !, using the C convention for the operators
AND, OR, and NOT, respectively) applied to
elements that are boolean variables or
relational expressions.
• Short-Circuit Code:
In short-circuit (or jumping) code, the boolean
operators &&, ||, and ! translate into jumps. The
operators themselves do not appear in the code;
instead, the value of a boolean expression is
represented by a position in the code sequence.
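• For example, the statement

    if ( x < 100 || x > 200 && x != y ) x = 0;

translates into the jumping code

            if x < 100 goto L2
            ifFalse x > 200 goto L1
            ifFalse x != y goto L1
    L2:     x = 0
    L1:

Note that the && and || operators never appear; the
jumps encode their short-circuit behavior.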
Backpatching
• Boolean expressions are usually translated using the jump method
since this is convenient for optimization.
• However, more than a single pass may be needed in order to
generate code for boolean expressions and flow of control during
bottom-up parsing.
• Indeed, when translating forward jumps, at the time we generate
the code we do not know the (numerical) address of the label we
want to branch to.
• The solution is to generate a sequence of branching statements
where the addresses of the jumps are temporarily left unspecified.
• For each boolean expression E we maintain two lists
• E.truelist which is the list of the (addresses of the) jump
statements appearing in the translation of E and forwarding to
E.true.
• E.falselist which is the list of the (addresses of the) jump
statements appearing in the translation of E and forwarding to
E.false.
• When the label E.true (resp. E.false) is eventually defined we can
walk down the list, patching in the value of its address.
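• A sketch of the three standard helper functions in C
(the linked-list representation is an assumption; in
practice the unfilled jump instructions themselves are
often chained together to avoid extra storage):

/* Backpatching helpers: makelist, merge, and backpatch. */
#include <stdlib.h>

struct List { int instr; struct List *next; };

static int target[1000];   /* jump-target slot per instruction */

/* makelist(i): a new list containing only instruction index i. */
struct List *makelist(int i) {
    struct List *p = malloc(sizeof *p);
    p->instr = i;
    p->next = NULL;
    return p;
}

/* merge(p1, p2): the concatenation of the two lists. */
struct List *merge(struct List *p1, struct List *p2) {
    if (p1 == NULL) return p2;
    struct List *p = p1;
    while (p->next != NULL) p = p->next;
    p->next = p2;
    return p1;
}

/* backpatch(p, i): insert i as the target label for each of
 * the jump instructions on list p. */
void backpatch(struct List *p, int i) {
    for (; p != NULL; p = p->next)
        target[p->instr] = i;
}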
Switch-Statements
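One common layout for the translation (following the
standard treatment) evaluates the selector once and
collects the tests after the code for the cases, so that

    switch (E) { case V1: S1  case V2: S2  default: Sd }

translates to

            code to evaluate E into t
            goto test
    L1:     code for S1
            goto next
    L2:     code for S2
            goto next
    Ld:     code for Sd
            goto next
    test:   if t = V1 goto L1
            if t = V2 goto L2
            goto Ld
    next: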
Intermediate Code for Procedures
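Function calls translate into param instructions for the
arguments followed by a call. For example, n = f(a1, a2)
becomes:

    param a1
    param a2
    t1 = call f, 2
    n = t1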
Unit 4 Part-2
• Run-Time Environments: Storage Organization, Stack
Allocation of Space, Access to Nonlocal Data on the
Stack, Heap Management, Introduction to Garbage
Collection, Introduction to Trace-Based Collection.
• The compiler must cooperate with the OS and other system
software to support the implementation of different abstractions
(names, scopes, bindings, data types, operators, procedures,
parameters, flow of control) on the target machine.
• The compiler does this by creating and managing a run-time
environment in which it assumes its target programs are executed.
• The run-time environment deals with:
– Layout and allocation of storage for the objects named in the
source program
– The mechanisms used by the target program to access
variables
– Linkage between procedures
– The mechanisms for parameter passing
– Interfaces to the OS, I/O devices, and other programs
Storage Organization
• The executing target program runs in its own logical
address space in which each program value has a
location.
• The management and organization of this logical
address space is shared between the compiler,
operating system, and target machine.
• The operating system maps the logical addresses into
physical addresses, which are usually spread
throughout memory.
• The run-time representation of an object program in
the logical address space consists of data and program
areas as shown in Fig.
(Example: a declaration such as int a; names a data
location, while goto L1; refers to a code location.)
• The storage layout for data objects is strongly
influenced by the addressing constraints of the
target machine.
• The size of the generated target code is fixed at
compile time, so the compiler can place the
executable target code in a statically determined
area Code, usually in the low end of memory.
• Similarly, the size of some program data objects,
such as global constants, and data generated by
the compiler, such as information to support
garbage collection, may be known at compile
time, and these data objects can be placed in
another statically determined area called Static.
• Dynamic storage areas – their sizes change during
program execution:
• Heap – grows towards higher addresses and
stores data allocated under program control.
• Stack – grows towards lower addresses and
stores the activation records generated
during procedure calls.
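The resulting subdivision of the logical address space
(as in the book's figure) looks like:

    +------------------+  low addresses
    |       Code       |
    +------------------+
    |      Static      |
    +------------------+
    |       Heap       |
    |        |         |  grows toward higher addresses
    |        v         |
    +------------------+
    |   Free Memory    |
    +------------------+
    |        ^         |
    |        |         |  grows toward lower addresses
    |      Stack       |
    +------------------+  high addresses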
Static Versus Dynamic Storage
Allocation
• The terms static and dynamic distinguish between
compile time and run time, respectively.
• A storage-allocation decision is static if it
can be made by the compiler looking only at the
text of the program, not at what the program
does when it executes.
• Conversely, a decision is dynamic if it can be
decided only while the program is running.
Stack Allocation of Space
• Almost all compilers for languages that use
procedures, functions, or methods as units of
user-defined actions manage at least part of
their run-time memory as a stack.
• Each time a procedure is called, space for its
local variables is pushed onto a stack, and
when the procedure terminates, that space is
popped off the stack.
Activation Trees
• The activations of procedures during the running of an
entire program are represented by a tree, called an
activation tree.
• Each node corresponds to one activation, and the root is
the activation of the “main" procedure that initiates
execution of the program.
• At a node for an activation of procedure p, the children
correspond to activations of the procedures called by this
activation of p.
• We show these activations in the order that they are called,
from left to right.
• Notice that one child must finish before the activation to its
right can begin.
• The activation tree is used to show the way control enters
and leaves activations.
• In an activation tree
▪ Each node represents an activation of a procedure.
▪ The root represents the activation of the main
program.
▪ The node A is a parent of the node B iff the control
flows from A to B.
▪ The node A is to the left of the node B iff the lifetime of A
occurs before the lifetime of B.
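▪ For instance, in the book's quicksort example, main
calls readArray and then quicksort(1,9), which in turn
calls partition(1,9) and two smaller quicksorts, giving
a tree shaped like:

    main
      readArray
      quicksort(1,9)
        partition(1,9)
        quicksort(1,3)
          ...
        quicksort(5,9)
          ...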
Activation Records
• Procedure calls and returns are usually managed
by a run-time stack called the control stack.
Each live activation has an activation record
(sometimes called a frame) on the control stack,
with the root of the activation tree at the
bottom, and the entire sequence of activation
records on the stack corresponding to the path
in the activation tree to the activation where
control currently resides.
• The latter activation has its record at the top of
the stack.
The contents of activation records vary with the language being implemented.
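A general activation record (as in the book's figure) contains,
from the end nearest the caller's record to the top of the stack:
• Actual parameters
• Returned values
• Control link (to the caller's activation record)
• Access link (to nonlocal data held in another record)
• Saved machine status (return address and saved registers)
• Local data
• Temporaries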
Calling Sequences
• Procedure calls are implemented by what are
known as calling sequences, which consist of
code that allocates an activation record on the
stack and enters information into its fields.
• A return sequence is similar code to restore
the state of the machine so the calling
procedure can continue its execution after the
call.
• When designing calling sequences and the
layout of activation records, the following
principles are helpful:
• 1. Values communicated between caller and
callee are generally placed at the beginning of
the callee's activation record, so they are as
close as possible to the caller's activation
record.
• 2. Fixed-length items are generally placed in
the middle. (typically include the control link,
the access link, and the machine status fields.)
• 3. Items whose size may not be known early
enough are placed at the end of the activation
record. Most local variables have a fixed
length, which can be determined by the
compiler by examining the type of the
variable.
• 4. We must locate the top-of-stack pointer
judiciously. A common approach is to have it
point to the end of the fixed-length fields in
the activation record.
• The register top_sp points to the end of the machine-
status field in the current top activation record.
• The calling sequence and its division between caller
and callee are as follows:
1. The caller evaluates the actual parameters.
2. The caller stores a return address and the old value
of top_sp into the callee's activation record. The caller
then increments top_sp to the position shown in Fig.
That is, top_sp is moved past the caller's local data and
temporaries and the callee's parameters and status
fields.
3. The callee saves the register values and other status
information.
4. The callee initializes its local data and begins
execution.
• A suitable, corresponding return sequence is:
1. The callee places the return value next to the
parameters, as in Fig.
2. Using information in the machine-status field,
the callee restores top_sp and other registers, and
then branches to the return address that the caller
placed in the status field.
3. Although top_sp has been decremented, the
caller knows where the return value is, relative to
the current value of top_sp; the caller therefore
may use that value.
Heap Management
The heap is the portion of the store that is used for data that lives indefinitely, or until the
program explicitly deletes it. While local variables typically become inaccessible when their
procedures end, many languages enable us to create objects or other data whose existence is
not tied to the procedure activation that creates them. For example, both C++ and Java give
the programmer the operator new to create objects that may be passed (or pointers to them
may be passed) from procedure to procedure, so they continue to exist long after the
procedure that created them is gone. Such objects are stored on a heap.

• The Memory Manager
• The Memory Hierarchy of a Computer
• Locality in Programs
• Reducing Fragmentation
• Manual Deallocation Requests
1. The Memory Manager
The memory manager keeps track of all the free space in heap storage at all times.
It performs two basic functions:
• Allocation.
• Deallocation
Here are the properties we desire of memory managers:
• Space Efficiency.
• Program Efficiency.
• Low Overhead.

2. The Memory Hierarchy of a Computer


3. Locality in Programs
• Most programs exhibit a high degree of locality; that is, they spend most of their
time executing a relatively small fraction of the code and touching only a small
fraction of the data. We say that a program has temporal locality if the memory
locations it accesses are likely to be accessed again within a short period of time.
We say that a program has spatial locality if memory locations close to the location
accessed are likely also to be accessed within a short period of time.
• The typical program spends most of its time executing innermost loops and tight
recursive cycles.
• It has been found that many programs exhibit both temporal and spatial locality in
how they access both instructions and data.
• Optimization Using the Memory Hierarchy
• The policy of keeping the most recently used instructions in the cache tends to
work well; in other words, the past is generally a good predictor of future memory
usage. When a new instruction is executed, there is a high probability that the next
instruction also will be executed.
• We can also improve the temporal and spatial locality of data accesses in a program
by changing the data layout or the order of the computation.
4. Reducing Fragmentation
• Best-Fit and Next-Fit Object Placement
Best-fit placement allocates the requested memory from the
smallest free hole that is large enough; next-fit resumes the
search from where the previous allocation stopped, which tends
to speed up allocation.
• Managing and Coalescing Free Space
When an object is deallocated manually, the memory manager must make its chunk free,
so it can be allocated again. In some circumstances, it may also be possible to combine
(coalesce) that chunk with adjacent chunks of the heap, to form a larger chunk. There is
an advantage to doing so, since we can always use a large chunk to do the work of small
chunks of equal total size, but many small chunks cannot hold one large object, as the
combined chunk could
• Below Figure shows part of a heap with three adjacent chunks, A, B, and C. Chunk B,
of size 100, has just been deallocated and returned to the free list. Since we know the
beginning (left end) of B, we also know the end of the chunk that happens to be
immediately to B's left, namely A in this example. The free/used bit at the right end of A
is currently 0, so A too is free. We may therefore coalesce A and B into one chunk of 300
bytes.
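A sketch in C of the boundary-tag bookkeeping behind this example
(the chunk layout is an assumption for illustration: every chunk
carries a {size, free} tag at both ends, where size counts the
whole chunk including its two tags):

#include <stddef.h>

struct Tag { size_t size; int free; };

/* The tag that ends the chunk immediately to the left of the
 * chunk whose leading tag is `chunk`. */
static struct Tag *left_end_tag(struct Tag *chunk) {
    return (struct Tag *)((char *)chunk - sizeof(struct Tag));
}

/* On deallocation of `chunk`: if its left neighbor is free,
 * coalesce the two into one larger free chunk (the symmetric
 * check of the right neighbor is analogous and omitted). */
void coalesce_left(struct Tag *chunk) {
    struct Tag *lt = left_end_tag(chunk);
    if (lt->free) {
        size_t merged = lt->size + chunk->size;
        /* the left chunk's leading tag sits lt->size bytes
           before the start of `chunk` */
        struct Tag *head = (struct Tag *)((char *)chunk - lt->size);
        head->size = merged;
        head->free = 1;
        /* rewrite the trailing tag of the merged chunk */
        struct Tag *tail = (struct Tag *)
            ((char *)head + merged - sizeof(struct Tag));
        tail->size = merged;
        tail->free = 1;
    }
}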
5. Manual Deallocation Requests
• Any storage that will no longer be accessed should be deleted.
• Problems with Manual Deallocation
• Manual memory management is error-prone.
• It is hard for programmers to tell if a program will never refer to some storage in the
future, so the most common mistake is not deleting storage that will never be referenced,
causing a memory leak.
• Automatic garbage collection gets rid of memory leaks by deallocating all the garbage.

• Programming Conventions and Tools
• Object ownership is useful when an object's lifetime can be statically reasoned about.
• Reference counting is useful when an object's lifetime needs to be determined
dynamically.
• Region-based allocation is useful for collections of objects whose lifetimes are tied to
specific phases in a computation.
Introduction to Garbage Collection
• Data that cannot be referenced is generally known as garbage.
• Many high-level programming languages remove the burden of
manual memory management from the programmer by offering
automatic garbage collection, which deallocates unreachable data.
• Design Goals for Garbage Collectors
Garbage collection is the reclamation of chunks of storage
holding objects that can no longer be accessed by a program.
• A Basic Requirement: Type Safety
• Performance Metrics – Overall Execution Time, Space Usage,
Pause Time, Program Locality
• Reachability – Reachability becomes a bit more complex when the
program has been optimized by the compiler. First, a compiler may
keep reference variables in registers. These references must also be
considered part of the root set. Second, even though in a type-safe
language programmers do not get to manipulate memory addresses
directly, a compiler often does so for the sake of speeding up the
code.
There are four basic operations that a mutator performs to change the set of reachable
objects:
Object Allocations
Parameter Passing and Return Values
Reference Assignments
Procedure Returns
There are two basic ways to find unreachable objects. Either we catch the transitions as
reachable objects turn unreachable, or we periodically locate all the reachable objects and
then infer that all the other objects are unreachable.
Reference Counting Garbage Collectors
With a reference-counting garbage collector, every object must have a field for the
reference count. Reference counts can be maintained as follows:
1. Object Allocation. The reference count of the new object is set to 1.
2. Parameter Passing. The reference count of each object passed into a procedure is
incremented.
3. Reference Assignments. For the statement u = v, where u and v are references, the
reference count of the object referred to by v goes up by one, and the count for the old
object referred to by u goes down by one.
4. Procedure Returns. As a procedure exits, objects referred to by the local variables in its
activation record have their counts decremented. If several local variables hold references
to the same object, that object's count must be decremented once for each such reference.
5. Transitive Loss of Reachability. Whenever the reference count of an object becomes zero,
we must also decrement the count of each object pointed to by a reference within the
object.
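A minimal C sketch of rules 1, 3, and 5 (Obj, new_obj, release,
and rc_assign are illustrative names, not a fixed API):

#include <stdlib.h>

struct Obj {
    int rc;               /* the reference-count field        */
    struct Obj *child;    /* a reference held inside the obj  */
};

/* Rule 1: a newly allocated object starts with count 1. */
struct Obj *new_obj(void) {
    struct Obj *o = calloc(1, sizeof *o);
    o->rc = 1;
    return o;
}

/* Rule 5: when a count drops to zero, decrement the counts of
 * the objects it references, transitively, then reclaim it. */
void release(struct Obj *o) {
    if (o != NULL && --o->rc == 0) {
        release(o->child);
        free(o);
    }
}

/* Rule 3: for u = v, increment v's count and decrement the old
 * object's count (reclaiming it if the count reaches zero). */
void rc_assign(struct Obj **u, struct Obj *v) {
    if (v != NULL) v->rc++;
    release(*u);
    *u = v;
}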
Reference counting has two main disadvantages: it cannot collect unreachable, cyclic data
structures, and it is expensive. Cyclic data structures are quite plausible; data structures
often point back to their parent nodes, or point to each other as cross references.
