
UNIT I-Introduction To Compiler

The document outlines the course structure for Compiler Design at Dr.N.G.P Institute of Technology, detailing prerequisites, objectives, and outcomes. It explains the phases of a compiler, including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and code generation. Additionally, it discusses the role of language processors, types of translators, and error handling in the compilation process.

Uploaded by

fragdhyanesh

Dr.N.G.P Institute of Technology
Coimbatore- 48
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

22UCS403 – COMPILER DESIGN

Dr.K.Moorthi
ASP/CSE
CS8602- COMPILER DESIGN
PREREQUISITES FOR LEARNING
COMPILER DESIGN
• Knowledge of Automata Theory

• Context free languages

• Computer Architecture

• Data structures

• Simple graph algorithms

• Logic or algebra



Course Objective
The main objectives of this course are to:
• Acquire knowledge of the different phases of a compiler and its applications.
• Understand the categorization of tokens by a lexical analyzer and pattern recognition by parsers.
• Understand intermediate code generation and the run-time environment.
• Learn to implement the front end of a compiler.
• Become familiar with code generation schemes and optimization methods.


Course Outcome
At the completion of the course, students should be able to:

• CO1: Analyze the output generated in each phase of the compiler.

• CO2: Design a lexical analyzer for a sample language.

• CO3: Apply different parsing algorithms to develop parsers for a given grammar.

• CO4: Generate intermediate code for programming constructs.

• CO5: Apply optimization techniques in code generation and analyze the issues in code generation.


UNIT I - INTRODUCTION TO COMPILER
Language processors - Structure of a compiler - Grouping of phases
into passes- Compiler construction tools - Applications of compiler
technology: Implementation of high-level programming languages
- Optimizations for computer architectures-Design of new
computer architecture - Program Translations- Software
productivity tools.



Story
• Consider person A, who speaks English, and person B, who speaks Chinese.
• In this analogy, English stands for the high-level programming language in which the program is written, and Chinese stands for the low-level language (machine code or byte code) that the machine understands.
• Suppose A and B meet for a business deal but cannot communicate because of the language barrier. A translator (person C) helps by translating between the two languages so that they understand each other. This is exactly what a compiler does.
• Here, the compiler acts as person C: it translates the entire source code of a program into machine code in one go.

2/12/2025 CS8591- COMPUTER NETWORKS 6


Language Processor
• Computer programs are generally written in high-level languages (like C++, Python, and Java).
• A language processor, or language translator, is a computer program that converts source code from one programming language to another language or to machine code (also known as object code).
• Language processors also find errors during translation.


Language Processor

Types
• Compiler
• Assembler
• Interpreter



TRANSLATOR
• A translator converts a program written in a high-level (or assembly) language into a machine language program that the central processing unit (CPU) can understand. It also detects errors in the program.

TYPES OF TRANSLATORS

1. Interpreter
2. Compiler
3. Assembler


COMPILER
A compiler is a program that takes a program written in a source language (high level) and translates it into an equivalent program in a target language (machine understandable).


Technically the Job of a Compiler
• A compiler is not hardware; it is a piece of software (a set of instructions) that compiles a program.
1. It translates the entire source code of a program into machine code, usually by way of an intermediate code.
2. The generated code is stored in a separate file, which can be executed multiple times.
3. All syntax errors are reported together during compilation.
4. Compiled code runs faster because it has already been converted to machine code that the machine can execute directly.




.c, .bak, .obj, and .exe
1. .c File
Definition: This is the source code file in which you write your C program.
Role: It contains the human-readable C programming code.
Usage:
You create and edit this file in a text editor or an Integrated Development Environment (IDE) like Turbo C or Code::Blocks.
The compiler processes this file to create machine-readable files.
Example:
    #include <stdio.h>
    int main()
    {
        printf("Hello, World!");
        return 0;
    }
Naming Convention: The file name ends with .c (e.g., program.c).


.c, .bak, .obj, and .exe
2. .bak File
Definition: A backup file automatically generated by the IDE or
editor.
Role: Serves as a backup of your .c file in case of accidental
changes or loss.
Usage:
If the IDE creates a .bak file for program.c, it will be named
something like program.c.bak.
You can manually rename it back to .c if needed.
Example:
program.c.bak is a backup of program.c.



.c, .bak, .obj, and .exe
3. .obj File
Definition: This is an object file generated by the compiler after
translating the .c file.
Role: It contains machine-readable code, but it's not yet executable.
The .obj file is an intermediate step between source code and the final
program.
Usage:
After compiling program.c, the compiler generates program.obj
(on Windows).
It serves as input for the linker, which combines it with libraries
and other object files to create an .exe file.
Example:
If you compile program.c, you might get program.obj as an
intermediate file.
.c, .bak, .obj, and .exe
4. .exe File
Definition: This is the executable file that can be run directly on
your computer.
Role: It contains the final machine code that the operating
system can execute.
Usage:
After linking, the .exe file is created from the .obj file.
You can run it on your system by double-clicking it or
executing it in the command prompt.
Example:
If your source file is program.c, the final executable will be
program.exe.
Demo
Type a C program and save it with the .c extension.
View the bin directory.
Compile it, then view the bin or output directory.
Run it, then view the bin or output directory.


Structure or Phases of Compiler
1. Lexical Analysis: source program -> stream of tokens
2. Syntax Analysis: stream of tokens -> parse tree
3. Semantic Analysis: parse tree -> parse tree (semantically verified)
4. Intermediate Code Generation: verified parse tree -> three-address code
5. Code Optimization: three-address code -> optimized intermediate code
6. Code Generation: optimized intermediate code -> target program

Symbol table management and error handling interact with all of the phases.
Phases : LEXICAL ANALYZER
• The source program is scanned as a stream of characters, and the characters are grouped into sequences called lexemes. For each lexeme, the lexical analyzer produces a token as output.

• Token: A token is a name for a lexical unit, such as a keyword, operator, or identifier, whose lexemes match a given pattern.

• Lexeme: A lexeme is an instance of a token, i.e., the actual group of characters in the source program that forms the token.

• Pattern: A pattern describes the rule that the lexemes of a token must follow. It is the structure that must be matched by strings.

• When a token for an identifier is generated, the corresponding entry is made in the symbol table.

• Input: stream of characters

• Output: stream of tokens


Phases : LEXICAL ANALYZER
Example
• c=a+b*5;
• Token template: <token-name, attribute-value>
• <id, 1> <=> <id, 2> <+> <id, 3> <*> <5>

Lexeme    Token
c         identifier
=         assignment symbol
a         identifier
+         + (addition symbol)
b         identifier
*         * (multiplication symbol)
5         5 (number)
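The grouping of characters into lexemes can be sketched in a few lines of Python. This is a minimal hand-written scanner, not the course's reference implementation: the function name `tokenize`, the token name `num`, and the set of recognized operator characters are assumptions made for illustration. Identifiers receive their symbol-table index as the attribute value, as in the token template above.

```python
def tokenize(source):
    """Group characters into lexemes and emit (token-name, attribute) pairs."""
    symbol_table = []   # identifier names; list position + 1 is the id index
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            if lexeme not in symbol_table:
                symbol_table.append(lexeme)      # new identifier: enter it in the table
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("num", int(source[start:i])))
        elif ch in "=+-*/();":
            tokens.append((ch, None))            # operators carry no attribute
            i += 1
        else:
            raise ValueError(f"lexical error: unexpected character {ch!r} at position {i}")
    return tokens, symbol_table


tokens, table = tokenize("c=a+b*5;")
print(tokens)
print(table)
```

For the statement on this slide, the identifiers c, a, and b are entered in the symbol table in that order, so they become `<id, 1>`, `<id, 2>`, and `<id, 3>`.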


SYNTAX ANALYZER(PARSER)
• Creates the syntactic structure (generally a parse tree) of the given program.
• A syntax analyzer is also called a parser.
• A parse tree describes a syntactic structure.
• Input: tokens. Output: parse tree.

Example parse tree for a[index] = 4 + 2:

expression
  assign-expression
    expression
      subscript-expression
        expression: identifier (a)
        [
        expression: identifier (index)
        ]
    =
    expression
      additive-expression
        expression: number (4)
        +
        expression: number (2)
SYNTAX ANALYZER(PARSER)
EXAMPLE

c=a+b*5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>
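As a rough illustration of what the parser does with this token stream, here is a small recursive-descent parser in Python. It is a sketch under stated assumptions: the grammar (assignment, `+`, `*`, parentheses), the function name `parse_assignment`, and the nested-tuple tree shape are all choices made for this example, and it builds a condensed syntax tree rather than a full parse tree with one node per grammar symbol.

```python
def parse_assignment(tokens):
    """Parse  id = expr [;]  and return a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expect(kind):
        if peek() != kind:
            raise SyntaxError(f"expected {kind!r}, found {peek()!r}")
        return advance()

    def factor():                     # factor -> id | num | ( expr )
        if peek() in ("id", "num"):
            return advance()
        expect("(")
        node = expr()
        expect(")")
        return node

    def term():                       # term -> factor { * factor }
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():                       # expr -> term { + term }
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    target = expect("id")
    expect("=")
    tree = ("=", target, expr())
    if peek() == ";":
        advance()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


tokens = [("id", 1), ("=", None), ("id", 2), ("+", None),
          ("id", 3), ("*", None), ("num", 5), (";", None)]
tree = parse_assignment(tokens)
print(tree)
```

Because `term` is called from inside `expr`, the multiplication binds more tightly than the addition, so `b*5` becomes a subtree of the `+` node.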
SEMANTIC ANALYZER

• The semantics of a program are its "meaning", as opposed to its syntax, or structure.
• Semantic analysis determines some of the program's run-time behavior prior to execution.
• It covers declarations and type checking.

c=a+b*5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>
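Type checking on the running example can be sketched as a tree walk that inserts a conversion wherever an integer meets a float. The sketch assumes that a, b, and c are all declared float (the slides do not state this, but the inttofloat in the later three-address code suggests it); the names `check` and `widen` and the tuple tree shape are hypothetical.

```python
def widen(node, type_name):
    """Wrap an int-valued node in an explicit int-to-float conversion."""
    return ("inttofloat", node) if type_name == "int" else node


def check(node, types):
    """Return (annotated_node, type); raise on undeclared names or bad assignments."""
    kind = node[0]
    if kind == "id":
        if node[1] not in types:
            raise NameError(f"undeclared identifier {node[1]!r}")
        return node, types[node[1]]
    if kind == "num":
        return node, "int"
    op, left, right = node
    left, left_type = check(left, types)
    right, right_type = check(right, types)
    if op == "=":
        if left_type == "float":
            right = widen(right, right_type)
        elif right_type == "float":
            raise TypeError("cannot assign a float value to an int variable")
        return (op, left, right), left_type
    if left_type != right_type:       # mixed int/float arithmetic: widen the int side
        left, right = widen(left, left_type), widen(right, right_type)
        return (op, left, right), "float"
    return (op, left, right), left_type


declared = {"a": "float", "b": "float", "c": "float"}   # assumed declarations
tree = ("=", ("id", "c"), ("+", ("id", "a"), ("*", ("id", "b"), ("num", 5))))
annotated, result_type = check(tree, declared)
print(annotated)
```

The only change to the tree is that the constant 5 is wrapped in an `inttofloat` node, because it is multiplied by the float variable b.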


INTERMEDIATE CODE GENERATION
• Produces an intermediate representation of the source program, in one of the following forms:
• Postfix notation
• Three-address code (of the form a := b op c)
• Syntax tree
Example: Three-Address Code
c=a+b*5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>

t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
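Three-address code like the example above can be produced by a post-order walk of the type-checked syntax tree, creating one temporary per operation. The following is a minimal sketch; the function name `gen_tac` and the tuple tree shape (with identifiers already replaced by id1, id2, id3) are assumptions for illustration.

```python
def gen_tac(tree):
    """Translate an assignment tree into a list of three-address statements."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        kind = node[0]
        if kind in ("id", "num"):
            return str(node[1])                  # leaves need no code
        if kind == "inttofloat":
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        op, left, right = node                   # binary operator
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, value = tree
    value_name = walk(value)
    code.append(f"{target[1]} = {value_name}")
    return code


tree = ("=", ("id", "id1"),
        ("+", ("id", "id2"), ("*", ("id", "id3"), ("inttofloat", ("num", 5)))))
tac = gen_tac(tree)
print("\n".join(tac))
```

Operands are visited left to right and each operator is emitted after its operands, which is why the conversion of 5 receives the first temporary.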
CODE OPTIMIZATION
• Gets the intermediate code as input and produces optimized intermediate code as output.
• It results in faster-running machine code.
• It typically does so by reducing the number of instructions the program executes.
• This phase removes redundant code and attempts to improve the intermediate code so that faster-running machine code will result.
• Code optimization must not change the result of the program.
• To improve code generation, optimization involves:
Detection and removal of dead code (unreachable code).
Calculation of constants in expressions and terms.
Collapsing of repeated expressions into a temporary variable.
Loop unrolling.
Moving code outside the loop.
Removal of unwanted temporary variables.

Example: c=a+b*5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>

Before optimization:
t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

After optimization:
t1 = id3 * 5.0
id1 = id2 + t1
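The before/after example can be reproduced with three small passes over the intermediate code. This is only a sketch: instructions are represented as hypothetical `(dest, op, arg1, arg2)` tuples, the copy-removal pass assumes each temporary is used exactly once (true here, not in general), and real optimizers rely on data-flow analysis rather than these local rules.

```python
def is_temp(name):
    return name is not None and name[0] == "t" and name[1:].isdigit()


def optimize(code):
    # Pass 1: evaluate inttofloat(<integer literal>) at compile time.
    consts, folded = {}, []
    for dest, op, a, b in code:
        a, b = consts.get(a, a), consts.get(b, b)
        if op == "inttofloat" and a.isdigit():
            consts[dest] = str(float(int(a)))    # e.g. '5' becomes '5.0'
        else:
            folded.append((dest, op, a, b))

    # Pass 2: merge "x = tN" into the instruction that just computed tN.
    merged = []
    for dest, op, a, b in folded:
        if op == "copy" and is_temp(a) and merged and merged[-1][0] == a:
            _, prev_op, prev_a, prev_b = merged.pop()
            merged.append((dest, prev_op, prev_a, prev_b))
        else:
            merged.append((dest, op, a, b))

    # Pass 3: renumber the surviving temporaries t1, t2, ...
    names, final = {}, []
    for dest, op, a, b in merged:
        a, b = names.get(a, a), names.get(b, b)
        if is_temp(dest):
            names[dest] = f"t{len(names) + 1}"
            dest = names[dest]
        final.append((dest, op, a, b))
    return final


code = [("t1", "inttofloat", "5", None),
        ("t2", "*", "id3", "t1"),
        ("t3", "+", "id2", "t2"),
        ("id1", "copy", "t3", None)]
optimized = optimize(code)
for instruction in optimized:
    print(instruction)
```

The four instructions shrink to two: the conversion is done once at compile time, and the final copy is folded into the addition.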
CODE GENERATOR

• Gets input from the code optimization phase and produces the target code or object code as the result.
• Intermediate instructions are translated into a sequence of machine instructions that perform the same task.

Example: c=a+b*5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>

Intermediate code:
t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Optimized code:
t1 = id3 * 5.0
id1 = id2 + t1

Target code:
LDF R2, id3
MULF R2, #5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
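A very small code generator for the optimized example might look like the sketch below. It assumes the two-operand LDF/MULF/ADDF/STF instruction forms shown on this slide, an unlimited supply of registers, and hypothetical `(dest, op, arg1, arg2)` instruction tuples. Registers are handed out in increasing order, so the register numbers differ from the slide even though the instruction sequence is the same.

```python
OPCODES = {"+": "ADDF", "*": "MULF"}   # floating-point add and multiply


def is_temp(name):
    return name[0] == "t" and name[1:].isdigit()


def is_number(text):
    return text.replace(".", "", 1).isdigit()


def generate(code):
    asm = []
    reg_of = {}            # temporary name -> register currently holding it
    next_reg = 1

    def load(value):
        nonlocal next_reg
        reg = f"R{next_reg}"
        next_reg += 1
        source = f"#{value}" if is_number(value) else value
        asm.append(f"LDF {reg}, {source}")
        return reg

    for dest, op, a, b in code:
        reg = reg_of[a] if a in reg_of else load(a)
        if b in reg_of:
            src = reg_of[b]                # operand already in a register
        elif is_number(b):
            src = f"#{b}"                  # immediate constant
        else:
            src = load(b)                  # bring a memory operand into a register
        asm.append(f"{OPCODES[op]} {reg}, {src}")
        if is_temp(dest):
            reg_of[dest] = reg             # keep the temporary in its register
        else:
            asm.append(f"STF {dest}, {reg}")
    return asm


optimized = [("t1", "*", "id3", "5.0"), ("id1", "+", "id2", "t1")]
asm = generate(optimized)
print("\n".join(asm))
```

A production code generator would also have to choose among a limited set of registers and spill values to memory when it runs out; that is left out here.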
SYMBOL TABLE

• Used to store all the information about the identifiers used in the program.
• It is a data structure containing a record for each identifier, with fields for the attributes of the identifier.
• It allows the compiler to find the record for each identifier quickly and to store or retrieve data from that record.
• Whenever an identifier is detected in any of the phases, it is stored in the symbol table.

Example: int a, b; float c; char z;

Symbol name   Type    Address
a             int     1000
b             int     1002
c             float   1004
z             char    1008
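The table above can be modelled with a dictionary keyed by identifier name, which gives the quick lookup the slide asks for. In this sketch the class name `SymbolTable`, the base address 1000, and the type sizes (2-byte int, 4-byte float, 1-byte char, chosen so that the addresses match the table) are assumptions.

```python
SIZES = {"int": 2, "float": 4, "char": 1}   # assumed sizes in bytes


class SymbolTable:
    def __init__(self, base_address=1000):
        self.records = {}                   # name -> record of attributes
        self.next_address = base_address

    def declare(self, name, type_name):
        if name in self.records:
            raise ValueError(f"double declaration of {name!r}")
        self.records[name] = {"type": type_name, "address": self.next_address}
        self.next_address += SIZES[type_name]

    def lookup(self, name):
        return self.records[name]


table = SymbolTable()
for name, type_name in [("a", "int"), ("b", "int"), ("c", "float"), ("z", "char")]:
    table.declare(name, type_name)

for name, record in table.records.items():
    print(name, record["type"], record["address"])
```

Declaring the same name twice raises an error, which corresponds to the "double declaration" error listed under table management later in the unit.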


ERROR HANDLING

Lexical analysis:
• Faulty sequence of characters which does not result in a token, e.g. Ö, 5EL, %K, 'string
Syntax analysis:
• Syntax error, e.g. a missing semicolon or unbalanced parentheses such as (4 * (y + 5) - 12))
Semantic analysis:
• Type conflict, e.g. 'HEJ' + 5
Code optimization:
• Uninitialized variables, anomaly detection.


ERROR HANDLING
ERRORS ENCOUNTERED IN DIFFERENT PHASES
Code generation:
• Integers too large for the target machine, running out of memory.
Table management:
• Double declaration, table overflow.

• A good compiler finds an error at the earliest opportunity. Usually, some errors are left to run time, e.g. an array index out of bounds.
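As one concrete case of early error detection, the unbalanced-parenthesis example from the syntax analysis entry can be caught with a simple depth counter. This is an illustrative sketch only (the function name `check_parens` and the message wording are made up); a real parser detects the same error as a by-product of matching the grammar.

```python
def check_parens(text):
    """Return an error message for unbalanced parentheses, or None if balanced."""
    depth = 0
    for position, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # a ')' with no matching '(' before it
                return f"syntax error: unmatched ')' at position {position}"
    if depth > 0:
        return f"syntax error: {depth} unclosed '('"
    return None


expression = "(4 * (y + 5) - 12))"
print(check_parens(expression))
print(check_parens("(4 * (y + 5) - 12)"))
```

The error is reported at the first character that makes the input invalid, which is the "earliest opportunity" principle stated above.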


Structure or Phases of Compiler
Eg1: Translation of a Statement



Structure or Phases of Compiler

Eg2: Translation of a Statement



Self Assessment
• Illustrate the output of each phase for the input a=(b+c)*(b+c)*2


Grouping of Phases
Front end and back end
• The phases of a compiler can be grouped into a front end and a back end.
• The front end consists of those phases that depend primarily on the source language and are independent of the target language.
• The front end is the analysis part. Typically it includes lexical analysis, syntax analysis, and semantic analysis. Some amount of code optimization can also be done in the front end.
• The back end consists of those phases that depend on the target language and are independent of the source language.
• It includes code optimization and code generation.


Grouping of Phases

The front end/back end model of a compiler is advantageous for the following reasons:
1. By keeping the same front end and attaching different back ends, one can produce compilers for the same source language on different machines.
2. By keeping different front ends and the same back end, one can compile several different languages on the same machine.


Grouping of Phases
Concept of Pass
• One complete scan of the source program is called a pass.
• A pass includes reading an input file and writing an output file. Many phases can be grouped into one pass.
• It is difficult to compile a source program in a single pass, because the program may contain forward references. On the other hand, it is desirable to have relatively few passes, because reading and writing intermediate files takes time.
• However, if we group several phases into one pass, we may be forced to keep the entire program in memory, so the memory requirement may be large.


Grouping of Phases

• In the first pass, the source program is scanned completely, and the output is an easy-to-use form that is equivalent to the source program, along with additional information about storage allocation.
• It is possible to leave blank slots for missing information and fill each slot when the information becomes available. Where this is not possible, more than one pass is required in the compilation process.
• A typical arrangement for an optimizing compiler is one pass for scanning and parsing, one pass for semantic analysis, and a third pass for code generation and target code optimization. C and Pascal permit one-pass compilation; Modula-2 requires two passes.
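The idea of leaving a blank slot and filling it later (often called backpatching) can be sketched with a toy one-pass translator for jumps and labels. Everything here is hypothetical: the input format (`jump NAME`, `NAME:`, anything else is an ordinary instruction) and the function name `assemble` exist only for this example.

```python
def assemble(lines):
    """Resolve jump targets in one pass, patching forward references afterwards."""
    code = []        # emitted instructions
    labels = {}      # label name -> index of the instruction it marks
    pending = {}     # label name -> indices of jumps still waiting for it
    for line in lines:
        if line.endswith(":"):                       # label definition
            label = line[:-1]
            labels[label] = len(code)
            for index in pending.pop(label, []):     # fill the blank slots
                code[index] = ("jump", labels[label])
        elif line.startswith("jump "):
            label = line.split()[1]
            if label in labels:                      # backward reference: known
                code.append(("jump", labels[label]))
            else:                                    # forward reference: blank slot
                pending.setdefault(label, []).append(len(code))
                code.append(("jump", None))
        else:
            code.append((line,))
    if pending:
        raise NameError(f"undefined labels: {sorted(pending)}")
    return code


program = ["jump end", "work", "end:", "halt"]
result = assemble(program)
print(result)
```

The jump to `end` is emitted before `end` is defined, with `None` as its target, and is patched to instruction index 2 once the label is seen. Without this trick, a second pass over the program would be needed.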
Grouping of Phases

Factors Affecting the Number of Passes
• Various factors affect the number of passes in a compiler:
1. Forward references
2. Storage limitations
3. Optimization
• A compiler may require more than one pass to fill in information that only becomes available later in the program.




COMPILER CONSTRUCTION TOOLS

1. Parser generators.
2. Scanner generators.
3. Syntax-directed translation engines.
4. Automatic code generators.
5. Data-flow analysis engines.



COMPILER CONSTRUCTION TOOLS

PARSER GENERATORS
• Input: Grammatical description of a programming language
• Output: Syntax analyzers.
• Parser generator takes the grammatical description of a programming
language and produces a syntax analyzer.
SCANNER GENERATORS
• Input: Regular expression description of the tokens of a language
• Output: Lexical analyzers.
• Scanner generator generates lexical analyzers from a regular expression
description of the tokens of a language.
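To make the scanner-generator idea concrete, the sketch below builds a lexical analyzer from a list of (token name, regular expression) pairs using Python's standard `re` module. It is a toy stand-in for a real scanner generator: the function name `make_scanner`, the token names, and the convention that a token named SKIP is discarded are all assumptions.

```python
import re


def make_scanner(spec):
    """Build a scanner function from (token_name, regex) pairs."""
    # Combine all patterns into one alternation of named groups.
    master = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in spec))

    def scan(text):
        tokens, pos = [], 0
        while pos < len(text):
            match = master.match(text, pos)
            if match is None:
                raise ValueError(f"lexical error at position {pos}: {text[pos]!r}")
            if match.lastgroup != "SKIP":            # drop whitespace
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return scan


scan = make_scanner([
    ("ID", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("OP", r"[=+\-*/;]"),
    ("SKIP", r"\s+"),
])
scanned = scan("c = a + b * 5;")
print(scanned)
```

Changing the language's tokens only requires editing the specification list, not the scanning loop, which is the main benefit of generating the scanner from a description.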



COMPILER CONSTRUCTION TOOLS
SYNTAX-DIRECTED TRANSLATION ENGINES
• Input: Parse tree.
• Output: Intermediate code.
• Syntax-directed translation engines produce collections of routines that walk a parse tree and generate intermediate code.
AUTOMATIC CODE GENERATORS
• Input: Intermediate language.
• Output: Machine language.
• A code generator takes a collection of rules that define the translation of each operation of the intermediate language into the machine language for a target machine.


COMPILER CONSTRUCTION TOOLS

DATA-FLOW ANALYSIS ENGINES


• A data-flow analysis engine gathers information about how values are transmitted from one part of a program to each of the other parts. Data-flow analysis is a key part of code optimization.
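One classic data-flow analysis is live-variable analysis, which works backwards through the code to find which values are still needed at each point. The sketch below handles straight-line code only (no branches or loops); the `(dest, uses)` instruction format and the function name `live_variables` are assumptions for illustration.

```python
def live_variables(code, live_out):
    """Return, for each instruction, the set of variables live just before it."""
    live = set(live_out)
    live_before = []
    for dest, uses in reversed(code):
        # A definition kills the old value of dest; each use makes a variable live.
        live = (live - {dest}) | set(uses)
        live_before.append(set(live))
    live_before.reverse()
    return live_before


# t1 = id3 * 5.0 ; t2 = id2 + t1 ; id1 = t2
code = [("t1", ["id3"]), ("t2", ["id2", "t1"]), ("id1", ["t2"])]
liveness = live_variables(code, live_out={"id1"})
for (dest, uses), live in zip(code, liveness):
    print(dest, uses, sorted(live))
```

An instruction whose destination is not live afterwards computes a value nobody reads, so an optimizer can use this information to delete it or to reuse its register.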



APPLICATIONS OF COMPILER TECHNOLOGY
• Implementation of High-Level Programming
Languages
• Optimizations for computer architectures
• Design of new computer architecture
• Program Translations
• Software productivity tools



APPLICATIONS OF COMPILER TECHNOLOGY
Implementation of High-Level Programming Languages
• A high-level programming language defines a programming abstraction: the programmer expresses an algorithm using the language, and the compiler must translate that program to the target language.
• Generally, higher-level programming languages are easier to program in, but are less efficient; that is, the target programs run more slowly.
• Programmers using a low-level language have more control over a computation and can, in principle, produce more efficient code.
• Unfortunately, lower-level programs are harder to write and, worse still, less portable, more prone to errors, and harder to maintain.
• Optimizing compilers include techniques to improve the performance of generated code, thus offsetting the inefficiency introduced by high-level abstractions.


APPLICATIONS OF COMPILER TECHNOLOGY
Optimizations for Computer Architectures
• The rapid evolution of computer architectures has also led to an insatiable demand for new compiler technology.
• Almost all high-performance systems take advantage of the same two basic techniques: parallelism and memory hierarchies.
• Parallelism can be found at several levels: at the instruction level, where multiple operations are executed simultaneously, and at the processor level, where different threads of the same application are run on different processors.
• Memory hierarchies are a response to the basic limitation that we can build very fast storage or very large storage, but not storage that is both fast and large.


APPLICATIONS OF COMPILER TECHNOLOGY
Design of New Computer Architectures
• In the early days of computer architecture design, compilers were developed after the machines were built.
• That has changed. Since programming in high-level languages is the norm, the performance of a computer system is determined not only by its raw speed but also by how well compilers can exploit its features.
• Thus, in modern computer architecture development, compilers are developed in the processor-design stage, and compiled code, running on simulators, is used to evaluate the proposed architectural features.
APPLICATIONS OF COMPILER TECHNOLOGY
Program Translations
• While we normally think of compiling as a translation from a
high-level language to the machine level, the same technology
can be applied to translate between different kinds of languages.
• The following are some of the important applications of program-translation techniques:

• Binary Translation
• Hardware Synthesis
• Database Query Interpreters
• Compiled Simulation


APPLICATIONS OF COMPILER TECHNOLOGY
Software Productivity Tools
• Programs are arguably the most complicated engineering
artifacts ever produced; they consist of many details, every one
of which must be correct before the program will work
completely.
• As a result, errors are rampant in programs; errors may crash a
system, produce wrong results, render a system vulnerable to
security attacks, or even lead to catastrophic failures in critical
systems.
• Testing is the primary technique for locating errors in programs.



Thank you
