Introduction To CD

The document provides an overview of compiler design, covering essential topics such as lexical analysis, syntactic analysis, and code generation. It discusses the historical development of compilers, their architecture, and the significance of studying compilers in programming and software engineering. Additionally, it outlines the roles of different components within a compiler, including the front-end and back-end processes, and emphasizes the importance of error detection and optimization.

Compiler Design

Parimal Kumar Giri

Lecture-Module 1
INTRODUCTION TO COMPILERS: AN OVERVIEW

 Compilers: Principles, Techniques, and Tools, Aho, Sethi and Ullman
 http://dragonbook.stanford.edu/
Topics
 Overview of compilers
 Lexical analysis (Scanning)
 Syntactic analysis (Parsing)
 Context-sensitive analysis
 Type checking
 Runtime environments
 Symbol tables
 Intermediate representations
 Intermediate code generation
 Code optimization

A bit of history
 1952: First compiler (linker/loader) written by Grace Hopper for the A-0 programming language
 1957: First complete compiler, for FORTRAN, by John Backus and team
 1960: COBOL compilers for multiple architectures
 1962: First self-hosting compiler, for LISP (List Processing)
Compiler learning
 Isn’t it an old discipline?
 Yes, it is a well-established discipline
 Algorithms, methods, and techniques were researched and developed in the early stages of computer science
 There are many compilers around, and many tools to generate them automatically
 So why do we need to learn it?
 You may never write a full compiler,
 but the techniques you learn are useful in many tasks, such as writing an interpreter for a scripting language, validation checking for forms, and so on
Terminology
 Compiler:
 a program that translates a source program in one language into an executable program in another language
 we expect the program produced by the compiler to be better, in some way, than the original
 Interpreter:
 a program that reads a source program and produces the results of running that program
 usually, this involves executing the source program in some fashion
 Our course is mainly about compilers, but many of the same issues arise in interpreters
Disciplines involved

 Algorithms
 Languages and machines
 Operating systems
 Computer architectures

Compilers
 What is a compiler?
 A program that translates a program in one language (the source language) into an equivalent program in another language (the target language), and reports errors in the source program
 A compiler typically lowers the level of abstraction of the program
 C -> assembly code for EOS machine
 Java -> Java bytecode
 What is an interpreter?
 A program that reads a source program and produces the results of executing that program
 C is typically compiled
 Scheme is typically interpreted
 Java is compiled to bytecodes, which are then
interpreted

Why build compilers?
 Compilers provide an essential interface between applications and architectures
 High level programming languages:
 Increase programmer productivity
 Better maintenance
 Portable
 Low level machine details:
 Instruction selection
 Addressing modes
 Pipelines
 Registers and cache
 Compilers efficiently bridge the gap and shield application developers from low-level machine details
Why study compilers?
 Compilers embody a wide range of theoretical
techniques and their application to practice
 DFAs, PDAs, formal languages, formal grammars, fixpoint algorithms, lattice theory
 Compiler construction teaches programming and
software engineering skills
 Compiler construction involves a variety of areas
 theory, algorithms, systems, architecture
 The techniques used in various parts of compiler
construction are useful in a wide variety of applications
 Many practical applications have embedded languages,
commands, macros, etc.
 Is compiler construction a solved problem?
 No! New developments in programming languages (Java) and
machine architectures (multicore machines) present new
challenges

Compiler Architecture
In more detail:

Source Language -> Front End (language specific: Analysis) -> Intermediate Language -> Back End (machine specific: Synthesis) -> Target Language

•Separation of Concerns
•Retargeting
Abstract view
Source code -> Compiler -> Machine code (errors reported along the way)

 Recognizes legal (and illegal) programs
 Generates correct code
 Manages storage of all variables and code
 Agrees on a format for object (or assembly) code
Front-end, Back-end division
Source code -> Front end -> IR -> Back end -> Machine code (errors reported along the way)

 Front end maps legal code into an IR (Intermediate Representation)
 Back end maps the IR onto the target machine
 Simplifies retargeting
 Allows multiple front ends
 Multiple passes -> better code
Front end
Source code -> Scanner -> tokens -> Parser -> IR (errors reported along the way)

 Recognizes legal code
 Reports errors
 Produces IR
 Builds preliminary storage maps
Front end
Source code -> Scanner -> tokens -> Parser -> IR (errors reported along the way)

 Scanner:
 Maps characters into tokens – the basic unit of syntax
 x = x + y becomes <id, x> = <id, x> + <id, y>
 Typical tokens: number, id, +, -, *, /, do, end
 Eliminates white space (tabs, blanks, comments)
 A key issue is speed, so instead of using a tool like LEX it is sometimes necessary to write your own scanner
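As an illustrative sketch (not the course's scanner), the mapping from characters to tokens can be done with one regular expression per token class; the token names and patterns below are assumptions:

```python
import re

# Hypothetical token classes; the scanner matches the longest pattern at
# each position and skips white space instead of emitting it.
TOKEN_SPEC = [
    ("NUM",  r"\d+"),
    ("ID",   r"[A-Za-z_]\w*"),
    ("EQ",   r"="),
    ("PLUS", r"\+"),
    ("WS",   r"\s+"),            # white space: eliminated, not emitted
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def scan(source):
    tokens = []
    for m in MASTER.finditer(source):
        if m.lastgroup != "WS":  # eliminate tabs, blanks, newlines
            tokens.append((m.lastgroup, m.group()))
    return tokens

print(scan("x = x + y"))
# [('ID', 'x'), ('EQ', '='), ('ID', 'x'), ('PLUS', '+'), ('ID', 'y')]
```

A real scanner would also track line numbers for error messages and resolve keyword-versus-identifier conflicts, which this sketch omits.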
Front end
Source code -> Scanner -> tokens -> Parser -> IR (errors reported along the way)

 Parser:
 Recognizes context-free syntax
 Guides context-sensitive analysis
 Constructs IR
 Produces meaningful error messages
 Attempts error correction
 There are parser generators, like YACC, which automate much of the work
Front end
 Context-free grammars are used to represent programming language syntax:

<expr> ::= <expr> <op> <term> | <term>
<term> ::= <number> | <id>
<op> ::= + | -
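A hedged sketch of how a recursive-descent parser could implement this grammar. Since the <expr> rule is left-recursive, the sketch replaces the recursion with a loop (a standard transformation, not shown on the slide); the tuple-based tree format is an assumption:

```python
def parse_expr(tokens):
    """Parse <expr> ::= <expr> <op> <term> | <term> over a token list.
    Returns a nested-tuple parse tree; left recursion becomes a loop."""
    pos = 0

    def term():
        # <term> ::= <number> | <id>
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return ("number", tok) if tok.isdigit() else ("id", tok)

    tree = term()
    while pos < len(tokens) and tokens[pos] in ("+", "-"):  # <op> ::= + | -
        op = tokens[pos]
        pos += 1
        tree = (op, tree, term())    # left-associative, like the grammar
    return tree

print(parse_expr(["a", "+", "b", "-", "3"]))
# ('-', ('+', ('id', 'a'), ('id', 'b')), ('number', '3'))
```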
Front end
 A parser tries to map a program to the syntactic elements defined in the grammar
 A parse can be represented by a tree, called a parse tree or syntax tree
Front end
 A parse tree can be represented more compactly as an Abstract Syntax Tree (AST)
 The AST is often used as the IR between the front end and the back end
Back end
IR -> Instruction selection -> Register allocation -> Machine code (errors reported along the way)

 Translate the IR into target machine code
 Choose instructions for each IR operation
 Decide what to keep in registers at each point
 Ensure conformance with system interfaces
Back end
IR -> Instruction selection -> Register allocation -> Machine code (errors reported along the way)

 Produce compact, fast code
 Use available addressing modes
Back end
IR -> Instruction selection -> Register allocation -> Machine code (errors reported along the way)

 Have a value in a register when it is used
 Registers are a limited resource
 Optimal allocation is difficult
Traditional three pass compiler
Source code -> Front end -> IR -> Middle end -> IR -> Back end -> Machine code (errors reported along the way)

 Code improvement analyzes and changes the IR
 The goal is to reduce runtime
Desirable Properties of Compilers
 A compiler must generate a correct executable
 The input program and the output program must be equivalent; the compiler should preserve the meaning of the input program
 The output program should run fast
 For optimizing compilers, we expect the output program to be more efficient than the input program
 The compiler itself should be fast
 The compiler should provide good diagnostics for programming errors
 The compiler should support separate compilation
 The compiler should work well with debuggers
 Optimizations should be consistent and predictable
 Compile time should be proportional to code size
What are the issues in compiler construction?
 Source code: written in a high-level programming language
 Target code: assembly language (chapter 9), which in turn is translated to machine code

//simple example
while (sum < total)
{
  sum = sum + x*10;
}

L1: MOV total,R0
    CMP sum,R0
    CJ< L2
    GOTO L3
L2: MOV #10,R0
    MUL x,R0
    ADD sum,R0
    MOV R0,sum
    GOTO L1
L3: first instruction following the while statement
What is the input?

 Input to the compiler is not

//simple example
while (sum < total)
{
sum = sum + x*10;
}

 Input to the compiler is

//simple\bexample\nwhile\b(sum\b<\btotal)\b{\n\tsum\b=
\bsum\b+\bx*10;\n}\n

 How does the compiler recognize the keywords, identifiers, structure, etc.?
The Structure of a Compiler (1)
 Any compiler must perform two major tasks

Compiler = Analysis Phase + Synthesis Phase

 Analysis of the source program (lexical, syntactic, semantic analysis and intermediate code generation)
 Synthesis of a machine-language program (code optimization and code generation)
The Structure of a Compiler (2)
Source Program (character stream) -> Scanner -> Tokens -> Parser -> Syntactic Structure -> Semantic Routines -> Intermediate Representation -> Optimizer -> Code Generator -> Target machine code

Symbol and Attribute Tables (used by all phases of the compiler)
Lexical Analyzer
 The Lexical Analyzer reads the source program character by character and returns the tokens of the source program.
 A token describes a pattern of characters having the same meaning in the source program (such as identifiers, operators, keywords, numbers, delimiters, and so on)

Ex: newval := oldval + 12 => tokens:
newval   identifier
:=       assignment operator
oldval   identifier
+        add operator
12       a number

 Puts information about identifiers into the symbol table.
 Regular expressions are used to describe tokens (lexical constructs).
 A (deterministic) finite state automaton can be used in the implementation of a lexical analyzer.
First step: Lexical analysis (Scanning)
 The compiler scans the input file and produces a stream of tokens

WHILE,LPAREN,<ID,sum>,LT,<ID,total>,RPAREN,LBRACE,
<ID,sum>,EQ,<ID,sum>,PLUS,<ID,x>,TIMES,<NUM,10>,
SEMICOL,RBRACE

 Each token has a corresponding lexeme, the character string that corresponds to the token
 For example, “while” is the lexeme for token WHILE
 “sum”, “x”, “total” are lexemes for token ID
Lexical Analysis (Scanning)
 The compiler uses a set of patterns to specify valid tokens
 tokens: LPAREN, ID, NUM, WHILE, etc.
 Each pattern is specified as a regular expression
 LPAREN should match: (
 WHILE should match: while
 ID should match: [a-zA-Z][0-9a-zA-Z]*
 It uses finite automata to recognize these patterns
 (ID automaton: a transition on a-zA-Z from the start state to an accepting state, with a self-loop on 0-9a-zA-Z)
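The ID automaton can be written as a table-driven DFA. This is a minimal sketch under assumed state names (0 = start, 1 = accepting); missing table entries mean the input is rejected:

```python
def char_class(c):
    """Collapse characters into the classes the automaton distinguishes.
    (isalpha is broader than a-zA-Z; good enough for ASCII input.)"""
    if c.isalpha():
        return "letter"
    return "digit" if c.isdigit() else "other"

# Transition table for ID = [a-zA-Z][0-9a-zA-Z]*
DELTA = {(0, "letter"): 1, (1, "letter"): 1, (1, "digit"): 1}
ACCEPT = {1}

def matches_id(s):
    state = 0
    for c in s:
        state = DELTA.get((state, char_class(c)))
        if state is None:        # no transition: dead state, reject
            return False
    return state in ACCEPT

print(matches_id("sum2"), matches_id("2sum"))
# True False
```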
Lexical analysis (Scanning)
 During the scan, the lexical analyzer gets rid of white space (\b, \t, \n, etc.) and comments
 Important additional task: error messages!
 var&1 -> Error! Not a token!
 whle -> Error? It matches the identifier token.
 Natural language analogy: tokens correspond to words and punctuation symbols in a natural language
Syntax Analyzer
 A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given program.
 A syntax analyzer is also called a parser.
 A parse tree describes a syntactic structure. For newval := oldval + 12:

assgstmt
  identifier (newval)  :=  expression
    expression  +  expression
      identifier (oldval)   number (12)

• In a parse tree, all terminals are at leaves.
• All inner nodes are non-terminals of a context-free grammar.
Next Step: Syntax Analysis (Parsing)
 How does the compiler recognize the structure of the program? Loops, blocks, procedures, nesting?
 Parse the stream of tokens -> parse tree

(Parse tree for the while statement: Stmt -> WhileStmt -> WHILE LPAREN Expr RPAREN Stmt, where the condition Expr is the RelExpr <ID,sum> LT <ID,total>, and the Stmt is a Block LBRACE ... RBRACE containing the AssignStmt <ID,sum> EQ ArithExpr SEMICOL, whose ArithExpr is <ID,sum> PLUS the ArithExpr <ID,x> TIMES <NUM,10>.)
Syntax Analysis (Parsing)
 The syntax of a programming language is defined by a set of recursive rules. These sets of rules are called context-free grammars.

Stmt -> WhileStmt | Block | ...
WhileStmt -> WHILE LPAREN Expr RPAREN Stmt
Expr -> RelExpr | ArithExpr | ...
RelExpr -> ...

 Compilers apply these rules to produce the parse tree
 Again, an important additional task: error messages!
 Missing semicolon, missing parenthesis, etc.
 Natural language analogy: it is similar to parsing English text. Paragraphs, sentences, noun phrases, verb phrases, verbs, prepositions, articles, nouns, etc.
Semantic Analyzer
 A semantic analyzer checks the source program for semantic errors and collects type information for code generation.
 Type checking is an important part of the semantic analyzer.
 Normally, semantic information cannot be represented by the context-free language used in syntax analysis.
 Context-free grammars used in syntax analysis are integrated with attributes (semantic rules)
 the result is a syntax-directed translation
 Attribute grammars
 Ex: newval := oldval + 12
 The type of the identifier newval must match the type of the expression (oldval + 12)
Next Step: Semantic (Context-Sensitive) Analysis
 Are variables declared before they are used?
 We can find out if “whle” is declared by looking at the symbol table
 Do variable types match?
sum = sum + x*10;
 sum can be a floating point number, x can be an integer
 With symbol table entries sum: float and x: int, the tree <id,sum> + (<id,x> * <num,10>) may become <id,sum> + int2float(<id,x> * <num,10>)
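The int2float insertion can be sketched as a small attribute-style type checker over tuple-shaped trees; the node shapes and the symbol table contents are assumptions for illustration:

```python
symbols = {"sum": "float", "x": "int"}   # hypothetical symbol table

def typecheck(node):
    """Return (typed_node, type). Wraps the int operand of a mixed-type
    binary operator in an ('int2float', ...) conversion node."""
    kind = node[0]
    if kind == "id":
        return node, symbols[node[1]]
    if kind == "num":
        return node, "int"
    left, lt = typecheck(node[1])
    right, rt = typecheck(node[2])
    if lt == rt:
        return (kind, left, right), lt
    if lt == "int":                      # convert the int side to float
        left = ("int2float", left)
    else:
        right = ("int2float", right)
    return (kind, left, right), "float"

tree = ("+", ("id", "sum"), ("*", ("id", "x"), ("num", 10)))
typed, t = typecheck(tree)
print(typed)
# ('+', ('id', 'sum'), ('int2float', ('*', ('id', 'x'), ('num', 10))))
```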
Intermediate Code Generation
 A compiler may produce explicit intermediate code representing the source program.
 This intermediate code is generally machine (architecture) independent, but its level is close to the level of machine code.
 Ex:
newval := oldval * fact + 1
id1 := id2 * id3 + 1

MULT id2,id3,temp1     (intermediate code)
ADD temp1,#1,temp2
MOV temp2,id1
Intermediate Representations
 The parse tree representation has too many details: LPAREN, LBRACE, SEMICOL, etc.
 Once the compiler understands the structure of the input program, it does not need these details (they were used to prevent ambiguities)
 Compilers generate a more abstract representation after constructing the parse tree, one which does not include the details of the derivation
 Abstract syntax trees (AST): nodes represent operators, children represent operands

(AST for the while statement: a while node whose children are the comparison <id,sum> < <id,total> and an assign node; the assign node's children are <id,sum> and the sum <id,sum> + (<id,x> * <num,10>).)
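The while-loop AST described above can be represented as nested tuples; this encoding (operator first, then operands) is one common choice, shown here as a sketch along with a simple tree walk:

```python
# AST for: while (sum < total) { sum = sum + x*10; }
# Derivation details (LPAREN, SEMICOL, ...) are gone; only operators remain.
ast = ("while",
       ("<", ("id", "sum"), ("id", "total")),       # condition
       ("assign", ("id", "sum"),
        ("+", ("id", "sum"),
         ("*", ("id", "x"), ("num", 10)))))         # body

def operators(node):
    """Preorder walk collecting operator labels; leaves have none."""
    if node[0] in ("id", "num"):
        return []
    return [node[0]] + [op for child in node[1:] for op in operators(child)]

print(operators(ast))
# ['while', '<', 'assign', '+', '*']
```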
Intermediate Code Generator
 Translates from the abstract syntax tree to intermediate code
 One possibility is 3-address code: each instruction involves at most 3 operands
Example:
temp1 = inttofloat(60)
temp2 = rate * temp1
temp3 = initial + temp2
position = temp3
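A minimal sketch of turning an expression AST into three-address code, in the spirit of the example above; the node shapes and the temporary-naming scheme are assumptions:

```python
temp_count = 0

def new_temp():
    """Return a fresh temporary name (temp1, temp2, ...)."""
    global temp_count
    temp_count += 1
    return f"temp{temp_count}"

def gen(node, code):
    """Emit three-address instructions for node; return its result name."""
    if node[0] in ("id", "num"):
        return str(node[1])
    left = gen(node[1], code)       # each emitted instruction has at most
    right = gen(node[2], code)      # three operands: result = left op right
    t = new_temp()
    code.append(f"{t} = {left} {node[0]} {right}")
    return t

code = []
result = gen(("+", ("*", ("id", "rate"), ("id", "dt")), ("id", "initial")), code)
code.append(f"position = {result}")
print("\n".join(code))
# temp1 = rate * dt
# temp2 = temp1 + initial
# position = temp2
```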
Code Optimizer (for Intermediate Code Generator)

 The code optimizer improves the code produced by the intermediate code generator in terms of time and space.
 Ex:
MULT id2,id3,temp1
ADD temp1,#1,id1
Improving the Code: Code Optimization
 Compilers can improve the quality of code by static analysis
 Data flow analysis, dependence analysis, code transformations, dead code elimination, etc.
 We do not need to recompute x*10 in each iteration of the loop:

Before:
while (sum < total)
{
  sum = sum + x*10;
}

After (transformation to more efficient code):
temp = x*10;
while (sum < total)
{
  sum = sum + temp;
}
Code Generator
 Produces the target language for a specific architecture.
 The target program is normally a relocatable object file containing the machine code.
 Ex: (assume an architecture with instructions in which at least one operand is a machine register)

MOVE id2,R1
MULT id3,R1
ADD #1,R1
MOVE R1,id1
Next Step: Code Generation
 Abstract syntax trees are a high-level intermediate representation used in earlier phases of the compilation
 There are lower-level (i.e., closer to the machine code) intermediate representations
 Three-address code: every instruction has at most three operands
 Jasmin: assembly language for the JVM (Java Virtual Machine), an abstract stack machine (used in the project)
 Intermediate-code generation for these lower-level representations and machine-code generation are similar
Code Generation: Instruction Selection
 Source code
a = b + c;
d = a + e;

 Target code (if we generate code for each statement separately, we will not generate efficient code)

code for first statement:
MOV b,R0
ADD c,R0
MOV R0,a
code for second statement:
MOV a,R0    <- this instruction is redundant
ADD e,R0
MOV R0,d
Code Generation: Register Allocation
 There are a limited number of registers available on real machines
 Registers are valuable resources; the compiler has to use them efficiently

source code:
d = (a-b)+(a-c)+(a-c);

three-address code:
t = a-b;
u = a-c;
v = t+u;
d = v+u;

assembly code:
MOV a,R0
SUB b,R0
MOV a,R1
SUB c,R1
ADD R1,R0
ADD R1,R0
MOV R0,d
The Structure of a Compiler (8)
Scanner [Lexical Analyzer]
  -> Tokens ->
Parser [Syntax Analyzer]
  -> Parse tree ->
Semantic Process [Semantic Analyzer]
  -> Abstract Syntax Tree w/ Attributes ->
[Intermediate Code Generator]
  -> Non-optimized Intermediate Code ->
Code Optimizer
  -> Optimized Intermediate Code ->
Code Generator
  -> Target machine code
Issues Driving Compiler Design

 Correctness
 Speed (runtime and compile time)
 Degrees of optimization

 Multiple passes

 Space
 Feedback to user
 Debugging

Tools
 Lexical Analysis – Lex, Flex, JLex

 Syntax Analysis – JavaCC, SableCC

 Semantic Analysis – JavaCC, SableCC

 MiniJava programming language (Appel book)
