Compiler Design Assignment
BURIE CAMPUS
GROUP 1 MEMBERS:
This document provides a detailed discussion on the fundamental techniques, data structures,
and software tools used in compiler construction. Topics covered include lexical analysis,
syntax analysis, semantic analysis, intermediate code generation, and essential data structures
like abstract syntax trees and symbol tables. Additionally, tools such as lexical-analyzer
generators and parser generators are explored.
Part I: Basic Techniques in Compiler Construction
1. Lexical Analysis
Lexical analysis is the process of breaking the source code of a program into smaller
units, called tokens, that the rest of the compiler can process easily. These tokens are the
individual words and symbols of the program, such as keywords, variable names, numbers,
and punctuation.
Lexical analysis is the first phase of the compilation process, which involves scanning the
source code and breaking it into tokens. It ensures that the source code conforms to the
lexical rules of the language.
a. Breaking Input into Tokens and Handling Lexical Errors
Breaking Input into Tokens:
The source program is divided into meaningful sequences called tokens. A token can be a
keyword, identifier, operator, or punctuation. For example:
int x = 10;
Tokens: int, x, =, 10, ;
Lexical Errors:
Errors like illegal characters or incomplete tokens can occur.
Example:
Input: int x = 10$;
Error: $ is an invalid character.
Fix: Modify the lexer to recognize only valid characters and provide meaningful error
messages.
b. Components of Lexical Analysis
Regular Expressions: Define patterns for tokens.
Example: [a-zA-Z_][a-zA-Z0-9_]* for identifiers
Finite Automata: DFA (Deterministic Finite Automaton) is used for token recognition.
Example: A DFA can identify keywords like int or operators like +.
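The pattern-based approach above can be sketched as a small tokenizer (a minimal sketch in Python using regular expressions rather than an explicit DFA table; the token categories and the invalid-character check are illustrative assumptions):

```python
import re

# Token patterns, tried in order; keywords are matched before identifiers.
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:int|float|if|else|while|return)\b"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),   # the identifier pattern given above
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("PUNCT",      r"[;,(){}]"),
    ("SKIP",       r"\s+"),                       # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return (kind, lexeme) pairs, raising an error on invalid characters."""
    tokens, pos = [], 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"Lexical error: invalid character {source[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For int x = 10; this yields the five tokens listed above, and for int x = 10$; it reports $ as an invalid character.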
2. Syntax Analysis
A syntax analyzer or parser takes the input from a lexical analyzer in the form of token
streams. The parser analyzes the source code (token stream) against the production rules to
detect any errors in the code. The output of this phase is a parse tree.
In this way, the parser accomplishes two tasks: checking the code for syntax errors and
generating a parse tree as the output of the phase.
Syntax analysis is the second phase where a parser checks whether the token sequence
adheres to the grammatical structure of the programming language.
a. Parser Implementation
A parser uses the grammar rules of a language to validate the structure of code.
Example Grammar Rules:
E → E + T | T
T → T * F | F
F → (E) | id
Input: a + b * c
Output: The syntax is correct, as the expression adheres to the grammar.
b. Constructing Parse Trees and Parsing Techniques
Parse Tree: Represents the syntactic structure of the code.
Parsing Techniques:
Top-down Parsing: Starts from the start symbol and derives the input string (e.g., LL(1)).
Bottom-up Parsing: Reduces the input string to the start symbol (e.g., LR(1)).
Recursive Descent Parsing: A top-down method using recursive procedures for each
grammar rule.
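Recursive descent parsing can be sketched for the expression grammar in section (a) (a minimal Python recognizer; it treats single letters as id tokens, rewrites the left-recursive rules as loops, and only validates the input rather than building a tree):

```python
def parse(tokens):
    """Recursive-descent recognizer for E -> E+T | T, T -> T*F | F, F -> (E) | id.
    Left recursion is rewritten as iteration: E -> T (+ T)*, T -> F (* F)*."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(tok):
        nonlocal pos
        if peek() == tok:
            pos += 1
            return True
        return False

    def parse_f():                       # F -> (E) | id
        if eat("("):
            return parse_e() and eat(")")
        tok = peek()
        if tok is not None and tok.isalpha():
            eat(tok)
            return True
        return False

    def parse_t():                       # T -> F (* F)*
        if not parse_f():
            return False
        while peek() == "*":
            eat("*")
            if not parse_f():
                return False
        return True

    def parse_e():                       # E -> T (+ T)*
        if not parse_t():
            return False
        while peek() == "+":
            eat("+")
            if not parse_t():
                return False
        return True

    return parse_e() and pos == len(tokens)
```

parse(list("a+b*c")) returns True, while an invalid input such as parse(list("a+*b")) returns False.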
c. Ambiguous Grammar
Ambiguity arises when a grammar produces more than one parse tree for the same string.
Example:
E → E + E | E * E | id
Solution: Rewrite the grammar to remove the ambiguity. For example, the grammar in
section (a) is unambiguous because it encodes the precedence of * over + and left
associativity directly in its rules.
3. Semantic Analysis
Semantic analysis is the phase in which the compiler checks that the program is
meaningful: it verifies that identifiers are declared before use and that the types of
operands are consistent with the rules of the language.
a. Type Checking and Type Mismatch Errors
Type checking verifies that the operands of every operation have compatible types. A type
mismatch error is reported when they do not, for example when a string value is assigned
to an int variable.
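Type checking of a simple assignment can be sketched as follows (a minimal Python sketch; the type names and the message format are illustrative assumptions):

```python
def check_assignment(declared_type, value):
    """Report a type mismatch when a value's type does not match the declared type."""
    # Illustrative mapping from Python runtime types to source-language type names.
    value_type = {int: "int", float: "float", str: "string"}[type(value)]
    if value_type != declared_type:
        return f"Type mismatch: cannot assign {value_type} to {declared_type}"
    return "OK"
```

check_assignment("int", 10) accepts the assignment, while check_assignment("int", "hello") reports a type mismatch error.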
4. Code Optimization
Code optimization improves the intermediate code for better performance without changing
its behaviour.
5. Abstract Syntax Trees (AST)
An abstract syntax tree (AST) is a simplified representation of the parse tree that
eliminates unnecessary details, such as punctuation and single-child grammar nodes.
6. Symbol Tables
A symbol table is a crucial data structure in compilers. It stores information about
identifiers such as variables, functions, and objects, including their names, types, scopes,
and memory locations, to facilitate error checking, code optimization, and efficient code
generation during the compilation process.
For int x = 10;, the symbol table contains:

Name  Type  Scope   Value
x     int   Global  10
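The table above can be sketched as a small data structure (a minimal Python sketch; the field names and the duplicate-declaration check are illustrative assumptions):

```python
class SymbolTable:
    """Maps identifier names to their attributes (type, scope, value)."""

    def __init__(self):
        self.entries = {}

    def declare(self, name, type_, scope, value=None):
        # Re-declaring an existing identifier is reported as an error.
        if name in self.entries:
            raise KeyError(f"'{name}' already declared")
        self.entries[name] = {"type": type_, "scope": scope, "value": value}

    def lookup(self, name):
        # Using an undeclared identifier is reported as an error.
        if name not in self.entries:
            raise KeyError(f"'{name}' not declared")
        return self.entries[name]
```

After declare("x", "int", "Global", 10), lookup("x") returns the row shown in the table above, while looking up an undeclared name raises an error.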
7. Three-Address Code (TAC)
In three-address code, at most three addresses appear in any statement: two for the
operands and one for the result. Each statement has the general form result = operand1 op
operand2, where op is an operator. Only a single operation is allowed at a time on the
right-hand side of a statement.
Example 1:
The expression a = b + c + d can be converted into the following three-address code:
t1 = b + c
t2 = t1 + d
a = t2
where t1 and t2 are temporary variables generated by the compiler. Most of the time a
statement contains fewer than three addresses, but it is still known as a three-address
statement. TAC thus represents a program as a sequence of simple instructions, each with
at most three operands.
Example 2:
Input: x = (a + b) * c
TAC:
t1 = a + b    // the parenthesized subexpression is evaluated first
t2 = t1 * c
x = t2
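Generating TAC from an expression can be sketched as follows (a minimal Python sketch; the nested-tuple form (op, left, right) used to represent the expression and the t1, t2, … naming scheme are illustrative assumptions):

```python
def generate_tac(expr):
    """Translate a nested (op, left, right) expression into three-address code.
    Returns (instructions, name of the variable or temporary holding the result)."""
    code = []
    counter = 0

    def emit(e):
        nonlocal counter
        if isinstance(e, str):          # a plain variable name needs no instruction
            return e
        op, left, right = e
        l, r = emit(left), emit(right)  # operands are computed first
        counter += 1
        temp = f"t{counter}"            # fresh compiler-generated temporary
        code.append(f"{temp} = {l} {op} {r}")
        return temp

    result = emit(expr)
    return code, result
```

For x = (a + b) * c the expression is ('*', ('+', 'a', 'b'), 'c'); the call returns (['t1 = a + b', 't2 = t1 * c'], 't2'), after which the final assignment x = t2 is emitted.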
8. Stack Machines
A stack machine is a computational model, like the Turing machine, the belt machine, and
many others. Its central and most important component is a stack, which is used in place
of registers to hold temporary values.
A stack is a data structure with two basic operations: push, which places a value on top of
the stack, and pop, which removes the top value. In a stack machine, the stack holds
temporary values during computations, taking the role that registers play in a register
machine.
Example:
Input: a + b * c
Stack Operations (evaluating the postfix form a b c * +):
Push a
Push b
Push c
Multiply (pop c and b, push the product b * c)
Add (pop the product and a, push the sum)
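These stack operations can be sketched as a small evaluator (a minimal Python sketch; it works on the postfix token sequence with numeric values substituted for a, b, and c):

```python
def eval_postfix(tokens):
    """Evaluate a postfix expression using an explicit stack."""
    stack = []
    for tok in tokens:
        if tok == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)      # Add: pop two values, push the sum
        elif tok == "*":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)      # Multiply: pop two values, push the product
        else:
            stack.append(tok)               # operand: push its value
    return stack.pop()                      # the final result is left on top
```

With a = 2, b = 3, c = 4 the postfix form of a + b * c is [2, 3, 4, "*", "+"], which evaluates to 14.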
Part II: Software Tools for Compiler Construction
Lexical-analyzer generators such as Lex and Flex, and parser generators such as Yacc and
Bison, automate the construction of compiler components.
Example: Using Lex, define token patterns in a .l file to generate a C program for token
recognition.
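A .l file along these lines might look as follows (an illustrative sketch, not a definitive specification; running flex on it and compiling the generated lex.yy.c with -lfl yields a standalone scanner):

```lex
%{
#include <stdio.h>
%}
%%
int|float|if|else|while|return   { printf("KEYWORD: %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]*           { printf("IDENTIFIER: %s\n", yytext); }
[0-9]+                           { printf("NUMBER: %s\n", yytext); }
[+\-*/=]                         { printf("OPERATOR: %s\n", yytext); }
[;,(){}]                         { printf("PUNCTUATION: %s\n", yytext); }
[ \t\n]                          { /* skip whitespace */ }
.                                { printf("Lexical error: invalid character '%s'\n", yytext); }
%%
int main(void) { yylex(); return 0; }
```

Each rule pairs a regular-expression pattern (the same identifier pattern as in Part I, section 1) with a C action executed when the pattern matches; yytext holds the matched lexeme.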
Summary
Compiler design is a crucial area of study, providing insights into how programming
languages are processed and executed. This document explored the main phases of
compilation, including lexical, syntax, and semantic analysis, as well as intermediate code
generation and optimization. It also highlighted the importance of data structures like abstract
syntax trees and symbol tables in organizing and managing program information.
Furthermore, tools like Lex, Flex, Yacc, and Bison were discussed, demonstrating their role
in automating various compiler components. Mastery of these concepts equips computer
science students with the skills to build efficient, error-free software and understand the
complexities of language processing.
By applying these techniques and tools, developers can optimize performance, minimize
errors, and contribute to advancements in programming and software development.
References
Aho, A. V., Lam, M. S., Sethi, R., & Ullman, J. D. (2006). Compilers: Principles,
Techniques, and Tools (2nd ed.). Addison-Wesley. (the "Dragon Book")