Introduction to Compiler Design
Figure: A typical multi-stage compiler pipeline from source code to executable.
In computing, a compiler is a
program that automates translation of source code (written in one high-level language) into another
language, often low-level machine code 1 . Compilers are fundamental in software development: they
allow developers to write in human-readable languages while ultimately producing efficient machine-
executable programs 1 . The phases of a compiler are organized into a pipeline, where each phase
transforms the program representation and passes its output to the next phase 2 1 . Figure above
illustrates this sequence from source code through lexical analysis, parsing (syntax analysis), semantic
analysis, intermediate-code generation, optimization, and finally target code generation 2 1 . In
practice, compilers are designed as modular components (front-end and back-end) that isolate each phase,
a structure that promotes clarity, correctness and reusability 3 . In fact, compiler designers invest much
effort to ensure correctness of each phase, since errors (e.g. misparsed constructs or incorrect code) can
lead to faulty executables that are hard to debug 4 .
A compiler’s front end (lexical, syntax, semantic) checks and understands the source program according to
the language’s rules, while the back end (code generation, optimization) produces efficient code for the
target machine 5 . For example, compiling C source eventually produces machine code or an object file;
this output is typically much faster and more efficient than interpreting the source at run time 6 .
Throughout compilation, auxiliary components like the symbol table and error handler support all phases.
The symbol table stores information about every identifier (name, type, scope) 7 8 , and error handling
ensures informative diagnostics. A summary of the major phases is given in Table below, with each phase’s
input, output, and key concepts:
Phase                    | Input               | Output                   | Core Concepts/Tools
Lexical analysis         | character stream    | token stream             | regular expressions, finite automata, Lex/Flex
Syntax analysis          | token stream        | parse tree / AST         | context-free grammars, LL/LR parsing, Yacc/Bison
Semantic analysis        | parse tree / AST    | annotated AST            | symbol table, type checking, attribute grammars
Intermediate code gen.   | annotated AST       | IR (three-address code)  | three-address code, SSA
Code optimization        | IR                  | optimized IR             | control-flow graph, data-flow analysis
Code generation          | optimized IR        | assembly / machine code  | instruction selection, register allocation, stack frames
This pipeline shows how a high-level language program is progressively transformed into efficient machine
instructions 2 18 . Over the course, we will build an understanding of each phase and of how compiler tools
(like Lex and Yacc) automate parts of this process. By the end, you should appreciate the theory (formal
languages, automata, type systems) and practice (tool usage, coding strategies) behind compilers, making
you better at both constructing compilers and understanding how languages are implemented.
Lexical Analysis
The first phase of a compiler is lexical analysis (often called scanning). Its job is to read the raw source-
character stream and group characters into meaningful tokens. A token is a sequence of characters that
represents a basic syntactic unit (such as an identifier, keyword, literal constant, operator, or separator) 9 .
For example, in C the character sequence int is recognized as a keyword token, x1 as an identifier
token, + as an operator token, and 123 as a numeric-literal token. Lexical analysis simplifies later phases
by collapsing characters into token units; it also removes irrelevant material (like whitespace or comments)
and catches simple errors (like illegal characters) early.
Formally, tokens are defined by patterns (often given as regular expressions). Regular expressions describe
the set of strings belonging to each token type. For instance, one might specify that an integer literal token
matches the regex [0-9]+ , and an identifier matches [A-Za-z_][A-Za-z0-9_]* . In practice, a finite
automaton (deterministic or nondeterministic) is built from these regexes to recognize tokens. The lexical
analyzer runs the automaton on the input: when the automaton reaches an accepting (final) state, it has
recognized one token 20 . Deterministic finite automata (DFAs) are especially popular because they scan
input in one pass (left-to-right) and decide each token in linear time. In fact, the process of converting
regex-based token specifications into an efficient DFA can be automated.
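To make this concrete, here is a minimal hand-coded sketch in C of the identifier DFA described above; the
function name and the maximal-munch (longest-match) convention are illustrative assumptions rather than a
fixed interface.

#include <ctype.h>
#include <stdio.h>

/* Recognize one identifier matching [A-Za-z_][A-Za-z0-9_]* at the start of s.
   Returns the length of the lexeme, or 0 if the first character has no valid
   transition from the start state. */
int scan_identifier(const char *s) {
    if (!(isalpha((unsigned char)s[0]) || s[0] == '_'))
        return 0;                                  /* start state: reject */
    int i = 1;                                     /* accepting state reached */
    while (isalnum((unsigned char)s[i]) || s[i] == '_')
        i++;                                       /* loop in the accepting state */
    return i;                                      /* longest match (maximal munch) */
}

int main(void) {
    const char *src = "x1 + 123";
    int len = scan_identifier(src);
    printf("matched %d characters: %.*s\n", len, len, src);   /* matched 2 characters: x1 */
    return 0;
}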
Because hand-writing scanners from regexes and DFAs is tedious and error-prone, compiler writers use scanner generators. Tools
like Lex or Flex let the developer write token patterns (regexes) and corresponding actions in a specification
file. The tool then automatically constructs a DFA and generates C code for the scanner. At compile time,
this scanner reads characters and outputs a stream of tokens (each token is typically represented by a token
type and possibly an attribute, like the text of an identifier) 10 21 . For example, the Flex manual notes that
Flex was explicitly designed to produce faster lexical analyzers than the original Lex 21 . Lex and Flex also
integrate easily with later phases: each recognized token can be handed off to a parser generated by a tool
like Yacc for syntax analysis.
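For illustration, a Flex specification might look like the sketch below. The token codes (KW_INT, IDENT,
NUMBER, PLUS) and the header tokens.h are hypothetical names assumed for the example; in a real project they
usually come from the parser side (e.g. a Yacc/Bison-generated header).

%{
/* Minimal Flex sketch; token codes are assumed to be defined in tokens.h. */
#include <stdio.h>
#include "tokens.h"
%}
%option noyywrap
%%
[ \t\n]+                  { /* skip whitespace */ }
"int"                     { return KW_INT; }
[A-Za-z_][A-Za-z0-9_]*    { return IDENT; }
[0-9]+                    { return NUMBER; }
"+"                       { return PLUS; }
.                         { fprintf(stderr, "illegal character: %s\n", yytext); }
%%

Running flex on such a file produces a C scanner whose yylex() function returns the next token code each
time it is called.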
Common token categories include identifiers (names defined by the programmer), keywords (reserved
words like if , for ), constants/literals (numeric, character, or string literals), operators (such as + ,
- , * , < , > ), and punctuation or delimiters (such as semicolons, parentheses, braces) 9 22 . The
scanner typically maintains a table of keyword tokens (to distinguish the word int as a keyword vs. an
identifier), and it enters each new identifier into the symbol table for later use. In summary, lexical analysis
simplifies the source text into tokens for the parser, using regular expressions and finite automata – often
implemented via tools like Lex/Flex 9 10 .
Syntax Analysis
After tokenization, syntax analysis (parsing) examines the sequence of tokens to determine its
grammatical structure. The language’s syntax is defined by a context-free grammar (CFG) – a formal set of
production rules that describe which token sequences form valid programs 23 11 . For example, a simple
rule might say that if ( <expr> ) <stmt> is a valid form for an if-statement. The parser reads tokens
left-to-right and tries to build a parse tree (or concrete syntax tree): a tree whose internal nodes
correspond to nonterminal symbols of the grammar and whose leaves are the actual tokens 11 24 . Each
branch in the parse tree shows how a grammar rule applies. If the parser finds that the token sequence
cannot be derived by the grammar, it reports a syntax error. Well-formedness in syntax is crucial: parsing
ensures the program structure matches the language’s rules.
• Top-down parsing (LL parsing): These parsers start from the start symbol of the grammar and build
the parse tree from the root downward. An LL(1) parser is a simple predictive parser that reads input
Left-to-right, producing a Leftmost derivation, with 1 token of lookahead. LL(1) grammars must be
unambiguous and free of certain patterns (no left-recursion, no common prefixes requiring
backtracking). Practical top-down parsers often use recursive-descent algorithms guided by FIRST/
FOLLOW sets.
• Bottom-up parsing (LR parsing): These parsers start from the input tokens and build the parse tree
up to the root. LR parsers (including variants SLR(1), LALR(1), and canonical LR(1)) perform a shift-
reduce analysis. For example, Yacc (Yet Another Compiler-Compiler) generates LALR(1) parsers, a
form of LR parser with one-symbol lookahead. LR parsers can handle a larger class of grammars
(including most programming language grammars) than LL parsers. Practically, most compiler
courses use LALR(1) because Yacc/Bison are readily available. (The term “LALR” stands for
Lookahead LR, a compact version of full LR.)
Whether using LL or LR, the parser verifies that token sequences conform to the grammar and constructs
the parse tree. The parse tree is then usually transformed into an abstract syntax tree (AST) by dropping
extraneous grammar nodes (e.g. parentheses or punctuation), producing a more compact representation of
program structure. The AST is typically used in semantic analysis and code generation instead of the raw
parse tree.
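To show the top-down idea in code, here is a minimal recursive-descent sketch in C for expressions over +,
*, parentheses, and single-character identifiers or digits; it only accepts or rejects its input, and AST
construction and error recovery are omitted so the parsing structure stays visible.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

static const char *p;                 /* current input position (one-character "tokens") */

static void expr(void);

static void factor(void) {            /* factor -> ( expr ) | id */
    if (*p == '(') {
        p++;
        expr();
        if (*p == ')') p++;
        else { printf("missing ')'\n"); exit(1); }
    } else if (isalnum((unsigned char)*p)) {
        p++;
    } else {
        printf("syntax error at '%c'\n", *p);
        exit(1);
    }
}

static void term(void) { factor(); while (*p == '*') { p++; factor(); } }   /* term -> factor (* factor)* */
static void expr(void) { term();   while (*p == '+') { p++; term();   } }   /* expr -> term (+ term)*     */

int main(void) {
    p = "a+b*(c+d)";
    expr();
    puts(*p == '\0' ? "accepted" : "rejected");      /* prints: accepted */
    return 0;
}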
Parser generators simplify this phase. Tools like Yacc and Bison allow the compiler writer to specify the
grammar rules and attach semantic actions to them. Yacc was originally developed at Bell Labs and
generates C code for an LALR(1) parser from a BNF grammar specification 12 . Its GNU successor, Bison, is
widely used today and can generate LALR(1) parsers (and even canonical LR or GLR parsers) from a similar
grammar file 25 . Both Yacc and Bison parse a .y file that contains grammar productions and C snippets;
they produce a parser that reads token streams (from Lex/Flex) and builds a parse tree or performs on-the-
fly processing. Yacc/Bison integrate well with Lex: typically, the lexer provides tokens to the parser, and
semantic actions (written in C) construct nodes of the AST or populate data structures 26 25 .
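A Bison grammar file for simple arithmetic might look like the sketch below; it evaluates expressions on the
fly in its semantic actions, and it assumes a companion Flex scanner that supplies yylex() and sets yylval
for the NUMBER token.

%{
/* Minimal Bison sketch; the precedence declarations resolve the
   ambiguity of the expression grammar. */
#include <stdio.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "parse error: %s\n", msg); }
%}
%token NUMBER
%left '+'
%left '*'
%%
expr : expr '+' expr   { $$ = $1 + $3; }   /* semantic action written in C */
     | expr '*' expr   { $$ = $1 * $3; }
     | '(' expr ')'    { $$ = $2; }
     | NUMBER          { $$ = $1; }
     ;
%%

In a real compiler, the actions would build AST nodes instead of computing values directly.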
In summary, syntax analysis enforces the grammar of the language and builds the parse tree (AST). LL(1)
parsers (top-down) and LR(1)/LALR(1) parsers (bottom-up) are the common algorithms. Yacc/Bison are tools
that generate these parsers automatically 12 25 . By the end of this phase, we have a structured tree that
represents the full program syntax, ready for semantic checks.
Semantic Analysis
Once the parse tree (or AST) is obtained, semantic analysis verifies that the program is meaningful under
the language’s rules and gathers necessary type information. This phase uses both the syntax tree and the
symbol table built so far. Key tasks include type checking (ensuring operators are applied to compatible
types, function calls have correct arguments, etc.), enforcing scoping rules (e.g. variables are declared
before use), and evaluating constant expressions if possible. The semantic analyzer traverses the AST and
performs these checks, often using an attribute grammar approach where each node is annotated with
type and other semantic information. For example, it ensures you cannot add a string to an integer, or
cannot call a function with the wrong number of arguments. If a semantic violation is found (like an undeclared
variable or mismatched types), the compiler reports an error.
A crucial data structure in this phase is the symbol table. As identifiers are encountered (in declarations or
uses), entries are made in the symbol table recording their name, type, scope level, memory location, and
other attributes 8 . The symbol table enables quick lookup of any identifier and enforcement of scope
rules (e.g. different functions may have variables with the same name in separate scopes). According to
tutorials, “every compiler uses a symbol table to track all variables, functions, and identifiers in a
program” 8 . During semantic analysis, the symbol table is filled (at declarations) and consulted (at uses) to
confirm correctness: for instance, the analyzer checks that an identifier is already declared before use 13
8 . It also records type information so that, for example, an assignment x = y + 1 can be checked to
ensure that the types of x, y, and the constant 1 are compatible.
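A minimal symbol-table sketch in C is shown below, assuming a flat array with linear search and a simple
integer scope level; production compilers usually use hash tables with per-scope chaining, so the data
layout here is purely illustrative.

#include <stdio.h>
#include <string.h>

typedef enum { TY_INT, TY_FLOAT } Type;

typedef struct {
    char name[32];
    Type type;
    int  scope;                         /* nesting level of the declaring scope */
} Symbol;

static Symbol table[256];
static int    nsyms = 0;

/* Called at a declaration; fails if the name is already declared in this scope. */
int declare(const char *name, Type type, int scope) {
    for (int i = 0; i < nsyms; i++)
        if (table[i].scope == scope && strcmp(table[i].name, name) == 0)
            return 0;                   /* redeclaration error */
    snprintf(table[nsyms].name, sizeof table[nsyms].name, "%s", name);
    table[nsyms].type  = type;
    table[nsyms].scope = scope;
    nsyms++;
    return 1;
}

/* Called at a use; the most recently declared matching entry wins. */
Symbol *lookup(const char *name) {
    for (int i = nsyms - 1; i >= 0; i--)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;                        /* undeclared identifier */
}

int main(void) {
    declare("x", TY_INT, 0);
    Symbol *s = lookup("x");
    printf("x is %s\n", s && s->type == TY_INT ? "declared as int" : "undeclared");
    return 0;
}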
Once semantic analysis is done, the AST is typically annotated with type and other semantic information.
Some compilers also directly generate an intermediate representation (IR) in this phase. IR is a machine-
independent code form (such as three-address code or static single-assignment form) that abstracts away
high-level syntax while still being easier to optimize than raw machine code 15 . The Princeton notes define
IR as “an abstract machine language” that is independent of any particular machine or source language 15 .
In any case, semantic analysis outputs either an annotated AST or an initial intermediate code. This AST/IR
will be used by the next phases (code generation and optimization).
Finally, any semantic transformations (like implicit type promotions, array bounds checks, or short-circuit
evaluation of logical operators) are also handled here. The semantic checker may insert conversion nodes in
the AST or otherwise rewrite the tree to ensure type compatibility. After semantic analysis, the program is
guaranteed to be correct in meaning (according to the language), and all high-level structure is resolved.
In textbook terms, the front end (lexical + syntax + semantic) has now completely understood the program.
The remaining task is to produce efficient code from this representation.
(If any semantic errors were found, compilation typically stops here. Assuming none, the compiler proceeds.)
Code Generation
With a semantically sound, annotated AST or IR, the compiler proceeds to code generation. This phase
translates the high-level intermediate representation into low-level machine or assembly code for the target
architecture. The main challenge is to map each operation in the IR into one or more target instructions and
to manage limited hardware resources (like registers and memory) efficiently.
Typically, code generation is structured around traversing the AST or IR and emitting code. For simple
expressions, the compiler recursively generates code for subexpressions and then applies the
corresponding machine instruction (e.g. to add two values). For control structures, it emits jump/branch
instructions to implement loops and conditionals. Often this phase uses templates or patterns: for each IR
operation, there is a template sequence of machine instructions. Modern compilers use intermediate steps
such as instruction selection (choosing the best instruction sequences) and instruction scheduling
(ordering instructions to avoid hazards), but in our scope we focus on basics.
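The sketch below illustrates this traversal scheme in C for binary expressions, emitting three-address-style
pseudo-instructions; the node layout and the temporary-naming scheme are assumptions made for the example,
not a fixed IR format.

#include <stdio.h>
#include <string.h>

typedef struct Node {
    char op;                            /* '+' or '*' for operators, 0 for leaves */
    char name[16];                      /* variable name for leaves */
    struct Node *left, *right;
} Node;

static int ntemp = 0;

/* Emit code for the subtree rooted at n; 'out' receives the name holding its value. */
static void gen(Node *n, char out[16]) {
    if (n->op == 0) {                   /* leaf: the value is already in a named location */
        strcpy(out, n->name);
        return;
    }
    char l[16], r[16];
    gen(n->left, l);                    /* generate code for the operands first */
    gen(n->right, r);
    sprintf(out, "t%d", ntemp++);       /* fresh temporary for the result */
    printf("%s = %s %c %s\n", out, l, n->op, r);
}

int main(void) {
    Node a = {0, "a", NULL, NULL}, b = {0, "b", NULL, NULL}, c = {0, "c", NULL, NULL};
    Node mul = {'*', "", &a, &b};
    Node add = {'+', "", &mul, &c};     /* represents a*b + c */
    char result[16];
    gen(&add, result);                  /* prints: t0 = a * b   then   t1 = t0 + c */
    return 0;
}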
An important subproblem in code generation is register allocation. The target machine has a limited
number of registers, but the IR may use an unlimited number of temporaries. A register allocator assigns IR
variables or temporaries to physical registers, spilling some to memory if needed. Good allocators use
algorithms like graph-coloring to minimize spills. As GeeksforGeeks notes, “register allocation is an NP-
complete problem” but can be approximated by graph-coloring heuristics 19 . In practice, many compilers
perform a global analysis (building an interference graph of variables) and then try to color it with k colors,
where k is the number of physical registers. Alternatively, a simpler local strategy (within each basic block)
may be used in teaching
compilers. The goal is to keep frequently used variables in the fast registers rather than repeatedly loading/
storing them to memory 19 .
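The sketch below shows the greedy-coloring idea on a prebuilt interference graph; the graph, the register
count, and the visiting order are illustrative, and real allocators (Chaitin/Briggs style) add spill-cost
analysis and coalescing on top of this.

#include <stdbool.h>
#include <stdio.h>

#define NVARS 4                         /* IR temporaries v0..v3 */
#define NREGS 2                         /* physical registers r0..r1 */

int main(void) {
    /* interfere[i][j] is true when vi and vj are live at the same time */
    bool interfere[NVARS][NVARS] = {
        {0, 1, 1, 0},
        {1, 0, 1, 0},
        {1, 1, 0, 1},
        {0, 0, 1, 0},
    };
    int color[NVARS];                   /* assigned register number, or -1 = spill */

    for (int v = 0; v < NVARS; v++) {
        bool used[NREGS] = {false};
        for (int u = 0; u < v; u++)     /* registers taken by already-colored neighbours */
            if (interfere[v][u] && color[u] >= 0)
                used[color[u]] = true;
        color[v] = -1;
        for (int r = 0; r < NREGS; r++)
            if (!used[r]) { color[v] = r; break; }
        if (color[v] < 0) printf("v%d -> spill to memory\n", v);
        else              printf("v%d -> register r%d\n", v, color[v]);
    }
    return 0;
}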
Stack and calling conventions are also handled here. For each function, the compiler allocates a stack
frame upon entry: it reserves space on the stack for local variables, saved registers, function parameters,
and the return address. When a function call is made, arguments are pushed to the stack or placed in
specified registers (per the target’s calling convention), and a jump is made to the function. On return, the
caller’s frame is restored. The details (which registers must be saved by caller vs callee, stack direction, etc.)
are target-specific, but the compiler back end must implement them correctly. For example, on x86 the EBP/
RBP register often points to the start of the current stack frame; on return that register (and others) must
be restored so the caller can continue correctly. The code generator emits prologue and epilogue code at
each function to set up and tear down the stack frame.
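As a sketch, a code generator might emit an rbp-based x86-64 prologue and epilogue with helpers like the ones
below; the function name, frame size, and Intel-style syntax are illustrative, and real back ends must also
respect ABI details such as stack alignment and callee-saved registers.

#include <stdio.h>

void emit_prologue(const char *fname, int local_bytes) {
    printf("%s:\n", fname);
    printf("    push rbp\n");                    /* save the caller's frame pointer */
    printf("    mov  rbp, rsp\n");               /* establish the new frame pointer */
    printf("    sub  rsp, %d\n", local_bytes);   /* reserve space for locals        */
}

void emit_epilogue(void) {
    printf("    mov  rsp, rbp\n");               /* release the local area          */
    printf("    pop  rbp\n");                    /* restore the caller's frame      */
    printf("    ret\n");                         /* return to the caller            */
}

int main(void) {
    emit_prologue("my_function", 16);            /* hypothetical function, 16 bytes of locals */
    emit_epilogue();
    return 0;
}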
Finally, the code generator outputs either an assembly language file or direct machine code (object code).
The assembly output may still require an assembler/linker, whereas object code (relocatable machine code)
can be linked into an executable. Some compilers even produce absolute machine code (with fixed
addresses), but usually they produce relocatable output plus a symbol table for linking. GeeksforGeeks
notes that the target code can be absolute, relocatable, or assembly, each with trade-offs (e.g. absolute
code is hard to reuse, while assembly needs a separate assembler pass) 27 .
In summary, code generation converts IR into target-specific instructions, handling register allocation and
stack-frame layout. This phase ensures the final program performs the same logic as the source, but
directly on hardware. A well-designed code generator (together with the later optimizer) yields fast, efficient
executables from the high-level input 18 19 .
Code Optimization
Before or during code generation, a compiler typically performs optimization to improve performance or
reduce the size of the output code. Optimization encompasses many techniques, but its overall goal is to make
the compiled program run faster and/or use fewer resources, without changing its behavior 16 . This might
mean fewer instructions executed, less memory used, or better use of the CPU pipeline.
Many optimizations rely on the control-flow graph (CFG) of the program. A CFG breaks the program into
basic blocks (straight-line code sequences) and shows how control can flow between them. Using the CFG,
the compiler can detect unreachable blocks, merge identical code, and apply transformations across
branches. For example, after constructing the CFG, an optimizer might find that a particular block has no
incoming edges (unreachable) and safely remove it 17 . Other optimizations, like loop invariant code
motion, identify calculations inside loops that can be moved outside the loop for efficiency.
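A small before/after illustration of loop-invariant code motion in C is shown below; the variables are
arbitrary, and the point is simply that x * y does not change inside the loop, so it can be computed once
before it.

/* Before hoisting (conceptually):
     for (int i = 0; i < n; i++) a[i] = b[i] * (x * y);
*/
void scale(int n, int a[], const int b[], int x, int y) {
    int t = x * y;                      /* loop-invariant: computed once          */
    for (int i = 0; i < n; i++)
        a[i] = b[i] * t;                /* one multiplication saved per iteration */
}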
Another key method is data-flow analysis on the CFG. This means propagating information (like variable
definitions or usage) along edges until no more changes occur. Data-flow allows optimizers to find dead
variables and dead code: by analyzing what values are actually used, the compiler can eliminate
assignments that have no effect on the program’s outcome 30 . Data-flow frameworks also enable more
complex analyses like liveness (for register allocation) and reaching definitions.
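The sketch below applies this idea as liveness-based dead-store elimination within a single basic block of
three-address code; the single-character variable names and the assumption that only x is live at block exit
are simplifications for the example.

#include <stdbool.h>
#include <stdio.h>

typedef struct { char dst, src1, src2; } Instr;   /* dst = src1 op src2 */

int main(void) {
    Instr block[] = {
        {'t', 'a', 'b'},                /* t = a op b                 */
        {'u', 'a', 'c'},                /* u = a op c  (never used)   */
        {'x', 't', 'd'},                /* x = t op d                 */
    };
    int n = 3;
    bool live[128] = {false};
    live['x'] = true;                   /* assumed live at block exit */

    /* Scan backwards: a store to a dead variable (with no side effects) can be removed. */
    for (int i = n - 1; i >= 0; i--) {
        if (!live[(int)block[i].dst]) {
            printf("instruction %d (%c = ...) is dead and can be eliminated\n", i, block[i].dst);
            continue;                   /* its operands do not become live */
        }
        live[(int)block[i].dst]  = false;   /* the value is redefined here  */
        live[(int)block[i].src1] = true;    /* operands are used, so live   */
        live[(int)block[i].src2] = true;
    }
    return 0;
}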
The optimizer must always preserve correctness: it can only perform transformations that do not change
the program’s meaning. In practice, compilers strike a balance: heavy optimization can exponentially
increase compile time, so usually only the most beneficial passes are enabled by default. Nonetheless,
effective optimizations can greatly speed up code and are a hallmark of modern compilers 16 29 .
Software Engineering Practices
• Modular Design: Each compiler phase should be a separate module with clear interfaces. For
example, the lexer outputs a standardized token structure that the parser expects (a minimal sketch of
such an interface appears after this list). The parser
produces a parse tree or AST, which is the interface to the semantic analyzer. By keeping the code for
each phase separate, developers can test and debug them independently. As noted in academic
references, compilers generally implement phases as modular components to promote efficient and
correct design 3 . For instance, one team member might work on the scanner (using Flex), another
on grammar and parsing (using Yacc), and later integration relies on both agreeing on token and
node formats.
• Compiler Construction Tools: We already mentioned Flex (lexical analyzer generator) and Bison/
Yacc (parser generators). These tools greatly simplify development. For example, Flex takes a list of
regular expressions and emits C code for a DFA-based scanner 21 . Tutorial sources point out that
scanner generators like Lex are built on finite automata for regex input 31 . Likewise, Bison reads a
grammar specification (BNF-like syntax), checks for ambiguities, and generates a corresponding
LALR(1) parser 25 . Using these tools means you often write far less code by hand and have robust
solutions for lexing/parsing. Other modern tools (outside the classic course scope) include ANTLR (a
powerful parser generator in Java) and LLVM (a framework for building back ends), but the principles
are similar.
• Development Practices: Use version control (e.g. Git) from day one, as a compiler project will evolve
and you want to track changes. Maintain a good suite of test programs: for each phase write small
programs to exercise all features (e.g. lexing corner cases, grammar constructs, type errors).
Integrate often: after completing the lexer, immediately hook it to the parser and test. Avoid writing
the entire compiler in one go – instead, build incrementally, phase by phase. Peer review of grammar
rules and code generation templates can catch many errors early.
• Error Handling and Diagnostics: Implement meaningful error messages. At minimum, report line
and column of errors. For syntax errors, consider simple recovery (skipping tokens until a
synchronizing token). For semantic errors (like type mismatch), point clearly to the offending
construct. Good error handling makes debugging much easier, and many problems in student
compilers arise from silent failures.
• Performance and Profiling: If time permits, profile the compiler itself. The front end (lexing/
parsing) should be quite fast for typical student projects, but code generation and optimization can
be heavy. Use tools (e.g. gprof or perf ) to find bottlenecks if the compiler is slow.
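As referenced under Modular Design above, here is a minimal sketch of a token interface that the scanner and
parser might share; the type names, the fixed-size lexeme buffer, and the next_token function are
illustrative choices, not a required layout.

typedef enum { TOK_IDENT, TOK_NUMBER, TOK_PLUS, TOK_LPAREN, TOK_RPAREN, TOK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    char      lexeme[64];               /* the matched text, e.g. an identifier's spelling */
    int       line, column;             /* source position, used for diagnostics           */
} Token;

Token next_token(void);                 /* implemented by the scanner, called by the parser */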
By applying software engineering best practices – modular design, clear interfaces, iterative testing – you’ll
reduce development headaches. And remember the old adage: writing compilers teaches “practical
applications of theory” (formal languages, automata, data structures). Document your code well, write clear
grammar comments, and treat the compiler as any major software project.
Exam Preparation
To prepare for exams, focus on understanding each phase and key concepts, and practice by hand on small
examples. Likely examinable topics include:
- Phases of Compilation: Know the purpose and input/output
of each phase 2 1 . Be able to explain why each phase is needed.
- Lexical Analysis: Definitions of token, lexeme, pattern. Use of regular expressions and finite automata to
recognize tokens 20 10 . Perhaps design a regex for a given token or draw a small DFA.
- Syntax Analysis: Context-free grammars and parse trees 11 23 . Difference between LL(1) and LR(1)
parsing strategies. FIRST/FOLLOW sets, parsing tables (maybe for small grammar). Handling of ambiguities
(e.g. dangling else).
- Parser Generators: Role of Yacc/Bison: given a grammar, how Yacc translates it into parser code 12 25 .
Understand shift/reduce and reduce/reduce conflicts at a high level.
- Semantic Analysis: Type checking rules (e.g. given a code snippet, determine if type errors exist). Symbol
table contents: what information is stored and how it’s used 8 . Building and annotating the AST.
- Intermediate Representation: Forms like three-address code (quadruples), AST, DAG. Converting simple
statements into 3-address form.
- Code Generation: Manual translation of small IR to assembly. Register allocation basics: possibly simpler
methods (e.g. using stack for spills). Stack frame layout for a function (argument passing, local vars, return).
- Optimizations: Identify dead code, constant expressions, and optimize them (constant folding, dead-code
elimination). Draw a small CFG and explain data-flow facts. Understand loop optimization examples.
- Tools & Design: Roles of lex/yacc. Benefits of modular compiler design 3 .
- Theory vs Practice: Differences between compiler and interpreter 6 , or the meaning of “compiler
compiler” (parser generator) 32 .
Here are some sample questions with answers to guide your study:
Q1: Describe the phases of a compiler and the role of each phase.
Answer: The compiler works in stages. Lexical analysis scans characters and outputs tokens (identifiers,
literals, etc.) 9 . Syntax analysis parses tokens according to a grammar to build a parse tree 11 . Semantic
analysis checks types and scopes using the parse tree and symbol table 13 8 . After this, an intermediate
code may be generated. Code optimization then transforms the intermediate code for efficiency (e.g.
removing dead code) 16 . Finally, code generation produces target machine instructions, handling register
allocation and calling conventions 18 19 . Each phase’s output is the next phase’s input, enforcing
correctness before moving on.
Q2: Given the regular expression [A-Za-z_][A-Za-z0-9_]* , construct a DFA or describe how a
lexical analyzer recognizes identifiers matching this pattern.
Answer: The regex describes identifiers starting with a letter or underscore followed by any number of
letters, digits, or underscores. A DFA for this has: a start state S, transition from S on letter/underscore to
state A; state A loops on letter/digit/underscore; any other input leads to a rejecting (error) state. In
practice, a scanner built (e.g. by Flex) would recognize the longest sequence of such characters from the
input as one token (IDENT) and return it. This DFA is derived automatically from the regex and ensures each
valid identifier lexeme is correctly tokenized 20 10 .
Q3: The grammar
S -> S + S | S * S | ( S ) | id
is ambiguous and left-recursive. Transform it into an equivalent grammar suitable for LL(1) (top-down)
parsing.
Answer: Introducing precedence levels and eliminating the left recursion yields:
S -> T S'
S' -> + T S' | ε
T -> F T'
T' -> * F T' | ε
F -> ( S ) | id
Here S' and T' are new nonterminals, and the resulting grammar is LL(1) (its FIRST/FOLLOW sets permit
predictive parsing); it also encodes the precedence of * over +. This process of eliminating left recursion and
factoring choices is typical for
preparing grammars for top-down parsing 12 .
Q4: What is a symbol table and how is it used in semantic analysis? Give an example of an entry it
might contain.
Answer: A symbol table is a data structure used by the compiler to store information about identifiers
(variables, functions, etc.) encountered in the program 8 . Each entry typically includes the identifier’s
name, type, scope level, memory location, and other attributes. For example, for a variable declaration
int x; the symbol table might have an entry:
Name: x | Kind: variable | Type: int | Scope: global | Memory addr: 0x1004
Semantic analysis uses the symbol table to check correct usage. When seeing x = 5; later, the compiler
looks up x in the table, sees it is declared as an int , and confirms that assigning an integer literal to it is
valid. The symbol table ensures that undeclared identifiers are caught and that types are applied
consistently 13 8 .
Q5: Perform constant folding on the following code fragment and explain the transformation.
int a = 2*(22/7)*r;
int x = 12.4;
float y = x/2.3;
Answer: Constant folding computes any expressions with known constant operands at compile time. In the
fragment:
- 2*(22/7)*r → Because 22 and 7 are integer literals, 22/7 folds to the constant 3 (integer division)
and 2*3 folds to 6, so the statement becomes int a = 6*r; . (If floating-point arithmetic were intended,
22.0/7 would fold to ≈3.1428 and the whole constant factor to ≈6.2857.)
- int x = 12.4; → the initializer is a constant, but because x is declared int it is truncated to 12
at compile time.
- float y = x/2.3; → if x is not modified before this use, constant propagation replaces x with the
constant 12, and the division folds to 12/2.3 ≈ 5.217, so this simplifies to y = 5.217; (approximately).
Thus at compile time we replace expressions with their computed constants 28 . This reduces runtime work
and is a basic machine-independent optimization.
These examples should illustrate how to work through key concepts. In exams, always show your work: draw
diagrams (like DFAs or parse trees) neatly and explain each transformation step. Practice by hand is the best
preparation.
When using textbooks, read the relevant sections carefully and do the exercises. The examples in chapters
on lexical analysis and parsing are particularly helpful. Compare the book’s algorithms with your own notes
and code. If a concept (e.g. FIRST/FOLLOW sets, register allocation) is unclear, the book will usually have a
worked example. Also use lecture slides and reliable online sources (some are cited above) to complement
the texts.
For the term project of building a compiler for a simple language, apply solid project management: start
early, define milestones (e.g. finish lexer by week 3, parser by week 5, etc.), and test incrementally. Begin by
writing a few small test programs in your language (with all language features) and run them through each
phase as you implement. When debugging, work backwards: for example, if the final output is wrong,
inspect the intermediate code or AST to isolate the error. Use version control commits as you complete each
phase so you can revert if needed. Divide tasks among team members by compiler stage if you’re in a
group: one person can code the symbol table and semantic checks while another handles code generation,
but agree on data structures beforehand.
Remember to use the compiler tools: write Flex rules for tokens, and Yacc/Bison grammar rules for parsing.
For each Yacc grammar rule, include a semantic action in C that builds an AST node or generates
intermediate code (many examples exist online). Keep your symbol table and AST data structures well-
defined at the start – for example, use C structs or C++ classes with fields for type, value, and child pointers.
Modularize common tasks (like emitting three-address code) into helper functions.
Finally, manage your time: compilers can be tricky, and bugs in parsing or symbol handling can cascade.
Leave the optimizer for last; a straightforward (unoptimized) code generator is fine if you run out of time.
Focus first on getting correct output and clear error messages. In summary, leverage textbooks for theory,
follow good software practices for engineering, and test continuously. Good planning and understanding
the big picture will make the project (and the course) much smoother.
Sources: Authoritative references from compilers literature and educational resources were used
throughout 2 1 10 11 12 25 13 19 16 3 , to ensure accuracy and depth of explanation.
1 3 4 5 6 Compiler - Wikipedia
https://en.wikipedia.org/wiki/Compiler
2 7 14 Phases of Compiler - TutorialsPoint
https://www.tutorialspoint.com/compiler_design/compiler_design_phases_of_compiler.htm
15 Princeton CS 320 lecture notes on intermediate representation (IR-trans1.pdf)
https://www.cs.princeton.edu/courses/archive/spr03/cs320/notes/IR-trans1.pdf
19 Register Allocations in Code Generation | GeeksforGeeks
https://www.geeksforgeeks.org/register-allocations-in-code-generation/