Ss CC

SiCC is a tokenizer and parser code generator that simplifies the creation of compilers for specific languages by allowing users to define token and grammar specifications. It generates Java classes for tokenizing and parsing, but users must implement additional logic to utilize the generated parse tree. The document outlines the command usage, file formats for token and grammar definitions, and how to traverse the parse tree using a Visitor pattern.

Uploaded by

nirmalashetty569

ssCC

a Parser Code Generator

SiCC
Simple Compiler Compiler
The Goal of a Compiler Compiler

To create a compiler for a specific language based on the language's token and grammar definitions.

Takes care of the dirty work of having to analyze input.

No need to implement tedious tokenizing and parsing of input every time you need to create a language; simply define what the desired language "looks like".
How SiCC Works
Let's create a new language: MyUberLang

The user creates token and grammar definitions for MyUberLang and feeds them to SiCC

SiCC uses the definitions to build a tokenizer and a parser, and outputs them as Java classes

Along with SiCC's output, the user provides extra Java classes which traverse the parse tree in order to implement MyUberLang's "logic"

Together, the generated and user-supplied classes can now be used to compile or interpret code written in MyUberLang
How SiCC Works
now with diagrams!
Strictly Speaking...
(you might have noticed)

...SiCC is not a "compiler compiler" but rather a "tokenizer and parser code generator".

The code that SiCC generates will tokenize the input and build a parse tree, but it does not know what to do next; it cannot compile.

It's up to you to create classes that use the generated parse tree to meet your needs, such as interpreting or compiling.
Overview
checklist...

SiCC - Takes care of dirty work, creates tokenizer and parser

Token Definition File (txt) - Food for SiCC, makes tokenizer

Grammar Definition File (txt) - Food for SiCC, makes parser

Visitor implementation (Java) - Traverse parse tree, gives meaning

Main class (Java) - Connect everything together

The SiCC Command
SiCC [options] definitions

options (combination of the following)

--package packagename
places all generated files in the given Java package

--prefix prefix
adds the given prefix to all generated classes

definitions (one of the following)

tokens.def grammar.def

--tokenizer-only tokens.def

--parser-only grammar.def
The SiCC Command
examples...

The basic command

sicc myuberlang.tokens.txt myuberlang.grammar.txt

Include all generated classes in the Java package "uberpack"

sicc --package uberpack myuberlang.tokens.txt myuberlang.grammar.txt

Prefix all generated classes with Uber (such as UberTokenizer.java)

sicc --prefix Uber myuberlang.tokens.txt myuberlang.grammar.txt

Generate only a tokenizer, which will belong to the "ubertok" package

sicc --package ubertok --tokenizer-only myuberlang.tokens.txt
Token Definition File
A simple text file containing regular definitions.

One definition per line in the format tokenname: definition

Token names must contain only alphanumeric characters ('a' to 'z', 'A' to 'Z' and '0' to '9') and must start with a letter.

Definitions are written as regular expressions.

Internal token definitions are written :tokenname: definition and will not be turned into tokens by the generated tokenizer, but can be embedded in other token definitions.

Embedding a token is only allowed if the token to be embedded has been defined above the current definition.

Comments may be written by starting with a pound sign: #

Token Definition File
operators and special characters...
* Match zero or more

+ Match one or more

? Match one or none

| Match the pattern on either side, much like an OR

\ Escape, matches the next character (used to match operators, such as \+ or \[ or \:)

\n Matches a newline character

\r Matches a carriage return character

\t Matches a tab character

\s Matches a space character

() Group patterns

[abc] Character class, matches any character within the brackets

[^abc] Negative character class, matches any character that is not within the brackets

:tok: Uses the pattern of the named token to match

Token Definition File
examples...

# matching "khan", "khaan", "khaaan", "khaaaan"...

tok1: kha+n

# matching "fun" or "sun"

tok2: [fs]un

# matching "wild cats" or "wild dogs", note the use of \s

tok3: wild \s (cats | dogs)

# matches a quoted string

tok5: " [^"]* "

# matching a variable name

# (alphanumeric, starting with a letter)
:alpha: [ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz]
:digit: [0123456789]
tok4: :alpha: (:alpha: | :digit:)*
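
To try out what these definitions should and should not match, the patterns can be approximated with java.util.regex. This is only a rough sketch: it is not what SiCC generates, the class and method names are hypothetical, and it assumes that spaces inside a definition are layout only (which is why tok3 needs \s for its one real space).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Rough java.util.regex translations of the token definitions above.
// Not SiCC output; all names in this class are hypothetical.
class TokenPatternSketch {
    static final Map<String, Pattern> PATTERNS = new LinkedHashMap<>();
    static {
        PATTERNS.put("tok1", Pattern.compile("kha+n"));
        PATTERNS.put("tok2", Pattern.compile("[fs]un"));
        // Assumption: spaces in the definition are layout only, so the single
        // \s becomes one literal space here.
        PATTERNS.put("tok3", Pattern.compile("wild (cats|dogs)"));
        PATTERNS.put("tok5", Pattern.compile("\"[^\"]*\""));
        // :alpha: and :digit: are internal tokens, so they are inlined
        // into tok4 instead of becoming patterns of their own.
        String alpha = "[A-Za-z]";
        String digit = "[0-9]";
        PATTERNS.put("tok4", Pattern.compile(alpha + "(" + alpha + "|" + digit + ")*"));
    }

    // Returns true if the whole input matches the named token's pattern.
    static boolean matches(String tokenName, String input) {
        return PATTERNS.get(tokenName).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("tok1", "khaaan"));
        System.out.println(matches("tok3", "wild dogs"));
        System.out.println(matches("tok4", "1x"));
    }
}
```

Note that alpha and digit are declared before tok4 uses them, mirroring the rule that an embedded token must be defined above the definition that embeds it.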
Token Definition File
special SiCC token definitions...

SKIP

definition of what the generated tokenizer may ignore, usually used for ignoring whitespace and comments

EOF

not defined in the token definition file, but automatically returned by the generated tokenizer when the end of the input has been reached, which is used by the parser
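
The effect of SKIP and EOF can be pictured with a small hand-written tokenizer. This is a sketch only, not the code SiCC generates: the class, the Tok type and the "word" token name are hypothetical, and SKIP is assumed to be defined as whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of SKIP and EOF behaviour; not SiCC output.
class SkipEofSketch {
    // Hypothetical token: a name such as "word" or "EOF" plus the matched text.
    static final class Tok {
        final String name, value;
        Tok(String name, String value) { this.name = name; this.value = value; }
    }

    private final String input;
    private int pos = 0;

    SkipEofSketch(String input) { this.input = input; }

    // Returns the next token. Anything SKIP would match (assumed here to be
    // whitespace) is consumed but never returned; EOF is reported once the
    // input is exhausted.
    Tok next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++; // SKIP
        }
        if (pos >= input.length()) {
            return new Tok("EOF", ""); // tells the parser there is nothing left
        }
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
        return new Tok("word", input.substring(start, pos));
    }

    // Collects token names up to and including EOF.
    static List<String> names(String input) {
        SkipEofSketch tokenizer = new SkipEofSketch(input);
        List<String> out = new ArrayList<>();
        Tok tok;
        do {
            tok = tokenizer.next();
            out.add(tok.name);
        } while (!tok.name.equals("EOF"));
        return out;
    }
}
```

For the input "  a  bc " the sketch yields two word tokens followed by EOF; the whitespace never reaches the parser.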
Grammar Definition File
A simple text file containing a context-free grammar.

One definition per line in the format Rule -> definition

Rule names must contain only alphanumeric characters ('a' to 'z', 'A' to 'Z' and '0' to '9') and must start with a letter.

Definitions are written as (simpler) regular expressions.

Only token and rule names may be used in the definition.

The first rule is considered the starting rule and becomes the root of
the parse tree.

Comments may be written by starting with a pound sign: #

Grammar Definition File
operators and special tokens...

note: A much smaller set of operators compared to token definitions.

* Match zero or more

? Match one or none

| Match the pattern on either side, much like an OR

() Group patterns

\0 Epsilon, an 'empty' match

[>1] Multiple child flag, must be placed at the end of the definition,
signals that the rule should be included in the parse tree only
if the node has more than one child
Grammar Definition File
examples...

# matching a phone number of the form "(613) 555-1234"

PhoneNumber -> AreaCode space FirstPart dash SecondPart

AreaCode -> leftparen threedigits rightparen

FirstPart -> threedigits

SecondPart -> fourdigits

# matching a person's name, optional title and middle names

FullName -> Title FirstName MiddleNames LastName

Title -> profession | maritalstatus | \0

FirstName -> name

MiddleNames -> name*

LastName -> name

Grammar Definition File
examples (continued)...

# using the [>1] indicator

Assignment -> var eq Sum

Sum -> Term (plus Term)* [>1]

Term -> number (multiply number)* [>1]

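Sample inputs:

myvar = 5 + 6 * 4

myvar2 = 6 * 4

For myvar2 the Sum rule matches a single Term, so no Sum node appears in the parse tree; likewise, a Term that matches a single number is left out.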
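
The following sketch shows the trees that the [>1] flag would be expected to produce for the right-hand sides of the two sample inputs above. It is not SiCC's implementation: the Node class and the flagged helper are hypothetical, and it assumes that when a flagged rule node is left out, its single child takes its place.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration of the [>1] flag; Node is a hypothetical stand-in,
// not SiCC's generated ASTNode.
class MultiChildSketch {
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        Node(String name, Node... kids) {
            this.name = name;
            for (Node k : kids) children.add(k);
        }
    }

    // Builds the node for a rule flagged [>1]. Assumption: with only one
    // child, the rule node is dropped and the child stands in for it.
    static Node flagged(String ruleName, Node... kids) {
        return kids.length > 1 ? new Node(ruleName, kids) : kids[0];
    }

    // Renders a tree as nested parentheses, e.g. Sum(number plus number).
    static String show(Node n) {
        if (n.children.isEmpty()) return n.name;
        StringBuilder sb = new StringBuilder(n.name).append('(');
        for (int i = 0; i < n.children.size(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(show(n.children.get(i)));
        }
        return sb.append(')').toString();
    }

    // Right-hand side of "myvar = 5 + 6 * 4"
    static Node sumExample() {
        Node five = flagged("Term", new Node("number"));
        Node product = flagged("Term",
                new Node("number"), new Node("multiply"), new Node("number"));
        return flagged("Sum", five, new Node("plus"), product);
    }

    // Right-hand side of "myvar2 = 6 * 4"
    static Node productExample() {
        Node product = flagged("Term",
                new Node("number"), new Node("multiply"), new Node("number"));
        return flagged("Sum", product);
    }

    public static void main(String[] args) {
        System.out.println(show(sumExample()));
        System.out.println(show(productExample()));
    }
}
```

Under these assumptions the first input keeps its Sum node (three children) but loses the Term around the lone 5, while the second input loses its Sum node entirely and the Term sits directly under Assignment.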
SiCC Generated Classes
ASTNode        Base parse tree node class

ASTToken       A subclass of ASTNode, represents a token in the parse tree

AST___Node     A subclass of ASTNode, one created for each grammar rule

iTokenizer     An interface implemented by Tokenizer

Parser         The main parsing class, takes a Tokenizer and outputs a parse tree

Token          A token output by Tokenizer

Tokenizer      The main tokenizing class, reads in a character stream and outputs a stream of Tokens

Visitor<X,Y>   An interface that uses the visitor pattern, used to traverse the parse tree
Traversing the Parse Tree
parse tree nodes...

A class named AST___Node is created for every rule defined.

ex: ASTBlockNode, ASTStatementNode, ASTSumNode

An ASTToken class is also created to represent tokens.

All of these classes are subclasses of the base ASTNode and make up the generated parse tree.
Traversing the Parse Tree
selected variables and methods of ASTNode...
import java.util.Vector;

public class ASTNode {
    private ASTNode parent;
    private Vector<ASTNode> children = new Vector<ASTNode>();
    private String name, value;

    public Vector<ASTNode> getChildren() { return children; }
    public ASTNode getChild(int i) { return children.get(i); }
    public int numChildren() { return children.size(); }
    public String getName() { return name; }   // rule or token name
    public String getValue() { return value; } // only used for ASTTokens
    public ASTNode getParent() { return parent; }

    // Entry point for the visitor pattern: hands this node to the visitor
    public <X,Y> X accept(Visitor<X,Y> visitor, Y data) {
        return visitor.visit(this, data);
    }
}
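
The accessors listed above are enough to walk a tree without a Visitor. The sketch below uses a trimmed stand-in for ASTNode so that it runs on its own; the constructor and addChild method are hypothetical (the generated class may build its tree differently), and rule nodes are assumed to have a null value.

```java
import java.util.Vector;

class TreeWalkSketch {
    // Trimmed stand-in mirroring the accessors of the generated ASTNode.
    // The constructor and addChild are hypothetical additions for this demo.
    static class ASTNode {
        private ASTNode parent;
        private final Vector<ASTNode> children = new Vector<ASTNode>();
        private final String name, value;
        ASTNode(String name, String value) { this.name = name; this.value = value; }
        ASTNode addChild(ASTNode c) { c.parent = this; children.add(c); return this; }
        Vector<ASTNode> getChildren() { return children; }
        ASTNode getChild(int i) { return children.get(i); }
        int numChildren() { return children.size(); }
        String getName() { return name; }
        String getValue() { return value; }
        ASTNode getParent() { return parent; }
    }

    // Depth-first dump: one line per node, indented two spaces per level.
    static String dump(ASTNode node, int depth) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) sb.append("  ");
        sb.append(node.getName());
        if (node.getValue() != null) sb.append(" = ").append(node.getValue());
        sb.append('\n');
        for (int i = 0; i < node.numChildren(); i++) {
            sb.append(dump(node.getChild(i), depth + 1));
        }
        return sb.toString();
    }

    // Hand-built tree for "myvar2 = 6 * 4" (assumed shape, for illustration).
    static ASTNode sample() {
        return new ASTNode("Assignment", null)
            .addChild(new ASTNode("var", "myvar2"))
            .addChild(new ASTNode("eq", "="))
            .addChild(new ASTNode("Term", null)
                .addChild(new ASTNode("number", "6"))
                .addChild(new ASTNode("multiply", "*"))
                .addChild(new ASTNode("number", "4")));
    }

    public static void main(String[] args) {
        System.out.print(dump(sample(), 0));
    }
}
```

A dump like this is handy for checking that a grammar produces the tree shape you expect before writing a full Visitor.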
Traversing the Parse Tree
the Visitor interface...

A generic Java interface called Visitor is also created:

public interface Visitor<X,Y>

The interface defines the following method for ASTNode and each of its AST___Node subclasses:

public X visit(AST___Node node, Y data);

X is the type each visit returns and Y is the type of the extra data passed along. In your implementation of the Visitor interface, the class types and uses of X and Y are of your choosing; they are meant as helpers.
Traversing the Parse Tree
visiting the tree...

As you might have noticed, ASTNode defines an accept(Visitor v, Y data) method, which calls the Visitor's visit(ASTNode n, Y data) method with itself as the node argument.

The parse tree is visited in this way.

In your implementation of Visitor, each visit(AST___Node n, Y data) method will usually call accept(this, data) recursively on each of the node's children, along with the "logic" needed to handle the node.
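
As a concrete illustration of this pattern, here is a self-contained sketch of a Visitor that evaluates the Sum/Term grammar. The node classes and the Visitor interface below are simplified stand-ins written for this example, not SiCC's generated code, and the sketch assumes each node subclass overrides accept so that the matching visit overload is chosen. Here X is Integer (the value of a subtree) and Y is Void (no extra data is needed).

```java
import java.util.ArrayList;
import java.util.List;

class VisitorSketch {
    // Stand-in for the generated interface: one visit method per node class.
    interface Visitor<X, Y> {
        X visit(ASTNode node, Y data);
        X visit(ASTToken node, Y data);
        X visit(ASTSumNode node, Y data);
        X visit(ASTTermNode node, Y data);
    }

    static class ASTNode {
        final List<ASTNode> children = new ArrayList<>();
        <X, Y> X accept(Visitor<X, Y> v, Y data) { return v.visit(this, data); }
    }
    static class ASTToken extends ASTNode {
        final String name, value;
        ASTToken(String name, String value) { this.name = name; this.value = value; }
        @Override <X, Y> X accept(Visitor<X, Y> v, Y data) { return v.visit(this, data); }
    }
    static class ASTSumNode extends ASTNode {
        @Override <X, Y> X accept(Visitor<X, Y> v, Y data) { return v.visit(this, data); }
    }
    static class ASTTermNode extends ASTNode {
        @Override <X, Y> X accept(Visitor<X, Y> v, Y data) { return v.visit(this, data); }
    }

    // Evaluates Sum -> Term (plus Term)* and Term -> number (multiply number)*.
    static class Evaluator implements Visitor<Integer, Void> {
        public Integer visit(ASTNode node, Void data) {
            throw new IllegalStateException("unexpected node type");
        }
        public Integer visit(ASTToken node, Void data) {
            return Integer.parseInt(node.value); // only reached for number tokens
        }
        public Integer visit(ASTSumNode node, Void data) {
            int total = 0;
            // Operands sit at even positions; odd positions hold the plus tokens.
            for (int i = 0; i < node.children.size(); i += 2) {
                total += node.children.get(i).accept(this, data);
            }
            return total;
        }
        public Integer visit(ASTTermNode node, Void data) {
            int product = 1;
            for (int i = 0; i < node.children.size(); i += 2) {
                product *= node.children.get(i).accept(this, data);
            }
            return product;
        }
    }

    // Hand-built tree for "5 + 6 * 4"; the lone 5 has no Term wrapper ([>1]).
    static ASTNode sample() {
        ASTTermNode term = new ASTTermNode();
        term.children.add(new ASTToken("number", "6"));
        term.children.add(new ASTToken("multiply", "*"));
        term.children.add(new ASTToken("number", "4"));
        ASTSumNode sum = new ASTSumNode();
        sum.children.add(new ASTToken("number", "5"));
        sum.children.add(new ASTToken("plus", "+"));
        sum.children.add(term);
        return sum;
    }

    static int evaluate(ASTNode root) {
        return root.accept(new Evaluator(), null);
    }

    public static void main(String[] args) {
        System.out.println(evaluate(sample())); // prints 29
    }
}
```

Because each recursive call goes through accept, the Evaluator never needs instanceof checks: the node itself selects the visit method that handles it.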
Putting It All Together
creating a Main class...

Basically:
import java.io.InputStreamReader;

class SuperApp {
    public static void main(String args[]) {
        // Create a tokenizer that reads from the input source (here, standard input)
        Tokenizer tokenizer = new Tokenizer(new InputStreamReader(System.in));

        // The parser needs a tokenizer, so pass it in
        Parser parser = new Parser(tokenizer);

        // Simply call the parser's parse() method, which returns the root node
        ASTEquationNode rootnode = parser.parse();

        // Create a visitor
        InterpretorVisitor interpretor = new InterpretorVisitor();

        // Start the traversal by visiting the root node; the output type and
        // meaning depend on your Visitor implementation
        String output = interpretor.visit(rootnode, null);
    }
}
That's All There Is To It!

.....yeah ok, it's best to learn by examples
