
Assignment 2 04042021 045308pm

A scanner needs to compute token attributes and collect them into a data structure called a token record. The scanner will operate under the control of a parser, returning single tokens on demand. Finite automata can be used to describe the scanning process and recognize patterns in input strings. The code example shows how to represent operators as a finite automata transition diagram and write code using the diagram. The tasks are to read input character by character, handle single character tokens, detect keywords and identifiers, and make a symbol table for identifiers.

Uploaded by

atif

Bahria University, Islamabad Campus

Department of Computer Science


Assignment # 2
Class: BS(CS) -5A/B
(Fall 2019 Semester)

Course: Compiler Construction Date: / /2021


Deadline: / /2021 Total Marks: 20

Name: atif ali    Enrollment: 01-134191-008

A scanner needs to compute as many attributes as are necessary to allow further processing.
Since the scanner may have to compute several attributes for each token, it is often
helpful to collect all the attributes into a single structured data type, which we could call a token
record. Such a record could be declared in C as
typedef struct {
    TokenType tokenval;
    char *stringval;
    int numval;
} TokenRecord;
A common arrangement is for the scanner to return the token value only and place the other
attributes in global variables, where they can be accessed by other parts of the compiler. Although the
task of the scanner is to convert the entire source program into a sequence of tokens, the scanner will
rarely do this all at once. Instead, the scanner will operate under the control of the parser, returning
the single next token from the input on demand. So the scanner will be declared as a function
such as
TokenType getToken(void);
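The on-demand arrangement described above can be sketched as follows. This is only an illustration, not the assignment's required design: the token kinds, the global names tokenString and tokenNumval, and the in-memory source buffer are all assumptions made for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical token kinds, for this sketch only.
enum TokenType { NUM, ID, UNKNOWN, ENDFILE };

// Attributes of the most recent token, kept in globals as the text describes.
std::string tokenString;
int tokenNumval = 0;

// Hypothetical input buffer; a real scanner would read from a file.
std::string source;
std::size_t pos = 0;

// Returns the next token on demand; the parser calls this repeatedly.
TokenType getToken() {
    while (pos < source.size() &&
           std::isspace(static_cast<unsigned char>(source[pos]))) {
        ++pos;                                   // skip whitespace
    }
    tokenString.clear();
    if (pos >= source.size()) return ENDFILE;

    unsigned char ch = static_cast<unsigned char>(source[pos]);
    if (std::isdigit(ch)) {
        while (pos < source.size() &&
               std::isdigit(static_cast<unsigned char>(source[pos]))) {
            tokenString += source[pos++];
        }
        tokenNumval = std::stoi(tokenString);    // assumes the value fits in int
        return NUM;
    }
    if (std::isalpha(ch)) {
        while (pos < source.size() &&
               std::isalnum(static_cast<unsigned char>(source[pos]))) {
            tokenString += source[pos++];
        }
        return ID;
    }
    tokenString = source[pos++];                 // any other single character
    return UNKNOWN;
}
```

With source set to "count 42" and pos reset to 0, successive calls return ID (tokenString is "count"), then NUM (tokenNumval is 42), then ENDFILE.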
Regular expressions: Regular expressions represent patterns of strings of characters. The patterns
recognized by a scanner are defined by regular expressions.
Reserved words and identifiers: Reserved words are the simplest to write as regular expressions:
they are represented by their fixed sequences of characters. If we wanted to collect the reserved
words into one definition, we could write something like
reserved = if | while | do | …
Identifiers: Identifiers are strings of characters that are not fixed. Typically, an identifier must
begin with a letter and contain only letters and digits. We can express this in terms of regular
definitions as
letter = [a-zA-Z]
digit = [0-9]
identifier = letter(letter|digit)*
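As a quick way to experiment with these definitions, they can be written with the standard std::regex library. This is a sketch for checking the patterns, not how a production scanner would normally be built; the function names isIdentifier and isReserved are made up for the example, and only the three reserved words listed above are included.

```cpp
#include <regex>
#include <string>

// identifier = letter(letter|digit)*
bool isIdentifier(const std::string& s) {
    static const std::regex pattern("[a-zA-Z][a-zA-Z0-9]*");
    return std::regex_match(s, pattern);   // the whole string must match
}

// reserved = if | while | do  (only the words spelled out in the text)
bool isReserved(const std::string& s) {
    static const std::regex pattern("if|while|do");
    return std::regex_match(s, pattern);
}
```

Note that a reserved word such as while also matches the identifier pattern, so a scanner has to check the reserved words before (or after) classifying a lexeme as an identifier.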
Numbers: Numbers can be just sequences of digits (natural numbers), or decimal numbers, or
numbers with an exponent. We can write regular definitions for these numbers as follows:
nat = [0-9]+
signedNat = (+|-)? nat
number = signedNat ("." nat)? (E signedNat)?
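The number definition can be tried out the same way. The sketch below transcribes it into a single std::regex; the function name isNumber is an assumption for the example, and the exponent marker is an uppercase E only, exactly as in the definition.

```cpp
#include <regex>
#include <string>

// number = signedNat ("." nat)? (E signedNat)?
// with nat = [0-9]+ and signedNat = (+|-)? nat
bool isNumber(const std::string& s) {
    static const std::regex pattern(R"([+-]?[0-9]+(\.[0-9]+)?(E[+-]?[0-9]+)?)");
    return std::regex_match(s, pattern);
}
```

Under this definition 42, -3.14 and 6.02E+23 are numbers, while 3. and .5 are not, because a decimal point must have digits on both sides.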
Finite automata: Finite automata, or finite-state machines, are a mathematical way of describing
particular kinds of algorithms. In particular, finite automata can be used to describe the process of
recognizing patterns in input strings, and so can be used to construct scanners. Finite automata
can be described using transition diagrams, as the following example illustrates. A transition
diagram makes it easy to visualize the scanner algorithm, and the code can be written easily by hand
if the scanner is to process a simple language. Consider the following operators:
< , <= , <> , > , >= , =
We can represent them as a transition diagram and then write code from the transition
diagram.
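A hand-coded recognizer for these six operators might look like the sketch below, where each case of the switch corresponds to a state reached after reading the first character. The enum RelOp, its enumerator names, and the function scanRelOp are hypothetical names chosen for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical token kinds for the six operators; OP_NONE means no match.
enum RelOp { OP_LT, OP_LE, OP_NE, OP_GT, OP_GE, OP_EQ, OP_NONE };

// Recognizes one operator starting at input[pos] and moves pos past it.
// The look-ahead character is examined but consumed only when it is
// part of a two-character operator.
RelOp scanRelOp(const std::string& input, std::size_t& pos) {
    if (pos >= input.size()) return OP_NONE;
    char ch = input[pos];
    char next = (pos + 1 < input.size()) ? input[pos + 1] : '\0';
    switch (ch) {
    case '<':
        if (next == '=') { pos += 2; return OP_LE; }   // <=
        if (next == '>') { pos += 2; return OP_NE; }   // <>
        pos += 1;
        return OP_LT;                                  // <
    case '>':
        if (next == '=') { pos += 2; return OP_GE; }   // >=
        pos += 1;
        return OP_GT;                                  // >
    case '=':
        pos += 1;
        return OP_EQ;                                  // =
    default:
        return OP_NONE;                                // not an operator
    }
}
```

Here the automaton is hardwired into the control flow: the structure of the switch and the nested tests mirrors the states and edges of the diagram.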

Transition table: In the above code example, the finite automaton has been hardwired right into
the code. It is also possible to express the DFA as a data structure and then write "generic" code
that takes its actions from the data structure.
A simple data structure that is adequate for this purpose is a transition table: a two-dimensional
array, indexed by state and input character, that expresses the values of the transition function T.
Consider the DFA for identifiers:

The DFA can be represented by the following transition table.

State    Letter    Digit    Other    Accept
1        2         Error    Error    No
2        2         2        3        No
3        -         -        -        Yes
Then the generic code can be expressed as:
state = 1
ch = next input character
while not Accept[state] and not error(state) do
    newstate = T[state, ch]
    if Advance[state, ch] then ch = next input character
    state = newstate
end while
if Accept[state] then accept
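The generic loop above can be turned into C++ roughly as follows for the identifier DFA. This is a sketch under some assumptions: state 0 is used as the error state, characters are grouped into three classes instead of indexing the table by every character, and the names classify and matchIdentifier are invented for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

enum CharClass { LETTER, DIGIT, OTHER };
const int ERR = 0;   // assumed encoding of the error state; real states are 1..3

// T[state][class]: the transition function from the table above.
const int T[4][3] = {
    {ERR, ERR, ERR},   // 0: error
    {2,   ERR, ERR},   // 1: start
    {2,   2,   3  },   // 2: inside an identifier
    {ERR, ERR, ERR},   // 3: accepting, no outgoing transitions
};
const bool Accept[4] = {false, false, false, true};

// Advance[state][class]: whether the transition consumes the character.
// The move from state 2 to state 3 on "other" leaves the character unread.
const bool Advance[4][3] = {
    {false, false, false},
    {true,  false, false},
    {true,  true,  false},
    {false, false, false},
};

CharClass classify(char ch) {
    unsigned char u = static_cast<unsigned char>(ch);
    if (std::isalpha(u)) return LETTER;
    if (std::isdigit(u)) return DIGIT;
    return OTHER;
}

// Returns the length of the identifier at the start of input, or 0 if none.
std::size_t matchIdentifier(const std::string& input) {
    int state = 1;
    std::size_t pos = 0;
    // '\0' stands in for end of input and is classified as OTHER.
    char ch = pos < input.size() ? input[pos] : '\0';
    while (state != ERR && !Accept[state]) {
        CharClass cls = classify(ch);
        int newState = T[state][cls];
        if (Advance[state][cls]) {
            ++pos;
            ch = pos < input.size() ? input[pos] : '\0';
        }
        state = newState;
    }
    return Accept[state] ? pos : 0;
}
```

The loop itself knows nothing about identifiers; recognizing a different token would only require different T, Accept and Advance tables.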
Lexical Analyzer in C++
Now that you have studied the basic theory needed to code a finite automaton, use the suggested style
of coding to write code that recognizes the following keywords and language constructs.
Keywords:
if, do, for, while, begin, end, switch, else, break
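One common way to separate these keywords from identifiers is to scan the whole word first and then look it up in a table. The sketch below assumes a hypothetical enum WordKind and function classifyWord; the lookup is case-sensitive, so only the lowercase spellings count as keywords.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds for the nine keywords, plus a fallback.
enum WordKind {
    KW_IF, KW_DO, KW_FOR, KW_WHILE, KW_BEGIN, KW_END,
    KW_SWITCH, KW_ELSE, KW_BREAK, WORD_ID
};

// Classifies an already-scanned word as a keyword or an identifier.
WordKind classifyWord(const std::string& lexeme) {
    static const std::unordered_map<std::string, WordKind> keywords = {
        {"if", KW_IF},       {"do", KW_DO},         {"for", KW_FOR},
        {"while", KW_WHILE}, {"begin", KW_BEGIN},   {"end", KW_END},
        {"switch", KW_SWITCH}, {"else", KW_ELSE},   {"break", KW_BREAK},
    };
    auto it = keywords.find(lexeme);
    return it == keywords.end() ? WORD_ID : it->second;
}
```

Compared with a long chain of string comparisons, a table keeps the keyword list in one place and makes it easy to extend.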
All Special Characters in C++:
; , [ , ] , ( , ) , { , }
Operators:
The table is given below. Precedence is highest at the top and lowest at the bottom.

Operators                        Description                                        Associativity
* / %                            multiply, divide, mod                              Left to right
+ -                              add, subtract                                      Left to right
<< >>                            shift left, shift right                            Left to right
< <= > >=                        less than, less or equal, greater, greater or equal  Left to right
== !=                            equal, not equal                                   Left to right
&&                               logical and                                        Left to right
||                               logical or                                         Left to right
= += -= *= /= %= >>= <<=         assignments                                        Right to left
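Several operators in this table are prefixes of longer ones (for example > , >> and >>=), so a scanner normally takes the longest operator that matches at the current position. One simple way to do that, sketched below, is to try the candidates from longest to shortest. The names kOperators and matchOperator are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Operators from the table, ordered longest first so that the first
// match found is also the longest one.
const std::vector<std::string> kOperators = {
    ">>=", "<<=",
    "<<", ">>", "<=", ">=", "==", "!=", "&&", "||",
    "+=", "-=", "*=", "/=", "%=",
    "*", "/", "%", "+", "-", "<", ">", "="
};

// Returns the longest operator starting at input[pos], or "" if none.
std::string matchOperator(const std::string& input, std::size_t pos) {
    if (pos > input.size()) return "";
    for (const std::string& op : kOperators) {
        if (input.compare(pos, op.size(), op) == 0) {
            return op;
        }
    }
    return "";
}
```

For the input a>>=b, the match at position 1 is >>= rather than > followed by >=.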

Tasks

1- Read the file character by character into a char variable.
2- Check whether the character read is a space, tab, or newline; if so, skip it and read the next
character.
3- Consume C++-style comments starting with //: consume all characters until a newline character
is found.
4- If the character just read is not a whitespace character, proceed as follows. You can
divide and handle tokens in the following categories.
5- Tokens consisting of a single character only, such as comma, semicolon, parentheses,
brackets, and arithmetic operators: handle them first.
6- Detect keywords and return the corresponding tokens.
7- Detect identifiers and return the corresponding tokens.
8- Make a symbol table for identifiers with the following functions:
entry *Search(string)
entry *make_entry(string)
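The two symbol-table functions in task 8 could be sketched as follows. The layout of entry (a single name field) and the choice of std::list as the container are assumptions; the task only fixes the two function names and their pointer return types.

```cpp
#include <list>
#include <string>

// Hypothetical symbol-table entry; a real one would carry more attributes.
struct entry {
    std::string name;
};

// std::list keeps element addresses stable when new entries are appended,
// so pointers returned earlier remain valid.
std::list<entry> symbolTable;

// Returns a pointer to the entry for name, or nullptr if it is absent.
entry* Search(const std::string& name) {
    for (entry& e : symbolTable) {
        if (e.name == name) return &e;
    }
    return nullptr;
}

// Returns the existing entry for name, creating it first if necessary.
entry* make_entry(const std::string& name) {
    if (entry* found = Search(name)) return found;
    symbolTable.push_back(entry{name});
    return &symbolTable.back();
}
```

Calling make_entry twice with the same identifier returns the same pointer, so each identifier appears in the table only once.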

#include <iostream>
#include <fstream>
#include <string>
#include <cctype>
#include <cstdlib>   // exit
#include <iomanip>
using namespace std;

// Token kinds produced by the scanner.
enum TokenType
{
    TRUE, PLUS, PLUSEQUAL, MINUS, MINUSEQUAL, MUL,
    MULEQUAL, DIV, DIVEQUAL, MOD, MODEQUAL,
    UNDERSCORE, LSB, RSB, LRB, RRB, LCB, RCB, HASH, EQUAL, LESS,
    GREATER,
    INCREMENT, DECREMENT, COLON, SEMICOLON, DOT,
    QUESTIONMARK, AND,
    OR, NOT, COMMA, NUM, ID, MAIN, IF, THEN, ELSE, BOOL, BREAK, CASE,
    CHAR, CLASS, CONST, CONTINUE, DEFAULT, DELETE, DO, DOUBLE, ENUM,
    EXPLICIT, EXPORT,
    FALSE, FLOAT, FOR, GOTO, INLINE, INT, LONG, NAMESPACE, NEW,
    OPERATOR, PRIVATE, PROTECTED, PUBLIC, RETURN, SIZEOF,
    STRUCT,
    SWITCH, UNION, UNSIGNED, USING, VIRTUAL,
    VOID, VOLATILE, WCHAR_T, WHILE,
    // Enumerators that LexicalAnalyzer() uses but the original list omitted.
    ASSIGN, LE, GE, NE, ShiftL, ShiftR, SCOPERESOLUTION, POWER,
    COMPARISONAND, COMPARISONOR, TILDE, DICOMMA, SICOMMA
};

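// A token together with its attributes.
struct TokenRecord
{
    TokenType TokenValue;
    string StringValue;
    int numval;
};

TokenRecord T1;   // most recently scanned token
char c;           // current input character
ifstream f;       // input file; used by LexicalAnalyzer() but never declared in the original

TokenRecord Table[100];   // symbol table
int Size = 0;             // number of entries in Table

int Search(string);
void Entry(TokenType, string);
TokenRecord LexicalAnalyzer();

int main()
{
    LexicalAnalyzer();
    return 0;
}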
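// Returns the index of a in the symbol table, adding it as an ID if absent.
int Search(string a)
{
    for (int i = 0; i < Size; i++)   // i < Size: Table[Size] is not a valid entry yet
    {
        if (a == Table[i].StringValue)
        {
            return i;
        }
    }
    T1.TokenValue = ID;
    T1.StringValue = a;
    Entry(T1.TokenValue, T1.StringValue);
    return Size - 1;
}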

// Appends a new entry to the symbol table.
void Entry(TokenType token, string value)
{
    Table[Size].TokenValue = token;
    Table[Size].StringValue = value;
    Size++;
}
TokenRecord LexicalAnalyzer()
{
    f.open("Hamza.txt", ios::in);
    if (f)   // the stream converts to true when the file opened
    {
        cout << "File has been opened successfully" << endl;
    }
    else
    {
        cout << "File couldn't be opened." << endl;
        return T1;   // nothing to scan
    }
    while (!f.eof())
    {
        c = f.get();
        if (f.eof())
        {
            break;   // get() hit end of file; c holds no character
        }
        if (c == '/' || c == ' ' || c == '\t' || c == '\n')
        {
            if (c == '/')
            {
                c = f.get();
                if (c == '/')
                {
                    // Line comment: discard characters up to the newline.
                    while (c != '\n')
                    {
                        c = f.get();
                        if (f.eof())
                        {
                            exit(0);
                        }
                    }
                    f.seekg(-1, ios::cur);
                }
                else if (c == '=')
                {
                    T1.TokenValue = DIVEQUAL; T1.StringValue = "/=";
                    cout << "<" << T1.StringValue << ">" << endl;
                }
                else
                {
                    // Not part of this token: put the character back.
                    f.seekg(-1, ios::cur); T1.TokenValue = DIV; T1.StringValue = "/";
                    cout << "<" << T1.StringValue << ">" << endl;
                }
            }
            else if (c == ' ')
            {
                continue;
            }
            else if (c == '\t')
            {
                continue;
            }
            else if (c == '\n')
            {
                continue;
            }
        }
        else
        {
            if (c == ',' || c == ';' || c == '(' || c == ')' || c == '{' || c == '}' ||
                c == '=' || c == '+' || c == '-' ||
                c == '*' || c == '[' || c == ']' || c == '%' ||
                c == '#' || c == '<' || c == '>' || c == ':' || c == '.' || c == '^' ||
                c == '&' || c == '|' || c == '!' || c == '~' || c == '"' || c == '?' || c == '\'')
            {
                if (c == ',')
                {
                    T1.TokenValue = COMMA; T1.StringValue = ",";
                }
                else if (c == ';')
                {
                    T1.TokenValue = SEMICOLON; T1.StringValue = ";";
                }
                else if (c == '(')
                {
                    T1.TokenValue = LRB; T1.StringValue = "(";
                }
                else if (c == ')')
                {
                    T1.TokenValue = RRB; T1.StringValue = ")";
                }
                else if (c == '{')
                {
                    T1.TokenValue = LCB; T1.StringValue = "{";
                }
                else if (c == '}')
                {
                    T1.TokenValue = RCB; T1.StringValue = "}";
                }
                else if (c == '=')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = EQUAL; T1.StringValue = "==";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = ASSIGN; T1.StringValue = "=";
                    }
                }
                else if (c == '+')
                {
                    c = f.get();
                    if (c == '+')
                    {
                        T1.TokenValue = INCREMENT; T1.StringValue = "++";
                    }
                    else if (c == '=')
                    {
                        T1.TokenValue = PLUSEQUAL; T1.StringValue = "+=";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = PLUS; T1.StringValue = "+";
                    }
                }
                else if (c == '-')
                {
                    c = f.get();
                    if (c == '-')
                    {
                        T1.TokenValue = DECREMENT; T1.StringValue = "--";
                    }
                    else if (c == '=')
                    {
                        T1.TokenValue = MINUSEQUAL; T1.StringValue = "-=";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = MINUS; T1.StringValue = "-";
                    }
                }

                else if (c == '*')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = MULEQUAL; T1.StringValue = "*=";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = MUL; T1.StringValue = "*";
                    }
                }
                else if (c == '[')
                {
                    T1.TokenValue = LSB; T1.StringValue = "[";
                }
                else if (c == ']')
                {
                    T1.TokenValue = RSB; T1.StringValue = "]";
                }
                else if (c == '%')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = MODEQUAL; T1.StringValue = "%=";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = MOD; T1.StringValue = "%";
                    }
                }

                else if (c == '#')
                {
                    T1.TokenValue = HASH; T1.StringValue = "#";
                }
                else if (c == '<')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = LE; T1.StringValue = "<=";
                    }
                    else if (c == '<')
                    {
                        T1.TokenValue = ShiftL; T1.StringValue = "<<";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = LESS; T1.StringValue = "<";
                    }
                }
                else if (c == '>')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = GE; T1.StringValue = ">=";
                    }
                    else if (c == '>')
                    {
                        T1.TokenValue = ShiftR; T1.StringValue = ">>";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = GREATER; T1.StringValue = ">";
                    }
                }

                else if (c == ':')
                {
                    c = f.get();
                    if (c == ':')
                    {
                        T1.TokenValue = SCOPERESOLUTION; T1.StringValue = "::";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = COLON; T1.StringValue = ":";
                    }
                }
                else if (c == '.')
                {
                    T1.TokenValue = DOT; T1.StringValue = ".";
                }
                else if (c == '^')
                {
                    T1.TokenValue = POWER; T1.StringValue = "^";
                }
                else if (c == '&')
                {
                    c = f.get();
                    if (c == '&')
                    {
                        T1.TokenValue = COMPARISONAND; T1.StringValue = "&&";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = AND; T1.StringValue = "&";
                    }
                }


                else if (c == '|')
                {
                    c = f.get();
                    if (c == '|')
                    {
                        T1.TokenValue = COMPARISONOR; T1.StringValue = "||";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = OR; T1.StringValue = "|";
                    }
                }
                else if (c == '!')
                {
                    c = f.get();
                    if (c == '=')
                    {
                        T1.TokenValue = NE; T1.StringValue = "!=";
                    }
                    else
                    {
                        f.seekg(-1, ios::cur); T1.TokenValue = NOT; T1.StringValue = "!";
                    }
                }
                else if (c == '~')
                {
                    T1.TokenValue = TILDE; T1.StringValue = "~";
                }
                else if (c == '"')
                {
                    T1.TokenValue = DICOMMA; T1.StringValue = "\"";
                }
                else if (c == '?')
                {
                    T1.TokenValue = QUESTIONMARK; T1.StringValue = "?";
                }
                else if (c == '\'')
                {
                    T1.TokenValue = SICOMMA; T1.StringValue = "'";
                }
                cout << "<" << T1.StringValue << ">" << endl;
            }
            else if (isdigit(c))
            {
                // Number: collect consecutive digits.
                T1.TokenValue = NUM;
                T1.StringValue = c;
                while (isdigit(c))
                {
                    c = f.get();
                    if (!isdigit(c))
                    {
                        break;
                    }
                    T1.StringValue += c;
                }
                f.seekg(-1, ios::cur);   // put back the character that ended the number
                cout << "<" << "NUM," << T1.StringValue << '>' << endl;
            }

            else if (c == '_' || isalpha(c))
            {
                // Word: collect letters, digits and underscores.
                T1.StringValue = c;
                while (isalpha(c) || c == '_' || isdigit(c))
                {
                    c = f.get();
                    if (c != '_' && !isalpha(c) && !isdigit(c))
                    {
                        break;
                    }
                    T1.StringValue += c;
                }
                f.seekg(-1, ios::cur);   // put back the character that ended the word
                if (T1.StringValue == "main" || T1.StringValue == "if" || T1.StringValue == "then" ||
                    T1.StringValue == "else" || T1.StringValue == "asm" || T1.StringValue == "auto" ||
                    T1.StringValue == "bool" || T1.StringValue == "break" || T1.StringValue == "case" ||
                    T1.StringValue == "catch" || T1.StringValue == "char" || T1.StringValue == "class" ||
                    T1.StringValue == "const" || T1.StringValue == "continue" || T1.StringValue == "default" ||
                    T1.StringValue == "delete" || T1.StringValue == "do" || T1.StringValue == "double" ||
                    T1.StringValue == "enum" || T1.StringValue == "explicit" || T1.StringValue == "export" ||
                    T1.StringValue == "extern" || T1.StringValue == "false" || T1.StringValue == "float" ||
                    T1.StringValue == "for" || T1.StringValue == "friend" || T1.StringValue == "goto" ||
                    T1.StringValue == "inline" || T1.StringValue == "int" || T1.StringValue == "long" ||
                    T1.StringValue == "mutable" || T1.StringValue == "namespace" || T1.StringValue == "new" ||
                    T1.StringValue == "operator" || T1.StringValue == "private" || T1.StringValue == "protected" ||
                    T1.StringValue == "public" || T1.StringValue == "register" || T1.StringValue == "return" ||
                    T1.StringValue == "short" || T1.StringValue == "signed" || T1.StringValue == "sizeof" ||
                    T1.StringValue == "static" || T1.StringValue == "struct" || T1.StringValue == "switch" ||
                    T1.StringValue == "template" || T1.StringValue == "this" || T1.StringValue == "throw" ||
                    T1.StringValue == "true" || T1.StringValue == "try" || T1.StringValue == "typedef" ||
                    T1.StringValue == "typeid" || T1.StringValue == "typename" || T1.StringValue == "union" ||
                    T1.StringValue == "unsigned" || T1.StringValue == "using" || T1.StringValue == "virtual" ||
                    T1.StringValue == "void" || T1.StringValue == "volatile" || T1.StringValue == "wchar_t" ||
                    T1.StringValue == "while")
