TOC Theory
1. Symbols
• Definition: A symbol is the smallest indivisible unit from which strings are built, such as a letter, a digit, or any other single character.
• Example: In a binary language, the symbols could be {0, 1}. In a more general language, symbols could include letters (A, B, C, etc.) or any characters.
2. Alphabet
• Definition: An alphabet is a finite, non-empty set of symbols. It is the basic building block of
formal languages. The notation often used is Σ (sigma), which represents the alphabet.
• Example: For example, the alphabet Σ = {a, b} contains two symbols: 'a' and 'b'.
3. Power of an Alphabet
• Definition: The power of an alphabet, written Σ^k, is the set of all strings of length k that can be formed using the symbols of the alphabet. If the alphabet has n symbols, then the number of distinct strings of length k that can be formed from it is n^k.
• Example: If the alphabet is Σ = {a, b}, then for strings of length 2 the possible strings are {aa, ab, ba, bb}. There are 2^2 = 4 strings of length 2.
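As a quick illustration of Σ^k, the following sketch (a Python example assuming a small two-symbol alphabet) enumerates all strings of a given length over Σ = {a, b}:

```python
from itertools import product

def power_of_alphabet(sigma, k):
    """Return the set Sigma^k: all strings of length k over the alphabet sigma."""
    return {"".join(p) for p in product(sigma, repeat=k)}

sigma = {"a", "b"}
print(sorted(power_of_alphabet(sigma, 2)))   # ['aa', 'ab', 'ba', 'bb']
print(len(power_of_alphabet(sigma, 3)))      # 2^3 = 8
```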
4. Strings
• Definition: A string is a finite sequence of symbols from a given alphabet. Strings can be of
any length, including zero (the empty string, usually denoted as ε).
• Example: If Σ = {a, b}, then "abba", "aa", and "b" are all examples of strings formed from this
alphabet.
5. Language
• Definition: A language is a set of strings formed from an alphabet. Languages can be finite or
infinite, and they can be defined by specific rules or patterns. The concept of language is
central to formal language theory.
• Example: For the alphabet Σ = {a, b}, a language L could be defined as L = {a, ab, aab, aaab,
...}, which consists of strings with one or more 'a's followed by zero or more 'b's.
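A minimal membership test for a language like the one above, assuming the reading "one or more 'a's followed by zero or more 'b's" (regular expression a+b*), might look like this sketch:

```python
import re

# Hypothetical helper: membership test for L = { strings matching a+b* }.
L_PATTERN = re.compile(r"a+b*\Z")

def in_language(s: str) -> bool:
    return L_PATTERN.match(s) is not None

for s in ["a", "ab", "aab", "aaab", "ba", ""]:
    print(s or "ε", in_language(s))   # "ba" and the empty string are rejected
```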
Lexical Analyzer
The primary role of the lexical analyzer is to read the input source code and convert it into a sequence of tokens. Here are the key functions and responsibilities of a lexical analyzer (a small scanner sketch follows the list):
1. Input Processing:
o The lexical analyzer reads the source code character by character and organizes the
input for further processing.
2. Token Generation:
o It identifies meaningful sequences of characters (lexemes) and classifies them into
tokens. Tokens are categorized into types such as keywords, identifiers, literals,
operators, and punctuation.
3. Error Detection:
o The lexical analyzer checks for errors in the source code, such as illegal characters or
malformed tokens, and generates appropriate error messages.
4. Symbol Table Management:
o It may maintain a symbol table to keep track of identifiers, their types, and other
attributes that are used during compilation.
5. Output:
o The lexical analyzer outputs a stream of tokens to the parser for further syntactic
analysis, thus acting as an interface between the source code and the parser.
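To make these responsibilities concrete, here is a minimal hand-written scanner sketch (Python, with hypothetical token names) that reads characters, groups them into lexemes, reports illegal characters, and yields a token stream for a parser:

```python
def tokens(source):
    """Very small character-by-character scanner: yields (token_type, lexeme) pairs."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():                      # skip whitespace
            i += 1
        elif ch.isalpha() or ch == "_":       # identifier or keyword lexeme
            j = i
            while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                j += 1
            lexeme = source[i:j]
            kind = "KEYWORD" if lexeme in {"int", "if", "while"} else "IDENTIFIER"
            yield (kind, lexeme)
            i = j
        elif ch.isdigit():                    # integer literal
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            yield ("NUMBER", source[i:j])
            i = j
        elif ch in "+-*/=;":                  # single-character operators / punctuation
            yield ("OPERATOR" if ch in "+-*/=" else "PUNCTUATION", ch)
            i += 1
        else:                                 # error detection: illegal character
            raise ValueError(f"illegal character {ch!r} at position {i}")

print(list(tokens("int count = 0;")))
```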
Specification of Token
A token is defined as a categorized string of characters that represents a basic unit of meaning in the
source code. Each token consists of two main components:
1. Token Type:
o This is a category that identifies the class of the token, such as keyword, identifier,
literal, operator, or punctuation.
2. Lexeme:
o This is the actual sequence of characters in the source code that corresponds to the
token. For instance, in the expression int count = 0;, the lexeme for the identifier
token would be count, and for the keyword token, it would be int.
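A token can therefore be modeled as a (token type, lexeme) pair. As a sketch (the type names below are illustrative), the statement int count = 0; would yield the following stream:

```python
from typing import NamedTuple

class Token(NamedTuple):
    type: str     # token type (category)
    lexeme: str   # the actual characters from the source

# Tokens for: int count = 0;
stream = [
    Token("KEYWORD", "int"),
    Token("IDENTIFIER", "count"),
    Token("OPERATOR", "="),
    Token("LITERAL", "0"),
    Token("PUNCTUATION", ";"),
]
```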
Recognition of Token
Token recognition involves several steps, typically implemented using regular expressions and finite
automata. Here’s how it works:
1. Regular Expressions:
o Regular expressions define the patterns for different token types. For example:
▪ Identifiers: [a-zA-Z_][a-zA-Z0-9_]*
▪ Operators: [+\-*/]
▪ Comments: //.*|/\*.*?\*/
2. Finite Automata:
o The regular expressions are converted into a finite automaton (an NFA or DFA)
whose states and transitions recognize the corresponding token patterns.
3. Tokenization Process:
o As the lexical analyzer reads the input, it transitions through states in the finite
automaton based on the input characters. When it reaches an accepting state, it
recognizes the corresponding token and stores it in the output.
4. Error Handling:
o If the lexer encounters an input that does not match any token definition, it raises
an error, indicating that the input is not valid.
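Putting these steps together, a small regex-driven tokenizer (a sketch using Python's re module; the token names and the extra NUMBER and "=" patterns are illustrative additions to the patterns listed above) could be:

```python
import re

# (token type, pattern) pairs based on the regular expressions above.
TOKEN_SPEC = [
    ("COMMENT",    r"//[^\n]*|/\*.*?\*/"),
    ("IDENTIFIER", r"[a-zA-Z_][a-zA-Z0-9_]*"),
    ("NUMBER",     r"\d+"),
    ("OPERATOR",   r"[+\-*/=]"),
    ("SKIP",       r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC), re.DOTALL)

def tokenize(code):
    pos = 0
    while pos < len(code):
        m = MASTER.match(code, pos)
        if not m:                                   # error handling: no pattern matches
            raise SyntaxError(f"unexpected character {code[pos]!r} at position {pos}")
        if m.lastgroup not in ("SKIP", "COMMENT"):  # discard whitespace and comments
            yield (m.lastgroup, m.group())
        pos = m.end()

print(list(tokenize("x = y + 42 // trailing comment")))
```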
Summary
In summary, the lexical analyzer serves as the first stage of the compilation process, transforming
raw source code into a structured stream of tokens, which are essential for syntactic analysis. By
defining and recognizing tokens through regular expressions and finite automata, the lexical analyzer
efficiently processes the input while also ensuring that errors are detected early in the compilation
process.
String Relations
In the context of formal languages and automata theory, relations on strings refer to the various
ways in which strings can be compared, combined, or manipulated. These relations can help in
defining languages, parsing strings, and constructing automata. Here’s an overview of several key
relations and operations on strings:
1. Equality Relation
• Definition: Two strings s₁ and s₂ are said to be equal if they consist of the same sequence of characters.
2. Substring Relation
• Definition: A string s₁ is a substring of s₂ if s₁ appears as a contiguous sequence of characters within s₂.
• Prefix: A string s₁ is a prefix of s₂ if s₂ can be expressed as s₁s₃, where s₃ is another string (which can be empty).
• Suffix: A string s₁ is a suffix of s₂ if s₂ can be expressed as s₃s₁, where s₃ is another string (which can be empty).
3. Concatenation Relation
• Definition: The concatenation of two strings s₁ and s₂ is the string formed by appending s₂ to the end of s₁.
• Notation: The concatenation is denoted as s₁ · s₂ or simply s₁s₂.
4. Length Relation
• Definition: The length of a string s is the number of characters in it, denoted |s|.
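These basic relations (equality, substring, prefix, suffix, concatenation, length) map directly onto built-in string operations; a small Python sketch:

```python
s1, s2 = "ab", "abba"

print(s1 == "ab")            # equality: same sequence of characters
print(s1 in s2)              # substring: "ab" occurs inside "abba"
print(s2.startswith(s1))     # prefix:  abba = ab + "ba"
print(s2.endswith("ba"))     # suffix:  abba = "ab" + ba
print(s1 + "ba" == s2)       # concatenation: s1 . "ba" gives "abba"
print(len(s2))               # length: |abba| = 4
```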
5. Language Relation
• Definition: A language is a set of strings formed from a specific alphabet. The relation can be
defined based on properties of the strings in the language.
• Example: Let L = { s | s contains an even number of a's }.
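For instance, membership in the language above (an even number of a's) can be checked with a simple count; a sketch:

```python
def in_L(s: str) -> bool:
    """True if s contains an even number of 'a's."""
    return s.count("a") % 2 == 0

print([w for w in ["", "a", "aa", "aba", "baab"] if in_L(w)])  # ['', 'aa', 'aba', 'baab']
```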
6. Homomorphism Relation
• Definition: A homomorphism is a mapping from one alphabet to another that preserves the
structure of the strings.
• Example: If we define a mapping h where h(a) = x and h(b) = y, then h(ab) = xy.
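A homomorphism can be implemented as a symbol-by-symbol substitution extended over concatenation; a sketch using the mapping from the example:

```python
def homomorphism(h, s):
    """Apply a symbol mapping h to each symbol of s and concatenate the images."""
    return "".join(h[c] for c in s)

h = {"a": "x", "b": "y"}
print(homomorphism(h, "ab"))    # xy
print(homomorphism(h, "abba"))  # xyyx
```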
7. Equivalence Relation
• Definition: An equivalence relation on strings is a relation that partitions the set of strings
into equivalence classes. Two strings s₁ and s₂ are equivalent if they satisfy certain conditions.
• Example: In the context of regular languages, two strings x and y are equivalent with respect
to a language L if no string z distinguishes them, that is, xz is in L exactly when yz is in L.
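As a concrete check of this equivalence (assuming the "even number of a's" language from above), two strings are equivalent exactly when no appended suffix separates them, which for that language reduces to having the same parity of a's. A bounded-length sketch of the distinguishability test:

```python
from itertools import product

def in_L(s):                      # L = strings with an even number of 'a's
    return s.count("a") % 2 == 0

def equivalent(x, y, sigma=("a", "b"), max_len=4):
    """Approximate equivalence check: no suffix z (up to max_len) separates x and y."""
    for k in range(max_len + 1):
        for z in map("".join, product(sigma, repeat=k)):
            if in_L(x + z) != in_L(y + z):
                return False
    return True

print(equivalent("aa", "b"))   # True: both contain an even number of 'a's
print(equivalent("a", "ab"))   # True: both contain an odd number of 'a's
print(equivalent("a", "aa"))   # False: the empty suffix already separates them
```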