Anab Batool Kazmi
• The word “automata” is the plural of “automaton”, which simply means any machine.
BATTERY
input: switch
output: light bulb
actions: flip switch
states: on, off
A simple “computer”
(Figure label: SWITCH)
input: switch
output: light bulb
actions: f for “flip switch”
states: on, off
The bulb is on if and only if there was an odd number of flips.
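The switch-and-bulb machine can be simulated in a few lines. This is a minimal sketch: the function names flip and run are illustrative, and it assumes the bulb starts in the off state, which the slide implies but does not state.

```python
def flip(state):
    """Transition for the single action 'f': toggle between 'on' and 'off'."""
    return "off" if state == "on" else "on"


def run(actions, start="off"):
    """Apply a sequence of actions to the machine and return the final state.

    Assumption: the machine starts in the 'off' state.
    """
    state = start
    for action in actions:
        if action != "f":
            raise ValueError(f"unknown action: {action!r}")
        state = flip(state)
    return state


print(run(""))     # off (zero flips)
print(run("f"))    # on  (one flip)
print(run("ff"))   # off (two flips)
print(run("fff"))  # on  (three flips)
```

An odd number of flips always ends in on, and an even number ends in off, matching the rule on the slide.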
Important Concepts in Automata
Introduction to Language
• Automata theory deals with languages.
• A language is a combination of
• An alphabet (Σ)
• Grammar/Rules
Types of Languages
There are two types of languages
• Formal Language
• A language which can be represented mathematically.
• Informal Language
• A language which cannot be represented mathematically.
Symbol/Letter
• A symbol is a building block for a string.
• Symbols cannot be subdivided; they are the atoms of everything we
build.
• In the theory of formal languages they are usually called letters.
Examples: a, A, 0, 1, %, @.
• We use letters from the end of the Roman alphabet, such as x, y, z to
refer to an arbitrary symbol.
Alphabet (∑)
• A finite non-empty set of symbols (called letters) is called an
alphabet.
• We use capital Greek letters, typically Σ (the Greek letter sigma), to refer
to an arbitrary alphabet.
• Example:
• Σ= {A,B,C,…,Z,a,b,c,…,z}
• Σ={0,1}
• Σ={a,b,c}
• Σ={ab,cd}
• Σ={for,int,while}
Strings
• A concatenation of finitely many (zero or more) letters/symbols from
the alphabet is called a string.
• Note that every letter can be viewed as a one-letter string.
• We use letters s, t, u to refer to an arbitrary string.
• Example:
• If Σ={a,b,c}, then some strings are
• a, ab, aabbcc, abc, aaaabbbababa, etc.
Empty String/ Null String
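• The string with no letters is called the empty (null) string. It has length 0 and is denoted by Λ (also written ε).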
Power of an alphabet
• Σ={0,1}
• Σ^0 ={Λ} // strings of length 0
• Σ^1 ={0,1} // strings of length 1
• Σ^2 ={00,01,10,11} // strings of length 2
• Σ^3 ={000,001,010,011,100,101,110,111} // strings of length 3
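The powers of an alphabet listed above can be generated mechanically. The sketch below uses itertools.product from the Python standard library. The helper name power is illustrative, and the empty string Λ is represented by Python's empty string ''.

```python
from itertools import product


def power(alphabet, k):
    """Return the k-th power of the alphabet: all strings of exactly k letters."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


sigma = ["0", "1"]
print(power(sigma, 0))       # ['']  (only the empty string)
print(power(sigma, 1))       # ['0', '1']
print(power(sigma, 2))       # ['00', '01', '10', '11']
print(len(power(sigma, 3)))  # 8
```

In general the k-th power of an alphabet with n letters contains n^k strings.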
String Tokenization
• Tokenization is the act of breaking up a string into pieces,
such as words, keywords, phrases, symbols and other elements,
called tokens.
Example:
• If Σ={a,b,c}
• s= ababc
• Tokenization = (a)(b)(a)(b)(c)
• Σ={ab,cd}
• s=abababcd
• Tokenization= (ab)(ab)(ab)(cd)
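The two tokenizations above can be reproduced with a simple left-to-right scan. This is a sketch, not a definitive algorithm: the function name tokenize is illustrative, and it assumes that no letter of the alphabet is a prefix of another, so at most one letter can match at each position.

```python
def tokenize(s, alphabet):
    """Split s into letters of the alphabet, scanning from left to right.

    Assumption: no letter is a prefix of another letter, so the first
    match at each position is the only possible one.
    """
    tokens = []
    i = 0
    while i < len(s):
        for letter in alphabet:
            # Skip empty letters so the scan always makes progress.
            if letter and s.startswith(letter, i):
                tokens.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"cannot tokenize {s!r} at position {i}")
    return tokens


print(tokenize("ababc", ["a", "b", "c"]))   # ['a', 'b', 'a', 'b', 'c']
print(tokenize("abababcd", ["ab", "cd"]))   # ['ab', 'ab', 'ab', 'cd']
```

A string that cannot be built from the letters of the alphabet raises a ValueError instead of returning a partial result.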
Valid/Invalid Alphabet
• When defining an alphabet whose letters consist of more than one
symbol, no letter should start with another letter of the same
alphabet, i.e. no letter should be a prefix of another. However,
a letter may end with another letter of the same alphabet.
• Σ={for,int}
• s= intforint
• Tokenization = (int)(for)(int)
• |s|= 3
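The prefix rule can be checked directly. In the sketch below the function name is_valid_alphabet is illustrative, and the alphabets {a, ab} and {ab, b} are hypothetical examples that do not appear on the slides.

```python
def is_valid_alphabet(alphabet):
    """Return True if no letter is a prefix of a different letter."""
    letters = list(alphabet)
    for x in letters:
        for y in letters:
            if x != y and y.startswith(x):
                return False
    return True


print(is_valid_alphabet(["for", "int"]))  # True
print(is_valid_alphabet(["a", "ab"]))     # False: 'a' is a prefix of 'ab'
print(is_valid_alphabet(["ab", "b"]))     # True: 'ab' only ends with 'b'
```

With an invalid alphabet such as {a, ab}, a string like ab could be split either as a single letter or as a followed by something else, so tokenization would no longer be well defined.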
Reverse of String
• The reverse of a string s, denoted by rev(s) or s^r, is obtained by writing
the letters of s in reverse order.
• First tokenize the string, then reverse the order of the letters.
Example
if s=abababc is a string defined over Σ={ab,c} then
Tokenization=(ab)(ab)(ab)(c)
|s|=4
rev(s)=cababab
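The example above shows why tokenization must come first: reversing character by character would break the two-symbol letter ab. A minimal sketch follows. The helper names tokenize and rev are illustrative, and the tokenizer assumes that no letter is a prefix of another.

```python
def tokenize(s, alphabet):
    """Split s into letters of the alphabet (assumes no letter is a prefix of another)."""
    tokens = []
    i = 0
    while i < len(s):
        for letter in alphabet:
            if letter and s.startswith(letter, i):
                tokens.append(letter)
                i += len(letter)
                break
        else:
            raise ValueError(f"cannot tokenize {s!r} at position {i}")
    return tokens


def rev(s, alphabet):
    """Reverse s letter by letter, where letters come from the given alphabet."""
    return "".join(reversed(tokenize(s, alphabet)))


s = "abababc"
print(len(tokenize(s, ["ab", "c"])))  # 4
print(rev(s, ["ab", "c"]))            # cababab
print(s[::-1])                        # cbababa (character-wise, not a string over the alphabet)
```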
Language
• A language is a collection of words, which we think of as a set.
• Examples
• {ε}
• ∅
• {ab,abc,aba}
• { | n ∈N}.
• We use letters such as L, L1 and L2 to refer to an arbitrary language.
Types of Languages
• Single Letter Language
• A language whose alphabet has only one letter.
• Σ={a}
• Σ={b}
• Σ={Λ}
• Finite Language
• A language that contains a finite number of strings
• Infinite Language
• A language that contains infinitely many strings
Defining new languages from old ones
• Union: Since languages are just sets we can form their unions.
• Intersection: Since languages are merely sets we can form their
intersections.
• Set difference: If L1 and L2 are languages we can form their difference L1 − L2, the set of strings that are in L1 but not in L2.
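Because languages are sets, the three operations map directly onto Python's built-in set operators. The languages L1 and L2 below are small hypothetical examples chosen only for illustration.

```python
# Two small finite languages (hypothetical examples).
L1 = {"a", "ab", "abc"}
L2 = {"ab", "ba"}

union = L1 | L2         # strings in L1 or in L2
intersection = L1 & L2  # strings in both L1 and L2
difference = L1 - L2    # strings in L1 but not in L2

print(sorted(union))         # ['a', 'ab', 'abc', 'ba']
print(sorted(intersection))  # ['ab']
print(sorted(difference))    # ['a', 'abc']
```

This only works literally for finite languages. An infinite language has to be represented by a rule or membership test rather than an explicit set.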
• The language L of strings that do not start with a, defined over
Σ ={a,b,c}, can be written as
• L ={Λ, b, c, ba, bb, bc, ca, cb, cc, baa,bab,bac,bba…}
Examples cont.
• The language L of strings of length 2, defined over Σ ={0,1,2}, can be
written as
• L={00, 01, 02,10, 11,12,20,21,22}
• The language EVEN-EVEN, of strings with an even number of a’s and an even
number of b’s, defined over Σ={a,b}, can be written as
• EVEN-EVEN={Λ, aa, bb, aaaa,aabb,abab, abba, baab, baba, bbaa, bbbb,…}
• Even = {0,2,4,6,…} (zero counts as even, which is why Λ, aa and bb belong to EVEN-EVEN)
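Membership in EVEN-EVEN is easy to test, and the listing above can be reproduced by filtering all strings up to a given length. The function names below are illustrative, and Λ is represented by Python's empty string ''.

```python
from itertools import product


def in_even_even(s):
    """True if s has an even number of a's and an even number of b's."""
    return s.count("a") % 2 == 0 and s.count("b") % 2 == 0


def even_even_up_to(max_len):
    """List the strings of EVEN-EVEN over {a, b} with length at most max_len."""
    result = []
    for k in range(max_len + 1):
        for letters in product("ab", repeat=k):
            s = "".join(letters)
            if in_even_even(s):
                result.append(s)
    return result


print(even_even_up_to(4))
# ['', 'aa', 'bb', 'aaaa', 'aabb', 'abab', 'abba', 'baab', 'baba', 'bbaa', 'bbbb']
```

No string of odd length appears, because an odd length cannot split into two even counts.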
Examples cont.
• The language INTEGER, of strings defined over
Σ={-,0,1,2,3,4,5,6,7,8,9}, can be written as
• INTEGER = {…,-2,-1,0,1,2,…}
• Example:
For Σ={a,b}, PALINDROME={Λ , a, b, aa, bb, aaa, aba, bab, bbb, ...}
• The language called vowels contains all the vowels
• L={a,e,i,o,u}
• Language containing the loop keywords of C++
• L={for, while, do while}
• Language defined over Σ={a,b,c} where the length of the strings is less than or equal to
2.
• L={Λ, a,b,c, aa,ab,ac, ba,bb,bc, ca,cb,cc}
• Language defined over Σ={a,b,c} where the length of the strings is greater than or equal
to 2.
• L={aa,ab,ac, ba,bb,bc, ca,cb,cc, aaa,aab,aac,…}
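Several of the languages above are described by a rule rather than a list. One way to explore such a language is to enumerate all strings up to some length and keep those that satisfy the rule. This is a sketch: the helper names strings_up_to and language are illustrative, and it assumes an alphabet of single-character letters, so that s[::-1] is the correct reverse.

```python
from itertools import product


def strings_up_to(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len, shortest first."""
    for k in range(max_len + 1):
        for letters in product(alphabet, repeat=k):
            yield "".join(letters)


def language(alphabet, max_len, predicate):
    """List the strings up to max_len that satisfy the membership rule."""
    return [s for s in strings_up_to(alphabet, max_len) if predicate(s)]


# PALINDROME over {a, b}, up to length 3.
print(language("ab", 3, lambda s: s == s[::-1]))
# ['', 'a', 'b', 'aa', 'bb', 'aaa', 'aba', 'bab', 'bbb']

# Strings over {a, b, c} that do not start with a, up to length 2.
print(language("abc", 2, lambda s: not s.startswith("a")))
# ['', 'b', 'c', 'ba', 'bb', 'bc', 'ca', 'cb', 'cc']

# Strings over {a, b, c} of length at most 2.
print(len(language("abc", 2, lambda s: len(s) <= 2)))  # 13
```

An infinite language, such as the strings of length greater than or equal to 2, can only be listed up to a chosen bound this way. The rule itself is what defines the language.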