Spring 2024 Compiler Construction A Lab 3-2
Objective:
This experiment introduces students to implementing regular expressions in programs. It also covers
how to derive the regular expression for a given DFA and then implement it in code.
Student Information
Student Name
Student ID
Date
Assessment
Marks Obtained
Remarks
Signature
Lab 03
Instructions
• Come to the lab on time. Students who are more than 10 minutes late will not be allowed to attend the lab.
• Students have to perform the examples and exercises by themselves.
• Raise your hand if you face any difficulty in understanding and solving the examples or exercises.
• Lab work must be submitted on or before the submission date.
• Do not copy the work of other students; otherwise, both will get zero marks.
1. Objective
This experiment introduces students to implementing regular expressions in programs.
It also covers how to derive the regular expression for a given DFA and then implement it in code.
2. Lab Description
Just as finite automata are used to recognize patterns of strings, regular expressions are used to generate
patterns of strings. A regular expression is an algebraic formula whose value is a pattern consisting of a set
of strings, called the language of the expression.
Operands in a regular expression can be:
• characters from the alphabet over which the regular expression is defined.
• variables whose values are any pattern defined by a regular expression.
• epsilon, which denotes the empty string containing no characters.
• null, which denotes the empty set of strings.
2.1.1 Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is
also a regular expression.
L(R1|R2) = L(R1) U L(R2).
2.1.2 Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a
regular expression.
L(R1R2) = L(R1) concatenated with L(R2).
2.1.3 Kleene Closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a
regular expression.
L(R1*) = epsilon U L(R1) U L(R1R1) U L(R1R1R1) U …
Examples
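1. The set of strings over {0,1} that end in 3 consecutive 1's.
(0 | 1)* 111
2. The set of strings over {0,1} that have at least one 1.
0* 1 (0 | 1)*
3. The set of strings over {0,1} that have at most one 1.
0* | 0* 1 0*
4. The set of strings over {A..Z,a..z} that contain the word "main".
Let <letter> = A | B | ... | Z | a | b | ... | z
<letter>* main <letter>*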
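The example patterns above translate almost directly into the C++ regular expressions library. The following is a minimal sketch, assuming the default ECMAScript syntax of std::regex. The function names are illustrative, and the pattern used for the "main" example, [A-Za-z]*main[A-Za-z]*, is an assumed rendering rather than one given in this handout. std::regex_match succeeds only when the whole string is generated by the expression.

```cpp
#include <cassert>
#include <regex>
#include <string>

// (0 | 1)* 111 : strings over {0,1} that end in three consecutive 1's.
bool ends_in_111(const std::string& s) {
    static const std::regex re("(0|1)*111");
    return std::regex_match(s, re);
}

// 0* 1 (0 | 1)* : the first 1 is followed by anything, so at least one 1 occurs.
bool has_at_least_one_1(const std::string& s) {
    static const std::regex re("0*1(0|1)*");
    return std::regex_match(s, re);
}

// 0* | 0* 1 0* : either no 1 at all, or exactly one 1.
bool has_at_most_one_1(const std::string& s) {
    static const std::regex re("0*|0*10*");
    return std::regex_match(s, re);
}

// Letters, then the word "main", then letters (assumed pattern, see lead-in).
bool contains_main(const std::string& s) {
    static const std::regex re("[A-Za-z]*main[A-Za-z]*");
    return std::regex_match(s, re);
}
```

Calling ends_in_111("0111") returns true, while ends_in_111("1110") returns false because the string does not end in 111.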
• For every regular expression R, there is a corresponding FA that accepts the set of strings
generated by R.
• For every FA A there is a corresponding regular expression that generates the set of strings
accepted by A.
Establishing this equivalence requires two algorithms:
i. an algorithm that, given a regular expression R, produces an FA A such that L(A) == L(R).
ii. an algorithm that, given an FA A, produces a regular expression R such that L(R) == L(A).
Our construction of FA from regular expressions will allow "epsilon transitions" (a transition from one
state to another with epsilon as the label). Such a transition is always possible, since epsilon (or the empty
string) can be said to exist between any two input symbols. We can show that such epsilon transitions are
a notational convenience; for every FA with epsilon transitions there is a corresponding FA without them.
If the operand is a character c, then our FA has two states, s0 (the start state) and sF (the final, accepting
state), and a transition from s0 to sF with label c.
If the operand is epsilon, then our FA has two states, s0 (the start state) and sF (the final, accepting state),
and an epsilon transition from s0 to sF.
If the operand is null, then our FA has two states, s0 (the start state) and sF (the final, accepting state), and
no transitions.
Given FA for R1 and R2, we now show how to build an FA for R1R2, R1|R2, and R1*. Let A (with start
state a0 and final state aF) be the machine accepting L(R1) and B (with start state b0 and final state bF) be
the machine accepting L(R2).
The machine C accepting L(R1R2) includes A and B, with start state a0, final state bF, and an epsilon
transition from aF to b0.
The machine C accepting L(R1|R2) includes A and B, with a new start state c0, a new final state cF, and
epsilon transitions from c0 to a0 and b0, and from aF and bF to cF.
The machine C accepting L(R1*) includes A, with a new start state c0, a new final state cF, and epsilon
transitions from c0 to a0 and cF, and from aF to a0, and from aF to cF.
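The construction rules above can be sketched in C++ as follows. This is a minimal illustration, not a complete regex compiler: there is no parser, so an expression is assembled by calling the helper functions directly. The names Nfa, Fragment, EPS, symbol, concat, alt and star are assumptions made for this sketch. Acceptance is checked by tracking the set of states reachable so far, following epsilon transitions without consuming input.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

constexpr char EPS = '\0';  // assumed marker for an epsilon label; input must not contain '\0'

// A machine with one start state and one final state, as in the text.
struct Fragment {
    int start;
    int final_state;
};

struct Nfa {
    // edges[s] holds the (label, target) pairs leaving state s.
    std::vector<std::vector<std::pair<char, int>>> edges;

    int new_state() {
        edges.emplace_back();
        return static_cast<int>(edges.size()) - 1;
    }
    void add_edge(int from, char label, int to) { edges[from].emplace_back(label, to); }

    // Operand c: s0 --c--> sF.
    Fragment symbol(char c) {
        int s0 = new_state(), sF = new_state();
        add_edge(s0, c, sF);
        return {s0, sF};
    }
    // Operand epsilon: s0 --epsilon--> sF.
    Fragment epsilon() { return symbol(EPS); }
    // Operand null: two states and no transitions, so nothing is accepted.
    Fragment empty_set() {
        int s0 = new_state(), sF = new_state();
        return {s0, sF};
    }
    // R1R2: epsilon transition from aF to b0.
    Fragment concat(Fragment a, Fragment b) {
        add_edge(a.final_state, EPS, b.start);
        return {a.start, b.final_state};
    }
    // R1|R2: new c0 and cF wired to both machines by epsilon transitions.
    Fragment alt(Fragment a, Fragment b) {
        int c0 = new_state(), cF = new_state();
        add_edge(c0, EPS, a.start);
        add_edge(c0, EPS, b.start);
        add_edge(a.final_state, EPS, cF);
        add_edge(b.final_state, EPS, cF);
        return {c0, cF};
    }
    // R1*: c0 -> a0, c0 -> cF, aF -> a0, aF -> cF, all on epsilon.
    Fragment star(Fragment a) {
        int c0 = new_state(), cF = new_state();
        add_edge(c0, EPS, a.start);
        add_edge(c0, EPS, cF);
        add_edge(a.final_state, EPS, a.start);
        add_edge(a.final_state, EPS, cF);
        return {c0, cF};
    }

    // Adds every state reachable through epsilon transitions only.
    std::set<int> closure(std::set<int> states) const {
        std::vector<int> work(states.begin(), states.end());
        while (!work.empty()) {
            int s = work.back();
            work.pop_back();
            for (const auto& e : edges[s])
                if (e.first == EPS && states.insert(e.second).second) work.push_back(e.second);
        }
        return states;
    }

    bool accepts(Fragment f, const std::string& input) const {
        std::set<int> current = closure({f.start});
        for (char c : input) {
            std::set<int> next;
            for (int s : current)
                for (const auto& e : edges[s])
                    if (e.first == c) next.insert(e.second);
            current = closure(next);
        }
        return current.count(f.final_state) > 0;
    }
};
```

For example, (0 | 1)* 111 can be built as concat(star(alt(symbol('0'), symbol('1'))), concat(symbol('1'), concat(symbol('1'), symbol('1')))) on one Nfa object and then checked with accepts.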
If we can eliminate epsilon transitions from an FA, then our construction of an FA from a regular
expression (which yields an FA with epsilon transitions) can be completed.
Observe that epsilon transitions are similar to nondeterminism in that they offer a choice: an epsilon
transition allows us to stay in a state or move to a new state, regardless of the input symbol.
If starting in state s1, we can reach state s2 via a series of epsilon transitions followed by a transition on
input symbol x, we can replace all of the epsilon transitions with a single transition from s1 to s2 on symbol
x.
2.6 Algorithm for Eliminating Epsilon Transitions
We can build a finite automaton F2 with no epsilon transitions from a finite automaton F1 containing
epsilon transitions as follows:
The states of F2 are all the states of F1 that have an entering transition labeled by some symbol other than
epsilon, plus the start state of F1, which is also the start state of F2.
For each state in F1, determine which other states are reachable via epsilon transitions only. If a state of F1
can reach a final state in F1 via epsilon transitions, then the corresponding state is a final state in F2.
For each pair of states i and j in F2, there is a transition from state i to state j on input x if there exists a
state k that is reachable from state i via epsilon transitions in F1, and there is a transition in F1 from state k
to state j on input x.
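The three steps above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions: the types Automaton and Edge, the EPS marker, and the function names are hypothetical and introduced only for this example, and the automaton is stored as a plain list of edges for brevity rather than efficiency.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>
#include <vector>

constexpr char EPS = '\0';  // assumed marker for an epsilon label

struct Edge {
    int from;
    char label;
    int to;
};

struct Automaton {
    int start;
    std::set<int> finals;
    std::vector<Edge> edges;
};

// States reachable from s using epsilon transitions only (s itself included).
std::set<int> eps_closure(const Automaton& f1, int s) {
    std::set<int> seen{s};
    std::vector<int> work{s};
    while (!work.empty()) {
        int cur = work.back();
        work.pop_back();
        for (const Edge& e : f1.edges)
            if (e.from == cur && e.label == EPS && seen.insert(e.to).second)
                work.push_back(e.to);
    }
    return seen;
}

Automaton eliminate_epsilon(const Automaton& f1) {
    Automaton f2;
    f2.start = f1.start;
    // States of F2: the start state plus every state entered on a non-epsilon label.
    std::set<int> keep{f1.start};
    for (const Edge& e : f1.edges)
        if (e.label != EPS) keep.insert(e.to);

    std::set<std::tuple<int, char, int>> added;  // avoids duplicate edges in F2
    for (int i : keep) {
        for (int k : eps_closure(f1, i)) {
            // i is final in F2 if it reaches a final state of F1 via epsilon only.
            if (f1.finals.count(k)) f2.finals.insert(i);
            // i --x--> j in F2 if k is epsilon-reachable from i and k --x--> j in F1.
            for (const Edge& e : f1.edges)
                if (e.from == k && e.label != EPS &&
                    added.insert(std::make_tuple(i, e.label, e.to)).second)
                    f2.edges.push_back({i, e.label, e.to});
        }
    }
    return f2;
}

// Simulates an automaton that has no epsilon transitions.
bool accepts(const Automaton& f2, const std::string& input) {
    std::set<int> current{f2.start};
    for (char c : input) {
        std::set<int> next;
        for (const Edge& e : f2.edges)
            if (current.count(e.from) && e.label == c) next.insert(e.to);
        current = next;
    }
    for (int s : current)
        if (f2.finals.count(s)) return true;
    return false;
}
```

As a usage example, an F1 for a*b* with edges 0 --a--> 0, 0 --epsilon--> 1 and 1 --b--> 1 (start 0, final 1) becomes an F2 in which state 0 is also final and gains a direct transition to state 1 on b.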
To construct a regular expression from a DFA (and thereby complete the proof that regular expressions
and finite automata have the same expressive power), we replace each state in the DFA one by one with a
corresponding regular expression.
Just as we built a small FA for each operator and operand in a regular expression, we will now build a
small regular expression for each state in the DFA.
The basic idea is to eliminate the states of the FA one by one, replacing each state with a regular
expression that generates the portion of the input string that labels the transitions into and out of the state
being eliminated.
We preprocess the FA, turning the labels on transitions into regular expressions. If there is a transition
with label {a,b}, then we replace the label with the regular expression a | b. If there is no transition from a
state to itself, we can add one with the label NULL.
For each accepting state sF in F, eliminate all states in F except the start state s0 and sF.
To eliminate a state sE, consider all pairs of states sA and sB such that there is a transition from sA to sE
with label R1, a transition from sE to sE with label R2 (possibly null, meaning no transition), and a
transition from sE to sB with label R3. Introduce a transition from sA to sB with label R1R2*R3. If there
is already a transition from sA to sB with label R4, then replace that label with R4|R1R2*R3. After
eliminating all states except s0 and sF:
If s0 == sF, then the resulting regular expression is R1*, where R1 is the label on the transition from s0 to
s0.
If s0 != sF, then assume the transition from s0 to s0 is labeled R1, the transition from s0 to sF is labeled
R2, the transition from sF to sF is labeled R3, and the transition from sF to s0 is labeled R4. The resulting
regular expression is R1*R2(R3 | R4R1*R2)*
Let RFi be the regular expression produced by eliminating all the states except s0 and sFi. If there are n
final states in the DFA, then the regular expression that generates the strings accepted by the original DFA
is RF1 | RF2 | ... | RFn.
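As a worked check of the two-state formula R1*R2(R3 | R4R1*R2)*, consider a DFA over {0,1} that accepts strings with an odd number of 1's. This DFA is an example chosen for this sketch, not one from the handout. Its start state s0 loops on 0 (R1 = 0) and moves to sF on 1 (R2 = 1); sF loops on 0 (R3 = 0) and returns to s0 on 1 (R4 = 1). The sketch below builds the expression as text, assuming all four labels are non-null, and provides a direct simulation of the DFA for comparison. The function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Builds R1*R2(R3 | R4R1*R2)* as text. Assumption: all four labels are
// non-null regular expressions written in std::regex (ECMAScript) syntax.
std::string two_state_regex(const std::string& r1, const std::string& r2,
                            const std::string& r3, const std::string& r4) {
    const std::string r1_star = "(" + r1 + ")*";
    const std::string r2_group = "(" + r2 + ")";
    return r1_star + r2_group + "((" + r3 + ")|(" + r4 + ")" + r1_star + r2_group + ")*";
}

// Direct simulation of the example DFA: s0 means an even number of 1's has
// been read (start state), sF means an odd number (accepting state).
// The input is assumed to contain only the characters 0 and 1.
bool dfa_accepts(const std::string& input) {
    bool in_final = false;
    for (char c : input)
        if (c == '1') in_final = !in_final;
    return in_final;
}
```

With the labels above, two_state_regex("0", "1", "0", "1") yields (0)*(1)((0)|(1)(0)*(1))*, and std::regex_match with that expression should agree with dfa_accepts on every binary string.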
The regular expressions library provides a class that represents regular expressions, which are a kind of
mini-language used to perform pattern matching within strings. Almost all operations with regexes can be
characterized by operating on several of the following objects:
2.9.1 Target sequence:
The character sequence that is searched for a pattern. This may be a range specified by two iterators, a
null-terminated character string or a std::string.
2.9.2 Pattern:
This is the regular expression itself. It determines what constitutes a match. It is an object of type
std::basic_regex, constructed from a string with special syntax. See regex_constants::syntax_option_type
for the description of supported syntax variations.
The replacement string determines how to replace the matches; see regex_constants::match_flag_type for
the description of supported syntax variations.
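The objects described above can be seen together in a short sketch. It assumes the default ECMAScript syntax and format rules. The helper names (digits, found_in_string and so on) are illustrative and not part of the library. The same pattern is applied to the three kinds of target sequence, and a replacement string refers to the first sub-expression as $1.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Pattern: one or more decimal digits, captured as sub-expression 1.
const std::regex& digits() {
    static const std::regex re("([0-9]+)");
    return re;
}

// Target given as a std::string.
bool found_in_string(const std::string& target) {
    return std::regex_search(target, digits());
}

// Target given as a null-terminated character string.
bool found_in_cstring(const char* target) {
    return std::regex_search(target, digits());
}

// Target given as a range specified by two iterators.
bool found_in_range(std::string::const_iterator first, std::string::const_iterator last) {
    return std::regex_search(first, last, digits());
}

// Replacement string: "$1" stands for the text matched by sub-expression 1.
std::string bracket_numbers(const std::string& target) {
    return std::regex_replace(target, digits(), "[$1]");
}
```

For instance, bracket_numbers("lab 3 task 12") wraps each number in square brackets and leaves the rest of the text unchanged.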
2.10 Algorithms
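These functions are used to apply the regular expression encapsulated in a regex to a target sequence of
characters.
regex_match (C++11): attempts to match a regular expression to an entire character sequence (function template)
regex_search (C++11): attempts to match a regular expression to any part of a character sequence (function template)
regex_replace (C++11): replaces occurrences of a regular expression with formatted replacement text (function template)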
2.11 Iterators
The regex iterators are used to traverse the entire set of regular expression matches found within a
sequence.
regex_iterator (C++11): iterates through all regex matches within a character sequence (class template)
regex_token_iterator (C++11): iterates through the specified sub-expressions within all regex matches in a given string, or through unmatched substrings (class template)
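Both iterators can be illustrated with a short sketch. The function names all_matches and split are assumptions made for this example. std::sregex_iterator and std::sregex_token_iterator are the library's aliases for iterating over a std::string, and a sub-expression index of -1 asks the token iterator for the unmatched text between matches.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Collects the text of every match of `pattern` found in `text`.
std::vector<std::string> all_matches(const std::string& text, const std::regex& pattern) {
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Collects the unmatched pieces between occurrences of `separator`
// (sub-expression index -1 selects the text that was not matched).
std::vector<std::string> split(const std::string& text, const std::regex& separator) {
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), separator, -1), end;
         it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Both iterators refer to the target string and the regex rather than copying them, so the two objects must stay alive for as long as the iterators are in use.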
2.12 Exceptions
The regex_error class defines the type of objects thrown as exceptions to report errors from the regular
expressions library.
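A typical place where this exception appears is the std::regex constructor, which throws std::regex_error when the pattern is malformed. The following minimal sketch uses an illustrative helper name, is_valid_pattern, and reports the error to std::cerr instead of letting it propagate.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <string>

// Returns true if `pattern` can be compiled by std::regex, and false if the
// constructor throws std::regex_error (for example on an unmatched parenthesis).
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error& e) {
        // what() gives an implementation-defined message; code() gives the error type.
        std::cerr << "invalid pattern \"" << pattern << "\": " << e.what() << '\n';
        return false;
    }
}
```

Checking a pattern this way is useful when the pattern itself is typed in by the user at runtime.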
2.13 Traits
The regex traits class is used to encapsulate the localizable aspects of a regex.
regex_traits (C++11): provides metainformation about a character type, required by the regex library (class template)
2.14 Constants
Program 3.1: Program to use regex() and regex_match() to match different patterns input by user.
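A minimal sketch of what such a program might look like is given here; it is an illustration under stated assumptions, not the original lab listing. The three patterns (identifier, integer, real) and the function names classify and run are chosen for this example. Each input line is tested against the patterns in turn with regex_match, which requires the whole line to match.

```cpp
#include <cassert>
#include <iostream>
#include <regex>
#include <sstream>
#include <string>

// Names the first pattern that matches the whole input, or "no match".
std::string classify(const std::string& input) {
    static const std::regex identifier("[A-Za-z_][A-Za-z0-9_]*");
    static const std::regex integer("[+-]?[0-9]+");
    static const std::regex real("[+-]?[0-9]+\\.[0-9]+");
    if (std::regex_match(input, identifier)) return "identifier";
    if (std::regex_match(input, integer)) return "integer";
    if (std::regex_match(input, real)) return "real";
    return "no match";
}

// Reads one string per line and writes its classification.
void run(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line))
        out << line << ": " << classify(line) << '\n';
}
```

Calling run(std::cin, std::cout) from main turns this into an interactive program. Passing a std::istringstream instead makes the same loop easy to check without typing input.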
Program 3.2: Program to use regex_replace() for finding and replacing a word.
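One possible shape for such a program is sketched below; it is not the original lab listing. The function name replace_word is hypothetical. The sketch assumes that the word contains only letters and digits (no regex metacharacters) and that the replacement contains no $ character, since $ has a special meaning in replacement strings. The \b anchors restrict matches to whole words.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces every whole-word occurrence of `word` in `sentence` with `replacement`.
// Assumptions: `word` has no regex metacharacters and `replacement` has no '$'.
std::string replace_word(const std::string& sentence, const std::string& word,
                         const std::string& replacement) {
    const std::regex re("\\b" + word + "\\b");  // \b matches at a word boundary
    return std::regex_replace(sentence, re, replacement);
}
```

Because of the word boundaries, replacing "the" leaves words such as "other" and "then" untouched.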
4. Lab tasks
Task 1 Construct the automata machines for any five of the following given conditions of regular
expressions:
Task 2
Write a C++ program that replaces a word in a sentence using regular expressions.
Note: The program should ask the user at runtime for the sentence and for the first character of the word
to find.
5. Homework Tasks
2. Use JFLAP to prove that the above regular expressions are valid.
3. For the following FAs, find the regular expression and implement it in code using regex.
a.
b.