Usman Institute of Technology

Department of Computer Science


Course Code: CS412
Course Title: Compiler Construction SPRING 2024
Lab 03

Objective:

This experiment introduces students to implementing regular expressions in programs. It also
covers how to derive the regular expression for a given DFA and then code it.

Student Information

Student Name

Student ID

Date

Assessment

Marks Obtained

Remarks

Signature

CS412 – Compiler Construction
Lab 03: Design a code for regular expression to DFA and vice versa


Instructions

• Come to the lab on time. Students who are more than 10 minutes late will not be allowed to attend the lab.
• Students have to perform the examples and exercises by themselves.
• Raise your hand if you face any difficulty in understanding and solving the examples or exercises.
• Lab work must be submitted on or before the submission date.
• Do not copy the work of other students; otherwise, both will get zero marks.

1. Objective

This experiment introduces students to implementing regular expressions in programs.
It also covers how to derive the regular expression for a given DFA and then code it.

2. Lab Description

2.1 What is a Regular Expression?

Just as finite automata are used to recognize patterns of strings, regular expressions are used to generate
patterns of strings. A regular expression is an algebraic formula whose value is a pattern consisting of a set
of strings, called the language of the expression.

Operands in a regular expression can be:

• characters from the alphabet over which the regular expression is defined.
• variables whose values are any pattern defined by a regular expression.
• epsilon, which denotes the empty string containing no characters.
• null, which denotes the empty set of strings.

Operators used in regular expressions include:

2.1.1 Union: If R1 and R2 are regular expressions, then R1 | R2 (also written as R1 U R2 or R1 + R2) is
also a regular expression.
L(R1|R2) = L(R1) U L(R2).

2.1.2 Concatenation: If R1 and R2 are regular expressions, then R1R2 (also written as R1.R2) is also a
regular expression.
L(R1R2) = L(R1) concatenated with L(R2).

2.1.3 Kleene Closure: If R1 is a regular expression, then R1* (the Kleene closure of R1) is also a
regular expression.
L(R1*) = {epsilon} U L(R1) U L(R1R1) U L(R1R1R1) U …

Mr. Zulfiqar Ali SPRING 2024 Page 2 of 16


Closure has the highest precedence, followed by concatenation, followed by union.

Examples
1. The set of strings over {0,1} that end in 3 consecutive 1's.

(0 | 1)* 111

2. The set of strings over {0,1} that have at least one 1.

0* 1 (0 | 1)*

3. The set of strings over {0,1} that have at most one 1.

0* | 0* 1 0*

4. The set of strings over {A..Z,a..z} that contain the word "main".

Let <letter> = A | B | ... | Z | a | b | ... | z


<letter>* main <letter>*

5. The set of strings over {A..Z,a..z} that contain at least 3 x's.


<letter>* x <letter>* x <letter>* x <letter>*

6. The set of identifiers in Pascal.

Let <letter> = A | B | ... | Z | a | b | ... | z


Let <digit> = 0 | 1 | 2 | 3 | ... | 9

<letter> (<letter> | <digit>)*

7. The set of real numbers in Pascal.

Let <digit> = 0 | 1 | 2 | 3 | ... | 9
Let <sign> = '+' | '-' | epsilon
Let <exp> = 'E' <sign> <digit> <digit>* | epsilon
Let <decimal> = '.' <digit> <digit>* | epsilon

<digit> <digit>* <decimal> <exp>
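
The examples above can be tried directly with the C++11 regex library. The following is a minimal sketch, assuming the default ECMAScript syntax of std::regex; the helper name in_language and the pattern variable names are made up for this illustration. Character classes such as [A-Za-z] stand in for <letter>, and the "| epsilon" alternatives are written with the ? operator.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole string s is in the language of pattern.
bool in_language(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}

// ECMAScript spellings of examples 1, 2, 3, 6 and 7.
const std::string ends_in_111  = "(0|1)*111";
const std::string at_least_one = "0*1(0|1)*";
const std::string at_most_one  = "0*|0*10*";
const std::string identifier   = "[A-Za-z][A-Za-z0-9]*";
// <digit> <digit>* <decimal> <exp>, with the optional parts marked by '?'.
const std::string real_number  = "[0-9][0-9]*(\\.[0-9][0-9]*)?(E[+-]?[0-9][0-9]*)?";
```

For instance, in_language("0111", ends_in_111) is true, while in_language("1x", identifier) is false because an identifier must start with a letter.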

2.2 Unix Operator Extensions


Regular expressions are used frequently in Unix:

• In the command line


• Within text editors
• In the context of pattern matching programs such as grep and egrep

To facilitate construction of regular expressions, Unix recognizes additional operators. These
operators can be defined in terms of the operators given above; they represent a notational
convenience only.

• character classes: '[' <list of chars> ']'
• start of a line: '^'
• end of a line: '$'
• wildcard matching any character except newline: '.'
• optional instance: R? = epsilon | R
• one or more instances: R+ = RR*
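
To see that an extension is only shorthand, one can compare it with its expansion on a few sample strings. The sketch below assumes std::regex with its default ECMAScript syntax; the helper name agree_on is made up for this illustration. Agreement on a handful of samples is a sanity check, not a proof of equivalence.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: true when both patterns accept exactly the same samples
// (each sample must match as a whole string).
bool agree_on(const std::string& p1, const std::string& p2,
              const std::vector<std::string>& samples) {
    const std::regex r1(p1), r2(p2);
    for (const std::string& s : samples) {
        if (std::regex_match(s, r1) != std::regex_match(s, r2)) return false;
    }
    return true;
}
```

For example, agree_on("[abc]", "(a|b|c)", {"a", "d", "ab"}) compares a character class with the union it abbreviates, and agree_on("a+", "aa*", {"", "a", "aaa"}) compares R+ with RR*.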

2.3 Equivalence of Regular Expressions and Finite Automata

Regular expressions and finite automata have equivalent expressive power:

• For every regular expression R, there is a corresponding FA that accepts the set of strings
generated by R.
• For every FA A there is a corresponding regular expression that generates the set of strings
accepted by A.

The proof is in two parts:

i. an algorithm that, given a regular expression R, produces an FA A such that L(A) == L(R).
ii. an algorithm that, given an FA A, produces a regular expression R such that L(R) == L(A).

Our construction of FA from regular expressions will allow "epsilon transitions" (a transition from one
state to another with epsilon as the label). Such a transition is always possible, since epsilon (or the empty
string) can be said to exist between any two input symbols. We can show that such epsilon transitions are
a notational convenience; for every FA with epsilon transitions there is a corresponding FA without them.

2.4 Constructing an FA from a RE

We begin by showing how to construct an FA for the operands in a regular expression.

If the operand is a character c, then our FA has two states, s0 (the start state) and sF (the final, accepting
state), and a transition from s0 to sF with label c.

If the operand is epsilon, then our FA has two states, s0 (the start state) and sF (the final, accepting state),
and an epsilon transition from s0 to sF.

If the operand is null, then our FA has two states, s0 (the start state) and sF (the final, accepting state), and
no transitions.

Given FA for R1 and R2, we now show how to build an FA for R1R2, R1|R2, and R1*. Let A (with start
state a0 and final state aF) be the machine accepting L(R1) and B (with start state b0 and final state bF) be
the machine accepting L(R2).

The machine C accepting L(R1R2) includes A and B, with start state a0, final state bF, and an epsilon
transition from aF to b0.

The machine C accepting L(R1|R2) includes A and B, with a new start state c0, a new final state cF, and
epsilon transitions from c0 to a0 and b0, and from aF and bF to cF.

The machine C accepting L(R1*) includes A, with a new start state c0, a new final state cF, and epsilon
transitions from c0 to a0 and cF, and from aF to a0, and from aF to cF.
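
The constructions in this section can be sketched in C++ as follows. This is an illustrative sketch, not a reference implementation: the type and function names (Nfa, Frag, symbol, concat, alternate, star, accepts) are invented here, and the character '\0' is assumed to be free to stand for epsilon. The machine is simulated by tracking the set of states it could be in.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

const char EPS = '\0';  // assumption of this sketch: '\0' labels an epsilon transition

struct Nfa {
    std::vector<std::vector<std::pair<char, int>>> edges;  // edges[s] = (label, target)
    int add_state() {
        edges.emplace_back();
        return static_cast<int>(edges.size()) - 1;
    }
    void add_edge(int from, char label, int to) { edges[from].emplace_back(label, to); }
};

struct Frag { int start; int fin; };  // a sub-machine: its start and final state

// Operands: a character c, epsilon, and null (two states, no transitions).
Frag symbol(Nfa& n, char c) {
    int s0 = n.add_state(), sF = n.add_state();
    n.add_edge(s0, c, sF);
    return {s0, sF};
}
Frag epsilon(Nfa& n) { return symbol(n, EPS); }
Frag null_set(Nfa& n) {
    int s0 = n.add_state(), sF = n.add_state();
    return {s0, sF};
}

// R1R2: start a0, final bF, epsilon transition from aF to b0.
Frag concat(Nfa& n, Frag a, Frag b) {
    n.add_edge(a.fin, EPS, b.start);
    return {a.start, b.fin};
}

// R1|R2: new c0 and cF, epsilon from c0 to a0 and b0, and from aF and bF to cF.
Frag alternate(Nfa& n, Frag a, Frag b) {
    int c0 = n.add_state(), cF = n.add_state();
    n.add_edge(c0, EPS, a.start);
    n.add_edge(c0, EPS, b.start);
    n.add_edge(a.fin, EPS, cF);
    n.add_edge(b.fin, EPS, cF);
    return {c0, cF};
}

// R1*: new c0 and cF, epsilon from c0 to a0 and cF, and from aF to a0 and cF.
Frag star(Nfa& n, Frag a) {
    int c0 = n.add_state(), cF = n.add_state();
    n.add_edge(c0, EPS, a.start);
    n.add_edge(c0, EPS, cF);
    n.add_edge(a.fin, EPS, a.start);
    n.add_edge(a.fin, EPS, cF);
    return {c0, cF};
}

// Extends `states` with everything reachable through epsilon transitions alone.
void close_over_epsilon(const Nfa& n, std::set<int>& states) {
    std::vector<int> work(states.begin(), states.end());
    while (!work.empty()) {
        int s = work.back();
        work.pop_back();
        for (const auto& e : n.edges[s])
            if (e.first == EPS && states.insert(e.second).second) work.push_back(e.second);
    }
}

// Runs machine m on input, tracking the set of states it may be in.
bool accepts(const Nfa& n, Frag m, const std::string& input) {
    std::set<int> current{m.start};
    close_over_epsilon(n, current);
    for (char c : input) {
        std::set<int> next;
        for (int s : current)
            for (const auto& e : n.edges[s])
                if (c != EPS && e.first == c) next.insert(e.second);
        close_over_epsilon(n, next);
        current = next;
    }
    return current.count(m.fin) > 0;
}

// Builds the machine for (0 | 1)* 111 from the examples in section 2.1.
Frag build_ends_in_111(Nfa& n) {
    Frag any = star(n, alternate(n, symbol(n, '0'), symbol(n, '1')));
    Frag ones = concat(n, concat(n, symbol(n, '1'), symbol(n, '1')), symbol(n, '1'));
    return concat(n, any, ones);
}
```

With Nfa n; Frag m = build_ends_in_111(n); the call accepts(n, m, "010111") returns true and accepts(n, m, "0110") returns false.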

2.5 Eliminating Epsilon Transitions

If we can eliminate epsilon transitions from an FA, then our construction of an FA from a regular
expression (which yields an FA with epsilon transitions) can be completed.

Observe that epsilon transitions are similar to nondeterminism in that they offer a choice: an epsilon
transition allows us to stay in a state or move to a new state without consuming an input symbol.

If, starting in state s1, we can reach state s2 via a series of epsilon transitions followed by a transition
on input symbol x, we can replace all of those epsilon transitions with a single transition from s1 to s2 on
symbol x.

2.6 Algorithm for Eliminating Epsilon Transitions

We can build a finite automaton F2 with no epsilon transitions from a finite automaton F1 containing
epsilon transitions as follows:

The states of F2 are all the states of F1 that have an entering transition labeled by some symbol other than
epsilon, plus the start state of F1, which is also the start state of F2.

For each state in F1, determine which other states are reachable via epsilon transitions only. If a state of F1
can reach a final state in F1 via epsilon transitions, then the corresponding state is a final state in F2.

For each pair of states i and j in F2, there is a transition from state i to state j on input x if there exists a
state k (possibly i itself) that is reachable from state i via zero or more epsilon transitions in F1, and there
is a transition in F1 from state k to state j on input x.
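
The algorithm can be sketched in C++ as shown below. The names (Fa, epsilon_closure, remove_epsilon, accepts, a_star_b_star) and the use of '\0' as the epsilon label are assumptions of this sketch. As a simplification, F2 keeps every state number of F1 instead of pruning; states with no entering symbol transition simply become unreachable unless they are the start state.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

const char EPS = '\0';  // assumption of this sketch: '\0' labels an epsilon transition

struct Fa {
    int start = 0;
    std::set<int> finals;
    std::vector<std::vector<std::pair<char, int>>> edges;  // edges[s] = (label, target)
};

// States reachable from s using zero or more epsilon transitions.
std::set<int> epsilon_closure(const Fa& f, int s) {
    std::set<int> seen{s};
    std::vector<int> work{s};
    while (!work.empty()) {
        int k = work.back();
        work.pop_back();
        for (const auto& e : f.edges[k])
            if (e.first == EPS && seen.insert(e.second).second) work.push_back(e.second);
    }
    return seen;
}

// Builds F2 from F1: i -x-> j whenever some k in closure(i) has k -x-> j in F1,
// and i is final whenever closure(i) contains a final state of F1.
Fa remove_epsilon(const Fa& f1) {
    Fa f2;
    f2.start = f1.start;
    f2.edges.resize(f1.edges.size());
    for (int i = 0; i < static_cast<int>(f1.edges.size()); ++i) {
        std::set<std::pair<char, int>> added;  // avoids duplicate transitions
        for (int k : epsilon_closure(f1, i)) {
            if (f1.finals.count(k)) f2.finals.insert(i);
            for (const auto& e : f1.edges[k])
                if (e.first != EPS && added.insert(e).second) f2.edges[i].push_back(e);
        }
    }
    return f2;
}

// True when f still contains an epsilon transition.
bool has_epsilon(const Fa& f) {
    for (const auto& out : f.edges)
        for (const auto& e : out)
            if (e.first == EPS) return true;
    return false;
}

// Runs an epsilon-free machine on input, tracking the set of possible states.
bool accepts(const Fa& f, const std::string& input) {
    std::set<int> current{f.start};
    for (char c : input) {
        std::set<int> next;
        for (int s : current)
            for (const auto& e : f.edges[s])
                if (e.first == c) next.insert(e.second);
        current = next;
    }
    for (int s : current)
        if (f.finals.count(s)) return true;
    return false;
}

// Example F1 for a*b*: state 0 loops on 'a', state 1 loops on 'b', epsilon from 0 to 1.
Fa a_star_b_star() {
    Fa f;
    f.start = 0;
    f.finals.insert(1);
    f.edges.resize(2);
    f.edges[0].emplace_back('a', 0);
    f.edges[0].emplace_back(EPS, 1);
    f.edges[1].emplace_back('b', 1);
    return f;
}
```

Applying remove_epsilon to the example gives a machine in which state 0 is also final (it reaches state 1 by epsilon) and has a direct transition on 'b' to state 1.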

2.7 Constructing a RE from FA

To construct a regular expression from a DFA (and thereby complete the proof that regular expressions
and finite automata have the same expressive power), we replace each state in the DFA one by one with a
corresponding regular expression.

Just as we built a small FA for each operator and operand in a regular expression, we will now build a
small regular expression for each state in the DFA.

The basic idea is to eliminate the states of the FA one by one, replacing each state with a regular
expression that generates the portion of the input string that labels the transitions into and out of the state
being eliminated.

2.8 Algorithm for Constructing RE from FA

Given a DFA F, we construct a regular expression R such that L(F) == L(R).

We preprocess the FA, turning the labels on transitions into regular expressions. If there is a transition
with label {a,b}, then we replace the label with the regular expression a | b. If there is no transition from a
state to itself, we can add one with the label NULL.

For each accepting state sF in F, eliminate all states in F except the start state s0 and sF.

To eliminate a state sE, consider all pairs of states sA and sB such that there is a transition from sA to sE
with label R1, a transition from sE to sE with label R2 (possibly null, meaning no transition), and a
transition from sE to sB with label R3. Introduce a transition from sA to sB with label R1R2*R3. If there
is already a transition from sA to sB with label R4, then replace that label with R4|R1R2*R3.

After eliminating all states except s0 and sF:

If s0 == sF, then the resulting regular expression is R1*, where R1 is the label on the transition from s0 to
s0.

If s0 != sF, then assume the transition from s0 to s0 is labeled R1, the transition from s0 to sF is labeled
R2, the transition from sF to sF is labeled R3, and the transition from sF to s0 is labeled R4. The resulting
regular expression is R1*R2(R3 | R4R1*R2)*

Let RFi be the regular expression produced by eliminating all the states except s0 and sFi. If there are n
final states in the DFA, then the regular expression that generates the strings accepted by the original DFA
is RF1 | RF2 | ... | RFn.

2.9 Regular Expressions Library (since C++11)

The regular expressions library provides a class that represents regular expressions, which are a kind of
mini-language used to perform pattern matching within strings. Almost all operations with regexes can be
characterized by operating on several of the following objects:

2.9.1 Target Sequence:

The character sequence that is searched for a pattern. This may be a range specified by two iterators, a
null-terminated character string, or a std::string.

2.9.2 Pattern:

This is the regular expression itself. It determines what constitutes a match. It is an object of type
std::basic_regex, constructed from a string with special syntax. See regex_constants::syntax_option_type
for the description of supported syntax variations.

2.9.3 Matched Array:

The information about matches may be retrieved as an object of type std::match_results.

2.9.4 Replacement String:

This is a string that determines how to replace the matches. See regex_constants::match_flag_type for the
description of supported syntax variations.

2.9.5 Main Classes


These classes encapsulate a regular expression and the results of matching a regular expression within a
target sequence of characters.

basic_regex (C++11): regular expression object (class template)
sub_match (C++11): identifies the sequence of characters matched by a sub-expression (class template)
match_results (C++11): identifies one regular expression match, including all sub-expression matches
(class template)
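
As an illustration of how the three classes fit together, the sketch below splits a token into its letters and trailing digits. The names Parts and split_token are invented for this example. It relies on the standard typedefs std::regex (basic_regex for char), std::smatch (match_results over std::string iterators) and std::ssub_match (sub_match over std::string iterators).

```cpp
#include <regex>
#include <string>

struct Parts {
    std::string letters;
    std::string digits;
    bool ok;
};

// Hypothetical helper: splits a token such as "abc42" into "abc" and "42".
Parts split_token(const std::string& target) {
    const std::regex pattern("([A-Za-z]+)([0-9]*)");  // the pattern (basic_regex<char>)
    std::smatch m;                                    // the matched array (match_results)
    if (!std::regex_match(target, m, pattern)) return {"", "", false};
    // m[0] is the whole match; m[1] and m[2] are the parenthesized sub-expressions.
    const std::ssub_match letters = m[1];
    const std::ssub_match digits = m[2];
    return {letters.str(), digits.str(), true};
}
```

Here the std::string passed in is the target sequence; split_token("abc42") yields letters "abc" and digits "42", and a token that starts with a digit reports ok == false.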

2.10 Algorithms

These functions are used to apply the regular expression encapsulated in a regex to a target sequence of
characters.

regex_match (C++11): attempts to match a regular expression to an entire character sequence
(function template)
regex_search (C++11): attempts to match a regular expression to any part of a character sequence
(function template)
regex_replace (C++11): replaces occurrences of a regular expression with formatted replacement text
(function template)
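
The difference between the three algorithms shows up when they are applied to the same pattern. The helper names below (is_all_digits, contains_digits, mask_digits) are invented for this sketch; the library calls are the standard ones.

```cpp
#include <regex>
#include <string>

// regex_match: the whole sequence must match the pattern.
bool is_all_digits(const std::string& s) {
    return std::regex_match(s, std::regex("[0-9]+"));
}

// regex_search: some part of the sequence must match the pattern.
bool contains_digits(const std::string& s) {
    return std::regex_search(s, std::regex("[0-9]+"));
}

// regex_replace: every match is replaced by the replacement text.
std::string mask_digits(const std::string& s) {
    return std::regex_replace(s, std::regex("[0-9]+"), "#");
}
```

For the input "Lab 03", is_all_digits is false, contains_digits is true, and mask_digits returns "Lab #".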

2.11 Iterators

The regex iterators are used to traverse the entire set of regular expression matches found within a
sequence.

regex_iterator (C++11): iterates through all regex matches within a character sequence (class template)
regex_token_iterator (C++11): iterates through the specified sub-expressions within all regex matches
in a given string or through unmatched substrings (class template)
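
A short sketch of both iterators, using the std::string typedefs std::sregex_iterator and std::sregex_token_iterator. The helper names all_matches and split are invented for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every match of pattern in text, left to right.
std::vector<std::string> all_matches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> out;
    // A default-constructed iterator marks the end of the sequence of matches.
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Splits text on a separator pattern; submatch index -1 selects the unmatched
// substrings between the matches.
std::vector<std::string> split(const std::string& text, const std::string& separator) {
    const std::regex re(separator);
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    return std::vector<std::string>(it, end);
}
```

For example, all_matches("a1 b22 c333", "[0-9]+") collects "1", "22" and "333", while split("a, b,c", ",\\s*") yields "a", "b" and "c".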

2.12 Exceptions

This class defines the type of objects thrown as exceptions to report errors from the regular expressions
library.

regex_error (C++11): reports errors generated by the regular expressions library (class)
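
Constructing a std::regex from a malformed pattern throws std::regex_error, so pattern strings that come from user input are usually compiled inside a try block. The helper name is_valid_pattern is invented for this sketch.

```cpp
#include <regex>
#include <string>

// Returns true when pattern compiles under the default ECMAScript syntax.
bool is_valid_pattern(const std::string& pattern) {
    try {
        std::regex re(pattern);
        return true;
    } catch (const std::regex_error& e) {
        // e.code() holds a std::regex_constants::error_type; e.what() describes it.
        return false;
    }
}
```

For instance, "(0|1)*111" compiles, while "(0|1" is rejected because of the unmatched parenthesis.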

2.13 Traits

The regex traits class is used to encapsulate the localizable aspects of a regex.

regex_traits (C++11): provides metainformation about a character type, required by the regex library
(class template)

2.14 Constants

Defined in namespace std::regex_constants

syntax_option_type (C++11): general options controlling regex behavior (typedef)
match_flag_type (C++11): options specific to matching (typedef)
error_type (C++11): describes different types of matching errors (typedef)
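
Two of these constants in use, as a sketch: icase is a syntax_option_type value passed when the regex is constructed, and format_first_only is a match_flag_type value passed to regex_replace. The helper names matches_ignoring_case and replace_first are invented for this example.

```cpp
#include <regex>
#include <string>

// syntax_option_type: compile the pattern so that letter case is ignored.
bool matches_ignoring_case(const std::string& s, const std::string& pattern) {
    const std::regex re(pattern,
                        std::regex_constants::ECMAScript | std::regex_constants::icase);
    return std::regex_match(s, re);
}

// match_flag_type: replace only the first match instead of all of them.
std::string replace_first(const std::string& s, const std::string& pattern,
                          const std::string& with) {
    return std::regex_replace(s, std::regex(pattern), with,
                              std::regex_constants::format_first_only);
}
```

With these helpers, matches_ignoring_case("MAIN", "main") is true and replace_first("a-b-c", "-", "+") returns "a+b-c".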

3. Lab Programming Practice

Program 3.1: Program to use regex() and regex_match() to match different patterns entered by the user.

Program 3.2: Program to use regex_replace() for finding and replacing a word.

Program 3.3: Program to use regex_match() to match words

Program 3.4: Program to use regex_search() to search for a word within a sentence.

4. Lab tasks

Task 1
Construct automata for any five of the following regular expression conditions:

Task 2
Write a program in C++ to replace a word in a sentence using regular expressions.
Note: The program should ask the user at runtime for the sentence and for the first character of the word
to find.

5. Homework Tasks

1. Design any ten of the following regular expressions:

2. Use JFLAP to prove that the above regular expressions are valid.

3. For each of the following FAs, find the regular expression and implement it in code using regex.

a.

b.
