PP Handout 4

This document provides an overview of exception handling in Python, detailing the types of errors (syntax and runtime) and how to manage them using try, except, else, and finally blocks. It also explains built-in and user-defined exceptions, along with examples of raising exceptions and creating custom exception classes. Additionally, the document introduces regular expressions in Python, their applications, and various regex methods for string manipulation.

Uploaded by nainalashalini
Copyright © All Rights Reserved

PYTHON PROGRAMMING MODULE-IV Handout-IV

EXCEPTION HANDLING
Errors in any programming language can be of two types:
1. Syntax errors
2. Runtime errors
In other words, an error is either a syntax error or an exception.

• Syntax Errors: Errors that occur due to invalid syntax are called syntax errors. They are detected when we have not followed the rules of the programming language while writing a program. These errors are also known as parsing errors. When the interpreter encounters a syntax error, it does not execute the program until we correct the error.

Example 1:
x = 10
if x == 10          # colon is missing here, which causes the error
    print("Hello")

SyntaxError: invalid syntax
(Python 3.10 and later report this more precisely as: SyntaxError: expected ':')

Example 2:
print "Hello"       # parentheses are missing here, which causes the error

SyntaxError: Missing parentheses in call to 'print'

Note: A program with syntax errors cannot be executed.

• Runtime Errors:
Runtime errors are also known as exceptions. An exception is an error that occurs during the execution of a program.

Example:
print(10/0)     # This will cause ZeroDivisionError: division by zero
print(10/"a")   # This will cause TypeError: unsupported operand type(s) for /: 'int' and 'str'

Output:
Traceback (most recent call last):
  File "c:\Users\hp\ex24.py", line 23, in <module>
    print(10/0)
ZeroDivisionError: division by zero

Because the first statement raises an unhandled exception, the program terminates there and the second print() is never executed.

What is an Exception?
An exception is an unexpected event that disturbs the normal flow of execution. Even if a statement or expression is syntactically correct, an error may occur while it is being executed, for example when trying to open a file that does not exist or when dividing by zero.

Some example exceptions


o ZeroDivisionError
o TypeError
o ValueError
o FileNotFoundError
o EOFError
o TyrePuncturedError (a real-life analogy, not an actual Python exception)

Compiled by G Sreenivasulu, Associate Professor of CSE, JBIET Page 0


Note: It is important to handle exceptions so that the program terminates gracefully. Exception handling does not remove the exception. Instead, it ensures that the program continues along some alternative path of execution.

There are two types of Python exceptions:

1. Built-in exceptions: These are also called pre-defined exceptions. Python provides an extensive collection of built-in exceptions that represent commonly occurring errors (exceptions) in a standardized way.

When a built-in exception is raised, the matching exception handler (except block) is executed. If no handler catches it, Python stops the program and displays the exception name along with the reason (error message).

2. User-defined exceptions: User-defined exceptions are custom exceptions created by programmers. They enable you to enforce restrictions or consequences on certain functions, values, or variables.

Handling Exceptions using pre-defined exception handling

Exception handling allows a program to keep running even if an error occurs. It provides an alternative path so that execution can continue normally, which helps avoid abnormal termination of the program.

Exception handling in Python is achieved using the try, except, else and finally blocks:

try:
    # code that may cause an exception
except <error type>:
    # executed if the try block raises an error of this type
else:
    # executed only if the try block completes without errors
finally:
    # always executed

try: The try block contains the code that may raise an error. If an error occurs in the try block, control passes to the except blocks. If no error occurs, execution continues with the else block, or with the finally block if there is no else block.

except: This block is executed if the try block raises an error. It can handle a specific error (such as ZeroDivisionError, AssertionError, etc.), or it can catch all errors when no exception type is given.

else: If the try block runs without errors, the else block is executed. If there is an error, whether it is caught or not, the else block is skipped.

finally: After the try, except and else blocks, the finally block always runs. This happens even if the error in the try block was not caught, or if an error occurred in an except or else block.
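The sketch below shows the order in which these blocks run. The function divide() and its trace list are hypothetical names used only for illustration.

```python
def divide(a, b):
    """Divide a by b and record which blocks of try/except/else/finally ran."""
    trace = []
    try:
        result = a / b              # may raise ZeroDivisionError
    except ZeroDivisionError:
        trace.append("except")      # runs only when the division fails
        result = None
    else:
        trace.append("else")        # runs only when the try block succeeded
    finally:
        trace.append("finally")     # runs in both cases
    return result, trace


print(divide(10, 2))   # (5.0, ['else', 'finally'])
print(divide(10, 0))   # (None, ['except', 'finally'])
```

The else and except blocks never both run for the same call, but finally always appears in the trace.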

Example: try...except blocks


try:
    print(x)
except:
    print("An exception occurred")

Note: The try block generates an exception (a NameError) because x is not defined. Since the try block raises an error, the except block is executed.

Multiple except blocks


You can define as many except blocks as you want. For example, you can run a special block of code for a specific kind of error:

try:
    x = int(input("Enter First Number: "))
    y = int(input("Enter Second Number: "))
    print(x/y)
except ZeroDivisionError:
    print("Can't Divide with Zero")
except ValueError:
    print("please provide int value only")

Note: When multiple except blocks are used, their order is important. The Python interpreter checks them from top to bottom and executes only the first except block that matches the exception.
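Order matters most when one exception class is a subclass of another, because an except block for a base class also catches its subclasses. The sketch below uses two hypothetical helper functions. It relies on the built-in fact that ZeroDivisionError and OverflowError both derive from ArithmeticError.

```python
def specific_first(exc):
    """Raise exc; the specific handler is listed before the general one."""
    try:
        raise exc
    except ZeroDivisionError:
        return "ZeroDivisionError block"
    except ArithmeticError:           # base class of ZeroDivisionError and OverflowError
        return "ArithmeticError block"


def general_first(exc):
    """Same handlers in the opposite order."""
    try:
        raise exc
    except ArithmeticError:           # matches first, so it also catches ZeroDivisionError
        return "ArithmeticError block"
    except ZeroDivisionError:         # never reached for a ZeroDivisionError
        return "ZeroDivisionError block"


print(specific_first(ZeroDivisionError()))  # ZeroDivisionError block
print(specific_first(OverflowError()))      # ArithmeticError block
print(general_first(ZeroDivisionError()))   # ArithmeticError block
```

As a rule of thumb, list the most specific exception classes first.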

Handle multiple exceptions in single except block

We can write a single except block that handles several different types of exceptions by listing them in a tuple:
except (Exception1, Exception2, Exception3, ...):
(or)
except (Exception1, Exception2, Exception3, ...) as msg:

Example:
try:
    x = int(input("Enter First Number: "))
    y = int(input("Enter Second Number: "))
    print(x/y)
except (ZeroDivisionError, ValueError) as msg:
    print("Please provide valid numbers only. The problem is:", msg)
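The example above is interactive because it uses input(). Here is a non-interactive sketch of the same idea. safe_divide() is a hypothetical function that takes the two numbers as strings. The exception object bound by as carries the message, and its type tells which of the listed exceptions occurred.

```python
def safe_divide(x_text, y_text):
    """Convert two strings to int and divide them; one handler covers both errors."""
    try:
        return int(x_text) / int(y_text)
    except (ZeroDivisionError, ValueError) as msg:
        # msg is the exception object; type(msg) identifies which error occurred
        return f"{type(msg).__name__}: {msg}"


print(safe_divide("10", "4"))    # 2.5
print(safe_divide("10", "0"))    # ZeroDivisionError: division by zero
print(safe_divide("10", "ten"))  # ValueError: invalid literal for int() with base 10: 'ten'
```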

Raising an Exception: User-Defined exception handling

There are cases where values entered by the user, or produced by some piece of code, are invalid for our program. In those cases, we can raise an exception manually.

We can use the raise statement to throw an exception when a condition occurs. The raised exception can be a built-in one or a custom exception.

The following example shows how to throw an error, using raise, when a certain condition occurs.

Example: Raise an error if the entered number is not a positive integer

try:
    roll = int(input("Please enter your roll number: "))
    if roll <= 0:
        raise ValueError()
except ValueError:
    print("ValueError Exception thrown")

In the example above, we take the input from the user inside the try clause. Then we check whether the number entered by the user is non-positive. If it is, we raise the ValueError exception manually by writing raise ValueError(), and we handle it in an except clause. The same except clause also handles the ValueError that int() raises when the input is not a number. We chose ValueError because ValueError exceptions are normally raised when a value is incorrect. We could raise another type of exception when the entered value is not positive, but it is advisable to choose the exception type that best matches the reason for raising it.

Example: Raise a TypeError if x is not an integer:

x = "hello"
if not isinstance(x, int):
    raise TypeError("Only integers are allowed")

Here the exception is not handled, so the program stops with TypeError: Only integers are allowed.
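The roll-number check can also be sketched without input(), with the validation moved into a function. read_roll() is a hypothetical helper. Passing a message to ValueError makes the reason visible in the handler.

```python
def read_roll(text):
    """Hypothetical helper: convert text to a positive roll number."""
    roll = int(text)                  # int() itself raises ValueError for non-numeric text
    if roll <= 0:
        raise ValueError(f"roll number must be positive, got {roll}")
    return roll


for candidate in ["42", "-7", "abc"]:
    try:
        print("Accepted:", read_roll(candidate))
    except ValueError as err:
        print("Rejected:", err)
```

Both kinds of bad input, a non-positive number and non-numeric text, end up in the same except ValueError clause.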

It should be noted that all classes are subclasses of the object class, which is a built-in class in Python. Similarly, all built-in exception classes are direct or indirect subclasses of BaseException. All ordinary (non-system-exiting) exceptions, such as IndexError, TypeError and ValueError, are subclasses of the built-in Exception class. User-defined exceptions should therefore inherit from Exception.
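These inheritance relationships can be checked directly with issubclass() and the class attribute __mro__. A small sketch:

```python
# issubclass() checks the inheritance relationships described above.
for cls in (IndexError, TypeError, ValueError, ZeroDivisionError, KeyboardInterrupt):
    print(cls.__name__, issubclass(cls, Exception), issubclass(cls, BaseException))

# __mro__ shows the full chain of base classes, ending with object.
print(ZeroDivisionError.__mro__)
# (<class 'ZeroDivisionError'>, <class 'ArithmeticError'>, <class 'Exception'>,
#  <class 'BaseException'>, <class 'object'>)
```

KeyboardInterrupt is the odd one out. It derives from BaseException but not from Exception, so except Exception: does not swallow Ctrl+C.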

Creating our own exception class

Example 1:
class MyCustomError(Exception):
    pass

roll = int(input("Please enter your roll number: "))
if roll <= 0:
    raise MyCustomError("The number entered is not positive")

Here we defined MyCustomError, which inherits from the Exception class, and raised it with the raise keyword.
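A custom exception class can also carry extra data for the handler. This minimal sketch assumes a hypothetical NonPositiveRollError that stores the offending value, together with a hypothetical check_roll() function.

```python
class NonPositiveRollError(Exception):
    """Hypothetical custom exception that also remembers the offending value."""

    def __init__(self, roll):
        super().__init__(f"roll number {roll} is not positive")  # message shown by print(e)
        self.roll = roll                                          # extra data for the handler


def check_roll(roll):
    if roll <= 0:
        raise NonPositiveRollError(roll)
    return roll


try:
    check_roll(-3)
except NonPositiveRollError as e:
    print("Caught:", e, "| value:", e.roll)   # Caught: roll number -3 is not positive | value: -3
```

Calling super().__init__() with the message keeps print(e) working as it does for built-in exceptions.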

Example 2:
# defining exceptions

class Error(Exception):
    """Base class for the custom exceptions below"""
    pass

class PasswordSmallError(Error):
    """Raised when the input password is too short"""
    pass

class PasswordLargeError(Error):
    """Raised when the input password is too long"""
    pass

try:
    password = input("Enter a password: ")
    if len(password) < 6:
        raise PasswordSmallError("Password is short!")
    if len(password) > 15:
        raise PasswordLargeError("Password is long!")
except PasswordSmallError as ps:
    print(ps)
except PasswordLargeError as pl:
    print(pl)
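One benefit of the common base class Error is that a single except Error: clause catches every exception derived from it. Below is a non-interactive sketch of Example 2, where check_password() is a hypothetical helper that replaces input().

```python
class Error(Exception):
    """Base class for the password exceptions below."""


class PasswordSmallError(Error):
    """Raised when the password is too short."""


class PasswordLargeError(Error):
    """Raised when the password is too long."""


def check_password(password):
    """Hypothetical helper: apply the length rules from Example 2."""
    if len(password) < 6:
        raise PasswordSmallError("Password is short!")
    if len(password) > 15:
        raise PasswordLargeError("Password is long!")
    return "Password accepted"


for pw in ["abc", "secret123", "x" * 20]:
    try:
        print(check_password(pw))
    except Error as err:            # one handler catches both subclasses
        print(type(err).__name__, "-", err)
```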

How to print Exception Information to the Console

try:
    res = 190 / 0
except ZeroDivisionError as e:
    # handle the exception
    print("An exception occurred:", e)   # An exception occurred: division by zero

How to Print the Exception Name


What if you want to get the exact exception name and print it to the terminal? That's possible too: use the type() function to get the type (class) of the exception, and then use its __name__ attribute to get the name of the exception.

try:
    res = 190 / 0
except Exception as error:
    # handle the exception
    print("An exception occurred:", type(error).__name__)   # An exception occurred: ZeroDivisionError

try:
    print("Here's variable x:", x)
except Exception as error:
    print("An error occurred:", type(error).__name__, "–", error)   # An error occurred: NameError – name 'x' is not defined

try:
    res = 190 / 0
except Exception as error:
    # handle the exception
    print("An exception occurred:", type(error).__name__, "–", error)   # An exception occurred: ZeroDivisionError – division by zero
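Besides the name and the message, the standard traceback module can return the full traceback text as a string while an exception is being handled. This sketch assumes a hypothetical helper, run_and_report(), that combines all three pieces of information.

```python
import traceback


def run_and_report(func):
    """Hypothetical helper: call func() and describe any exception it raises."""
    try:
        func()
    except Exception as error:
        name = type(error).__name__         # e.g. 'ZeroDivisionError'
        details = traceback.format_exc()    # full traceback text as a string
        return name, str(error), details
    return None                             # func() finished without an exception


name, message, details = run_and_report(lambda: 190 / 0)
print(name, "-", message)          # ZeroDivisionError - division by zero
print(details.splitlines()[-1])    # ZeroDivisionError: division by zero
```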

REGULAR EXPRESSIONS IN PYTHON (RegEx in Python)

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern. A RegEx can be used to check whether a string contains the specified search pattern. Python has a built-in module called re, which can be used to work with regular expressions.

A regular expression is thus a way to define a pattern for searching or manipulating strings. We can use regular expressions to match, search, replace, and otherwise manipulate textual data.

Application areas of Regular Expressions


1. To develop validation frameworks/validation logic.
2. To develop pattern matching applications (Ctrl+F in Windows, grep in UNIX, etc.).
3. To develop translators like compilers, interpreters, etc.
4. To develop digital circuits.
5. To develop communication protocols like TCP/IP, UDP, etc.

Python regex methods


The Python re module provides several functions. Below is the list of regex functions:

• match()
• fullmatch()
• search()
• findall()
• finditer()
• sub()
• subn()
• split()
• compile()
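compile() is listed above but is not demonstrated separately in this handout. As a minimal sketch: re.compile() turns a pattern string into a pattern object whose methods (findall(), fullmatch(), sub(), and so on) mirror the module-level functions. This is convenient when the same pattern is reused many times. The name digits is just an illustrative variable.

```python
import re

digits = re.compile(r"[0-9]+")              # compile the pattern once

print(digits.findall("a7b99c5"))            # ['7', '99', '5']
print(digits.fullmatch("2024") is not None) # True
print(digits.sub("#", "a7b99c5"))           # a#b#c#
```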

match():
We can use the match() function to check whether the given pattern occurs at the beginning of the target string. If it does, we get a Match object; otherwise we get None.

import re
s = input("Enter pattern to check: ")
m = re.match(s, "abcabdefg")
if m is not None:
    print("Match is available at the beginning of the String")
    print("Start Index:", m.start(), "and End Index:", m.end())
else:
    print("Match is not available at the beginning of the String")

Output:
Enter pattern to check: abc
Match is available at the beginning of the String
Start Index: 0 and End Index: 3

Output:
Enter pattern to check: bde
Match is not available at the beginning of the String

fullmatch():
We can use the fullmatch() function to match a pattern against the whole target string, i.e. the complete string must match the given pattern. If the complete string matches, this function returns a Match object; otherwise it returns None.

import re
s = input("Enter pattern to check: ")
m = re.fullmatch(s, "ababab")
if m is not None:
    print("Full String Matched")
else:
    print("Full String not Matched")

Output
Enter pattern to check: ab
Full String not Matched

Output
Enter pattern to check: ababab
Full String Matched

search():
We can use the search() function to search for the given pattern anywhere in the target string. If a match is available, it returns the Match object that represents the first occurrence of the match. If no match is available, it returns None.

import re
s = input("Enter pattern to check: ")
m = re.search(s, "abaaaba")
if m is not None:
    print("Match is available")
    print("First Occurrence of match with start index:", m.start(), "and end index:", m.end())
else:
    print("Match is not available")

Output
Enter pattern to check: aaa
Match is available
First Occurrence of match with start index: 2 and end index: 5

Output
Enter pattern to check: bbb
Match is not available
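The difference between match(), search() and fullmatch() is easiest to see when all three are applied to the same pattern and target string. A small sketch:

```python
import re

target = "abcabdefg"
for func in (re.match, re.search, re.fullmatch):
    m = func("abd", target)
    # span() gives (start, end) of the match; None means no match
    print(func.__name__, "->", m.span() if m else None)

# match -> None        ("abd" is not at the beginning)
# search -> (3, 6)     (found at index 3)
# fullmatch -> None    (the whole string is not "abd")
```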

findall():
findall() finds all occurrences of the pattern. This function returns a list containing all occurrences.

import re
l = re.findall("[0-9]", "a7b9c5kz")
print(l)

Output
['7', '9', '5']
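What findall() returns changes when the pattern contains groups (parentheses). With one group it returns the group's text, and with several groups it returns a list of tuples. A small sketch on the same target string:

```python
import re

print(re.findall(r"[a-z][0-9]", "a7b9c5kz"))       # ['a7', 'b9', 'c5']
print(re.findall(r"([a-z])[0-9]", "a7b9c5kz"))     # ['a', 'b', 'c']
print(re.findall(r"([a-z])([0-9])", "a7b9c5kz"))   # [('a', '7'), ('b', '9'), ('c', '5')]
```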

finditer():
finditer() returns an iterator yielding a Match object for each match. On each Match object we can call the start(), end() and group() methods. Several later examples in this handout also use this function.

import re
itr = re.finditer("[a-z]", "a7b9c5k8z")
for m in itr:
    print(m.start(), "...", m.end(), "...", m.group())

Output
0 ... 1 ... a
2 ... 3 ... b
4 ... 5 ... c
6 ... 7 ... k
8 ... 9 ... z
sub():
sub means substitution or replacement
re.sub(regex, replacement, targetstring)
In the target string every matched pattern will be replaced with provided replacement.

import re
s=re.sub("[a-z]","#","a7b9c5k8z")
print(s)

Output
#7#9#5#8#
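sub() also accepts an optional count argument that limits how many matches are replaced. The default, 0, means all matches are replaced. A one-line sketch on the same target string:

```python
import re

# Only the first two matches ('a' and 'b') are replaced.
print(re.sub("[a-z]", "#", "a7b9c5k8z", count=2))   # #7#9c5k8z
```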

subn():
It is the same as sub() except that it also returns the number of replacements. This function returns a tuple whose first element is the result string and whose second element is the number of replacements.

(resultstring, number of replacements)

import re
t=re.subn("[a-z]","#","a7b9c5k8z")
print(t)
print("The Result String:", t[0])
print("The number of replacements:", t[1])

Output
('#7#9#5#8#', 5)
The Result String: #7#9#5#8#
The number of replacements: 5

split():
If we want to split the given target string according to a particular pattern, we should use the split() function. This function returns a list of all tokens.

import re
l = re.split(",", "sunny,bunny,chinny,vinny,jinny")
print(l)
for t in l:
    print(t)

Output
['sunny', 'bunny', 'chinny', 'vinny', 'jinny']
sunny
bunny
chinny
vinny
jinny
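Because the separator is a regular expression, split() can handle several separators and irregular spacing at once, which str.split() cannot do. The sample text below is made up for illustration.

```python
import re

# [,;] accepts either separator; \s* on both sides swallows surrounding spaces.
text = "sunny, bunny;chinny ,vinny;  jinny"
print(re.split(r"\s*[,;]\s*", text))   # ['sunny', 'bunny', 'chinny', 'vinny', 'jinny']
```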

^ symbol:
We can use the ^ symbol to check whether the given target string starts with the provided pattern. If the target string starts with Learn, search() returns a Match object; otherwise it returns None.

Syntax
res = re.search("^Learn", s)

import re
s = "Learning Python is Very Easy"
res = re.search("^Learn", s)
if res is not None:
    print("Target String starts with Learn")
else:
    print("Target String does not start with Learn")

Output:
Target String starts with Learn

Similarly, the $ symbol checks whether the target string ends with the provided pattern. If we want to ignore case, we pass re.IGNORECASE as the third argument (flags) of search():
res = re.search("easy$", s, re.IGNORECASE)

import re
s = "Learning Python is Very Easy"
res = re.search("easy$", s, re.IGNORECASE)
if res is not None:
    print("Target String ends with Easy by ignoring case")
else:
    print("Target String does not end with Easy by ignoring case")

Output:
Target String ends with Easy by ignoring case

SAMPLE PROGRAMS
Let's consider the following requirements for a regular expression.

PROGRAM 1:
1. The allowed characters are a-z, A-Z, 0-9 and #.
2. The first character should be a lower-case alphabet symbol from a to k.
3. The second character should be a digit divisible by 3.
4. The length of the identifier should be at least 2.

Regular expression: [a-k][0369][a-zA-Z0-9#]*

Write a Python program to check whether the given string follows the above rules.

import re
s = input("Enter string:")
m = re.fullmatch("[a-k][0369][a-zA-Z0-9#]*", s)
if m is not None:
    print(s, "matches the given rules")
else:
    print(s, "does not match the given rules")
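To check several strings without typing them in, the same fullmatch() call can be wrapped in a function. is_valid_identifier() and the sample strings are hypothetical.

```python
import re


def is_valid_identifier(s):
    """Check the Program 1 rules; fullmatch() requires the whole string to match."""
    return re.fullmatch(r"[a-k][0369][a-zA-Z0-9#]*", s) is not None


for s in ["a6kk9z#", "k9", "z9abc", "a7", "b3$"]:
    print(s, is_valid_identifier(s))
# a6kk9z# True, k9 True, z9abc False (z is outside a-k),
# a7 False (7 is not divisible by 3), b3$ False ($ is not allowed)
```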

PROGRAM 2:
Write a regular expression to represent all 10-digit mobile numbers.

Rules:
1. Every number should contain exactly 10 digits.
2. The first digit should be 7, 8 or 9.

Regular Expressions:
[7-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
or
[7-9][0-9]{9}
or
[7-9]\d{9}

import re
n = input("Enter number:")
m = re.fullmatch(r"[7-9]\d{9}", n)
if m is not None:
    print("Valid Mobile Number")
else:
    print("Please enter valid Mobile Number")
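The three regular expressions listed above behave the same on ordinary ASCII input, which this sketch checks on a few made-up sample numbers. One caveat: in str patterns, \d also matches other Unicode decimal digits unless the re.ASCII flag is given, so [0-9] is the stricter choice.

```python
import re

patterns = [r"[7-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]",
            r"[7-9][0-9]{9}",
            r"[7-9]\d{9}"]
samples = ["9876543210", "6876543210", "987654321", "98765432101"]

for p in patterns:
    # valid; wrong first digit; 9 digits; 11 digits
    print([re.fullmatch(p, s) is not None for s in samples])   # [True, False, False, False]
```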

PROGRAM 3:
Write a Python program to extract all mobile numbers present in input.txt, where the numbers are mixed with normal text data.

import re
with open("input.txt", "r") as f1, open("output.txt", "w") as f2:
    for line in f1:
        items = re.findall(r"[7-9]\d{9}", line)
        for n in items:
            f2.write(n + "\n")
print("Extracted all Mobile Numbers into output.txt")
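The pattern [7-9]\d{9} can also pick ten digits out of a longer run of digits, such as an order id. Adding the word-boundary anchor \b on both sides is one way to keep only stand-alone numbers. This refinement is not part of the original program, and the sample text is made up.

```python
import re

text = "Call 9876543210 or 8123456789; order id 712345678901 is not a phone number."

print(re.findall(r"[7-9]\d{9}", text))      # ['9876543210', '8123456789', '7123456789']
print(re.findall(r"\b[7-9]\d{9}\b", text))  # ['9876543210', '8123456789']
```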

PROGRAM 4:
Write a Python program to check whether the given mail id is a valid Gmail id.
import re
s = input("Enter Mail id:")
m = re.fullmatch(r"\w[a-zA-Z0-9_.]*@gmail[.]com", s)
if m is not None:
    print("Valid Mail Id")
else:
    print("Invalid Mail id")

PROGRAM 5
Write a Python program to check whether the given car registration number is a valid Telangana State registration number.

import re
s = input("Enter Vehicle Registration Number:")
m = re.fullmatch(r"TS[012][0-9][A-Z]{2}\d{4}", s)
if m is not None:
    print("Valid Vehicle Registration Number")
else:
    print("Invalid Vehicle Registration Number")

PROGRAM 6:
Write a Python program to check whether the given PAN number is valid.
A valid PAN card number must satisfy the following conditions:
1. It should be ten characters long.
2. The first five characters should be upper-case letters.
3. The next four characters should be digits from 0 to 9.
4. The last (tenth) character should be an upper-case letter.
5. It should not contain any white spaces.

The regular expression used is given below:

regex = "[A-Z]{5}[0-9]{4}[A-Z]{1}"

Where:
[A-Z]{5} represents the first five upper-case letters, each of which can be A to Z.
[0-9]{4} represents the four digits, each of which can be 0 to 9.
[A-Z]{1} represents the one upper-case letter, which can be A to Z.
^ -> identifies the beginning of a string.
$ -> recognizes the end of the string.
? -> matches zero or one occurrence.

import re

pan = "DWKPK3344E"
# Alternative using anchors: re.search(r"^[A-Z]{5}[0-9]{4}[A-Z]$", pan)
if re.fullmatch("[A-Z]{5}[0-9]{4}[A-Z]{1}", pan):
    print("Valid PAN Number")
else:
    print("Not a Valid PAN Number")

Character Classes
With character classes (also called character sets), you can tell the regex engine to match only one out of several characters. Simply place the characters you want to match between square brackets. If you want to match an a or an e, use [ae]. You could use this in gr[ae]y to match either gray or grey. A character class matches only a single character: gr[ae]y does not match graay, graey or any such thing.
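The gr[ae]y example can be checked directly. The word list below is made up for illustration.

```python
import re

words = "gray grey graay griy"
print(re.findall(r"gr[ae]y", words))   # ['gray', 'grey']
```

graay fails because the class consumes exactly one character, and griy fails because i is not in the class.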

Negated Character Classes


Typing a caret after the opening square bracket negates the character class. The result is that the
character class matches any character that is not in the character class.
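A short sketch of negated character classes: [^0-9] matches any single non-digit. A negated class is also handy with sub() for removing unwanted characters. The sample strings are made up.

```python
import re

print(re.findall(r"[^0-9]", "a7b9c5"))                   # ['a', 'b', 'c']

cleaned = re.sub(r"[^a-zA-Z ]", "", "Hello, World! 2024")
print(repr(cleaned))                                     # 'Hello World '
```

The trailing space survives because the space character is inside the class being kept.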

Inside Character Classes


The only special characters or metacharacters inside a character class are the closing bracket ], the backslash \, the caret ^ (special only right after the opening bracket) and the hyphen - (special only when it forms a range such as a-z).
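A small sketch of these rules. Characters such as ., * and + are literal inside brackets. A hyphen placed last and a caret not placed first are also literal, while ] and \ need a backslash. Raw strings (r"...") keep Python from interpreting the backslashes itself.

```python
import re

print(re.findall(r"[.*+]", "a.b*c+d"))   # ['.', '*', '+']  no escaping needed inside []
print(re.findall(r"[a-c-]", "a-b-z"))    # ['a', '-', 'b', '-']  trailing hyphen is literal
print(re.findall(r"[a^]", "a^b"))        # ['a', '^']  caret not first, so literal
print(re.findall(r"[\]\\]", r"x]y\z"))   # [']', '\\']  ] and \ must be escaped
```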

Repeating Characters or Quantifiers in Python:


A quantifier metacharacter immediately follows a portion of a regex and indicates how many times that portion must occur for the match to succeed, so we can use quantifiers to specify the number of occurrences to match. The most general quantifier has the form {m,n}, where m and n are the minimum and maximum number of times the expression it applies to must match.

A         Match regular expression A exactly once
A?        Match regular expression A zero or one times
A*        Match regular expression A zero or more times
A+        Match regular expression A one or more times
A{m}      Match regular expression A exactly m times
A{m,n}    Match regular expression A between m and n times (inclusive)
Example:
import re
matcher = re.finditer("a", "abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output:
0 ...... a
2 ...... a
3 ...... a
5 ...... a
6 ...... a
7 ...... a

Example:
import re
matcher=re.finditer("a+","abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output
0 ...... a
2 ...... aa
5 ...... aaa

Example:
import re
matcher=re.finditer("a*","abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output
0 ...... a
1 ......
2 ...... aa
4 ......
5 ...... aaa
8 ......
9 ......
Example:
import re
matcher=re.finditer("a?","abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output
0 ...... a
1 ......
2 ...... a
3 ...... a
4 ......
5 ...... a
6 ...... a
7 ...... a
8 ......
9 ......

Example:
import re
matcher = re.finditer("a{3}", "abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output
5 ...... aaa

Example
import re
matcher=re.finditer("a{2,4}","abaabaaab")
for match in matcher:
    print(match.start(), "......", match.group())

Output
2 ...... aa
5 ...... aaa

What is a Metacharacter in a Regular Expression?

Metacharacters are characters with a special meaning: they affect how the regular expression around them is interpreted. Metacharacters are also called operators, signs, or symbols. We can further classify metacharacters into identifiers and modifiers.

Identifiers are used to recognise a certain type of character. For example, to find all the digit characters in a string we can use the identifier \d:
import re
string = "Hello I live on street 9 which is near street 23"
print(re.findall(r"\d", string))

Output:
['9', '2', '3']

In the above output every digit is returned separately, so the two-digit number 23 is split into two digits. If we use two identifiers (\d\d), we can find only two-digit numbers. This problem can be solved with modifiers.

Modifiers are a set of metacharacters that add more functionality to identifiers. As mentioned before, we can use the modifier + to get numbers of any length from the string. It makes the preceding identifier match one or more characters, so \d+ returns whole numbers.
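Running the three variants on the same string shows how the + modifier changes the result:

```python
import re

string = "Hello I live on street 9 which is near street 23"
print(re.findall(r"\d", string))    # ['9', '2', '3']   one digit at a time
print(re.findall(r"\d\d", string))  # ['23']            only two-digit numbers
print(re.findall(r"\d+", string))   # ['9', '23']       numbers of any length
```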

Modifier              Description
. (dot)               Matches any character except a newline.
^ (caret)             Matches the pattern only at the start of the string.
$ (dollar)            Matches the pattern at the end of the string.
* (asterisk)          Matches 0 or more repetitions of the regex.
+ (plus)              Matches 1 or more repetitions of the regex.
? (question mark)     Matches 0 or 1 repetition of the regex.
[] (square brackets)  Indicates a set of characters; matches any single character in the brackets. For example, [abc] matches a, b, or c.
| (pipe)              Specifies alternative patterns. For example, P1|P2, where P1 and P2 are two different regexes.
\ (backslash)         Escapes special characters or signals a special sequence. For example, if you are searching for one of the special characters, you can use \ to escape it.
[^...]                Matches any single character not in the brackets.
(...)                 Matches whatever regular expression is inside the parentheses and captures it as a group. For example, (abc) matches the substring 'abc'.
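Two rows of this table, escaping with \ and grouping with (...), are not illustrated by the examples that follow. Here is a small sketch on a made-up price string. re.escape() is a standard helper that inserts the needed backslashes for you.

```python
import re

text = "Price: 3.50 or 3x50"
print(re.findall(r"3.50", text))    # ['3.50', '3x50']  (. matches any character)
print(re.findall(r"3\.50", text))   # ['3.50']          (\. matches only a literal dot)
print(re.escape("3.50"))            # 3\.50             (escape() inserts the backslashes)

m = re.search(r"(\d+)\.(\d+)", text)       # (...) captures each part as a group
print(m.group(0), m.group(1), m.group(2))  # 3.50 3 50
```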

Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by two (any) characters, and an "o":
x = re.findall("he..o", txt)
print(x)

Output:
['hello']

Example:
import re
txt = "The rain in Spain"
#Find all lower case characters alphabetically between "a" and "m":
x = re.findall("[a-m]", txt)
print(x)

Output:
['h', 'e', 'a', 'i', 'i', 'a', 'i']

Example:
import re
txt = "That will be 59 dollars"
#Find all digit characters:
x = re.findall(r"\d", txt)
print(x)

Output:
['5', '9']

Example:
import re
txt = "hello planet"
#Check if the string starts with 'hello':
x = re.findall("^hello", txt)
if x:
    print("Yes, the string starts with 'hello'")
else:
    print("No match")

Output:
Yes, the string starts with 'hello'

Example:
import re
txt = "hello planet"
#Check if the string ends with 'planet':
x = re.findall("planet$", txt)
if x:
    print("Yes, the string ends with 'planet'")
else:
    print("No match")

Output:
Yes, the string ends with 'planet'

Example:
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or more (any) characters, and an "o":
x = re.findall("he.*o", txt)
print(x)

Output:
['hello']

Example
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 1 or more (any) characters, and an "o":
x = re.findall("he.+o", txt)
print(x)

Output:
['hello']

Example
import re
txt = "hello planet"
#Search for a sequence that starts with "he", followed by 0 or 1 (any) character, and an "o":
x = re.findall("he.?o", txt)
print(x)
#This time we got no match, because there were not zero, not one, but two characters between "he" and the "o"

Output:
[]

Example
import re
txt = "The rain in Spain falls mainly in the plain!"
#Check if the string contains either "falls" or "stays":
x = re.findall("falls|stays", txt)
print(x)
if x:
    print("Yes, there is at least one match!")
else:
    print("No match")

Output:
['falls']
Yes, there is at least one match!

Python Regex Greedy vs Non-Greedy Quantifiers

The word "quantifier" comes from the Latin quantus, meaning "how much" or "how often".

Definition Greedy Quantifier:


A greedy quantifier such as ?, *, + and {m,n} matches as many characters as possible (longest match). For example, the regex 'a+' matches as many 'a's as possible in the string 'aaaa', even though the substrings 'a', 'aa' and 'aaa' also match the regex 'a+'.

Definition Non-Greedy Quantifier:


A non-greedy quantifier such as ??, *?, +?, and {m,n}? matches as few characters as possible (shortest
possible match). For example, the regex 'a+?' will match as few 'a's as possible in your string 'aaaa'.
Thus, it matches the first character 'a' and is done with it.

A greedy match means that the regex engine (the part of Python that tries to find your pattern in the string) matches as many characters as possible. Although 'a', 'aa' and 'aaa' all match 'a+', the engine keeps consuming characters and returns 'aaaa'.

In other words, the greedy quantifiers give you the longest match from a given position in the string. All default quantifiers ?, *, +, {m} and {m,n} "consume" (match) as many characters as possible while the regex pattern is still satisfied.

>>> import re
>>> re.findall('a?', 'aaaa')
['a', 'a', 'a', 'a', '']
>>> re.findall('a*', 'aaaa')
['aaaa', '']
>>> re.findall('a+', 'aaaa')
['aaaa']
>>> re.findall('a{3}', 'aaaa')
['aaa']
>>> re.findall('a{1,2}', 'aaaa')
['aa', 'aa']

In all cases, a shorter match would also be valid, but because the regex engine is greedy by default, it takes the longest one.

Python Regex Non-Greedy Match


A non-greedy match means that the regex engine matches as few characters as possible while still matching the pattern in the given string.

For example, the regex 'a+?' matches as few 'a's as possible in the string 'aaaa'. Thus, it matches the first character 'a' and is done with it. Then it moves on to the second character (which is also a match), and so on.

The non-greedy quantifiers give you the shortest possible match from a given position in the string.

You can make the default quantifiers ?, *, +, {m} and {m,n} non-greedy by appending a question mark '?' to them: ??, *?, +? and {m,n}?. They "consume" (match) as few characters as possible while the regex pattern is still satisfied.

• Non-Greedy Question Mark Operator (??)


>>> re.findall('a??', 'aaaa')
['', 'a', '', 'a', '', 'a', '', 'a', '']

• Non-Greedy Asterisk Operator (*?)


>>> re.findall('a*?', 'aaaa')
['', 'a', '', 'a', '', 'a', '', 'a', '']

• Non-Greedy Plus Operator (+?)


>>> re.findall('a+?', 'aaaa')
['a', 'a', 'a', 'a']
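A common practical case where greedy and non-greedy quantifiers differ is extracting tags from a string of markup. The HTML snippet below is made up, and regular expressions are only suitable for simple cases like this, not for parsing HTML in general.

```python
import re

html = "<b>bold</b> and <i>italic</i>"
print(re.findall(r"<.*>", html))    # ['<b>bold</b> and <i>italic</i>']  greedy: one long match
print(re.findall(r"<.*?>", html))   # ['<b>', '</b>', '<i>', '</i>']    non-greedy: each tag
```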

Backreferences

You can match a previously captured group later within the same regex using a special metacharacter sequence called a backreference. A backreference refers to a previous part of the match: it specifies that the contents of an earlier capturing group must also be found at the current location in the string.

The following shows the syntax of a backreference inside a pattern:

\N
where N is the group number (1, 2, 3, ...), for example \1 for the first group.

In the replacement string passed to sub(), you can alternatively write \g<N>, where g stands for group (for example \g<1>).

For example, \1 succeeds if the exact contents of group 1 can be found at the current position, and fails otherwise.

Example 1:
The below example we want to find all the duplicated words in the given text.

import re
txt = """
hello hello
how are you
bye bye
"""
p = re.compile("(\\w+) \\1")
print(p.findall(txt))

Output:
['hello', 'bye']
Note: Python's ordinary string literals also use a backslash followed by digits to include arbitrary characters (octal escapes), so the backslash of a backreference must itself be escaped (written \\1) for the regex engine to receive \1. Alternatively, we can use raw strings, in which backslashes are not treated as escapes.

The same example written with a raw string:

import re
txt = """
hello hello
how are you
bye bye
"""
p = re.compile(r"(\w+) \1")
print(p.findall(txt))
Output
['hello', 'bye']

Example 2
Consider a scenario where we want to find all dates with the format dd/mm/yyyy and change them to
yyyy-mm-dd format.

import re
txt = """
today is 23/02/2019.
yesterday was 22/02/2019.
tomorrow is 24/02/2019.
"""
p = re.compile(r"(\d{2})/(\d{2})/(\d{4})")   # regular expression for dd/mm/yyyy
newtxt = p.sub(r"\3-\2-\1", txt)
print(newtxt)

Output
today is 2019-02-23.
yesterday was 2019-02-22.
tomorrow is 2019-02-24.
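The same date conversion can be written with the \g<N> form in the replacement string. The sketch below also uses \g<0>, which stands for the whole match. The \g<N> form is handy when a digit follows a group reference: \g<2>0 means group 2 followed by a literal 0, whereas \20 would be read as a reference to group 20.

```python
import re

txt = "today is 23/02/2019."
p = re.compile(r"(\d{2})/(\d{2})/(\d{4})")

print(p.sub(r"\g<3>-\g<2>-\g<1>", txt))   # today is 2019-02-23.
print(p.sub(r"[\g<0>]", txt))             # today is [23/02/2019].  (\g<0> is the whole match)
```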

Note: Any time you use a numbered backreference in a Python regex, it's a good idea to write the pattern as a raw string. Otherwise, Python may interpret the backreference as an octal escape in the string literal.

Example:
import re
print(re.search("([a-z])#\1", "d#d"))

Output
None

The regex ([a-z])#\1 matches a lowercase letter, followed by '#', followed by the same lowercase letter. The string in this case is 'd#d', which should match. But the match fails because, in an ordinary string literal, Python interprets \1 as the character whose octal value is 1 instead of passing a backreference to the regex engine. Escaping the backslash fixes this:

import re
print(re.search("([a-z])#\\1", "d#d"))

Output:
<re.Match object; span=(0, 3), match='d#d'>

You'll also achieve the correct match if you specify the regex as a raw string:
import re
print(re.search(r'([a-z])#\1', 'd#d'))

Output
<re.Match object; span=(0, 3), match='d#d'>
