Regular Expressions in Python

The document provides an overview of Regular Expressions (Regex) in Python, explaining their importance in text processing, validation, and data extraction. It details key functions in the re module, such as re.match, re.search, and re.findall, along with the benefits of using re.compile for efficiency. Additionally, it covers common regex patterns, practical applications, and advanced techniques like groups and alternation.

Uploaded by Praghya


Introduction
Regular Expressions (Regex) are patterns used to match character combinations in strings.
They are an essential tool for processing text and data across various domains. In web
development, regex is used to validate user inputs like email addresses or passwords. Data
scientists and analysts use regex to clean, transform, and extract meaningful patterns from
raw datasets. Similarly, regex plays a critical role in parsing log files, identifying errors in
large datasets, and extracting specific information from documents or web pages. Its
versatility makes it a foundational skill for software engineers, data professionals, and system
administrators alike. Regular expressions are widely used for:

• Text validation (e.g., email validation)
• Searching within text
• Text manipulation (e.g., replacing patterns)
• Parsing complex datasets (e.g., logs, HTML, or CSV files)

Python provides the re module to work with regular expressions. This document explains
regex concepts with examples and outputs to help beginners understand and apply regex
effectively.

Basics of Regular Expressions

Raw String Literals in Python (r prefix)

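• Raw strings in Python (e.g., r"\d") treat backslashes literally, which keeps regex
patterns readable. Without the r prefix, each backslash must be doubled (e.g., "\\d"),
because Python otherwise processes escape sequences such as \b (backspace) before the
regex engine ever sees the pattern.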
• Example:

import re
pattern = r"\d"
string = "abc123"
result = re.search(pattern, string)
print(result.group()) # Output: 1
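To see why the r prefix matters, the sketch below compares a raw and a regular string for the same patterns. In a regular string, "\b" is the backspace character, so a word-boundary pattern written without r silently matches nothing. The sample sentence is only an illustration.

```python
import re

# r"\d" and "\\d" are the same two characters: a backslash followed by 'd'
assert r"\d" == "\\d"

# In a normal string "\b" is one backspace character; in a raw string it is two characters
print(len("\b"), len(r"\b"))  # 1 2

text = "a cat sat"
print(re.findall(r"\bcat\b", text))  # ['cat']  (\b is a word boundary)
print(re.findall("\bcat\b", text))   # []       (pattern contains backspaces instead)
```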

Key Functions in the re Module

1. re.match()

Matches a pattern at the start of the string.

import re
result = re.match(r'\d+', '123abc')
if result:
    print(result.group())  # Output: 123

2. re.search()

Searches for the first occurrence of a pattern anywhere in the string.

result = re.search(r'\d+', 'abc123def')
if result:
    print(result.group())  # Output: 123
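The difference between re.match() and re.search() shows up when the digits are not at the start of the string. A small comparison on the same input used above:

```python
import re

text = "abc123def"

# re.match only tries position 0, where 'a' is not a digit, so it returns None
print(re.match(r'\d+', text))           # None

# re.search scans the whole string and finds the first run of digits
print(re.search(r'\d+', text).group())  # 123
```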

3. re.findall()

Returns all occurrences of a pattern as a list.

result = re.findall(r'\d+', 'abc123def456')

print(result) # Output: ['123', '456']

4. re.split()

Splits a string by occurrences of a pattern.

result = re.split(r'\d+', 'abc123def456')

print(result) # Output: ['abc', 'def', '']

5. re.sub()

Replaces occurrences of a pattern with a replacement string.

result = re.sub(r'\s+', '-', 'This is a test')

print(result)  # Output: This-is-a-test

6. re.compile()

Creates a reusable regex pattern object for efficiency.

pattern = re.compile(r'\d+')
result = pattern.findall('123abc456')
print(result) # Output: ['123', '456']

What does re.compile do?

When you use re.compile, it "prepares" (or compiles) your regular expression into a reusable
pattern object. This object can then be used multiple times for different operations, such as
finding matches or replacing text, without re-interpreting the pattern each time.
The module-level functions such as re.findall or re.search also compile the pattern, but the re
module keeps recently used patterns in an internal cache, so a pattern is not recompiled on
every call. Calling re.compile yourself skips even the cache lookup, which helps most when the
same pattern is used many times in your code.
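Key Benefits of re.compile:
1. Performance: The pattern is compiled once and reused. Because re already caches
recent patterns, the gain over the module-level functions is often small, but it can add
up in tight loops.
2. Readability: The pattern is defined once, can be given a descriptive name, and is
reused in a clear, structured way.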

Example Without re.compile:

Imagine you need to find and replace all numbers in multiple strings. Without re.compile,
you'll repeatedly pass the pattern to the re functions:
import re

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Find all numbers in each string
    matches = re.findall(r'\d+', s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']
Here, r'\d+' is passed to re.findall on every iteration. The re module compiles it on the
first call and retrieves it from its internal cache on later calls.

Example With re.compile:

If you use re.compile, the pattern is prepared once and reused for each string:
import re

# Compile the pattern once

pattern = re.compile(r'\d+')

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Use the compiled pattern to find numbers
    matches = pattern.findall(s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']

What’s the Difference?

• Without re.compile: The pattern is passed to re.findall on every call; it is compiled once
and then fetched from the module's cache each time.
• With re.compile: The pattern is compiled once, stored in a variable, and reused directly
for all operations.
If you use a pattern only once, re.compile makes no noticeable difference. If the pattern is
reused many times (e.g., in a loop or across different parts of your code), a compiled pattern
avoids the repeated cache lookups and makes your code more readable by giving the pattern a name.

Summary:
• re.compile is useful when you use the same pattern repeatedly.
• It compiles the pattern once and gives you a pattern object whose methods (search, match,
findall, split, sub) cover all common regex operations.
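A single compiled pattern object offers the same operations as the module-level functions, as methods. The sketch below reuses the digit pattern on a made-up sentence to show several of them on one object:

```python
import re

digits = re.compile(r'\d+')
text = "order 42 shipped in 3 boxes"

print(digits.search(text).group())  # 42
print(digits.findall(text))         # ['42', '3']
print(digits.split(text))           # ['order ', ' shipped in ', ' boxes']
print(digits.sub('#', text))        # order # shipped in # boxes
```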

Common Regex Patterns and Characters

Table 1: Basic Characters and Character Classes

For the examples in this table and in the Quantifiers table, "Matches" means the pattern matches
the entire example string (as with re.fullmatch()); "Fails" means it does not.

| Character/Pattern | Description | Example | Matches / Fails to Match (entire string) |
|---|---|---|---|
| . | Matches any character except a newline. | "a.b" | Matches: "acb", "a1b"; Fails: "ab", "a\nb" |
| \w | Matches any word character: a letter, digit, or underscore ([a-zA-Z0-9_] for ASCII text). | "\w+" | Matches: "hello", "Python_123"; Fails: "hello!", "123$" |
| \W | Matches any non-word character. | "\W+" | Matches: "!!!", "#@$%"; Fails: "abc123", "hello" |
| \d | Matches any digit ([0-9] for ASCII text). | "\d{3}" | Matches: "123", "456"; Fails: "12", "abc" |
| \D | Matches any non-digit character. | "\D+" | Matches: "hello", "abc!"; Fails: "1234", "567" |
| \s | Matches any whitespace character (space, tab, newline). | "\s+" | Matches: " ", "\t"; Fails: "abc", "123" |
| \S | Matches any non-whitespace character. | "\S+" | Matches: "hello123", "abc!"; Fails: " ", "\n" |
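The rows of Table 1 can be checked with re.fullmatch(), which succeeds only when the pattern covers the entire string. A sketch using several of the table's own examples:

```python
import re

# (pattern, string) pairs taken from Table 1; True means the whole string matches
checks = [
    (r"a.b", "acb"),         # '.' matches 'c'
    (r"a.b", "a\nb"),        # '.' does not match a newline
    (r"\w+", "Python_123"),  # letters, digits and underscore
    (r"\w+", "hello!"),      # '!' is not a word character
    (r"\W+", "#@$%"),        # only non-word characters
    (r"\d{3}", "12"),        # only two digits
    (r"\D+", "abc!"),        # no digits at all
    (r"\s+", "\t"),          # a tab is whitespace
    (r"\S+", "abc!"),        # no whitespace at all
]
for pattern, s in checks:
    print(repr(pattern), repr(s), re.fullmatch(pattern, s) is not None)
```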

Anchors and Special Characters

| Character/Pattern | Description | Example | Matches / Fails to Match |
|---|---|---|---|
| ^ | Anchors the pattern to the start of the string. | "^hello" | Matches: "hello world"; Fails: "world hello", "abc hello" |
| $ | Anchors the pattern to the end of the string. | "world$" | Matches: "hello world"; Fails: "world hello", "hello" |
| \b | Matches a word boundary. | "\bcat\b" | Matches: "cat", "a cat"; Fails: "catalog", "scattered" |

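Anchors and word boundaries match positions rather than characters, so they are usually combined with re.search(), which looks anywhere in the string. A sketch using the examples from the anchors table:

```python
import re

print(bool(re.search(r"^hello", "hello world")))  # True
print(bool(re.search(r"^hello", "world hello")))  # False: 'hello' is not at the start
print(bool(re.search(r"world$", "hello world")))  # True
print(bool(re.search(r"world$", "world hello")))  # False: 'world' is not at the end
print(bool(re.search(r"\bcat\b", "a cat")))       # True
print(bool(re.search(r"\bcat\b", "catalog")))     # False: 'cat' is followed by a word character
```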
Quantifiers

| Character/Pattern | Description | Example | Matches / Fails to Match (entire string) |
|---|---|---|---|
| * | Matches zero or more repetitions of the preceding element. | "a*b" | Matches: "b", "ab", "aaab"; Fails: "cab", "c" |
| + | Matches one or more repetitions of the preceding element. | "a+b" | Matches: "ab", "aaab"; Fails: "b", "c" |
| {n} | Matches exactly n repetitions. | "a{3}" | Matches: "aaa"; Fails: "aa", "aaaa" |
| {n,} | Matches at least n repetitions. | "a{2,}" | Matches: "aa", "aaa"; Fails: "a" |
| {n,m} | Matches between n and m repetitions (inclusive). | "a{1,3}" | Matches: "a", "aa", "aaa"; Fails: "aaaa" |
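The quantifier rows can be verified the same way. The helper name whole() below is purely illustrative; it uses re.fullmatch() so that "aaaa" is rejected by a{3}, whereas re.search() would find "aaa" inside it:

```python
import re

def whole(pattern, s):
    """Return True if the pattern matches the entire string (hypothetical helper)."""
    return re.fullmatch(pattern, s) is not None

print(whole(r"a*b", "b"), whole(r"a*b", "aaab"), whole(r"a*b", "cab"))  # True True False
print(whole(r"a+b", "ab"), whole(r"a+b", "b"))                          # True False
print(whole(r"a{3}", "aaa"), whole(r"a{3}", "aaaa"))                    # True False
print(whole(r"a{2,}", "aaa"), whole(r"a{2,}", "a"))                     # True False
print(whole(r"a{1,3}", "aa"), whole(r"a{1,3}", "aaaa"))                 # True False

# Partial matching behaves differently: search finds 'aaa' inside 'aaaa'
print(re.search(r"a{3}", "aaaa").group())                               # aaa
```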

Examples of Practical Applications

Email Validation
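A minimal email check, assuming an address is a local part made of letters, digits and ._%+-, then @, a domain, a dot, and a top-level domain of at least two letters. This is a simplified sketch, not a complete RFC 5322 validator; the names EMAIL_RE and is_valid_email are illustrative.

```python
import re

# Simplified pattern (assumption): local@domain.tld
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(address):
    # fullmatch ensures the whole string is an address, not just a part of it
    return EMAIL_RE.fullmatch(address) is not None

for candidate in ["user@example.com", "first.last+tag@mail.co.in", "no-at-sign.com", "user@domain"]:
    print(candidate, is_valid_email(candidate))
```

Using fullmatch() instead of search() is the key design choice: search() would accept a string like "junk user@example.com junk".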

Extracting Data

Example 1: Extract Numbers
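A sketch for pulling numbers out of free text, assuming numbers may be whole or have a decimal part; the sample sentence is made up. The (?:...) group is non-capturing, so re.findall() still returns the full matched number.

```python
import re

text = "The invoice lists 3 items at 19.99 each, total 59.97"
# \d+ for the integer part, optional (?:\.\d+) for a decimal fraction
numbers = re.findall(r"\d+(?:\.\d+)?", text)
print(numbers)                      # ['3', '19.99', '59.97']
print([float(n) for n in numbers])  # [3.0, 19.99, 59.97]
```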

Example 2: Extract Email Addresses
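With a simplified address pattern (an assumption, not a full email grammar), re.findall() collects every address in a block of text. The addresses below are invented examples.

```python
import re

text = "Contact alice@example.com or bob.smith@mail.example.org for details."
emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", text)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```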

Example 3: Validate Mobile Numbers
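Mobile number formats differ by country, so the following sketch assumes one specific format: an optional +91 country code (with an optional space or hyphen), followed by 10 digits starting with 6, 7, 8 or 9. Adjust the pattern for other formats; MOBILE_RE is an illustrative name.

```python
import re

# Assumed format: optional "+91" prefix, then 10 digits starting with 6-9
MOBILE_RE = re.compile(r"(?:\+91[- ]?)?[6-9]\d{9}")

numbers = ["9876543210", "+91 9876543210", "+91-8123456789", "12345", "5876543210"]
for n in numbers:
    print(n, MOBILE_RE.fullmatch(n) is not None)
```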

Example 4: Extract Hours from Timestamps
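A sketch assuming log lines that contain 24-hour timestamps in HH:MM:SS form; the log text is invented. The capturing group keeps only the hour, so re.findall() returns just that part.

```python
import re

log = """2024-05-01 08:15:42 INFO started
2024-05-01 13:02:07 WARN disk almost full
2024-05-01 23:59:59 INFO stopped"""

# Capture the two digits in front of ':MM:SS'
hours = re.findall(r"(\d{2}):\d{2}:\d{2}", log)
print(hours)  # ['08', '13', '23']
```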

Example 5: Extract Specific Data from Text
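As one possible illustration, the sketch below assumes a hypothetical record format "Name: <name>, Age: <number>" and uses two capturing groups to pull out each field pair:

```python
import re

# Hypothetical record format: "Name: <name>, Age: <number>"
text = "Name: Asha, Age: 29; Name: Rahul, Age: 34"
records = re.findall(r"Name: (\w+), Age: (\d+)", text)
print(records)  # [('Asha', '29'), ('Rahul', '34')]

# Captured text is always a string; convert where needed
ages = [int(age) for _, age in records]
print(ages)  # [29, 34]
```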

Using Regex Flags

Multi-line Matching
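With the re.MULTILINE flag (short form re.M), ^ and $ match at the start and end of every line instead of only at the start and end of the whole string. A sketch on a made-up three-line string:

```python
import re

text = "error: disk full\ninfo: ok\nerror: timeout"

# Without the flag, ^ only matches at position 0 and $ only at the very end
print(re.findall(r"^error: (.+)$", text))                # []

# With re.MULTILINE, each line is checked on its own
print(re.findall(r"^error: (.+)$", text, re.MULTILINE))  # ['disk full', 'timeout']
```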
Case-Insensitive Matching
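The re.IGNORECASE flag (short form re.I) makes letters match regardless of case. Flags can be combined with |, for example re.I | re.M. A short sketch:

```python
import re

text = "Python, PYTHON and python"
print(re.findall(r"python", text))                 # ['python']
print(re.findall(r"python", text, re.IGNORECASE))  # ['Python', 'PYTHON', 'python']
```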

Advanced Regex Techniques

Groups and Alternation

Here are explanations and examples for each of the following regex components:

1. | (Either or):

The pipe | is used to match either of two or more options.

Example:

• Explanation: The pattern matches either "falls" or "stays" in the input text.
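A sketch of alternation, using an invented sentence that contains both words from the explanation:

```python
import re

text = "The rain falls while the sun stays."  # illustrative input
print(re.findall(r"falls|stays", text))  # ['falls', 'stays']
```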

2. () (Capture and group):

Parentheses are used to group parts of a pattern and capture them as separate groups.

Example:
• Explanation:
o (rain|sun) captures "rain" or "sun".
o (falls|stays) captures "falls" or "stays".
o The result is a list of tuples containing the captured groups.
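A sketch of two capturing groups on the same invented sentence. Because the pattern has two groups, re.findall() returns one (group 1, group 2) tuple per match:

```python
import re

text = "The rain falls while the sun stays."  # illustrative input
print(re.findall(r"(rain|sun) (falls|stays)", text))  # [('rain', 'falls'), ('sun', 'stays')]
```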

3. [] (Set of characters):

Square brackets [] define a set of characters to match.

Example:

• Explanation: The pattern [a-z] matches any lowercase letter from 'a' to 'z'. Each match
is returned as a separate element in the list.
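A sketch of a character set on a short invented string; uppercase letters and punctuation are not in [a-z], so they are skipped:

```python
import re

text = "Hi, Bob!"
print(re.findall(r"[a-z]", text))  # ['i', 'o', 'b']
```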

4. \ (Special sequence):

The backslash \ is used to escape special characters or represent special sequences.

Example:
• Explanation: The pattern \d matches any digit (0–9). Here, it finds all the digits in the
input text.
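A sketch of the \d special sequence on an invented string. On its own, \d returns single digits; adding + groups consecutive digits into whole numbers:

```python
import re

text = "Room 4B, floor 12"
print(re.findall(r"\d", text))   # ['4', '1', '2']
print(re.findall(r"\d+", text))  # ['4', '12']
```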

Another Example (Escaping Special Characters):

• Explanation: The backslash \ escapes the special meaning of |, treating it as a literal
character to match.
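A sketch of escaping the pipe on an invented pipe-separated string. re.escape() is a standard helper that adds the backslashes for you when the text to match comes from a variable:

```python
import re

text = "red|green|blue"
# \| is a literal pipe character, not alternation
print(re.findall(r"\|", text))  # ['|', '|']
print(re.split(r"\|", text))    # ['red', 'green', 'blue']

# re.escape inserts the backslash automatically
print(re.escape("a|b"))         # a\|b
```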

The .group() method in Python regular expressions is used to extract the part of the string
that matches the pattern or a specific group within the match.

Syntax of .group()

match.group([group_number])

• group_number (optional):
o If not specified (i.e., .group()), it returns the entire match.
o group(0): Returns the entire match (same as .group()).
o group(n): Returns the text matched by the n-th capturing group (inside
parentheses).

Example 1: Using .group() to Return the Entire Match

• Explanation:
o The pattern \d+ matches one or more digits.
o .search() finds the first match ("123").
o .group() returns the entire match.
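A sketch of this example, assuming the input sentence "I have 123 apples" (chosen to fit the values mentioned in these examples):

```python
import re

text = "I have 123 apples"  # illustrative input
match = re.search(r"\d+", text)
if match:
    print(match.group())   # 123
    print(match.group(0))  # 123 (same as .group())
```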

Example 2: Using .group(n) to Access Capturing Groups

• Explanation:
o The parentheses () create a capturing group for the digits (\d+).
o .group(1) returns the content of the first capturing group (the digits "123").

Example 3: Multiple Capturing Groups

• Explanation:
o (123) is captured in group 1.
o (apples) is captured in group 2.
o .group(0) always returns the full match.

Example 4: Named Groups

You can assign names to groups and access them with .group('name').

• Explanation:
o (?P<number>\d+) names the first group "number".
o (?P<item>\w+) names the second group "item".
o .group('name') retrieves the content of the named group.
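A sketch of the named groups described above, on the same assumed input. .groupdict() returns every named group as a dictionary:

```python
import re

text = "I have 123 apples"  # illustrative input
match = re.search(r"(?P<number>\d+) (?P<item>\w+)", text)
if match:
    print(match.group('number'))  # 123
    print(match.group('item'))    # apples
    print(match.groupdict())      # {'number': '123', 'item': 'apples'}
```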

What Happens If There’s No Match?

If there’s no match, re.search() and re.match() return None, and calling .group() on None
raises an AttributeError. To avoid this, always check that the match is not None before
calling .group().
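A sketch of the safe pattern: test the result before calling .group(). The input string is invented so that no match exists.

```python
import re

match = re.search(r"\d+", "no digits here")
print(match)  # None

# Calling match.group() here would raise AttributeError, so check first
if match is not None:
    print(match.group())
else:
    print("No match found")
```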

Summary:

• .group() returns the entire match.
• .group(n) returns the n-th capturing group.
• Named groups allow you to retrieve specific parts of the match using names.
