Regular Expressions in Python

The document provides an overview of Regular Expressions (Regex) in Python, explaining their importance in text processing, validation, and data extraction. It details key functions in the re module, such as re.match, re.search, and re.findall, along with the benefits of using re.compile for efficiency. Additionally, it covers common regex patterns, practical applications, and advanced techniques like groups and alternation.

Uploaded by

Praghya

Regular Expressions in Python

Introduction
Regular Expressions (Regex) are patterns used to match character combinations in strings.
They are an essential tool for processing text and data across various domains. In web
development, regex is used to validate user inputs like email addresses or passwords. Data
scientists and analysts use regex to clean, transform, and extract meaningful patterns from
raw datasets. Similarly, regex plays a critical role in parsing log files, identifying errors in
large datasets, and extracting specific information from documents or web pages. Its
versatility makes it a foundational skill for software engineers, data professionals, and system
administrators alike. Regular expressions are widely used for:

• Text validation (e.g., email validation)
• Searching within text
• Text manipulation (e.g., replacing patterns)
• Parsing complex datasets (e.g., logs, HTML, or CSV files)

Python provides the re module to work with regular expressions. This document explains
regex concepts with examples and outputs to help beginners understand and apply regex
effectively.

Basics of Regular Expressions


Raw String Literals in Python (r prefix)

• Raw strings in Python (e.g., r"\d") treat backslashes literally, which simplifies regex
patterns. Without the r prefix, each backslash must be doubled (e.g., "\\d").
• Example:

import re
pattern = r"\d"
string = "abc123"
result = re.search(pattern, string)
print(result.group()) # Output: 1

Key Functions in the re Module


1. re.match()

Matches a pattern at the start of the string.

import re
result = re.match(r'\d+', '123abc')
if result:
    print(result.group()) # Output: 123

2. re.search()

Searches for the first occurrence of a pattern anywhere in the string.

result = re.search(r'\d+', 'abc123def')
if result:
    print(result.group()) # Output: 123

3. re.findall()

Returns all occurrences of a pattern as a list.

result = re.findall(r'\d+', 'abc123def456')
print(result) # Output: ['123', '456']

4. re.split()

Splits a string by occurrences of a pattern.

result = re.split(r'\d+', 'abc123def456')
print(result) # Output: ['abc', 'def', '']

5. re.sub()

Replaces occurrences of a pattern with a replacement string.

result = re.sub(r'\s+', '-', 'This is a test')
print(result) # Output: This-is-a-test

6. re.compile()

Creates a reusable regex pattern object for efficiency.

pattern = re.compile(r'\d+')
result = pattern.findall('123abc456')
print(result) # Output: ['123', '456']

What does re.compile do?


When you use re.compile, it "prepares" (or compiles) your regular expression into a reusable
pattern object. This object can then be used multiple times for different operations like finding
matches, replacing text, etc.
Without re.compile, module-level functions like re.findall or re.search receive the pattern
string on every call. Python keeps recently compiled patterns in an internal cache, so the
pattern is not fully recompiled each time, but every call still pays for a cache lookup. A
precompiled object skips that lookup, which helps slightly when the same pattern is used many
times in your code.
Key Benefits of re.compile:
1. Improved performance: The pattern object is created once and reused, avoiding the
per-call lookup. The gain is usually small.
2. Better readability: The pattern is defined once, given a name, and reused in a clear,
structured way.

Example Without re.compile:


Imagine you need to find all numbers in multiple strings. Without re.compile, you pass the
pattern string to the re function on every call:
import re

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Find all numbers in each string
    matches = re.findall(r'\d+', s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']
Here, the pattern r'\d+' is passed to re.findall on every call. Python keeps recently compiled
patterns in an internal cache, so each call costs a cache lookup rather than a full recompilation.

Example With re.compile:


If you use re.compile, the pattern is compiled once and the resulting object is reused for each string:
import re

# Compile the pattern once
pattern = re.compile(r'\d+')

strings = ["abc123", "456def", "ghi789"]

for s in strings:
    # Use the compiled pattern to find numbers
    matches = pattern.findall(s)
    print(f"Numbers in '{s}': {matches}")
Output:
Numbers in 'abc123': ['123']
Numbers in '456def': ['456']
Numbers in 'ghi789': ['789']

What’s the Difference?


• Without re.compile: The pattern string is passed on every call, and re.findall looks up its
compiled form in Python's internal cache each time.
• With re.compile: The pattern is compiled once and the resulting object is reused directly
for all operations.
If you use the pattern only once, re.compile makes no noticeable difference. If the pattern is
reused many times (e.g., in a loop or across different parts of your code), re.compile avoids
the repeated lookup, which gives a small speedup, and makes your code more readable.

Summary:
• re.compile is useful when you use the same pattern repeatedly.
• It saves time by compiling the pattern once and allows you to use the compiled object
for all regex operations.

Common Regex Patterns and Characters


Table 1: Basic Characters and Character Classes

| Character/Pattern | Description | Example | Matches / Fails to Match |
|---|---|---|---|
| . | Matches any character except a newline. | "a.b" | Matches: "acb", "a1b"; Fails: "ab", "a\nb" |
| \w | Matches any word character: a letter, digit, or underscore ([a-zA-Z0-9_]). | "\w+" | Matches: "hello", "Python_123"; Fails: "hello!", "123$" |
| \W | Matches any non-word character. | "\W+" | Matches: "!!!", "#@$%"; Fails: "abc123", "hello" |
| \d | Matches any digit ([0-9]). | "\d{3}" | Matches: "123", "456"; Fails: "12", "abc" |
| \D | Matches any non-digit character. | "\D+" | Matches: "hello", "abc!"; Fails: "1234", "567" |
| \s | Matches any whitespace character (space, tab, newline). | "\s+" | Matches: " ", "\t"; Fails: "abc", "123" |
| \S | Matches any non-whitespace character. | "\S+" | Matches: "hello123", "abc!"; Fails: " ", "\n" |

In this table, "Fails" means the pattern does not match the entire string. A search with
re.search can still find a match inside a string such as "hello!".

Anchors and Special Characters

| Character/Pattern | Description | Example | Matches / Fails to Match |
|---|---|---|---|
| ^ | Anchors the pattern to the start of the string. | "^hello" | Matches: "hello world"; Fails: "world hello", "abc hello" |
| $ | Anchors the pattern to the end of the string. | "world$" | Matches: "hello world"; Fails: "world hello", "hello" |
| \b | Matches a word boundary. | "\bcat\b" | Matches: "cat", "a cat"; Fails: "catalog", "scattered" |

Quantifiers

| Character/Pattern | Description | Example | Matches / Fails to Match |
|---|---|---|---|
| * | Matches zero or more repetitions of the preceding element. | "a*b" | Matches: "b", "ab", "aaab"; Fails: "cab", "c" |
| + | Matches one or more repetitions of the preceding element. | "a+b" | Matches: "ab", "aaab"; Fails: "b", "c" |
| {n} | Matches exactly n repetitions. | "a{3}" | Matches: "aaa"; Fails: "aa", "aaaa" |
| {n,} | Matches at least n repetitions. | "a{2,}" | Matches: "aa", "aaa"; Fails: "a" |
| {n,m} | Matches between n and m repetitions. | "a{1,3}" | Matches: "a", "aa", "aaa"; Fails: "aaaa" |

In this table, "Fails" means the pattern does not match the entire string. A search with
re.search can still find a match inside a string such as "aaaa" or "cab".

Examples of Practical Applications


Email Validation
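
The original example for this heading is not available here. The following is a minimal sketch of email validation, assuming a deliberately simplified pattern (it is not a complete validator for every legal address); the names EMAIL_PATTERN and is_valid_email are illustrative.

```python
import re

# Simplified pattern (an assumption): local part, "@", domain,
# and a top-level domain of at least two letters.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_valid_email(address):
    """Return True if the whole string looks like an email address."""
    return EMAIL_PATTERN.fullmatch(address) is not None

print(is_valid_email("user@example.com"))  # True
print(is_valid_email("user@example"))      # False (no top-level domain)
print(is_valid_email("user example.com"))  # False (no "@")
```

Using fullmatch rather than search ensures that the entire input is an address, not just part of it.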

Extracting Data

Example 1: Extract Numbers
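
A minimal sketch of number extraction with re.findall, assuming a made-up sample sentence; the pattern \d+ is the same one used earlier in this document.

```python
import re

text = "Order 66 shipped 3 items for 1500 rupees"  # assumed sample input

# \d+ matches each run of one or more digits
numbers = re.findall(r"\d+", text)
print(numbers)  # ['66', '3', '1500']

# findall returns strings; convert them if numeric values are needed
as_ints = [int(n) for n in numbers]
print(as_ints)  # [66, 3, 1500]
```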

Example 2: Extract Email Addresses
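
A minimal sketch of extracting email addresses from free text, assuming a simplified address pattern and a made-up sample sentence. Real-world addresses can be more varied than this pattern allows.

```python
import re

# Assumed sample input
text = "Contact alice@example.com or bob.smith@mail.example.org for details."

# Simplified pattern (an assumption): local part, "@", domain, top-level domain
emails = re.findall(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}", text)
print(emails)  # ['alice@example.com', 'bob.smith@mail.example.org']
```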

Example 3: Validate Mobile Numbers
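
The number format intended by the original example is not known. The sketch below assumes 10-digit mobile numbers whose first digit is 6 to 9; adjust the pattern for other formats. The names MOBILE_PATTERN and is_valid_mobile are illustrative.

```python
import re

# Assumption: exactly 10 digits, the first one in the range 6-9
MOBILE_PATTERN = re.compile(r"[6-9]\d{9}")

def is_valid_mobile(number):
    """Return True if the whole string is a 10-digit mobile number."""
    return MOBILE_PATTERN.fullmatch(number) is not None

print(is_valid_mobile("9876543210"))  # True
print(is_valid_mobile("1234567890"))  # False (starts with 1)
print(is_valid_mobile("98765"))       # False (too short)
```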


Example 4: Extract Hours from Timestamps
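
A minimal sketch of pulling the hour out of timestamps, assuming they are written as HH:MM:SS; the sample log line is made up.

```python
import re

log = "Login at 09:15:32, logout at 17:45:10"  # assumed sample input

# Only the hour is inside parentheses, so findall returns just that group
hours = re.findall(r"\b(\d{2}):\d{2}:\d{2}\b", log)
print(hours)  # ['09', '17']
```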

Example 5: Extract Specific Data from Text
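
What data the original example extracted is not known. As a sketch, assume a record written as "Name: ..., Age: ..." and use capturing groups to pull out the two fields.

```python
import re

record = "Name: Asha, Age: 29, City: Pune"  # assumed sample input

match = re.search(r"Name: (\w+), Age: (\d+)", record)
if match:
    name = match.group(1)      # text captured by the first group
    age = int(match.group(2))  # text captured by the second group
    print(name, age)  # Asha 29
```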

Using Regex Flags

Multi-line Matching
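
A minimal sketch of the re.MULTILINE flag, assuming a made-up three-line string. By default ^ matches only at the start of the string; with re.MULTILINE it also matches at the start of each line.

```python
import re

text = "first line\nsecond line\nthird line"  # assumed sample input

# Without the flag, ^ anchors only to the start of the whole string
default_result = re.findall(r"^\w+", text)
print(default_result)  # ['first']

# With re.MULTILINE, ^ anchors to the start of every line
multiline_result = re.findall(r"^\w+", text, re.MULTILINE)
print(multiline_result)  # ['first', 'second', 'third']
```
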
Case-Insensitive Matching
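
A minimal sketch of the re.IGNORECASE flag, assuming a made-up sample sentence.

```python
import re

text = "Python is fun. I love PYTHON and python!"  # assumed sample input

# Case-sensitive by default: only the lowercase spelling matches
sensitive = re.findall(r"python", text)
print(sensitive)  # ['python']

# re.IGNORECASE matches regardless of letter case
insensitive = re.findall(r"python", text, re.IGNORECASE)
print(insensitive)  # ['Python', 'PYTHON', 'python']
```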

Advanced Regex Techniques


Groups and Alternation

Here are explanations for each of the following regex components:

1. | (Either or):

The pipe | is used to match either of two or more options.

Example:

• Explanation: The pattern matches either "falls" or "stays" in the input text.
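
The code for this example is not reproduced here. A minimal sketch consistent with the explanation, assuming the input sentence shown below:

```python
import re

text = "The rain in Spain falls mainly in the plain!"  # assumed input text

# "falls|stays" matches either the word "falls" or the word "stays"
result = re.findall(r"falls|stays", text)
print(result)  # ['falls']
```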

2. () (Capture and group):

Parentheses are used to group parts of a pattern and capture them as separate groups.

Example:
• Explanation:
o (rain|sun) captures "rain" or "sun".
o (falls|stays) captures "falls" or "stays".
o The result is a list of tuples containing the captured groups.
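
A minimal sketch of the two capturing groups described above, assuming the input sentence and the ".*" that joins the groups (neither is confirmed by the source):

```python
import re

text = "The rain in Spain falls mainly in the plain!"  # assumed input text

# Two capturing groups; ".*" between them is an assumption
result = re.findall(r"(rain|sun).*(falls|stays)", text)
print(result)  # [('rain', 'falls')]
```

When a pattern has more than one group, re.findall returns one tuple per match, with one element per group.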

3. [] (Set of characters):

Square brackets [] define a set of characters to match.

Example:

• Explanation: The pattern [a-z] matches any lowercase letter from 'a' to 'z'. Each match
is returned as a separate element in the list.
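
A minimal sketch of a character set, assuming a short made-up input:

```python
import re

text = "Hi Bob"  # assumed input text

# [a-z] matches one lowercase letter at a time
result = re.findall(r"[a-z]", text)
print(result)  # ['i', 'o', 'b']
```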

4. \ (Special sequence):

The backslash \ is used to escape special characters or represent special sequences.

Example:
• Explanation: The pattern \d matches any digit (0–9). Here, it finds all the digits in the
input text.
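
A minimal sketch of the \d special sequence, assuming a made-up input:

```python
import re

text = "Room 42, floor 7"  # assumed input text

# \d matches a single digit, so each digit is returned separately
result = re.findall(r"\d", text)
print(result)  # ['4', '2', '7']
```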

Another Example (Escaping Special Characters):

• Explanation: The backslash \ escapes the special meaning of |, treating it as a literal
character to match.
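
A minimal sketch of escaping the pipe, assuming a made-up input in which | is used as a separator:

```python
import re

text = "rain|sun|snow"  # assumed input text

# \| matches a literal pipe instead of acting as "either or"
pipes = re.findall(r"\|", text)
print(pipes)  # ['|', '|']

# The same escaped pattern can be used to split on the literal pipe
parts = re.split(r"\|", text)
print(parts)  # ['rain', 'sun', 'snow']
```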

The .group() method in Python regular expressions is used to extract the part of the string
that matches the pattern or a specific group within the match.

Syntax of .group()

match.group([group_number])

• group_number (optional):
o If not specified (i.e., .group()), it returns the entire match.
o group(0): Returns the entire match (same as .group()).
o group(n): Returns the text matched by the n-th capturing group (inside
parentheses).

Example 1: Using .group() to Return the Entire Match


• Explanation:
o The pattern \d+ matches one or more digits.
o .search() finds the first match ("123").
o .group() returns the entire match.
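
A minimal sketch matching this explanation, assuming the sample input below:

```python
import re

text = "abc123def456"  # assumed sample input

match = re.search(r"\d+", text)  # finds the first run of digits
if match:
    whole = match.group()  # same as match.group(0)
    print(whole)  # 123
```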

Example 2: Using .group(n) to Access Capturing Groups

• Explanation:
o The parentheses () create a capturing group for the digits (\d+).
o .group(1) returns the content of the first capturing group (the digits "123").
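
A minimal sketch matching this explanation, assuming the sample input and the literal text around the group:

```python
import re

text = "Order number: 123"  # assumed sample input

match = re.search(r"Order number: (\d+)", text)
if match:
    full_match = match.group(0)   # the entire match
    digits = match.group(1)       # only the first capturing group
    print(full_match)  # Order number: 123
    print(digits)      # 123
```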

Example 3: Multiple Capturing Groups


• Explanation:
o (123) is captured in group 1.
o (apples) is captured in group 2.
o .group(0) always returns the full match.
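
A minimal sketch with two capturing groups, assuming the sample sentence and pattern below (chosen so that group 1 is "123" and group 2 is "apples"):

```python
import re

text = "I have 123 apples"  # assumed sample input

match = re.search(r"(\d+) (\w+)", text)
if match:
    print(match.group(0))  # 123 apples
    print(match.group(1))  # 123
    print(match.group(2))  # apples
    print(match.groups())  # ('123', 'apples')
```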

Example 4: Named Groups

You can assign names to groups and access them with .group('name').

• Explanation:
o (?P<number>\d+) names the first group "number".
o (?P<item>\w+) names the second group "item".
o .group('name') retrieves the content of the named group.
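
A minimal sketch of the named groups described above, assuming the sample sentence and the single space between the two groups:

```python
import re

text = "I have 123 apples"  # assumed sample input

match = re.search(r"(?P<number>\d+) (?P<item>\w+)", text)
if match:
    print(match.group('number'))  # 123
    print(match.group('item'))    # apples
    print(match.groupdict())      # {'number': '123', 'item': 'apples'}
```

Named groups can still be accessed by position, so match.group(1) returns the same text as match.group('number').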

What Happens If There’s No Match?


If there’s no match, .group() raises an AttributeError. To avoid this, always check if match is
not None before using .group().
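
A minimal sketch of the check described above, assuming an input that contains no digits. re.search returns None when nothing matches, and calling .group() on None is what raises the AttributeError.

```python
import re

match = re.search(r"\d+", "no digits here")  # assumed sample input
print(match)  # None

# Check for None before calling .group()
if match is not None:
    message = match.group()
else:
    message = "No match found"
print(message)  # No match found
```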

Summary:

• .group() returns the entire match.
• .group(n) returns the n-th capturing group.
• Named groups allow you to retrieve specific parts of the match using names.
