Open In App

Remove URLs from string in Python

Last Updated : 23 Jul, 2025
Comments
Improve
Suggest changes
Like Article
Like
Report

A regular expression (regex) is a sequence of characters that defines a search pattern in text. To remove URLs from a string in Python, you can either use regular expressions (regex) or some external libraries like urllib.parse. The re-module in Python is used for working with regular expressions. In this article, we will see how we can remove URLs from a string in Python.

Python Remove URLs from a String

Below are the ways by which we can remove URLs from a string in Python:

  • Using the re.sub() function
  • Using the re.findall() function
  • Using the re.search() function
  • Using the urllib.parse class

Python Remove URLs from String Using re.sub() function

In this example, the code defines a function 'remove_urls' to find URLs in text and replace them with a placeholder [URL REMOVED], using regular expressions for pattern matching and the re.sub() method for substitution.

Python3
import re
def remove_urls(text, replacement_text="[URL REMOVED]"):
    # Define a regex pattern to match URLs
    url_pattern = re.compile(r'https?://\S+|www\.\S+')

    # Use the sub() method to replace URLs with the specified replacement text
    text_without_urls = url_pattern.sub(replacement_text, text)

    return text_without_urls

# Example:
input_text = "Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/"
output_text = remove_urls(input_text)

print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text)

Output
Original Text:
Visit on GeeksforGeeks Website: https://www.geeksforgeeks.org/

Text with URLs Removed:
Visit on GeeksforGeeks Website: [URL REMOVED]

Remove URLs from String Using re.findall() function

In this example, the Python code defines a function 'remove_urls_findall' that uses regular expressions to find all URLs using re.findall() method in a given text and replaces them with a replacement text "[URL REMOVED]".

Python3
import re
def remove_urls_findall(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')
    urls = url_pattern.findall(text)

    for url in urls:
        text = text.replace(url, replacement_text)

    return text

# Example:
input_text = "Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/"/
output_text_findall = remove_urls_findall(input_text)

print("\nUsing re.findall():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_findall)

Output:

Using re.findall():
Original Text:
Check out the latest Python tutorials on GeeksforGeeks: https://www.geeksforgeeks.org/category/python/
Text with URLs Removed:
Check out the latest Python tutorials on GeeksforGeeks: [URL REMOVED]

Remove URLs from String in Python Using re.search() function

In this example, the Python code defines a function 'remove_urls_search' using regular expressions and re.search() to find and replace URLs in a given text with a replacement text "[URL REMOVED]".

Python3
import re
def remove_urls_search(text, replacement_text="[URL REMOVED]"):
    url_pattern = re.compile(r'https?://\S+|www\.\S+')

    while True:
        match = url_pattern.search(text)
        if not match:
            break
        text = text[:match.start()] + replacement_text + text[match.end():]

    return text

# Example:
input_text = "Visit our website at https://www.geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks"
output_text_search = remove_urls_search(input_text)

print("\nUsing re.search():")
print("Original Text:")
print(input_text)
print("\nText with URLs Removed:")
print(output_text_search)

Output:

Using re.search():
Original Text:
Visit our website at https://www.geeksforgeeks.org/ for more information. Follow us on Twitter: @geeksforgeeks
Text with URLs Removed:
Visit our website at [URL REMOVED] for more information. Follow us on Twitter: @geeksforgeeks

Remove URLs from String Using urllib.parse

In this example, the Python code defines a function 'remove_urls_urllib' that uses urllib.parse to check and replace URLs in a given text with a replacement text "[URL REMOVED]".

Python3
# Using urllib.parse
from urllib.parse import urlparse

def remove_urls_urllib(text, replacement_text="[URL REMOVED]"):
    words = text.split()
    for i, word in enumerate(words):
        parsed_url = urlparse(word)
        if parsed_url.scheme and parsed_url.netloc:
            words[i] = replacement_text
    return ' '.join(words)

# Example:
input_text = "Check out the GeeksforGeeks website at https://www.geeksforgeeks.org/ for programming tutorials."
output_text_urllib = remove_urls_urllib(input_text)

print("Using urllib.parse:")
print("Text with URLs Removed:")
print(output_text_urllib)

Output
Using urllib.parse:
Text with URLs Removed:
Check out the GeeksforGeeks website at [URL REMOVED] for programming tutorials.

Similar Reads

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy