gh-137146: Restrict IPvFuture address parsing to RFC 3986-valid characters #137147

Open: wants to merge 2 commits into base branch main

@mauricelambert (Contributor) commented Jul 27, 2025

This PR fixes overly permissive validation of IPvFuture hostnames in urllib.parse (#137146).

Previously, the regex used to match IPvFuture (v...) components accepted any character after the dot (.+), which is too permissive. According to RFC 3986 §3.2.2, an IPvFuture address must match the following structure:

"v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

Where:

  • unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  • sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

This patch replaces the permissive regex with one that strictly enforces this allowed character set.
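
For illustration only, the grammar above can be written as a Python regular expression. This is a minimal sketch, not the exact pattern from the patch: the names `IPVFUTURE_RE` and `is_valid_ipvfuture` are hypothetical, and the sketch assumes the input is the host text between the square brackets.

```python
import re

# Hypothetical pattern, not copied from the patch.
# "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
# Assumption: only a lowercase "v" is matched, as in the examples in this PR.
# ABNF string literals are case-insensitive, so a stricter reading of the RFC
# would also allow an uppercase "V".
IPVFUTURE_RE = re.compile(
    r"\Av"                                   # literal version marker
    r"[0-9A-Fa-f]+"                          # 1*HEXDIG
    r"\."                                    # literal dot
    r"[A-Za-z0-9\-._~!$&'()*+,;=:]+"         # unreserved / sub-delims / ":"
    r"\Z"
)


def is_valid_ipvfuture(host: str) -> bool:
    """Return True if host (without the brackets) matches the grammar."""
    return IPVFUTURE_RE.match(host) is not None


print(is_valid_ipvfuture("v45.test"))      # allowed characters only
print(is_valid_ipvfuture("v45.test|bad"))  # "|" is outside the allowed set
```

Anchoring with `\A` and `\Z` matters here: without them, a valid prefix followed by a disallowed character could still match.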

Before the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|bad]/path")
ParseResult(scheme='http', netloc='[v45.test|bad]', path='/path', ...)

After the fix:

>>> urllib.parse.urlparse("http://[v45.test|bad]/path")
ValueError: IPvFuture address is invalid

This improves standards compliance and helps prevent silent acceptance of malformed or unsafe host components.
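
To see which of the two behaviors a given interpreter has, a small probe along these lines may help. It is a sketch only: the helper name `raises_value_error` is hypothetical, and the result depends on whether the running Python includes this change.

```python
from urllib.parse import urlparse


def raises_value_error(url: str) -> bool:
    """Return True if urlparse() rejects the URL with ValueError.

    Hypothetical helper for illustration. Note that ValueError can also be
    raised for other malformed hosts, not only for IPvFuture validation.
    """
    try:
        urlparse(url)
    except ValueError:
        return True
    return False


# The result for the first URL depends on the Python build in use.
print(raises_value_error("http://[v45.test|bad]/path"))
# An ordinary hostname is accepted either way.
print(raises_value_error("http://example.com/path"))
```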

…acters

IPvFuture hostnames in URLs were matched with an overly permissive regex
(`.+`), which accepted characters that RFC 3986 does not allow.
This patch updates the pattern to only accept characters explicitly allowed
by the RFC for IPvFuture addresses.

According to RFC 3986 §3.2.2, the format of IPvFuture is:

  "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

Where:
  - unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
  - sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Before the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
ParseResult(scheme='http', netloc='[v45.test|test]', path='/path', ...)

Invalid characters such as `|` were incorrectly accepted.

After the fix:

>>> import urllib.parse
>>> urllib.parse.urlparse("http://[v45.test|test]/path")
Traceback (most recent call last):
    ...
ValueError: IPvFuture address is invalid

This improves standards compliance and prevents malformed URLs from being
silently accepted.

@StanFromIreland changed the title from "#137146: Restrict IPvFuture address parsing to RFC 3986-valid characters" to "gh-137146: Restrict IPvFuture address parsing to RFC 3986-valid characters" on Jul 27, 2025