urllib.parse accepts invalid characters in IPv6 ZoneIDs and IPvFuture addresses #137146

@mauricelambert

Bug report

Bug description:

Issue Summary

The urllib.parse module currently allows invalid characters in both IPv6 Zone Identifiers and IPvFuture addresses due to discrepancies between how it validates these components and what is defined in relevant RFCs.

Details

  1. IPv6 ZoneID Parsing

    • According to RFC 6874, an IPv6 ZoneID used in a URL must follow a restricted format. Only unreserved characters may appear literally; any other character must be percent-encoded (% followed by two hex digits). The unreserved characters are:

      ALPHA / DIGIT / "-" / "." / "_" / "~"
      
    • However, urllib.parse delegates ZoneID parsing to the ipaddress module, which follows the broader rules of RFC 4007 and accepts nearly any non-empty string.

    • This results in invalid ZoneIDs being accepted in parsed URLs, which could cause compatibility issues or security concerns in strict RFC-compliant applications.
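
To make the gap concrete, here is a minimal sketch of a ZoneID check based on the RFC 6874 production `ZoneID = 1*( unreserved / pct-encoded )`. The helper name `is_rfc6874_zone_id` is hypothetical and is not part of urllib.parse or ipaddress; it only inspects the text after the `%` delimiter.

```python
import ipaddress
import re

# Hypothetical helper, not part of the standard library.
# RFC 6874: ZoneID = 1*( unreserved / pct-encoded )
_ZONE_ID_RE = re.compile(r"(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})+")


def is_rfc6874_zone_id(zone_id: str) -> bool:
    """Return True if zone_id matches the RFC 6874 ZoneID production."""
    return _ZONE_ID_RE.fullmatch(zone_id) is not None


# ipaddress keeps the "@" as part of the scope ID.
addr = ipaddress.ip_address("fe80::1%en0@")
print(addr.scope_id)                      # en0@
print(is_rfc6874_zone_id("en0"))          # True
print(is_rfc6874_zone_id(addr.scope_id))  # False
```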

  2. IPvFuture Parsing

    • RFC 3986 defines the format of an IPvFuture address as:

      "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
      

      where:

      • unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
      • sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="
    • However, the regex that urllib.parse currently uses to validate IPvFuture ends with a .+ pattern, which matches any non-empty run of characters (other than newlines). As a result, completely invalid strings, including those containing spaces or other illegal characters, are accepted as valid IPvFuture addresses.
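
The difference can be sketched by comparing two patterns. `LOOSE_IPVFUTURE` is only modeled on the description above (a `.+` tail) and is not copied from the CPython source; `STRICT_IPVFUTURE` is one possible transcription of the RFC 3986 ABNF. Both names are illustrative.

```python
import re

# Loose pattern modeled on the description above (".+" after the dot).
LOOSE_IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\..+")

# Stricter pattern following the RFC 3986 ABNF:
#   "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )
STRICT_IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9._~!$&'()*+,;=:-]+")

for candidate in ("v1.fe80::a+en1", "v1.invalid space", "v1.a/b"):
    loose = LOOSE_IPVFUTURE.fullmatch(candidate) is not None
    strict = STRICT_IPVFUTURE.fullmatch(candidate) is not None
    print(f"{candidate!r}: loose={loose} strict={strict}")
```

All three candidates match the loose pattern, but only the first one matches the strict pattern.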

Expected Behavior

  • urllib.parse should:

    • Enforce the character restrictions for ZoneIDs as defined in RFC 6874 when parsing URL hosts.
    • Strictly validate IPvFuture addresses based on the character classes defined in RFC 3986.

Actual Behavior

  • Invalid characters are accepted in both ZoneIDs and IPvFuture segments.

Suggested Fix

  • Update urllib.parse to:

    • Decode and validate ZoneID characters according to RFC 6874.
    • Adjust the regex or parsing logic for IPvFuture addresses to comply with the ABNF in RFC 3986.

Example

from urllib.parse import urlparse

# Invalid ZoneID containing `@`
url = urlparse("http://[fe80::1%en0@]:8080")
print(url.hostname)  # Should raise or reject invalid ZoneID

# Invalid IPvFuture containing ` ` (space)
url = urlparse("http://[v1.invalid space]/")
print(url.hostname)  # Should raise or reject invalid IPvFuture

Impact

This behavior may lead to security vulnerabilities or unexpected results in applications that rely on urllib.parse to validate or sanitize URLs according to the RFCs.

In particular:

  • A developer may incorrectly assume that urllib.parse enforces RFC 3986, RFC 6874, or RFC 4007 character restrictions.

  • If the parser accepts invalid ZoneIDs or malformed IPvFuture components, applications could:

    • Accept and process invalid or malicious URLs.
    • Misroute requests, leading to access control issues.
    • Be exposed to injection attacks (e.g., if the ZoneID is reused unsanitized in shell commands or logging).
    • Fail silently in contexts where strict compliance is expected, introducing logic bugs or interoperability issues.

Example scenario:

If a developer trusts urllib.parse to reject invalid hostnames, and then interpolates the parsed url.hostname or url.netloc into system commands, proxy configurations, or DNS lookups, an attacker could exploit improperly validated input.

Recommendation:

Until this behavior is corrected, developers should not rely solely on urllib.parse to validate ZoneIDs or IPvFuture addresses, and should add their own validation layer when handling user-supplied URLs.
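
One possible shape for such a layer is a thin wrapper around urlsplit that re-checks the bracketed host. This is a sketch, not a proposed patch: `strict_urlsplit` and `check_bracketed_host` are hypothetical names, the patterns are my reading of RFC 6874 and RFC 3986, and the sketch assumes a bare `%` delimiter is tolerated (as in the examples in this report) rather than requiring `%25`.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# All names below are hypothetical; none exist in the standard library.
_ZONE_ID = re.compile(r"(?:[A-Za-z0-9._~-]|%[0-9A-Fa-f]{2})+")
_IPVFUTURE = re.compile(r"v[0-9A-Fa-f]+\.[A-Za-z0-9._~!$&'()*+,;=:-]+")


def check_bracketed_host(literal: str) -> None:
    """Raise ValueError unless the text between "[" and "]" is acceptable."""
    if literal.startswith("v"):
        if not _IPVFUTURE.fullmatch(literal):
            raise ValueError(f"invalid IPvFuture literal: {literal!r}")
        return
    # Assumption: split on a bare "%"; "%25" is not required here.
    address, sep, zone_id = literal.partition("%")
    ipaddress.IPv6Address(address)  # raises ValueError if not a valid IPv6 address
    if sep and not _ZONE_ID.fullmatch(zone_id):
        raise ValueError(f"invalid ZoneID: {zone_id!r}")


def strict_urlsplit(url: str):
    """Call urlsplit(), then apply the stricter bracketed-host check."""
    parts = urlsplit(url)  # may itself raise ValueError
    if "[" in parts.netloc and "]" in parts.netloc:
        literal = parts.netloc.partition("[")[2].partition("]")[0]
        check_bracketed_host(literal)
    return parts


print(strict_urlsplit("http://[fe80::1%en0]:8080/").hostname)  # fe80::1%en0
for bad in ("http://[fe80::1%en0@]:8080", "http://[v1.invalid space]/"):
    try:
        strict_urlsplit(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Raising ValueError keeps the wrapper consistent with the errors urlsplit already raises for malformed bracketed hosts, so callers need only one except clause.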

CPython versions tested on:

3.13

Operating systems tested on:

Linux

    Labels

    stdlib (Python modules in the Lib dir), type-security (A security issue)
