Skip to content

Commit 5c17dfc

Browse files
orsenthilmerwokFidget-SpinnerAdamGold
authored
[3.6] bpo-42967: only use '&' as a query string separator (GH-24297) (GH-24532)
bpo-42967: [security] Address a web cache-poisoning issue reported in urllib.parse.parse_qsl(). urllib.parse will only us "&" as query string separator by default instead of both ";" and "&" as allowed in earlier versions. An optional argument seperator with default value "&" is added to specify the separator. Co-authored-by: Éric Araujo <merwok@netwok.org> Co-authored-by: Ken Jin <28750310+Fidget-Spinner@users.noreply.github.com> Co-authored-by: Adam Goldschmidt <adamgold7@gmail.com>
1 parent 34df10a commit 5c17dfc

File tree

8 files changed

+134
-43
lines changed

8 files changed

+134
-43
lines changed

Doc/library/cgi.rst

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -277,13 +277,12 @@ These are useful if you want more control, or if you want to employ some of the
277277
algorithms implemented in this module in other circumstances.
278278

279279

280-
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
280+
.. function:: parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&")
281281

282282
Parse a query in the environment or from a file (the file defaults to
283-
``sys.stdin``). The *keep_blank_values* and *strict_parsing* parameters are
283+
``sys.stdin``). The *keep_blank_values*, *strict_parsing* and *separator* parameters are
284284
passed to :func:`urllib.parse.parse_qs` unchanged.
285285

286-
287286
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False)
288287

289288
This function is deprecated in this module. Use :func:`urllib.parse.parse_qs`
@@ -308,6 +307,9 @@ algorithms implemented in this module in other circumstances.
308307
Note that this does not parse nested multipart parts --- use
309308
:class:`FieldStorage` for that.
310309

310+
.. versionchanged:: 3.6.13
311+
Added the *separator* parameter.
312+
311313

312314
.. function:: parse_header(string)
313315

Doc/library/urllib.parse.rst

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -143,7 +143,7 @@ or on combining URL components into a URL string.
143143
now raise :exc:`ValueError`.
144144

145145

146-
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
146+
.. function:: parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
147147

148148
Parse a query string given as a string argument (data of type
149149
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a
@@ -168,6 +168,9 @@ or on combining URL components into a URL string.
168168
read. If set, then throws a :exc:`ValueError` if there are more than
169169
*max_num_fields* fields read.
170170

171+
The optional argument *separator* is the symbol to use for separating the
172+
query arguments. It defaults to ``&``.
173+
171174
Use the :func:`urllib.parse.urlencode` function (with the ``doseq``
172175
parameter set to ``True``) to convert such dictionaries into query
173176
strings.
@@ -179,8 +182,14 @@ or on combining URL components into a URL string.
179182
.. versionchanged:: 3.6.8
180183
Added *max_num_fields* parameter.
181184

185+
.. versionchanged:: 3.6.13
186+
Added *separator* parameter with the default value of ``&``. Python
187+
versions earlier than Python 3.6.13 allowed using both ``;`` and ``&`` as
188+
query parameter separator. This has been changed to allow only a single
189+
separator key, with ``&`` as the default separator.
190+
182191

183-
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None)
192+
.. function:: parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&')
184193

185194
Parse a query string given as a string argument (data of type
186195
:mimetype:`application/x-www-form-urlencoded`). Data are returned as a list of
@@ -204,6 +213,9 @@ or on combining URL components into a URL string.
204213
read. If set, then throws a :exc:`ValueError` if there are more than
205214
*max_num_fields* fields read.
206215

216+
The optional argument *separator* is the symbol to use for separating the
217+
query arguments. It defaults to ``&``.
218+
207219
Use the :func:`urllib.parse.urlencode` function to convert such lists of pairs into
208220
query strings.
209221

@@ -213,6 +225,12 @@ or on combining URL components into a URL string.
213225
.. versionchanged:: 3.6.8
214226
Added *max_num_fields* parameter.
215227

228+
.. versionchanged:: 3.6.13
229+
Added *separator* parameter with the default value of ``&``. Python
230+
versions earlier than Python 3.6.13 allowed using both ``;`` and ``&`` as
231+
query parameter separator. This has been changed to allow only a single
232+
separator key, with ``&`` as the default separator.
233+
216234

217235
.. function:: urlunparse(parts)
218236

Doc/whatsnew/3.6.rst

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2459,3 +2459,16 @@ because of the behavior of the socket option ``SO_REUSEADDR`` in UDP. For more
24592459
details, see the documentation for ``loop.create_datagram_endpoint()``.
24602460
(Contributed by Kyle Stanley, Antoine Pitrou, and Yury Selivanov in
24612461
:issue:`37228`.)
2462+
2463+
Notable changes in Python 3.6.13
2464+
================================
2465+
2466+
Earlier Python versions allowed using both ``;`` and ``&`` as
2467+
query parameter separators in :func:`urllib.parse.parse_qs` and
2468+
:func:`urllib.parse.parse_qsl`. Due to security concerns, and to conform with
2469+
newer W3C recommendations, this has been changed to allow only a single
2470+
separator key, with ``&`` as the default. This change also affects
2471+
:func:`cgi.parse` and :func:`cgi.parse_multipart` as they use the affected
2472+
functions internally. For more details, please see their respective
2473+
documentation.
2474+
(Contributed by Adam Goldschmidt, Senthil Kumaran and Ken Jin in :issue:`42967`.)

Lib/cgi.py

Lines changed: 11 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -117,7 +117,8 @@ def closelog():
117117
# 0 ==> unlimited input
118118
maxlen = 0
119119

120-
def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
120+
def parse(fp=None, environ=os.environ, keep_blank_values=0,
121+
strict_parsing=0, separator='&'):
121122
"""Parse a query in the environment or from a file (default stdin)
122123
123124
Arguments, all optional:
@@ -136,6 +137,9 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
136137
strict_parsing: flag indicating what to do with parsing errors.
137138
If false (the default), errors are silently ignored.
138139
If true, errors raise a ValueError exception.
140+
141+
separator: str. The symbol to use for separating the query arguments.
142+
Defaults to &.
139143
"""
140144
if fp is None:
141145
fp = sys.stdin
@@ -180,7 +184,7 @@ def parse(fp=None, environ=os.environ, keep_blank_values=0, strict_parsing=0):
180184
qs = ""
181185
environ['QUERY_STRING'] = qs # XXX Shouldn't, really
182186
return urllib.parse.parse_qs(qs, keep_blank_values, strict_parsing,
183-
encoding=encoding)
187+
encoding=encoding, separator=separator)
184188

185189

186190
# parse query string function called from urlparse,
@@ -405,7 +409,7 @@ class FieldStorage:
405409
def __init__(self, fp=None, headers=None, outerboundary=b'',
406410
environ=os.environ, keep_blank_values=0, strict_parsing=0,
407411
limit=None, encoding='utf-8', errors='replace',
408-
max_num_fields=None):
412+
max_num_fields=None, separator='&'):
409413
"""Constructor. Read multipart/* until last part.
410414
411415
Arguments, all optional:
@@ -453,6 +457,7 @@ def __init__(self, fp=None, headers=None, outerboundary=b'',
453457
self.keep_blank_values = keep_blank_values
454458
self.strict_parsing = strict_parsing
455459
self.max_num_fields = max_num_fields
460+
self.separator = separator
456461
if 'REQUEST_METHOD' in environ:
457462
method = environ['REQUEST_METHOD'].upper()
458463
self.qs_on_post = None
@@ -678,7 +683,7 @@ def read_urlencoded(self):
678683
query = urllib.parse.parse_qsl(
679684
qs, self.keep_blank_values, self.strict_parsing,
680685
encoding=self.encoding, errors=self.errors,
681-
max_num_fields=self.max_num_fields)
686+
max_num_fields=self.max_num_fields, separator=self.separator)
682687
self.list = [MiniFieldStorage(key, value) for key, value in query]
683688
self.skip_lines()
684689

@@ -694,7 +699,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
694699
query = urllib.parse.parse_qsl(
695700
self.qs_on_post, self.keep_blank_values, self.strict_parsing,
696701
encoding=self.encoding, errors=self.errors,
697-
max_num_fields=self.max_num_fields)
702+
max_num_fields=self.max_num_fields, separator=self.separator)
698703
self.list.extend(MiniFieldStorage(key, value) for key, value in query)
699704

700705
klass = self.FieldStorageClass or self.__class__
@@ -736,7 +741,7 @@ def read_multi(self, environ, keep_blank_values, strict_parsing):
736741

737742
part = klass(self.fp, headers, ib, environ, keep_blank_values,
738743
strict_parsing,self.limit-self.bytes_read,
739-
self.encoding, self.errors, max_num_fields)
744+
self.encoding, self.errors, max_num_fields, self.separator)
740745

741746
if max_num_fields is not None:
742747
max_num_fields -= 1

Lib/test/test_cgi.py

Lines changed: 24 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -55,12 +55,9 @@ def do_test(buf, method):
5555
("", ValueError("bad query field: ''")),
5656
("&", ValueError("bad query field: ''")),
5757
("&&", ValueError("bad query field: ''")),
58-
(";", ValueError("bad query field: ''")),
59-
(";&;", ValueError("bad query field: ''")),
6058
# Should the next few really be valid?
6159
("=", {}),
6260
("=&=", {}),
63-
("=;=", {}),
6461
# This rest seem to make sense
6562
("=a", {'': ['a']}),
6663
("&=a", ValueError("bad query field: ''")),
@@ -75,8 +72,6 @@ def do_test(buf, method):
7572
("a=a+b&b=b+c", {'a': ['a b'], 'b': ['b c']}),
7673
("a=a+b&a=b+a", {'a': ['a b', 'b a']}),
7774
("x=1&y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
78-
("x=1;y=2.0&z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
79-
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
8075
("Hbc5161168c542333633315dee1182227:key_store_seqid=400006&cuyer=r&view=bustomer&order_id=0bb2e248638833d48cb7fed300000f1b&expire=964546263&lobale=en-US&kid=130003.300038&ss=env",
8176
{'Hbc5161168c542333633315dee1182227:key_store_seqid': ['400006'],
8277
'cuyer': ['r'],
@@ -180,6 +175,30 @@ def test_strict(self):
180175
else:
181176
self.assertEqual(fs.getvalue(key), expect_val[0])
182177

178+
def test_separator(self):
179+
parse_semicolon = [
180+
("x=1;y=2.0", {'x': ['1'], 'y': ['2.0']}),
181+
("x=1;y=2.0;z=2-3.%2b0", {'x': ['1'], 'y': ['2.0'], 'z': ['2-3.+0']}),
182+
(";", ValueError("bad query field: ''")),
183+
(";;", ValueError("bad query field: ''")),
184+
("=;a", ValueError("bad query field: 'a'")),
185+
(";b=a", ValueError("bad query field: ''")),
186+
("b;=a", ValueError("bad query field: 'b'")),
187+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
188+
("a=a+b;a=b+a", {'a': ['a b', 'b a']}),
189+
]
190+
for orig, expect in parse_semicolon:
191+
env = {'QUERY_STRING': orig}
192+
fs = cgi.FieldStorage(separator=';', environ=env)
193+
if isinstance(expect, dict):
194+
for key in expect.keys():
195+
expect_val = expect[key]
196+
self.assertIn(key, fs)
197+
if len(expect_val) > 1:
198+
self.assertEqual(fs.getvalue(key), expect_val)
199+
else:
200+
self.assertEqual(fs.getvalue(key), expect_val[0])
201+
183202
def test_log(self):
184203
cgi.log("Testing")
185204

Lib/test/test_urlparse.py

Lines changed: 46 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -32,16 +32,10 @@
3232
(b"&a=b", [(b'a', b'b')]),
3333
(b"a=a+b&b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
3434
(b"a=1&a=2", [(b'a', b'1'), (b'a', b'2')]),
35-
(";", []),
36-
(";;", []),
37-
(";a=b", [('a', 'b')]),
38-
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
39-
("a=1;a=2", [('a', '1'), ('a', '2')]),
40-
(b";", []),
41-
(b";;", []),
42-
(b";a=b", [(b'a', b'b')]),
43-
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
44-
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
35+
(";a=b", [(';a', 'b')]),
36+
("a=a+b;b=b+c", [('a', 'a b;b=b c')]),
37+
(b";a=b", [(b';a', b'b')]),
38+
(b"a=a+b;b=b+c", [(b'a', b'a b;b=b c')]),
4539
]
4640

4741
# Each parse_qs testcase is a two-tuple that contains
@@ -68,16 +62,10 @@
6862
(b"&a=b", {b'a': [b'b']}),
6963
(b"a=a+b&b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
7064
(b"a=1&a=2", {b'a': [b'1', b'2']}),
71-
(";", {}),
72-
(";;", {}),
73-
(";a=b", {'a': ['b']}),
74-
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
75-
("a=1;a=2", {'a': ['1', '2']}),
76-
(b";", {}),
77-
(b";;", {}),
78-
(b";a=b", {b'a': [b'b']}),
79-
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
80-
(b"a=1;a=2", {b'a': [b'1', b'2']}),
65+
(";a=b", {';a': ['b']}),
66+
("a=a+b;b=b+c", {'a': ['a b;b=b c']}),
67+
(b";a=b", {b';a': [b'b']}),
68+
(b"a=a+b;b=b+c", {b'a':[ b'a b;b=b c']}),
8169
]
8270

8371
class UrlParseTestCase(unittest.TestCase):
@@ -884,10 +872,46 @@ def test_parse_qsl_encoding(self):
884872
def test_parse_qsl_max_num_fields(self):
885873
with self.assertRaises(ValueError):
886874
urllib.parse.parse_qs('&'.join(['a=a']*11), max_num_fields=10)
887-
with self.assertRaises(ValueError):
888-
urllib.parse.parse_qs(';'.join(['a=a']*11), max_num_fields=10)
889875
urllib.parse.parse_qs('&'.join(['a=a']*10), max_num_fields=10)
890876

877+
def test_parse_qs_separator(self):
878+
parse_qs_semicolon_cases = [
879+
(";", {}),
880+
(";;", {}),
881+
(";a=b", {'a': ['b']}),
882+
("a=a+b;b=b+c", {'a': ['a b'], 'b': ['b c']}),
883+
("a=1;a=2", {'a': ['1', '2']}),
884+
(b";", {}),
885+
(b";;", {}),
886+
(b";a=b", {b'a': [b'b']}),
887+
(b"a=a+b;b=b+c", {b'a': [b'a b'], b'b': [b'b c']}),
888+
(b"a=1;a=2", {b'a': [b'1', b'2']}),
889+
]
890+
for orig, expect in parse_qs_semicolon_cases:
891+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
892+
result = urllib.parse.parse_qs(orig, separator=';')
893+
self.assertEqual(result, expect, "Error parsing %r" % orig)
894+
895+
896+
def test_parse_qsl_separator(self):
897+
parse_qsl_semicolon_cases = [
898+
(";", []),
899+
(";;", []),
900+
(";a=b", [('a', 'b')]),
901+
("a=a+b;b=b+c", [('a', 'a b'), ('b', 'b c')]),
902+
("a=1;a=2", [('a', '1'), ('a', '2')]),
903+
(b";", []),
904+
(b";;", []),
905+
(b";a=b", [(b'a', b'b')]),
906+
(b"a=a+b;b=b+c", [(b'a', b'a b'), (b'b', b'b c')]),
907+
(b"a=1;a=2", [(b'a', b'1'), (b'a', b'2')]),
908+
]
909+
for orig, expect in parse_qsl_semicolon_cases:
910+
with self.subTest(f"Original: {orig!r}, Expected: {expect!r}"):
911+
result = urllib.parse.parse_qsl(orig, separator=';')
912+
self.assertEqual(result, expect, "Error parsing %r" % orig)
913+
914+
891915
def test_urlencode_sequences(self):
892916
# Other tests incidentally urlencode things; test non-covered cases:
893917
# Sequence and object values.

Lib/urllib/parse.py

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -644,7 +644,7 @@ def unquote(string, encoding='utf-8', errors='replace'):
644644

645645

646646
def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
647-
encoding='utf-8', errors='replace', max_num_fields=None):
647+
encoding='utf-8', errors='replace', max_num_fields=None, separator='&'):
648648
"""Parse a query given as a string argument.
649649
650650
Arguments:
@@ -668,12 +668,15 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
668668
max_num_fields: int. If set, then throws a ValueError if there
669669
are more than n fields read by parse_qsl().
670670
671+
separator: str. The symbol to use for separating the query arguments.
672+
Defaults to &.
673+
671674
Returns a dictionary.
672675
"""
673676
parsed_result = {}
674677
pairs = parse_qsl(qs, keep_blank_values, strict_parsing,
675678
encoding=encoding, errors=errors,
676-
max_num_fields=max_num_fields)
679+
max_num_fields=max_num_fields, separator=separator)
677680
for name, value in pairs:
678681
if name in parsed_result:
679682
parsed_result[name].append(value)
@@ -683,7 +686,7 @@ def parse_qs(qs, keep_blank_values=False, strict_parsing=False,
683686

684687

685688
def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
686-
encoding='utf-8', errors='replace', max_num_fields=None):
689+
encoding='utf-8', errors='replace', max_num_fields=None, separator='&'):
687690
"""Parse a query given as a string argument.
688691
689692
Arguments:
@@ -706,19 +709,25 @@ def parse_qsl(qs, keep_blank_values=False, strict_parsing=False,
706709
max_num_fields: int. If set, then throws a ValueError
707710
if there are more than n fields read by parse_qsl().
708711
712+
separator: str. The symbol to use for separating the query arguments.
713+
Defaults to &.
714+
709715
Returns a list, as G-d intended.
710716
"""
711717
qs, _coerce_result = _coerce_args(qs)
712718

719+
if not separator or (not isinstance(separator, (str, bytes))):
720+
raise ValueError("Separator must be of type string or bytes.")
721+
713722
# If max_num_fields is defined then check that the number of fields
714723
# is less than max_num_fields. This prevents a memory exhaustion DOS
715724
# attack via post bodies with many fields.
716725
if max_num_fields is not None:
717-
num_fields = 1 + qs.count('&') + qs.count(';')
726+
num_fields = 1 + qs.count(separator)
718727
if max_num_fields < num_fields:
719728
raise ValueError('Max number of fields exceeded')
720729

721-
pairs = [s2 for s1 in qs.split('&') for s2 in s1.split(';')]
730+
pairs = [s1 for s1 in qs.split(separator)]
722731
r = []
723732
for name_value in pairs:
724733
if not name_value and not strict_parsing:
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Fix web cache poisoning vulnerability by defaulting the query args separator to ``&``, and allowing the user to choose a custom separator.

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy