TypeError when parsing regexp with unicode named character sequence escape #90568
re.compile(r"\N{name of Unicode Named Character Sequence}"), e.g. re.compile(r"\N{KEYCAP NUMBER SIGN}"), raises a TypeError. The regular expression parser relies on 'unicodedata' to look up character names. The 'unicodedata' module recently added support for Unicode Named Character Sequences (https://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt). A named sequence resolves to a string of several code points rather than a single character, so using one in a regular expression leads to a 'TypeError': the regexp parser calls 'ord' on a string with length > 1.
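The failing step described above can be reproduced outside the regexp parser. This is only a minimal sketch of the cause, not the parser's actual code; the variable names `seq` and `ord_error` are illustrative.

```python
import unicodedata

# lookup() resolves a named sequence to a multi-code-point string.
seq = unicodedata.lookup("KEYCAP NUMBER SIGN")
print(len(seq), ascii(seq))

# ord() accepts exactly one character, so a sequence makes it fail.
ord_error = None
try:
    ord(seq)
except TypeError as exc:
    ord_error = exc
    print("ord() failed:", exc)
```

Any code that assumes a character name maps to exactly one code point hits the same TypeError.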
They're not supported in string literals either:
Python 3.10.1 (tags/v3.10.1:2cd268a, Dec 6 2021, 19:10:37) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> "\N{KEYCAP NUMBER SIGN}"
File "<stdin>", line 1
"\N{KEYCAP NUMBER SIGN}"
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-21: unknown Unicode character name
>>> import unicodedata
>>> unicodedata.lookup('KEYCAP NUMBER SIGN')
'#️'
>>> print(ascii(unicodedata.lookup('KEYCAP NUMBER SIGN')))
'#\ufe0f\u20e3'
Support for Unicode Named Character Sequences in the unicodeescape codec and in the RE parser would be a new feature.
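Until such support exists, one possible workaround is to resolve the name with unicodedata.lookup() and splice the escaped result into the pattern. The sketch below assumes that approach; `named_sequence_pattern` is a hypothetical helper, not part of `re` or `unicodedata`.

```python
import re
import unicodedata


def named_sequence_pattern(name):
    """Hypothetical helper: regex fragment matching a named sequence literally."""
    # re.escape() protects characters such as '#' that are special in verbose mode.
    return re.escape(unicodedata.lookup(name))


keycap = re.compile(named_sequence_pattern("KEYCAP NUMBER SIGN"))
text = "press " + unicodedata.lookup("KEYCAP NUMBER SIGN") + " to continue"
match = keycap.search(text)
print(match.span())
```

Because the sequence is inserted as a plain literal, it cannot be used inside a character class the way a single-character \N{...} escape can.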
re.error is now raised instead of TypeError.
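To see which behaviour a given interpreter has, a small probe like the following can be run. It is a sketch that simply reports the outcome; the `outcome` variable is illustrative, and the result depends on the Python version in use.

```python
import re

# Probe how this interpreter handles a named sequence in a pattern.
try:
    re.compile(r"\N{KEYCAP NUMBER SIGN}")
except re.error as exc:
    outcome = "re.error"   # behaviour after the fix
    print("re.error:", exc)
except TypeError as exc:
    outcome = "TypeError"  # behaviour reported in this issue
    print("TypeError:", exc)
else:
    outcome = "compiled"   # would mean named sequences are accepted
    print("pattern compiled")
```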
Support of named sequence in One of reasons is that
…1665) re.error is now raised instead of TypeError.
…e in RE (pythonGH-91665) re.error is now raised instead of TypeError. (cherry picked from commit 6ccfa31) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
… in RE (pythonGH-91665) (pythonGH-91830) (pythonGH-91834) re.error is now raised instead of TypeError. (cherry picked from commit 6ccfa31) (cherry picked from commit 9c18d78) Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>