Allow lowercase hexadecimal characters in base64.b16decode() #79738
Currently, the

The revision itself is straightforward. We simply have to amend the regular expression to match the lowercase characters a-f in addition to A-F. Likewise the corresponding tests in Lib/base64.py also need to be changed to account for the lack of a second argument. Therefore there are two files total which need to be refactored.

In my view, there are several compelling reasons for this change:
There are two arguments against this patch, as far as I can see it:
As I mentioned, I have already written the changes on my own patch branch. I'll open a pull request once this issue has been created and reference the issue in the pull request on GitHub.
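For readers following along, the behavior under discussion can be reproduced with a minimal sketch like the one below. It only exercises the existing `base64.b16decode()` interface and its `casefold` argument; the sample values and variable names are illustrative and not taken from the patch.

```python
import base64
import binascii

upper = b"DEADBEEF"
lower = b"deadbeef"
expected = b"\xde\xad\xbe\xef"

# Uppercase input is accepted by default.
decoded_upper = base64.b16decode(upper)

# Lowercase input is rejected unless casefold=True is passed.
try:
    base64.b16decode(lower)
    lowercase_rejected = False
except binascii.Error:
    lowercase_rejected = True

# With casefold=True the same lowercase input decodes normally.
decoded_lower = base64.b16decode(lower, casefold=True)
```

The proposal in this issue would make the second call succeed as well, which is why it is discussed as a change in default behavior.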
Thanks for the report. A couple of points below:
I looked for further inefficiencies and noticed that re.search is called with the raw pattern on every run. re.compile could be used to store the compiled regex at module level and match against the string from there. This makes the function about 25% faster without changing the interface. When casefold=False, the extra call to upper-case the string is also avoided, which gives some further benefit.

With re.search inside the function:

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 3.08 us +- 0.22 us

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' 'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 2.93 us +- 0.20 us

With the regex compiled to a variable at the module level:

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 2.08 us +- 0.15 us

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca".upper()' 'base64.b16decode(hex_data)'
.....................
Mean +- std dev: 1.98 us +- 0.17 us

Since this is a comparison against a fixed set of characters, I also tried a set of allowed characters with any() to short-circuit, but it seems to be slower:

$ python3.7 -m perf timeit -s 'import base64; hex_data="806903d098eb50957b1b376385f233bb3a5d54f54191c8536aefee21fc9ba3ca"' 'base64.b16decode(hex_data, casefold=True)'
.....................
Mean +- std dev: 8.21 us +- 0.66 us

I am opening a PR to use the compiled regex at the module level, since I see it as a net win of 25-30% without any interface or test case changes required.
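The two variants described above might look roughly like the following sketch. This is not the code from the PR: the names `_NON_B16`, `_B16_CHARS`, `b16decode_compiled` and `b16decode_set` are hypothetical, and the functions only approximate what `base64.b16decode()` does.

```python
import binascii
import re

# Hypothetical module-level pattern, compiled once at import time.
_NON_B16 = re.compile(b"[^0-9A-F]")

# Hypothetical set of allowed byte values for the set/any() variant.
_B16_CHARS = frozenset(b"0123456789ABCDEF")


def _to_bytes(s):
    # Accept ASCII str as well as bytes, as the stdlib function does.
    return s.encode("ascii") if isinstance(s, str) else bytes(s)


def b16decode_compiled(s, casefold=False):
    """Validate with a precompiled regex, then decode."""
    s = _to_bytes(s)
    if casefold:
        s = s.upper()
    if _NON_B16.search(s):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(s)


def b16decode_set(s, casefold=False):
    """Validate with a set lookup and any(); reported slower above."""
    s = _to_bytes(s)
    if casefold:
        s = s.upper()
    if any(c not in _B16_CHARS for c in s):
        raise binascii.Error("Non-base16 digit found")
    return binascii.unhexlify(s)
```

Both functions keep the existing interface, so lowercase input is still rejected unless `casefold=True` is passed. The set variant pays for a Python-level loop over every byte, which is one plausible reason it measured slower than the regex.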
This is a compatibility-breaking change. Furthermore, it changes behavior that has been intended since the initial version (4c904d1). Since there is already an option that allows lowercase hexadecimal characters to be accepted, I do not see a need for this change. You can also use bytes.fromhex().
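As a brief illustration of the alternative mentioned here, `bytes.fromhex()` accepts a string of hexadecimal digits in either case. The sample values below are illustrative only.

```python
import base64

hex_data = "806903d098eb5095"

# bytes.fromhex() accepts lowercase, uppercase, or mixed-case digits.
via_fromhex = bytes.fromhex(hex_data)
mixed = bytes.fromhex("DEADbeef")

# The equivalent b16decode() call needs casefold=True for lowercase input.
via_b16 = base64.b16decode(hex_data, casefold=True)
```

Both calls produce the same bytes for this input, so code that only needs case-insensitive hex decoding can use either one without any change to `b16decode()`.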
Karthikeyan,

Thank you for taking the time to respond so thoroughly. In particular, in the future I'll be more careful to qualify and quantify potential performance improvements using That being said, as I mentioned, the primary motivation for this is not a performance improvement - I just felt that was a nice potential side effect. Rather, this enhancement brings However, I can definitely understand what you and Serhiy are saying about this being a breaking change. Therefore I'd like to amend my proposal to the following:
I've altered this issue to reflect my amended proposal, targeting only version 3.8 and editing the type to be behavior instead of performance. In this way, the change will still make Naturally there would be additional logic that enforces the case sensitivity if If this change is considered agreeable, I will amend my open pull request to roll back the breaking change and refactor the way