ENH: Add CPU feature detection for POWER10 (VSX4) #20821
Does this give a measurable speedup in benchmarks on that architecture?
Hi @mattip. Yes, it does. As an example, we see a 15-25% speedup for floor_divide on unsigned integer types, and something similar for 'remainder'. Note, however, that the changes in this PR alone do not enable those optimizations. I plan to submit the optimizations I am working on for the POWER architectures (POWER8, POWER9, and POWER10) in separate PRs. Thanks!
Could you add this to the documentation here? (VSX3 is also missing.) It is also missing from the rendered table, which is generated by this code; I wonder why VSX4 does not show up there. Is something missing for Power10?
This should probably also get an enhancement release note.
Hi @mattip,
The table you pointed out shows the minimum set of CPU features (MIN) that can be safely assumed on a wide range of platforms. For the POWER architecture, VSX2/Power8 is defined as MIN, so the documentation is correct and no change is required. The MIN defined for POWER and for the other architectures can be found here: link.
VSX4 did not show up there because I had not run the script you referred to :-). After running it, VSX4 (Power10) appears in the table, and I have just pushed a commit with the regenerated table. The same was done in PR #20552. While looking at #20552, I also found some additional tests there, which I have included in this PR (see here: link).
This is a great idea! But I think it would be more valuable to add it in 3-4 weeks, once I submit the optimizations that accelerate some operations for VSX4/Power10. Is it OK if I proceed this way? Thanks again!
Just to be sure, I opened #20849 with a 1.23 milestone. Now we have to remember to close it when you submit the next PR :)
Thanks @rafaelcfsousa!
Hi-five on merging your first pull request to NumPy, @rafaelcfsousa! We hope you stick around! Your choices aren’t limited to programming – you can review pull requests, help us stay on top of new and old issues, develop educational material, work on our website, add or improve graphic design, create marketing materials, translate website content, write grant proposals, and help with other fundraising initiatives. For more info, check out: https://numpy.org/contribute/.
This PR adds CPU feature detection for POWER10 (VSX4). It will allow us to accelerate some NumPy operations, e.g., floor_divide, using the new SIMD builtins available on POWER10.
This PR follows the instructions presented here: #19537