
ENH: Add CPU feature detection for POWER10 (VSX4) #20821


Merged: 4 commits merged into numpy:main on Jan 18, 2022


rafaelcfsousa
Contributor

This PR adds CPU feature detection for POWER10 (VSX4). It will allow us to accelerate some NumPy operations, e.g., floor_divide, using the new SIMD builtins available on POWER10.

This PR follows the instructions presented here: #19537
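For readers unfamiliar with how this kind of detection usually works on Linux, here is a minimal sketch, not the code added by this PR. It assumes the hwcap bit masks from the Linux powerpc header `asm/cputable.h` (worth verifying against your kernel headers) and uses a hypothetical helper name, `decode_vsx_levels`, to map the two hwcap words to the VSX feature names discussed here.

```python
# Bit masks assumed from the Linux powerpc uapi header (asm/cputable.h).
PPC_FEATURE_HAS_VSX = 0x00000080     # reported in AT_HWCAP
PPC_FEATURE2_ARCH_2_07 = 0x80000000  # reported in AT_HWCAP2 (POWER8)
PPC_FEATURE2_ARCH_3_00 = 0x00800000  # reported in AT_HWCAP2 (POWER9)
PPC_FEATURE2_ARCH_3_1 = 0x00040000   # reported in AT_HWCAP2 (POWER10)


def decode_vsx_levels(hwcap, hwcap2):
    """Hypothetical helper: map raw hwcap words to VSX feature flags."""
    levels = {"VSX": False, "VSX2": False, "VSX3": False, "VSX4": False}
    if not hwcap & PPC_FEATURE_HAS_VSX:
        # No VSX at all, so none of the higher levels can be present.
        return levels
    if hwcap2 & PPC_FEATURE2_ARCH_3_1:
        # ISA 3.1 (POWER10) implies every earlier VSX level.
        return dict.fromkeys(levels, True)
    levels["VSX"] = True
    levels["VSX2"] = bool(hwcap2 & PPC_FEATURE2_ARCH_2_07)
    levels["VSX3"] = bool(hwcap2 & PPC_FEATURE2_ARCH_3_00)
    return levels


# Example with made-up hwcap words resembling a POWER10 machine.
print(decode_vsx_levels(PPC_FEATURE_HAS_VSX, PPC_FEATURE2_ARCH_3_1))
```

On a real ppc64le Linux system the two input words would come from `getauxval(AT_HWCAP)` and `getauxval(AT_HWCAP2)`; the sketch takes plain integers so it can be run anywhere.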

@mattip
Member

mattip commented Jan 14, 2022

Does this enable a speed bump in benchmarks for that architecture?

@rafaelcfsousa
Contributor Author

rafaelcfsousa commented Jan 14, 2022

> Does this enable a speed bump in benchmarks for that architecture?

Hi @mattip. Yes, it does. For example, we see a speedup of 15-25% for floor_divide (unsigned types), and something similar for the 'remainder' operation. Note, however, that the changes in this PR alone do not enable those optimizations. My plan is to submit separate PRs with the optimizations I am working on for the POWER architectures (POWER8, POWER9, and POWER10). Thanks!

@mattip
Member

mattip commented Jan 16, 2022

Could you add this to the documentation here (VSX3 is also missing)? Also, documentation is missing in the rendered table, which is generated by this code. I wonder why VSX4 does not show up there. Is something missing for Power10?

@mattip
Member

mattip commented Jan 16, 2022

This should probably also get an enhancement release note

mattip added the "56 - Needs Release Note" label (needs an entry in doc/release/upcoming_changes) on Jan 16, 2022
@rafaelcfsousa
Contributor Author

rafaelcfsousa commented Jan 18, 2022

Hi @mattip,

> Could you add this to the documentation here (VSX3 is also missing)?

The table you pointed to lists the minimum CPU features (MIN) that can safely run on a wide range of platforms. For the POWER architecture, the minimum is VSX2/Power8, so the documentation is correct and no modification is required. The MIN defined for POWER and for the other architectures is shown below (defined here: link).

conf_min_features = dict(
    x86 = "SSE SSE2",
    x64 = "SSE SSE2 SSE3",
    ppc64 = '', # play it safe
    ppc64le = "VSX VSX2",
    s390x = '',
    armhf = '', # play it safe
    aarch64 = "NEON NEON_FP16 NEON_VFPV4 ASIMD"
)
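As a minimal sketch of what the mapping above implies (this is not NumPy's actual lookup code, and the helper name `in_min_baseline` is hypothetical), the minimum set can be queried per architecture to confirm that VSX3 and VSX4 fall outside the ppc64le baseline:

```python
# Copy of the mapping quoted above, reproduced so the sketch is self-contained.
conf_min_features = dict(
    x86="SSE SSE2",
    x64="SSE SSE2 SSE3",
    ppc64="",  # play it safe
    ppc64le="VSX VSX2",
    s390x="",
    armhf="",  # play it safe
    aarch64="NEON NEON_FP16 NEON_VFPV4 ASIMD",
)


def in_min_baseline(arch, feature):
    """Hypothetical helper: is `feature` in the minimum set for `arch`?"""
    # Unknown architectures are treated as having an empty minimum set.
    return feature in conf_min_features.get(arch, "").split()


for feature in ("VSX", "VSX2", "VSX3", "VSX4"):
    print(feature, in_min_baseline("ppc64le", feature))
```

Under this reading, features above the minimum, such as VSX3 and VSX4, would not be expected in a table that documents only the MIN sets.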

> Also, documentation is missing in the rendered table, which is generated by this code. I wonder why VSX4 does not show up there. Is something missing for Power10?

VSX4 does not show up there because I had not executed the script you referred to :-). After executing it, I see VSX4 (Power10) added to the table. I just pushed a commit with the table changes produced by the script. Note that the same was done in PR #20552.

Also related to PR #20552: I found some additional tests there and included them in this PR as well (see here: link).

> This should probably also get an enhancement release note

This is a great idea! However, I think it would be more valuable to add it in 3-4 weeks, when I will be adding optimizations that accelerate some operations for VSX4/Power10. Is it ok if I proceed this way?

Thanks again!

@mattip
Member

mattip commented Jan 18, 2022

> Is it ok if I proceed this way?

Just to be sure, I opened #20849 with a 1.23 milestone. Now we have to remember to close it when you submit the next PR :)

mattip merged commit f9c4596 into numpy:main on Jan 18, 2022
@mattip
Member

mattip commented Jan 18, 2022

Thanks @rafaelcfsousa

@InessaPawson
Member

Hi-five on merging your first pull request to NumPy, @rafaelcfsousa! We hope you stick around! Your choices aren’t limited to programming – you can review pull requests, help us stay on top of new and old issues, develop educational material, work on our website, add or improve graphic design, create marketing materials, translate website content, write grant proposals, and help with other fundraising initiatives. For more info, check out: https://numpy.org/contribute/.
Also, consider joining our mailing list. This is a great way to connect with other cool people in our community and be part of important conversations that affect the development of NumPy: https://mail.python.org/mailman/listinfo/numpy-discussion

Labels
01 - Enhancement 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes