
ENH Add array API for PolynomialFeatures #31580


Open · wants to merge 14 commits into main
Conversation

OmarManzoor
Contributor

Reference Issues/PRs

Towards #26024

What does this implement/fix? Explain your changes.

  • Array API implementation for preprocessing.PolynomialFeatures

Any other comments?

@OmarManzoor OmarManzoor changed the title Array api poly features ENH Add array API for PolynomialFeatures Jun 18, 2025

github-actions bot commented Jun 18, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6620df5. Link to the linter CI: here

@OmarManzoor
Contributor Author

OmarManzoor commented Jun 18, 2025

Some benchmarks:

Kaggle notebook

Avg fit time for numpy: 0.007334113121032715
Avg transform time for numpy: 3.54639573097229

Avg fit time for torch cuda: 0.050480985641479494
Avg transform time for torch cuda: 0.011217021942138672

from time import time

import numpy as np
import torch as xp
from tqdm import tqdm

from sklearn._config import config_context
from sklearn.preprocessing._polynomial import PolynomialFeatures

X_np = np.random.rand(100000, 100)
X_xp_cuda = xp.asarray(X_np, device="cuda")

# Numpy benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(10), desc="Numpy Flow"):
    start = time()
    pf_np = PolynomialFeatures(degree=2)
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time = sum(fit_times) / 10
avg_transform_time = sum(transform_times) / 10
print(f"Avg fit time for numpy: {avg_fit_time}")
print(f"Avg transform time for numpy: {avg_transform_time}")


# Torch cuda benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(10), desc="Torch cuda Flow"):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = PolynomialFeatures(degree=2)
        pf_xp.fit(X_xp_cuda)
        fit_times.append(time() - start)

        start = time()
        pf_xp.transform(X_xp_cuda)
        transform_times.append(time() - start)

avg_fit_time = sum(fit_times) / 10
avg_transform_time = sum(transform_times) / 10
print(f"Avg fit time for torch cuda: {avg_fit_time}")
print(f"Avg transform time for torch cuda: {avg_transform_time}")

Local System with MPS (just changed device and dtype to float32 in the above code)

Avg fit time for numpy: 0.0025035619735717775
Avg transform time for numpy: 1.2987385749816895

Avg fit time for torch mps: 0.16063039302825927
Avg transform time for torch mps: 0.051826977729797365


I don't think we can expect any improvement in the fit time (there is actually some regression): nothing in the fit part needed to change to support the array API, so fit cannot really benefit from it. The transform times, however, are significantly better.
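For context on what transform computes (the part that now dispatches through the array API and benefits from the GPU), here is a minimal pure-Python sketch of the degree-2 expansion per sample: a bias column, the linear terms, then all pairwise products. This mirrors the output order of PolynomialFeatures(degree=2) but is not scikit-learn's actual implementation.

```python
def degree2_features(row):
    # Degree-2 polynomial expansion of one sample: bias term, the
    # original features, then every product x_i * x_j with i <= j.
    out = [1.0] + list(row)
    n = len(row)
    for i in range(n):
        for j in range(i, n):
            out.append(row[i] * row[j])
    return out

print(degree2_features([1.0, 2.0]))  # [1.0, 1.0, 2.0, 1.0, 2.0, 4.0]
```

With two input features this yields 1 + 2 + 3 = 6 output features, matching PolynomialFeatures(degree=2) on a 2-column input.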

CC: @ogrisel @lesteve @lucyleeow @StefanieSenger for reviews

@OmarManzoor
Contributor Author

The code coverage warning can be ignored: it relates to a special case for MPS devices.

Member

@ogrisel ogrisel left a comment


Thanks @OmarManzoor. This looks good to me.

I edited the benchmark as follows to insert a blocking call on the resulting array to force CUDA synchronization. We still get a 36x speed-up on this data and runtime!

from time import time

import numpy as np
import torch as xp
from tqdm import tqdm

from sklearn._config import config_context
from sklearn.preprocessing._polynomial import PolynomialFeatures

X_np = np.random.rand(100000, 100)
X_xp_cuda = xp.asarray(X_np, device="cuda")

# Numpy benchmarks
fit_times = []
transform_times = []
n_iter = 10
for _ in tqdm(range(n_iter), desc="Numpy Flow"):
    start = time()
    pf_np = PolynomialFeatures(degree=2)
    pf_np.fit(X_np)
    fit_times.append(time() - start)

    start = time()
    pf_np.transform(X_np)
    transform_times.append(time() - start)

avg_fit_time_numpy = sum(fit_times) / n_iter
avg_transform_time_numpy = sum(transform_times) / n_iter
print(f"Avg fit time for numpy: {avg_fit_time_numpy:.3f}")
print(f"Avg transform time for numpy: {avg_transform_time_numpy:.3f}")


# Torch cuda benchmarks
fit_times = []
transform_times = []
for _ in tqdm(range(n_iter), desc="Torch cuda Flow"):
    with config_context(array_api_dispatch=True):
        start = time()
        pf_xp = PolynomialFeatures(degree=2)
        pf_xp.fit(X_xp_cuda)
        fit_times.append(time() - start)

        start = time()
        float(pf_xp.transform(X_xp_cuda)[0, 0])
        transform_times.append(time() - start)

avg_fit_time_cuda = sum(fit_times) / n_iter
avg_transform_time_cuda = sum(transform_times) / n_iter
print(
    f"Avg fit time for torch cuda: {avg_fit_time_cuda:.3f}, "
    f"speed-up: {avg_fit_time_numpy / avg_fit_time_cuda:.1f}x"
)
print(
    f"Avg transform time for torch cuda: {avg_transform_time_cuda:.3f} "
    f"speed-up: {avg_transform_time_numpy / avg_transform_time_cuda:.1f}x"
)
Numpy Flow: 100%|██████████| 10/10 [00:37<00:00,  3.70s/it]

Avg fit time for numpy: 0.008
Avg transform time for numpy: 3.695

Torch cuda Flow: 100%|██████████| 10/10 [00:01<00:00,  9.76it/s]

Avg fit time for torch cuda: 0.001, speed-up: 6.8x
Avg transform time for torch cuda: 0.100 speed-up: 36.8x

I think the supported_float_dtypes function could be simplified by leveraging the new inspection API. Otherwise, +1 for merge.

@ogrisel
Member

ogrisel commented Jun 18, 2025

I also get a 5.5x speed-up over numpy using the MPS GPU on my M1 laptop (compared to your 25x speed-up on your MPS GPU).

@github-actions github-actions bot removed the CUDA CI label Jun 20, 2025
Member

@ogrisel ogrisel left a comment


One more follow-up comment below.

Besides, the __sklearn_tags__ method should be updated to declare that this transformer supports array API inputs.
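For reference, declaring array API support via tags follows roughly this pattern in recent scikit-learn versions; the classes below are stdlib mocks standing in for the real scikit-learn machinery, reduced to the one relevant flag.

```python
from dataclasses import dataclass


@dataclass
class Tags:
    # Stand-in for scikit-learn's tags dataclass; only the flag under
    # discussion is mocked here.
    array_api_support: bool = False


class BaseEstimator:
    def __sklearn_tags__(self):
        return Tags()


class PolynomialFeatures(BaseEstimator):
    def __sklearn_tags__(self):
        # Declare array API support on top of the inherited defaults.
        tags = super().__sklearn_tags__()
        tags.array_api_support = True
        return tags


print(PolynomialFeatures().__sklearn_tags__().array_api_support)  # True
```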

    return (xp.float64, xp.float32, xp.float16)
else:
    return (xp.float64, xp.float32)

valid_float_dtypes.append(xp.float16)
Member


Is this still needed? I think it can be wrong: some devices might not support float16 even when the namespace exposes it.

Contributor Author


The array API specification does not include float16, which is why we have this condition. https://data-apis.org/array-api/latest/API_specification/data_types.html

kind="real floating", device=device
)
valid_float_dtypes = []
for dtype_key in ("float64", "float32"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change:
- for dtype_key in ("float64", "float32"):
+ for dtype_key in ("float64", "float32", "float16"):
