
FIX FunctionTransformer.get_feature_names_out when output is set to dataframe #31573


Status: Open (8 commits, base branch: main)

Conversation

@osyuksel osyuksel commented Jun 17, 2025

Reference Issues/PRs

Fixes #28780

What does this implement/fix? Explain your changes.

When FunctionTransformer:

  • has feature_names_out set to None
  • has set_output called with "pandas" or "polars"
  • is fitted

Then get_feature_names_out() should return the output dataframe's columns. However, it raises an AttributeError instead.

A specific example, copied from the issue's discussions:

from sklearn.preprocessing import FunctionTransformer
import pandas as pd

my_transformer = FunctionTransformer(
    lambda X: pd.DataFrame(
        {
            f"{col}^{power}": X[col] ** power
            for col in X
            for power in range(2, 4)
        }
    )
    # no feature_names_out
)
X = pd.DataFrame({
    "feature 1": [1, 2, 3, 4, 5],
    "feature 2": [3, 4, 5, 6, 7],
})
my_transformer.set_output(transform="pandas")
my_transformer.fit_transform(X)
# The next call raises:
# AttributeError: This 'FunctionTransformer' has no attribute 'get_feature_names_out'
my_transformer.get_feature_names_out()

Any other comments?

To get the output column names, I saw three options:

  1. Retrieve them when fit is called, by running func on the fitted data.
  2. Retrieve them when transform is called, by recording the output columns.
  3. Construct a dummy dataframe from the feature names when get_feature_names_out is called, and apply the function to it.

(1) means calling the function twice during a fit_transform, whereas (2) means making transform a stateful operation. (3) means that the specific example above, when called without an argument, would still fail.

I went with (1): during fit, call func on a small slice of the input to get the output dataframe's column names. If that's not desirable, let me know and I can change the approach.

The fix works both for pandas and polars dataframes.
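To make the trade-off of option (1) concrete, here is a rough sketch under simplifying assumptions. `HeadProbeTransformer` is a hypothetical stand-in written for this illustration, not the code in this PR, and it only handles pandas input.

```python
import pandas as pd


class HeadProbeTransformer:
    """Hypothetical stand-in for illustration; not scikit-learn's implementation."""

    def __init__(self, func):
        self.func = func

    def fit(self, X, y=None):
        # Run func on a one-row slice only to learn the output column names.
        head_out = self.func(X.iloc[:1])
        columns = getattr(head_out, "columns", None)
        self.output_columns_ = None if columns is None else list(columns)
        return self

    def transform(self, X):
        # transform stays stateless: it never touches fitted attributes.
        return self.func(X)

    def get_feature_names_out(self):
        if self.output_columns_ is None:
            raise AttributeError("func did not return a dataframe during fit")
        return self.output_columns_


calls = []  # records the number of rows func receives on each call


def powers(X):
    calls.append(len(X))
    return pd.DataFrame(
        {f"{col}^{power}": X[col] ** power for col in X for power in range(2, 4)}
    )


X = pd.DataFrame({
    "feature 1": [1, 2, 3, 4, 5],
    "feature 2": [3, 4, 5, 6, 7],
})
probe = HeadProbeTransformer(powers).fit(X)
out = probe.transform(X)
names = probe.get_feature_names_out()
```

After `fit` followed by `transform`, `calls` holds two entries: the one-row probe and the full input. That extra call is the cost of option (1), in exchange for keeping `transform` stateless.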

github-actions bot commented Jun 17, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 108a63f. Link to the linter CI: here

@osyuksel osyuksel changed the title from "FIX FunctionTransformer get_feature_names_out when fitted with a dataframe" to "FIX FunctionTransformer's get_feature_names_out when fitted with a dataframe" Jun 17, 2025
@osyuksel osyuksel changed the title from "FIX FunctionTransformer's get_feature_names_out when fitted with a dataframe" to "FIX FunctionTransformer's get_feature_names_out when output is set to dataframe" Jun 17, 2025
@osyuksel osyuksel force-pushed the fix-functransformer-df-output branch from d48690e to 158d7ed Compare June 18, 2025 15:09
@osyuksel osyuksel marked this pull request as ready for review June 18, 2025 15:12
@osyuksel osyuksel changed the title from "FIX FunctionTransformer's get_feature_names_out when output is set to dataframe" to "FIX FunctionTransformer.get_feature_names_out when output is set to dataframe" Jun 18, 2025
Comment on lines 470 to 473
if _is_polars_df(X) or _is_pandas_df(X):
    head = X.head(1)
else:
    head = X[:1]
Member

use _safe_indexing instead?

Author

Thanks! Updated code
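As a rough illustration of the suggestion above, `_safe_indexing` can take the first row of either a dataframe or an array through a single call, so the pandas/polars branch is not needed. This is a sketch, not the updated PR code, and `_safe_indexing` is a private scikit-learn utility whose behavior may differ between versions.

```python
import pandas as pd
from sklearn.utils import _safe_indexing

first_row = slice(0, 1)  # same selection as X[:1]

df = pd.DataFrame({"feature 1": [1, 2, 3], "feature 2": [3, 4, 5]})
arr = df.to_numpy()

# One call path for both container types, no type check required.
df_head = _safe_indexing(df, first_row)
arr_head = _safe_indexing(arr, first_row)
```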

head = X[:1]

head_out = self.func(head)
if _is_polars_df(head_out) or _is_pandas_df(head_out):
Member

we don't need to check for pandas or polars, do we? It's done in _get_feature_names

Author

Correct, missed that. Updated code

Successfully merging this pull request may close these issues.

FunctionTransformer need feature_names_out even if func returns DataFrame
2 participants