adding pandas.api.typing.aliases and docs #61735

Open
wants to merge 8 commits into
base: main

Dr-Irv
Contributor

@Dr-Irv Dr-Irv commented Jun 29, 2025

This is my first proposal for adding the typing aliases that are "public" so that people do not import from pandas._typing.

@Dr-Irv Dr-Irv requested a review from rhshadrach June 29, 2025 03:38
@simonjayhawkins simonjayhawkins added the Typing type annotations, mypy/pyright type checking label Jun 30, 2025
Member

@rhshadrach rhshadrach left a comment

If we are to make these public, what is the process of making changes to them?

.. currentmodule:: pandas.api.atyping.aliases

The typing declarations in ``pandas/_typing.py`` are considered private, and used
by pandasdevelopers for type checking of the pandascode base. For users, it is

Member

Suggested change
by pandasdevelopers for type checking of the pandascode base. For users, it is
by pandas developers for type checking of the pandas code base. For users, it is

This also occurs more times below.

Contributor Author

fixed in next commit

@@ -83,6 +83,7 @@ Other enhancements
- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`)
- Added support to read and write from and to Apache Iceberg tables with the new :func:`read_iceberg` and :meth:`DataFrame.to_iceberg` functions (:issue:`61383`)
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)

Member

I would suggest not advertising where they come from.

Suggested change
- Certain aliases from :py:mod:`pandas._typing` are now exposed in :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)
- Many type aliases are now exposed in the new submodule :py:mod:`pandas.api.typing.aliases` (:issue:`55231`)

Contributor Author

in next commit

Axes,
Axis,
ColspaceArgType,
CompressionOptions,

Member

There are many type aliases here where it is not clear what method(s) they are appropriate for. E.g. it would be wrong to use this for DataFrame.to_parquet.

Contributor Author

I tried to cover that in the docs, without getting too specific. I can make the docs more specific, although there are cases where the aliases are used in lots of methods, so the list can get quite long. E.g., for CompressionOptions, I said "Argument type for compression in many I/O output methods".

Open to suggestions as to how to better document this.

Member

The only resolution I see is to introduce more aliases, e.g. ParquetCompressionOptions and CsvCompressionOptions. This would be my preference, but I can understand if there is an aversion to this.

In any case, if we deem something to be not "sufficiently good" I think we should refrain from releasing something new. That is my take on some of the aliases here, but I won't block if I'm alone.

Contributor Author

The current aliases follow what's in the code. So in your example, right now the type for compression in to_parquet() is str | None, while for to_csv() it is CompressionOptions. If we improve the typing in the code, then we can improve it here by introducing new aliases.

Member

Shouldn't those improvements be made prior to making them public?

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 1, 2025

If we are to make these public, what is the process of making changes to them?

My suggestion would be that if someone adds an alias to pandas._typing.py that is used as an argument or return type of a documented pandas method, then they should update the pandas/api/typing/aliases.py file and doc/source/reference/aliases.rst . Should I add something to the contributors guide about that?

@rhshadrach
Member

@Dr-Irv - my question is about how we go about changing the definition of aliases that we have already made public, not about adding new aliases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 1, 2025

@Dr-Irv - my question is about how we go about changing the definition of aliases that we have already made public, not about adding new aliases.

We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?

@rhshadrach
Member

We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?

And break user code without warning? Can we introduce such breakages in minor or patch releases? While most breakages I would expect to be of a type-checking nature and therefore an annoyance, type-hints can be enforced in runtime and changes in this regard can introduce runtime breakages as well.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

We just edit pandas._typing.py and we don't have to make changes elsewhere. Am I still misunderstanding your question?

And break user code without warning? Can we introduce such breakages in minor or patch releases? While most breakages I would expect to be of a type-checking nature and therefore an annoyance, type-hints can be enforced in runtime and changes in this regard can introduce runtime breakages as well.

I am pretty sure we can change the definition of an alias without breaking user code, unless people do introspection on those aliases, which is not a supported usage of aliases anyway. For example, let's say we implement a new sorting algorithm and change SortKind to include the new sorting method, user code won't break.

If we deleted or renamed an alias, then user code could potentially break. But at least my observation has been (by getting alerts to when anyone makes PRs that change pandas._typing.py) that we don't make such changes to pandas._typing.py (which would then propagate to pandas.api.typing.aliases).

The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?

@rhshadrach
Member

For example, let's say we implement a new sorting algorithm... user code won't break.

Or remove or rename an existing sorting algorithm?

unless people do introspection on those aliases, which is not a supported usage of aliases anyway

I think you're saying we don't support the enforcement of pandas type-aliases at runtime (e.g. use with Pydantic), is that right? Is this documented?

But at least my observation has been... that we don't [delete or rename type aliases]

That's fine, but I'm -1 here until we have a plan that is documented about how we would do so if such a case were to come up. I'm very flexible on what that plan could be, but there needs to be a plan.

The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?

These are public classes and need to go through the usual deprecation cycle if we were to remove or rename.

by pandas developers for type checking of the pandas code base. For users, it is
highly recommended to use the ``pandas-stubs`` package that represents the officially
supported type declarations for users of pandas.
Note that the definitions and use cases of these aliases are subject to change.

Member

I think this is implying that they are subject to change without any user notice. If that is the case, can this be made more explicit and put in a .. warning:: box. Perhaps something like

... are subject to change without notice in any major, minor, or patch release of pandas.

I would also be okay with only saying major or minor; it seems okay to me saying we can promise not to make changes in patch releases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

For example, let's say we implement a new sorting algorithm... user code won't break.

Or remove or rename an existing sorting algorithm?

So if we were to change the runtime allowable string for a sorting algorithm, e.g., "quicksort" becomes "Quicksort" or we were to remove "heapsort" from SortKind, and someone was using either "quicksort" or "heapsort" in their code, the code would fail at runtime. But that is independent of the alias changing its definition. In fact, if we updated the alias to do the renaming and/or removal, the type checker would pick up the change. My point here is that if we change the definition of the alias, if a user is not using the alias, their runtime code would break. If they were using the alias, which presumably would be for type checking, the type checker would pick it up for them.
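
A minimal sketch of that scenario, assuming a hypothetical alias ``SortKindDemo`` and function ``sort_demo`` (neither is the pandas definition): the alias has already dropped ``"heapsort"``, so a type checker reports the call below before it runs, while untyped code only discovers the change at runtime. The runtime check here reads the alias with ``typing.get_args`` purely for illustration.

```python
from typing import Literal, get_args

# Hypothetical alias after "heapsort" has been removed from the allowed values.
SortKindDemo = Literal["quicksort", "mergesort", "stable"]


def sort_demo(kind: SortKindDemo = "quicksort") -> str:
    """Hypothetical function that validates its argument at runtime."""
    if kind not in get_args(SortKindDemo):
        raise ValueError(f"unsupported kind: {kind!r}")
    return kind


try:
    # A type checker reports this argument because "heapsort" is no longer
    # part of the Literal; without type checking it fails only at runtime.
    sort_demo("heapsort")
    outcome = "accepted"
except ValueError as exc:
    outcome = str(exc)

print(outcome)
```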

unless people do introspection on those aliases, which is not a supported usage of aliases anyway

I think you're saying we don't support the enforcement of pandas type-aliases at runtime (e.g. use with Pydantic), is that right? Is this documented?

The code is inconsistent. Sometimes we check that the arguments are of the right possible values, sometimes we don't. But it is not related to the aliases themselves. My sense is that we shouldn't document this at all. We say that the aliases are for type checking.

But at least my observation has been... that we don't [delete or rename type aliases]

That's fine, but I'm -1 here until we have a plan that is documented about how we would do so if such a case were to come up. I'm very flexible on what that plan could be, but there needs to be a plan.

I think we have to treat them like we do other code changes. Not sure where to document that.

The renaming issue probably exists for everything in pandas.api.typing - have we committed to those names as well?

These are public classes and need to go through the usual deprecation cycle if we were to remove or rename.

So we can do that if we decide to rename or delete an alias, right?

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 2, 2025

Also worth mentioning that @simonjayhawkins suggested making this "experimental" in #55231 (comment) although I'm not sure that's the right word here. I think the warning you suggested covers this, and I have added that in the most recent commit.

@rhshadrach
Member

I think we have to treat [changes to type aliases] like we do other code changes.

I do not think this is possible. To my knowledge we have no process to warn users of the upcoming change to a type alias. This is unlike other parts of the pandas code where we can emit deprecation warnings, put behaviors behind flags, and the like. Happy to be wrong here; to make this explicit could you detail how we'd go about adding or removing a case to ArrayLike?

My sense is that we shouldn't document this at all. We say that the aliases are for type checking.

A large part of the community is also enforcing type-hints at runtime, e.g. via Pydantic. It seems to me if we are going to make these public, we should not handcuff users by disallowing this kind of usage.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 3, 2025

I think we have to treat [changes to type aliases] like we do other code changes.

I do not think this is possible. To my knowledge we have no process to warn users of the upcoming change to a type alias. This is unlike other parts of the pandas code where we can emit deprecation warnings, put behaviors behind flags, and the like. Happy to be wrong here; to make this explicit could you detail how we'd go about adding or removing a case to ArrayLike?

I don't think we have to notify in this case. TypeAlias is only used for type checking. There is nothing about the definition that affects runtime behavior.

A large part of the community is also enforcing type-hints at runtime, e.g. via Pydantic. It seems to me if we are going to make these public, we should not handcuff users by disallowing this kind of usage.

Yes, but I don't think you can enforce TypeAlias type-hints at runtime. You can enforce it on classes and basic python types, but not aliases.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2025

Yes, but I don't think you can enforce TypeAlias type-hints at runtime. You can enforce it on classes and basic python types, but not aliases.

For example - you can't call isinstance() on a TypeAlias:

>>> from pandas._typing import ArrayLike
>>> ArrayLike
typing.Union[ForwardRef('ExtensionArray'), numpy.ndarray]
>>> import numpy as np
>>> arr=np.array([1,2,3])
>>> isinstance(arr, np.ndarray)
True
>>> isinstance(arr, ArrayLike)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1260, in __instancecheck__
    return self.__subclasscheck__(type(obj))
  File "C:\Condadirs\envs\pandasstubs\lib\typing.py", line 1264, in __subclasscheck__
    if issubclass(cls, arg):
TypeError: issubclass() arg 2 must be a class, a tuple of classes, or a union

So these only have value in type declarations.

@rhshadrach
Member

import numpy as np
from pydantic_settings import BaseSettings
from pandas._typing import ArrayLike

class Foo(BaseSettings):
    x: ArrayLike

Foo(x=np.ndarray([1, 2]))  # Succeeds
Foo(x=1)  # ValidationError
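
As a rough illustration of how a runtime validator can consume an alias even though ``isinstance()`` cannot, the members of a union alias can be read with ``typing.get_args``. This is only a sketch: ``DemoArrayLike`` and ``DemoExtensionArray`` are hypothetical stand-ins, and this is not a description of how Pydantic is implemented.

```python
from typing import Union, get_args

# Hypothetical alias with one unresolved forward reference and one real class,
# mirroring the shape of Union[ForwardRef('ExtensionArray'), numpy.ndarray].
DemoArrayLike = Union["DemoExtensionArray", bytearray]

# The alias is an ordinary runtime object whose members can be inspected.
members = get_args(DemoArrayLike)

# Keep only the members that are real classes; the forward reference is skipped.
concrete = tuple(m for m in members if isinstance(m, type))

accepts_bytearray = isinstance(bytearray(b"ab"), concrete)
accepts_int = isinstance(1, concrete)
print(concrete, accepts_bytearray, accepts_int)
```

Code that introspects an alias this way depends on its exact definition, so a change to the alias changes what such code accepts at runtime.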

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 4, 2025

from pydantic_settings import BaseSettings
from pandas._typing import ArrayLike

class Foo(BaseSettings):
    x: ArrayLike

Foo(x=np.ndarray([1, 2]))  # Succeeds
Foo(x=1)  # ValidationError

I’m without laptop for 2 weeks and on a plane about to take off but I’m pretty sure the type checkers would also flag this as an error.

I wouldn’t expect people to use the aliases without type checking turned on. So the error above would be caught before runtime, i.e. by the type checkers. So if we assume people importing an alias would type check their code before executing it, then we should be fine.

I’m fine to put in the docs something that explains that if you think that helps.

@jbrockmendel
Member

This came up on today's dev call, where the closest I came to an opinion was "I will offer moral support to both Irv and Richard".

The idea came up of applying special backwards-compatibility rules to this file to the effect of "Warning: may change without warning" which I think is reasonable given the difficulty of doing deprecations here.

Also AFAICT most of these are lists of string literals which I'm just not going to lose sleep over libraries not having aliases for. That said, I'm happy with my default of "defer to Irv on anything stubs-adjacent".

@mroeschke
Member

@jorenham would be interested to have your thoughts on our approach of exposing typing aliases and Numpy's approach too

@jorenham

jorenham commented Jul 23, 2025

@jorenham would be interested to have your thoughts on our approach of exposing typing aliases and Numpy's approach too

Thanks for the ping :)


I see that there are a lot of type aliases. What will happen if at some later point you want to remove one of them? As far as I know, there's no good way to have them throw a warning at runtime when they're used, and on the static side there's also nothing like @deprecated that can be used for it. It's way easier to add type-aliases than to remove them. So my advice here would be to limit the public types to the most commonly used ones that are battle-tested (and therefore likely to work as intended).


The first type I took a closer look at, AggFuncType, is one such example of a type that might not work as intended. This is how it is defined:

AggFuncTypeBase: TypeAlias = Callable | str
AggFuncTypeDict: TypeAlias = MutableMapping[
    Hashable, AggFuncTypeBase | list[AggFuncTypeBase]
]
AggFuncType: TypeAlias = AggFuncTypeBase | list[AggFuncTypeBase] | AggFuncTypeDict

First thing to note is that Callable is missing its required type arguments. Pyright, for example, will consequently fill in the missing type args as Unknown. Because of this, users that have pyright configured in strict mode will see a pyright error when they try to use AggFuncType.
The obvious way to avoid this category of problems is by (also) configuring your static type-checkers to run in strict mode, as can be seen on mypy-play and pyright-play.

The AggFuncTypeDict alias uses the list and MutableMapping types. Both have invariant type parameters. That means that list[AggFuncTypeBase], for example, will only accept things whose type is exactly list[AggFuncTypeBase], i.e. list[Callable | str]. So it will reject list[str], and it will reject list[Callable].
MutableMapping is also invariant in both its key- and value-type parameters. So dict[str, Any] will be rejected, because str is not equivalent to Hashable.
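
A small sketch of the invariance point, using hypothetical stand-in aliases rather than the pandas definitions. ``FuncBase`` gives ``Callable`` explicit type arguments, and the only difference between the two functions is ``list`` (invariant) versus ``Sequence`` (covariant) in the parameter type. Both calls run fine; the difference shows up only in a static type checker.

```python
from collections.abc import Callable, Sequence
from typing import Any, Union

# Hypothetical stand-ins, not the pandas aliases.
FuncBase = Union[Callable[..., Any], str]  # Callable with explicit arguments
StrictFuncs = list[FuncBase]               # list is invariant
LooseFuncs = Sequence[FuncBase]            # Sequence is covariant


def count_strict(funcs: StrictFuncs) -> int:
    return len(funcs)


def count_loose(funcs: LooseFuncs) -> int:
    return len(funcs)


names: list[str] = ["sum", "mean"]

# Accepted statically: list[str] is a Sequence[str], which is compatible
# with Sequence[Callable[..., Any] | str].
n_loose = count_loose(names)

# Passing `names` to count_strict would be reported by a type checker,
# because list[str] is not list[Callable[..., Any] | str].
print(n_loose)
```

Note that switching to ``Mapping`` would not solve the key-type problem described above in the same way, since ``Mapping`` is also invariant in its key type.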

Since this was the first type-alias I looked at, I'm assuming that there are more types like this that might not work as intended.


If I were in your shoes, I'd write a whole bunch of type-tests to verify that these types accept what you want them to accept, and that they reject what you want them to reject. For the types that you use a lot already (i.e. the battle-tested ones), there's a smaller chance that they're not working as intended. So when it comes to making them public, if you don't feel like writing type-tests, those are the ones I'd start with.

Oh and in case you're wondering what I mean with those "type-tests", it's probably easiest to just look at some examples of those, e.g. in scipy-stubs or in numtype (a thorough rework of numpy's typing stubs with a focus on correctness).
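
For a rough idea of the shape such type-tests can take, here is a minimal sketch with hypothetical aliases (``LabelDemo``, ``LabelsDemo``) and a hypothetical function ``normalize``; it is not taken from scipy-stubs, numtype, or pandas-stubs. Acceptance tests are ordinary calls that must type check. The rejection test sits under ``TYPE_CHECKING`` so it never runs, and relies on the assumption that mypy is run with ``--warn-unused-ignores``, which turns the ignore comment into an error if the call ever starts being accepted.

```python
from collections.abc import Sequence
from typing import TYPE_CHECKING, Union

# Hypothetical aliases under test.
LabelDemo = Union[int, str]
LabelsDemo = Union[LabelDemo, Sequence[LabelDemo]]


def normalize(labels: LabelsDemo) -> list[LabelDemo]:
    """Hypothetical function whose parameter uses the alias."""
    if isinstance(labels, (int, str)):
        return [labels]
    return list(labels)


# Acceptance tests: each call must pass the type checker (and runs here too).
one = normalize(1)
many = normalize(["a", "b"])
ints: list[int] = [1, 2]
from_ints = normalize(ints)  # accepted because Sequence is covariant

if TYPE_CHECKING:
    # Rejection test: only analysed statically, never executed.
    normalize(1.5)  # type: ignore[arg-type]
```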

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 23, 2025

If I were in your shoes, I'd write a whole bunch of type-tests to verify that these types accept what you want them to accept, and that they reject what you want them to reject. For the types that you use a lot already (i.e. the battle-tested ones), there's a smaller chance that they're not working as intended. So when it comes to making them public, if you don't feel like writing type-tests, those are the ones I'd start with.

We do typing tests in pandas-stubs. We don't support strict type checking yet with pyright because the stubs go beyond what is in pandas. E.g., in the stubs, you can have Series[int] but also Series[Unknown] and pyright strict doesn't like the latter.

The goal of this PR was to expose some of the internal types used in the stubs (currently in pandas/_typing.py) into a public module.

So the question we really had for you @jorenham is not about the aliases themselves and how they are defined, but whether we should worry or not about deleting (or changing the definition) of the aliases in the future. There's not a way we can deprecate an alias in the context of type checking. Are you doing anything special in numpy to worry about how the aliases might be deleted or changed in the future?

@rhshadrach
Member

@jorenham

As far as I know, there's no good way to have them throw a warning at runtime when they're used

Can deprecate with a module level __getattr__. I don't like doing it, but it's possible, and seems like an okay solution in this case.

@jorenham

E.g., in the stubs, you can have Series[int] but also Series[Unknown] and pyright strict doesn't like the latter.

PEP 696 type parameter defaults could help with that :)

The goal of this PR was to expose some of the internal types used in the stubs (currently in pandas/_typing.py) into a public module.

Yea I get that, and I think it's a good idea, and that many users will be very happy about it. However, if those type aliases are not working as intended, then it might cause more problems than it solves.

It might help a bit if you explicitly state that pandas does not support strict mode. But that still leaves the issues with invariance, which are unrelated to type-checker configuration.

@jorenham

There's not a way we can deprecate an alias in the context of type checking. Are you doing anything special in numpy to worry about how the aliases might be deleted or changed in the future?

That's indeed a very tricky thing. Especially if you consider that no type-checker would understand statements like if pd.__version__ < .... Libraries that support multiple pandas versions (e.g. because they follow SPEC 0) would be in trouble if you change or rename a type alias.

In NumPy we recently deprecated numpy.typing.NBitBase, but I don't expect that we'll be able to remove that for a couple of years. FWIW, I noticed that even tiny backwards-incompatible typing changes can lead to a lot of frustrated users. I'm guessing that's because no one likes it if CI breaks after you update one of your libraries, even if the motivation behind it makes a lot of sense.

@jbrockmendel
Member

Can deprecate with a module level __getattr__

Discussed on today's dev call, sounded like this might not work bc type checkers don't actually execute imports.

@Dr-Irv
Contributor Author

Dr-Irv commented Jul 24, 2025

In NumPy we recently deprecated numpy.typing.NBitBase, but I don't expect that we'll be able to remove that for a couple of years.

@jorenham Did you instrument anything that provides some type of warning to users if someone was using numpy.typing.NBitBase in a typing context? If so, what did you do?

@jorenham

In NumPy we recently deprecated numpy.typing.NBitBase, but I don't expect that we'll be able to remove that for a couple of years.

@jorenham Did you instrument anything that provides some type of warning to users if someone was using numpy.typing.NBitBase in a typing context? If so, what did you do?

Well, NBitBase is secretly not a type alias but a class that's pretending to be one. So we kinda got lucky in that sense. But that also means that it's probably not the best example of how to deprecate a type alias 😅.

Anyway, by exploiting the fact that it's a class, I was able to simply slap a @typing_extensions.deprecated onto it. That way, static type-checkers will report it as deprecated (although that's not enabled by default in mypy for some reason).

On the runtime side of things, I used the same __getattr__ approach that @rhshadrach mentioned, so that it'll report a DeprecationWarning when imported at runtime.

See numpy/numpy#28884 for details.
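
The module-level ``__getattr__`` approach (PEP 562) can be sketched as follows. This is an illustration under stated assumptions, not the NumPy or pandas implementation: ``aliases_demo`` and ``OldKind`` are hypothetical, and the module is built in memory so the example is self-contained. In a real package the function would simply be named ``__getattr__`` at the top level of the module file.

```python
import types
import warnings
from typing import Union

# Hypothetical stand-in for a public aliases module.
aliases_demo = types.ModuleType("aliases_demo")

# Deprecated names are kept out of the module namespace so that every
# access goes through the module-level __getattr__ hook.
_DEPRECATED = {"OldKind": Union[int, str]}


def _module_getattr(name: str):
    if name in _DEPRECATED:
        warnings.warn(
            f"{name} is deprecated and will be removed in a future version.",
            DeprecationWarning,
            stacklevel=2,
        )
        return _DEPRECATED[name]
    raise AttributeError(f"module 'aliases_demo' has no attribute {name!r}")


# Equivalent to defining `def __getattr__(name)` inside the module file.
aliases_demo.__getattr__ = _module_getattr

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    resolved = aliases_demo.OldKind  # emits the DeprecationWarning

print(resolved, [str(w.message) for w in caught])
```

This only covers runtime access. As noted earlier in the thread, static type checkers do not execute imports, so they would not surface this warning.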

Labels
Typing type annotations, mypy/pyright type checking
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Export (a subset of?) pandas._typing for type checking
6 participants