DOC: Update documentation for using natural sort with sort_values #61979

Merged

marc-jones
Contributor

@marc-jones marc-jones commented Jul 28, 2025

The previous documentation recommended using the lambda function lambda x: np.argsort(index_natsorted(x)) as the key argument to sort_values. This works when sorting on a single column, but it produces an incorrect order when sorting on multiple columns that contain duplicated values. For example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(
...     {
...         "hours": ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"],
...         "mins": ["10mins", "40mins", "40mins", "40mins", "10mins", "10mins"],
...         "value": [10, 20, 30, 40, 50, 60],
...     }
... )
>>> df
   hours    mins  value
0    0hr  10mins     10
1  128hr  40mins     20
2    0hr  40mins     30
3   64hr  40mins     40
4   64hr  10mins     50
5  128hr  10mins     60
>>> from natsort import index_natsorted
>>> df.sort_values(
...     by=["hours", "mins"],
...     key=lambda x: np.argsort(index_natsorted(x)),
... )
   hours    mins  value
0    0hr  10mins     10
2    0hr  40mins     30
3   64hr  40mins     40
4   64hr  10mins     50
1  128hr  40mins     20
5  128hr  10mins     60

Note how the hours column is sorted correctly, but the mins column is not: within the 64hr and 128hr groups, 40mins comes before 10mins.

This PR updates the documentation to use natsort_keygen, which is robust to sorting on multiple columns.
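
The difference between the two keys can be sketched without pandas or natsort. The block below is only an illustration under stated assumptions: natural_key is a hypothetical stand-in for a natsort-style key, positional_ranks mimics what np.argsort(index_natsorted(x)) returns, and the tuple sort imitates a multi-column sort that compares the keyed hours first and the keyed mins second. Because positional ranks are unique per row, duplicated hours never tie, so the mins key is never consulted.

```python
import re

hours = ["0hr", "128hr", "0hr", "64hr", "64hr", "128hr"]
mins = ["10mins", "40mins", "40mins", "40mins", "10mins", "10mins"]


def natural_key(text):
    # Hypothetical stand-in for a natsort-style key: split into digit and
    # non-digit runs so "64hr" compares as (64, "hr"), not char by char.
    return tuple(
        int(part) if part.isdigit() else part
        for part in re.findall(r"\d+|\D+", text)
    )


def positional_ranks(values):
    # Mimics np.argsort(index_natsorted(values)): each row receives its
    # position in the naturally sorted order, so equal values still get
    # different ranks.
    order = sorted(range(len(values)), key=lambda i: natural_key(values[i]))
    ranks = [0] * len(values)
    for rank, row in enumerate(order):
        ranks[row] = rank
    return ranks


rows = range(len(hours))
hour_ranks = positional_ranks(hours)
min_ranks = positional_ranks(mins)

# Rank-based keys: the hours ranks are all distinct, so mins never breaks a tie.
by_rank = sorted(rows, key=lambda i: (hour_ranks[i], min_ranks[i]))

# Value-based keys: equal hours compare equal, so mins decides within a group.
by_value = sorted(rows, key=lambda i: (natural_key(hours[i]), natural_key(mins[i])))

print(by_rank)   # [0, 2, 3, 4, 1, 5]
print(by_value)  # [0, 2, 4, 3, 5, 1]
```

The first result matches the row order in the incorrect output above. The second is the order a value-based key should give, which is the property relied on when natsort_keygen() is passed as the key argument to sort_values.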

Commit 2: Removes the calls to natsort_keygen() from the example code, because the generated output was too long and doctest did not seem to handle the formatted tuple.

@marc-jones marc-jones force-pushed the marc-jones/sort_values-documentation branch 3 times, most recently from 5f545f3 to 73852fa Compare July 28, 2025 10:38
@marc-jones marc-jones force-pushed the marc-jones/sort_values-documentation branch from 73852fa to 2a2cddf Compare July 28, 2025 10:38
@mroeschke mroeschke added the Docs label Jul 28, 2025
@mroeschke mroeschke added this to the 3.0 milestone Jul 28, 2025
@mroeschke mroeschke merged commit 94d9d2e into pandas-dev:main Jul 28, 2025
47 checks passed
@mroeschke
Member

Thanks @marc-jones
