
Copy-on-Write (PDEP-7) follow-up overview issue #48998

@jorisvandenbossche

PDEP-7: https://pandas.pydata.org/pdeps/0007-copy-on-write.html

An initial implementation was merged in #46958 (the proposal is described in more detail in https://docs.google.com/document/d/1ZCQ9mx3LBMy-nhwRl33_jgcvWo9IWdEfxDNQ2thyTb0/edit and was discussed in #36195).

In a comment on #36195, I listed some next steps that are still needed; I am moving them to this new issue.

Implementation

Complete the API surface:

Improve the performance:

  • Optimize setitem operations to avoid copying whole blocks (e.g. splitting the block would let us keep a view for all other columns and only take a copy of the columns that are modified)
  • Check the overall performance impact (e.g. run asv with and without CoW enabled by default and compare the results)
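A toy sketch of the block-splitting idea from the first bullet, under a simplified model that is not pandas' actual BlockManager: `Block`, `ToyFrame`, `derive`, `set`, `shares_column_with`, and the per-block reference counter are all hypothetical names for illustration. When a write hits a block that another frame still references, only the modified column is copied into its own block, and the remaining columns keep viewing the shared block.

```python
from typing import Dict, List, Tuple


class Block:
    """Hypothetical stand-in for an internal block: several columns stored together."""

    def __init__(self, columns: List[List[float]]) -> None:
        self.columns = columns  # one Python list per column
        self.refs = 0  # number of frames currently referencing this block


class ToyFrame:
    """Toy frame illustrating Copy-on-Write with block splitting on setitem."""

    def __init__(self, mgr: Dict[str, Tuple[Block, int]]) -> None:
        # column name -> (block holding the column, position inside that block)
        self._mgr = dict(mgr)
        for block in self._blocks():
            block.refs += 1

    def _blocks(self) -> List[Block]:
        # Unique blocks referenced by this frame, deduplicated by identity.
        return list({id(b): b for b, _ in self._mgr.values()}.values())

    @classmethod
    def from_columns(cls, data: Dict[str, List[float]]) -> "ToyFrame":
        # All columns start out in one consolidated block.
        block = Block([list(values) for values in data.values()])
        return cls({name: (block, i) for i, name in enumerate(data)})

    def derive(self) -> "ToyFrame":
        # A derived result shares every block with its parent; nothing is copied yet.
        return ToyFrame(self._mgr)

    def get(self, name: str, row: int) -> float:
        block, pos = self._mgr[name]
        return block.columns[pos][row]

    def shares_column_with(self, other: "ToyFrame", name: str) -> bool:
        # True if both frames still read this column from the same block.
        return self._mgr[name][0] is other._mgr[name][0]

    def set(self, name: str, row: int, value: float) -> None:
        block, pos = self._mgr[name]
        if block.refs > 1:
            # The block is shared: copy only the modified column into a new
            # single-column block instead of copying the whole block.
            new_block = Block([list(block.columns[pos])])
            new_block.refs = 1
            self._mgr[name] = (new_block, 0)
            # Release our reference to the old block once no column uses it.
            if all(b is not block for b, _ in self._mgr.values()):
                block.refs -= 1
            block, pos = new_block, 0
        block.columns[pos][row] = value
```

For example, after `child = parent.derive()` and `child.set("a", 0, 100.0)`, `parent` still reads the original value in column `a`, while `child.shares_column_with(parent, "b")` stays true. The cost of a write is then proportional to the columns actually modified; the trade-off is that frames end up with more, smaller blocks.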

Provide an upgrade path:

  • Add a warning mode that gives deprecation warnings for all cases where the current behaviour would change (initially also behind an option): CoW warning mode for cases that will change behaviour #56019
    • We can also update the message of the existing SettingWithCopyWarnings to point users towards enabling CoW as a way to get rid of the warnings
    • Add a general FutureWarning "on first use that would change" that is only raised a single time
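A minimal sketch of the two warning strategies above, using only Python's standard `warnings` module. The option dictionary, `set_cow_warn_mode`, and `warn_if_behaviour_changes` are hypothetical names for illustration, not pandas API: with the opt-in warning mode enabled, every affected operation warns; otherwise a single general FutureWarning is raised on the first affected use.

```python
import warnings

# Hypothetical option store; the per-case warning mode is opt-in.
_options = {"cow_warn_mode": False}
# Tracks whether the one-time general FutureWarning has already been raised.
_general_warning_emitted = False


def set_cow_warn_mode(enabled: bool) -> None:
    """Hypothetical switch for the opt-in, per-case warning mode."""
    _options["cow_warn_mode"] = enabled


def warn_if_behaviour_changes(operation: str) -> None:
    """Call from every code path whose result will differ under Copy-on-Write."""
    global _general_warning_emitted
    if _options["cow_warn_mode"]:
        # Warning mode: warn for each affected operation, naming it.
        warnings.warn(
            f"{operation} will behave differently under Copy-on-Write.",
            FutureWarning,
            stacklevel=2,
        )
    elif not _general_warning_emitted:
        # Default: raise one general FutureWarning, on the first affected use only.
        _general_warning_emitted = True
        warnings.warn(
            "Some operations will behave differently once Copy-on-Write is "
            "the default; enable the warning mode to see every affected case.",
            FutureWarning,
            stacklevel=2,
        )
```

Deduplicating with a module-level flag, rather than relying on Python's default warning filters, keeps the general warning to a single occurrence even when users configure `warnings.simplefilter("always")`.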

Documentation / feedback

Aside from finalizing the implementation, we also need to start documenting this, and it will be super useful to have people give this a try, run their code or test suites with it, etc., so we can iron out bugs and missing warnings, or discover unexpected consequences that need to be addressed or discussed.

  • Document this new feature (how it works, how you can test it)
  • We can still add a note to the 1.5 whatsnew linking to those docs
  • Write a set of blog posts on the topic
  • Gather feedback from users / downstream packages
  • Update existing documentation:
  • Write an upgrade guide

Some remaining aspects of the API to figure out:
