gh-51067: add ZipFile.remove() #103033

Open: wants to merge 11 commits into base: main

Conversation

@danny0838 commented Mar 25, 2023

This is a revision of #19358 (for issue #51067), as the original author no longer seems to be working on it.

Notable changes:

  • Added docs and tests.

  • Support modes 'w' and 'x', as noted in remove/delete method for zipfile objects #51067 (comment).

  • Support removing multiple members, and removing members without physically moving data, through the internal _remove_members method, since some people may find these useful, as noted in remove/delete method for zipfile objects #51067 (comment) and remove/delete method for zipfile objects #51067 (comment).

    These are not currently exposed in the public remove API, because doing so would require more complicated changes to the public APIs (e.g. introducing error handling for multiple members, and an extra method that purges the stale data left by non-physical removal), and other ZipFile-related APIs do not support similar operations.

  • Move physical data in chunks, to prevent a memory issue for large files.

  • Fixed a flaw in the previous implementation where self.NameToInfo ended up with a missing key after removing one of several members that share the same arcname.
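For readers unfamiliar with the "move physical data in chunks" idea above, here is a minimal sketch of the general technique. It is not the code from this PR: the helper name `move_data_in_chunks` and its parameters are hypothetical, and it only handles moving data toward the start of the file, which is the direction needed when closing the gap left by a removed member.

```python
import io


def move_data_in_chunks(fp, src, dst, length, chunk_size=1 << 20):
    """Shift `length` bytes from offset `src` to an earlier offset `dst`.

    Hypothetical helper: only one chunk is held in memory at a time.
    Copying front to back is safe for overlapping ranges when dst <= src,
    because every read happens ahead of the position being written.
    """
    if dst > src:
        raise ValueError("this sketch only moves data toward the start")
    moved = 0
    while moved < length:
        n = min(chunk_size, length - moved)
        fp.seek(src + moved)
        chunk = fp.read(n)
        if not chunk:
            break  # unexpected end of file
        fp.seek(dst + moved)
        fp.write(chunk)
        moved += len(chunk)
    return moved


# Drop the middle "BBBB" block by sliding "CCCC" over it, then truncating.
buf = io.BytesIO(b"AAAABBBBCCCC")
moved = move_data_in_chunks(buf, src=8, dst=4, length=4, chunk_size=3)
buf.truncate(4 + moved)
result = buf.getvalue()
```

A real implementation would also have to update the header offsets of the moved members and rewrite the central directory, which this sketch leaves out.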


@ghost commented Mar 25, 2023

All commit authors signed the Contributor License Agreement.

@bedevere-bot

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@danny0838 danny0838 force-pushed the gh-51067 branch 3 times, most recently from 76722fa to f3450f1 Compare March 26, 2023 12:37
@barneygale (Contributor)

Automatically compacting the .zip file seems like overkill. Suggestion: split this into two methods:

  • ZipInfo.remove() removes the record for the file, and by default zeroes out its data.
  • ZipFile.repack() reclaims free space.

The latter method is dangerous for self-extracting .zip files, which have an executable header before the zip data begins. I think that header would be stripped out.
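To make the self-extracting concern concrete, here is a rough sketch of how one might check whether an archive has data in front of its first member, using only the existing `zipfile` API. The helper name `leading_data_size` and the stub bytes are made up for illustration, and the check assumes the first member starts right after any prepended data, which need not hold for every archive.

```python
import io
import zipfile


def leading_data_size(file):
    """Return the smallest member header offset in the archive.

    Hypothetical helper. zipfile adjusts each ZipInfo.header_offset for
    data prepended to the archive, so a non-zero minimum suggests a
    prefix such as a self-extractor stub.
    """
    with zipfile.ZipFile(file) as zf:
        offsets = [info.header_offset for info in zf.infolist()]
    return min(offsets, default=0)


# Build an ordinary archive in memory.
plain = io.BytesIO()
with zipfile.ZipFile(plain, "w") as zf:
    zf.writestr("hello.txt", "hello")

# Simulate a self-extracting archive by prepending a made-up stub.
stub = b"#!/bin/sh\n# hypothetical self-extractor stub\n"
sfx = io.BytesIO(stub + plain.getvalue())

prefix_len = leading_data_size(sfx)
plain_len = leading_data_size(plain)
```

A repacking routine could use a check like this to avoid moving member data into the region occupied by the prefix.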

@barneygale (Contributor) commented Mar 30, 2023

Also, please don't force-push to an open PR. It makes it harder for reviewers to follow changes! Thanks

@arhadthedev arhadthedev added the stdlib Python modules in the Lib dir label Mar 30, 2023
bpepple added a commit to Metron-Project/darkseid that referenced this pull request Aug 31, 2024
* Add remove method to ZipFile

Refer to: python/cpython#103033

* Make use of `ZipFileWithRemove`
@merwok merwok added the type-feature A feature request or enhancement label Apr 24, 2025
@python-cla-bot bot commented May 22, 2025

All commit authors signed the Contributor License Agreement.

@bedevere-app bot commented May 22, 2025

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

danny0838 added a commit to danny0838/cpython that referenced this pull request May 22, 2025
This is a revision of commit 659eb04 (PR python#19358), notably with the following changes:

- Add documentation and tests.
- Raise `ValueError` for a bad mode, as in other methods.
- Support multi-member removal in `_remove_members()`.
- Support non-physical removal in `_remove_members()`.
- Move physical file data in chunks to prevent excessive memory usage on large files.
- Fix missing entry in `self.NameToInfo` when removing a duplicated archive name.
- Also update `ZipInfo._end_offset` for physically moved files.

Co-authored-by: Éric <merwok@netwok.org>

(cherry picked from commit e6bc82a (PR python#103033))
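The `self.NameToInfo` fix mentioned in the commit message concerns archives that contain the same archive name more than once. The sketch below only shows how current `zipfile` behaves with such duplicates; it does not use the proposed `remove()` method. Note that `NameToInfo` is an internal attribute, accessed here purely for illustration.

```python
import io
import warnings
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "first")
    with warnings.catch_warnings():
        # Writing a second member with the same name emits a UserWarning.
        warnings.simplefilter("ignore", UserWarning)
        zf.writestr("a.txt", "second")

    # Both entries are kept in the file list...
    entries = [info.filename for info in zf.infolist()]
    # ...but the name lookup table only points at the most recent one.
    latest_is_last = zf.NameToInfo["a.txt"] is zf.infolist()[-1]

with zipfile.ZipFile(buf) as zf:
    visible = zf.read("a.txt")
```

Because the lookup table holds a single entry per name, removing one duplicate has to leave the table pointing at a remaining entry rather than simply deleting the key.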
danny0838 added 8 commits May 22, 2025 20:51
- The file is not truncated in mode 'w'/'x', which results in a file that is not shrunk.
- This cannot be resolved simply by adding truncation for mode 'w'/'x', since those modes may be used on an unseekable file buffer, where truncation is not allowed.
- The seek will be called automatically in `ZipFile.close`.
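As background for the truncation note above, the following sketch shows that `zipfile` can already write an archive in mode 'w' to a stream that cannot seek, and that such a stream rejects `truncate()`. The `WriteOnlySink` class is hypothetical and exists only for this illustration.

```python
import io
import zipfile


class WriteOnlySink(io.RawIOBase):
    """Hypothetical stream that accepts writes but cannot seek or truncate."""

    def __init__(self):
        super().__init__()
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        self.data += b
        return len(b)


sink = WriteOnlySink()
with zipfile.ZipFile(sink, "w") as zf:
    zf.writestr("a.txt", "hello")

# The base class does not support truncation on this stream.
try:
    sink.truncate(0)
    truncate_failed = False
except io.UnsupportedOperation:
    truncate_failed = True

# The bytes written to the sink still form a readable archive.
with zipfile.ZipFile(io.BytesIO(bytes(sink.data))) as zf:
    roundtrip = zf.read("a.txt")
```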
@danny0838 danny0838 changed the title gh-51067: add ZipInfo.remove() gh-51067: add ZipFile.remove() May 22, 2025
@danny0838 (Author)

This PR had become old, and the Python version it was based on could hardly be compiled anymore. Rebased onto the latest Python and made sure it builds, with a few trivial commits squashed together.

Labels: awaiting review; stdlib (Python modules in the Lib dir); type-feature (A feature request or enhancement)
5 participants