-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
gh-134004: Dbm vacuuming #134028
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-134004: Dbm vacuuming #134028
Conversation
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
…ce to module links not found
…m.dumb for consistency with dbm.gnu. Also updated documentations and tests to reflect the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good code.
However, there are few issues.
-
dbm.dumb
does not support concurrent writing, but it supports limited concurrent access with a single writer and multiple readers. Readers may not see new data, and can read deleted data (because they cache the index), but otherwise they do not conflict with writer. But when writer callsreorganize()
, it breaks readers. So using this feature can cause new bugs.I am not sure that we can to do something with this.
-
Other issue is that
reorganize()
is inherently not fail-safe. If something happens during the reorganization process, you may lose data.
Do gdbm_reorganize()
and Sqlite VACUUM have the same issues or they provide some protection?
Good morning @serhiy-storchaka, Easy changesI have pushed changes to address the documentation and dbm.dumb comment remarks. Organize not being fail-safeOn the topic of
Let me know how you think we should proceed. SQLITE and GDBM are designed to be much more robust and use the second technique to ensure failed reorganizations don't corrupt the database. Crucially, however, they handle a single binary file with both index and data, hence either everything is renamed atomically or nothing is. ConcurrencyMy apologies, but I would disagree Examples:
Of course this problem is well-known to any developer who looked at the source code for One thing that was already well explained in the documentation, however:
Please let me know how you would like to proceed on this issue as well. In the meantime, have a great day! 😃 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your response. What is left:
- Add
.. versionadded:: next
for all new methods. - Add What's New entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
Thank you for your contribution, @Andrea-Oliveri. Please ping me next week if I don't merge this PR before then.
Thank you @serhiy-storchaka 😄 . Just FYI, since I had merge conflicts due to the whatsnew I have merged with main between you marking as approved and me reading the message. |
This is great; thanks for the proposal and implementation, @Andrea-Oliveri; thanks for managing this, @serhiy-storchaka. |
gh-134004:
.reorganize()
method todbm.sqlite
anddbm.dumb
which allows to retrieve unused space from database files. Name was chosen for consistency withdbm.gnu
.reorganize().
dbm.ndbm` does not support any such optimization.dbm.dumb.reorganize()
is a new custom implementation rewriting the database file in-place and removing the gaps left by deletions, then truncating the excess..reorganize()
runs as expected on all compatibledbm
submodules.dbm
module to include contributed methods as well as to highlight some of the limitations of the current implementations, such as lack of concurrency compatibility (fordbm.dumb
) and lack of automatic reorganizing.reorganize()
method toshelve
to expose dbm's submodules methodsshelve.reorganize()
Note: Unfortunately I did not feel comfortable changing the C code at this time, so I chose not to add an empty
.reorganize()
method todbm.ndbm
. This resulted in:dbm
submodules.hasattr()
, whether thedbm
objects have the.reorganize()
method and succeed early if that is not the case.