gh-137103: A better circular check for json.dump() #137104
Conversation
When check_circular=True (the default) is used, the JSON module creates a dict of markers and, for every container object it encodes, creates a new Long object from the object's pointer and stores it in that dict to detect circular references and avoid dumping the same object again. Other Python objects, such as list and dict, solve this problem with Py_ReprEnter/Py_ReprLeave, without creating a new Long object per tracked object. Use Py_ReprEnter/Py_ReprLeave for JSON as well.
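For illustration, here is a minimal sketch of the dict-of-markers scheme this PR replaces. The helper name `markers_enter` is hypothetical (it is not the actual _json.c code); it only shows the per-container Long allocation the description refers to.

```c
#include <Python.h>

/* Hypothetical helper sketching the marker-dict approach: box the
   container's address into a new int object and use it as a key in
   `markers`, so that re-entering the same container can be detected. */
static int
markers_enter(PyObject *markers, PyObject *obj)
{
    /* New PyLong allocated for every container being tracked. */
    PyObject *ident = PyLong_FromVoidPtr((void *)obj);
    if (ident == NULL) {
        return -1;
    }
    int seen = PyDict_Contains(markers, ident);
    if (seen < 0) {
        Py_DECREF(ident);
        return -1;
    }
    if (seen) {
        Py_DECREF(ident);
        PyErr_SetString(PyExc_ValueError, "Circular reference detected");
        return -1;
    }
    if (PyDict_SetItem(markers, ident, obj) < 0) {
        Py_DECREF(ident);
        return -1;
    }
    Py_DECREF(ident);
    return 0;
}
```

The matching exit path deletes the key again once the container has been fully serialized.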
Is Py_ReprEnter/Py_ReprLeave really meant to be used that way? It also allocates lists and dicts, so I don't know whether it's better than using ints.
More generally, is there a need to use PyDict and PyLong at all? Maybe we could use _Py_hashtable instead? I don't know whether using hashtables is faster, though.
I think it is meant to be used for that, and other objects already use it -- you can search around.
The big difference is that with a dict (the current markers) you have to create a hashable key object for every container being tracked (we only track lists and dicts), which means creating a new Long object each time. The Py_Repr* functions allocate a dict and a list only on the first call per thread and then do a linear scan, which is CPU-cache friendly. They do no allocations per newly tracked object.
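For reference, this is the documented calling pattern for Py_ReprEnter/Py_ReprLeave as a recursion guard; `encode_container` and `encode_inner` are hypothetical stand-ins for the encoder's recursive entry point, not the code in this PR.

```c
#include <Python.h>

static PyObject *encode_inner(PyObject *obj);  /* hypothetical recursive encoder */

static PyObject *
encode_container(PyObject *obj)
{
    int res = Py_ReprEnter(obj);
    if (res != 0) {
        if (res > 0) {
            /* The same container is already on the encoding stack. */
            PyErr_SetString(PyExc_ValueError, "Circular reference detected");
        }
        /* res < 0: Py_ReprEnter itself failed and set an exception. */
        return NULL;
    }
    PyObject *result = encode_inner(obj);
    Py_ReprLeave(obj);  /* must be paired with every successful Py_ReprEnter */
    return result;
}
```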
To me it doesn't seem like it, because it uses a specific key, namely Py_Repr.
While we don't allocate a full new object, we're still growing the list, or am I wrong? cc @serhiy-storchaka as the JSON expert
Interesting, I'll definitely port this to jsonyx if accepted. Could you decrease the diff though?
Who is the best person to ask? In the worst case we can copy the function and call it JsonEnter/JsonLeave because it works better than the current dict approach. Another option would be to change the internal API and make
It will increase the list size if there is insufficient space, but it will do so only a few times over the lifetime of the thread.
The first peculiarity is that it uses thread-local storage. The second difference is that it uses a list with a linear scan. We could use a hybrid approach --