MNT: add linter for thread-unsafe C API uses #28634

Closed
wants to merge 1 commit into from

Contributor

@lvllvl lvllvl commented Apr 2, 2025

This PR adds a linter to audit PRs.
It would flag PRs that use problematic functions such as:

  • PyList_GetItem
  • PyList_GET_ITEM
  • PyDict_GetItem
  • PyDict_GetItemWithError
  • PyDict_Next
  • PyDict_GetItemString
  • _PyDict_GetItemStringWithError

Attempts to resolve #26159
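
A minimal sketch of what such a check could look like, written in Python for illustration. The function names come from the list above (spelled as in the CPython C API); the scanning approach, `BORROWED_REF_FUNCS`, and `find_borrowed_ref_calls` are assumptions made for this example and are not the script in this PR.

```python
import re

# Names from the list in this PR description. Everything else in this
# sketch (constant names, function name, approach) is hypothetical.
BORROWED_REF_FUNCS = [
    "PyList_GetItem",
    "PyList_GET_ITEM",
    "PyDict_GetItem",
    "PyDict_GetItemWithError",
    "PyDict_Next",
    "PyDict_GetItemString",
    "_PyDict_GetItemStringWithError",
]

# The lookbehind rejects matches that start in the middle of a longer
# identifier; the trailing "(" rejects longer names such as
# PyList_GetItemRef.
PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])("
    + "|".join(re.escape(name) for name in BORROWED_REF_FUNCS)
    + r")\s*\("
)


def find_borrowed_ref_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for every flagged call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Because the check is purely textual, it cannot tell safe uses from unsafe ones; it only points a reviewer at the call sites.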

@lvllvl lvllvl force-pushed the issue-26159 branch 5 times, most recently from 62be0b4 to b77b43a Compare April 3, 2025 01:50
@lvllvl lvllvl marked this pull request as ready for review April 3, 2025 02:19
@ngoldbaum
Member

ngoldbaum commented Apr 3, 2025

Does this need a whole new CI job? I know I suggested that in the issue, but it occurs to me that this could be integrated into e.g. the spin lint command.

Also, rather than just checking diffs, we could do this codebase-wide and add e.g. // noqa comments throughout the codebase, since it is safe to use borrowed references in some situations. Either way, I think having a linter that alerts contributors and reviewers about the problem is good.

If it ends up being really invasive to add the lint opt-outs we can reconsider.

If the lint fails, it should suggest what to do to fix the issue. One of:

  • use an API that returns a strong reference

or, add an opt-out to the lint if:

  • you are very sure that a lock is held
  • you are very sure you have the only reference to the object (e.g. it is newly created in the local scope).
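
As a rough illustration of the failure message described above, a linter could format its output along these lines. `format_lint_failure` is a hypothetical helper, and the opt-out marker text is the one that appears later in this thread; neither is confirmed to match the final script.

```python
def format_lint_failure(path: str, lineno: int, func: str) -> str:
    """Build a failure message that tells the contributor how to proceed."""
    return "\n".join([
        f"{path}:{lineno}: {func} hands out a borrowed reference.",
        "Fix it in one of these ways:",
        "  * use an API that returns a strong reference, or",
        "  * add an opt-out comment (// noqa: borrowed-ref OK) if you are very",
        "    sure that a lock is held, or that you hold the only reference",
        "    to the object (e.g. it is newly created in the local scope).",
    ])
```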

The linter should also probably check all of the functions listed in the free-threaded HOWTO in the CPython docs:

https://docs.python.org/3/howto/free-threading-extensions.html#borrowed-references

Also see here for more details about why we need to do this and why we can't just globally replace all these functions:

https://py-free-threading.github.io/porting-extensions/#cpython-c-api-usage

We probably should also be linting uses of fast accessor macros that don't do any locking.

Also, FWIW, there are probably still lurking issues. I'm aware of PySequence_Fast at least (#28046); I've just had more pressing things to work on and haven't been able to finish that one off. I'm hoping to finish off that work soon, or at least upstream the trivial fixes I have but leave an issue open for the more complicated issues in the array coercion code.

Contributor

@mhvk mhvk left a comment

I think this goes in the right direction, but may need a bit of care. Overall, perhaps good to keep it simple, and maybe just use git grep for all, like,

git grep -e PyList_GetItem\( --or -e PyList_GET_ITEM\( ... origin/main...HEAD

Or perhaps better write the list to a file and use -f.

Notes:

  1. important to add the ( otherwise correct functions like PyList_GetItemRef will be matched too.
  2. You have to be sure to match all C files, so also .c.src and .cpp, etc. My suggestion would be to just match all files and perhaps remove .rst... since this only runs on the differences, false positives are not very likely.
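
Note 1 can be checked with a small experiment. The sketch below uses Python's `re` module rather than `git grep`, and the two C lines are made-up inputs, but the effect of the trailing `(` is the same.

```python
import re

# Two made-up C lines: one uses the strong-reference API, one the
# borrowed-reference API.
line_ok = "item = PyList_GetItemRef(list, 0);"
line_bad = "item = PyList_GetItem(list, 0);"

naive = re.compile(r"PyList_GetItem")     # no "(" after the name
strict = re.compile(r"PyList_GetItem\(")  # requires "(" right after the name

# The naive pattern flags both lines; the strict one flags only the
# borrowed-reference call.
naive_hits = [bool(naive.search(s)) for s in (line_ok, line_bad)]
strict_hits = [bool(strict.search(s)) for s in (line_ok, line_bad)]
```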

@seberg
Member

seberg commented Apr 4, 2025

I almost wonder if one could do a custom clang-tidy style check (not sure whether that can work with .c.src though, I admit)? That might also be useful for other projects then.

Plus that means we have an established pattern for allowing it: just add a // NOLINT: Newly created tuple comment. I think an inline code comment is vastly preferable to some global allowlist.

@lvllvl lvllvl force-pushed the issue-26159 branch 5 times, most recently from e69e629 to 3cf055d Compare April 11, 2025 01:00
@ngoldbaum
Member

This is looking much closer to being mergeable. Let me know if you need a hand with getting the circleci build fixed.

@ngoldbaum
Member

.. or maybe it's unrelated and would go away with a rebase?

@lvllvl lvllvl force-pushed the issue-26159 branch 3 times, most recently from bd4afe5 to 5e57f30 Compare April 12, 2025 18:40
@lvllvl
Contributor Author

lvllvl commented Apr 12, 2025

@ngoldbaum

  1. I think the rebase worked, kind of.
  2. The test is set to fail if any of the functions are used, but currently I think there are about 92 uses of them. Here's a link: C API Borrowed Ref Lint
  3. Should I assume that all the currently flagged uses of the functions are approved uses? I.e., should I just mark each listed instance as OK with // noqa: borrowed-ref OK?
  4. I think a failing version of this test would cause 4 tests to fail: azure-pipeline numpy, azure-pipeline numpy (ComprehensiveTest Lint), Linux, and Run MyPy/C API Borrowed Ref Lint. I think that's because those jobs run linter.py. Is it OK if it functions that way?

@lvllvl lvllvl force-pushed the issue-26159 branch 2 times, most recently from 7811388 to 0ea4d9c Compare April 13, 2025 17:24
@lvllvl lvllvl requested a review from ngoldbaum April 13, 2025 18:10
@lvllvl lvllvl force-pushed the issue-26159 branch 3 times, most recently from af12173 to dccef23 Compare April 20, 2025 23:01
@lvllvl lvllvl force-pushed the issue-26159 branch 2 times, most recently from 7bc287b to 3e90169 Compare April 21, 2025 01:18
@lvllvl lvllvl force-pushed the issue-26159 branch 2 times, most recently from d155200 to 56b3d77 Compare April 21, 2025 17:48
@ngoldbaum
Member

Let me know if you'd like a hand with some git surgery :)

@lvllvl lvllvl force-pushed the issue-26159 branch 2 times, most recently from d2828c9 to b5f5ffb Compare April 21, 2025 18:36
@lvllvl
Contributor Author

lvllvl commented Apr 21, 2025

@ngoldbaum looks like all the tests are passing now.

  1. Re: numpy/_core/src/common/pythoncapi_compat.h, I think I accidentally added this file in this location. So you were right, this was the source of all the additional lines. I removed it from my PR.
  2. Re: Linter in tools/ci/check_c_api_usage.sh, I changed the ALL_FILES filter to ignore the submodule section. I tested it in my codespace and here, seems to work ok. The line is now:
ALL_FILES=$(find numpy -type f \( -name "*.c" -o -name "*.h" -o -name "*.c.src" -o -name "*.cpp" \) ! -path "*/pythoncapi-compat/*")

Let me know if you need any additional changes. thanks for the help!
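
For readers less familiar with `find`, the file filter above is roughly equivalent to the following Python sketch. `collect_c_sources` and `SUFFIXES` are hypothetical names used only for this illustration; the real script uses the shell command quoted above.

```python
from pathlib import Path

# Same suffixes as the -name patterns in the find command.
SUFFIXES = (".c", ".h", ".c.src", ".cpp")


def collect_c_sources(root: Path) -> list[Path]:
    """Collect C/C++ sources under root, skipping pythoncapi-compat."""
    files = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Mirrors ! -path "*/pythoncapi-compat/*": skip anything that
        # lives inside a directory with that name.
        if "pythoncapi-compat" in path.relative_to(root).parts[:-1]:
            continue
        if path.name.endswith(SUFFIXES):
            files.append(path)
    return sorted(files)
```

Calling `collect_c_sources(Path("numpy"))` from the repository root would then play the role of `ALL_FILES`.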

@lvllvl lvllvl requested a review from ngoldbaum April 21, 2025 19:03
Member

@ngoldbaum ngoldbaum left a comment

Awesome, I'm glad this is shaping up.

I guess my main issue with this is that we should be relatively sure that all of the ones we're marking as OK really are OK, otherwise we'll be misleading future readers.

I need to sit down with this PR and try to stare at all the uses and see if I can come up with possible problems.

We might need a new NOQA category for uses that are known to be problematic but need manual fixes. That would allow us to catch and triage new uses in CI while not needing to go through and fix absolutely everything that's still in the library.

I will try to do this soon but may not have a ton of time before PyCon.

@@ -108,7 +108,7 @@ PyUFuncOverride_GetOutObjects(PyObject *kwds, PyObject **out_kwd_obj, PyObject *
  * PySequence_Fast* functions. This is required for PyPy
  */
 PyObject *seq;
-seq = PySequence_Fast(*out_kwd_obj,
+seq = PySequence_Fast(*out_kwd_obj, // noqa: borrowed-ref OK
Member

I actually have some changes I need to upstream related to this from back in February, see #28046 (comment). Maybe I should finally get around to sending in what I have...

Contributor Author

oh I think I missed this one.

It sounds like you have a manual fix for this so I'll label this as:
// noqa: borrowed-ref - manual fix needed

@ngoldbaum
Member

Hi all, I'm really sorry for the delay here. Totally my fault for letting it sit for so long.

I'll try to set aside some time to get this up to shape.

@lvllvl
Contributor Author

lvllvl commented Jul 2, 2025

@ngoldbaum No worries, I should've followed up on your comments a lot sooner.

  1. How are you determining which uses are acceptable under free threading and which are not? Are there criteria beyond the literature that you linked in earlier comments? I'll try to verify each is labeled as well.

  2. Regarding your suggestion to add another noqa category for known, problematic uses that need manual fixes: I can add in an additional category to the script, e.g., label = noqa: manual fix needed and allow those noqa labeled comments to pass this linter as well.

@ngoldbaum
Member

ngoldbaum commented Jul 3, 2025

How are you determining which uses are acceptable under free threading and which are not? Are there criteria beyond the literature that you linked in earlier comments? I'll try to verify each is labeled as well.

The two links I shared above are where you should look. You could also watch the PyCon talk @lysnikolaou and I gave about free-threaded support: https://www.youtube.com/watch?v=EuU3ksI1l04, which covers this.

In short, borrowed references to items in mutable containers (e.g. dicts, lists) are unsafe if the container is visible to another thread, because the item might be de-allocated if its refcount happens to go to zero. Whether or not the container is visible to other threads is context-dependent.

We don't want to wholesale move everything to strong reference APIs because of performance and the risk of bugs. Performance because borrowed references skip an incref and a decref, which might be very expensive in a tight loop. Bugs because reference counting logic in C is very tricky, particularly in code that has complicated control flow, and is easy to screw up.
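
The hazard can be illustrated with a deliberately simplified, single-threaded toy model in Python. `ToyObject` and `ToyList` are invented for this sketch and are not how CPython implements reference counting; the "other thread" is simulated by calling `clear()` between the lookup and the use.

```python
class ToyObject:
    """Stand-in for a refcounted C object (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1   # the reference held by the container
        self.alive = True

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.alive = False  # models deallocation


class ToyList:
    def __init__(self, items):
        self.items = list(items)

    def get_borrowed(self, i):
        # Like PyList_GetItem: no incref, the container still owns the item.
        return self.items[i]

    def get_strong(self, i):
        # Like PyList_GetItemRef: the caller now owns its own reference.
        obj = self.items[i]
        obj.incref()
        return obj

    def clear(self):
        # Stands in for another thread mutating the shared container.
        for obj in self.items:
            obj.decref()
        self.items = []


shared = ToyList([ToyObject("a")])
borrowed = shared.get_borrowed(0)
shared.clear()                    # "other thread" drops the last reference
borrowed_alive = borrowed.alive   # the borrowed item is already gone

shared2 = ToyList([ToyObject("b")])
strong = shared2.get_strong(0)
shared2.clear()
strong_alive = strong.alive       # still valid: we hold our own reference
strong.decref()                   # the caller must release it when done
```

The extra `incref`/`decref` pair in the strong-reference path is the cost mentioned above, and the final `decref` is the kind of bookkeeping that is easy to get wrong in C code with complicated control flow.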

I can add in an additional category to the script, e.g., label = noqa: manual fix needed and allow those noqa labeled comments to pass this linter as well.

Makes sense to me.
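
A sketch of how the two categories could be handled, assuming the marker strings used in this thread. `classify` and `lint_passes` are hypothetical helpers, not the code in tools/ci/check_c_api_usage.sh.

```python
# Marker strings as written in this thread.
OK_MARKER = "noqa: borrowed-ref OK"
FIX_MARKER = "noqa: borrowed-ref - manual fix needed"


def classify(line: str) -> str:
    """Classify a flagged source line by its opt-out marker, if any."""
    if FIX_MARKER in line:
        return "known-issue"   # problematic, tracked for a manual fix
    if OK_MARKER in line:
        return "reviewed-ok"   # reviewed and believed safe
    return "unmarked"          # new use that needs triage


def lint_passes(flagged_lines: list[str]) -> bool:
    """The lint only fails on flagged lines that carry no marker."""
    return all(classify(line) != "unmarked" for line in flagged_lines)
```

Keeping the two markers distinct lets CI catch new, untriaged uses while still recording which existing uses are known to need work.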

Member

@ngoldbaum ngoldbaum left a comment

spotted some issues with how you've set the markers so far, let me know if you have questions about my logic

@lvllvl
Contributor Author

lvllvl commented Jul 9, 2025

spotted some issues with how you've set the markers so far, let me know if you have questions about my logic

cool thank you for reviewing. I had a loose rubric for each use of thread-unsafe functions but your input was helpful. I updated the comments according to your input. For reference my rubric is outlined below.

  1. My rubric was: Is the container immutable? Is the container new/private? Is there an explicit lock on the container? Is there a guarantee that the container will not be deleted? If I answered "no" to all of those questions, I changed the comment to noqa: borrowed-ref - manual fix needed.
  2. I also added all the functions from the Borrowed reference API list so the linter will catch all those listed there.
  3. I'm not sure why the tests are failing, sorry. I will investigate this ASAP.
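
The rubric in item 1 can be written down as a small decision function. This is only a sketch: `RubricAnswers` and `choose_marker` are hypothetical names, and treating any single "yes" as sufficient for the OK marker is an assumption inferred from the description. Review feedback in this thread also overrides the rubric in places (for example, accesses to fields are treated as OK).

```python
from dataclasses import dataclass


@dataclass
class RubricAnswers:
    """Answers to the four rubric questions for one call site."""
    container_immutable: bool
    container_private: bool      # new or otherwise not visible elsewhere
    lock_held: bool
    container_kept_alive: bool   # guaranteed not to be deleted


def choose_marker(answers: RubricAnswers) -> str:
    """Pick the noqa comment for a flagged call site."""
    if any((
        answers.container_immutable,
        answers.container_private,
        answers.lock_held,
        answers.container_kept_alive,
    )):
        return "// noqa: borrowed-ref OK"
    # "No" to every question: flag it for a manual fix.
    return "// noqa: borrowed-ref - manual fix needed"
```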

@lvllvl lvllvl force-pushed the issue-26159 branch 2 times, most recently from ec4741f to ad699bf Compare July 11, 2025 15:03
Member

@ngoldbaum ngoldbaum left a comment

I left a few comments below. Sorry I didn't make it clearer in the last review that my comments weren't exhaustive. I'm also not pointing out every instance of each pattern I comment on here. Please make sure that all of the accesses dealing with fields below are marked as OK: fields is effectively append-only, and I don't think it's worth the reference counting churn to fix every spot you currently have marked as needing a fix.

@@ -268,7 +268,7 @@ _buffer_format_string(PyArray_Descr *descr, _tmp_string_t *str,
 int ret;

 name = PyTuple_GET_ITEM(ldescr->names, k);
-item = PyDict_GetItem(ldescr->fields, name);
+item = PyDict_GetItem(ldescr->fields, name); // noqa: borrowed-ref - manual fix needed
Member

It looks like you missed an access to fields here. Can you go through and make sure all of them are marked OK per my explanation from the last round of review?

@@ -130,7 +130,7 @@ PyArray_IntpConverter(PyObject *obj, PyArray_Dims *seq)
  * dimension_from_scalar as soon as possible.
  */
 if (!PyLong_CheckExact(obj) && PySequence_Check(obj)) {
-seq_obj = PySequence_Fast(obj,
+seq_obj = PySequence_Fast(obj, // noqa: borrowed-ref OK
Member

I think this isn't OK, because another thread can access obj.

@@ -1135,7 +1135,7 @@ PyArray_IntpFromSequence(PyObject *seq, npy_intp *vals, int maxvals)
 {
 PyObject *seq_obj = NULL;
 if (!PyLong_CheckExact(seq) && PySequence_Check(seq)) {
-seq_obj = PySequence_Fast(seq,
+seq_obj = PySequence_Fast(seq, // noqa: borrowed-ref OK
Member

ditto

@@ -2749,7 +2749,7 @@ nonstructured_to_structured_resolve_descriptors(

 Py_ssize_t pos = 0;
 PyObject *key, *tuple;
-while (PyDict_Next(to_descr->fields, &pos, &key, &tuple)) {
+while (PyDict_Next(to_descr->fields, &pos, &key, &tuple)) { // noqa: borrowed-ref - manual fix needed
Member

fields again, this is fine, ditto for all the other ones that go through PyDataType_FIELDS below.

@@ -1522,7 +1522,7 @@ arr_add_docstring(PyObject *NPY_UNUSED(dummy), PyObject *const *args, Py_ssize_t
 PyTypeObject *new = (PyTypeObject *)obj;
 _ADDDOC(new->tp_doc, new->tp_name);
 if (new->tp_dict != NULL && PyDict_CheckExact(new->tp_dict) &&
-PyDict_GetItemString(new->tp_dict, "__doc__") == Py_None) {
+PyDict_GetItemString(new->tp_dict, "__doc__") == Py_None) { // noqa: borrowed-ref - manual fix needed
Member

I left an incorrect comment earlier. On second thought, this is theoretically an issue if there's a race to set __doc__.

Contributor Author

ok sounds good. Not sure what happened to my branch. Triaging atm, will try to re-open. Not sure why the PR closed when I committed.

Contributor Author

Hi @ngoldbaum, really sorry for the inconvenience but are you able to re-open this PR on your end? I am unable to re-open the PR.

I'm not sure why my git push --force automatically closed the PR. But I must've done something incorrectly. I'm reviewing my branch again to implement your suggested changes.

If re-opening on your end is not an option, I can submit a new PR. Again, sorry for the inconvenience.

@seberg
Member

seberg commented Jul 13, 2025

Just open a new PR. I think if you hit 0 commits, this can happen.

@ngoldbaum
Member

I just tried and I can't re-open it. Breaking GitHub is a little bit of a "learning advanced git" rite of passage, no worries on our end about this.
