-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
ENH: Use array indexing preparation routines for flatiter objects #28590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
198df6b
to
75aaed0
Compare
75aaed0
to
9f2d51f
Compare
assert_raises(ValueError, ia, x.flat, s, np.zeros(9, dtype=float)) | ||
assert_raises(ValueError, ia, x.flat, s, np.zeros(11, dtype=float)) | ||
assert_raises(IndexError, ia, x.flat, s, np.zeros(9, dtype=float)) | ||
assert_raises(IndexError, ia, x.flat, s, np.zeros(11, dtype=float)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While this is certainly more consistent and I'd even call it a bugfix, it is a behavior change and someone might have code relying on the old behavior. Needs a release note at least. You also need another release note for the new features.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Added a release note that lists all the most important changes.
8f2b322
to
33109dd
Compare
This is a big refactor, so I think we'll need at least two experienced developers to go over the C code changes, so that might take a while. I'll try to do a pass focusing on the correctness of the C code soon. On a first, high-level pass this looks like mostly simplification and cleanup. I think you should also try running the indexing benchmarks to see if there are any significant regressions in existing benchmarks. I think It would also be nice to get new entries in the |
These are the results of running the (old & new) benchmarks:
It looks like having special cases for tuple, ellipses etc. (instead of going through |
Probably |
I added a couple of special cases for an empty tuple and boolean indexes. This fixes the two worst performance regressions. I feel that the rest are acceptable, since this goes through a much more complex code path to make sure that everything is set up correctly. |
I added the 2.3 milestone to make sure we don't drop reviewing this before cutting the release. |
@ngoldbaum I am about to push this off to 2.4 unless you want to put it in very soon. |
I spoke with Lysandros and he said it's OK to push this off. We'll coordinate on getting this reviewed soon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for not looking at it much. Overall looks nice, I need to do a pass to see for refcount issues, etc.
I am slightly worried that some of the bad bool cases should maybe have a FutureWarning
(or just go to an error for a bit?!), to enforce correct behavior.
Overall, I am happy that this seemed to have worked well to integrate, the diff is a bit unwieldy, but it can't be helped.
- Add test with int16 index - Cast integer index array to intp before using it - Remove unused functions - Change error message back to array - Move docstring to prepare_index_noarray
@seberg Sorry for taking so long with this and thanks a lot for the thorough review. It was really helpful! I've addressed all of the feedback so it should be in a better state now. Looking foward to hearing your thoughts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry this is so tedious. I have to look further, but wanted to comment for those things.
I do wonder more now if we shouldn't push the errors/warning down into the index prep helper. As annoying as it is, we just duplicate things and while that code is spaghetti, it is at least pretty linear spaghetti.
Py_INCREF(dtype); | ||
ret = (PyArrayObject *)PyArray_NewFromDescr(Py_TYPE(self->ao), | ||
ret = PyArray_NewFromDescr(Py_TYPE(self->ao), | ||
dtype, | ||
1, &n_steps, | ||
NULL, NULL, | ||
0, (PyObject *)self->ao); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not your problem, so ignore. I think there might be a bit of fuzziness around the fact that sometimes dtypes
get's replaced on array creation.
if (!PyArray_Check(ind) && !PyArrayIter_Check(ind)) { | ||
PyErr_SetString(PyExc_IndexError, | ||
"Non-array indices are not supported because they can lead to unexpected behavior. " | ||
"This is expected to change in future versions."); | ||
goto finish; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tedious :/. I don't think it is right here, we should move things down because arr.flat[[1, 2, 3, 4]]
is really OK.
So there are two things here:
a.flat[[1.]]
floats are at this point, rejected by the original preparation. Changing that this stops working is also the biggest break (of sensible behavior). I could see adding a branch to the index-prep code to deprecate it there (or we gamble and see if someone notices, even considering that array indexing is strict for a decade, I am not sure).a.flat[[True, False]]
was always nonsense, and I am certainly happy to not worry about deprecation. Two possible solutions:- Just move this (similarly) into the
HAS_BOOL
branch. That works, although the current will also fail forarr.flat[np.array([True, False]),]
as it is a tuple. Admittedly, that is rather niche. - We also push this into the prep-index code, where we have the original object at hand to give the warning.
- Just move this (similarly) into the
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The float case is now fixed in that it works but is derecated (added a branch in the index-prep code). I'm a bit confused about the bool case though. When bool indices are used that do not match the iterator's shape, that raises an IndexError. That's what we want, so I think we're okay like this?
prepare_index
initer_subscript
anditer_ass_subscript
. This fixes various cases that were broken before:arr.flat[[True, True]]
arr.flat[[1.0, 1.0]]
arr.flat[()] = 0
flatiter
indexing operationsCloses #28314.