Add async oindex and vindex methods to AsyncArray #3083
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #3083 +/- ##
==========================================
- Coverage 59.69% 59.64% -0.06%
==========================================
Files 78 78
Lines 8694 8745 +51
==========================================
+ Hits 5190 5216 +26
- Misses 3504 3529 +25
@dcherian suggested making the sync oindex and vindex getitem methods call the new async versions. EDIT: I think this is already the case?
    if is_coordinate_selection(new_selection, self.array.shape):
        return await self.array.get_coordinate_selection(new_selection, fields=fields)
    elif is_mask_selection(new_selection, self.array.shape):
        return await self.array.get_mask_selection(new_selection, fields=fields)
I had to add .get_mask_selection to AsyncArray to cover this codepath. But I only realised I needed to thanks to mypy. This means that this codepath is
- not needed for me right now (I think)
- definitely not covered by the property tests
Yes I haven't added mask indexing to the property test suite
    @property
    def oindex(self) -> AsyncOIndex[T_ArrayMetadata]:
I chose this API to try to follow this pattern:
- Array.__getitem__ (exists)
- Array.oindex.__getitem__ (exists)
- Array.vindex.__getitem__ (exists)
- AsyncArray.getitem (exists)
- AsyncArray.oindex.getitem (new)
- AsyncArray.vindex.getitem (new)
because Python doesn't let you make an async version of the __getitem__ magic method.
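A minimal sketch of that naming pattern, using stand-in classes rather than zarr itself. Every class name ending in `Sketch` is hypothetical, plain lists stand in for array data, and `asyncio.run` stands in for zarr's `sync` helper. The sync accessor keeps the square-bracket syntax, while the async accessor exposes a named `getitem` coroutine:

```python
import asyncio


class AsyncOIndexSketch:
    """Hypothetical async accessor: exposes an awaitable `getitem` method."""

    def __init__(self, data):
        self._data = data

    async def getitem(self, selection):
        # Stand-in for real async I/O; yields control to the event loop once.
        await asyncio.sleep(0)
        return [self._data[i] for i in selection]


class AsyncArraySketch:
    """Hypothetical stand-in for AsyncArray."""

    def __init__(self, data):
        self._data = list(data)

    @property
    def oindex(self):
        return AsyncOIndexSketch(self._data)


class OIndexSketch:
    """Hypothetical sync accessor: keeps the square-bracket syntax."""

    def __init__(self, async_array):
        self._async_array = async_array

    def __getitem__(self, selection):
        # Block on the async method (asyncio.run is an assumption here).
        return asyncio.run(self._async_array.oindex.getitem(selection))


class ArraySketch:
    """Hypothetical stand-in for Array, wrapping the async class."""

    def __init__(self, data):
        self._async_array = AsyncArraySketch(data)

    @property
    def oindex(self):
        return OIndexSketch(self._async_array)


arr = ArraySketch([10, 20, 30, 40])
sync_result = arr.oindex[[0, 2]]
async_result = asyncio.run(arr._async_array.oindex.getitem([0, 2]))
```

Both calls return the same selection; only the spelling of the call differs between the sync and async sides.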
Can we update the sync versions to use sync(self._async_array.oindex.getitem) instead of going to _get_selection directly? That will get us some test coverage
That sounds very sensible, but when I try to refactor to do that I find that it causes a problem where sync() gets called from within an async coroutine. I wouldn't have thought this should happen, but here's Claude's explanation of why it does:
> Explain how exactly sync is being called from within an async coroutine
⏺ Looking at the error and the code, here's exactly how sync() is being called from within an async coroutine:
The Call Chain
1. Test calls zarr array indexing: za[zix] in test_indexing_with_zarr_array
2. Array.__getitem__ (src/zarr/core/array.py:2566):
    return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
3. VIndex.__getitem__ (src/zarr/core/indexing.py:1248):
    return sync(self.array._async_array.vindex.getitem(selection))
4. sync() tries to run the coroutine but detects it's already in a running loop
The Problem
The issue is in the __array__ method call chain:
1. VIndex.__getitem__ calls sync() with self.array._async_array.vindex.getitem(selection)
2. This eventually leads to AsyncArray._get_selection() being called
3. But somewhere in the process, Array.__array__() gets called (src/zarr/core/array.py:2413)
4. __array__() calls self[...] which goes back to Array.__getitem__
5. This creates a nested call where sync() is called while already inside an async context
The Root Cause
The function _zarr_array_to_int_or_bool_array() at line 85 calls np.asarray(arr), which triggers the __array__ protocol on the zarr array. This causes:
def _zarr_array_to_int_or_bool_array(arr: Array) -> npt.NDArray[np.intp] | npt.NDArray[np.bool_]:
    if arr.dtype.kind in ("i", "b"):
        return np.asarray(arr)  # <-- This calls arr.__array__()
When np.asarray() is called on a zarr Array, it calls Array.__array__(), which calls self[...], which eventually calls sync() again - but we're already inside a sync() call from the VIndex, creating the nested async context error.
The original code before the changes avoided this by handling the zarr array conversion within the sync methods directly, rather than delegating to async methods that would create this nested sync situation.
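The nesting described above can be reproduced in isolation. This is only a sketch: `sync` here is a simplified hypothetical helper built on `asyncio.run`, not zarr's implementation, and `SyncArraySketch`, `fetch`, and `get_with_array_selection` are made-up names. The point is that a blocking `__getitem__` invoked inside a coroutine ends up calling `sync` while a loop is already running:

```python
import asyncio


def _in_running_loop():
    """Return True if the current thread is already running an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def sync(coro):
    """Hypothetical blocking helper (an assumption, not zarr's actual sync)."""
    if _in_running_loop():
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError("sync() called from within a running event loop")
    return asyncio.run(coro)


async def fetch(selection):
    # Stand-in for an async read.
    await asyncio.sleep(0)
    return list(selection)


class SyncArraySketch:
    """Hypothetical sync array whose __getitem__ blocks via sync()."""

    def __getitem__(self, selection):
        return sync(fetch(selection))


async def get_with_array_selection(selection_array):
    # Converting the selection *inside* the coroutine triggers a nested sync().
    converted = selection_array[[0, 1]]
    return await fetch(converted)


try:
    sync(get_with_array_selection(SyncArraySketch()))
    nested_error = None
except RuntimeError as exc:
    nested_error = str(exc)
```

Here `nested_error` holds the message raised by the inner `sync()` call, which mirrors the failure in the call chain above.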
I guess for indexing with a Zarr array, we should convert to numpy array before the sync call
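A rough sketch of that ordering, with stand-ins only: `sync` is a simplified hypothetical helper built on `asyncio.run`, `SyncArraySketch` and `vindex_getitem` are made-up names, and `to_list()` plays the role that `np.asarray(arr)` plays for a Zarr array. The array-valued selection is materialized with its own blocking read first, so no blocking call happens inside the coroutine:

```python
import asyncio


def _in_running_loop():
    """Return True if the current thread is already running an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def sync(coro):
    """Hypothetical blocking helper (an assumption, not zarr's actual sync)."""
    if _in_running_loop():
        coro.close()
        raise RuntimeError("sync() called from within a running event loop")
    return asyncio.run(coro)


class SyncArraySketch:
    """Hypothetical sync array backed by a list."""

    def __init__(self, values):
        self._values = list(values)

    async def _read(self, selection):
        # Stand-in for an async read of the selected elements.
        await asyncio.sleep(0)
        return [self._values[i] for i in selection]

    def __getitem__(self, selection):
        return sync(self._read(selection))

    def to_list(self):
        # Analogous to np.asarray(arr): a blocking read of the whole array.
        return self[range(len(self._values))]


def vindex_getitem(target, selection):
    # Resolve an array-valued selection synchronously BEFORE entering sync(),
    # so the coroutine only ever sees a plain list of indices.
    if isinstance(selection, SyncArraySketch):
        selection = selection.to_list()
    return sync(target._read(selection))


target = SyncArraySketch([10, 20, 30, 40])
index_array = SyncArraySketch([3, 0])
picked = vindex_getitem(target, index_array)
```

The two blocking reads run one after the other rather than nested, so the running-loop check never fires.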
Should I also be adding tests to test_indexing.py?
yes i think so
@@ -1425,6 +1427,56 @@ async def getitem(
        )
        return await self._get_selection(indexer, prototype=prototype)

    async def get_orthogonal_selection(
get_basic_selection also doesn't exist on AsyncArray - should I add that too?
The new code needs clear tests. Our current indexing tests set an extremely bad example here, because they don't test the indexing classes very specifically. So ignore the rest of the indexing tests, and instead for each method defined on each new class write a test that checks if that method does what you expect it to do.
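As a rough illustration of the one-test-per-method shape, here is a sketch against stand-in classes. All `*Sketch` names are hypothetical and nested lists replace real array storage; actual tests would target the new classes in this PR instead. Each test calls exactly one method and checks one hand-computed result:

```python
import asyncio


class AsyncOIndexSketch:
    """Hypothetical orthogonal-indexing accessor over a 2D nested list."""

    def __init__(self, rows):
        self._rows = rows

    async def getitem(self, selection):
        row_idx, col_idx = selection
        # Orthogonal (outer) indexing: every selected row crossed with
        # every selected column.
        return [[self._rows[r][c] for c in col_idx] for r in row_idx]


class AsyncVIndexSketch:
    """Hypothetical vectorized-indexing accessor over a 2D nested list."""

    def __init__(self, rows):
        self._rows = rows

    async def getitem(self, selection):
        row_idx, col_idx = selection
        # Vectorized (coordinate) indexing: index lists are paired element-wise.
        return [self._rows[r][c] for r, c in zip(row_idx, col_idx)]


class AsyncArraySketch:
    """Hypothetical stand-in for AsyncArray."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def oindex(self):
        return AsyncOIndexSketch(self._rows)

    @property
    def vindex(self):
        return AsyncVIndexSketch(self._rows)


DATA = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]


def test_oindex_getitem():
    arr = AsyncArraySketch(DATA)
    result = asyncio.run(arr.oindex.getitem(([0, 2], [1, 2])))
    assert result == [[1, 2], [7, 8]]


def test_vindex_getitem():
    arr = AsyncArraySketch(DATA)
    result = asyncio.run(arr.vindex.getitem(([0, 2], [1, 2])))
    assert result == [1, 8]
```

The same selection is passed to both accessors on purpose, so the tests pin down how orthogonal and vectorized indexing differ.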
Array has .oindex and .vindex methods, but AsyncArray has no equivalent. This PR adds them. It only adds the get methods, not the set methods, which I thought could be deferred to a follow-up PR.
I want it for pydata/xarray#10327 (comment)
TODO:
docs/user-guide/*.rst
changes/