Add async oindex and vindex methods to AsyncArray #3083
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
Added support for async vectorized and orthogonal indexing. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -61,6 +61,7 @@ | |
ZarrFormat, | ||
_default_zarr_format, | ||
_warn_order_kwarg, | ||
ceildiv, | ||
concurrent_map, | ||
parse_shapelike, | ||
product, | ||
|
@@ -76,6 +77,8 @@ | |
) | ||
from zarr.core.dtype.common import HasEndianness, HasItemSize, HasObjectCodec | ||
from zarr.core.indexing import ( | ||
AsyncOIndex, | ||
AsyncVIndex, | ||
BasicIndexer, | ||
BasicSelection, | ||
BlockIndex, | ||
|
@@ -92,7 +95,6 @@ | |
Selection, | ||
VIndex, | ||
_iter_grid, | ||
ceildiv, | ||
check_fields, | ||
check_no_multi_fields, | ||
is_pure_fancy_indexing, | ||
|
@@ -1425,6 +1427,56 @@ async def getitem( | |
) | ||
return await self._get_selection(indexer, prototype=prototype) | ||
|
||
async def get_orthogonal_selection( | ||
self, | ||
selection: OrthogonalSelection, | ||
*, | ||
out: NDBuffer | None = None, | ||
fields: Fields | None = None, | ||
prototype: BufferPrototype | None = None, | ||
) -> NDArrayLikeOrScalar: | ||
if prototype is None: | ||
prototype = default_buffer_prototype() | ||
indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid) | ||
return await self._get_selection( | ||
indexer=indexer, out=out, fields=fields, prototype=prototype | ||
) | ||
|
||
async def get_mask_selection( | ||
self, | ||
mask: MaskSelection, | ||
*, | ||
out: NDBuffer | None = None, | ||
fields: Fields | None = None, | ||
prototype: BufferPrototype | None = None, | ||
) -> NDArrayLikeOrScalar: | ||
if prototype is None: | ||
prototype = default_buffer_prototype() | ||
indexer = MaskIndexer(mask, self.shape, self.metadata.chunk_grid) | ||
return await self._get_selection( | ||
indexer=indexer, out=out, fields=fields, prototype=prototype | ||
) | ||
|
||
async def get_coordinate_selection( | ||
self, | ||
selection: CoordinateSelection, | ||
*, | ||
out: NDBuffer | None = None, | ||
fields: Fields | None = None, | ||
prototype: BufferPrototype | None = None, | ||
) -> NDArrayLikeOrScalar: | ||
if prototype is None: | ||
prototype = default_buffer_prototype() | ||
indexer = CoordinateIndexer(selection, self.shape, self.metadata.chunk_grid) | ||
out_array = await self._get_selection( | ||
indexer=indexer, out=out, fields=fields, prototype=prototype | ||
) | ||
|
||
if hasattr(out_array, "shape"): | ||
# restore shape | ||
out_array = np.array(out_array).reshape(indexer.sel_shape) | ||
return out_array | ||
|
||
async def _save_metadata(self, metadata: ArrayMetadata, ensure_parents: bool = False) -> None: | ||
""" | ||
Asynchronously save the array metadata. | ||
|
@@ -1556,6 +1608,19 @@ async def setitem( | |
) | ||
return await self._set_selection(indexer, value, prototype=prototype) | ||
|
||
@property | ||
def oindex(self) -> AsyncOIndex[T_ArrayMetadata]: | ||
Comment on lines +1611 to +1612

I chose this API to try to follow this pattern:
because Python doesn't let you make an async version of the

Can we update the sync versions to use

That sounds very sensible, but when I try to refactor to do that I find that it causes a problem where

> Explain how exactly sync is being called from within an async coroutine
⏺ Looking at the error and the code, here's exactly how sync() is being called from within an async coroutine:
The Call Chain
1. Test calls zarr array indexing: za[zix] in test_indexing_with_zarr_array
2. Array.__getitem__ (src/zarr/core/array.py:2566):
   return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
3. VIndex.__getitem__ (src/zarr/core/indexing.py:1248):
   return sync(self.array._async_array.vindex.getitem(selection))
4. sync() tries to run the coroutine but detects it's already in a running loop
The Problem
The issue is in the __array__ method call chain:
1. VIndex.__getitem__ calls sync() with self.array._async_array.vindex.getitem(selection)
2. This eventually leads to AsyncArray._get_selection() being called
3. But somewhere in the process, Array.__array__() gets called (src/zarr/core/array.py:2413)
4. __array__() calls self[...] which goes back to Array.__getitem__
5. This creates a nested call where sync() is called while already inside an async context
The Root Cause
The function _zarr_array_to_int_or_bool_array() at line 85 calls np.asarray(arr), which triggers the __array__ protocol on the zarr array. This causes:
def _zarr_array_to_int_or_bool_array(arr: Array) -> npt.NDArray[np.intp] | npt.NDArray[np.bool_]:
    if arr.dtype.kind in ("i", "b"):
        return np.asarray(arr)  # <-- This calls arr.__array__()
When np.asarray() is called on a zarr Array, it calls Array.__array__(), which calls self[...], which eventually calls sync() again - but we're already inside a sync() call from the VIndex, creating the nested async context error.
The original code before the changes avoided this by handling the zarr array conversion within the sync methods directly, rather than delegating to async methods that would create this nested sync situation.

I guess for indexing with a Zarr array, we should convert to numpy array before the sync call. |
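The nested-sync failure described in this thread, and the suggested fix of converting the index before the sync call, can be sketched with the standard library alone. Everything below is a hypothetical stand-in, not zarr's code: `sync` is a simplified blocking helper, `FakeArray.to_list` plays the role of `Array.__array__`, and the `getitem_*` functions are illustrative names.

```python
import asyncio


def sync(coro):
    """Hypothetical, simplified stand-in for a blocking sync() helper."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No event loop is running in this thread, so blocking is safe.
        return asyncio.run(coro)
    coro.close()  # avoid a "coroutine was never awaited" warning
    raise RuntimeError("sync() called from within a running event loop")


class FakeArray:
    """Hypothetical array-like whose materialization goes through sync()."""

    def __init__(self, data):
        self._data = list(data)

    async def _read(self):
        return list(self._data)

    def to_list(self):
        # Plays the role of Array.__array__: a blocking read via sync().
        return sync(self._read())


async def getitem_async(source, selection):
    # If `selection` is still a FakeArray here, materializing it calls
    # sync() while the event loop is already running.
    if isinstance(selection, FakeArray):
        selection = selection.to_list()
    return [source[i] for i in selection]


def getitem_nested(source, selection):
    # Problem case: the conversion happens inside the coroutine.
    return sync(getitem_async(source, selection))


def getitem_converted_first(source, selection):
    # Suggested fix: materialize the index before entering sync().
    if isinstance(selection, FakeArray):
        selection = selection.to_list()
    return sync(getitem_async(source, selection))
```

In this sketch, `getitem_nested` raises `RuntimeError` when given a `FakeArray` index, while `getitem_converted_first` completes because the blocking read finishes before any event loop starts.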
||
"""Shortcut for orthogonal (outer) indexing, see :func:`get_orthogonal_selection` and | ||
:func:`set_orthogonal_selection` for documentation and examples.""" | ||
return AsyncOIndex(self) | ||
|
||
@property | ||
def vindex(self) -> AsyncVIndex[T_ArrayMetadata]: | ||
"""Shortcut for vectorized (inner) indexing, see :func:`get_coordinate_selection`, | ||
:func:`set_coordinate_selection`, :func:`get_mask_selection` and | ||
:func:`set_mask_selection` for documentation and examples.""" | ||
return AsyncVIndex(self) | ||
|
||
async def resize(self, new_shape: ShapeLike, delete_outside_chunks: bool = True) -> None: | ||
""" | ||
Asynchronously resize the array to a new shape. | ||
|
get_basic_selection also doesn't exist on AsyncArray - should I add that too?
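For readers unfamiliar with the accessor pattern added in this PR (a property returning a helper object that exposes an awaitable `getitem`), here is a minimal self-contained sketch. The class names `AsyncArraySketch` and `AsyncOIndexSketch` are hypothetical, the data is a plain nested list rather than a chunked store, and the assumption that the helper simply forwards to `get_orthogonal_selection` is illustrative rather than taken from the diff.

```python
import asyncio


class AsyncOIndexSketch:
    """Hypothetical stand-in for an async orthogonal-indexing helper."""

    def __init__(self, array):
        self.array = array

    async def getitem(self, selection):
        # Assumption: the helper only forwards to the array's async method.
        return await self.array.get_orthogonal_selection(selection)


class AsyncArraySketch:
    """Hypothetical 2-D array backed by a nested list."""

    def __init__(self, rows):
        self._rows = rows

    async def get_orthogonal_selection(self, selection):
        row_idx, col_idx = selection
        await asyncio.sleep(0)  # stands in for awaiting chunk reads
        # Outer indexing: every selected row crossed with every selected column.
        return [[self._rows[r][c] for c in col_idx] for r in row_idx]

    @property
    def oindex(self):
        return AsyncOIndexSketch(self)


async def main():
    arr = AsyncArraySketch([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    return await arr.oindex.getitem(([0, 2], [1, 2]))


result = asyncio.run(main())
print(result)  # [[1, 2], [7, 8]]
```

The caller awaits `arr.oindex.getitem(...)` explicitly, which mirrors the `vindex.getitem(selection)` call shown in the call chain quoted earlier in the thread.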