
Add async oindex and vindex methods to AsyncArray #3083


Merged: 36 commits merged into zarr-developers:main on Jul 30, 2025

Conversation

TomNicholas
Member

@TomNicholas TomNicholas commented May 23, 2025

Array has .oindex and .vindex methods, but AsyncArray has no equivalent. This PR adds them. It only adds the get methods, not the set methods, which I thought could be deferred to a follow-up PR.

I want it for pydata/xarray#10327 (comment)

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) May 23, 2025
TomNicholas added a commit to TomNicholas/xarray that referenced this pull request May 23, 2025
@TomNicholas TomNicholas changed the title Add async oindex method to AsyncArray Add async oindex and vindex methods to AsyncArray May 29, 2025

codecov bot commented Jul 22, 2025

Codecov Report

❌ Patch coverage is 66.66667% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 60.73%. Comparing base (a0c56fb) to head (6fbb6b1).
⚠️ Report is 1 commits behind head on main.

Files with missing lines     Patch %   Lines
src/zarr/core/indexing.py    64.28%    10 Missing ⚠️
src/zarr/core/array.py       70.83%    7 Missing ⚠️
src/zarr/core/common.py      60.00%    2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3083      +/-   ##
==========================================
+ Coverage   60.68%   60.73%   +0.04%     
==========================================
  Files          78       78              
  Lines        9356     9407      +51     
==========================================
+ Hits         5678     5713      +35     
- Misses       3678     3694      +16     
Files with missing lines        Coverage Δ
src/zarr/core/chunk_grids.py    60.37% <ø> (+0.56%) ⬆️
src/zarr/core/common.py         49.16% <60.00%> (+0.47%) ⬆️
src/zarr/core/array.py          69.66% <70.83%> (+0.03%) ⬆️
src/zarr/core/indexing.py       68.10% <64.28%> (-0.10%) ⬇️

@TomNicholas
Member Author

TomNicholas commented Jul 22, 2025

@dcherian suggested making the sync oindex and vindex getitem methods call the new async versions.

EDIT: I think this is already the case?

@github-actions github-actions bot removed the "needs release notes" label (automatically applied to PRs which haven't added release notes) Jul 23, 2025
Comment on lines +1611 to +1612
    @property
    def oindex(self) -> AsyncOIndex[T_ArrayMetadata]:
Member Author


I chose this API to try to follow this pattern:

  • Array.__getitem__ (exists)
  • Array.oindex.__getitem__ (exists)
  • Array.vindex.__getitem__ (exists)
  • AsyncArray.getitem (exists)
  • AsyncArray.oindex.getitem (new)
  • AsyncArray.vindex.getitem (new)

because Python doesn't let you make an async version of the __getitem__ magic method.
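A minimal sketch of the accessor pattern listed above, using only the standard library. `ToyAsyncArray` and `ToyAsyncOIndex` are hypothetical stand-ins rather than zarr's classes; the only detail borrowed from this PR is that the accessor's `getitem` awaits the array's `get_orthogonal_selection`, as the traceback later in this thread shows.

```python
import asyncio


class ToyAsyncOIndex:
    """Hypothetical accessor: exposes an awaitable getitem() instead of __getitem__."""

    def __init__(self, array):
        self.array = array

    async def getitem(self, selection):
        # Delegate to the array's async orthogonal-selection method.
        return await self.array.get_orthogonal_selection(selection)


class ToyAsyncArray:
    """Hypothetical 2-D async array backed by nested lists."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def oindex(self):
        return ToyAsyncOIndex(self)

    async def get_orthogonal_selection(self, selection):
        row_sel, col_sel = selection
        await asyncio.sleep(0)  # stand-in for async store I/O
        # Orthogonal (outer) indexing: every selected row crossed with every selected column.
        return [[self._rows[r][c] for c in col_sel] for r in row_sel]


async def main():
    arr = ToyAsyncArray([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
    return await arr.oindex.getitem(([0, 2], [1, 2]))


result = asyncio.run(main())
print(result)
```

The call site reads `await arr.oindex.getitem(selection)`, mirroring the sync `arr.oindex[selection]` form.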

Contributor


Can we update the sync versions to use sync(self._async_array.oindex.getitem) instead of going to _get_selection directly? That will get us some test coverage

Member Author

@TomNicholas TomNicholas Jul 25, 2025


That sounds very sensible, but when I try to refactor to do that, I find that it causes a problem where sync() gets called from within an async coroutine. I wouldn't have thought this should happen, but here's Claude's explanation of why it does:

> Explain how exactly sync is being called from within an async coroutine

Looking at the error and the code, here's exactly how sync() is being called from within an async coroutine:

  The Call Chain

  1. Test calls zarr array indexing: za[zix] in test_indexing_with_zarr_array
  2. Array.__getitem__ (src/zarr/core/array.py:2566):
      return self.vindex[cast("CoordinateSelection | MaskSelection", selection)]
  3. VIndex.__getitem__ (src/zarr/core/indexing.py:1248):
      return sync(self.array._async_array.vindex.getitem(selection))
  4. sync() tries to run the coroutine but detects it's already in a running loop

  The Problem

  The issue is in the __array__ method call chain:

  1. VIndex.__getitem__ calls sync() with self.array._async_array.vindex.getitem(selection)
  2. This eventually leads to AsyncArray._get_selection() being called
  3. But somewhere in the process, Array.__array__() gets called (src/zarr/core/array.py:2413)
  4. __array__() calls self[...] which goes back to Array.__getitem__
  5. This creates a nested call where sync() is called while already inside an async context

  The Root Cause

  The function _zarr_array_to_int_or_bool_array() at line 85 calls np.asarray(arr), which triggers the __array__ protocol on the zarr array. This causes:

  def _zarr_array_to_int_or_bool_array(arr: Array) -> npt.NDArray[np.intp] | npt.NDArray[np.bool_]:
      if arr.dtype.kind in ("i", "b"):
          return np.asarray(arr)  # <-- This calls arr.__array__()

  When np.asarray() is called on a zarr Array, it calls Array.__array__(), which calls self[...], which eventually calls sync() again - but we're already
  inside a sync() call from the VIndex, creating the nested async context error.

  The original code before the changes avoided this by handling the zarr array conversion within the sync methods directly, rather than delegating to async
  methods that would create this nested sync situation.

Contributor


I guess for indexing with a Zarr array, we should convert to numpy array before the sync call
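A rough sketch of that suggestion, assuming a toy `sync()` that simply refuses to run when an event loop is already active in the current thread. zarr's real `sync()` helper is implemented differently, and `ToyArray`, `ToyAsyncArray`, `getitem_naive` and `getitem_fixed` are hypothetical names used only to reproduce the shape of the problem.

```python
import asyncio


def sync(coro):
    """Toy stand-in for a sync() helper: run a coroutine, but refuse to nest."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        in_loop = False  # no loop running in this thread
    else:
        in_loop = True
    if in_loop:
        coro.close()  # avoid a "coroutine was never awaited" warning
        raise RuntimeError("sync() called from within a running event loop")
    return asyncio.run(coro)


class ToyAsyncArray:
    def __init__(self, values):
        self._values = list(values)

    async def getitem(self, selection):
        # Materializing an array-like selection inside the coroutine is
        # where the nested sync() call is triggered.
        indices = list(selection)
        await asyncio.sleep(0)  # stand-in for async store I/O
        return [self._values[i] for i in indices]


class ToyArray:
    def __init__(self, values):
        self._async_array = ToyAsyncArray(values)

    def to_list(self):
        # Analogous to Array.__array__ calling self[...], which goes through sync().
        n = len(self._async_array._values)
        return sync(self._async_array.getitem(list(range(n))))

    def __iter__(self):
        return iter(self.to_list())

    def getitem_naive(self, selection):
        # The selection is converted inside the coroutine: nested sync().
        return sync(self._async_array.getitem(selection))

    def getitem_fixed(self, selection):
        # Convert the array-like selection BEFORE entering sync().
        if isinstance(selection, ToyArray):
            selection = selection.to_list()
        return sync(self._async_array.getitem(selection))


data = ToyArray([10, 20, 30])
index = ToyArray([0, 2])

try:
    data.getitem_naive(index)
    naive_failed = False
except RuntimeError:
    naive_failed = True

fixed = data.getitem_fixed(index)
print(naive_failed, fixed)
```

In the fixed variant the two `sync()` calls run one after the other rather than one inside the other, which is the property the suggestion relies on.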

@TomNicholas TomNicholas requested a review from d-v-b July 28, 2025 13:31
@dcherian
Contributor

Shall we also have the sync array getitem methods use these async methods?
Example:

return sync(
    self._async_array._get_selection(
        BasicIndexer(selection, self.shape, self.metadata.chunk_grid),
        out=out,
        fields=fields,
        prototype=prototype,
    )
)

@TomNicholas
Member Author

Yea I would like to, but don't fully understand how to get that to work. So I thought I could leave that for a follow-up.

#3083 (comment)

@dcherian
Contributor

dcherian commented Jul 29, 2025

OK, but presumably that error means async indexing with Zarr arrays also doesn't work (https://github.com/zarr-developers/zarr-python/pull/3083/files#r2231114456). Can you open an issue to track it, please?

@TomNicholas
Member Author

> indexing with Zarr arrays

I didn't even know it was possible to index a zarr array with another zarr array!

> presumably that error means async indexing with Zarr arrays also doesn't work

Actually, I just added a test that seems to show that indexing a zarr array with a (sync) zarr array does work. I also tried indexing a zarr array with an AsyncArray (same test but using ._async_array), which raises:

        # try indexing with async zarr array
>       result = await async_zarr.oindex.getitem(z2._async_array)

tests/test_indexing.py:2061: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
src/zarr/core/indexing.py:974: in getitem
    return await self.array.get_orthogonal_selection(
src/zarr/core/array.py:1440: in get_orthogonal_selection
    indexer = OrthogonalIndexer(selection, self.shape, self.metadata.chunk_grid)
src/zarr/core/indexing.py:878: in __init__
    dim_indexer = BoolArrayDimIndexer(dim_sel, dim_len, dim_chunk_len)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <[AttributeError("'BoolArrayDimIndexer' object has no attribute 'dim_sel'") raised in repr()] BoolArrayDimIndexer object at 0x107e8e990>
dim_sel = <AsyncArray memory://4427184448/z2 shape=(2,) dtype=bool>, dim_len = 2, dim_chunk_len = 1

    def __init__(self, dim_sel: npt.NDArray[np.bool_], dim_len: int, dim_chunk_len: int) -> None:
        # check number of dimensions
        if not is_bool_array(dim_sel, 1):
            raise IndexError("Boolean arrays in an orthogonal selection must be 1-dimensional only")
    
        # check shape
        if dim_sel.shape[0] != dim_len:
            raise IndexError(
                f"Boolean array has the wrong length for dimension; expected {dim_len}, got {dim_sel.shape[0]}"
            )
    
        # precompute number of selected items for each chunk
        nchunks = ceildiv(dim_len, dim_chunk_len)
        chunk_nitems = np.zeros(nchunks, dtype="i8")
        for dim_chunk_ix in range(nchunks):
            dim_offset = dim_chunk_ix * dim_chunk_len
            chunk_nitems[dim_chunk_ix] = np.count_nonzero(
>               dim_sel[dim_offset : dim_offset + dim_chunk_len]
            )
E           TypeError: 'AsyncArray' object is not subscriptable

src/zarr/core/indexing.py:613: TypeError

If that is supposed to work I can raise an issue for it, but it doesn't seem to be the same sync problem that we were discussing before.
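For illustration, a small sketch of the failure mode in the traceback and one possible workaround: await the index array into memory first, then pass the in-memory result as the selection. `ToyAsyncArray` and its `oindex_getitem` method are hypothetical stand-ins, not zarr's API, and whether zarr should do this conversion itself is the open question here.

```python
import asyncio


class ToyAsyncArray:
    """Hypothetical async array: has an awaitable getitem() but no __getitem__."""

    def __init__(self, values):
        self._values = list(values)

    async def getitem(self, selection):
        await asyncio.sleep(0)  # stand-in for async store I/O
        if selection is Ellipsis:
            return list(self._values)
        return [self._values[i] for i in selection]

    async def oindex_getitem(self, mask):
        # Like the indexer in the traceback, this slices the mask directly,
        # so the mask must be subscriptable (an in-memory sequence).
        chosen = [i for i, keep in enumerate(mask[: len(self._values)]) if keep]
        return await self.getitem(chosen)


async def main():
    data = ToyAsyncArray([10, 20])
    mask = ToyAsyncArray([True, False])

    # Passing the async array itself fails, as in the traceback.
    try:
        await data.oindex_getitem(mask)
        error = None
    except TypeError as exc:
        error = str(exc)

    # Workaround: materialize the mask first, then index with the result.
    in_memory_mask = await mask.getitem(...)
    result = await data.oindex_getitem(in_memory_mask)
    return error, result


error, result = asyncio.run(main())
print(error)
print(result)
```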

@dcherian
Contributor

The Claude diagnosis points to a sync np.asarray call in _zarr_array_to_int_or_bool_array (#3083 (comment)).

@dcherian
Contributor

Not sure what to do about codecov, except add more tests

@dstansby dstansby added this to the 3.1.2 milestone Jul 30, 2025
@dcherian
Contributor

Thanks for the extra tests!

@dcherian dcherian merged commit 108ec58 into zarr-developers:main Jul 30, 2025
31 checks passed
@TomNicholas TomNicholas deleted the async_oindex branch July 30, 2025 16:56
meeseeksmachine pushed a commit to meeseeksmachine/zarr-python that referenced this pull request Jul 30, 2025
dstansby pushed a commit that referenced this pull request Jul 31, 2025