Open
Labels
- API extension: Adds new functions or objects to the API.
- Needs Discussion: Needs further discussion.
- RFC: Request for comments. Feature requests and proposed changes.
- topic: Lazy/Graph: Lazy and graph-based array implementations.
Description
Preface
I do not think that I am the best person to champion this effort, as I am far from the most informed person here on lazy arrays. I'm probably missing important things, but I would like to start this discussion because I think it is an important topic.
The problem
The problem of mixing computation requiring data-dependent properties with lazy execution is discussed in detail elsewhere:
- https://data-apis.org/array-api/draft/design_topics/lazy_eager.html
- https://data-apis.org/array-api/draft/design_topics/data_dependent_output_shapes.html#data-dependent-output-shapes
- Handling materialization of lazy arrays #748
- Calculate number of unique values in a lazy array #834
A possible solution
Add the function `materialize(x: Array)` to the top level of the API. Behaviour:
- for eagerly-executed arrays, this would be a no-op
- for lazy arrays, this would force computation such that the data is available in the returned array (which is of the same array type?)
- for "100% lazy" arrays (Handling materialization of lazy arrays #748 (comment)), this would raise an exception
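
To make the three behaviours above concrete, here is a minimal sketch in plain Python. Everything in it is hypothetical and for illustration only: the classes `EagerArray`, `LazyArray` and `TracedArray` are toy stand-ins and do not correspond to any real library, the choice of `RuntimeError` is an assumption (the proposal does not name an exception type), and returning the same array type for lazy inputs is just one possible answer to the open question above.

```python
from typing import Callable, List, Optional


class EagerArray:
    """Hypothetical eager array: the data is always in memory."""

    def __init__(self, data: List[float]):
        self.data = data


class LazyArray:
    """Hypothetical lazy array: holds a deferred computation."""

    def __init__(self, thunk: Callable[[], List[float]]):
        self._thunk = thunk
        self._data: Optional[List[float]] = None

    @property
    def is_materialized(self) -> bool:
        return self._data is not None

    @property
    def data(self) -> List[float]:
        if self._data is None:
            raise ValueError("data not computed yet; call materialize() first")
        return self._data

    def _force(self) -> "LazyArray":
        # Run the deferred computation once and cache the result.
        if self._data is None:
            self._data = self._thunk()
        return self


class TracedArray:
    """Hypothetical '100% lazy' array: concrete values never exist."""


def materialize(x):
    """Sketch of the proposed top-level function (not a real API)."""
    if isinstance(x, EagerArray):
        return x  # no-op for eagerly-executed arrays
    if isinstance(x, LazyArray):
        return x._force()  # force computation, keep the same array type
    if isinstance(x, TracedArray):
        # Assumption: the exception type is not specified in the proposal.
        raise RuntimeError("this array cannot be materialized")
    raise TypeError(f"unsupported array type: {type(x).__name__}")
```

With these toy classes, `materialize(EagerArray([1.0]))` returns its argument unchanged, while a `LazyArray` only runs its deferred computation when `materialize` is called on it.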
Prior art
- Dask:
  - https://docs.dask.org/en/stable/generated/dask.array.Array.compute_chunk_sizes.html computes chunk sizes / shape, working in-place and leaving the array as a Dask array.
  - https://docs.dask.org/en/stable/generated/dask.array.Array.compute.html materialises the in-memory equivalent of the Dask array, returning e.g. a NumPy array.
- JAX:
  - ?
- others?
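
As a rough illustration of the two Dask methods listed above, the following sketch applies a boolean mask (a data-dependent output shape) and then resolves it in both ways. It assumes `dask` and `numpy` are installed; the variable names are arbitrary.

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(10), chunks=5)
y = x[x > 3]  # boolean mask: the output shape depends on the data

# Before any computation the length along the masked axis is unknown (NaN).
shape_before = y.shape

# Computes the chunk sizes in place; y is still a Dask array afterwards.
y.compute_chunk_sizes()
shape_after = y.shape

# Materialises the full result in memory, here as a NumPy array.
result = y.compute()
```

Note the difference in what comes back: `compute_chunk_sizes` keeps the array type and only resolves the shape, whereas `compute` changes the type. A standardised `materialize` would have to pick one of these behaviours or define its own.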
Concerns
- I think the main concern is whether eager-only libraries will agree to adding a no-op into the API. There is precedent for that type of change (e.g. `device` kwargs in NumPy), but perhaps this is too obtrusive?
- As far as I can tell there isn't a standard way to do this across lazy libraries. Does JAX just do this automatically when it would be needed? Do other libraries have this capability?
Alternatives
- Do nothing. The easy option, but it leaves us unable to support lazy arrays when data-dependent properties are used in computation (maybe that is okay?)
- An alternative API. Maybe spelled like `compute*` or a method on the array object. Maybe with options for partial materialization (if that's a thing)?