Fix race condition in from_array for arrays with shards #3217
Fixes #3169.
When passing data via `create_array(data=...)`, the data is inserted by `from_array` after being split by chunks. In current main, when using shards, `from_array` ends up splitting the data by sub-chunks, because `AsyncArray.chunks` returns `Metadata.chunks`, which is the size of the chunks inside shards rather than the size of the "physical" chunk files. This creates a race condition when writes to multiple sub-chunks end up writing to the same physical chunk file.

This PR changes `from_array` to instead split the data by shard when the array is sharded.

(There probably needs to be a lock somewhere in `AsyncArray` or `ShardCodec` that prevents multiple writes to the same shard, or to the same physical chunk in general, from executing at the same time.)
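
The lost-update pattern described above can be sketched with a toy model. This is not zarr's code: `ToyShardStore`, `regions`, `shard_key`, and `fill` are hypothetical names invented for illustration, and the store simply assumes that every write reads the whole shard, yields to the event loop, and writes the whole shard back.

```python
import asyncio
from itertools import product


def regions(shape, tile):
    """Yield tuples of slices that cover `shape` in tiles of size `tile`."""
    ranges = [range(0, s, t) for s, t in zip(shape, tile)]
    for starts in product(*ranges):
        yield tuple(
            slice(a, min(a + t, s)) for a, t, s in zip(starts, tile, shape)
        )


def shard_key(region, shard_shape):
    """Index of the shard (one physical file) containing the region's origin."""
    return tuple(sl.start // n for sl, n in zip(region, shard_shape))


class ToyShardStore:
    """Hypothetical store holding one object per shard (assumption, not zarr)."""

    def __init__(self, shard_shape):
        self.shard_shape = shard_shape
        self.files = {}  # shard key -> frozenset of cell coordinates written

    async def write_region(self, region):
        # Assumes `region` lies entirely inside a single shard.
        key = shard_key(region, self.shard_shape)
        existing = self.files.get(key, frozenset())  # read the whole shard
        await asyncio.sleep(0)  # yield control, as real I/O would
        cells = frozenset(product(*(range(sl.start, sl.stop) for sl in region)))
        self.files[key] = existing | cells  # write the whole shard back


async def fill(shape, shard_shape, tile):
    """Write every tile concurrently; return how many cells were persisted."""
    store = ToyShardStore(shard_shape)
    await asyncio.gather(*(store.write_region(r) for r in regions(shape, tile)))
    return sum(len(cells) for cells in store.files.values())


if __name__ == "__main__":
    shape, shard_shape, inner_chunk = (8, 8), (4, 4), (2, 2)
    # Splitting by inner chunk: several concurrent writes target each shard.
    print("by sub-chunk:", asyncio.run(fill(shape, shard_shape, inner_chunk)))
    # Splitting by shard: exactly one write per physical file.
    print("by shard:", asyncio.run(fill(shape, shard_shape, shard_shape)))
```

In this toy model, splitting by sub-chunk persists fewer than the 64 cells of the array, because concurrent read-modify-write cycles on the same shard overwrite each other, while splitting by shard persists all of them. A per-shard lock around the read-modify-write step would be another way to avoid the loss, at the cost of serializing writes within a shard.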
TODO:
docs/user-guide/*.rst
changes/