Commit e9a3615

aio: Add missing memory barrier when waiting for IO handle
Previously there was no memory barrier enforcing correct memory ordering when waiting for a free IO handle. However, in the much more common case of waiting for IO to complete, memory barriers already were present.

On strongly ordered architectures like x86 this had no negative consequences, but on some armv8 hardware (observed on Apple hardware), it was possible for the update, in the IO worker, to PgAioHandle->state to become visible before ->distilled_result becoming visible, leading to rather confusing assertion failures.

The failures were rare enough that the bug sometimes took days to reproduce when running 027_stream_regress in a loop. Once finally debugged, it was easy enough to come up with a much quicker repro: Trigger a lot of very fast IO by limiting io_combine_limit to 1 and ensure that we always have to wait for a free handle by setting io_max_concurrency to 1. Triggering lots of concurrent seqscans in that setup triggers the issue within seconds.

One reason this was hard to debug was that the assertion failure most commonly happened in WaitReadBuffers(), rather than in the AIO subsystem itself. The assertions added in this commit make problems like this easier to understand.

Also add a comment to the IO worker explaining that we rely on the lwlock acquisition for correct memory ordering.

I think it'd be good to add a tap test that stress tests buffer IO, but that's material for a separate patch.

Thanks a lot to Alexander and Konstantin for all the debugging help.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Andres Freund <andres@anarazel.de>
Investigated-by: Alexander Lakhin <exclusion@gmail.com>
Investigated-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/2dkz7azclpeiqcmouamdixyn5xhlzy4rvikxrbovyzvi6rnv5c@pz7o7osv2ahf
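
The ordering problem described in the commit message can be illustrated with a minimal sketch in portable C11, assuming pthreads is available. This is not PostgreSQL code: `DemoHandle`, `demo_worker`, `demo_wait_and_read` and `run_handoff_rounds` are hypothetical names, and the C11 acquire fence only roughly plays the role that `pg_read_barrier()` plays in the patch. The worker publishes a result and then flips the state; the waiter must not read the result until a barrier has been issued after it observes the new state.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for PgAioHandle; not the PostgreSQL definition. */
typedef struct DemoHandle
{
    int         distilled_result;   /* plain, non-atomic payload */
    atomic_int  state;              /* DEMO_IN_FLIGHT or DEMO_COMPLETED */
} DemoHandle;

enum { DEMO_IN_FLIGHT = 0, DEMO_COMPLETED = 1 };

typedef struct DemoJob
{
    DemoHandle *handle;
    int         value;
} DemoJob;

/* "IO worker": store the result first, then flip the state. */
static void *
demo_worker(void *arg)
{
    DemoJob    *job = arg;

    job->handle->distilled_result = job->value;
    /* Release fence: the result store cannot move after the state store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&job->handle->state, DEMO_COMPLETED,
                          memory_order_relaxed);
    return NULL;
}

/* "Backend": wait until the state says completed, then read the result. */
static int
demo_wait_and_read(DemoHandle *h)
{
    while (atomic_load_explicit(&h->state, memory_order_relaxed) != DEMO_COMPLETED)
        sched_yield();
    /* Acquire fence: the result load cannot move before the state load. */
    atomic_thread_fence(memory_order_acquire);
    return h->distilled_result;
}

/* Repeats the handoff; returns how many reads saw a stale result. */
int
run_handoff_rounds(int rounds)
{
    int         stale = 0;

    assert(rounds >= 0);
    for (int i = 0; i < rounds; i++)
    {
        DemoHandle  h;
        DemoJob     job = {&h, i + 1};
        pthread_t   tid;

        h.distilled_result = -1;
        atomic_init(&h.state, DEMO_IN_FLIGHT);
        if (pthread_create(&tid, NULL, demo_worker, &job) != 0)
            return -1;
        if (demo_wait_and_read(&h) != i + 1)
            stale++;
        pthread_join(tid, NULL);
    }
    return stale;
}
```

With both fences in place the stale count is always zero. Dropping the acquire fence turns the read of `distilled_result` into a data race; on strongly ordered hardware that would likely go unnoticed, which matches the observation that the bug only showed up on some armv8 machines.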
1 parent ee685c9 commit e9a3615

3 files changed, +30 -1 lines changed
src/backend/storage/aio/aio.c

Lines changed: 17 additions & 0 deletions
@@ -556,6 +556,13 @@ bool
 pgaio_io_was_recycled(PgAioHandle *ioh, uint64 ref_generation, PgAioHandleState *state)
 {
 	*state = ioh->state;
+
+	/*
+	 * Ensure that we don't see an earlier state of the handle than ioh->state
+	 * due to compiler or CPU reordering. This protects both ->generation as
+	 * directly used here, and other fields in the handle accessed in the
+	 * caller if the handle was not reused.
+	 */
 	pg_read_barrier();
 
 	return ioh->generation != ref_generation;
@@ -773,7 +780,12 @@ pgaio_io_wait_for_free(void)
 			 * Note that no interrupts are processed between the state check
 			 * and the call to reclaim - that's important as otherwise an
 			 * interrupt could have already reclaimed the handle.
+			 *
+			 * Need to ensure that there's no reordering, in the more common
+			 * paths, where we wait for IO, that's done by
+			 * pgaio_io_was_recycled().
 			 */
+			pg_read_barrier();
 			pgaio_io_reclaim(ioh);
 			reclaimed++;
 		}
@@ -852,7 +864,12 @@ pgaio_io_wait_for_free(void)
 				 * check and the call to reclaim - that's important as
 				 * otherwise an interrupt could have already reclaimed the
 				 * handle.
+				 *
+				 * Need to ensure that there's no reordering, in the more
+				 * common paths, where we wait for IO, that's done by
+				 * pgaio_io_was_recycled().
 				 */
+				pg_read_barrier();
 				pgaio_io_reclaim(ioh);
 				break;
 		}

src/backend/storage/aio/aio_callback.c

Lines changed: 7 additions & 0 deletions
@@ -256,6 +256,9 @@ pgaio_io_call_complete_shared(PgAioHandle *ioh)
 					   pgaio_result_status_string(result.status),
 					   result.id, result.error_data, result.result);
 		result = ce->cb->complete_shared(ioh, result, cb_data);
+
+		/* the callback should never transition to unknown */
+		Assert(result.status != PGAIO_RS_UNKNOWN);
 	}
 
 	ioh->distilled_result = result;
@@ -290,6 +293,7 @@ pgaio_io_call_complete_local(PgAioHandle *ioh)
 
 	/* start with distilled result from shared callback */
 	result = ioh->distilled_result;
+	Assert(result.status != PGAIO_RS_UNKNOWN);
 
 	for (int i = ioh->num_callbacks; i > 0; i--)
 	{
@@ -306,6 +310,9 @@ pgaio_io_call_complete_local(PgAioHandle *ioh)
 					   pgaio_result_status_string(result.status),
 					   result.id, result.error_data, result.result);
 		result = ce->cb->complete_local(ioh, result, cb_data);
+
+		/* the callback should never transition to unknown */
+		Assert(result.status != PGAIO_RS_UNKNOWN);
 	}
 
 	/*

src/backend/storage/aio/method_worker.c

Lines changed: 6 additions & 1 deletion
@@ -461,7 +461,12 @@ IoWorkerMain(const void *startup_data, size_t startup_data_len)
 		int			nwakeups = 0;
 		int			worker;
 
-		/* Try to get a job to do. */
+		/*
+		 * Try to get a job to do.
+		 *
+		 * The lwlock acquisition also provides the necessary memory barrier
+		 * to ensure that we don't see an outdated data in the handle.
+		 */
 		LWLockAcquire(AioWorkerSubmissionQueueLock, LW_EXCLUSIVE);
 		if ((io_index = pgaio_worker_submission_queue_consume()) == UINT32_MAX)
 		{
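
The comment added to IoWorkerMain() relies on a lock acquisition also acting as a memory barrier. A minimal sketch of that idea follows, assuming pthreads and using a `pthread_mutex_t` as a stand-in for the lwlock. Every name here (`DemoQueue`, `demo_submit`, `run_queue_demo`) is hypothetical and none of it is PostgreSQL's submission queue code. The submitter fills in per-job data before publishing the job's index under the lock; the consumer takes the same lock to fetch the index and can then read the job data without any further barrier.

```c
#include <pthread.h>
#include <sched.h>

#define DEMO_QUEUE_SIZE 64

/* Hypothetical queue; only the index ring is protected by the lock. */
typedef struct DemoQueue
{
    pthread_mutex_t lock;
    int         ring[DEMO_QUEUE_SIZE];  /* published job indexes */
    int         head;                   /* next slot to consume */
    int         tail;                   /* next slot to fill */
    int         ops[DEMO_QUEUE_SIZE];   /* per-job data, written unlocked */
    int         njobs;
} DemoQueue;

/* Submitter: fill in the job data first, then publish its index. */
static void *
demo_submit(void *arg)
{
    DemoQueue  *q = arg;

    for (int i = 0; i < q->njobs; i++)
    {
        q->ops[i] = i + 1;
        pthread_mutex_lock(&q->lock);
        q->ring[q->tail++] = i;
        pthread_mutex_unlock(&q->lock);     /* release: ops[i] is published */
    }
    return NULL;
}

/* Consumer: returns the sum of all job data it saw, or -1 on bad input. */
long
run_queue_demo(int njobs)
{
    DemoQueue   q = {.head = 0, .tail = 0, .njobs = njobs};
    pthread_t   tid;
    long        sum = 0;
    int         consumed = 0;

    if (njobs < 0 || njobs > DEMO_QUEUE_SIZE)
        return -1;
    pthread_mutex_init(&q.lock, NULL);
    if (pthread_create(&tid, NULL, demo_submit, &q) != 0)
    {
        pthread_mutex_destroy(&q.lock);
        return -1;
    }
    while (consumed < njobs)
    {
        int         idx = -1;

        pthread_mutex_lock(&q.lock);        /* acquire: pairs with the unlock */
        if (q.head < q.tail)
            idx = q.ring[q.head++];
        pthread_mutex_unlock(&q.lock);

        if (idx < 0)
        {
            sched_yield();                  /* nothing queued yet */
            continue;
        }
        sum += q.ops[idx];                  /* safe: written before publication */
        consumed++;
    }
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&q.lock);
    return sum;
}
```

For example, `run_queue_demo(10)` returns 55, the sum of the job values 1 through 10.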
