
Commit 3e2db4c

Use SnapshotDirty rather than an active snapshot to probe index endpoints.
If there are lots of uncommitted tuples at the end of the index range, get_actual_variable_range() ends up fetching each one and doing an MVCC visibility check on it, until it finally hits a visible tuple. This is bad enough in isolation, considering that we don't need an exact answer, only an approximate one. But because the tuples are not yet committed, each visibility check does a TransactionIdIsInProgress() test, which involves scanning the ProcArray. When multiple sessions do this concurrently, the ensuing contention results in horrid performance loss. 20X overall throughput loss on not-too-complicated queries is easy to demonstrate in the back branches (though someone's made it noticeably less bad in HEAD).

We can dodge the problem fairly effectively by using SnapshotDirty rather than a normal MVCC snapshot. This will cause the index probe to take uncommitted tuples as good, so that we incur only one tuple fetch and test even if there are many such tuples. The extent to which this degrades the estimate is debatable: it's possible the result is actually a more accurate prediction than before, if the endmost tuple has become committed by the time we actually execute the query being planned. In any case, it's not very likely that it makes the estimate a lot worse.

SnapshotDirty will still reject tuples that are known committed dead, so we won't give bogus answers if an invalid outlier has been deleted but not yet vacuumed from the index. (Because btrees know how to mark such tuples dead in the index, we shouldn't have a big performance problem in the case that there are many of them at the end of the range.) This consideration motivates not using SnapshotAny, which was also considered as a fix.

Note: the back branches were using SnapshotNow instead of an MVCC snapshot, but the problem and solution are the same.

Per performance complaints from Bartlomiej Romanski, Josh Berkus, and others. Back-patch to 9.0, where the issue was introduced (by commit 40608e7).
1 parent e6f7fe9 commit 3e2db4c
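The trade-off described in the commit message can be illustrated with a small toy model. This is only a sketch under loud assumptions: the types `ToyTupleState` and `ToySnapshotKind` and the functions `toy_tuple_visible()` and `toy_probe_max()` are hypothetical names invented here, not PostgreSQL APIs, and the visibility rules are simplified to three tuple states. The model counts how many entries a probe for the maximum has to fetch before one is accepted. It deliberately ignores the btree optimization of marking dead entries in the index, and it does not model the ProcArray scan that makes each rejected in-progress tuple expensive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuple states for this toy model (not PostgreSQL structures). */
typedef enum
{
    TOY_TUPLE_COMMITTED,    /* inserted by a committed transaction */
    TOY_TUPLE_IN_PROGRESS,  /* inserted by a transaction still running */
    TOY_TUPLE_DEAD          /* deleted by a committed transaction, not yet vacuumed */
} ToyTupleState;

/* Hypothetical stand-ins for the three snapshot behaviors discussed above. */
typedef enum
{
    TOY_SNAPSHOT_MVCC,      /* accepts only committed tuples (simplified) */
    TOY_SNAPSHOT_DIRTY,     /* also accepts in-progress tuples, rejects dead ones */
    TOY_SNAPSHOT_ANY        /* accepts everything, including dead tuples */
} ToySnapshotKind;

/* Simplified visibility test: does this snapshot kind accept this tuple? */
bool
toy_tuple_visible(ToyTupleState state, ToySnapshotKind snap)
{
    switch (snap)
    {
        case TOY_SNAPSHOT_MVCC:
            return state == TOY_TUPLE_COMMITTED;
        case TOY_SNAPSHOT_DIRTY:
            return state != TOY_TUPLE_DEAD;
        case TOY_SNAPSHOT_ANY:
            return true;
    }
    return false;
}

/*
 * Probe for the maximum: walk the "index" (an array sorted ascending) from
 * its last entry backwards and return the position of the first accepted
 * entry, or -1 if none is accepted.  *fetches receives the number of
 * entries that had to be fetched and tested.
 */
long
toy_probe_max(const ToyTupleState *index, size_t n,
              ToySnapshotKind snap, size_t *fetches)
{
    assert(fetches != NULL);
    *fetches = 0;

    for (size_t i = n; i > 0; i--)
    {
        (*fetches)++;
        if (toy_tuple_visible(index[i - 1], snap))
            return (long) (i - 1);
    }
    return -1;
}
```

In this model, an index whose tail consists of many in-progress entries costs one fetch under the dirty-style snapshot but one fetch per uncommitted entry under the MVCC-style snapshot. An index whose last entry is dead is answered with that dead entry under the any-style snapshot, which corresponds to the bogus-outlier case the commit message gives as the reason for not using SnapshotAny.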

1 file changed, +21 -4 lines changed
src/backend/utils/adt/selfuncs.c

Lines changed: 21 additions & 4 deletions
@@ -4705,6 +4705,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			HeapTuple	tup;
 			Datum		values[INDEX_MAX_KEYS];
 			bool		isnull[INDEX_MAX_KEYS];
+			SnapshotData SnapshotDirty;
 
 			estate = CreateExecutorState();
 			econtext = GetPerTupleExprContext(estate);
@@ -4727,6 +4728,7 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRel));
 			econtext->ecxt_scantuple = slot;
 			get_typlenbyval(vardata->atttype, &typLen, &typByVal);
+			InitDirtySnapshot(SnapshotDirty);
 
 			/* set up an IS NOT NULL scan key so that we ignore nulls */
 			ScanKeyEntryInitialize(&scankeys[0],
@@ -4743,8 +4745,23 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			/* If min is requested ... */
 			if (min)
 			{
-				index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
-											 1, 0);
+				/*
+				 * In principle, we should scan the index with our current
+				 * active snapshot, which is the best approximation we've got
+				 * to what the query will see when executed.  But that won't
+				 * be exact if a new snap is taken before running the query,
+				 * and it can be very expensive if a lot of uncommitted rows
+				 * exist at the end of the index (because we'll laboriously
+				 * fetch each one and reject it).  What seems like a good
+				 * compromise is to use SnapshotDirty.  That will accept
+				 * uncommitted rows, and thus avoid fetching multiple heap
+				 * tuples in this scenario.  On the other hand, it will reject
+				 * known-dead rows, and thus not give a bogus answer when the
+				 * extreme value has been deleted; that case motivates not
+				 * using SnapshotAny here.
+				 */
+				index_scan = index_beginscan(heapRel, indexRel,
+											 &SnapshotDirty, 1, 0);
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in sortop's direction */
@@ -4775,8 +4792,8 @@ get_actual_variable_range(PlannerInfo *root, VariableStatData *vardata,
 			/* If max is requested, and we didn't find the index is empty */
 			if (max && have_data)
 			{
-				index_scan = index_beginscan(heapRel, indexRel, SnapshotNow,
-											 1, 0);
+				index_scan = index_beginscan(heapRel, indexRel,
+											 &SnapshotDirty, 1, 0);
 				index_rescan(index_scan, scankeys, 1, NULL, 0);
 
 				/* Fetch first tuple in reverse direction */
