
Commit dd94c28

Fix "single value strategy" index deletion issue.
It is not appropriate for deduplication to apply single value strategy when triggered by a bottom-up index deletion pass. This wastes cycles because later bottom-up deletion passes will overinterpret older duplicate tuples that deduplication actually just skipped over "by design". It also makes bottom-up deletion much less effective for low cardinality indexes that happen to cross a meaningless "index has single key value per leaf page" threshold.

To fix, slightly narrow the conditions under which deduplication's single value strategy is considered. We already avoided the strategy for a unique index, since our high level goal must just be to buy time for VACUUM to run (not to buy space). We'll now also avoid it when we just had a bottom-up pass that reported failure. The two cases share the same high level goal, and already overlapped significantly, so this approach is quite natural.

Oversight in commit d168b66, which added bottom-up index deletion.

Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-WznaOvM+Gyj-JQ0X=JxoMDxctDTYjiEuETdAGbF5EUc3MA@mail.gmail.com
Backpatch: 14-, where bottom-up deletion was introduced.
1 parent: 1a9d802

File tree

3 files changed: 19 additions, 21 deletions


src/backend/access/nbtree/nbtdedup.c

Lines changed: 17 additions & 19 deletions
@@ -34,14 +34,17 @@ static bool _bt_posting_valid(IndexTuple posting);
  *
  * The general approach taken here is to perform as much deduplication as
  * possible to free as much space as possible.  Note, however, that "single
- * value" strategy is sometimes used for !checkingunique callers, in which
- * case deduplication will leave a few tuples untouched at the end of the
- * page.  The general idea is to prepare the page for an anticipated page
- * split that uses nbtsplitloc.c's "single value" strategy to determine a
- * split point.  (There is no reason to deduplicate items that will end up on
- * the right half of the page after the anticipated page split; better to
- * handle those if and when the anticipated right half page gets its own
- * deduplication pass, following further inserts of duplicates.)
+ * value" strategy is used for !bottomupdedup callers when the page is full of
+ * tuples of a single value.  Deduplication passes that apply the strategy
+ * will leave behind a few untouched tuples at the end of the page, preparing
+ * the page for an anticipated page split that uses nbtsplitloc.c's own single
+ * value strategy.  Our high level goal is to delay merging the untouched
+ * tuples until after the page splits.
+ *
+ * When a call to _bt_bottomupdel_pass() just took place (and failed), our
+ * high level goal is to prevent a page split entirely by buying more time.
+ * We still hope that a page split can be avoided altogether.  That's why
+ * single value strategy is not even considered for bottomupdedup callers.
  *
  * The page will have to be split if we cannot successfully free at least
  * newitemsz (we also need space for newitem's line pointer, which isn't
@@ -52,7 +55,7 @@ static bool _bt_posting_valid(IndexTuple posting);
  */
 void
 _bt_dedup_pass(Relation rel, Buffer buf, Relation heapRel, IndexTuple newitem,
-			   Size newitemsz, bool checkingunique)
+			   Size newitemsz, bool bottomupdedup)
 {
 	OffsetNumber offnum,
 				minoff,
@@ -97,8 +100,11 @@ _bt_dedup_pass(Relation rel, Buffer buf, Relation heapRel, IndexTuple newitem,
 	minoff = P_FIRSTDATAKEY(opaque);
 	maxoff = PageGetMaxOffsetNumber(page);
 
-	/* Determine if "single value" strategy should be used */
-	if (!checkingunique)
+	/*
+	 * Consider applying "single value" strategy, though only if the page
+	 * seems likely to be split in the near future
+	 */
+	if (!bottomupdedup)
 		singlevalstrat = _bt_do_singleval(rel, page, state, minoff, newitem);
 
 	/*
@@ -764,14 +770,6 @@ _bt_bottomupdel_finish_pending(Page page, BTDedupState state,
  * the first pass) won't spend many cycles on the large posting list tuples
  * left by previous passes.  Each pass will find a large contiguous group of
  * smaller duplicate tuples to merge together at the end of the page.
- *
- * Note: We deliberately don't bother checking if the high key is a distinct
- * value (prior to the TID tiebreaker column) before proceeding, unlike
- * nbtsplitloc.c.  Its single value strategy only gets applied on the
- * rightmost page of duplicates of the same value (other leaf pages full of
- * duplicates will get a simple 50:50 page split instead of splitting towards
- * the end of the page).  There is little point in making the same distinction
- * here.
  */
 static bool
 _bt_do_singleval(Relation rel, Page page, BTDedupState state,

src/backend/access/nbtree/nbtinsert.c

Lines changed: 1 addition & 1 deletion
@@ -2748,7 +2748,7 @@ _bt_delete_or_dedup_one_page(Relation rel, Relation heapRel,
 		/* Perform deduplication pass (when enabled and index-is-allequalimage) */
 		if (BTGetDeduplicateItems(rel) && itup_key->allequalimage)
 			_bt_dedup_pass(rel, buffer, heapRel, insertstate->itup,
-						   insertstate->itemsz, checkingunique);
+						   insertstate->itemsz, (indexUnchanged || uniquedup));
 	}
 
 	/*

src/include/access/nbtree.h

Lines changed: 1 addition & 1 deletion
@@ -1155,7 +1155,7 @@ extern void _bt_parallel_advance_array_keys(IndexScanDesc scan);
  */
 extern void _bt_dedup_pass(Relation rel, Buffer buf, Relation heapRel,
 						   IndexTuple newitem, Size newitemsz,
-						   bool checkingunique);
+						   bool bottomupdedup);
 extern bool _bt_bottomupdel_pass(Relation rel, Buffer buf, Relation heapRel,
 								 Size newitemsz);
 extern void _bt_dedup_start_pending(BTDedupState state, IndexTuple base,
