Commit 449e14a

Doc: Describe CREATE INDEX deduplication strategy.
The B-Tree index deduplication strategy used during CREATE INDEX and REINDEX differs from the lazy strategy used by retail inserts. Make that clear by adding a new paragraph to the B-Tree implementation section of the documentation. In passing, do some copy-editing of nearby deduplication documentation.
Parent: 3350fb5. Commit: 449e14a.

1 file changed, 37 additions and 17 deletions

doc/src/sgml/btree.sgml

Lines changed: 37 additions & 17 deletions
@@ -622,12 +622,13 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 </para>
 <note>
 <para>
-While NULL is generally not considered to be equal to any other
-value, including NULL, NULL is nevertheless treated as just
-another value from the domain of indexed values by the B-Tree
-implementation (except when enforcing uniqueness in a unique
-index). B-Tree deduplication is therefore just as effective with
-<quote>duplicates</quote> that contain a NULL value.
+B-Tree deduplication is just as effective with
+<quote>duplicates</quote> that contain a NULL value, even though
+NULL values are never equal to each other according to the
+<literal>=</literal> member of any B-Tree operator class. As far
+as any part of the implementation that understands the on-disk
+B-Tree structure is concerned, NULL is just another value from the
+domain of indexed values.
 </para>
 </note>
 <para>
@@ -642,6 +643,20 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 see a moderate performance benefit from using deduplication.
 Deduplication is enabled by default.
 </para>
+<para>
+<command>CREATE INDEX</command> and <command>REINDEX</command>
+apply deduplication to create posting list tuples, though the
+strategy they use is slightly different. Each group of duplicate
+ordinary tuples encountered in the sorted input taken from the
+table is merged into a posting list tuple
+<emphasis>before</emphasis> being added to the current pending leaf
+page. Individual posting list tuples are packed with as many
+<acronym>TID</acronym>s as possible. Leaf pages are written out in
+the usual way, without any separate deduplication pass. This
+strategy is well-suited to <command>CREATE INDEX</command> and
+<command>REINDEX</command> because they are once-off batch
+operations.
+</para>
 <para>
 Write-heavy workloads that don't benefit from deduplication due to
 having few or no duplicate values in indexes will incur a small,
@@ -657,17 +672,22 @@ equalimage(<replaceable>opcintype</replaceable> <type>oid</type>) returns bool
 B-Tree indexes are not directly aware that under MVCC, there might
 be multiple extant versions of the same logical table row; to an
 index, each tuple is an independent object that needs its own index
-entry. Thus, an update of a row always creates all-new index
-entries for the row, even if the key values did not change. Some
-workloads suffer from index bloat caused by these
-implementation-level version duplicates (this is typically a
-problem for <command>UPDATE</command>-heavy workloads that cannot
-apply the <acronym>HOT</acronym> optimization due to modifying at
-least one indexed column). B-Tree deduplication does not
-distinguish between these implementation-level version duplicates
-and conventional duplicates. Deduplication can nevertheless help
-with controlling index bloat caused by implementation-level version
-churn.
+entry. <quote>Version duplicates</quote> may sometimes accumulate
+and adversely affect query latency and throughput. This typically
+occurs with <command>UPDATE</command>-heavy workloads where most
+individual updates cannot apply the <acronym>HOT</acronym>
+optimization (often because at least one indexed column gets
+modified, necessitating a new set of index tuple versions &mdash;
+one new tuple for <emphasis>each and every</emphasis> index). In
+effect, B-Tree deduplication ameliorates index bloat caused by
+version churn. Note that even the tuples from a unique index are
+not necessarily <emphasis>physically</emphasis> unique when stored
+on disk due to version churn. The deduplication optimization is
+selectively applied within unique indexes. It targets those pages
+that appear to have version duplicates. The high level goal is to
+give <command>VACUUM</command> more time to run before an
+<quote>unnecessary</quote> page split caused by version churn can
+take place.
 </para>
 <tip>
 <para>
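
As a reading aid for the new CREATE INDEX / REINDEX paragraph in this diff, here is a rough Python sketch of the idea it describes: runs of equal keys in already-sorted input are merged into posting list tuples before anything is placed on a leaf page. It is not PostgreSQL code. The function name `build_posting_tuples`, the `(block, offset)` TID tuples, and the `max_tids_per_tuple` cap are all hypothetical; the sketch assumes a simple count limit where the real implementation is constrained by tuple size on the page.

```python
from itertools import groupby
from typing import Hashable, Iterable, List, Tuple

# Simplified stand-in for a heap tuple identifier: (block number, offset).
TID = Tuple[int, int]


def build_posting_tuples(
    sorted_entries: Iterable[Tuple[Hashable, TID]],
    max_tids_per_tuple: int = 3,  # hypothetical cap; not a PostgreSQL setting
) -> List[Tuple[Hashable, List[TID]]]:
    """Merge each run of equal keys in already-sorted input into posting lists."""
    if max_tids_per_tuple < 1:
        raise ValueError("max_tids_per_tuple must be at least 1")
    leaf_tuples = []
    # groupby only merges adjacent equal keys, so the input must be sorted.
    for key, run in groupby(sorted_entries, key=lambda entry: entry[0]):
        tids = [tid for _, tid in run]
        # Pack each tuple as full as the cap allows before starting another.
        for start in range(0, len(tids), max_tids_per_tuple):
            leaf_tuples.append((key, tids[start:start + max_tids_per_tuple]))
    return leaf_tuples


entries = [("a", (0, 1)), ("a", (0, 2)), ("a", (1, 1)), ("a", (1, 4)), ("b", (0, 3))]
for key, tids in build_posting_tuples(entries):
    print(key, tids)
```

With the default cap of 3, the four entries for key "a" become one tuple holding three TIDs and a second holding one, while "b" stays a single-TID tuple. There is no later pass over finished pages, which matches the paragraph's point that the merge happens up front during a batch build.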
