Skip to content

Commit e9661f2

Browse files
committed
doc: make blooms docs match reality
Parallel execution changed the way bloom queries are executed, so update the EXPLAIN output, and restructure the docs to be clearer and more accurate. Reported-by: Daniel Westermann Discussion: https://postgr.es/m/ZR0P278MB0122119FAE78721A694C30C8D2340@ZR0P278MB0122.CHEP278.PROD.OUTLOOK.COM Author: Daniel Westermann and me Backpatch-through: 9.6
1 parent 20d3fe9 commit e9661f2

File tree

1 file changed

+62
-55
lines changed

1 file changed

+62
-55
lines changed

doc/src/sgml/bloom.sgml

Lines changed: 62 additions & 55 deletions
Original file line numberDiff line numberDiff line change
@@ -110,75 +110,70 @@ CREATE INDEX bloomidx ON tbloom USING bloom (i1,i2,i3)
110110
FROM
111111
generate_series(1,10000000);
112112
SELECT 10000000
113-
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
114-
CREATE INDEX
115-
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
116-
pg_size_pretty
117-
----------------
118-
153 MB
119-
(1 row)
120-
=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
121-
CREATE INDEX
122-
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
123-
pg_size_pretty
124-
----------------
125-
387 MB
126-
(1 row)
127113
</programlisting>
128114

129115
<para>
130116
A sequential scan over this large table takes a long time:
131117
<programlisting>
132118
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
133-
QUERY PLAN
134-
-------------------------------------------------------------------&zwsp;-----------------------------------------
135-
Seq Scan on tbloom (cost=0.00..213694.08 rows=1 width=24) (actual time=1445.438..1445.438 rows=0 loops=1)
119+
QUERY PLAN
120+
-------------------------------------------------------------------&zwsp;-----------------------------------
121+
Seq Scan on tbloom (cost=0.00..2137.14 rows=3 width=24) (actual time=16.971..16.971 rows=0 loops=1)
136122
Filter: ((i2 = 898732) AND (i5 = 123451))
137-
Rows Removed by Filter: 10000000
138-
Planning time: 0.177 ms
139-
Execution time: 1445.473 ms
123+
Rows Removed by Filter: 100000
124+
Planning Time: 0.346 ms
125+
Execution Time: 16.988 ms
140126
(5 rows)
141127
</programlisting>
142128
</para>
143129

144130
<para>
145-
So the planner will usually select an index scan if possible.
146-
With a btree index, we get results like this:
131+
Even with the btree index defined the result will still be a
132+
sequential scan:
147133
<programlisting>
134+
=# CREATE INDEX btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
135+
CREATE INDEX
136+
=# SELECT pg_size_pretty(pg_relation_size('btreeidx'));
137+
pg_size_pretty
138+
----------------
139+
3976 kB
140+
(1 row)
148141
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
149-
QUERY PLAN
150-
-------------------------------------------------------------------&zwsp;-------------------------------------------------------------
151-
Index Only Scan using btreeidx on tbloom (cost=0.56..298311.96 rows=1 width=24) (actual time=445.709..445.709 rows=0 loops=1)
152-
Index Cond: ((i2 = 898732) AND (i5 = 123451))
153-
Heap Fetches: 0
154-
Planning time: 0.193 ms
155-
Execution time: 445.770 ms
142+
QUERY PLAN
143+
-------------------------------------------------------------------&zwsp;-----------------------------------
144+
Seq Scan on tbloom (cost=0.00..2137.00 rows=2 width=24) (actual time=12.805..12.805 rows=0 loops=1)
145+
Filter: ((i2 = 898732) AND (i5 = 123451))
146+
Rows Removed by Filter: 100000
147+
Planning Time: 0.138 ms
148+
Execution Time: 12.817 ms
156149
(5 rows)
157150
</programlisting>
158151
</para>
159152

160153
<para>
161-
Bloom is better than btree in handling this type of search:
154+
Having the bloom index defined on the table is better than btree in
155+
handling this type of search:
162156
<programlisting>
157+
=# CREATE INDEX bloomidx ON tbloom USING bloom (i1, i2, i3, i4, i5, i6);
158+
CREATE INDEX
159+
=# SELECT pg_size_pretty(pg_relation_size('bloomidx'));
160+
pg_size_pretty
161+
----------------
162+
1584 kB
163+
(1 row)
163164
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
164-
QUERY PLAN
165-
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
166-
Bitmap Heap Scan on tbloom (cost=178435.39..178439.41 rows=1 width=24) (actual time=76.698..76.698 rows=0 loops=1)
165+
QUERY PLAN
166+
-------------------------------------------------------------------&zwsp;--------------------------------------------------
167+
Bitmap Heap Scan on tbloom (cost=1792.00..1799.69 rows=2 width=24) (actual time=0.388..0.388 rows=0 loops=1)
167168
Recheck Cond: ((i2 = 898732) AND (i5 = 123451))
168-
Rows Removed by Index Recheck: 2439
169-
Heap Blocks: exact=2408
170-
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..178435.39 rows=1 width=0) (actual time=72.455..72.455 rows=2439 loops=1)
169+
Rows Removed by Index Recheck: 29
170+
Heap Blocks: exact=28
171+
-&gt; Bitmap Index Scan on bloomidx (cost=0.00..1792.00 rows=2 width=0) (actual time=0.356..0.356 rows=29 loops=1)
171172
Index Cond: ((i2 = 898732) AND (i5 = 123451))
172-
Planning time: 0.475 ms
173-
Execution time: 76.778 ms
173+
Planning Time: 0.099 ms
174+
Execution Time: 0.408 ms
174175
(8 rows)
175176
</programlisting>
176-
Note the relatively large number of false positives: 2439 rows were
177-
selected to be visited in the heap, but none actually matched the
178-
query. We could reduce that by specifying a larger signature length.
179-
In this example, creating the index with <literal>length=200</literal>
180-
reduced the number of false positives to 55; but it doubled the index size
181-
(to 306 MB) and ended up being slower for this query (125 ms overall).
182177
</para>
183178

184179
<para>
@@ -187,24 +182,36 @@ CREATE INDEX
187182
A better strategy for btree is to create a separate index on each column.
188183
Then the planner will choose something like this:
189184
<programlisting>
185+
=# CREATE INDEX btreeidx1 ON tbloom (i1);
186+
CREATE INDEX
187+
=# CREATE INDEX btreeidx2 ON tbloom (i2);
188+
CREATE INDEX
189+
=# CREATE INDEX btreeidx3 ON tbloom (i3);
190+
CREATE INDEX
191+
=# CREATE INDEX btreeidx4 ON tbloom (i4);
192+
CREATE INDEX
193+
=# CREATE INDEX btreeidx5 ON tbloom (i5);
194+
CREATE INDEX
195+
=# CREATE INDEX btreeidx6 ON tbloom (i6);
196+
CREATE INDEX
190197
=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 AND i5 = 123451;
191-
QUERY PLAN
192-
-------------------------------------------------------------------&zwsp;-----------------------------------------------------------
193-
Bitmap Heap Scan on tbloom (cost=9.29..13.30 rows=1 width=24) (actual time=0.148..0.148 rows=0 loops=1)
198+
QUERY PLAN
199+
-------------------------------------------------------------------&zwsp;--------------------------------------------------------
200+
Bitmap Heap Scan on tbloom (cost=24.34..32.03 rows=2 width=24) (actual time=0.028..0.029 rows=0 loops=1)
194201
Recheck Cond: ((i5 = 123451) AND (i2 = 898732))
195-
-&gt; BitmapAnd (cost=9.29..9.29 rows=1 width=0) (actual time=0.145..0.145 rows=0 loops=1)
196-
-&gt; Bitmap Index Scan on tbloom_i5_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.089..0.089 rows=10 loops=1)
202+
-&gt; BitmapAnd (cost=24.34..24.34 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
203+
-&gt; Bitmap Index Scan on btreeidx5 (cost=0.00..12.04 rows=500 width=0) (actual time=0.026..0.026 rows=0 loops=1)
197204
Index Cond: (i5 = 123451)
198-
-&gt; Bitmap Index Scan on tbloom_i2_idx (cost=0.00..4.52 rows=11 width=0) (actual time=0.048..0.048 rows=8 loops=1)
205+
-&gt; Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
199206
Index Cond: (i2 = 898732)
200-
Planning time: 2.049 ms
201-
Execution time: 0.280 ms
207+
Planning Time: 0.491 ms
208+
Execution Time: 0.055 ms
202209
(9 rows)
203210
</programlisting>
204211
Although this query runs much faster than with either of the single
205-
indexes, we pay a large penalty in index size. Each of the single-column
206-
btree indexes occupies 214 MB, so the total space needed is over 1.2GB,
207-
more than 8 times the space used by the bloom index.
212+
indexes, we pay a penalty in index size. Each of the single-column
213+
btree indexes occupies 2 MB, so the total space needed is 12 MB,
214+
eight times the space used by the bloom index.
208215
</para>
209216
</sect2>
210217

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy