Skip to content

Commit d4fe61b

Browse files
committed
Fix an additional set of problems in GIN's handling of lossy page pointers.
Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*')); Back-patch to 8.4 where the lossy pointer code was introduced.
1 parent 0454f13 commit d4fe61b

File tree

2 files changed

+137
-66
lines changed

2 files changed

+137
-66
lines changed

src/backend/access/gin/ginget.c

Lines changed: 126 additions & 64 deletions
Original file line numberDiff line numberDiff line change
@@ -8,7 +8,7 @@
88
* Portions Copyright (c) 1994, Regents of the University of California
99
*
1010
* IDENTIFICATION
11-
* $PostgreSQL: pgsql/src/backend/access/gin/ginget.c,v 1.31 2010/07/31 00:30:54 tgl Exp $
11+
* $PostgreSQL: pgsql/src/backend/access/gin/ginget.c,v 1.32 2010/08/01 19:16:39 tgl Exp $
1212
*-------------------------------------------------------------------------
1313
*/
1414

@@ -43,13 +43,12 @@ findItemInPage(Page page, ItemPointer item, OffsetNumber *off)
4343
int res;
4444

4545
if (GinPageGetOpaque(page)->flags & GIN_DELETED)
46-
/* page was deleted by concurrent vacuum */
46+
/* page was deleted by concurrent vacuum */
4747
return false;
4848

4949
/*
5050
* scan page to find equal or first greater value
5151
*/
52-
5352
for (*off = FirstOffsetNumber; *off <= maxoff; (*off)++)
5453
{
5554
res = compareItemPointers(item, (ItemPointer) GinDataPageGetItem(page, *off));
@@ -527,12 +526,40 @@ entryGetNextItem(Relation index, GinScanEntry entry)
527526
}
528527
}
529528

529+
/* convenience function for invoking a key's consistentFn */
530+
static inline bool
531+
callConsistentFn(GinState *ginstate, GinScanKey key)
532+
{
533+
/*
534+
* Initialize recheckCurItem in case the consistentFn doesn't know it
535+
* should set it. The safe assumption in that case is to force recheck.
536+
*/
537+
key->recheckCurItem = true;
538+
539+
return DatumGetBool(FunctionCall6(&ginstate->consistentFn[key->attnum - 1],
540+
PointerGetDatum(key->entryRes),
541+
UInt16GetDatum(key->strategy),
542+
key->query,
543+
UInt32GetDatum(key->nentries),
544+
PointerGetDatum(key->extra_data),
545+
PointerGetDatum(&key->recheckCurItem)));
546+
}
547+
530548
#define gin_rand() (((double) random()) / ((double) MAX_RANDOM_VALUE))
531549
#define dropItem(e) ( gin_rand() > ((double)GinFuzzySearchLimit)/((double)((e)->predictNumberResult)) )
532550

533551
/*
534552
* Sets entry->curItem to next heap item pointer for one entry of one scan key,
535553
* or sets entry->isFinished to TRUE if there are no more.
554+
*
555+
* Item pointers must be returned in ascending order.
556+
*
557+
* Note: this can return a "lossy page" item pointer, indicating that the
558+
* entry potentially matches all items on that heap page. However, it is
559+
* not allowed to return both a lossy page pointer and exact (regular)
560+
* item pointers for the same page. (Doing so would break the key-combination
561+
* logic in keyGetItem and scanGetItem; see comment in scanGetItem.) In the
562+
* current implementation this is guaranteed by the behavior of tidbitmaps.
536563
*/
537564
static void
538565
entryGetItem(Relation index, GinScanEntry entry)
@@ -625,15 +652,20 @@ entryGetItem(Relation index, GinScanEntry entry)
625652
* item pointer (including the case where the item pointer is a lossy page
626653
* pointer).
627654
*
628-
* Note: lossy page could be returned after single items from the same page.
629-
* This is OK since the results will just be used to build a bitmap; we'll
630-
* set a bitmap entry more than once, but never actually return a row twice.
655+
* Item pointers must be returned in ascending order.
656+
*
657+
* Note: this can return a "lossy page" item pointer, indicating that the
658+
* key potentially matches all items on that heap page. However, it is
659+
* not allowed to return both a lossy page pointer and exact (regular)
660+
* item pointers for the same page. (Doing so would break the key-combination
661+
* logic in scanGetItem.)
631662
*/
632663
static void
633664
keyGetItem(Relation index, GinState *ginstate, MemoryContext tempCtx,
634665
GinScanKey key, ItemPointer advancePast)
635666
{
636667
ItemPointerData myAdvancePast = *advancePast;
668+
ItemPointerData curPageLossy;
637669
uint32 i;
638670
uint32 lossyEntry;
639671
bool haveLossyEntry;
@@ -691,87 +723,112 @@ keyGetItem(Relation index, GinState *ginstate, MemoryContext tempCtx,
691723
myAdvancePast = key->curItem;
692724

693725
/*
694-
* Prepare entryRes array to be passed to consistentFn.
695-
*
696-
* If key->nentries == 1 then the consistentFn should always succeed,
697-
* but we must call it anyway to find out the recheck status.
698-
*
699726
* Lossy-page entries pose a problem, since we don't know the correct
700727
* entryRes state to pass to the consistentFn, and we also don't know
701728
* what its combining logic will be (could be AND, OR, or even NOT).
702-
* Our strategy for a single lossy-page entry is to try the
703-
* consistentFn both ways and return a hit if it accepts either one
704-
* (forcing the hit to be marked lossy so it will be rechecked).
729+
* If the logic is OR then the consistentFn might succeed for all
730+
* items in the lossy page even when none of the other entries match.
731+
*
732+
* If we have a single lossy-page entry then we check to see if the
733+
* consistentFn will succeed with only that entry TRUE. If so,
734+
* we return a lossy-page pointer to indicate that the whole heap
735+
* page must be checked. (On the next call, we'll advance past all
736+
* regular and lossy entries for this page before resuming search,
737+
* thus ensuring that we never return both regular and lossy pointers
738+
* for the same page.)
705739
*
706740
* This idea could be generalized to more than one lossy-page entry,
707741
* but ideally lossy-page entries should be infrequent so it would
708-
* seldom be the case that we have more than one. If we do find more
709-
* than one, we just punt and return the item as lossy.
742+
* seldom be the case that we have more than one at once. So it
743+
* doesn't seem worth the extra complexity to optimize that case.
744+
* If we do find more than one, we just punt and return a lossy-page
745+
* pointer always.
710746
*
711747
* Note that only lossy-page entries pointing to the current item's
712-
* page should trigger this processing.
748+
* page should trigger this processing; we might have future lossy
749+
* pages in the entry array, but they aren't relevant yet.
713750
*/
751+
ItemPointerSetLossyPage(&curPageLossy,
752+
GinItemPointerGetBlockNumber(&key->curItem));
753+
714754
lossyEntry = 0;
715755
haveLossyEntry = false;
716756
for (i = 0; i < key->nentries; i++)
717757
{
718758
entry = key->scanEntry + i;
719-
720-
if (entry->isFinished)
721-
key->entryRes[i] = FALSE;
722-
else if (ItemPointerIsLossyPage(&entry->curItem) &&
723-
GinItemPointerGetBlockNumber(&entry->curItem) ==
724-
GinItemPointerGetBlockNumber(&key->curItem))
759+
if (entry->isFinished == FALSE &&
760+
compareItemPointers(&entry->curItem, &curPageLossy) == 0)
725761
{
726762
if (haveLossyEntry)
727763
{
728-
/* Too many lossy entries, punt */
764+
/* Multiple lossy entries, punt */
765+
key->curItem = curPageLossy;
729766
key->recheckCurItem = true;
730767
return;
731768
}
732769
lossyEntry = i;
733770
haveLossyEntry = true;
734-
/* initially assume TRUE */
735-
key->entryRes[i] = TRUE;
736771
}
737-
else if (compareItemPointers(&entry->curItem, &key->curItem) == 0)
738-
key->entryRes[i] = TRUE;
739-
else
740-
key->entryRes[i] = FALSE;
741772
}
742773

774+
/* prepare for calling consistentFn in temp context */
743775
oldCtx = MemoryContextSwitchTo(tempCtx);
744776

777+
if (haveLossyEntry)
778+
{
779+
/* Single lossy-page entry, so see if whole page matches */
780+
memset(key->entryRes, FALSE, key->nentries);
781+
key->entryRes[lossyEntry] = TRUE;
782+
783+
if (callConsistentFn(ginstate, key))
784+
{
785+
/* Yes, so clean up ... */
786+
MemoryContextSwitchTo(oldCtx);
787+
MemoryContextReset(tempCtx);
788+
789+
/* and return lossy pointer for whole page */
790+
key->curItem = curPageLossy;
791+
key->recheckCurItem = true;
792+
return;
793+
}
794+
}
795+
745796
/*
746-
* Initialize recheckCurItem in case the consistentFn doesn't know it
747-
* should set it. The safe assumption in that case is to force
748-
* recheck.
797+
* At this point we know that we don't need to return a lossy
798+
* whole-page pointer, but we might have matches for individual exact
799+
* item pointers, possibly in combination with a lossy pointer. Our
800+
* strategy if there's a lossy pointer is to try the consistentFn both
801+
* ways and return a hit if it accepts either one (forcing the hit to
802+
* be marked lossy so it will be rechecked).
803+
*
804+
* Prepare entryRes array to be passed to consistentFn.
805+
*
806+
* (If key->nentries == 1 then the consistentFn should always succeed,
807+
* but we must call it anyway to find out the recheck status.)
749808
*/
750-
key->recheckCurItem = true;
809+
for (i = 0; i < key->nentries; i++)
810+
{
811+
entry = key->scanEntry + i;
812+
if (entry->isFinished == FALSE &&
813+
compareItemPointers(&entry->curItem, &key->curItem) == 0)
814+
key->entryRes[i] = TRUE;
815+
else
816+
key->entryRes[i] = FALSE;
817+
}
818+
if (haveLossyEntry)
819+
key->entryRes[lossyEntry] = TRUE;
751820

752-
res = DatumGetBool(FunctionCall6(&ginstate->consistentFn[key->attnum - 1],
753-
PointerGetDatum(key->entryRes),
754-
UInt16GetDatum(key->strategy),
755-
key->query,
756-
UInt32GetDatum(key->nentries),
757-
PointerGetDatum(key->extra_data),
758-
PointerGetDatum(&key->recheckCurItem)));
821+
res = callConsistentFn(ginstate, key);
759822

760823
if (!res && haveLossyEntry)
761824
{
762825
/* try the other way for the lossy item */
763826
key->entryRes[lossyEntry] = FALSE;
764-
key->recheckCurItem = true;
765827

766-
res = DatumGetBool(FunctionCall6(&ginstate->consistentFn[key->attnum - 1],
767-
PointerGetDatum(key->entryRes),
768-
UInt16GetDatum(key->strategy),
769-
key->query,
770-
UInt32GetDatum(key->nentries),
771-
PointerGetDatum(key->extra_data),
772-
PointerGetDatum(&key->recheckCurItem)));
828+
res = callConsistentFn(ginstate, key);
773829
}
774830

831+
/* clean up after consistentFn calls */
775832
MemoryContextSwitchTo(oldCtx);
776833
MemoryContextReset(tempCtx);
777834

@@ -1108,7 +1165,6 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
11081165
GinScanOpaque so = (GinScanOpaque) scan->opaque;
11091166
MemoryContext oldCtx;
11101167
bool recheck,
1111-
keyrecheck,
11121168
match;
11131169
int i;
11141170
pendingPosition pos;
@@ -1152,8 +1208,8 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
11521208
continue;
11531209

11541210
/*
1155-
* Matching of entries of one row is finished, so check row by
1156-
* consistent function.
1211+
* Matching of entries of one row is finished, so check row using
1212+
* consistent functions.
11571213
*/
11581214
oldCtx = MemoryContextSwitchTo(so->tempCtx);
11591215
recheck = false;
@@ -1163,21 +1219,12 @@ scanPendingInsert(IndexScanDesc scan, TIDBitmap *tbm, int64 *ntids)
11631219
{
11641220
GinScanKey key = so->keys + i;
11651221

1166-
keyrecheck = true;
1167-
1168-
if (!DatumGetBool(FunctionCall6(&so->ginstate.consistentFn[key->attnum - 1],
1169-
PointerGetDatum(key->entryRes),
1170-
UInt16GetDatum(key->strategy),
1171-
key->query,
1172-
UInt32GetDatum(key->nentries),
1173-
PointerGetDatum(key->extra_data),
1174-
PointerGetDatum(&keyrecheck))))
1222+
if (!callConsistentFn(&so->ginstate, key))
11751223
{
11761224
match = false;
11771225
break;
11781226
}
1179-
1180-
recheck |= keyrecheck;
1227+
recheck |= key->recheckCurItem;
11811228
}
11821229

11831230
MemoryContextSwitchTo(oldCtx);
@@ -1248,11 +1295,26 @@ scanGetItem(IndexScanDesc scan, ItemPointer advancePast,
12481295

12491296
Assert(!ItemPointerIsMax(item));
12501297

1251-
/*
1298+
/*----------
12521299
* Now *item contains first ItemPointer after previous result.
12531300
*
12541301
* The item is a valid hit only if all the keys returned either
12551302
* that exact TID, or a lossy reference to the same page.
1303+
*
1304+
* This logic works only if a keyGetItem stream can never contain both
1305+
* exact and lossy pointers for the same page. Else we could have a
1306+
* case like
1307+
*
1308+
* stream 1 stream 2
1309+
* ... ...
1310+
* 42/6 42/7
1311+
* 50/1 42/0xffff
1312+
* ... ...
1313+
*
1314+
* We would conclude that 42/6 is not a match and advance stream 1,
1315+
* thus never detecting the match to the lossy pointer in stream 2.
1316+
* (keyGetItem has a similar problem versus entryGetItem.)
1317+
*----------
12561318
*/
12571319
match = true;
12581320
for (i = 0; i < so->nkeys; i++)

src/include/access/gin.h

Lines changed: 11 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
*
55
* Copyright (c) 2006-2010, PostgreSQL Global Development Group
66
*
7-
* $PostgreSQL: pgsql/src/include/access/gin.h,v 1.40 2010/08/01 02:12:42 tgl Exp $
7+
* $PostgreSQL: pgsql/src/include/access/gin.h,v 1.41 2010/08/01 19:16:39 tgl Exp $
88
*--------------------------------------------------------------------------
99
*/
1010
#ifndef GIN_H
@@ -107,13 +107,22 @@ typedef struct GinMetaPageData
107107
* We use our own ItemPointerGet(BlockNumber|GetOffsetNumber)
108108
* to avoid Asserts, since sometimes the ip_posid isn't "valid"
109109
*/
110-
111110
#define GinItemPointerGetBlockNumber(pointer) \
112111
BlockIdGetBlockNumber(&(pointer)->ip_blkid)
113112

114113
#define GinItemPointerGetOffsetNumber(pointer) \
115114
((pointer)->ip_posid)
116115

116+
/*
117+
* Special-case item pointer values needed by the GIN search logic.
118+
* MIN: sorts less than any valid item pointer
119+
* MAX: sorts greater than any valid item pointer
120+
* LOSSY PAGE: indicates a whole heap page, sorts after normal item
121+
* pointers for that page
122+
* Note that these are all distinguishable from an "invalid" item pointer
123+
* (which is InvalidBlockNumber/0) as well as from all normal item
124+
* pointers (which have item numbers in the range 1..MaxHeapTuplesPerPage).
125+
*/
117126
#define ItemPointerSetMin(p) \
118127
ItemPointerSet((p), (BlockNumber)0, (OffsetNumber)0)
119128
#define ItemPointerIsMin(p) \

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy