
Commit 46bfe44

Fix bogus concurrent use of _hash_getnewbuf() in bucket split code.
_hash_splitbucket() obtained the base page of the new bucket by calling _hash_getnewbuf(), but it held no exclusive lock that would prevent some other process from calling _hash_getnewbuf() at the same time. This is contrary to _hash_getnewbuf()'s API spec and could in fact cause failures. In practice, we must only call that function while holding write lock on the hash index's metapage.

An additional problem was that we'd already modified the metapage's bucket mapping data, meaning that failure to extend the index would leave us with a corrupt index.

Fix both issues by moving the _hash_getnewbuf() call to just before we modify the metapage in _hash_expandtable().

Unfortunately there's still a large problem here, which is that we could also incur ENOSPC while trying to get an overflow page for the new bucket. That would leave the index corrupt in a more subtle way, namely that some index tuples that should be in the new bucket might still be in the old one. Fixing that seems substantially more difficult; even preallocating as many pages as we could possibly need wouldn't entirely guarantee that the bucket split would complete successfully. So for today let's just deal with the base case.

Per report from Antonin Houska. Back-patch to all active branches.
1 parent ab02d35 commit 46bfe44
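
For orientation, the corrected ordering described in the commit message can be reduced to the rough C sketch below. This is not the actual PostgreSQL source: the typedefs and the helpers marked as hypothetical are stand-ins for the real buffer-manager and metapage code, and locking of the old bucket, error handling, and the spare-page logic are all omitted.

/*
 * Rough sketch of the corrected ordering in _hash_expandtable(); the
 * typedefs and the two helpers below are stand-ins, not PostgreSQL's
 * real definitions.
 */
typedef int Buffer;
typedef unsigned int BlockNumber;
typedef int ForkNumber;
typedef struct RelationData *Relation;

#define MAIN_FORKNUM 0

/* Real routine in hashpage.c; declared here only to keep the sketch self-contained. */
extern Buffer _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum);

/* Hypothetical helpers standing in for the metapage update and the split. */
extern void update_bucket_mapping(Buffer metabuf);
extern void split_bucket(Relation rel, Buffer metabuf, Buffer nbuf);

static void
expand_table_sketch(Relation rel, Buffer metabuf, BlockNumber start_nblkno)
{
	Buffer		buf_nblkno;

	/*
	 * Allocate the new bucket's primary page first, while the metapage
	 * write lock (held by the real caller) still serializes all callers
	 * of _hash_getnewbuf().  If this fails for lack of disk space, the
	 * metapage has not been touched and the split is simply abandoned.
	 */
	buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);

	/* Only now is it safe to publish the new bucket in the metapage. */
	update_bucket_mapping(metabuf);

	/*
	 * Hand the pinned, write-locked new-bucket buffer to the split code,
	 * which releases the lock and pin when it is done.
	 */
	split_bucket(rel, metabuf, buf_nblkno);
}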

File tree

1 file changed (+26, -4 lines)


src/backend/access/hash/hashpage.c

Lines changed: 26 additions & 4 deletions
@@ -40,6 +40,7 @@
 static bool _hash_alloc_buckets(Relation rel, BlockNumber firstblock,
 					uint32 nblocks);
 static void _hash_splitbucket(Relation rel, Buffer metabuf,
+				  Buffer nbuf,
 				  Bucket obucket, Bucket nbucket,
 				  BlockNumber start_oblkno,
 				  BlockNumber start_nblkno,
@@ -179,7 +180,9 @@ _hash_getinitbuf(Relation rel, BlockNumber blkno)
  * EOF but before updating the metapage to reflect the added page.)
  *
  * It is caller's responsibility to ensure that only one process can
- * extend the index at a time.
+ * extend the index at a time.  In practice, this function is called
+ * only while holding write lock on the metapage, because adding a page
+ * is always associated with an update of metapage data.
  */
 Buffer
 _hash_getnewbuf(Relation rel, BlockNumber blkno, ForkNumber forkNum)
@@ -506,6 +509,7 @@ _hash_expandtable(Relation rel, Buffer metabuf)
 	uint32		spare_ndx;
 	BlockNumber start_oblkno;
 	BlockNumber start_nblkno;
+	Buffer		buf_nblkno;
 	uint32		maxbucket;
 	uint32		highmask;
 	uint32		lowmask;
@@ -618,6 +622,13 @@ _hash_expandtable(Relation rel, Buffer metabuf)
 		}
 	}
 
+	/*
+	 * Physically allocate the new bucket's primary page.  We want to do this
+	 * before changing the metapage's mapping info, in case we can't get the
+	 * disk space.
+	 */
+	buf_nblkno = _hash_getnewbuf(rel, start_nblkno, MAIN_FORKNUM);
+
 	/*
 	 * Okay to proceed with split.  Update the metapage bucket mapping info.
 	 *
@@ -671,7 +682,8 @@ _hash_expandtable(Relation rel, Buffer metabuf)
 	_hash_droplock(rel, 0, HASH_EXCLUSIVE);
 
 	/* Relocate records to the new bucket */
-	_hash_splitbucket(rel, metabuf, old_bucket, new_bucket,
+	_hash_splitbucket(rel, metabuf, buf_nblkno,
+					  old_bucket, new_bucket,
 					  start_oblkno, start_nblkno,
 					  maxbucket, highmask, lowmask);
 
@@ -754,10 +766,16 @@ _hash_alloc_buckets(Relation rel, BlockNumber firstblock, uint32 nblocks)
  * The caller must hold a pin, but no lock, on the metapage buffer.
  * The buffer is returned in the same state.  (The metapage is only
  * touched if it becomes necessary to add or remove overflow pages.)
+ *
+ * In addition, the caller must have created the new bucket's base page,
+ * which is passed in buffer nbuf, pinned and write-locked.  The lock
+ * and pin are released here.  (The API is set up this way because we must
+ * do _hash_getnewbuf() before releasing the metapage write lock.)
  */
 static void
 _hash_splitbucket(Relation rel,
 				  Buffer metabuf,
+				  Buffer nbuf,
 				  Bucket obucket,
 				  Bucket nbucket,
 				  BlockNumber start_oblkno,
@@ -769,7 +787,6 @@ _hash_splitbucket(Relation rel,
 	BlockNumber oblkno;
 	BlockNumber nblkno;
 	Buffer		obuf;
-	Buffer		nbuf;
 	Page		opage;
 	Page		npage;
 	HashPageOpaque oopaque;
@@ -786,7 +803,7 @@ _hash_splitbucket(Relation rel,
 	oopaque = (HashPageOpaque) PageGetSpecialPointer(opage);
 
 	nblkno = start_nblkno;
-	nbuf = _hash_getnewbuf(rel, nblkno, MAIN_FORKNUM);
+	Assert(nblkno == BufferGetBlockNumber(nbuf));
 	npage = BufferGetPage(nbuf);
 
 	/* initialize the new bucket's primary page */
@@ -835,6 +852,11 @@ _hash_splitbucket(Relation rel,
 			 * insert the tuple into the new bucket.  if it doesn't fit on
 			 * the current page in the new bucket, we must allocate a new
 			 * overflow page and place the tuple on that page instead.
+			 *
+			 * XXX we have a problem here if we fail to get space for a
+			 * new overflow page: we'll error out leaving the bucket split
+			 * only partially complete, meaning the index is corrupt,
+			 * since searches may fail to find entries they should find.
 			 */
 			itemsz = IndexTupleDSize(*itup);
 			itemsz = MAXALIGN(itemsz);
