Skip to content

Commit 38c8d29

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent f09fea3 commit 38c8d29

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1706,9 +1706,9 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
17061706
* current page and killed tuples thereon (generally, this should only be
17071707
* called if so->numKilled > 0).
17081708
*
1709-
* The caller does not have a lock on the page and may or may not have the
1710-
* page pinned in a buffer. Note that read-lock is sufficient for setting
1711-
* LP_DEAD status (which is only a hint).
1709+
* Caller should not have a lock on the so->currPos page, but may hold a
1710+
* buffer pin. When we return, it still won't be locked. It'll continue to
1711+
* hold whatever pins were held before calling here.
17121712
*
17131713
* We match items by heap TID before assuming they are the right ones to
17141714
* delete. We cope with cases where items have moved right due to insertions.
@@ -1740,7 +1740,8 @@ _bt_killitems(IndexScanDesc scan)
17401740
int i;
17411741
int numKilled = so->numKilled;
17421742
bool killedsomething = false;
1743-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
1743+
bool droppedpin;
1744+
Buffer buf;
17441745

17451746
Assert(BTScanPosIsValid(so->currPos));
17461747

@@ -1759,29 +1760,31 @@ _bt_killitems(IndexScanDesc scan)
17591760
* LSN.
17601761
*/
17611762
droppedpin = false;
1762-
LockBuffer(so->currPos.buf, BT_READ);
1763-
1764-
page = BufferGetPage(so->currPos.buf);
1763+
buf = so->currPos.buf;
1764+
LockBuffer(buf, BT_READ);
17651765
}
17661766
else
17671767
{
1768-
Buffer buf;
1768+
XLogRecPtr latestlsn;
17691769

17701770
droppedpin = true;
17711771
/* Attempt to re-read the buffer, getting pin and lock. */
17721772
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
17731773

1774-
page = BufferGetPage(buf);
1775-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
1776-
so->currPos.buf = buf;
1777-
else
1774+
latestlsn = BufferGetLSNAtomic(buf);
1775+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
1776+
Assert(so->currPos.lsn <= latestlsn);
1777+
if (so->currPos.lsn != latestlsn)
17781778
{
17791779
/* Modified while not pinned means hinting is not safe. */
17801780
_bt_relbuf(scan->indexRelation, buf);
17811781
return;
17821782
}
1783+
1784+
/* Unmodified, hinting is safe */
17831785
}
17841786

1787+
page = BufferGetPage(buf);
17851788
opaque = (BTPageOpaque) PageGetSpecialPointer(page);
17861789
minoff = P_FIRSTDATAKEY(opaque);
17871790
maxoff = PageGetMaxOffsetNumber(page);
@@ -1897,10 +1900,13 @@ _bt_killitems(IndexScanDesc scan)
18971900
if (killedsomething)
18981901
{
18991902
opaque->btpo_flags |= BTP_HAS_GARBAGE;
1900-
MarkBufferDirtyHint(so->currPos.buf, true);
1903+
MarkBufferDirtyHint(buf, true);
19011904
}
19021905

1903-
LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK);
1906+
if (!droppedpin)
1907+
LockBuffer(buf, BUFFER_LOCK_UNLOCK);
1908+
else
1909+
_bt_relbuf(scan->indexRelation, buf);
19041910
}
19051911

19061912

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy