Skip to content

Commit d2ec671

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent 6a4d93e commit d2ec671

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1713,9 +1713,9 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
17131713
* current page and killed tuples thereon (generally, this should only be
17141714
* called if so->numKilled > 0).
17151715
*
1716-
* The caller does not have a lock on the page and may or may not have the
1717-
* page pinned in a buffer. Note that read-lock is sufficient for setting
1718-
* LP_DEAD status (which is only a hint).
1716+
* Caller should not have a lock on the so->currPos page, but may hold a
1717+
* buffer pin. When we return, it still won't be locked. It'll continue to
1718+
* hold whatever pins were held before calling here.
17191719
*
17201720
* We match items by heap TID before assuming they are the right ones to
17211721
* delete. We cope with cases where items have moved right due to insertions.
@@ -1747,7 +1747,8 @@ _bt_killitems(IndexScanDesc scan)
17471747
int i;
17481748
int numKilled = so->numKilled;
17491749
bool killedsomething = false;
1750-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
1750+
bool droppedpin;
1751+
Buffer buf;
17511752

17521753
Assert(BTScanPosIsValid(so->currPos));
17531754

@@ -1766,29 +1767,31 @@ _bt_killitems(IndexScanDesc scan)
17661767
* LSN.
17671768
*/
17681769
droppedpin = false;
1769-
_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
1770-
1771-
page = BufferGetPage(so->currPos.buf);
1770+
buf = so->currPos.buf;
1771+
_bt_lockbuf(scan->indexRelation, buf, BT_READ);
17721772
}
17731773
else
17741774
{
1775-
Buffer buf;
1775+
XLogRecPtr latestlsn;
17761776

17771777
droppedpin = true;
17781778
/* Attempt to re-read the buffer, getting pin and lock. */
17791779
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
17801780

1781-
page = BufferGetPage(buf);
1782-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
1783-
so->currPos.buf = buf;
1784-
else
1781+
latestlsn = BufferGetLSNAtomic(buf);
1782+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
1783+
Assert(so->currPos.lsn <= latestlsn);
1784+
if (so->currPos.lsn != latestlsn)
17851785
{
17861786
/* Modified while not pinned means hinting is not safe. */
17871787
_bt_relbuf(scan->indexRelation, buf);
17881788
return;
17891789
}
1790+
1791+
/* Unmodified, hinting is safe */
17901792
}
17911793

1794+
page = BufferGetPage(buf);
17921795
opaque = BTPageGetOpaque(page);
17931796
minoff = P_FIRSTDATAKEY(opaque);
17941797
maxoff = PageGetMaxOffsetNumber(page);
@@ -1905,10 +1908,13 @@ _bt_killitems(IndexScanDesc scan)
19051908
if (killedsomething)
19061909
{
19071910
opaque->btpo_flags |= BTP_HAS_GARBAGE;
1908-
MarkBufferDirtyHint(so->currPos.buf, true);
1911+
MarkBufferDirtyHint(buf, true);
19091912
}
19101913

1911-
_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
1914+
if (!droppedpin)
1915+
_bt_unlockbuf(scan->indexRelation, buf);
1916+
else
1917+
_bt_relbuf(scan->indexRelation, buf);
19121918
}
19131919

19141920

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy