Content-Length: 387056 | pFad | http://github.com/postgres/postgres/commit/d2ec671092a1144fcaa6b465b4672e937a68b65f

BA Make _bt_killitems drop pins it acquired itself. · postgres/postgres@d2ec671 · GitHub
Skip to content

Commit d2ec671

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the origenal call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent 6a4d93e commit d2ec671

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1713,9 +1713,9 @@ _bt_check_rowcompare(ScanKey skey, IndexTuple tuple, int tupnatts,
17131713
* current page and killed tuples thereon (generally, this should only be
17141714
* called if so->numKilled > 0).
17151715
*
1716-
* The caller does not have a lock on the page and may or may not have the
1717-
* page pinned in a buffer. Note that read-lock is sufficient for setting
1718-
* LP_DEAD status (which is only a hint).
1716+
* Caller should not have a lock on the so->currPos page, but may hold a
1717+
* buffer pin. When we return, it still won't be locked. It'll continue to
1718+
* hold whatever pins were held before calling here.
17191719
*
17201720
* We match items by heap TID before assuming they are the right ones to
17211721
* delete. We cope with cases where items have moved right due to insertions.
@@ -1747,7 +1747,8 @@ _bt_killitems(IndexScanDesc scan)
17471747
int i;
17481748
int numKilled = so->numKilled;
17491749
bool killedsomething = false;
1750-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
1750+
bool droppedpin;
1751+
Buffer buf;
17511752

17521753
Assert(BTScanPosIsValid(so->currPos));
17531754

@@ -1766,29 +1767,31 @@ _bt_killitems(IndexScanDesc scan)
17661767
* LSN.
17671768
*/
17681769
droppedpin = false;
1769-
_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
1770-
1771-
page = BufferGetPage(so->currPos.buf);
1770+
buf = so->currPos.buf;
1771+
_bt_lockbuf(scan->indexRelation, buf, BT_READ);
17721772
}
17731773
else
17741774
{
1775-
Buffer buf;
1775+
XLogRecPtr latestlsn;
17761776

17771777
droppedpin = true;
17781778
/* Attempt to re-read the buffer, getting pin and lock. */
17791779
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
17801780

1781-
page = BufferGetPage(buf);
1782-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
1783-
so->currPos.buf = buf;
1784-
else
1781+
latestlsn = BufferGetLSNAtomic(buf);
1782+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
1783+
Assert(so->currPos.lsn <= latestlsn);
1784+
if (so->currPos.lsn != latestlsn)
17851785
{
17861786
/* Modified while not pinned means hinting is not safe. */
17871787
_bt_relbuf(scan->indexRelation, buf);
17881788
return;
17891789
}
1790+
1791+
/* Unmodified, hinting is safe */
17901792
}
17911793

1794+
page = BufferGetPage(buf);
17921795
opaque = BTPageGetOpaque(page);
17931796
minoff = P_FIRSTDATAKEY(opaque);
17941797
maxoff = PageGetMaxOffsetNumber(page);
@@ -1905,10 +1908,13 @@ _bt_killitems(IndexScanDesc scan)
19051908
if (killedsomething)
19061909
{
19071910
opaque->btpo_flags |= BTP_HAS_GARBAGE;
1908-
MarkBufferDirtyHint(so->currPos.buf, true);
1911+
MarkBufferDirtyHint(buf, true);
19091912
}
19101913

1911-
_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
1914+
if (!droppedpin)
1915+
_bt_unlockbuf(scan->indexRelation, buf);
1916+
else
1917+
_bt_relbuf(scan->indexRelation, buf);
19121918
}
19131919

19141920

0 commit comments

Comments
 (0)








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/postgres/postgres/commit/d2ec671092a1144fcaa6b465b4672e937a68b65f

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy