Skip to content

Commit 40aa5dd

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the original call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent 30e0d9e commit 40aa5dd

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -4144,9 +4144,9 @@ _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate,
41444144
* current page and killed tuples thereon (generally, this should only be
41454145
* called if so->numKilled > 0).
41464146
*
4147-
* The caller does not have a lock on the page and may or may not have the
4148-
* page pinned in a buffer. Note that read-lock is sufficient for setting
4149-
* LP_DEAD status (which is only a hint).
4147+
* Caller should not have a lock on the so->currPos page, but may hold a
4148+
* buffer pin. When we return, it still won't be locked. It'll continue to
4149+
* hold whatever pins were held before calling here.
41504150
*
41514151
* We match items by heap TID before assuming they are the right ones to
41524152
* delete. We cope with cases where items have moved right due to insertions.
@@ -4178,7 +4178,8 @@ _bt_killitems(IndexScanDesc scan)
41784178
int i;
41794179
int numKilled = so->numKilled;
41804180
bool killedsomething = false;
4181-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
4181+
bool droppedpin;
4182+
Buffer buf;
41824183

41834184
Assert(BTScanPosIsValid(so->currPos));
41844185

@@ -4197,29 +4198,31 @@ _bt_killitems(IndexScanDesc scan)
41974198
* LSN.
41984199
*/
41994200
droppedpin = false;
4200-
_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
4201-
4202-
page = BufferGetPage(so->currPos.buf);
4201+
buf = so->currPos.buf;
4202+
_bt_lockbuf(scan->indexRelation, buf, BT_READ);
42034203
}
42044204
else
42054205
{
4206-
Buffer buf;
4206+
XLogRecPtr latestlsn;
42074207

42084208
droppedpin = true;
42094209
/* Attempt to re-read the buffer, getting pin and lock. */
42104210
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
42114211

4212-
page = BufferGetPage(buf);
4213-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
4214-
so->currPos.buf = buf;
4215-
else
4212+
latestlsn = BufferGetLSNAtomic(buf);
4213+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
4214+
Assert(so->currPos.lsn <= latestlsn);
4215+
if (so->currPos.lsn != latestlsn)
42164216
{
42174217
/* Modified while not pinned means hinting is not safe. */
42184218
_bt_relbuf(scan->indexRelation, buf);
42194219
return;
42204220
}
4221+
4222+
/* Unmodified, hinting is safe */
42214223
}
42224224

4225+
page = BufferGetPage(buf);
42234226
opaque = BTPageGetOpaque(page);
42244227
minoff = P_FIRSTDATAKEY(opaque);
42254228
maxoff = PageGetMaxOffsetNumber(page);
@@ -4336,10 +4339,13 @@ _bt_killitems(IndexScanDesc scan)
43364339
if (killedsomething)
43374340
{
43384341
opaque->btpo_flags |= BTP_HAS_GARBAGE;
4339-
MarkBufferDirtyHint(so->currPos.buf, true);
4342+
MarkBufferDirtyHint(buf, true);
43404343
}
43414344

4342-
_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
4345+
if (!droppedpin)
4346+
_bt_unlockbuf(scan->indexRelation, buf);
4347+
else
4348+
_bt_relbuf(scan->indexRelation, buf);
43434349
}
43444350

43454351

0 commit comments

Comments
 (0)
pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy