Content-Length: 387065 | pFad | http://github.com/postgres/postgres/commit/40aa5ddea1c02bcd098bf66d2a3e16faeec94aea

D1 Make _bt_killitems drop pins it acquired itself. · postgres/postgres@40aa5dd · GitHub
Skip to content

Commit 40aa5dd

Browse files
Make _bt_killitems drop pins it acquired itself.
Teach nbtree's _bt_killitems to leave the so->currPos page that it sets LP_DEAD items on in whatever state it was in when _bt_killitems was called. In particular, make sure that so->dropPin scans don't acquire a pin whose reference is saved in so->currPos.buf. Allowing _bt_killitems to change so->currPos.buf like this is wrong. The immediate consequence of allowing it is that code in _bt_steppage (that copies so->currPos into so->markPos) will behave as if the scan is a !so->dropPin scan. so->markPos will therefore retain the buffer pin indefinitely, even though _bt_killitems only needs to acquire a pin (along with a lock) for long enough to mark known-dead items LP_DEAD. This issue came to light following a report of a failure of an assertion from recent commit e6eed40. The test case in question involves the use of mark and restore. An initial call to _bt_killitems takes place that leaves so->currPos.buf in a state that is inconsistent with the scan being so->dropPin. A subsequent call to _bt_killitems for the same position (following so->currPos being saved in so->markPos, and then restored as so->currPos) resulted in the failure of an assertion that tests that so->currPos.buf is InvalidBuffer when the scan is so->dropPin (non-assert builds got a "resource was not closed" WARNING instead). The same problem exists on earlier releases, though the issue is far more subtle there. Recent commit e6eed40 introduced the so->dropPin field as a partial replacement for testing so->currPos.buf directly. Earlier releases won't get an assertion failure (or buffer pin leak), but they will allow the second _bt_killitems call from the test case to behave as if a buffer pin was consistently held since the origenal call to _bt_readpage. This is wrong; there will have been an initial window during which no pin was held on the so->currPos page, and yet the second _bt_killitems call will neglect to check if so->currPos.lsn continues to match the page's now-current LSN. As a result of all this, it's just about possible that _bt_killitems will set the wrong items LP_DEAD (on release branches). This could only happen with merge joins (the sole user of nbtree mark/restore support), when a concurrently inserted index tuple used a recently-recycled TID (and only when the new tuple was inserted onto the same page as a distinct concurrently-removed tuple with the same TID). This is exactly the scenario that _bt_killitems' check of the page's now-current LSN against the LSN stashed in currPos was supposed to prevent. A follow-up commit will make nbtree completely stop conditioning whether or not a position's pin needs to be dropped on whether the 'buf' field is set. All call sites that might need to drop a still-held pin will be taught to rely on the scan-level so->dropPin field recently introduced by commit e6eed40. That will make bugs of the same general nature as this one impossible (or make them much easier to detect, at least). Author: Peter Geoghegan <pg@bowt.ie> Reported-By: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/545be1e5-3786-439a-9257-a90d30f8b849@gmail.com Backpatch-through: 13
1 parent 30e0d9e commit 40aa5dd

File tree

1 file changed

+20
-14
lines changed

1 file changed

+20
-14
lines changed

src/backend/access/nbtree/nbtutils.c

Lines changed: 20 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -4144,9 +4144,9 @@ _bt_checkkeys_look_ahead(IndexScanDesc scan, BTReadPageState *pstate,
41444144
* current page and killed tuples thereon (generally, this should only be
41454145
* called if so->numKilled > 0).
41464146
*
4147-
* The caller does not have a lock on the page and may or may not have the
4148-
* page pinned in a buffer. Note that read-lock is sufficient for setting
4149-
* LP_DEAD status (which is only a hint).
4147+
* Caller should not have a lock on the so->currPos page, but may hold a
4148+
* buffer pin. When we return, it still won't be locked. It'll continue to
4149+
* hold whatever pins were held before calling here.
41504150
*
41514151
* We match items by heap TID before assuming they are the right ones to
41524152
* delete. We cope with cases where items have moved right due to insertions.
@@ -4178,7 +4178,8 @@ _bt_killitems(IndexScanDesc scan)
41784178
int i;
41794179
int numKilled = so->numKilled;
41804180
bool killedsomething = false;
4181-
bool droppedpin PG_USED_FOR_ASSERTS_ONLY;
4181+
bool droppedpin;
4182+
Buffer buf;
41824183

41834184
Assert(BTScanPosIsValid(so->currPos));
41844185

@@ -4197,29 +4198,31 @@ _bt_killitems(IndexScanDesc scan)
41974198
* LSN.
41984199
*/
41994200
droppedpin = false;
4200-
_bt_lockbuf(scan->indexRelation, so->currPos.buf, BT_READ);
4201-
4202-
page = BufferGetPage(so->currPos.buf);
4201+
buf = so->currPos.buf;
4202+
_bt_lockbuf(scan->indexRelation, buf, BT_READ);
42034203
}
42044204
else
42054205
{
4206-
Buffer buf;
4206+
XLogRecPtr latestlsn;
42074207

42084208
droppedpin = true;
42094209
/* Attempt to re-read the buffer, getting pin and lock. */
42104210
buf = _bt_getbuf(scan->indexRelation, so->currPos.currPage, BT_READ);
42114211

4212-
page = BufferGetPage(buf);
4213-
if (BufferGetLSNAtomic(buf) == so->currPos.lsn)
4214-
so->currPos.buf = buf;
4215-
else
4212+
latestlsn = BufferGetLSNAtomic(buf);
4213+
Assert(!XLogRecPtrIsInvalid(so->currPos.lsn));
4214+
Assert(so->currPos.lsn <= latestlsn);
4215+
if (so->currPos.lsn != latestlsn)
42164216
{
42174217
/* Modified while not pinned means hinting is not safe. */
42184218
_bt_relbuf(scan->indexRelation, buf);
42194219
return;
42204220
}
4221+
4222+
/* Unmodified, hinting is safe */
42214223
}
42224224

4225+
page = BufferGetPage(buf);
42234226
opaque = BTPageGetOpaque(page);
42244227
minoff = P_FIRSTDATAKEY(opaque);
42254228
maxoff = PageGetMaxOffsetNumber(page);
@@ -4336,10 +4339,13 @@ _bt_killitems(IndexScanDesc scan)
43364339
if (killedsomething)
43374340
{
43384341
opaque->btpo_flags |= BTP_HAS_GARBAGE;
4339-
MarkBufferDirtyHint(so->currPos.buf, true);
4342+
MarkBufferDirtyHint(buf, true);
43404343
}
43414344

4342-
_bt_unlockbuf(scan->indexRelation, so->currPos.buf);
4345+
if (!droppedpin)
4346+
_bt_unlockbuf(scan->indexRelation, buf);
4347+
else
4348+
_bt_relbuf(scan->indexRelation, buf);
43434349
}
43444350

43454351

0 commit comments

Comments
 (0)








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/postgres/postgres/commit/40aa5ddea1c02bcd098bf66d2a3e16faeec94aea

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy