Commit aac2c9b

For inplace update durability, make heap_update() callers wait.
The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters.

Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit 86dc900, and it wasn't worth reimplementing that fix without such infrastructure.

Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.

Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
1 parent a07e03f commit aac2c9b
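
The lost-update window described in the commit message can be illustrated with a toy model. This is a sketch, not PostgreSQL code: `ToyRow`, `unlocked_interleaving()` and `locked_interleaving()` are hypothetical names, the row is a plain struct rather than a heap tuple, and the tuple lock is modelled only by forcing the two writers to run one after the other.

```c
#include <stdbool.h>

/*
 * Toy stand-in for one catalog row (assumption: not a real heap tuple).
 * One field is written by inplace updates, the other by an ordinary
 * copy-modify-update cycle.
 */
typedef struct ToyRow
{
	int			inplace_field;	/* overwritten in place */
	int			ddl_field;		/* changed via copy, modify, update */
} ToyRow;

/*
 * No tuple lock: the updater copies the row, the inplace update then
 * overwrites the shared row, and the updater finally writes back its
 * stale copy.  The inplace change is lost.
 */
ToyRow
unlocked_interleaving(ToyRow shared)
{
	ToyRow		copy = shared;	/* updater copies the old tuple */

	shared.inplace_field = 42;	/* inplace update lands in between */
	copy.ddl_field = 7;			/* updater modifies its private copy */
	shared = copy;				/* "heap_update": stale copy replaces row */
	return shared;
}

/*
 * With the lock, copy..update runs as one unit relative to the inplace
 * update, so only two orders are possible.  Both keep both changes.
 */
ToyRow
locked_interleaving(ToyRow shared, bool inplace_first)
{
	ToyRow		copy;

	if (inplace_first)
		shared.inplace_field = 42;

	copy = shared;				/* copy, modify and update under the lock */
	copy.ddl_field = 7;
	shared = copy;

	if (!inplace_first)
		shared.inplace_field = 42;
	return shared;
}
```

Starting from a row of zeros, the unlocked interleaving ends with `inplace_field` still 0, while either locked order ends with both fields set. The model collapses tuple versions into a single struct, so it shows only the ordering argument, not how heap_update() creates a new tuple version.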

20 files changed: +498 -57 lines changed

src/backend/access/heap/README.tuplock

Lines changed: 42 additions & 0 deletions
@@ -154,6 +154,48 @@ The following infomask bits are applicable:
 We currently never set the HEAP_XMAX_COMMITTED when the HEAP_XMAX_IS_MULTI bit
 is set.
 
+Locking to write inplace-updated tables
+---------------------------------------
+
+If IsInplaceUpdateRelation() returns true for a table, the table is a system
+catalog that receives systable_inplace_update_begin() calls. Preparing a
+heap_update() of these tables follows additional locking rules, to ensure we
+don't lose the effects of an inplace update. In particular, consider a moment
+when a backend has fetched the old tuple to modify, not yet having called
+heap_update(). Another backend's inplace update starting then can't conclude
+until the heap_update() places its new tuple in a buffer. We enforce that
+using locktags as follows. While DDL code is the main audience, the executor
+follows these rules to make e.g. "MERGE INTO pg_class" safer. Locking rules
+are per-catalog:
+
+  pg_class systable_inplace_update_begin() callers: before the call, acquire a
+  lock on the relation in mode ShareUpdateExclusiveLock or stricter. If the
+  update targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX),
+  that lock must be on the table. Locking the index rel is not necessary.
+  (This allows VACUUM to overwrite per-index pg_class while holding a lock on
+  the table alone.) systable_inplace_update_begin() acquires and releases
+  LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each
+  tuple it overwrites.
+
+  pg_class heap_update() callers: before copying the tuple to modify, take a
+  lock on the tuple, a ShareUpdateExclusiveLock on the relation, or a
+  ShareRowExclusiveLock or stricter on the relation.
+
+  SearchSysCacheLocked1() is one convenient way to acquire the tuple lock.
+  Most heap_update() callers already hold a suitable lock on the relation for
+  other reasons and can skip the tuple lock. If you do acquire the tuple
+  lock, release it immediately after the update.
+
+
+  pg_database: before copying the tuple to modify, all updaters of pg_database
+  rows acquire LOCKTAG_TUPLE. (Few updaters acquire LOCKTAG_OBJECT on the
+  database OID, so it wasn't worth extending that as a second option.)
+
+Ideally, DDL might want to perform permissions checks before LockTuple(), as
+we do with RangeVarGetRelidExtended() callbacks. We typically don't bother.
+LOCKTAG_TUPLE acquirers release it after each row, so the potential
+inconvenience is lower.
+
 Reading inplace-updated columns
 -------------------------------
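
One way to read the pg_class rule for heap_update() callers is through the table-level lock conflict matrix. Inplace updaters hold ShareUpdateExclusiveLock or stricter on the relation, so a relation lock lets a heap_update() caller skip the tuple lock only if it conflicts with every one of those modes. The sketch below retypes the conflict matrix by hand instead of including PostgreSQL headers; the enum, array and function names are hypothetical, chosen for illustration.

```c
#include <stdbool.h>

/* Table-level lock modes, numbered weakest to strongest. */
enum ToyLockMode
{
	AccessShare = 1,
	RowShare,
	RowExclusive,
	ShareUpdateExclusive,
	Share,
	ShareRowExclusive,
	Exclusive,
	AccessExclusive
};

#define BIT(m) (1u << (m))

/* conflicts_with[m] is the set of modes that conflict with mode m. */
static const unsigned conflicts_with[] = {
	[AccessShare] = BIT(AccessExclusive),
	[RowShare] = BIT(Exclusive) | BIT(AccessExclusive),
	[RowExclusive] = BIT(Share) | BIT(ShareRowExclusive) |
		BIT(Exclusive) | BIT(AccessExclusive),
	[ShareUpdateExclusive] = BIT(ShareUpdateExclusive) | BIT(Share) |
		BIT(ShareRowExclusive) | BIT(Exclusive) | BIT(AccessExclusive),
	[Share] = BIT(RowExclusive) | BIT(ShareUpdateExclusive) |
		BIT(ShareRowExclusive) | BIT(Exclusive) | BIT(AccessExclusive),
	[ShareRowExclusive] = BIT(RowExclusive) | BIT(ShareUpdateExclusive) |
		BIT(Share) | BIT(ShareRowExclusive) | BIT(Exclusive) |
		BIT(AccessExclusive),
	[Exclusive] = BIT(RowShare) | BIT(RowExclusive) |
		BIT(ShareUpdateExclusive) | BIT(Share) | BIT(ShareRowExclusive) |
		BIT(Exclusive) | BIT(AccessExclusive),
	[AccessExclusive] = BIT(AccessShare) | BIT(RowShare) |
		BIT(RowExclusive) | BIT(ShareUpdateExclusive) | BIT(Share) |
		BIT(ShareRowExclusive) | BIT(Exclusive) | BIT(AccessExclusive)
};

/*
 * True if holding "held" on a relation conflicts with every mode an
 * inplace updater may hold (ShareUpdateExclusive or stricter).
 */
bool
excludes_every_inplace_updater(enum ToyLockMode held)
{
	for (int m = ShareUpdateExclusive; m <= AccessExclusive; m++)
	{
		if (!(conflicts_with[held] & BIT(m)))
			return false;
	}
	return true;
}
```

Under this model the qualifying modes are ShareUpdateExclusive, ShareRowExclusive, Exclusive and AccessExclusive. Share does not qualify because it does not conflict with itself, which matches the README wording "a ShareUpdateExclusiveLock on the relation, or a ShareRowExclusiveLock or stricter".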

src/backend/access/heap/heapam.c

Lines changed: 149 additions & 1 deletion
@@ -40,6 +40,8 @@
 #include "access/valid.h"
 #include "access/visibilitymap.h"
 #include "access/xloginsert.h"
+#include "catalog/pg_database.h"
+#include "catalog/pg_database_d.h"
 #include "commands/vacuum.h"
 #include "pgstat.h"
 #include "port/pg_bitutils.h"
@@ -57,6 +59,12 @@ static XLogRecPtr log_heap_update(Relation reln, Buffer oldbuf,
 								  Buffer newbuf, HeapTuple oldtup,
 								  HeapTuple newtup, HeapTuple old_key_tuple,
 								  bool all_visible_cleared, bool new_all_visible_cleared);
+#ifdef USE_ASSERT_CHECKING
+static void check_lock_if_inplace_updateable_rel(Relation relation,
+												 ItemPointer otid,
+												 HeapTuple newtup);
+static void check_inplace_rel_lock(HeapTuple oldtup);
+#endif
 static Bitmapset *HeapDetermineColumnsInfo(Relation relation,
 										   Bitmapset *interesting_cols,
 										   Bitmapset *external_cols,
@@ -103,6 +111,8 @@ static HeapTuple ExtractReplicaIdentity(Relation relation, HeapTuple tp, bool ke
  * heavyweight lock mode and MultiXactStatus values to use for any particular
  * tuple lock strength.
  *
+ * These interact with InplaceUpdateTupleLock, an alias for ExclusiveLock.
+ *
  * Don't look at lockstatus/updstatus directly! Use get_mxact_status_for_lock
  * instead.
  */
@@ -3189,6 +3199,10 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 				(errcode(ERRCODE_INVALID_TRANSACTION_STATE),
 				 errmsg("cannot update tuples during a parallel operation")));
 
+#ifdef USE_ASSERT_CHECKING
+	check_lock_if_inplace_updateable_rel(relation, otid, newtup);
+#endif
+
 	/*
 	 * Fetch the list of attributes to be checked for various operations.
 	 *
@@ -4053,6 +4067,128 @@ heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
 	return TM_Ok;
 }
 
+#ifdef USE_ASSERT_CHECKING
+/*
+ * Confirm adequate lock held during heap_update(), per rules from
+ * README.tuplock section "Locking to write inplace-updated tables".
+ */
+static void
+check_lock_if_inplace_updateable_rel(Relation relation,
+									 ItemPointer otid,
+									 HeapTuple newtup)
+{
+	/* LOCKTAG_TUPLE acceptable for any catalog */
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+		case DatabaseRelationId:
+			{
+				LOCKTAG tuptag;
+
+				SET_LOCKTAG_TUPLE(tuptag,
+								  relation->rd_lockInfo.lockRelId.dbId,
+								  relation->rd_lockInfo.lockRelId.relId,
+								  ItemPointerGetBlockNumber(otid),
+								  ItemPointerGetOffsetNumber(otid));
+				if (LockHeldByMe(&tuptag, InplaceUpdateTupleLock, false))
+					return;
+			}
+			break;
+		default:
+			Assert(!IsInplaceUpdateRelation(relation));
+			return;
+	}
+
+	switch (RelationGetRelid(relation))
+	{
+		case RelationRelationId:
+			{
+				/* LOCKTAG_TUPLE or LOCKTAG_RELATION ok */
+				Form_pg_class classForm = (Form_pg_class) GETSTRUCT(newtup);
+				Oid relid = classForm->oid;
+				Oid dbid;
+				LOCKTAG tag;
+
+				if (IsSharedRelation(relid))
+					dbid = InvalidOid;
+				else
+					dbid = MyDatabaseId;
+
+				if (classForm->relkind == RELKIND_INDEX)
+				{
+					Relation irel = index_open(relid, AccessShareLock);
+
+					SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+					index_close(irel, AccessShareLock);
+				}
+				else
+					SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+				if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, false) &&
+					!LockHeldByMe(&tag, ShareRowExclusiveLock, true))
+					elog(WARNING,
+						 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+						 NameStr(classForm->relname),
+						 relid,
+						 classForm->relkind,
+						 ItemPointerGetBlockNumber(otid),
+						 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+		case DatabaseRelationId:
+			{
+				/* LOCKTAG_TUPLE required */
+				Form_pg_database dbForm = (Form_pg_database) GETSTRUCT(newtup);
+
+				elog(WARNING,
+					 "missing lock on database \"%s\" (OID %u) @ TID (%u,%u)",
+					 NameStr(dbForm->datname),
+					 dbForm->oid,
+					 ItemPointerGetBlockNumber(otid),
+					 ItemPointerGetOffsetNumber(otid));
+			}
+			break;
+	}
+}
+
+/*
+ * Confirm adequate relation lock held, per rules from README.tuplock section
+ * "Locking to write inplace-updated tables".
+ */
+static void
+check_inplace_rel_lock(HeapTuple oldtup)
+{
+	Form_pg_class classForm = (Form_pg_class) GETSTRUCT(oldtup);
+	Oid relid = classForm->oid;
+	Oid dbid;
+	LOCKTAG tag;
+
+	if (IsSharedRelation(relid))
+		dbid = InvalidOid;
+	else
+		dbid = MyDatabaseId;
+
+	if (classForm->relkind == RELKIND_INDEX)
+	{
+		Relation irel = index_open(relid, AccessShareLock);
+
+		SET_LOCKTAG_RELATION(tag, dbid, irel->rd_index->indrelid);
+		index_close(irel, AccessShareLock);
+	}
+	else
+		SET_LOCKTAG_RELATION(tag, dbid, relid);
+
+	if (!LockHeldByMe(&tag, ShareUpdateExclusiveLock, true))
+		elog(WARNING,
+			 "missing lock for relation \"%s\" (OID %u, relkind %c) @ TID (%u,%u)",
+			 NameStr(classForm->relname),
+			 relid,
+			 classForm->relkind,
+			 ItemPointerGetBlockNumber(&oldtup->t_self),
+			 ItemPointerGetOffsetNumber(&oldtup->t_self));
+}
+#endif
+
 /*
  * Check if the specified attribute's values are the same. Subroutine for
  * HeapDetermineColumnsInfo.
@@ -6070,15 +6206,21 @@ heap_inplace_lock(Relation relation,
 	TM_Result result;
 	bool ret;
 
+#ifdef USE_ASSERT_CHECKING
+	if (RelationGetRelid(relation) == RelationRelationId)
+		check_inplace_rel_lock(oldtup_ptr);
+#endif
+
 	Assert(BufferIsValid(buffer));
 
+	LockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 	LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
 
 	/*----------
 	 * Interpret HeapTupleSatisfiesUpdate() like heap_update() does, except:
 	 *
 	 * - wait unconditionally
-	 * - no tuple locks
+	 * - already locked tuple above, since inplace needs that unconditionally
 	 * - don't recheck header after wait: simpler to defer to next iteration
 	 * - don't try to continue even if the updater aborts: likewise
 	 * - no crosscheck
@@ -6162,7 +6304,10 @@ heap_inplace_lock(Relation relation,
 	 * don't bother optimizing that.
 	 */
 	if (!ret)
+	{
+		UnlockTuple(relation, &oldtup.t_self, InplaceUpdateTupleLock);
 		InvalidateCatalogSnapshot();
+	}
 	return ret;
 }
 
@@ -6171,6 +6316,8 @@ heap_inplace_lock(Relation relation,
  *
  * The tuple cannot change size, and therefore its header fields and null
  * bitmap (if any) don't change either.
+ *
+ * Since we hold LOCKTAG_TUPLE, no updater has a local copy of this tuple.
  */
 void
 heap_inplace_update_and_unlock(Relation relation,
@@ -6254,6 +6401,7 @@ heap_inplace_unlock(Relation relation,
 					HeapTuple oldtup, Buffer buffer)
 {
 	LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
+	UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
 }
 
 #define FRM_NOOP 0x0001
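
The decision made by check_lock_if_inplace_updateable_rel() in the heapam.c hunks above can be condensed into a small pure function. This is a simplified model under stated assumptions, not the assertion code itself: every `Toy`-prefixed name is hypothetical, a backend is modelled as holding at most one relation lock mode, and for index rows the relation lock is assumed to be the one on the index's table.

```c
#include <stdbool.h>

/* Which catalog the heap_update() targets (hypothetical model type). */
typedef enum ToyCatalog
{
	TOY_OTHER_CATALOG,			/* not subject to inplace updates */
	TOY_PG_CLASS,
	TOY_PG_DATABASE
} ToyCatalog;

/* Relation lock modes, weakest to strongest; 0 means no lock held. */
typedef enum ToyRelLock
{
	TOY_NO_LOCK = 0,
	TOY_ACCESS_SHARE,
	TOY_ROW_SHARE,
	TOY_ROW_EXCLUSIVE,
	TOY_SHARE_UPDATE_EXCLUSIVE,
	TOY_SHARE,
	TOY_SHARE_ROW_EXCLUSIVE,
	TOY_EXCLUSIVE,
	TOY_ACCESS_EXCLUSIVE
} ToyRelLock;

/*
 * Returns true where the assertion code would stay silent, false where
 * it would emit its "missing lock" WARNING.
 */
bool
toy_heap_update_lock_ok(ToyCatalog catalog,
						bool holds_tuple_lock,
						ToyRelLock rel_lock)
{
	/* Catalogs without inplace updates have no extra rule. */
	if (catalog == TOY_OTHER_CATALOG)
		return true;

	/* LOCKTAG_TUPLE is acceptable for any inplace-updated catalog. */
	if (holds_tuple_lock)
		return true;

	/* pg_class: exactly ShareUpdateExclusive, or ShareRowExclusive+. */
	if (catalog == TOY_PG_CLASS)
		return rel_lock == TOY_SHARE_UPDATE_EXCLUSIVE ||
			rel_lock >= TOY_SHARE_ROW_EXCLUSIVE;

	/* pg_database: LOCKTAG_TUPLE is the only option. */
	return false;
}
```

For example, a pg_class update under ShareLock alone is flagged, while the same update under ShareUpdateExclusiveLock passes. A pg_database update without the tuple lock is flagged regardless of any relation lock.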

src/backend/access/index/genam.c

Lines changed: 3 additions & 1 deletion
@@ -765,7 +765,9 @@ systable_endscan_ordered(SysScanDesc sysscan)
  *
  * Overwriting violates both MVCC and transactional safety, so the uses of
  * this function in Postgres are extremely limited. Nonetheless we find some
- * places to use it. Standard flow:
+ * places to use it. See README.tuplock section "Locking to write
+ * inplace-updated tables" and later sections for expectations of readers and
+ * writers of a table that gets inplace updates. Standard flow:
  *
  * ... [any slow preparation not requiring oldtup] ...
  * systable_inplace_update_begin([...], &tup, &inplace_state);

src/backend/catalog/aclchk.c

Lines changed: 7 additions & 2 deletions
@@ -75,6 +75,7 @@
 #include "nodes/makefuncs.h"
 #include "parser/parse_func.h"
 #include "parser/parse_type.h"
+#include "storage/lmgr.h"
 #include "utils/acl.h"
 #include "utils/aclchk_internal.h"
 #include "utils/builtins.h"
@@ -1848,7 +1849,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 		HeapTuple tuple;
 		ListCell *cell_colprivs;
 
-		tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relOid));
+		tuple = SearchSysCacheLocked1(RELOID, ObjectIdGetDatum(relOid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for relation %u", relOid);
 		pg_class_tuple = (Form_pg_class) GETSTRUCT(tuple);
@@ -2060,6 +2061,7 @@ ExecGrant_Relation(InternalGrant *istmt)
 										 values, nulls, replaces);
 
 			CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 			/* Update initial privileges for extensions */
 			recordExtensionInitPriv(relOid, RelationRelationId, 0, new_acl);
@@ -2072,6 +2074,8 @@ ExecGrant_Relation(InternalGrant *istmt)
 
 			pfree(new_acl);
 		}
+		else
+			UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/*
 		 * Handle column-level privileges, if any were specified or implied.
@@ -2185,7 +2189,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 		Oid *oldmembers;
 		Oid *newmembers;
 
-		tuple = SearchSysCache1(cacheid, ObjectIdGetDatum(objectid));
+		tuple = SearchSysCacheLocked1(cacheid, ObjectIdGetDatum(objectid));
 		if (!HeapTupleIsValid(tuple))
 			elog(ERROR, "cache lookup failed for %s %u", get_object_class_descr(classid), objectid);
 
@@ -2261,6 +2265,7 @@ ExecGrant_common(InternalGrant *istmt, Oid classid, AclMode default_privs,
 									 nulls, replaces);
 
 		CatalogTupleUpdate(relation, &newtuple->t_self, newtuple);
+		UnlockTuple(relation, &tuple->t_self, InplaceUpdateTupleLock);
 
 		/* Update initial privileges for extensions */
 		recordExtensionInitPriv(objectid, classid, 0, new_acl);

src/backend/catalog/catalog.c

Lines changed: 9 additions & 0 deletions
@@ -138,6 +138,15 @@ IsCatalogRelationOid(Oid relid)
 /*
  * IsInplaceUpdateRelation
  *		True iff core code performs inplace updates on the relation.
+ *
+ * This is used for assertions and for making the executor follow the
+ * locking protocol described at README.tuplock section "Locking to write
+ * inplace-updated tables". Extensions may inplace-update other heap
+ * tables, but concurrent SQL UPDATE on the same table may overwrite
+ * those modifications.
+ *
+ * The executor can assume these are not partitions or partitioned and
+ * have no triggers.
  */
 bool
 IsInplaceUpdateRelation(Relation relation)
