Commit c2fab70

Fix creation of partition descriptor during concurrent detach
When a partition is being detached in concurrent mode, it is possible for find_inheritance_children_extended() to return that partition in the list, and immediately after that receive an invalidation message that sets its relpartbound to NULL just before we read it. (This can happen because table_open() reads invalidation messages.) Currently we raise an error

    ERROR: missing relpartbound for relation %u

about the situation, but that's bogus because the table is no longer a partition, so we shouldn't be complaining about it. A better reaction is to retry the find_inheritance_children_extended call to get a new list, which will no longer have the partition being detached.

Noticed while investigating bug #18377.

Backpatch to 14, where DETACH CONCURRENTLY appeared.

Discussion: https://postgr.es/m/202405201616.y4ht2qe5ihoy@alvherre.pgsql
1 parent d1ffcc7 commit c2fab70

1 file changed: +41 -13 lines changed
src/backend/partitioning/partdesc.c

Lines changed: 41 additions & 13 deletions
@@ -24,6 +24,7 @@
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
 #include "utils/hsearch.h"
+#include "utils/inval.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/partcache.h"
@@ -144,16 +145,19 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 	ListCell *cell;
 	int i,
 		nparts;
+	bool retried = false;
 	PartitionKey key = RelationGetPartitionKey(rel);
 	MemoryContext new_pdcxt;
 	MemoryContext oldcxt;
 	int *mapping;
 
+retry:
+
 	/*
 	 * Get partition oids from pg_inherits. This uses a single snapshot to
 	 * fetch the list of children, so while more children may be getting added
-	 * concurrently, whatever this function returns will be accurate as of
-	 * some well-defined point in time.
+	 * or removed concurrently, whatever this function returns will be
+	 * accurate as of some well-defined point in time.
 	 */
 	detached_exist = false;
 	detached_xmin = InvalidTransactionId;
@@ -196,18 +200,23 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 		}
 
 		/*
-		 * The system cache may be out of date; if so, we may find no pg_class
-		 * tuple or an old one where relpartbound is NULL. In that case, try
-		 * the table directly. We can't just AcceptInvalidationMessages() and
-		 * retry the system cache lookup because it's possible that a
-		 * concurrent ATTACH PARTITION operation has removed itself from the
-		 * ProcArray but not yet added invalidation messages to the shared
-		 * queue; InvalidateSystemCaches() would work, but seems excessive.
+		 * Two problems are possible here. First, a concurrent ATTACH
+		 * PARTITION might be in the process of adding a new partition, but
+		 * the syscache doesn't have it, or its copy of it does not yet have
+		 * its relpartbound set. We cannot just AcceptInvalidationMessages(),
+		 * because the other process might have already removed itself from
+		 * the ProcArray but not yet added its invalidation messages to the
+		 * shared queue. We solve this problem by reading pg_class directly
+		 * for the desired tuple.
 		 *
-		 * Note that this algorithm assumes that PartitionBoundSpec we manage
-		 * to fetch is the right one -- so this is only good enough for
-		 * concurrent ATTACH PARTITION, not concurrent DETACH PARTITION or
-		 * some hypothetical operation that changes the partition bounds.
+		 * The other problem is that DETACH CONCURRENTLY is in the process of
+		 * removing a partition, which happens in two steps: first it marks it
+		 * as "detach pending", commits, then unsets relpartbound. If
+		 * find_inheritance_children_extended included that partition but we
+		 * below we see that DETACH CONCURRENTLY has reset relpartbound for
+		 * it, we'd see an inconsistent view. (The inconsistency is seen
+		 * because table_open below reads invalidation messages.) We protect
+		 * against this by retrying find_inheritance_children_extended().
 		 */
 		if (boundspec == NULL)
 		{
@@ -231,6 +240,25 @@ RelationBuildPartitionDesc(Relation rel, bool omit_detached)
 				boundspec = stringToNode(TextDatumGetCString(datum));
 			systable_endscan(scan);
 			table_close(pg_class, AccessShareLock);
+
+			/*
+			 * If we still don't get a relpartbound value, then it must be
+			 * because of DETACH CONCURRENTLY. Restart from the top, as
+			 * explained above. We only do this once, for two reasons: first,
+			 * only one DETACH CONCURRENTLY session could affect us at a time,
+			 * since each of them would have to wait for the snapshot under
+			 * which this is running; and second, to avoid possible infinite
+			 * loops in case of catalog corruption.
+			 *
+			 * Note that the current memory context is short-lived enough, so
+			 * we needn't worry about memory leaks here.
+			 */
+			if (!boundspec && !retried)
+			{
+				AcceptInvalidationMessages();
+				retried = true;
+				goto retry;
+			}
 		}
 
 		/* Sanity checks. */
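
The retry-once control flow in this change can be tried outside the server with a toy model. What follows is a minimal sketch, not PostgreSQL code: `ToyCatalog`, `ToyChild`, `toy_list_children`, `toy_lookup_bound`, `toy_accept_invalidations` and `toy_build_descriptor` are all hypothetical names invented for illustration, and the "catalog" is just an in-memory array. The sketch assumes that looking up a bound also processes pending invalidations (mirroring the commit's remark about table_open()), and it skips the intermediate direct read of pg_class that the real function performs before retrying.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX 8

/* Hypothetical stand-in for one child table; not a PostgreSQL structure. */
typedef struct ToyChild
{
    int         oid;
    const char *bound;   /* NULL models a cleared relpartbound */
    bool        listed;  /* whether the inheritance scan returns it */
} ToyChild;

/* Hypothetical in-memory "catalog" with at most one detach in flight. */
typedef struct ToyCatalog
{
    ToyChild children[TOY_MAX];
    int      nchildren;
    int      pending_detach_oid; /* 0 when no detach is pending */
} ToyCatalog;

static ToyChild *toy_find(ToyCatalog *cat, int oid)
{
    for (int i = 0; i < cat->nchildren; i++)
        if (cat->children[i].oid == oid)
            return &cat->children[i];
    return NULL;
}

/* Make a pending detach visible: the child loses its bound and its listing. */
static void toy_accept_invalidations(ToyCatalog *cat)
{
    ToyChild *c = toy_find(cat, cat->pending_detach_oid);

    if (c != NULL)
    {
        c->bound = NULL;
        c->listed = false;
    }
    cat->pending_detach_oid = 0;
}

/* Collect the oids of the currently listed children; returns their count. */
static int toy_list_children(const ToyCatalog *cat, int *out)
{
    int n = 0;

    for (int i = 0; i < cat->nchildren; i++)
        if (cat->children[i].listed)
            out[n++] = cat->children[i].oid;
    return n;
}

/* Assumption: a lookup processes invalidations first, then reads the bound. */
static const char *toy_lookup_bound(ToyCatalog *cat, int oid)
{
    ToyChild *c;

    toy_accept_invalidations(cat);
    c = toy_find(cat, oid);
    return c != NULL ? c->bound : NULL;
}

/*
 * Build the list of partitions. If a listed child turns out to have no
 * bound, refresh the list once; a second failure is reported as -1.
 */
int toy_build_descriptor(ToyCatalog *cat, int *out, int max)
{
    bool retried = false;
    int  listed[TOY_MAX];
    int  nlisted;
    int  n;

    assert(cat != NULL && out != NULL);

retry:
    nlisted = toy_list_children(cat, listed);
    n = 0;
    for (int i = 0; i < nlisted; i++)
    {
        if (toy_lookup_bound(cat, listed[i]) == NULL)
        {
            if (!retried)
            {
                toy_accept_invalidations(cat);
                retried = true;
                goto retry;
            }
            return -1;  /* still inconsistent after one retry */
        }
        if (n < max)
            out[n++] = listed[i];
    }
    return n;
}
```

In this model, a catalog whose child 102 has a pending detach yields the remaining children after one retry, while a child that is listed but permanently lacks a bound (the "catalog corruption" case from the comment) makes the function give up instead of looping.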
