Commit aa21e49

Amit Kapila committed
Fix self-deadlock during DROP SUBSCRIPTION.

The DROP SUBSCRIPTION command performs several operations: it stops the subscription workers, removes subscription-related entries from system catalogs, and deletes the replication slot on the publisher server. Previously, this command acquired an AccessExclusiveLock on pg_subscription before initiating these steps. However, while holding this lock, the command attempts to connect to the publisher to remove the replication slot. In cases where the connection is made to a newly created database on the same server as subscriber, the cache-building process during connection tries to acquire an AccessShareLock on pg_subscription, resulting in a self-deadlock.

To resolve this issue, we reduce the lock level on pg_subscription during DROP SUBSCRIPTION from AccessExclusiveLock to RowExclusiveLock. Earlier, the higher lock level was used to prevent the launcher from starting a new worker during the drop operation, as a restarted worker could become orphaned. Now, instead of relying on a strict lock, we acquire an AccessShareLock on the specific subscription being dropped and re-validate its existence after acquiring the lock. If the subscription is no longer valid, the worker exits gracefully. This approach avoids the deadlock while still ensuring that orphan workers are not created.

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/18988-7312c868be2d467f@postgresql.org

1 parent a977e41 commit aa21e49

3 files changed: 42 additions, 3 deletions

src/backend/commands/subscriptioncmds.c

Lines changed: 5 additions & 3 deletions

@@ -1803,10 +1803,12 @@ DropSubscription(DropSubscriptionStmt *stmt, bool isTopLevel)
 	bool		must_use_password;
 
 	/*
-	 * Lock pg_subscription with AccessExclusiveLock to ensure that the
-	 * launcher doesn't restart new worker during dropping the subscription
+	 * The launcher may concurrently start a new worker for this subscription.
+	 * During initialization, the worker checks for subscription validity and
+	 * exits if the subscription has already been dropped. See
+	 * InitializeLogRepWorker.
 	 */
-	rel = table_open(SubscriptionRelationId, AccessExclusiveLock);
+	rel = table_open(SubscriptionRelationId, RowExclusiveLock);
 
 	tup = SearchSysCache2(SUBSCRIPTIONNAME, ObjectIdGetDatum(MyDatabaseId),
 						  CStringGetDatum(stmt->subname));
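
The effect of the lock-level change can be illustrated with a small standalone model of PostgreSQL's table-level lock conflict rules. This is only a sketch for readers, not code from the commit: the enum, the `conflict_mask` table, and `lock_modes_conflict` are hypothetical names, and the table is transcribed from the documented conflict matrix for table-level locks. It shows that a held AccessExclusiveLock blocks an AccessShareLock request (the request made by the cache-building step of the new connection), while a held RowExclusiveLock does not.

```c
#include <stdbool.h>

/* Table-level lock modes, numbered 1..8 in increasing strength.
 * Hypothetical names; they mirror AccessShareLock .. AccessExclusiveLock. */
enum LockMode {
    ACCESS_SHARE = 1,
    ROW_SHARE,
    ROW_EXCLUSIVE,
    SHARE_UPDATE_EXCLUSIVE,
    SHARE,
    SHARE_ROW_EXCLUSIVE,
    EXCLUSIVE,
    ACCESS_EXCLUSIVE
};

#define BIT(m) (1u << (m))

/* For each mode, the set of modes it conflicts with (documented matrix). */
static const unsigned conflict_mask[] = {
    [ACCESS_SHARE] = BIT(ACCESS_EXCLUSIVE),
    [ROW_SHARE] = BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ROW_EXCLUSIVE] = BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                      BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE_UPDATE_EXCLUSIVE] = BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                               BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                               BIT(ACCESS_EXCLUSIVE),
    [SHARE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
              BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
              BIT(ACCESS_EXCLUSIVE),
    [SHARE_ROW_EXCLUSIVE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                            BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                            BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [EXCLUSIVE] = BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE) |
                  BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) |
                  BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) |
                  BIT(ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE] = BIT(ACCESS_SHARE) | BIT(ROW_SHARE) |
                         BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE) |
                         BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) |
                         BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE)
};

/* Returns true if a request for `requested` must wait while another
 * session holds `held` on the same table. */
bool lock_modes_conflict(enum LockMode held, enum LockMode requested)
{
    return (conflict_mask[held] & BIT(requested)) != 0;
}
```

With this model, `lock_modes_conflict(ACCESS_EXCLUSIVE, ACCESS_SHARE)` is true, which corresponds to the old behaviour where the new connection waited on a lock held by the very command that was waiting for that connection. `lock_modes_conflict(ROW_EXCLUSIVE, ACCESS_SHARE)` is false, so after the change the connection's catalog read can proceed.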

src/backend/replication/logical/worker.c

Lines changed: 7 additions & 0 deletions

@@ -5415,6 +5415,13 @@ InitializeLogRepWorker(void)
 	StartTransactionCommand();
 	oldctx = MemoryContextSwitchTo(ApplyContext);
 
+	/*
+	 * Lock the subscription to prevent it from being concurrently dropped,
+	 * then re-verify its existence. After the initialization, the worker will
+	 * be terminated gracefully if the subscription is dropped.
+	 */
+	LockSharedObject(SubscriptionRelationId, MyLogicalRepWorker->subid, 0,
+					 AccessShareLock);
 	MySubscription = GetSubscription(MyLogicalRepWorker->subid, true);
 	if (!MySubscription)
 	{
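
The lock-then-revalidate pattern used in this hunk can be sketched outside PostgreSQL as follows. Everything here is a hypothetical stand-in: `SubscriptionEntry`, `subscription_init`, `subscription_drop`, and `worker_initialize` are illustrative names, and a plain pthread mutex replaces the shared/exclusive object lock taken by `LockSharedObject`. The sketch also assumes that the drop path holds a conflicting lock on the same subscription object while it removes the entry, which is not shown in this diff.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for one subscription entry and its object lock. */
typedef struct {
    pthread_mutex_t lock;   /* models the per-subscription object lock */
    bool            exists; /* false once the subscription is dropped */
} SubscriptionEntry;

void subscription_init(SubscriptionEntry *sub)
{
    pthread_mutex_init(&sub->lock, NULL);
    sub->exists = true;
}

/* Models the drop path: remove the entry while holding the object lock. */
void subscription_drop(SubscriptionEntry *sub)
{
    pthread_mutex_lock(&sub->lock);
    sub->exists = false;
    pthread_mutex_unlock(&sub->lock);
}

/* Models worker start-up: take the object lock first, then re-check that
 * the subscription still exists. Returns false if the worker should exit
 * instead of continuing as an orphan. */
bool worker_initialize(SubscriptionEntry *sub)
{
    bool valid;

    pthread_mutex_lock(&sub->lock);
    valid = sub->exists;    /* re-validation happens under the lock */
    pthread_mutex_unlock(&sub->lock);
    return valid;
}
```

The point of the ordering is that the existence check happens only after the lock is held, so a worker either sees the subscription before the drop begins or sees that it is gone; it cannot observe a half-dropped state and keep running.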

src/test/subscription/t/100_bugs.pl

Lines changed: 30 additions & 0 deletions

@@ -575,4 +575,34 @@ BEGIN
 $node_publisher->stop('fast');
 $node_subscriber->stop('fast');
 
+# BUG #18988
+# The bug happened due to a self-deadlock between the DROP SUBSCRIPTION
+# command and the walsender process for accessing pg_subscription. This
+# occurred when DROP SUBSCRIPTION attempted to remove a replication slot by
+# connecting to a newly created database whose caches are not yet
+# initialized.
+#
+# The bug is fixed by reducing the lock-level during DROP SUBSCRIPTION.
+$node_publisher->start();
+
+$publisher_connstr = $node_publisher->connstr . ' dbname=regress_db';
+$node_publisher->safe_psql(
+	'postgres', qq(
+	CREATE DATABASE regress_db;
+	CREATE SUBSCRIPTION regress_sub1 CONNECTION '$publisher_connstr' PUBLICATION regress_pub WITH (connect=false);
+));
+
+my ($ret, $stdout, $stderr) =
+  $node_publisher->psql('postgres', q{DROP SUBSCRIPTION regress_sub1});
+
+isnt($ret, 0, "replication slot does not exist: exit code not 0");
+like(
+	$stderr,
+	qr/ERROR: could not drop replication slot "regress_sub1" on publisher/,
+	"could not drop replication slot: error message");
+
+$node_publisher->safe_psql('postgres', "DROP DATABASE regress_db");
+
+$node_publisher->stop('fast');
+
 done_testing();
