Commit 91c0570

Don't fail for > 1 walsenders in 019_replslot_limit, add debug messages.
So far the first of the retries introduced in f28bf66 resolves the issue. But I (Andres) am still suspicious that the start of the failures might indicate a problem.

To reduce noise, stop reporting a failure if a retry resolves the problem. To allow figuring out what causes the slow slot drop, add a few more debug messages to ReplicationSlotDropPtr.

See also commit afdeff1, fe0972e and f28bf66.

Discussion: https://postgr.es/m/20220327213219.smdvfkq2fl74flow@alap3.anarazel.de
1 parent da4b566 commit 91c0570

2 files changed: +16 -3 lines changed

src/backend/replication/slot.c

Lines changed: 9 additions & 0 deletions
@@ -702,15 +702,22 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 	slot->active_pid = 0;
 	slot->in_use = false;
 	LWLockRelease(ReplicationSlotControlLock);
+
+	elog(DEBUG3, "replication slot drop: %s: marked as not in use", NameStr(slot->data.name));
+
 	ConditionVariableBroadcast(&slot->active_cv);
 
+	elog(DEBUG3, "replication slot drop: %s: notified others", NameStr(slot->data.name));
+
 	/*
 	 * Slot is dead and doesn't prevent resource removal anymore, recompute
 	 * limits.
 	 */
 	ReplicationSlotsComputeRequiredXmin(false);
 	ReplicationSlotsComputeRequiredLSN();
 
+	elog(DEBUG3, "replication slot drop: %s: computed required", NameStr(slot->data.name));
+
 	/*
 	 * If removing the directory fails, the worst thing that will happen is
 	 * that the user won't be able to create a new slot with the same name
@@ -720,6 +727,8 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
 		ereport(WARNING,
 				(errmsg("could not remove directory \"%s\"", tmppath)));
 
+	elog(DEBUG3, "replication slot drop: %s: removed directory", NameStr(slot->data.name));
+
 	/*
 	 * Send a message to drop the replication slot to the stats collector.
 	 * Since there is no guarantee of the order of message transfer on a UDP
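
The added messages follow one pattern: emit a progress line after each phase of the drop, so that the gaps between consecutive log lines show which phase is slow. The following is a minimal standalone sketch of that pattern, not PostgreSQL code. `StepTrace`, `trace_step` and `simulated_slot_drop` are hypothetical names, and an in-memory buffer stands in for the server log that `elog(DEBUG3, ...)` writes to.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_STEPS 8

/* Hypothetical in-memory trace standing in for the server log. */
typedef struct StepTrace
{
	int		nsteps;
	char	lines[MAX_STEPS][128];
} StepTrace;

/* Record one progress message in the "replication slot drop: <name>: <step>" format. */
static void
trace_step(StepTrace *trace, const char *slot_name, const char *step)
{
	assert(trace->nsteps < MAX_STEPS);
	snprintf(trace->lines[trace->nsteps], sizeof(trace->lines[0]),
			 "replication slot drop: %s: %s", slot_name, step);
	trace->nsteps++;
}

/* Simulated drop: every phase is followed by exactly one progress message. */
static void
simulated_slot_drop(StepTrace *trace, const char *slot_name)
{
	/* phase 1: the real code marks the slot unused and releases the lock */
	trace_step(trace, slot_name, "marked as not in use");

	/* phase 2: the real code broadcasts on the slot's condition variable */
	trace_step(trace, slot_name, "notified others");

	/* phase 3: the real code recomputes the required xmin and LSN */
	trace_step(trace, slot_name, "computed required");

	/* phase 4: the real code removes the slot's directory */
	trace_step(trace, slot_name, "removed directory");
}
```

Calling `simulated_slot_drop(&trace, "myslot")` on a zero-initialized `StepTrace` records four lines in phase order. In a real server log each line would also carry a timestamp, and that timestamp is what makes the slow phase visible.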

src/test/recovery/t/019_replslot_limit.pl

Lines changed: 7 additions & 3 deletions
@@ -339,8 +339,8 @@
 # We've seen occasional cases where multiple walsender pids are active. It
 # could be that we're just observing process shutdown being slow. To collect
 # more information, retry a couple times, print a bit of debugging information
-# each iteration. For now report a test failure even if later iterations
-# succeed.
+# each iteration. Don't fail the test if retries find just one pid, the
+# buildfarm failures are too noisy.
 my $i = 0;
 while (1)
 {
@@ -349,7 +349,9 @@
 	$senderpid = $node_primary3->safe_psql('postgres',
 		"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walsender'");
 
-	last if like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid");
+	last if $senderpid =~ qr/^[0-9]+$/;
+
+	diag "multiple walsenders active in iteration $i";
 
 	# show information about all active connections
 	$node_primary3->psql('postgres',
@@ -370,6 +372,8 @@
 	usleep(100_000);
 }
 
+like($senderpid, qr/^[0-9]+$/, "have walsender pid $senderpid");
+
 my $receiverpid = $node_standby3->safe_psql('postgres',
 	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
 like($receiverpid, qr/^[0-9]+$/, "have walreceiver pid $receiverpid");
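
The test change moves the assertion out of the loop: intermediate iterations only print diagnostics, and a single check runs once the loop ends. Before the change, `like()` ran on every iteration, so a retry that later succeeded had already been reported as a failure. The control flow can be sketched in C as follows. This is an illustration only: `retry_until_single`, `slow_shutdown_probe` and the `max_attempts` cap are hypothetical, and the diff does not show how many retries the Perl loop actually allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe: returns how many walsenders are visible on this attempt. */
typedef int (*probe_fn) (int attempt, void *arg);

typedef struct RetryResult
{
	int		attempts;	/* number of probes performed */
	bool	ok;			/* true if some probe saw exactly one sender */
} RetryResult;

/*
 * Probe up to max_attempts times. Intermediate misses are not failures;
 * the caller asserts once on the final result.
 */
static RetryResult
retry_until_single(probe_fn probe, void *arg, int max_attempts)
{
	RetryResult res = {0, false};

	assert(probe != NULL && max_attempts > 0);

	for (int i = 0; i < max_attempts; i++)
	{
		res.attempts = i + 1;
		if (probe(i, arg) == 1)
		{
			res.ok = true;
			break;
		}
		/* a real test would print diagnostics and sleep here */
	}
	return res;
}

/* Example probe: reports two senders until the attempt index reaches *arg. */
static int
slow_shutdown_probe(int attempt, void *arg)
{
	int		settle_after = *(int *) arg;

	return attempt < settle_after ? 2 : 1;
}
```

With `settle_after` set to 2 and a cap of 5, the third probe sees a single sender, so the result is `ok` after 3 attempts. A slow shutdown that settles within the cap therefore no longer counts as a failure, while one that never settles still does.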
