
Commit dde70cc

Emit cascaded standby message on shutdown only when appropriate.
Adds an additional check for active walsenders and closes a race condition that occurs when failover happens while a new walsender is connecting. Reported and fixed by Fujii Masao. Review by Heikki Linnakangas.
1 parent 39039e6 commit dde70cc


2 files changed, +32 / -2 lines


src/backend/postmaster/postmaster.c

Lines changed: 3 additions & 2 deletions
@@ -2328,10 +2328,11 @@ reaper(SIGNAL_ARGS)
         * XXX should avoid the need for disconnection. When we do,
         * am_cascading_walsender should be replaced with RecoveryInProgress()
         */
-        if (max_wal_senders > 0)
+        if (max_wal_senders > 0 && CountChildren(BACKEND_TYPE_WALSND) > 0)
         {
             ereport(LOG,
-                    (errmsg("terminating all walsender processes to force cascaded standby(s) to update timeline and reconnect")));
+                    (errmsg("terminating all walsender processes to force cascaded "
+                            "standby(s) to update timeline and reconnect")));
             SignalSomeChildren(SIGUSR2, BACKEND_TYPE_WALSND);
         }
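The postmaster change guards the LOG message so that it is emitted only when at least one walsender is actually running, instead of whenever `max_wal_senders > 0`. The following is a minimal, self-contained C model of that guard, not PostgreSQL code: `Child`, `ChildType`, `count_children()` and `terminate_walsenders_if_any()` are hypothetical stand-ins for the postmaster's child list, `CountChildren()` and the `SignalSomeChildren(SIGUSR2, ...)` call, and "signalling" a child just sets a flag.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of postmaster child bookkeeping (not PostgreSQL APIs). */
typedef enum { CHILD_BACKEND, CHILD_WALSND } ChildType;

typedef struct
{
    int       pid;
    ChildType type;
    int       got_sigusr2;      /* set to 1 when the model "signals" it */
} Child;

/* Count children of the given type; plays the role of CountChildren(). */
static int
count_children(const Child *children, int n, ChildType type)
{
    int count = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
        if (children[i].type == type)
            count++;
    return count;
}

/*
 * Mirrors the patched condition: log and signal only if walsenders are
 * configured AND at least one is currently running.
 * Returns 1 if the LOG message was emitted, 0 otherwise.
 */
static int
terminate_walsenders_if_any(Child *children, int n, int max_wal_senders)
{
    if (max_wal_senders > 0 && count_children(children, n, CHILD_WALSND) > 0)
    {
        printf("LOG:  terminating all walsender processes to force cascaded "
               "standby(s) to update timeline and reconnect\n");
        for (int i = 0; i < n; i++)
            if (children[i].type == CHILD_WALSND)
                children[i].got_sigusr2 = 1;    /* models SIGUSR2 delivery */
        return 1;
    }
    return 0;
}
```

With a child list containing only regular backends the function returns 0 and prints nothing; once a walsender is present it prints the LOG line and flags only that child. The count check is added to the existing `max_wal_senders > 0` test with `&&` rather than replacing it, so a server with walsenders disabled still skips the work entirely.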

src/backend/replication/walsender.c

Lines changed: 29 additions & 0 deletions
@@ -368,6 +368,35 @@ StartReplication(StartReplicationCmd *cmd)
     MarkPostmasterChildWalSender();
     SendPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE);
 
+    /*
+     * When promoting a cascading standby, postmaster sends SIGUSR2 to
+     * any cascading walsenders to kill them. But there is a corner-case where
+     * such walsender fails to receive SIGUSR2 and survives a standby promotion
+     * unexpectedly. This happens when postmaster sends SIGUSR2 before
+     * the walsender marks itself as a WAL sender, because postmaster sends
+     * SIGUSR2 to only the processes marked as a WAL sender.
+     *
+     * To avoid this corner-case, if recovery is NOT in progress even though
+     * the walsender is cascading one, we do the same thing as SIGUSR2 signal
+     * handler does, i.e., set walsender_ready_to_stop to true. Which causes
+     * the walsender to end later.
+     *
+     * When terminating cascading walsenders, usually postmaster writes
+     * the log message announcing the terminations. But there is a race condition
+     * here. If there is no walsender except this process before reaching here,
+     * postmaster thinks that there is no walsender and suppresses that
+     * log message. To handle this case, we always emit that log message here.
+     * This might cause duplicate log messages, but which is less likely to happen,
+     * so it's not worth writing some code to suppress them.
+     */
+    if (am_cascading_walsender && !RecoveryInProgress())
+    {
+        ereport(LOG,
+                (errmsg("terminating walsender process to force cascaded standby "
+                        "to update timeline and reconnect")));
+        walsender_ready_to_stop = true;
+    }
+
     /*
      * We assume here that we're logging enough information in the WAL for
      * log-shipping, since this is checked in PostmasterMain().
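The race described in the new walsender.c comment can be reproduced deterministically with a single-threaded model. In this sketch (hypothetical names, not PostgreSQL code), `postmaster_promote()` ends recovery and only "delivers" SIGUSR2 to a walsender that has already marked itself, while `walsender_start()` marks itself and, when the fix is enabled, applies the same `am_cascading_walsender && !RecoveryInProgress()` check as the patch. Real processes and signal delivery are concurrent; here the two orderings are simply selected by a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model; field names loosely mirror walsender.c variables. */
typedef struct
{
    bool marked_as_walsender;   /* MarkPostmasterChildWalSender() done? */
    bool cascading;             /* am_cascading_walsender */
    bool ready_to_stop;         /* walsender_ready_to_stop */
} WalSender;

/* Postmaster side of promotion: SIGUSR2 only reaches marked walsenders. */
static void
postmaster_promote(WalSender *ws, bool *recovery_in_progress)
{
    *recovery_in_progress = false;
    if (ws->marked_as_walsender)
        ws->ready_to_stop = true;       /* what the SIGUSR2 handler does */
}

/* Walsender side: mark itself, then optionally apply the fallback check. */
static void
walsender_start(WalSender *ws, bool recovery_in_progress, bool with_fix)
{
    ws->marked_as_walsender = true;
    if (with_fix && ws->cascading && !recovery_in_progress)
    {
        printf("LOG:  terminating walsender process to force cascaded standby "
               "to update timeline and reconnect\n");
        ws->ready_to_stop = true;
    }
}

/*
 * promote_first = true is the problematic interleaving: promotion happens
 * before the walsender marks itself. Returns true if the walsender will stop.
 */
static bool
simulate(bool promote_first, bool with_fix)
{
    WalSender ws = { false, true, false };  /* unmarked, cascading, running */
    bool recovery_in_progress = true;

    if (promote_first)
    {
        postmaster_promote(&ws, &recovery_in_progress);  /* SIGUSR2 missed */
        walsender_start(&ws, recovery_in_progress, with_fix);
    }
    else
    {
        walsender_start(&ws, recovery_in_progress, with_fix);
        postmaster_promote(&ws, &recovery_in_progress);  /* SIGUSR2 delivered */
    }
    assert(!recovery_in_progress);  /* promotion has happened in both cases */
    return ws.ready_to_stop;
}
```

With `promote_first = true` and the fix disabled, `simulate()` returns false: the walsender missed the signal and survives the promotion. Enabling the fix makes it return true for both orderings. In the normal ordering the fallback check does not fire, because recovery is still in progress when the walsender starts, so the postmaster's signal and log message cover that case.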
