Commit 9a7b7ad

Make logical WAL sender report streaming state appropriately
WAL senders sending logically-decoded data fail to properly report the "streaming" state when starting up; as long as one extra record has not been replayed, such WAL senders remain in the "catchup" state, which is inconsistent with their physical cousins. This can easily be reproduced by, for example, using pg_recvlogical and restarting the upstream server. The TAP tests have been slightly modified to detect the failure and strengthened so that future tests also make sure that a node is in streaming state when waiting for its catchup.

Backpatch down to 9.4, where this code was introduced.

Reported-by: Sawada Masahiko
Author: Simon Riggs, Sawada Masahiko
Reviewed-by: Petr Jelinek, Michael Paquier, Vaishnavi Prabakaran
Discussion: https://postgr.es/m/CAD21AoB2ZbCCqOx=bgKMcLrAvs1V0ZMqzs7wBTuDySezTGtMZA@mail.gmail.com
1 parent 39a9651 commit 9a7b7ad

3 files changed: 23 additions & 7 deletions


src/backend/replication/walsender.c

Lines changed: 15 additions & 5 deletions
@@ -2169,7 +2169,7 @@ WalSndLoop(WalSndSendDataCallback send_data)
 		if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
 		{
 			ereport(DEBUG1,
-					(errmsg("standby \"%s\" has now caught up with primary",
+					(errmsg("\"%s\" has now caught up with upstream server",
 							application_name)));
 			WalSndSetState(WALSNDSTATE_STREAMING);
 		}
@@ -2758,10 +2758,10 @@ XLogSendLogical(void)
 	char	   *errm;
 
 	/*
-	 * Don't know whether we've caught up yet. We'll set it to true in
-	 * WalSndWaitForWal, if we're actually waiting. We also set to true if
-	 * XLogReadRecord() had to stop reading but WalSndWaitForWal didn't wait -
-	 * i.e. when we're shutting down.
+	 * Don't know whether we've caught up yet. We'll set WalSndCaughtUp to
+	 * true in WalSndWaitForWal, if we're actually waiting. We also set to
+	 * true if XLogReadRecord() had to stop reading but WalSndWaitForWal
+	 * didn't wait - i.e. when we're shutting down.
 	 */
 	WalSndCaughtUp = false;
 
@@ -2774,6 +2774,9 @@ XLogSendLogical(void)
 
 	if (record != NULL)
 	{
+		/* XXX: Note that logical decoding cannot be used while in recovery */
+		XLogRecPtr	flushPtr = GetFlushRecPtr();
+
 		/*
 		 * Note the lack of any call to LagTrackerWrite() which is handled by
 		 * WalSndUpdateProgress which is called by output plugin through
@@ -2782,6 +2785,13 @@ XLogSendLogical(void)
 		LogicalDecodingProcessRecord(logical_decoding_ctx, logical_decoding_ctx->reader);
 
 		sentPtr = logical_decoding_ctx->reader->EndRecPtr;
+
+		/*
+		 * If we have sent a record that is at or beyond the flushed point, we
+		 * have caught up.
+		 */
+		if (sentPtr >= flushPtr)
+			WalSndCaughtUp = true;
 	}
 	else
 	{
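The essence of the walsender.c change above is that the logical sender now decides it has caught up by comparing the LSN it has just sent against the flush pointer captured before the record was decoded, and only then reports the streaming state. Below is a minimal standalone C model of that decision, not backend code: lsn_t, sender_state and logical_send_step are illustrative names invented for this sketch, standing in for XLogRecPtr, WalSndCaughtUp and WalSndSetState().

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

typedef uint64_t lsn_t;

typedef enum
{
	STATE_CATCHUP,
	STATE_STREAMING
} sender_state;

/*
 * Decide which state a logical sender should report after sending the
 * record that ends at sent_ptr, given the flush pointer captured before
 * the record was decoded.
 */
static sender_state
logical_send_step(lsn_t sent_ptr, lsn_t flush_ptr, sender_state cur)
{
	bool		caught_up = (sent_ptr >= flush_ptr);

	if (caught_up && cur == STATE_CATCHUP)
		return STATE_STREAMING; /* stands in for WalSndSetState(WALSNDSTATE_STREAMING) */
	return cur;
}

int
main(void)
{
	sender_state st = STATE_CATCHUP;

	/* Record sent is still behind the flush pointer: remain in catchup. */
	st = logical_send_step(100, 200, st);
	printf("sent 100, flushed 200: %s\n",
		   st == STATE_STREAMING ? "streaming" : "catchup");

	/* Record sent reaches the flush pointer: report streaming. */
	st = logical_send_step(200, 200, st);
	printf("sent 200, flushed 200: %s\n",
		   st == STATE_STREAMING ? "streaming" : "catchup");
	return 0;
}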

src/test/perl/PostgresNode.pm

Lines changed: 3 additions & 2 deletions
@@ -1535,7 +1535,8 @@ also works for logical subscriptions)
 until its replication location in pg_stat_replication equals or passes the
 upstream's WAL insert point at the time this function is called. By default
 the replay_lsn is waited for, but 'mode' may be specified to wait for any of
-sent|write|flush|replay.
+sent|write|flush|replay. The connection catching up must be in a streaming
+state.
 
 If there is no active replication connection from this peer, waits until
 poll_query_until timeout.
@@ -1580,7 +1581,7 @@ sub wait_for_catchup
 	  . $lsn_expr . " on "
 	  . $self->name . "\n";
 	my $query =
-	  qq[SELECT $lsn_expr <= ${mode}_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	  qq[SELECT $lsn_expr <= ${mode}_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
 	$self->poll_query_until('postgres', $query)
 	  or croak "timed out waiting for catchup";
 	print "done\n";

src/test/subscription/t/001_rep_changes.pl

Lines changed: 5 additions & 0 deletions
@@ -188,6 +188,11 @@
 	"INSERT INTO tab_ins SELECT generate_series(1001,1100)");
 $node_publisher->safe_psql('postgres', "DELETE FROM tab_rep");
 
+# Restart the publisher and check the state of the subscriber which
+# should be in a streaming state after catching up.
+$node_publisher->stop('fast');
+$node_publisher->start;
+
 $node_publisher->wait_for_catchup($appname);
 
 $result = $node_subscriber->safe_psql('postgres',
