Commit 98f8cdd

Fix more race conditions in the newly-added pg_rewind test.
pg_rewind looks at the control file to check what timeline a server is on. But promotion doesn't immediately write a checkpoint, it merely writes an end-of-recovery WAL record. If pg_rewind runs immediately after promotion, before the checkpoint has completed, it will think that the server is still on the earlier timeline. We ran into this issue a long time ago already, see commit 484a848.

It's a bit bogus that pg_rewind doesn't determine the timeline correctly until the end-of-recovery checkpoint has completed. We probably should fix that. But for now work around it by waiting for the checkpoint to complete before running pg_rewind, like we did in commit 484a848.

In passing, tidy up the new test a little bit. Reorder the INSERTs so that the comments make more sense, remove a spurious CHECKPOINT call after pg_rewind has already run, and add the --debug option, so that if this fails again, we'll have more data.

Per buildfarm failure at https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=rorqual&dt=2020-12-06%2018%3A32%3A19&stg=pg_rewind-check.

Backpatch to all supported versions.

Discussion: https://www.postgresql.org/message-id/1713707e-e318-761c-d287-5b6a4aa807e8@iki.fi
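To illustrate the race described above, here is a minimal Perl sketch that pulls the checkpoint timeline out of pg_controldata-style text. It is not part of this commit or of pg_rewind. It assumes the value pg_rewind consults corresponds to the line pg_controldata prints as "Latest checkpoint's TimeLineID". The helper name checkpoint_timeline and both sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the timeline ID recorded by the latest
# checkpoint from pg_controldata-style output. Returns undef if the
# expected line is not present.
sub checkpoint_timeline
{
	my ($controldata) = @_;
	if ($controldata =~ /^Latest checkpoint's TimeLineID:\s+(\d+)\s*$/m)
	{
		return $1 + 0;
	}
	return undef;
}

# Made-up sample: a standby that was just promoted, before the
# end-of-recovery checkpoint has completed. The control file still
# carries the old timeline.
my $before = "Latest checkpoint location:           0/3000060\n"
  . "Latest checkpoint's TimeLineID:       1\n";

# Made-up sample: the same server after a checkpoint has completed.
my $after = "Latest checkpoint location:           0/3000148\n"
  . "Latest checkpoint's TimeLineID:       2\n";

printf "before checkpoint: timeline %d\n", checkpoint_timeline($before);
printf "after checkpoint:  timeline %d\n", checkpoint_timeline($after);
```

Under these assumptions, a tool that reads only this field immediately after promotion would still see the earlier timeline, which is why the test forces a checkpoint first.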
1 parent 77a94c3 commit 98f8cdd

1 file changed: +15 -7 lines changed
src/bin/pg_rewind/t/008_min_recovery_point.pl

Lines changed: 15 additions & 7 deletions
@@ -75,6 +75,13 @@
 #
 $node_1->stop('fast');
 $node_3->promote;
+# Force a checkpoint after the promotion. pg_rewind looks at the control
+# file to determine what timeline the server is on, and that isn't updated
+# immediately at promotion, but only at the next checkpoint. When running
+# pg_rewind in remote mode, it's possible that we complete the test steps
+# after promotion so quickly that when pg_rewind runs, the standby has not
+# performed a checkpoint after promotion yet.
+$node_3->safe_psql('postgres', "checkpoint");
 
 # reconfigure node_1 as a standby following node_3
 my $node_3_connstr = $node_3->connstr;
@@ -99,13 +106,18 @@
 $node_3->wait_for_catchup('node_1', 'replay', $lsn);
 
 $node_1->promote;
+# Force a checkpoint after promotion, like earlier.
+$node_1->safe_psql('postgres', "checkpoint");
 
 #
 # We now have a split-brain with two primaries. Insert a row on both to
 # demonstratively create a split brain. After the rewind, we should only
 # see the insert on 1, as the insert on node 3 is rewound away.
 #
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('keep this')");
+# 'bar' is unmodified in node 1, so it won't be overwritten by replaying the
+# WAL from node 1.
+$node_3->safe_psql('postgres', "INSERT INTO public.bar (t) VALUES ('rewind this')");
 
 # Insert more rows in node 1, to bump up the XID counter. Otherwise, if
 # rewind doesn't correctly rewind the changes made on the other node,
@@ -114,10 +126,6 @@
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('and this')");
 $node_1->safe_psql('postgres', "INSERT INTO public.foo (t) VALUES ('and this too')");
 
-# Also insert a row in 'bar' on node 3. It is unmodified in node 1, so it won't get
-# overwritten by replaying the WAL from node 1.
-$node_3->safe_psql('postgres', "INSERT INTO public.bar (t) VALUES ('rewind this')");
-
 # Wait for node 2 to catch up
 $node_2->poll_query_until('postgres',
 q|SELECT COUNT(*) > 1 FROM public.bar|, 't');
@@ -139,9 +147,10 @@
 [
 'pg_rewind',
 "--source-server=$node_1_connstr",
-"--target-pgdata=$node_2_pgdata"
+"--target-pgdata=$node_2_pgdata",
+"--debug"
 ],
-'pg_rewind detects rewind needed');
+'run pg_rewind');
 
 # Now move back postgresql.conf with old settings
 move(
@@ -153,7 +162,6 @@
 # Check contents of the test tables after rewind. The rows inserted in node 3
 # before rewind should've been overwritten with the data from node 1.
 my $result;
-$result = $node_2->safe_psql('postgres', 'checkpoint');
 $result = $node_2->safe_psql('postgres', 'SELECT * FROM public.foo');
 is($result, qq(keep this
 and this
