Commit 9ea3c64

Improve replication lag interpolation after idle period
After a standby has been idle and fully replayed for a while, a new burst of WAL activity caused us to interpolate between an ancient sample and the not-yet-reached sample for the new traffic. In that corner case, the lag reported after receiving the first new reply from the standby could sometimes be a large spike. Correct this by resetting the last_read time when the lag tracker becomes empty, and handle the resulting new case in which only a future sample is available.

Author: Thomas Munro
1 parent a79122b commit 9ea3c64
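
To make the spike described in the commit message concrete, here is a minimal sketch of the linear interpolation with made-up numbers. The `Sample` type, the function names, and all values are hypothetical stand-ins for illustration, not code from walsender.c. A sample replayed long ago (time 0) is paired with a sample flushed one hour later, and the standby replies with an LSN halfway between them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a lag tracker sample: LSN plus local flush time. */
typedef struct
{
    uint64_t lsn;
    int64_t  time_us;       /* microseconds; 0 means "no sample" */
} Sample;

/* Linearly interpolate the flush time for lsn between two samples. */
static int64_t
interpolate_flush_time(Sample prev, Sample next, uint64_t lsn)
{
    double fraction;

    assert(prev.lsn < next.lsn);
    fraction = (double) (lsn - prev.lsn) / (double) (next.lsn - prev.lsn);
    return (int64_t) ((double) prev.time_us +
                      (double) (next.time_us - prev.time_us) * fraction);
}

/* Illustrative values: an hour of idleness between the two samples. */
static const Sample stale_prev = {100, 0};
static const Sample new_next = {200, 3600 * INT64_C(1000000)};

/* Lag reported when the stale sample is still used for interpolation. */
static int64_t
lag_with_stale_sample(void)
{
    int64_t now_us = new_next.time_us + 1000;   /* reply arrives 1 ms later */

    return now_us - interpolate_flush_time(stale_prev, new_next, 150);
}

/* Lag reported when only the future sample is used, as after this commit. */
static int64_t
lag_with_future_sample_only(void)
{
    int64_t now_us = new_next.time_us + 1000;

    return now_us - new_next.time_us;
}
```

With these assumed values, the stale interpolation places the flush time half an hour in the past, so the reported lag is about 30 minutes. Measuring against the future sample alone gives 1 ms.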

1 file changed, +25 -4 lines changed
src/backend/replication/walsender.c

Lines changed: 25 additions & 4 deletions
@@ -3443,6 +3443,16 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
             (LagTracker.read_heads[head] + 1) % LAG_TRACKER_BUFFER_SIZE;
     }
 
+    /*
+     * If the lag tracker is empty, that means the standby has processed
+     * everything we've ever sent so we should now clear 'last_read'.  If we
+     * didn't do that, we'd risk using a stale and irrelevant sample for
+     * interpolation at the beginning of the next burst of WAL after a period
+     * of idleness.
+     */
+    if (LagTracker.read_heads[head] == LagTracker.write_head)
+        LagTracker.last_read[head].time = 0;
+
     if (time > now)
     {
         /* If the clock somehow went backwards, treat as not found. */
@@ -3459,9 +3469,14 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
          * eventually start moving again and cross one of our samples before
          * we can show the lag increasing.
          */
-        if (LagTracker.read_heads[head] != LagTracker.write_head &&
-            LagTracker.last_read[head].time != 0)
+        if (LagTracker.read_heads[head] == LagTracker.write_head)
         {
+            /* There are no future samples, so we can't interpolate. */
+            return -1;
+        }
+        else if (LagTracker.last_read[head].time != 0)
+        {
+            /* We can interpolate between last_read and the next sample. */
             double      fraction;
             WalTimeSample prev = LagTracker.last_read[head];
             WalTimeSample next = LagTracker.buffer[LagTracker.read_heads[head]];
@@ -3494,8 +3509,14 @@ LagTrackerRead(int head, XLogRecPtr lsn, TimestampTz now)
         }
         else
         {
-            /* Couldn't interpolate due to lack of data. */
-            return -1;
+            /*
+             * We have only a future sample, implying that we were entirely
+             * caught up but now there is a new burst of WAL and the
+             * standby hasn't processed the first sample yet.  Until the
+             * standby reaches the future sample the best we can do is report
+             * the hypothetical lag if that sample were to be replayed now.
+             */
+            time = LagTracker.buffer[LagTracker.read_heads[head]].time;
         }
     }
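
The three-way decision that the last two hunks introduce can be modeled in isolation. The following is a simplified sketch, not the walsender.c code: `Sample`, `choose_flush_time`, and the `have_future_sample` flag are hypothetical names, and the sanity checks that sit between the hunks in the real function are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a lag tracker sample: LSN plus local flush time. */
typedef struct
{
    uint64_t lsn;
    int64_t  time_us;       /* 0 means last_read was cleared or never set */
} Sample;

/*
 * Pick the flush time to measure lag against when no sample was crossed.
 * Returns -1 when nothing can be reported.
 */
static int64_t
choose_flush_time(bool have_future_sample, Sample last_read, Sample next,
                  uint64_t lsn)
{
    if (!have_future_sample)
    {
        /* Buffer is empty: there is no future sample to interpolate toward. */
        return -1;
    }
    else if (last_read.time_us != 0)
    {
        /* Both ends are known: interpolate between last_read and next. */
        double fraction =
            (double) (lsn - last_read.lsn) / (double) (next.lsn - last_read.lsn);

        return (int64_t) ((double) last_read.time_us +
                          (double) (next.time_us - last_read.time_us) * fraction);
    }
    else
    {
        /* Only a future sample: report lag as if it were replayed now. */
        return next.time_us;
    }
}
```

In this model, clearing `last_read` once the buffer drains (the first hunk) is what routes the first reply of a new burst into the final branch instead of the interpolation branch.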
