
Improve log messages and docs for slot synchronization. · postgres/postgres@1546e17

Commit 1546e17

author
Amit Kapila
committed
Improve log messages and docs for slot synchronization.

Improve the clarity of LOG messages when a failover logical slot synchronization fails, making the reasons more explicit for easier debugging. Update the documentation to outline scenarios where slot synchronization can fail, especially during the initial sync, and emphasize that pg_sync_replication_slot() is primarily intended for testing and debugging purposes.

We also discussed improving the functionality of pg_sync_replication_slot() so that it can be used reliably, but we would take up that work for next version after some more discussion and review.

Reported-by: Suraj Kharage <suraj.kharage@enterprisedb.com>
Author: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Peter Smith <smithpb2250@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 17, where it was introduced
Discussion: https://postgr.es/m/CAF1DzPWTcg+m+x+oVVB=y4q9=PYYsL_mujVp7uJr-_oUtWNGbA@mail.gmail.com
1 parent a038059 commit 1546e17

3 files changed: +57 -9 lines changed

doc/src/sgml/func.sgml

Lines changed: 4 additions & 2 deletions
@@ -29698,7 +29698,7 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 </row>

 <row>
-<entry role="func_table_entry"><para role="func_signature">
+<entry id="pg-logical-slot-get-binary-changes" role="func_table_entry"><para role="func_signature">
 <indexterm>
 <primary>pg_logical_slot_get_binary_changes</primary>
 </indexterm>
@@ -29970,7 +29970,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
 standby server. Temporary synced slots, if any, cannot be used for
 logical decoding and must be dropped after promotion. See
 <xref linkend="logicaldecoding-replication-slots-synchronization"/> for details.
-Note that this function cannot be executed if
+Note that this function is primarily intended for testing and
+debugging purposes and should be used with caution. Additionally,
+this function cannot be executed if
 <link linkend="guc-sync-replication-slots"><varname>
 sync_replication_slots</varname></link> is enabled and the slotsync
 worker is already running to perform the synchronization of slots.

doc/src/sgml/logicaldecoding.sgml

Lines changed: 50 additions & 4 deletions
@@ -370,10 +370,10 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 <function>pg_create_logical_replication_slot</function></link>, or by
 using the <link linkend="sql-createsubscription-params-with-failover">
 <literal>failover</literal></link> option of
-<command>CREATE SUBSCRIPTION</command> during slot creation, and then calling
-<link linkend="pg-sync-replication-slots">
-<function>pg_sync_replication_slots</function></link>
-on the standby. By setting <link linkend="guc-sync-replication-slots">
+<command>CREATE SUBSCRIPTION</command> during slot creation.
+Additionally, enabling <link linkend="guc-sync-replication-slots">
+<varname>sync_replication_slots</varname></link> on the standby
+is required. By enabling <link linkend="guc-sync-replication-slots">
 <varname>sync_replication_slots</varname></link>
 on the standby, the failover slots can be synchronized periodically in
 the slotsync worker. For the synchronization to work, it is mandatory to
@@ -398,6 +398,52 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
 receiving the WAL up to the latest flushed position on the primary server.
 </para>

+<note>
+<para>
+While enabling <link linkend="guc-sync-replication-slots">
+<varname>sync_replication_slots</varname></link> allows for automatic
+periodic synchronization of failover slots, they can also be manually
+synchronized using the <link linkend="pg-sync-replication-slots">
+<function>pg_sync_replication_slots</function></link> function on the standby.
+However, this function is primarily intended for testing and debugging and
+should be used with caution. Unlike automatic synchronization, it does not
+include cyclic retries, making it more prone to synchronization failures,
+particularly during initial sync scenarios where the required WAL files
+or catalog rows for the slot may have already been removed or are at risk
+of being removed on the standby. In contrast, automatic synchronization
+via <varname>sync_replication_slots</varname> provides continuous slot
+updates, enabling seamless failover and supporting high availability.
+Therefore, it is the recommended method for synchronizing slots.
+</para>
+</note>
+
+<para>
+When slot synchronization is configured as recommended,
+and the initial synchronization is performed either automatically or
+manually via pg_sync_replication_slots, the standby can persist the
+synchronized slot only if the following condition is met: The logical
+replication slot on the primary must retain WALs and system catalog
+rows that are still available on the standby. This ensures data
+integrity and allows logical replication to continue smoothly after
+promotion.
+If the required WALs or catalog rows have already been purged from the
+standby, the slot will not be persisted to avoid data loss. In such
+cases, the following log message may appear:
+<programlisting>
+LOG: could not synchronize replication slot "failover_slot"
+DETAIL: Synchronization could lead to data loss as the remote slot needs WAL at LSN 0/3003F28 and catalog xmin 754, but the standby has LSN 0/3003F28 and catalog xmin 756
+</programlisting>
+If the logical replication slot is actively used by a consumer, no
+manual intervention is needed; the slot will advance automatically,
+and synchronization will resume in the next cycle. However, if no
+consumer is configured, it is advisable to manually advance the slot
+on the primary using <link linkend="pg-logical-slot-get-changes">
+<function>pg_logical_slot_get_changes</function></link> or
+<link linkend="pg-logical-slot-get-binary-changes">
+<function>pg_logical_slot_get_binary_changes</function></link>,
+allowing synchronization to proceed.
+</para>
+
 <para>
 The ability to resume logical replication after failover depends upon the
 <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>

src/backend/replication/logical/slotsync.c

Lines changed: 3 additions & 3 deletions
@@ -211,9 +211,9 @@ update_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
 * impact the users, so we used DEBUG1 level to log the message.
 */
 ereport(slot->data.persistency == RS_TEMPORARY ? LOG : DEBUG1,
-errmsg("could not synchronize replication slot \"%s\" because remote slot precedes local slot",
+errmsg("could not synchronize replication slot \"%s\"",
 remote_slot->name),
-errdetail("The remote slot has LSN %X/%X and catalog xmin %u, but the local slot has LSN %X/%X and catalog xmin %u.",
+errdetail("Synchronization could lead to data loss as the remote slot needs WAL at LSN %X/%X and catalog xmin %u, but the standby has LSN %X/%X and catalog xmin %u.",
 LSN_FORMAT_ARGS(remote_slot->restart_lsn),
 remote_slot->catalog_xmin,
 LSN_FORMAT_ARGS(slot->data.restart_lsn),
@@ -593,7 +593,7 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid)
 {
 ereport(LOG,
 errmsg("could not synchronize replication slot \"%s\"", remote_slot->name),
-errdetail("Logical decoding could not find consistent point from local slot's LSN %X/%X.",
+errdetail("Synchronization could lead to data loss as standby could not build a consistent snapshot to decode WALs at LSN %X/%X.",
 LSN_FORMAT_ARGS(slot->data.restart_lsn)));

 return false;
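
Note on the first slotsync.c hunk: the diff shows only the ereport call, not the condition that guards it. The standalone sketch below illustrates one plausible form of that guard and of the LSN formatting, based on the old message wording ("remote slot precedes local slot") and the example in the new documentation. It is not the PostgreSQL source. The SlotPosition struct and the names xid_precedes, remote_precedes_local and format_lsn are hypothetical, and the XID comparison is simplified: it ignores the special (non-normal) transaction IDs that the real TransactionIdPrecedes handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-ins for the PostgreSQL typedefs; widths match the real ones. */
typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;

/* Hypothetical struct holding the two fields the log message reports. */
typedef struct SlotPosition
{
    XLogRecPtr    restart_lsn;
    TransactionId catalog_xmin;
} SlotPosition;

/*
 * Simplified wraparound-aware XID comparison (assumption: both XIDs are
 * normal). Returns true if a is logically older than b.
 */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Assumed guard: the remote slot needs WAL or catalog rows older than what
 * the standby still has, so persisting the slot could lose data.
 */
bool
remote_precedes_local(const SlotPosition *remote, const SlotPosition *local)
{
    return remote->restart_lsn < local->restart_lsn ||
           xid_precedes(remote->catalog_xmin, local->catalog_xmin);
}

/* Format an LSN as high/low 32-bit halves in hex, as "%X/%X" does. */
int
format_lsn(char *buf, size_t len, XLogRecPtr lsn)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%X/%X",
                    (unsigned int) (lsn >> 32), (unsigned int) lsn);
}
```

With the values from the documentation example (remote LSN 0/3003F28 and catalog xmin 754, standby LSN 0/3003F28 and catalog xmin 756), the LSNs are equal, so under this sketch it is the catalog xmin comparison alone that makes the guard fire.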
