Content-Length: 335385 | pFad | http://github.com/electric-sql/electric/pull/2844

EA Debugging the shutdown sequence of Electric.Connection.Supervisor by alco · Pull Request #2844 · electric-sql/electric · GitHub
Skip to content

Debugging the shutdown sequence of Electric.Connection.Supervisor #2844

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

alco
Copy link
Member

@alco alco commented Jun 19, 2025

First example shows explicit stopping of the connection supervisor. None of the Logger calls that are triggered by an EXIT message in Connection.Manager has executed. We only see an explicit stopping of ReplicationClient when Connection.Manager receives a DOWN message from the shape log collector pid.

iex(1)> sup_name = Electric.Connection.Supervisor.name(stack_id: "single_stack")
{:via, Registry,
 {:"Electric.ProcessRegistry:single_stack",
  {Electric.Connection.Supervisor, nil}}}

iex(2)> GenServer.whereis sup_name
#PID<0.332.0>

iex(3)> Supervisor.which_children sup_name
[
  {Electric.Replication.Supervisor, #PID<0.361.0>, :supervisor,
   [Electric.Replication.Supervisor]},
  {Electric.Connection.Manager, #PID<0.334.0>, :supervisor,
   [Electric.Connection.Manager]},
  {Electric.StatusMonitor, #PID<0.333.0>, :worker, [Electric.StatusMonitor]}
]

iex(4)> rep_sup = :erlang.list_to_pid '<0.361.0>'
#PID<0.361.0>

iex(5)> Supervisor.which_children rep_sup
[
  {Electric.Replication.SchemaReconciler, #PID<0.375.0>, :worker,
   [Electric.Replication.SchemaReconciler]},
  {Electric.ShapeCache, #PID<0.373.0>, :worker, [Electric.ShapeCache]},
  {Electric.Replication.ShapeLogCollector, #PID<0.364.0>, :worker,
   [Electric.Replication.ShapeLogCollector]},
  {Electric.Replication.PublicationManager, #PID<0.363.0>, :worker,
   [Electric.Replication.PublicationManager]},
  {Electric.Shapes.DynamicConsumerSupervisor, #PID<0.362.0>, :supervisor,
   [Electric.Shapes.DynamicConsumerSupervisor]}
]

iex(6)> Supervisor.stop sup_name

14:16:18.413 pid=<0.351.0> [notice] Replication client #PID<0.351.0> is stopping after receiving stop request from #PID<0.334.0> with reason :shutdown
:ok

14:16:18.417 pid=<0.45.0> [notice] Application electric exited: shutdown
iex(7)> 

@alco
Copy link
Member Author

alco commented Jun 19, 2025

This second example shows that the same happens when we remove all auto_shutdown options (2nd commit in this PR). It's the same behaviour: no EXIT signal is logged even though we can see that all relevant processes are no longer alive and the process graph from Observer confirms that:

iex(1)> sup_name = Electric.Connection.Supervisor.name(stack_id: "single_stack")
{:via, Registry,
 {:"Electric.ProcessRegistry:single_stack",
  {Electric.Connection.Supervisor, nil}}}

iex(2)> GenServer.whereis sup_name
#PID<0.327.0>

iex(3)> Supervisor.which_children sup_name
[
  {Electric.Replication.Supervisor, #PID<0.356.0>, :supervisor,
   [Electric.Replication.Supervisor]},
  {Electric.Connection.Manager, #PID<0.329.0>, :supervisor,
   [Electric.Connection.Manager]},
  {Electric.StatusMonitor, #PID<0.328.0>, :worker, [Electric.StatusMonitor]}
]

iex(4)> rep_sup = :erlang.list_to_pid '<0.356.0>'
#PID<0.356.0>

iex(5)> Supervisor.which_children rep_sup
[
  {Electric.Replication.SchemaReconciler, #PID<0.370.0>, :worker,
   [Electric.Replication.SchemaReconciler]},
  {Electric.ShapeCache, #PID<0.368.0>, :worker, [Electric.ShapeCache]},
  {Electric.Replication.ShapeLogCollector, #PID<0.359.0>, :worker,
   [Electric.Replication.ShapeLogCollector]},
  {Electric.Replication.PublicationManager, #PID<0.358.0>, :worker,
   [Electric.Replication.PublicationManager]},
  {Electric.Shapes.DynamicConsumerSupervisor, #PID<0.357.0>, :supervisor,
   [Electric.Shapes.DynamicConsumerSupervisor]}
]

iex(6)> Supervisor.stop sup_name

14:21:29.807 pid=<0.346.0> [notice] Replication client #PID<0.346.0> is stopping after receiving stop request from #PID<0.329.0> with reason :shutdown

:ok

iex(7)> Process.alive? :erlang.list_to_pid('<0.370.0>')
false

iex(8)> Process.alive? :erlang.list_to_pid('<0.359.0>')
false

iex(9)> Process.alive? :erlang.list_to_pid('<0.358.0>')
false

iex(10)> Process.alive? :erlang.list_to_pid('<0.357.0>')
false

iex(11)> Process.alive? :erlang.list_to_pid('<0.328.0>')
false

iex(12)> Process.alive? :erlang.list_to_pid('<0.329.0>')
false

iex(13)> Process.alive? :erlang.list_to_pid('<0.356.0>')
false

iex(14)> Process.alive? :erlang.list_to_pid('<0.327.0>')
false

iex(15)> GenServer.whereis sup_name
nil

Screenshot from 2025-06-19 14-23-34

Copy link

codecov bot commented Jun 19, 2025

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1746 1 1745 1487
View the full list of 1 ❄️ flaky tests
test/integration.test.ts > HTTP Sync > should get initial data and then receive updates

Flake rate in main: 13.51% (Passed 32 times, Failed 5 times)

Stack Traces | 5.01s run time
Error: Test timed out in 5000ms.
If this is a long-running test, pass a timeout value as the last argument or configure it globally with "testTimeout".

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

@alco
Copy link
Member Author

alco commented Jun 19, 2025

The 3rd example shows what happens when I kill the shape log collector process:

iex(1)> sup_name = Electric.Connection.Supervisor.name(stack_id: "single_stack")
{:via, Registry,
 {:"Electric.ProcessRegistry:single_stack",
  {Electric.Connection.Supervisor, nil}}}

iex(2)> GenServer.whereis sup_name
#PID<0.284.0>

iex(3)> Supervisor.which_children sup_name
[
  {Electric.Replication.Supervisor, #PID<0.313.0>, :supervisor,
   [Electric.Replication.Supervisor]},
  {Electric.Connection.Manager, #PID<0.286.0>, :supervisor,
   [Electric.Connection.Manager]},
  {Electric.StatusMonitor, #PID<0.285.0>, :worker, [Electric.StatusMonitor]}
]

iex(4)> rep_sup = :erlang.list_to_pid '<0.313.0>'
#PID<0.313.0>

iex(5)> Supervisor.which_children rep_sup
[
  {Electric.Replication.SchemaReconciler, #PID<0.327.0>, :worker,
   [Electric.Replication.SchemaReconciler]},
  {Electric.ShapeCache, #PID<0.325.0>, :worker, [Electric.ShapeCache]},
  {Electric.Replication.ShapeLogCollector, #PID<0.316.0>, :worker,
   [Electric.Replication.ShapeLogCollector]},
  {Electric.Replication.PublicationManager, #PID<0.315.0>, :worker,
   [Electric.Replication.PublicationManager]},
  {Electric.Shapes.DynamicConsumerSupervisor, #PID<0.314.0>, :supervisor,
   [Electric.Shapes.DynamicConsumerSupervisor]}
]

iex(6)> log_collector_pid = :erlang.list_to_pid '<0.316.0>'
#PID<0.316.0>

iex(7)> Process.exit(log_collector_pid, :oops)
true

14:18:07.889 pid=<0.303.0> [notice] Replication client #PID<0.303.0> is stopping after receiving stop request from #PID<0.286.0> with reason :oops

14:18:07.889 pid=<0.330.0> [debug] Starting Elixir.Electric.Shapes.DynamicConsumerSupervisor

14:18:07.893 pid=<0.303.0> [error] :gen_statem {:"Electric.ProcessRegistry:single_stack", {Electric.Postgres.ReplicationClient, nil}} terminating
** (stop) :oops
    (stdlib 6.2) gen_statem.erl:3864: :gen_statem.loop_state_callback_result/11
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Process Label: {:replication_client, "single_stack"}
Queue: [{{:call, {#PID<0.286.0>, [:alias | #Reference<0.0.36611.3085038375.1201209346.19358>]}}, {:stop, :oops}}]
Postponed: []
State: {:no_state, %Postgrex.ReplicationConnection{protocol: %Postgrex.Protocol{sock: {:ssl, {:sslsocket, {:gen_tcp, #Port<0.12>, :tls_connection, :undefined}, [#PID<0.308.0>, #PID<0.307.0>]}}, connection_id: 18671, connection_key: 1513769203, peer: {{3, 75, 230, 136}, 34180}, types: {Postgrex.DefaultTypes, #Reference<0.3085038375.1201274881.19446>}, null: nil, timeout: 15000, ping_timeout: 15000, parameters: #Reference<0.3085038375.1201143809.19536>, queries: #Reference<0.3085038375.1201274882.18529>, postgres: :idle, transactions: :naive, buffer: :active_once, disconnect_on_error_codes: [], scram: %{auth_message: "n=,r=WxbdAhdUZUBK+qFHZSDjXZcl,r=WxbdAhdUZUBK+qFHZSDjXZclABn4SlwXZ+IWAwa/XdYHGkS6,s=GAEryT0WV/iPcdZgPP7Lkw==,i=4096,c=biws,r=WxbdAhdUZUBK+qFHZSDjXZclABn4SlwXZ+IWAwa/XdYHGkS6", iterations: 4096, salt: <<24, 1, 43, 201, 61, 22, 87, 248, 143, 113, 214, 96, 60, 254, 203, 147>>}, disable_composite_types: false, messages: []}, state: {Electric.Postgres.ReplicationClient, %Electric.Postgres.ReplicationClient.State{stack_id: "single_stack", connection_manager: #PID<0.286.0>, transaction_received: {Electric.Replication.ShapeLogCollector, :store_transaction, [{:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, relation_received: {Electric.Replication.ShapeLogCollector, :handle_relation_msg, [{:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, publication_name: "electric_publication_default", try_creating_publication?: true, recreate_slot?: nil, start_streaming?: false, slot_name: "electric_slot_default", slot_temporary?: false, display_settings: [], origen: "postgres", txn_collector: %Electric.Postgres.ReplicationClient.Collector{transaction: nil, tx_op_index: nil, relations: %{}}, step: :streaming, applied_wal: 2684354560}}, auto_reconnect: false, reconnect_backoff: 500, streaming: 500}}
Callback mode: :handle_event_function, state_enter: false
Client #PID<0.286.0> is alive

    (stdlib 6.2) gen.erl:260: :gen.do_call/4
    (stdlib 6.2) gen_statem.erl:3250: :gen_statem.call/3
    (postgrex 0.20.0) lib/postgrex/replication_connection.ex:290: Postgrex.ReplicationConnection.call/3
    (electric 1.0.20) lib/electric/connection/manager.ex:712: Electric.Connection.Manager.handle_info/2
    (stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3

14:18:07.906 pid=<0.286.0> [debug] Connection manager #PID<0.286.0> received EXIT signal from #PID<0.303.0> :: :oops

14:18:07.910 pid=<0.286.0> [error] Electric.DBConnection unknown error: :oops

14:18:07.915 pid=<0.286.0> [warning] Electric.Connection.Manager is restarting after it has encountered an error in process #PID<0.303.0>:
:oops

%Electric.Connection.Manager.State{
  [...]
}

14:18:07.927 pid=<0.334.0> [debug] SchemaReconciler started

14:18:07.928 pid=<0.336.0> [debug] Opening lock connection

14:18:07.928 pid=<0.280.0> [warning] Could not load diverged relations: ** (exit) exited in: DBConnection.Holder.checkout({:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.DbPool, nil}}}, [timeout: 5000])
    ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
    (db_connection 2.7.0) lib/db_connection/holder.ex:97: DBConnection.Holder.checkout/3
    (db_connection 2.7.0) lib/db_connection.ex:1326: DBConnection.checkout/3
    (db_connection 2.7.0) lib/db_connection.ex:1349: DBConnection.checkout/4
    (db_connection 2.7.0) lib/db_connection.ex:1820: DBConnection.begin/3
    (db_connection 2.7.0) lib/db_connection.ex:1080: DBConnection.transaction/3
    (electric 1.0.20) lib/electric/postgres/inspector/ets_inspector.ex:159: Electric.Postgres.Inspector.EtsInspector.handle_call/3
    (stdlib 6.2) gen_server.erl:2381: :gen_server.try_handle_call/4
    (stdlib 6.2) gen_server.erl:2410: :gen_server.handle_msg/6
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3


14:18:08.148 pid=<0.336.0> [info] Acquiring lock from postgres with name electric_slot_default

[...]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant








ApplySandwichStrip

pFad - (p)hone/(F)rame/(a)nonymizer/(d)eclutterfier!      Saves Data!


--- a PPN by Garber Painting Akron. With Image Size Reduction included!

Fetched URL: http://github.com/electric-sql/electric/pull/2844

Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy