-
Notifications
You must be signed in to change notification settings - Fork 235
Debugging the shutdown sequence of Electric.Connection.Supervisor #2844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This second example shows that the same happens when we remove all iex(1)> sup_name = Electric.Connection.Supervisor.name(stack_id: "single_stack")
{:via, Registry,
{:"Electric.ProcessRegistry:single_stack",
{Electric.Connection.Supervisor, nil}}}
iex(2)> GenServer.whereis sup_name
#PID<0.327.0>
iex(3)> Supervisor.which_children sup_name
[
{Electric.Replication.Supervisor, #PID<0.356.0>, :supervisor,
[Electric.Replication.Supervisor]},
{Electric.Connection.Manager, #PID<0.329.0>, :supervisor,
[Electric.Connection.Manager]},
{Electric.StatusMonitor, #PID<0.328.0>, :worker, [Electric.StatusMonitor]}
]
iex(4)> rep_sup = :erlang.list_to_pid '<0.356.0>'
#PID<0.356.0>
iex(5)> Supervisor.which_children rep_sup
[
{Electric.Replication.SchemaReconciler, #PID<0.370.0>, :worker,
[Electric.Replication.SchemaReconciler]},
{Electric.ShapeCache, #PID<0.368.0>, :worker, [Electric.ShapeCache]},
{Electric.Replication.ShapeLogCollector, #PID<0.359.0>, :worker,
[Electric.Replication.ShapeLogCollector]},
{Electric.Replication.PublicationManager, #PID<0.358.0>, :worker,
[Electric.Replication.PublicationManager]},
{Electric.Shapes.DynamicConsumerSupervisor, #PID<0.357.0>, :supervisor,
[Electric.Shapes.DynamicConsumerSupervisor]}
]
iex(6)> Supervisor.stop sup_name
14:21:29.807 pid=<0.346.0> [notice] Replication client #PID<0.346.0> is stopping after receiving stop request from #PID<0.329.0> with reason :shutdown
:ok
iex(7)> Process.alive? :erlang.list_to_pid('<0.370.0>')
false
iex(8)> Process.alive? :erlang.list_to_pid('<0.359.0>')
false
iex(9)> Process.alive? :erlang.list_to_pid('<0.358.0>')
false
iex(10)> Process.alive? :erlang.list_to_pid('<0.357.0>')
false
iex(11)> Process.alive? :erlang.list_to_pid('<0.328.0>')
false
iex(12)> Process.alive? :erlang.list_to_pid('<0.329.0>')
false
iex(13)> Process.alive? :erlang.list_to_pid('<0.356.0>')
false
iex(14)> Process.alive? :erlang.list_to_pid('<0.327.0>')
false
iex(15)> GenServer.whereis sup_name
nil |
❌ 1 Tests Failed:
View the full list of 1 ❄️ flaky tests
To view more test analytics, go to the Test Analytics Dashboard |
The 3rd example shows what happens when I kill the shape log collector process: iex(1)> sup_name = Electric.Connection.Supervisor.name(stack_id: "single_stack")
{:via, Registry,
{:"Electric.ProcessRegistry:single_stack",
{Electric.Connection.Supervisor, nil}}}
iex(2)> GenServer.whereis sup_name
#PID<0.284.0>
iex(3)> Supervisor.which_children sup_name
[
{Electric.Replication.Supervisor, #PID<0.313.0>, :supervisor,
[Electric.Replication.Supervisor]},
{Electric.Connection.Manager, #PID<0.286.0>, :supervisor,
[Electric.Connection.Manager]},
{Electric.StatusMonitor, #PID<0.285.0>, :worker, [Electric.StatusMonitor]}
]
iex(4)> rep_sup = :erlang.list_to_pid '<0.313.0>'
#PID<0.313.0>
iex(5)> Supervisor.which_children rep_sup
[
{Electric.Replication.SchemaReconciler, #PID<0.327.0>, :worker,
[Electric.Replication.SchemaReconciler]},
{Electric.ShapeCache, #PID<0.325.0>, :worker, [Electric.ShapeCache]},
{Electric.Replication.ShapeLogCollector, #PID<0.316.0>, :worker,
[Electric.Replication.ShapeLogCollector]},
{Electric.Replication.PublicationManager, #PID<0.315.0>, :worker,
[Electric.Replication.PublicationManager]},
{Electric.Shapes.DynamicConsumerSupervisor, #PID<0.314.0>, :supervisor,
[Electric.Shapes.DynamicConsumerSupervisor]}
]
iex(6)> log_collector_pid = :erlang.list_to_pid '<0.316.0>'
#PID<0.316.0>
iex(7)> Process.exit(log_collector_pid, :oops)
true
14:18:07.889 pid=<0.303.0> [notice] Replication client #PID<0.303.0> is stopping after receiving stop request from #PID<0.286.0> with reason :oops
14:18:07.889 pid=<0.330.0> [debug] Starting Elixir.Electric.Shapes.DynamicConsumerSupervisor
14:18:07.893 pid=<0.303.0> [error] :gen_statem {:"Electric.ProcessRegistry:single_stack", {Electric.Postgres.ReplicationClient, nil}} terminating
** (stop) :oops
(stdlib 6.2) gen_statem.erl:3864: :gen_statem.loop_state_callback_result/11
(stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Process Label: {:replication_client, "single_stack"}
Queue: [{{:call, {#PID<0.286.0>, [:alias | #Reference<0.0.36611.3085038375.1201209346.19358>]}}, {:stop, :oops}}]
Postponed: []
State: {:no_state, %Postgrex.ReplicationConnection{protocol: %Postgrex.Protocol{sock: {:ssl, {:sslsocket, {:gen_tcp, #Port<0.12>, :tls_connection, :undefined}, [#PID<0.308.0>, #PID<0.307.0>]}}, connection_id: 18671, connection_key: 1513769203, peer: {{3, 75, 230, 136}, 34180}, types: {Postgrex.DefaultTypes, #Reference<0.3085038375.1201274881.19446>}, null: nil, timeout: 15000, ping_timeout: 15000, parameters: #Reference<0.3085038375.1201143809.19536>, queries: #Reference<0.3085038375.1201274882.18529>, postgres: :idle, transactions: :naive, buffer: :active_once, disconnect_on_error_codes: [], scram: %{auth_message: "n=,r=WxbdAhdUZUBK+qFHZSDjXZcl,r=WxbdAhdUZUBK+qFHZSDjXZclABn4SlwXZ+IWAwa/XdYHGkS6,s=GAEryT0WV/iPcdZgPP7Lkw==,i=4096,c=biws,r=WxbdAhdUZUBK+qFHZSDjXZclABn4SlwXZ+IWAwa/XdYHGkS6", iterations: 4096, salt: <<24, 1, 43, 201, 61, 22, 87, 248, 143, 113, 214, 96, 60, 254, 203, 147>>}, disable_composite_types: false, messages: []}, state: {Electric.Postgres.ReplicationClient, %Electric.Postgres.ReplicationClient.State{stack_id: "single_stack", connection_manager: #PID<0.286.0>, transaction_received: {Electric.Replication.ShapeLogCollector, :store_transaction, [{:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, relation_received: {Electric.Replication.ShapeLogCollector, :handle_relation_msg, [{:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, publication_name: "electric_publication_default", try_creating_publication?: true, recreate_slot?: nil, start_streaming?: false, slot_name: "electric_slot_default", slot_temporary?: false, display_settings: [], origen: "postgres", txn_collector: %Electric.Postgres.ReplicationClient.Collector{transaction: nil, tx_op_index: nil, relations: %{}}, step: :streaming, applied_wal: 2684354560}}, auto_reconnect: false, reconnect_backoff: 500, streaming: 500}}
Callback mode: :handle_event_function, state_enter: false
Client #PID<0.286.0> is alive
(stdlib 6.2) gen.erl:260: :gen.do_call/4
(stdlib 6.2) gen_statem.erl:3250: :gen_statem.call/3
(postgrex 0.20.0) lib/postgrex/replication_connection.ex:290: Postgrex.ReplicationConnection.call/3
(electric 1.0.20) lib/electric/connection/manager.ex:712: Electric.Connection.Manager.handle_info/2
(stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
(stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
(stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
14:18:07.906 pid=<0.286.0> [debug] Connection manager #PID<0.286.0> received EXIT signal from #PID<0.303.0> :: :oops
14:18:07.910 pid=<0.286.0> [error] Electric.DBConnection unknown error: :oops
14:18:07.915 pid=<0.286.0> [warning] Electric.Connection.Manager is restarting after it has encountered an error in process #PID<0.303.0>:
:oops
%Electric.Connection.Manager.State{
[...]
}
14:18:07.927 pid=<0.334.0> [debug] SchemaReconciler started
14:18:07.928 pid=<0.336.0> [debug] Opening lock connection
14:18:07.928 pid=<0.280.0> [warning] Could not load diverged relations: ** (exit) exited in: DBConnection.Holder.checkout({:via, Registry, {:"Electric.ProcessRegistry:single_stack", {Electric.DbPool, nil}}}, [timeout: 5000])
** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
(db_connection 2.7.0) lib/db_connection/holder.ex:97: DBConnection.Holder.checkout/3
(db_connection 2.7.0) lib/db_connection.ex:1326: DBConnection.checkout/3
(db_connection 2.7.0) lib/db_connection.ex:1349: DBConnection.checkout/4
(db_connection 2.7.0) lib/db_connection.ex:1820: DBConnection.begin/3
(db_connection 2.7.0) lib/db_connection.ex:1080: DBConnection.transaction/3
(electric 1.0.20) lib/electric/postgres/inspector/ets_inspector.ex:159: Electric.Postgres.Inspector.EtsInspector.handle_call/3
(stdlib 6.2) gen_server.erl:2381: :gen_server.try_handle_call/4
(stdlib 6.2) gen_server.erl:2410: :gen_server.handle_msg/6
(stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
14:18:08.148 pid=<0.336.0> [info] Acquiring lock from postgres with name electric_slot_default
[...] |
First example shows explicit stopping of the connection supervisor. None of the Logger calls that are triggered by an
EXIT
message in Connection.Manager has executed. We only see an explicit stopping of ReplicationClient when Connection.Manager receives a DOWN message from the shape log collector pid.