** (RuntimeError) Timed out while waiting for a table lock #2595

Closed
@alco

Versions
Latest main.

Bug description
Start with a clean slate (no persistent/ dir and new empty DB).

Run Electric in dev mode (iex -S mix).

Create a table in the database called items:

$ psql postgresql://postgres:password@localhost:54321/electric

[localhost] postgres:electric=# create table items(val text);
CREATE TABLE

Attempt no. 1

Try inserting 100 million rows into the items table:

[localhost] postgres:electric=# insert into items select generate_series::text from generate_series(1, 100000000);

While the query is still running, try requesting a shape from Electric:

$ curl -i 'http://localhost:3000/v1/shape?table=items&offset=-1'
HTTP/1.1 500 Internal Server Error
date: Mon, 14 Apr 2025 16:00:10 GMT
content-length: 102
vary: accept-encoding
cache-control: no-cache
x-request-id: GDY6qbcQUwwFCRoAAAAD
electric-server: ElectricSQL/1.0.5
access-control-allow-origin: *
access-control-expose-headers: electric-cursor,electric-handle,electric-offset,electric-schema,electric-up-to-date
access-control-allow-methods: GET, HEAD, DELETE, OPTIONS
content-type: application/json; charset=utf-8
electric-schema: {"val":{"type":"text"}}

{"message":"Unable to retrieve shape log: ** (RuntimeError) Timed out while waiting for a table lock"}

Electric's log output:
18:00:06.271 pid=<0.792.0> request_id=GDY6qbcQUwwFCRoAAAAD [info] GET /v1/shape
18:00:06.308 pid=<0.792.0> request_id=GDY6qbcQUwwFCRoAAAAD [info] Query String: table=items&offset=-1
18:00:06.345 pid=<0.792.0> request_id=GDY6qbcQUwwFCRoAAAAD [debug] Table {"public", "items"} found with 1 columns
18:00:06.349 pid=<0.790.0> [debug] Starting consumer for 74391704-1744646406349
18:00:06.383 pid=<0.788.0> [debug] 1 consumers of replication stream
18:00:06.387 pid=<0.790.0> [debug] Returning shape id 74391704-1744646406349 for shape Shape.new!("public.items" [OID 16390])
18:00:06.394 pid=<0.798.0> shape_handle=74391704-1744646406349 [debug] Starting a wait on the snapshot 74391704-1744646406349 for {#PID<0.792.0>, [:alias | #Reference<0.0.101379.1137765335.1537802241.148014>]}}
18:00:06.398 pid=<0.787.0> [info] Altering identity of public.items to FULL
18:00:10.128 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335958078128128 (86C92618/80800000) reply=0
18:00:11.388 pid=<0.798.0> shape_handle=74391704-1744646406349 [error] Snapshot creation failed for 74391704-1744646406349 because of:
** (RuntimeError) Timed out while waiting for a table lock
    (elixir 1.18.1) lib/gen_server.ex:1128: GenServer.call/3
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:101: Electric.Replication.PublicationManager.add_shape/2
    (opentelemetry_api 1.4.0) src/otel_tracer_noop.erl:59: :otel_tracer_noop.with_span/5
    (electric 1.0.6) lib/electric/telemetry/open_telemetry.ex:87: anonymous fn/3 in Electric.Telemetry.OpenTelemetry.do_with_span/4
    (telemetry 1.3.0) /home/alco/code/electric-sql/electric/packages/sync-service/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (electric 1.0.6) lib/electric/shapes/consumer/snapshotter.ex:64: anonymous fn/10 in Electric.Shapes.Consumer.Snapshotter.handle_continue/2
    (opentelemetry_api 1.4.0) src/otel_tracer_noop.erl:59: :otel_tracer_noop.with_span/5
    (electric 1.0.6) lib/electric/telemetry/open_telemetry.ex:87: anonymous fn/3 in Electric.Telemetry.OpenTelemetry.do_with_span/4
    (telemetry 1.3.0) /home/alco/code/electric-sql/electric/packages/sync-service/deps/telemetry/src/telemetry.erl:324: :telemetry.span/3
    (electric 1.0.6) lib/electric/shapes/consumer/snapshotter.ex:58: Electric.Shapes.Consumer.Snapshotter.handle_continue/2
    (stdlib 6.2) gen_server.erl:2335: :gen_server.try_handle_continue/3
    (stdlib 6.2) gen_server.erl:2244: :gen_server.loop/7
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
18:00:11.401 pid=<0.792.0> request_id=GDY6qbcQUwwFCRoAAAAD [info] Sent 500 in 5130ms
18:00:14.659 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335958290266496 (86C92618/8D24F980) reply=1
18:00:15.972 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335958346563568 (86C92618/907FFFF0) reply=0
18:00:21.387 pid=<0.776.0> [error] Postgrex.Protocol (#PID<0.776.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.787.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.787.0> was at location:
    :prim_inet.recv0/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:3298: Postgrex.Protocol.msg_recv/4
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2292: Postgrex.Protocol.recv_bind/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2147: Postgrex.Protocol.bind_execute_close/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:772: DBConnection.parsed_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:764: DBConnection.prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:316: Postgrex.query_prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:328: Postgrex.query!/4
    (electric 1.0.6) lib/electric/postgres/configuration.ex:155: anonymous fn/3 in Electric.Postgres.Configuration.set_replica_identity!/2
    (elixir 1.18.1) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
    (electric 1.0.6) lib/electric/postgres/configuration.ex:144: Electric.Postgres.Configuration.set_replica_identity!/2
    (electric 1.0.6) lib/electric/postgres/configuration.ex:138: Electric.Postgres.Configuration.configure_tables_for_replication_internal!/4
    (db_connection 2.7.0) lib/db_connection.ex:1756: DBConnection.run_transaction/4
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:297: Electric.Replication.PublicationManager.update_publication/1
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:218: Electric.Replication.PublicationManager.handle_info/2
    (stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
18:00:21.403 pid=<0.787.0> [warning] Failed to configure publication, retrying: %DBConnection.ConnectionError{message: "tcp recv: closed (the connection was closed by the pool, possibly due to a timeout or because the pool has been terminated)", severity: :error, reason: :error}
18:00:21.395 pid=<0.798.0> shape_handle=74391704-1744646406349 [error] GenServer {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Shapes.Consumer, "74391704-1744646406349"}} terminating
** (stop) exited in: GenServer.call({:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.PublicationManager, nil}}}, {:remove_shape, Shape.new!("public.items" [OID 16390])}, 5000)
    ** (EXIT) time out
    (elixir 1.18.1) lib/gen_server.ex:1128: GenServer.call/3
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:117: Electric.Replication.PublicationManager.remove_shape/2
    (electric 1.0.6) lib/electric/shapes/consumer.ex:465: Electric.Shapes.Consumer.cleanup/1
    (electric 1.0.6) lib/electric/shapes/consumer.ex:188: Electric.Shapes.Consumer.terminate/2
    (stdlib 6.2) gen_server.erl:2393: :gen_server.try_terminate/3
    (stdlib 6.2) gen_server.erl:2594: :gen_server.terminate/10
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Process Label: {:consumer, "74391704-1744646406349"}
Last message: {:"$gen_cast", {:snapshot_failed, "74391704-1744646406349", %RuntimeError{message: "Timed out while waiting for a table lock"}, [{GenServer, :call, 3, [file: ~c"lib/gen_server.ex", line: 1128]}, {Electric.Replication.PublicationManager, :add_shape, 2, [file: ~c"lib/electric/replication/publication_manager.ex", line: 101]}, {:otel_tracer_noop, :with_span, 5, [file: ~c"src/otel_tracer_noop.erl", line: 59]}, {Electric.Telemetry.OpenTelemetry, :"-do_with_span/4-fun-1-", 3, [file: ~c"lib/electric/telemetry/open_telemetry.ex", line: 87]}, {:telemetry, :span, 3, [file: ~c"/home/alco/code/electric-sql/electric/packages/sync-service/deps/telemetry/src/telemetry.erl", line: 324]}, {Electric.Shapes.Consumer.Snapshotter, :"-handle_continue/2-fun-1-", 10, [file: ~c"lib/electric/shapes/consumer/snapshotter.ex", line: 64]}, {:otel_tracer_noop, :with_span, 5, [file: ~c"src/otel_tracer_noop.erl", line: 59]}, {Electric.Telemetry.OpenTelemetry, :"-do_with_span/4-fun-1-", 3, [file: ~c"lib/electric/telemetry/open_telemetry.ex", line: 87]}, {:telemetry, :span, 3, [file: ~c"/home/alco/code/electric-sql/electric/packages/sync-service/deps/telemetry/src/telemetry.erl", line: 324]}, {Electric.Shapes.Consumer.Snapshotter, :handle_continue, 2, [file: ~c"lib/electric/shapes/consumer/snapshotter.ex", line: 58]}, {:gen_server, :try_handle_continue, 3, [file: ~c"gen_server.erl", line: 2335]}, {:gen_server, :loop, 7, [file: ~c"gen_server.erl", line: 2244]}, {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}]}}
State: %{monitors: [], buffer: [], registry: :"Elixir.Registry.ShapeChanges:single_stack", otel_ctx: %{:"$__otel_baggage_ctx_key" => %{"stack_id" => {"single_stack", []}}, {:otel_tracer, :span_ctx} => {:span_ctx, 0, 0, 0, {:tracestate, []}, false, false, false, :undefined}}, inspector: {Electric.Postgres.Inspector.EtsInspector, [stack_id: "single_stack", server: {:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Postgres.Inspector.EtsInspector, nil}}}]}, stack_id: "single_stack", storage: {Electric.ShapeCache.FileStorage, %Electric.ShapeCache.FileStorage{base_path: "./persistent/shapes/single_stack", shape_handle: "74391704-1744646406349", db: {:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.ShapeCache.FileStorage, "74391704-1744646406349"}}}, data_dir: "./persistent/shapes/single_stack/74391704-1744646406349", cubdb_dir: "./persistent/shapes/single_stack/74391704-1744646406349/cubdb", snapshot_dir: "./persistent/shapes/single_stack/74391704-1744646406349/snapshots", log_dir: "./persistent/shapes/single_stack/74391704-1744646406349/log", stack_id: "single_stack", extra_opts: %{}, chunk_bytes_threshold: 25000000, version: 3}}, shape: Shape.new!("public.items" [OID 16390]), log_state: %{current_chunk_byte_size: 0, current_txn_bytes: 0}, shape_handle: "74391704-1744646406349", log_producer: {:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}, shape_status: {Electric.ShapeCache.ShapeStatus, %Electric.ShapeCache.ShapeStatus{root: "./shape_cache", shape_meta_table: :"single_stack:shape_meta_table", storage: {Electric.ShapeCache.FileStorage, %{stack_id: "single_stack", chunk_bytes_threshold: 25000000, base_path: "./persistent/shapes/single_stack"}}}}, latest_offset: LogOffset.last_before_real_offsets(), pg_snapshot: nil, snapshot_started: false, awaiting_snapshot_start: [{#PID<0.792.0>, [:alias | #Reference<0.0.101379.1137765335.1537802241.148014>]}], 
chunk_bytes_threshold: 25000000, publication_manager: {Electric.Replication.PublicationManager, [stack_id: "single_stack"]}, db_pool: {:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.DbPool, nil}}}, run_with_conn_fn: &Electric.Shapes.Consumer.Snapshotter.run_with_conn/2, create_snapshot_fn: &Electric.Shapes.Consumer.Snapshotter.query_in_readonly_txn/7}
18:00:21.407 pid=<0.788.0> [debug] 0 consumers of replication stream
18:00:21.718 pid=<0.787.0> [info] Altering identity of public.items to FULL
18:00:22.950 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335958656942072 (86C92618/A2FFFFF8) reply=0
18:00:31.288 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335959026040832 (86C92618/B9000000) reply=0
18:00:36.707 pid=<0.772.0> [error] Postgrex.Protocol (#PID<0.772.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.787.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.787.0> was at location:
    :prim_inet.recv0/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:3298: Postgrex.Protocol.msg_recv/4
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2292: Postgrex.Protocol.recv_bind/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2147: Postgrex.Protocol.bind_execute_close/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:772: DBConnection.parsed_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:764: DBConnection.prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:316: Postgrex.query_prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:328: Postgrex.query!/4
    (electric 1.0.6) lib/electric/postgres/configuration.ex:155: anonymous fn/3 in Electric.Postgres.Configuration.set_replica_identity!/2
    (elixir 1.18.1) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
    (electric 1.0.6) lib/electric/postgres/configuration.ex:144: Electric.Postgres.Configuration.set_replica_identity!/2
    (electric 1.0.6) lib/electric/postgres/configuration.ex:138: Electric.Postgres.Configuration.configure_tables_for_replication_internal!/4
    (db_connection 2.7.0) lib/db_connection.ex:1756: DBConnection.run_transaction/4
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:297: Electric.Replication.PublicationManager.update_publication/1
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:218: Electric.Replication.PublicationManager.handle_info/2
    (stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
18:00:36.710 pid=<0.787.0> [warning] Failed to configure publication, retrying: %Postgrex.Error{message: nil, postgres: %{code: :query_canceled, line: "3425", message: "canceling statement due to user request", file: "postgres.c", unknown: "ERROR", severity: "ERROR", pg_code: "57014", routine: "ProcessInterrupts"}, connection_id: 92, query: nil}
18:00:37.026 pid=<0.787.0> [info] Altering identity of public.items to FULL
18:00:41.243 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335959474831336 (86C92618/D3BFFFE8) reply=0
18:00:44.660 pid=<0.760.0> [debug] Primary Keepalive: wal_end=9712335959621560376 (86C92618/DC7EE838) reply=1
18:00:52.014 pid=<0.776.0> [error] Postgrex.Protocol (#PID<0.776.0>) disconnected: ** (DBConnection.ConnectionError) client #PID<0.787.0> timed out because it queued and checked out the connection for longer than 15000ms
#PID<0.787.0> was at location:
    :prim_inet.recv0/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:3298: Postgrex.Protocol.msg_recv/4
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2292: Postgrex.Protocol.recv_bind/3
    (postgrex 0.19.0) lib/postgrex/protocol.ex:2147: Postgrex.Protocol.bind_execute_close/4
    (db_connection 2.7.0) lib/db_connection/holder.ex:354: DBConnection.Holder.holder_apply/4
    (db_connection 2.7.0) lib/db_connection.ex:1558: DBConnection.run_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:772: DBConnection.parsed_prepare_execute/5
    (db_connection 2.7.0) lib/db_connection.ex:764: DBConnection.prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:316: Postgrex.query_prepare_execute/4
    (postgrex 0.19.0) lib/postgrex.ex:328: Postgrex.query!/4
    (electric 1.0.6) lib/electric/postgres/configuration.ex:155: anonymous fn/3 in Electric.Postgres.Configuration.set_replica_identity!/2
    (elixir 1.18.1) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
    (electric 1.0.6) lib/electric/postgres/configuration.ex:144: Electric.Postgres.Configuration.set_replica_identity!/2
    (electric 1.0.6) lib/electric/postgres/configuration.ex:138: Electric.Postgres.Configuration.configure_tables_for_replication_internal!/4
    (db_connection 2.7.0) lib/db_connection.ex:1756: DBConnection.run_transaction/4
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:297: Electric.Replication.PublicationManager.update_publication/1
    (electric 1.0.6) lib/electric/replication/publication_manager.ex:218: Electric.Replication.PublicationManager.handle_info/2
    (stdlib 6.2) gen_server.erl:2345: :gen_server.try_handle_info/3
    (stdlib 6.2) gen_server.erl:2433: :gen_server.handle_msg/6
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
18:00:52.018 pid=<0.787.0> [warning] Failed to configure publication, retrying: %Postgrex.Error{message: nil, postgres: %{code: :query_canceled, line: "3425", message: "canceling statement due to user request", file: "postgres.c", unknown: "ERROR", severity: "ERROR", pg_code: "57014", routine: "ProcessInterrupts"}, connection_id: 115, query: nil}
18:00:52.333 pid=<0.787.0> [info] Altering identity of public.items to FULL

Electric keeps retrying the replica identity change on public.items until it succeeds.
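
To see what the replica identity change is waiting on while the INSERT is still running, one option is to ask Postgres which sessions are blocked and by whom. The sketch below is not part of Electric: `blocked_sessions_sql` is a hypothetical helper name, and the connection string in the usage comment is the one used earlier in this report. It relies only on the standard `pg_stat_activity` view and the `pg_blocking_pids()` function.

```shell
# Hypothetical helper: print a query that lists blocked sessions
# together with the PIDs of the sessions holding the conflicting locks.
blocked_sessions_sql() {
  cat <<'SQL'
SELECT a.pid,
       a.wait_event_type,
       a.wait_event,
       pg_blocking_pids(a.pid) AS blocked_by,
       left(a.query, 80)       AS query
FROM pg_stat_activity AS a
WHERE cardinality(pg_blocking_pids(a.pid)) > 0;
SQL
}

# Usage, in a second terminal while the INSERT is running:
#   blocked_sessions_sql | psql postgresql://postgres:password@localhost:54321/electric
```

If the table lock is indeed the cause, the session altering the replica identity would be expected to appear here with the PID of the backend running the INSERT in `blocked_by`.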

Oddly enough, psql at some point fails with an error and the transaction rolls back:

PANIC:  could not write to file "pg_wal/xlogtemp.109": No space left on device
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
[] :!>? \q

$ psql postgresql://postgres:password@localhost:54321/electric

[localhost] postgres:electric=# select count(1) from items;
 count 
───────
     0
(1 row)

The device in question is probably related to some limit in Docker because the memory usage of neither psql nor Postgres grows noticeably and I have plenty of disk space on my machine:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0       953G   97G  849G  11% /
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs            16G   31M   16G   1% /dev/shm
efivarfs        566K  289K  273K  52% /sys/firmware/efi/efivars
tmpfs           6.3G  2.9M  6.3G   1% /run
/dev/dm-0       953G   97G  849G  11% /home
/dev/nvme1n1p2  974M  380M  527M  42% /boot
tmpfs            16G   84M   16G   1% /tmp
/dev/nvme1n1p1  599M   20M  580M   4% /boot/efi
tmpfs           3.2G   23M  3.1G   1% /run/user/1000
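
Since `df -h` on the host only reports the host's filesystems, it may be more telling to run the same check inside the Postgres container. A minimal sketch, assuming Postgres runs in a Docker container whose name is passed in and that its data directory is `/var/lib/postgresql/data` (the default in the official `postgres` image for this version, but an assumption about this setup). `pg_disk_report` is a hypothetical helper name.

```shell
# Hypothetical helper: show disk usage as seen from inside the Postgres container.
#   $1: container name (required)
#   $2: data directory (assumed default: /var/lib/postgresql/data)
pg_disk_report() {
  container="$1"
  datadir="${2:-/var/lib/postgresql/data}"

  # Free space on the filesystem backing the data directory.
  docker exec "$container" df -h "$datadir"

  # Size of the WAL directory and of the replication slot directory,
  # which is where logical decoding writes its spill files.
  docker exec "$container" du -sh "$datadir/pg_wal" "$datadir/pg_replslot"
}

# Usage (the container name is a placeholder):
#   pg_disk_report my_postgres_container
```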

Attempt no. 2

Leaving Electric running, I try to insert 100 million rows once again. At this point Electric has one active shape for the items table.

[localhost] postgres:electric=# insert into items select generate_series::text from generate_series(1, 100000000);
ERROR:  could not extend file "base/16384/16390.3": No space left on device
HINT:  Check free disk space.

Electric's log output:
18:09:59.511 pid=<0.697.0> [info] Starting replication from postgres
18:09:59.519 pid=<0.697.0> [debug] Primary Keepalive: wal_end=9712335959621560377 (86C92618/DC7EE839) reply=0
18:09:59.520 pid=<0.733.0> shape_handle=74391704-1744646406349 [debug] Snapshot known for shape_handle: 74391704-1744646406349 xmin: 759, xmax: 759, xip_list: 
18:10:03.973 pid=<0.733.0> shape_handle=74391704-1744646406349 [debug] Snapshot started shape_handle: 74391704-1744646406349
18:10:03.974 pid=<0.735.0> shape_handle=74391704-1744646406349 [debug] Opening snapshot chunk 0 for writing
18:10:29.514 pid=<0.697.0> [debug] Primary Keepalive: wal_end=9712335958076722936 (86C92618/806A8EF8) reply=1
18:10:59.514 pid=<0.697.0> [debug] Primary Keepalive: wal_end=9712335959381264616 (86C92618/CE2C48E8) reply=1
18:11:29.628 pid=<0.697.0> [debug] Primary Keepalive: wal_end=9712335960680346648 (86C92619/1B9AB418) reply=1
18:11:30.231 pid=<0.697.0> [error] :gen_statem {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Postgres.ReplicationClient, nil}} terminating
** (Postgrex.Error) ERROR 53100 (disk_full) could not write to data file for XID 753: No space left on device
    (stdlib 6.2) gen_statem.erl:3864: :gen_statem.loop_state_callback_result/11
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Process Label: :replication_client
Queue: [info: {:tcp, #Port<0.13>, <<69, 0, 0, 0, 146, 83, 69, 82, 82, 79, 82, 0, 86, 69, 82, 82, 79, 82, 0, 67, 53, 51, 49, 48, 48, 0, 77, 99, 111, 117, 108, 100, 32, 110, 111, 116, 32, 119, 114, 105, 116, 101, 32, 116, 111, 32, ...>>}]
Postponed: []
State: {:no_state, %Postgrex.ReplicationConnection{protocol: %Postgrex.Protocol{sock: {:gen_tcp, #Port<0.13>}, connection_id: 131, connection_key: -1960164997, peer: {{127, 0, 0, 1}, 54321}, types: {Postgrex.DefaultTypes, #Reference<0.3807092993.2077097985.188528>}, null: nil, timeout: 15000, ping_timeout: 15000, parameters: #Reference<0.3807092993.2076966913.188638>, queries: #Reference<0.3807092993.2077097985.188635>, postgres: :idle, transactions: :naive, buffer: "", disconnect_on_error_codes: [], scram: %{auth_message: "n=,r=plpCbHjyaZg0VM2Lenf+4vVd,r=plpCbHjyaZg0VM2Lenf+4vVd/hpi3iyEMnaQL4VEmIa6m0Et,s=gdcurhnueiDTw3IpnKHduA==,i=4096,c=biws,r=plpCbHjyaZg0VM2Lenf+4vVd/hpi3iyEMnaQL4VEmIa6m0Et", iterations: 4096, salt: <<129, 215, 46, 174, 25, 238, 122, 32, 211, 195, 114, 41, 156, 161, 221, 184>>}, disable_composite_types: false, messages: []}, state: {Electric.Postgres.ReplicationClient, %Electric.Postgres.ReplicationClient.State{stack_id: "single_stack", connection_manager: #PID<0.386.0>, transaction_received: {Electric.Replication.ShapeLogCollector, :store_transaction, [{:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, relation_received: {Electric.Replication.ShapeLogCollector, :handle_relation_msg, [{:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, publication_name: "electric_publication_default", try_creating_publication?: true, start_streaming?: false, slot_name: "electric_slot_default", slot_temporary?: false, display_settings: [], origin: "postgres", txn_collector: %Electric.Postgres.ReplicationClient.Collector{transaction: nil, tx_op_index: nil, relations: %{}}, step: :streaming, applied_wal: 9712335960680346648}}, auto_reconnect: false, reconnect_backoff: 500, streaming: 500}}
Callback mode: :handle_event_function, state_enter: false
18:11:30.259 pid=<0.386.0> [debug] Handling the exit of the replication client #PID<0.697.0> with reason %Postgrex.Error{message: nil, postgres: %{code: :disk_full, line: "3997", message: "could not write to data file for XID 753: No space left on device", file: "reorderbuffer.c", unknown: "ERROR", severity: "ERROR", pg_code: "53100", routine: "ReorderBufferSerializeChange"}, connection_id: 131, query: nil}
18:11:30.259 pid=<0.386.0> [warning] Reconnecting in 2000ms
18:11:32.260 pid=<0.386.0> [debug] Starting replication client for stack single_stack
18:11:32.273 pid=<0.736.0> [debug] ReplicationClient step: pg_info_query
18:11:32.273 pid=<0.386.0> [info] Reconnection succeeded after 2014ms
18:11:32.275 pid=<0.736.0> [info] Postgres server version = 170001, system identifier = 7493198447926501411, timeline_id = 1
18:11:32.275 pid=<0.736.0> [debug] ReplicationClient step: create_publication_query
18:11:32.276 pid=<0.736.0> [debug] ReplicationClient step: create_slot
18:11:32.277 pid=<0.736.0> [debug] Found existing replication slot
18:11:32.277 pid=<0.736.0> [debug] ReplicationClient step: set_display_setting
18:11:32.277 pid=<0.736.0> [debug] ReplicationClient step: set_display_setting
18:11:32.278 pid=<0.736.0> [debug] ReplicationClient step: set_display_setting
18:11:32.291 pid=<0.736.0> [debug] ReplicationClient step: set_display_setting
18:11:32.292 pid=<0.736.0> [debug] ReplicationClient step: set_display_setting
18:11:32.293 pid=<0.736.0> [debug] ReplicationClient step: start_streaming
18:11:32.293 pid=<0.736.0> [debug] ReplicationClient step: start_replication_slot
18:11:32.293 pid=<0.736.0> [info] Starting replication from postgres
18:11:33.325 pid=<0.736.0> [debug] Primary Keepalive: wal_end=9712335960680346649 (86C92619/1B9AB419) reply=0
18:12:03.324 pid=<0.736.0> [debug] Primary Keepalive: wal_end=9712335958046330696 (86C92618/7E9ACF48) reply=1
18:12:33.325 pid=<0.736.0> [debug] Primary Keepalive: wal_end=9712335959345128320 (86C92618/CC04E380) reply=1
18:13:03.512 pid=<0.736.0> [debug] Primary Keepalive: wal_end=9712335960648904552 (86C92619/19BAEF68) reply=1
18:13:04.828 pid=<0.736.0> [error] :gen_statem {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Postgres.ReplicationClient, nil}} terminating
** (Postgrex.Error) ERROR 53100 (disk_full) could not write to data file for XID 753: No space left on device
    (stdlib 6.2) gen_statem.erl:3864: :gen_statem.loop_state_callback_result/11
    (stdlib 6.2) proc_lib.erl:329: :proc_lib.init_p_do_apply/3
Process Label: :replication_client
Queue: [info: {:tcp, #Port<0.34>, <<69, 0, 0, 0, 146, 83, 69, 82, 82, 79, 82, 0, 86, 69, 82, 82, 79, 82, 0, 67, 53, 51, 49, 48, 48, 0, 77, 99, 111, 117, 108, 100, 32, 110, 111, 116, 32, 119, 114, 105, 116, 101, 32, 116, 111, 32, ...>>}]
Postponed: []
State: {:no_state, %Postgrex.ReplicationConnection{protocol: %Postgrex.Protocol{sock: {:gen_tcp, #Port<0.34>}, connection_id: 157, connection_key: -783689893, peer: {{127, 0, 0, 1}, 54321}, types: {Postgrex.DefaultTypes, #Reference<0.3807092993.2077097985.188528>}, null: nil, timeout: 15000, ping_timeout: 15000, parameters: #Reference<0.3807092993.2076966925.185296>, queries: #Reference<0.3807092993.2077097997.185293>, postgres: :idle, transactions: :naive, buffer: "", disconnect_on_error_codes: [], scram: %{auth_message: "n=,r=JEJQ9Z248C6kVt9muLqEA/QY,r=JEJQ9Z248C6kVt9muLqEA/QYRQLBQYGs4ZGCOm5Bl/aB9Otp,s=gdcurhnueiDTw3IpnKHduA==,i=4096,c=biws,r=JEJQ9Z248C6kVt9muLqEA/QYRQLBQYGs4ZGCOm5Bl/aB9Otp", iterations: 4096, salt: <<129, 215, 46, 174, 25, 238, 122, 32, 211, 195, 114, 41, 156, 161, 221, 184>>}, disable_composite_types: false, messages: []}, state: {Electric.Postgres.ReplicationClient, %Electric.Postgres.ReplicationClient.State{stack_id: "single_stack", connection_manager: #PID<0.386.0>, transaction_received: {Electric.Replication.ShapeLogCollector, :store_transaction, [{:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, relation_received: {Electric.Replication.ShapeLogCollector, :handle_relation_msg, [{:via, Registry, {:"Elixir.Electric.ProcessRegistry:single_stack", {Electric.Replication.ShapeLogCollector, nil}}}]}, publication_name: "electric_publication_default", try_creating_publication?: true, start_streaming?: false, slot_name: "electric_slot_default", slot_temporary?: false, display_settings: [], origin: "postgres", txn_collector: %Electric.Postgres.ReplicationClient.Collector{transaction: nil, tx_op_index: nil, relations: %{}}, step: :streaming, applied_wal: 9712335960648904552}}, auto_reconnect: false, reconnect_backoff: 500, streaming: 500}}
Callback mode: :handle_event_function, state_enter: false
18:13:04.830 pid=<0.386.0> [debug] Handling the exit of the replication client #PID<0.736.0> with reason %Postgrex.Error{message: nil, postgres: %{code: :disk_full, line: "3997", message: "could not write to data file for XID 753: No space left on device", file: "reorderbuffer.c", unknown: "ERROR", severity: "ERROR", pg_code: "53100", routine: "ReorderBufferSerializeChange"}, connection_id: 157, query: nil}
18:13:04.830 pid=<0.386.0> [warning] Reconnecting in 2000ms

An observation

I was surprised to see that during the execution of the INSERT statement, in addition to the Postgres OS process consuming 100% CPU, the walsender process was also consuming 100% CPU at the same time, periodically spamming Electric with Primary Keepalive messages. Presumably the walsender process starts preparing a transaction for streaming even though it's not yet committed.

htop displays the walsender command as

postgres: walsender postgres electric 192.168.80.1(43510) START_REPLICATION

Another oddity is that long after psql has given up and printed the "Check free disk space." hint, the walsender process keeps using 100% CPU, and at some point during that post-insert processing Electric logs the out-of-disk-space error it gets from Postgrex.
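
The `ReorderBufferSerializeChange` routine named in the error above is where logical decoding writes a transaction's buffered changes to disk, which would be consistent with the walsender decoding the INSERT before it has finished. One way to check this, sketched below, is to look at the slot's spill counters in `pg_stat_replication_slots` (available since Postgres 14; the server here reports version 170001). `slot_spill_sql` is a hypothetical helper name, and the slot name is the one shown in the log output.

```shell
# Hypothetical helper: print a query showing how much the replication slot
# has spilled to disk during logical decoding.
slot_spill_sql() {
  cat <<'SQL'
SELECT slot_name,
       spill_txns,
       spill_count,
       pg_size_pretty(spill_bytes) AS spilled
FROM pg_stat_replication_slots
WHERE slot_name = 'electric_slot_default';
SQL
}

# Usage, while the INSERT or the post-insert processing is running:
#   slot_spill_sql | psql postgresql://postgres:password@localhost:54321/electric
```

A `spilled` value that grows while the walsender is busy would support that reading.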
