
Commit b9a51b2

Fix lock ordering issue for rb_ractor_sched_wait() and rb_ractor_sched_wakeup()
In rb_ractor_sched_wait() (ex: Ractor.receive), we acquire RACTOR_LOCK(cr) and then thread_sched_lock(cur_th). However, on wakeup, if we're a dnt (dedicated native thread), in thread_sched_wait_running_turn() we acquire thread_sched_lock(cur_th) after condvar wakeup and then RACTOR_LOCK(cr). This lock inversion can cause a deadlock with rb_ractor_wakeup_all() (ex: port.send(obj)), where we acquire RACTOR_LOCK(other_r) and then thread_sched_lock(other_th).

So, the error happens:

    nt 1: Ractor.receive
        rb_ractor_sched_wait()
        after condvar wakeup in thread_sched_wait_running_turn():
            - thread_sched_lock(cur_th) (condvar)  # acquires lock
            - rb_ractor_lock_self(cr)              # deadlock here: tries to acquire, HANGS

    nt 2: port.send
        ractor_wakeup_all()
            - RACTOR_LOCK(port_r)   # acquires lock
            - thread_sched_lock     # tries to acquire, HANGS

To fix it, we now unlock the thread_sched_lock before acquiring the ractor_lock in rb_ractor_sched_wait().

Script that reproduces the issue:

```ruby
require "async"

class RactorWrapper
  def initialize
    @ractor = Ractor.new do
      Ractor.receive # Ractor doesn't start until explicitly told to
      # Do some calculations
      fib = ->(x) { x < 2 ? 1 : fib.call(x - 1) + fib.call(x - 2) }
      fib.call(20)
    end
  end

  def take_async
    @ractor.send(nil)
    Thread.new { @ractor.value }.value
  end
end

Async do |task|
  10_000.times do |i|
    task.async do
      RactorWrapper.new.take_async
      puts i
    end
  end
end
exit 0
```

Fixes [Bug #21398]

Co-authored-by: John Hawthorn <john.hawthorn@shopify.com>
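To make the circular wait concrete, here is a minimal standalone sketch in plain pthreads. This is a hypothetical demo, not CRuby's code: the mutex names only stand in for RACTOR_LOCK and thread_sched_lock, a pthread_barrier (POSIX; not available on macOS) forces the bad interleaving, and pthread_mutex_trylock stands in for the blocking acquire so the program reports the conflict instead of hanging:

```c
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ractor_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for RACTOR_LOCK(cr)      */
static pthread_mutex_t sched_lock  = PTHREAD_MUTEX_INITIALIZER; /* stands in for thread_sched_lock(th) */
static pthread_barrier_t barrier;

/* "nt 1": the post-wakeup path in thread_sched_wait_running_turn() before the
 * fix -- it already holds the sched lock and then wants the ractor lock. */
static void *
receiver(void *arg)
{
    pthread_mutex_lock(&sched_lock);
    pthread_barrier_wait(&barrier);            /* both threads now hold their first lock */
    if (pthread_mutex_trylock(&ractor_lock)) { /* rb_ractor_lock_self(cr) would block    */
        puts("nt 1: ractor lock held by nt 2 -> would deadlock");
    }
    else {
        pthread_mutex_unlock(&ractor_lock);
    }
    pthread_mutex_unlock(&sched_lock);
    return NULL;
}

/* "nt 2": ractor_wakeup_all() on the port.send side -- it holds the
 * ractor lock and then wants the sched lock. */
static void *
sender(void *arg)
{
    pthread_mutex_lock(&ractor_lock);
    pthread_barrier_wait(&barrier);
    if (pthread_mutex_trylock(&sched_lock)) {  /* thread_sched_lock would block */
        puts("nt 2: sched lock held by nt 1 -> would deadlock");
    }
    else {
        pthread_mutex_unlock(&sched_lock);
    }
    pthread_mutex_unlock(&ractor_lock);
    return NULL;
}

int
main(void)
{
    pthread_t t1, t2;
    pthread_barrier_init(&barrier, NULL, 2);
    pthread_create(&t1, NULL, receiver, NULL);
    pthread_create(&t2, NULL, sender, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

Built with `cc -pthread`, both threads report that their second acquisition would block, which is exactly the circular wait in the trace above.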
1 parent: ff77473

File tree: 2 files changed (+44, −8 lines)


test/ruby/test_ractor.rb (39 additions, 0 deletions)

@@ -144,6 +144,45 @@ def test_require_non_string
     RUBY
   end
 
+  # [Bug #21398]
+  def test_port_receive_dnt_with_port_send
+    assert_ractor(<<~'RUBY', timeout: 30)
+      THREADS = 10
+      JOBS_PER_THREAD = 50
+      ARRAY_SIZE = 20_000
+      def ractor_job(job_count, array_size)
+        port = Ractor::Port.new
+        workers = (1..4).map do |i|
+          Ractor.new(port) do |job_port|
+            while job = Ractor.receive
+              result = job.map { |x| x * 2 }.sum
+              job_port.send result
+            end
+          end
+        end
+        jobs = Array.new(job_count) { Array.new(array_size) { rand(1000) } }
+        jobs.each_with_index do |job, i|
+          w_idx = i % 4
+          workers[w_idx].send(job)
+        end
+        results = []
+        jobs.size.times do
+          result = port.receive # dnt receive
+          results << result
+        end
+        results
+      end
+      threads = []
+      # creates 40 ractors (THREADSx4)
+      THREADS.times do
+        threads << Thread.new do
+          ractor_job(JOBS_PER_THREAD, ARRAY_SIZE)
+        end
+      end
+      threads.each(&:join)
+    RUBY
+  end
+
   def assert_make_shareable(obj)
     refute Ractor.shareable?(obj), "object was already shareable"
     Ractor.make_shareable(obj)

thread_pthread.c (5 additions, 8 deletions)

@@ -1351,23 +1351,20 @@ rb_ractor_sched_wait(rb_execution_context_t *ec, rb_ractor_t *cr, rb_unblock_fun
     }
 
     thread_sched_lock(sched, th);
+    rb_ractor_unlock_self(cr);
     {
         // setup sleep
         bool can_direct_transfer = !th_has_dedicated_nt(th);
         RB_VM_SAVE_MACHINE_CONTEXT(th);
         th->status = THREAD_STOPPED_FOREVER;
         RB_INTERNAL_THREAD_HOOK(RUBY_INTERNAL_THREAD_EVENT_SUSPENDED, th);
         thread_sched_wakeup_next_thread(sched, th, can_direct_transfer);
-
-        rb_ractor_unlock_self(cr);
-        {
-            // sleep
-            thread_sched_wait_running_turn(sched, th, can_direct_transfer);
-            th->status = THREAD_RUNNABLE;
-        }
-        rb_ractor_lock_self(cr);
+        // sleep
+        thread_sched_wait_running_turn(sched, th, can_direct_transfer);
+        th->status = THREAD_RUNNABLE;
     }
     thread_sched_unlock(sched, th);
+    rb_ractor_lock_self(cr);
 
     ubf_clear(th);
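Schematically, the fixed protocol reduces to the following pthreads sketch. Again the names are placeholders, not CRuby's identifiers, and the real code also handles unblock functions, direct transfer, and thread status; the point is only the ordering: the sleeping side now retakes the ractor lock strictly after the sched lock has been released, so both sides agree on a single ractor -> sched order:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Placeholders: ractor_lock ~ RACTOR_LOCK(cr), sched_lock ~ thread_sched_lock(th). */
static pthread_mutex_t ractor_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t sched_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wakeup_cond = PTHREAD_COND_INITIALIZER;
static bool woken = false;

/* Post-fix rb_ractor_sched_wait() shape: enter holding the ractor lock,
 * drop it before sleeping, and retake it only AFTER the sched lock is gone. */
static void *
sleeper(void *arg)
{
    pthread_mutex_lock(&ractor_lock);
    pthread_mutex_lock(&sched_lock);   /* ractor -> sched: matches the waker   */
    pthread_mutex_unlock(&ractor_lock);
    while (!woken) {
        pthread_cond_wait(&wakeup_cond, &sched_lock); /* "sleep"               */
    }
    pthread_mutex_unlock(&sched_lock); /* release the sched lock first...      */
    pthread_mutex_lock(&ractor_lock);  /* ...then retake cr: no inversion      */
    puts("sleeper: woke up with the ractor lock, no deadlock");
    pthread_mutex_unlock(&ractor_lock);
    return NULL;
}

/* The wakeup side keeps its existing ractor -> sched order, as in
 * rb_ractor_wakeup_all(). */
static void *
waker(void *arg)
{
    pthread_mutex_lock(&ractor_lock);
    pthread_mutex_lock(&sched_lock);
    woken = true;
    pthread_cond_signal(&wakeup_cond);
    pthread_mutex_unlock(&sched_lock);
    pthread_mutex_unlock(&ractor_lock);
    return NULL;
}

int
main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, sleeper, NULL);
    pthread_create(&t2, NULL, waker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```

The cost of the reordering is a short window after thread_sched_unlock() where neither lock is held; as with any condition wait, that is safe as long as the wait condition is re-validated once rb_ractor_lock_self(cr) returns.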
