0% found this document useful (0 votes)
6 views

Locking in Linux kernel

The document discusses locking mechanisms in the Linux kernel, emphasizing the importance of synchronization to avoid race conditions and ensure data integrity. It outlines various locking techniques, such as atomic operations, spin locks, semaphores, and reader-writer locks, and explains their appropriate use cases. Additionally, it highlights the significance of understanding kernel control paths and the implications of concurrency in coding practices.

Uploaded by

nyuxiao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
6 views

Locking in Linux kernel

The document discusses locking mechanisms in the Linux kernel, emphasizing the importance of synchronization to avoid race conditions and ensure data integrity. It outlines various locking techniques, such as atomic operations, spin locks, semaphores, and reader-writer locks, and explains their appropriate use cases. Additionally, it highlights the significance of understanding kernel control paths and the implications of concurrency in coding practices.

Uploaded by

nyuxiao
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 62

Locking in Linux kernel

Galois @ USTC Linux Users Group


zyf11@mail.ustc.edu.cn

Slides are powered by


OpenOffice.org+Linux+GNU+X31-150$
Copyright © 2005 Galois Y.F. Zheng

Permissions is granted to copy, distribute and/or modify this


document under the terms of the GNU Free Documentation
License, Version 1.2 or any later version published by the
Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the
license can be downloaded from GNU's home:
http://www.gnu.org/licenses/fdl.txt
Locking in Linux kernel

• OS review: Kernel Control Paths


• Locking in Linux
• Locking and Coding
• Conclusions
OS review: Kernel Control Paths
cpu, operating system, and human

• CPU is stupid, running endlessly!

• But the codes in RAM is intelligent.


(They compete for CPU resources.)

• We people control the codes.


endless rotating elevator: CPU
2
2nd floor

2
3

1
“CPU” running endlessly 3

entrance 4 4
1st floor
Human exit

1 2 ... 3 4 : Kernel Control Path 1


X : Codes 1

X : Codes 2 1 2 ... 3 4 : Kernel Control Path 2


OS review: Kernel Control Paths
Pre-KCP: What is Kernel Control Path ?

• Interrupt handlers
• Exception handlers
• User-space threads in kernel(system calls)
• Kernel threads(idle, work queue, pdflush…)
• Bottom halves(soft irq, tasklet,BH...)
Post-KCP: Is mm subsystem a KCP?
NO, But mm codes are called by KCP.
OS review: Kernel Control Paths
What is the composition of a kernel?

• Kernel Control Paths(KCP).


• Kernel Data (global or local).
• Kernel Codes called by the KCPs.
• Bootstrap Codes, Initialization Codes, ...

Now we need Locking (between KCPs), Let's GO!


Locking in Linux kernel

• OS review: Kernel Control Paths


• Locking in Linux
• Locking and Coding
• Conclusions
Locking in Linux
What is Locking? A Simple example.

int i = 0
KCP 1: KCP 2:

load i; // i = 0
... load i; // i = 0
inc i; inc i;
store i; // i = 1 ...
store i; // i = 1

i = 1, wrong
The result is wrong because of accessing “i” at the same time.
Locking in Linux
What is Locking? A Simple example. (cont.)

int i = 0
KCP 1: KCP 2:
<locking starts>
load i; // i = 0 <locking starts>... failed.
inc i; waiting..
store i; // i = 1 waiting..
<locking ends> <locking succeeds>
load i; //now i = 1
inc i;
store i; // i = 2
<locking ends> x86 lock directive.:)
i = 2, right!
Locking in Linux

Another example...
My heart
will panic.

! ! Give me
Give me ! your heart.
your heart. !

Heart has no locking: a disordered world.


My heart will
go on with you
Sorry.
Give me I has been locked. Give me Sigh...
your heart. Locking failed
your heart.
and lock it.. Waiting...
xixi ...
Locking succeeds.

Locking: the world is well-ordered.


now I'm ok.
Give me Now I can
Game over!
your heart. Lock you.

Locking releases.

Locking: the world is well-ordered.


Now we know that
locking
makes the world romantic and beautiful.
Locking in Linux
• What is Locking. (cont.)
– Shared Data
– Does Code need locking ? - Yes, surprising!
• Critical Regions
– Concurrency caused by Kernel Control Paths
• Race Conditions ? - Must be avoided.
– Now we needs Synchronization . - Locking.
Locking in Linux
Concurrency and Locking : another example.

KCP 1 KCP 2
• Locking the queue • Locking the queue
• Succeeded: acquired lock • Failed: waiting…
• Access queue • Waiting…
• Unlock the queue • …
• Succeeded: acquired lock
• Access queue
• Unlock the queue.
Cited from LKD by R. Love
Locking in Linux

What Causes Concurrency?

• Interrupts and Exceptions.


• Sleeping and synchronization.
• Kernel Preemption.
• SMP ! - a hot topic, even in embedded apps.
Locking in Linux

Locking(Sync.) is important.
Let's get into the Locking details.
Locking in Linux

Various Locking mechanisms.

• 1 Atomics operations
• 2 Memory barriers
• 3 Spin locks
• 4 Reader-writer spin locks
• 5 Semaphores
• 6 Reader-writer semaphores
Locking in Linux
Various Locking mechanisms. (cont.)

– 7 Condition(Completion) Variables
– 8 Sequence locks
– 9 Mask Interrupts(local and global)
– 10 Mask Bottom Halves
– 11 Disable Kernel Preemption
– 12 Read-Copy Update

– Big Kernel Lock - Historical, will be removed
– FUTEX ? - NO
1 Atomics Operations
• atomic ops is for the concurrency caused by MP. not for other
concurrencies caused by preemption, sleep...
• atomic operations mechanisms(SMP env):
– cpu guaranteed atomic ops: read/write a byte, alined word...
– lock prefix: add, adc, and, cmpxchg, cmpxch8b, dec, inc, neg, not,
or, sbb, sub ,xor, xadd, btc,bts, btr
– xchg is automatically added lock prefix.
– cache coherency protocols.
1 Atomics Operations
<asm/atomic.h> <asm/bitops.h>

• atomic integer ops: (on atomic_t type v)


– atomic_read(v) v->counter not necessary
– atomic_set(v) v->counter not necessary
– atomic_add(i,v) v->counter+i lock;addl %1,%0
– ...
• atomic bitwise ops:
– set_bit(i,addr) set the i-th bit lock;btsl %1,%0
– clear_bit(i,addr) clear the i-th bit lock;btrl %1,%0
– test_and_set_bit lock;btsl %2,%1;sbbl %1,%0
– ...
• pseudo atomic bitwise ops: carefully!
– __set_bit(), __xxx() ..... there is no lock prefix.
2 Memory Barriers basics
• gcc optimizes instruction streams.
• 386 is strong ordering, where read and write are issued on the system
bus in the order they occur..but pentium 4 is processor ordering, by
which cpu could improve performance.
• memory barriers hardware technologies(x86):
– serializing instructions
• mov(to control register/debug register), wrmsr, invd, invlpg,
wbinvd,lgdt,lldt,lidt,ltr;
• cpuid,iret,rsm (non-previledged)
• sfence(store), mfence(all), lfence(load) (non-preveledged)
– io instructions, read/write to uncached memory, interrupt ocurrence,
lock prefix
– mtrr and pat could control memory ordering.
2 Memory Barriers Methods
<asm/system.h>

• rmb(), prevents loads being reordered


• read_barrier_depends(), prevents data-dependent loads being reordered.
• wmb(), prevents stores being reordered.
• mb(), prevents loads and stores being reordered.

• barrier(), prevents GCC optimize loads and stores.

• smp_xxx(), on smp, provides xxx; on up provides barrier()
Note: “xxx” refers to rmb, wmb...
3 Spin locks
<linux/spinlock.h><asm/spinlock.h>

• Spinning on SMP. Spinning is null on UP.


• Don't hold it for a long time. less than context
switch time.
• spinlock automatically disables preemption, which
avoids deadlock caused by interrupts.
• when data is shared with interrupt handler, before
holding spinlock we must disable interrupts.
• when data is shared with bottom halves, before
holding spinlock we must disable bottom halves.
4 Spin Locks
(cont.)

• spin_lock() acquire lock


• spin_unlock() release lock
• spin_lock_irq() disable local interrupts and acquire lock
• spin_unlock_irq()
• spin_lock_irqsave() save current state of ints, ...
• spin_lock_irqrestore() restore....
• ...
5 Reader-writer spin locks
<asm/spinlock.h><linux/spinlock.h>

• Writing demands mutual exclusion.


• Multiple concurrent Readings is ok.
• When Reading, Writing must be disabled.

• Reading locks and writing locks are seperated.
• read_lock_xxx() read_unlock_xxx()
• write_lock_xxx() write_unlock_xxx()
• ... ...
• Problems: This locks favor readers over writers, which may
starve pending writers.
6 Semaphores
<asm/semaphore.h><arch/xxx/kernel/semaphore.c>

• Checking (struct semaphore*)->count, dec&inc is spinlocked.


• when initial count > 1, it allows arbitrary number of lock
holders. when initial count = 1, it is binary semaphore, also
called mutex which is used in many places.
• It is sleeping locks.
• Threads may sleep while holding semaphores.
• Threads can't acquire semaphores while holding spin lock.

• down() threads get into uninterruptible state
• down_interruptible(), threads get into interruptible state
• up() inc count, if count<=0, wake up waiting thread
• ...
7 Reader-writer semaphores
<linux/rwsem.h>

• WE can understand it.



• down_read(), down_read_trylock()
• up_read()
• down_write(), down_write_trylock()
• up_write()

• NOTE: unlike rw-spinlock, we can downgrade from
writelock to readlock.
Spin locks VS. semaphores
(recommended)

• low overhead locking, spinlock


• short lock hold time , spinlock
• long lock hold time , semaphore
• for interrupt context use, spin lock
• sleep while holding lock, semaphore
8 Condition(Completion) Variables
<linux/completion.h><kernel/sched.c>

• It is a very simple solution to a problem that semaphore


could resolve otherwise. but maybe it is not wise to fix
semaphore.
• It just checks a condition to decide what to do: sleep(wake
up) or continue(null). sleeping+spinning==>cv
• It is mainly for SMP.

• only 2 functions:
• wait_for_completion() if ok, then continue, else wait.
• complete() signal any waiting threads.
Semaphore VS. Con.Varible
down() wait_for_completion()
lock; dec %0 spin_lock(cv->wait.lock)
... //wait queue ops;
spin_lock(sem->wait.lock) //may unlock spin and sleep
//..., wait queue ops; //dec cv->done
spin_unlock(sem->wait.lock) spin_unlock(cv->wait.lock)

up() complete()
lock; inc %0 spin_lock(cv->wait.lock)
... //inc cv->done
spin_lock(sem->wait.lock) //wait queue ops;
//..., wait queue ops; spin_unlock(cv->wait.lock)
spin_unlock(sem->wait.lock)

complex and seperated locking simple and totally spinlocked


9 Sequence Locks
<linux/seqlock.h>
• For this situation: data has many readers and a few writers. like
RCU mechanism
• Unlike reader-writer locks, seqlock favors writers over readers.
• Readers never blocks, but have to retry for arbitray times if a
writer is in progress.
• Writers are mutually exclusive to change data, which is like spin
locks. But writers do not wait for readers.
do {
write_seqlock_xxx(); seq = read_seqbegin_xxx(seq);
// change data... // read data ...
write_sequnlock_xxx(); } while (read_seqretry_xxx(seq))

Writers Readers
10 Mask interrupts(local and global)
<linux/interrupt.h><asm/system.h><kernel/irq/manage.c><asm/processor.h>

• Deal with CPU IF flag. which disable all interrupts of local


CPU (cli and sti instructions.)
• Masking PIC's irq line is another story. It makes serial
execution of same interrupt. but it could not prevent the
preemption from other interrupt.
• local_irq_disable(), local_irq_enable()
• Do you remember: spin_lock_irq() ? Disabling interrupts
are used with spin_lock().

• Global disabling: cruel! I don't know wheather removed. but
we can use synchronize_irq() to synchronize all CPUs.
11 Mask Bottom Halves
<linux/interrupt.h>

• when data is shared with bottom halves,


maybe we need to disable bottom halves.

• local_bh_disable(), local_bh_enable():
– calling add_preempt_count()
• spin_lock_bh()
12 Kernel Preemption Disable
<linux/preempt.h>
• preemption points:
– interrupt return path,
– arbitrary preemption points in kernel codes.

• preempt_disable() and preempt_enable()
preempt_disable();
int cpu = get_cpu();
// manipulating per_cpu(xxx, cpu);
// xxx is per_cpu data, such as runqueues.
preempt_enable

Thread 1, running on CPU 0


13 Read-Copy Updates
<linux/rcupdate.h>

• Best for read-mostly linked list(struct list_head).


• another Reader-Writer lock, but more complex and
advantaged.
• Reader will not block.

read grace period Transient state update


(transition) (spec point)

(old copy)
write create a copy
Change new copy
Big Kernel Lock: history
• Linux 2.0 - BKL about 1996 - SMP
• BSD/OS 4.x:
• FreeBSD 4.x: XXX – Giant (2000 -)
• goal : fine-grained locking

• Dragonfly BSD: forked from FreeBSD 4.x
• goal: lockless mem allocator and scheduling system
FUTEX
• Fast User Space Mutex
• It's for user-space threads synchronization.
• It's not a locking mechanism for kernel.
• It is implemented in kernel.
Relation of different locks implementations
simple complex
mem barriers

mask interrupts seq lock

con.variable

atomics ops spin lock semaphore


(rw) (rw)

preempt disable
RCU
disable_bh
Locking in Linux kernel

• Kernel Control Paths


• Locking in Linux
• Locking and Coding
• Conclusions
Locking and Coding
• Is the data shared? Can other threads(contexts) access it?
• Is the data per-CPU’s? Can other CPUs access it? *
• Is the data shared between threads context and interrupt context? Is it
shared between two different interrupt handlers? …
• If a context is preempted while accessing this data, can the newly
scheduled context access the same data?
• Can the current context sleep on anything while accessing the data? If
it does, what state does that leave the shared data in?
• Does the data has special application? Keep in mind. *

• Now LET’S Continue CODING!


Locking and Coding
• Interrupt safe
• Preempt safe
• SMP safe
– (preempt safe ≦SMP safe)
Locking between various KCPs
• Exceptions..
• Interrupts..
• Bottom Halves..
• Kernel threads..
• System calls by user space threads..
1 between exception contexts
(UP:sleeping locks, SMP:+0)

• 1. Exception could not be caused in kernel. If any kernel


codes trigger an exception, this is a bug.
• 2. BUT page_fault and float-point registers exceptions
• 3. Exceptions could be caused by user-space codes.
• 4. According to 1st item, exception contexts could not
trigger another exceptions, including page_fault and float-
point registers exceptions. But exception contexts could be
preempted by interrupts, and after interrupts return ,the
preempted exceptions continue on same CPU.
• 5 so we could conclude that sleeping locking are enough.
2 between interrupts contexts
(UP:mask local interrupts, SMP:+spinlock)

• Interrupts contexts have no kernel stack. It


could not sleep. Do not use sleeping locks.
• Same interrupt context runs serially on same
CPU because irq_desc->handler.ack() in do_IRQ() masks
the irq line. On UP, This situation is simple.
• Same or different interrupts could be
triggered on different CPUs, so SMP
requires spinlock to prevent race condition.
3 between Bottom Halves
(UP:null, SMP:+spinlock)

• Do not use old BH mechanism, it has poor performance and


has been removed in 2.6.
• Softirqs could not been preempted, except by interrupts. so
on UP, there is no race conditions.
• Bottom Halves could not sleep like interrupts for the same
reasons.
• Same or different softirqs could run on different CPUs.
• Tasklets are based on softirqs. Only different tasklets could
run on different CPUs.
• From above descriptions, we can conclude that on SMP
softirqs and different tasklets should be protected with
spinlocks, same tasklet could be used locklessly.
4 between exceptions and interrupts/bh
(UP: mask interrupts, SMP:+spinlocks)

• Interrupts could not be preempted by exceptions, if


this situation happens, this is a bug!
• So exceptions could disable interrupts to avoid
preemption by interrupts.

• bh is like interrupts, it is executed in interrupt
contexts.
• However, exceptions could use local_bh_disable()
to disable bottom halves.
5 between BottomHalves and interrupts
(UP: mask interrupts, SMP: spinlock)

• Bottom halves could use disabling interrupts


to avoid concurrency.
• for SMP, spinlock is necessary and enough.
6 between kernel threads and interrupts/bh
(UP: mask interrupts, SMP:+spinlock)

• Interrupts could preempt threads. so disable


interrupts to protect data used by threads.
• Because interrupts could not be preempted,
so we use spinlock.
7 between threads
(spinlock or sleeping lock)

• NOTE: in 2.6, spinlock automatically disabling


preemptions.
• what to use: spinlock or sleeping lock?

low overhead locking, spinlock


short lock hold time, spinlock
long lock hold time, semaphore
sleep while holding lock, semaphore
8 between system calls
(spinning lock or sleeping lock)

• This is same as between kernel threads.


Locking used between various KCPs
UP SMP+
exceptions -----------------------------sleepinglock null
interrupts ------------------------------mask interrupts spinlock
bottom halves -------------------------null spinlock or null
exceptions and interrupts/bh --------mask interrupts spinlock
bottom halves and interrupts -------mask interrupts spinlock
kernel threads and interrupts/bh ----mask interrupts spinlock
kernel threads ------------------------- sleeping or spin lock
system calls --------------------------- sleeping or spin lock
Kernel Configuration Tree and Debug
<make menuconfig>

• arch/xxx/Kconfig (mainmenu, <menu,endmenu>*)


– arch/xxx/Kconfig.debug
• lib/Kconfig.debug

– init/Kconfig
– fs/Kconfig.binfmt
– fs/Kconfig
– drivers/Kconfig.binfmt
– lib/Kconfig
– ...

• CONFIG_DEBUG_KERNEL
– CONFIG_DEBUG_SPINLOCK, CONFIG_SPINLOCK_SLEEP
– CONFIG_DEBUG_STACKOVERFLOW, CONFIG_4KSTACKS
– CONFIG_KDB(patches)
– ...
Locking in Linux kernel

• Kernel Control Paths


• Locking in Linux
• Locking and Coding
• Conclusions
Conclusions
• Locking or synchronization is a complex
problem, especially for large and/or complex
system.

• The problem caused by Locking in kernel is


not entirely predictive.
Locking: What is the problem?
• Implementing the actual locking in the code
to protect shared data is not hard.

• The tricky part is identifying the actual


shared data and corresponding critical
sections.

Cited from LKD, by R. Love


Locking: What is the problem?
• Deadlocks
• Priority Inversion
• Locking latency
• Locking: Coarse or fine-grained.
– Scalability VS. Overheads(performance).
– Not only Linux has the dilemma.
– Let’s keep close eyes at DragonflyBSD's progress
References
• Linux kernel source tree by Linus Torvalds and various
patches by hackers.
• Linux Kernel Development. by Robert Love.
• Understanding the Linux Kernel. by Daniel Bovet etc.
• www.freebsd.org/smp
• www.dragonflybsd.org
• .../kernel/Documents/*, google, gcc document...
• Pentium 4 software development document(3 volumes).
Thanks
• USTC BBS embedded board master: dj
• All the organizers and/or friends of the USTC 2005
developer workshop of embedded system.
• USTC Linux Users Group.
Happy Life, Happy Hacking.
THANKS

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy