Exploring The Oracle Latches
RDTEX, Russia
MEDIAS - 2011
May 8-15
Who am I
• Andrey.Nikolaev@rdtex.ru
• http://andreynikolaev.wordpress.com
For a non-Oracle audience
Oracle RDBMS performance improvements timeline:
v. 2 (1979): the first commercial SQL RDBMS
v. 3 (1983): the first database to support SMP
v. 4 (1984): read-consistency, Database Buffer Cache
v. 5 (1986): Client-Server, Clustering, Distributed Database, SGA
v. 6 (1988): procedural language (PL/SQL), undo/redo, latches
v. 7 (1992): Library Cache, Shared SQL, Stored procedures, 64bit
v. 8/8i (1999): Object types, Java, XML
v. 9i (2000): Dynamic SGA, Real Application Clusters
v. 10g (2003): Enterprise Grid Computing, Self-Tuning, mutexes
v. 11g (2008): Results Cache, SQL Plan Management, Exadata
v. 12c (2011): "Cloud"? Not yet released … to be continued
Oracle Database Architecture: Overview
• More than 100 books on Amazon. Mainstream scientific support is needed!
• Complex and variable workloads. Every database is unique.
• Complex internals. 344 "Standard" / 2665 "Hidden" tunable parameters.
• Complicated physical database and schema design decisions.
• Concurrency and Scalability issues.
• Insufficient developer education.
• "Database Independence" issues.
• Self-tuning anomalies. SQL plan instabilities.
• OS and Hardware issues.
• More than 10 million bug reports on MyOracleSupport.
Oracle is well-instrumented software:
Oracle instance hangs due to heavy "cache buffers chains" latch contention
The presentation goals:
struct ksllt {
    …
};
struct ksupr {
    …
    struct kslla {
        ksllt *ksllalat[14];
    };
    …
};
struct ksllt { <Latch> }
Oracle version      ksllt size, bytes
7.3.4                92      -    120
8.0.6               104      -    104
8.1.7               104    144    104
9.0.1                 ?    200    160
9.2.0               196    240    200
10.1.0                ?    256    208
10.2.0 - 11.2.0.2   100    160    104
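The “rising level” rule (a process may wait for a latch only if its level is higher than that of every latch it already holds) leads to “trees” of processes waiting for and holding the latches: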
ospid: 28067 sid: 1677 pid: 61
  holding: 3800729f0 'shared pool' (156) level=7 child=1 whr=1602 kghupr1
  waiter: ospid: 129 sid: 72 pid: 45
    holding: a154b7120 'library cache' (157) level=5 child=17 whr=1664 kglupc: child
    waiter: ospid: 18255 sid: 65 pid: 930
    waiter: ospid: 6690 sid: 554 pid: 1654
    waiter: ospid: 4685 sid: 879 pid: 1034
    …
  waiter: ospid: 29749 sid: 180 pid: 155
    holding: a154b7db8 'library cache' (157) level=5 child=4 whr=1664 kglupc: child
    waiter: ospid: 13104 sid: 281 pid: 220
    waiter: ospid: 24089 sid: 565 pid: 636
    waiter: ospid: 25002 sid: 621 pid: 1481
    waiter: ospid: 16930 sid: 1046 pid: 783
Output of a direct SGA access program for a 9.2.0.6 instance with a too small shared pool.
Waiting for the latch
[Figure: a latch in the SGA accessed by processes on CPU 1 and CPU 2]
Latch Acquisition in Wait Mode
• Sleep timeouts: 0.01-0.01-0.01-0.03-0.03-0.07-0.07-0.15-0.23-0.39-0.39-0.71-0.71-1.35-1.35-2.0-2.0-2.0-2.0... sec
• timeout = $2^{\lfloor (N_{wait}+1)/2 \rfloor} - 1$ centiseconds (capped at 2 s)
• Typical latch holding time is 10 µs!
kslgetl(0x50006318, 1)
  -> sskgslgf(0x50006318) = 0             - immediate latch get
  -> kslges(0x50006318, ...)              - wait latch get
    -> skgslsgts(..., 0x50006318, ...)    - spin latch get
      -> sskgslspin(0x50006318)
      ...                                 - repeated 20000 cycles = 10*_SPIN_COUNT!
    -> kskthbwt(0x0)
    -> kslwlmod()                         - set up Wait List
    -> sskgslgf(0x50006318) = 0           - immediate latch get
    -> skgpwwait                          - sleep latch get
      semop(11, {17,-1,0}, 1)
Contemporary latch spins and waits
• The "right" method: tune the application and reduce the latch demand by tuning the SQL, using bind variables, revising the schema, etc. Many brilliant books exist on this topic; it is out of scope for this work.
• Nowadays CPU power is cheap, and we may already have enough free CPU resources. Spin count tuning may therefore be beneficial.
• Processes spin up to 20000 cycles for an exclusive latch, up to 4000 cycles for a shared latch, and indefinitely for a mutex. Tuning may find better values for your application.
• Oracle does not explicitly forbid spin count tuning. However, any change of an undocumented parameter should be discussed with Oracle Support.
Spin count adjustment
Shared latches:
• The spin count can be adjusted dynamically with the _SPIN_COUNT parameter.
• A good starting point is a multiple of the default value of 2000.
• Setting the _SPIN_COUNT parameter in the initialization file should be accompanied by _LATCH_CLASS_0="20000". Otherwise, the spin for exclusive latches will be greatly affected after the next instance restart.
Exclusive latches:
• Adjusting the spin count with the _LATCH_CLASS_0 parameter requires an instance restart.
• A good starting point is a multiple of the default value of 20000.
• It may be preferable to increase the number of "yields" for class 0 latches.
Tuning spin count efficiently
• Spin count tuning is only effective if the latch holding time S is in its normal microsecond range.
• The number of spinning processes should remain far less than the number of CPUs. Analyze AWR and latch statistics before and after each change.
• It is a common myth that CPU time will rise without bound as the spin count increases. In fact, a process spins no longer than the "residual latch holding time".
• The elapsed time to acquire the latch decreases as long as the latch "holding time" is less than the OS "context switch time".
Latch spin CPU time
The spin probes the latch holding time distribution. The spin time distribution is discontinuous at _SPIN_COUNT.
[Figure: spin time distribution P_s]
Oracle normally operates in the region of a small latch sleep ratio, κ = 1 − σ < 0.1.
Here the spin count is greater than the number of instructions protected by the latch.
The spin time is bounded by the "residual latch holding time" and the spin count:
Sleeping prevents the latch get from wasting CPU on spinning through the heavy tail of the holding time distribution.
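Under an illustrative assumption of an exponential tail for the residual holding time R with mean S, the sleep ratio and the spin time have closed forms that show how the spin must scale to keep Oracle in the κ < 0.1 region. Here T, the time a full spin takes, is an assumed symbol, and σ = 1 − κ is read as the fraction of contended gets that end during the spin.

```latex
\begin{aligned}
\kappa(T) &= P(R > T) = e^{-T/S}, \qquad \sigma(T) = 1 - \kappa(T) = 1 - e^{-T/S},\\
\mathbb{E}[\min(R,T)] &= \int_0^T P(R > t)\,dt = S\left(1 - e^{-T/S}\right) = S\,\sigma(T) \le S,\\
\kappa(T) < 0.1 &\iff T > S\ln 10 \approx 2.3\,S,\\
\kappa(T + S\ln 10) &= \kappa(T)/10 .
\end{aligned}
```

In this model each additional S ln 10 of spin time cuts the sleep ratio tenfold, while the expected spin CPU time only approaches S.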
Exponential tail spin scaling
• Questions?
• Comments?
Acknowledgements
Andrey Nikolaev
http://andreynikolaev.wordpress.com
Andrey.Nikolaev@rdtex.ru
www.rdtex.ru