Interesting Facts About RAC

Uploaded by

Black Swan
Copyright
© Attribution Non-Commercial (BY-NC)

Interesting Facts about RAC

Paper #227

Ashok Singh
Fastenal Company
Winona, MN 55987
asingh@fastenal.com

Objectives

• What is Real Application Cluster?
• Cluster Manager (9i), Cluster Ready Services (10g)
• Cluster File System
• Cache Fusion and Interconnect
• Routine Administration of RAC

Why Real Application Cluster?

• Total cost of ownership
• True scalability
• Ease of administration
• High availability
• Transparent to users
• Foundation for Grid computing

What is Real Application Cluster?

• More than one instance per database
• Instances run on different nodes (hybrid)
• Nodes may have multiple CPUs
• Instances write to the same physical database
• Data files and control files reside on shared storage
• Shared storage may use a clustered file system or raw devices
• Successor to Oracle Parallel Server (OPS)

What is Real Application Cluster?

• Every instance has its own redo log files and undo segments
• Every instance has its own background processes
• All caches (data, library, data dictionary) are synchronized by Cache Fusion
• Backup, recovery and instance recovery are slightly different from single-instance
• Users can connect to any active instance
• Sessions fail over using Transparent Application Failover (TAF)
• Additional waits due to interconnect traffic

Real Application Clusters

[Figure: Node A and Node B, each with its own Operating System, Oracle Binaries, Listener, Background Processes, Undo Tablespace, Redo Logs (shared storage) and Archive Logs (shared storage); Data Files, Control Files and Spfile are shared on a Clustered File System / Raw Device.]

Physical Layout

[Figure: Racnode A and Racnode B, each built up from Operating System, Raw Devices, Oracle Cluster File System, Cluster Manager (9i) / Cluster Ready Services (10g) and Oracle Binaries; steps labelled Configuration, Update Srvm/OCR and Create Database; data file locations /usr01/oradata/WMSD and /usr02oradata/WMSD; Voting Disk and Oracle Cluster Registry.]

Shared Storage

• Storage Area Network (SAN)
• Cluster File System / raw devices
• CFS provides ease of administration
• OCFS:
- free for Linux and Windows
- has a GUI tool, OCFSTOOL
- integrates well with RAC
- holds data files, control files and archived log files
- install 'fileutils-4.1-4.2.i386.rpm' for the new cp and dd

Cluster Manager

• Oracle Cluster Manager (9i): Linux and Windows
• ORACM runs at the OS level on all the nodes
• Accepts registration of Oracle instances
• Cluster Manager is installed on all the nodes
• Uses the hangcheck timer, which checks the health of the node at every hangcheck tick
• The hangcheck timer replaces the watchdog daemon
• Implemented as a kernel module, so it is much less affected by system load
• Uses a quorum (voting) disk to evict a node
• GSD performs manageability tasks for the databases
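The hangcheck health check reduces to a timing rule. The sketch below illustrates that rule only; it is not the kernel module's code. It assumes the module parameters hangcheck_tick and hangcheck_margin (not named on the slide) and example values of 30 and 180 seconds: if the timer fires more than tick + margin seconds after its previous firing, the node is treated as hung.

```python
# Sketch of the hangcheck-timer decision rule (assumed semantics):
# the timer should fire every `tick` seconds; a gap longer than
# tick + margin means the node stalled and would be reset.
from typing import List, Optional


def first_hang(wakeups: List[float], tick: float = 30.0,
               margin: float = 180.0) -> Optional[int]:
    """Return the index of the first wake-up that arrived too late, or None."""
    for i in range(1, len(wakeups)):
        gap = wakeups[i] - wakeups[i - 1]
        if gap > tick + margin:
            return i  # stalled long enough to be considered hung
    return None


if __name__ == "__main__":
    # Timestamps (seconds) at which the timer actually fired.
    print(first_hang([0, 30, 60, 90]))   # -> None (healthy node)
    print(first_hang([0, 30, 60, 300]))  # -> 3 (240 s gap > 30 + 180)
```

Because the module is driven by a kernel timer rather than a user-space daemon, a busy system delays it far less, which is the point the slide makes.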

Cluster Ready Services (10g)

• Available for all operating systems; full-stack clusterware
• Primarily responsible for the HA architecture
• Uses OCSSD, CRSD and EVMD
• Uses the OCR to store instance information
• Uses the voting disk to maintain the cluster
• GSD still exists to manage 9i instances
• Uses VIPs for enhanced failover to increase HA
• Needs to be installed in a separate home
• Procedure to stop/start CRS
• ocrcheck, ocrdump, ocrconfig -showbackup

Cluster Ready Services

• Set mesg_logging_level = 3 in $ORACLE_HOME/crs/srvm/admin/ocrlog.ini for detailed logging of the OCR
• CRS manages the VIPs used by clients to connect to RAC
• CRS relocates the VIP address of a failed node to a surviving node
• Clients may not notice this failover
• To move the VIP back, the node needs a reboot, an instance start, or a restart of the node applications
• Use oifcfg to change the IP and store the new IP in the OCR
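A toy model of the VIP relocation described above. Assumptions: the VIP addresses and node names are hypothetical, and the rule "move to the first surviving node by name" is invented for illustration. Real placement is decided by CRS.

```python
# Toy model: CRS moves the VIPs of a failed node onto a surviving node.
# Addresses, node names and the placement rule are assumptions.
from typing import Dict, Iterable


def fail_over_vips(vip_owner: Dict[str, str], failed: str,
                   survivors: Iterable[str]) -> Dict[str, str]:
    """Return a new VIP -> node map with the failed node's VIPs relocated."""
    candidates = sorted(n for n in survivors if n != failed)
    if not candidates:
        raise RuntimeError("no surviving node can host the VIP")
    target = candidates[0]
    # VIPs on healthy nodes stay put; only the failed node's VIPs move.
    return {vip: (target if node == failed else node)
            for vip, node in vip_owner.items()}


if __name__ == "__main__":
    owners = {"192.168.1.101": "racnode1", "192.168.1.102": "racnode2"}
    print(fail_over_vips(owners, "racnode1", ["racnode2"]))
```

The function returns a new map rather than mutating the old one. Moving a VIP back to its home node is a separate step, matching the slide's note that this needs a reboot, an instance start or a node-application restart.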

Installing 9i and 10g together!

• With a little extra effort it can be done
• Install Cluster Manager (9i) first
• Stop GSD but keep ORACM running
• Now install 10g CRS + Oracle binaries
• Use srvctl from 9i to manage the 9i DB and srvctl from 10g to manage the 10g DB
• Very helpful for test & dev environments
• For details, refer to my technical note on OTN:
http://www.oracle.com/technology/pub/notes/technote_singh_crs.html

Interconnects

• Critical component of RAC
• Uses the private network for communication with other nodes
• Connected to the other nodes via a switch
• Enhanced interconnect technology has helped Cache Fusion
• Ideally: low latency, low CPU consumption and a high transfer rate (millions/second)
• E.g. Gigabit Ethernet, VIA … and UDP, TCP/IP…
• The alert file contains more info about the interconnect
• New wait events due to traffic over the interconnect
• The interconnect may use the public network (10.1.0.2)

Oracle Database

• Installation could be a challenge!
• Use DBCA or create the database manually
• Create the database manually at least once, to understand things better
• Configure listeners, Enterprise Manager ..
• Use srvctl to register the database in the OCR
• Check alert.log and v$resource_limit to learn more
• Explore $ORACLE_HOME/srvm/admin for some scripts

Some new Parameters

• Instance_number
• Thread
• Cluster_database
• Remote_listener
• Local_listener
• Cluster_database_instances
• Parallel_instance_group
• Instance_groups
• Cluster_interconnects

Cache Fusion Architecture

• Revolutionary concept: removes inefficiencies, enhances performance and scalability
• Use GC_FILES_TO_LOCKS for the old (pre-Cache Fusion) behavior
• Cache Fusion provides a single buffer cache through the interconnect
• Data dictionary and library cache are synced
• The Cache Fusion model uses:
- Global Cache Service (GCS)
- Global Enqueue Service (GES)
- Global Resource Directory (GRD)

Cache Fusion: Resource Modes and Roles

Lock mode: Null (N), Shared (S), Exclusive (X)
Lock role: Local (L), Global (G)
Past Image (PI): (1) PI exists, (0) no PI exists

The different combinations could be one of the following:

• NL0 Null Local: no past image
• SL0 Shared Local: no past image
• XL0 Exclusive Local: no past image
• NG0 Null Global: instance owns current block image
• SG0 Shared Global: instance owns current image
• XG0 Exclusive Global: instance owns current image
• NG1 Null Global: another instance owns the past image block
• SG1 Shared Global: another instance owns past image
• XG1 Exclusive Global: another instance owns past image
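The three-character notation can be decoded mechanically. The sketch below uses only the legend above: mode N/S/X, role L/G, PI flag 0/1. It rejects anything outside the nine listed combinations. The function and dictionary names are illustrative and are not an Oracle API.

```python
# Decode the Cache Fusion resource notation (e.g. "XG1") into lock mode,
# lock role and past-image flag, following the slide's legend.
MODES = {"N": "Null", "S": "Shared", "X": "Exclusive"}
ROLES = {"L": "Local", "G": "Global"}
PAST_IMAGE = {"0": False, "1": True}


def decode_resource(code: str) -> dict:
    """Split a code such as 'SG0' into mode, role and past-image components."""
    if len(code) != 3:
        raise ValueError(f"expected 3 characters, got {code!r}")
    mode, role, pi = code[0], code[1], code[2]
    if mode not in MODES or role not in ROLES or pi not in PAST_IMAGE:
        raise ValueError(f"unknown resource code {code!r}")
    if role == "L" and pi == "1":
        # The slide lists no local-role combination with a past image.
        raise ValueError(f"{code!r} is not one of the listed combinations")
    return {"mode": MODES[mode], "role": ROLES[role],
            "past_image": PAST_IMAGE[pi]}


if __name__ == "__main__":
    for c in ["NL0", "SL0", "XL0", "NG0", "SG0", "XG0", "NG1", "SG1", "XG1"]:
        print(c, decode_resource(c))
```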

Cache Fusion Model

• Global Resource Directory:
- in the SGA, maintained by GES and GCS
- used at instance recovery and cluster reconfiguration
- latest locations (nodes) of a block
- modes and roles of the block
• Enqueues: serialize access to any resource
• Past Image (PI): whenever a dirty block is sent to a remote cache using Cache Fusion, a copy is kept in the local cache (in case of failures)
• Dynamic remastering of resources enhances scalability: _lm_dynamic_remastering (false)
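A toy model of the past-image rule: when a dirty block is shipped to another instance, the sender keeps a copy as a PI. The Instance class, its fields and ship_current_block are hypothetical simplifications. They are not Oracle's internal structures, and the GCS messaging is reduced to a direct hand-off.

```python
# Toy model: shipping a dirty current block between instances leaves a
# past image (PI) behind on the sender. All names are hypothetical.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Instance:
    name: str
    current: Optional[dict] = None          # block image held in current mode
    past_images: List[dict] = field(default_factory=list)

    def update(self, value: str, scn: int) -> None:
        assert self.current is not None, "must hold the current block to update it"
        self.current = {"value": value, "scn": scn, "dirty": True}


def ship_current_block(holder: Instance, requester: Instance) -> None:
    """Transfer the current (possibly dirty) block from holder to requester."""
    block = holder.current
    assert block is not None, "holder has no current block"
    if block["dirty"]:
        # Keep a local copy so the change is not lost if the requester
        # fails before the block reaches disk.
        holder.past_images.append(dict(block))
    requester.current = dict(block)
    holder.current = None


if __name__ == "__main__":
    a, b = Instance("A"), Instance("B")
    a.current = {"value": "v0", "scn": 100, "dirty": False}
    a.update("v1", 101)        # A dirties the block
    ship_current_block(a, b)   # B asks for it to update it
    b.update("v2", 102)
    print(a.past_images)       # A still holds the v1 image as a PI
    print(b.current)
```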

Buffer States

• Buffers can have many states (gv$bh):
- e.g. free, xcur, scur, cr, read, mrec, irec, wri, pi
• Only one current copy per dba (data block address), but many in CR state
- limited by _db_block_max_cr_dba (6) and 10 cs
• Light_work_rule: applied when the cost of creating a CR block is high
• Fairness_down_converts: an X lock is converted to null depending upon _fairness_threshold (default 4)
• Check the values of light_works and fairness_down_converts in gv$bsp or gv$cr_block_server
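A toy buffer cache for the rule "one current copy per dba, but many in CR state". It assumes the CR cap equals the _db_block_max_cr_dba default of 6 quoted above. It also assumes that the oldest CR copy is discarded when the cap is reached; that eviction policy is invented for illustration.

```python
# Toy buffer cache: one current image per dba, a bounded set of CR images.
# The cap mirrors _db_block_max_cr_dba (6); the eviction policy is assumed.
from collections import defaultdict, deque
from typing import Deque, Dict

MAX_CR_PER_DBA = 6


class ToyBufferCache:
    def __init__(self, max_cr: int = MAX_CR_PER_DBA) -> None:
        self.max_cr = max_cr
        self.current: Dict[int, str] = {}                    # dba -> current image
        self.cr: Dict[int, Deque[str]] = defaultdict(deque)  # dba -> CR images, oldest first

    def set_current(self, dba: int, image: str) -> None:
        # A new current image replaces the old one: never two current copies.
        self.current[dba] = image

    def add_cr(self, dba: int, image: str) -> None:
        copies = self.cr[dba]
        if len(copies) >= self.max_cr:
            copies.popleft()  # assumed policy: drop the oldest CR copy
        copies.append(image)
```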

Background Processes

• LMON (GES monitor): monitors global enqueues and resources between nodes
- monitors LMD
- performs recovery in case of failure
- global enqueues calculated at startup
• LMD: manages enqueues and updates the GRD
- co-ordinates with remote LMDs
• DIAG: captures diagnostic data regarding process failures

Background Processes

• LMSn (GCS):
- up to 10 (9i) or 20 (10g), depending on the messaging (9i: cpu_count/4; 10g: gcs_server_processes)
- co-ordinates activity on data blocks
- ensures that updates are performed on the latest block
- builds a read-consistent image if required
- satisfies requests from remote LMS processes
- locates, prepares and transmits a data block
• LCK: handles non-Cache Fusion resource requests, such as dictionary cache and library cache requests

Cache Fusion: Read From Disk

Cache Fusion: Read to Read

Cache Fusion: Read to Write

Cache Fusion: Write to Write

Checkpoint / Writing to Disk

• At a log switch or checkpoint, the following actions take place:
- GCS notifies all nodes holding a PI of the block
- DBWR on the node with the most current PI performs the write to disk
- BWR (block written record) may not be the checkpoint node
- resources are modified, i.e. Global to Local
- the GRD is updated
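A toy version of the checkpoint step above. Among the instances that hold past images of one block, the holder of the most recent (highest-SCN) PI writes it. All holders are then told to discard their PIs. The function name and the dict layout are hypothetical, and the real DBWR/GCS protocol involves more messages than this.

```python
# Toy checkpoint for one block: the most recent PI is written,
# then every PI holder may free its PI buffers. Names are hypothetical.
from typing import Dict, List, Tuple


def checkpoint_block(pis: Dict[str, List[int]]) -> Tuple[str, int, List[str]]:
    """pis maps instance name -> SCNs of the past images it holds for one block.

    Returns (writer, scn_written, instances_told_to_discard_their_PIs).
    """
    holders = {inst: scns for inst, scns in pis.items() if scns}
    if not holders:
        raise ValueError("no past images to write")
    # The instance holding the highest-SCN PI performs the write.
    writer = max(holders, key=lambda inst: max(holders[inst]))
    scn = max(holders[writer])
    # Every other PI of this block is older, so all holders can discard theirs.
    return writer, scn, sorted(holders)


if __name__ == "__main__":
    print(checkpoint_block({"A": [101, 105], "B": [103], "C": []}))
```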

Transparent Application Failover

• Transparent to users; defined in tnsnames
• The service name is used instead of the SID
• A service can have many listeners and instances
• Supports load balancing and failover
• Currently, failover is not supported for JDBC
• Failover modes (defined in CONNECT_DATA):
- type: session, select, none (default)
- method: basic or preconnect
- retries and delay
• Load_balance: YES, NO, OFF, TRUE
• GV$SESSION: failover_type, failover_method, failed_over
• New connections depend upon node load, instance load and dispatchers (if MTS)
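The settings above live in a tnsnames.ora connect descriptor. This sketch generates one with the service name, address-level load balancing and failover, and a FAILOVER_MODE clause. The alias, service name, host names, port and the retries/delay values are hypothetical examples. METHOD = PRECONNECT also needs a BACKUP entry, which this sketch does not generate. Check the output against your Oracle Net documentation.

```python
# Generate a tnsnames.ora-style TAF entry. Alias, hosts, port and the
# retry/delay values are hypothetical examples.
from typing import List


def taf_descriptor(alias: str, service: str, hosts: List[str], port: int = 1521,
                   fo_type: str = "SELECT", fo_method: str = "BASIC",
                   retries: int = 20, delay: int = 5) -> str:
    if fo_type not in ("SESSION", "SELECT", "NONE"):
        raise ValueError("TYPE must be SESSION, SELECT or NONE")
    if fo_method not in ("BASIC", "PRECONNECT"):
        raise ValueError("METHOD must be BASIC or PRECONNECT")
    # One ADDRESS per node's listener; the client picks among them.
    addresses = "\n".join(
        f"      (ADDRESS = (PROTOCOL = TCP)(HOST = {h})(PORT = {port}))" for h in hosts
    )
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        "    (ADDRESS_LIST =\n"
        "      (LOAD_BALANCE = YES)\n"
        "      (FAILOVER = ON)\n"
        f"{addresses}\n"
        "    )\n"
        "    (CONNECT_DATA =\n"
        f"      (SERVICE_NAME = {service})\n"
        f"      (FAILOVER_MODE = (TYPE = {fo_type})(METHOD = {fo_method})"
        f"(RETRIES = {retries})(DELAY = {delay}))\n"
        "    )\n"
        "  )"
    )


if __name__ == "__main__":
    print(taf_descriptor("WMST", "wmst", ["racnode1", "racnode2"]))
```

After a failover, the failed_over column of GV$SESSION, listed above, shows whether a session has actually been moved.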

Instance Recovery in RAC

• Failure is detected by the Cluster Manager or CGS
• SMON completes the instance recovery
• The GRD is frozen
• All resources of the lost instance are redistributed
• GES resources are reconfigured first
• GCS recovery is done in two passes:
- a recovery set (list of blocks) is prepared and locks are requested on these blocks
- the blocks are recovered and made available
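A toy sketch of the two GCS recovery passes listed above. The redo record layout (scn, block_id, new_value) and the lock set are hypothetical simplifications. Pass 1 builds the recovery set from the failed instance's redo and claims those blocks. Pass 2 applies the changes in SCN order and makes the blocks available again.

```python
# Toy two-pass recovery of a failed instance's redo thread.
# Record layout and lock handling are hypothetical simplifications.
from typing import Dict, List, Set, Tuple

RedoRecord = Tuple[int, int, str]  # (scn, block_id, new_value)


def recover_instance(redo: List[RedoRecord], blocks: Dict[int, str],
                     locked: Set[int]) -> Set[int]:
    # Pass 1: build the recovery set and lock those blocks.
    recovery_set = {block_id for _, block_id, _ in redo}
    locked |= recovery_set  # other sessions must wait for these blocks
    # Pass 2: apply the redo in SCN order, then release the blocks.
    for _, block_id, value in sorted(redo):
        blocks[block_id] = value
    locked -= recovery_set
    return recovery_set


if __name__ == "__main__":
    blocks = {1: "a", 2: "b", 3: "c"}
    locked: Set[int] = set()
    print(recover_instance([(20, 2, "b2"), (10, 1, "a1"), (30, 2, "b3")], blocks, locked))
    print(blocks)
```

Only blocks named in the failed instance's redo are locked. Block 3 in the example stays available throughout, which is why the recovery set is built before any redo is applied.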

Performance Tuning

• All techniques used in single-instance databases are also applicable here
• The global cache is made up of multiple local caches
• Cache Fusion introduces additional waits
• Waits: gc_cr_request, gc_cache_busy, enqueues
• Statspack (level 7) needs to be executed on all nodes
• Level 7 identifies the segments causing interconnect traffic
• The Statspack report has cluster-specific statistics:
- GCS workload characteristics
- GES summary statistics
- GCS and GES messaging statistics
- GES detail statistics

Performance Tuning

• Calculate and monitor:
- current block average latency
- block mode conversions
- GCS request latency
- average global enqueue time
- average lock convert time
• Populate ges_convert_local and ges_convert_remote by enabling event 29700
• Set fast_start_mttr_target to reduce PI and CR
• If using 10g, use ADDM analysis
• ADDM is proactive and captures RAC-related service issues
• AWR reports: use awrrpti.sql for RAC
• Use gv$sysstat, gv$system_event, gv$...transfer, gv$bsp, gv$cr_block_server
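Block latency is a ratio of two cumulative statistics. The sketch assumes the 9i gv$sysstat statistic names shown below, with receive times reported in centiseconds (hence the factor of 10 to get milliseconds). In 10g the corresponding statistics are named "gc cr block receive time", "gc cr blocks received" and so on. The numbers are made-up sample values, not measurements.

```python
# Average global cache block receive latency (ms) from cumulative stats.
# Assumes 9i statistic names and times in centiseconds; values are samples.

def avg_latency_ms(receive_time_cs: float, blocks_received: float) -> float:
    """Average receive time per block in milliseconds; 0.0 if none received."""
    if blocks_received <= 0:
        return 0.0
    return 10.0 * receive_time_cs / blocks_received


stats = {
    "global cache cr block receive time": 1500,       # centiseconds
    "global cache cr blocks received": 6000,
    "global cache current block receive time": 900,   # centiseconds
    "global cache current blocks received": 3000,
}

cr_ms = avg_latency_ms(stats["global cache cr block receive time"],
                       stats["global cache cr blocks received"])
current_ms = avg_latency_ms(stats["global cache current block receive time"],
                            stats["global cache current blocks received"])
print(f"CR block latency: {cr_ms:.2f} ms, current block latency: {current_ms:.2f} ms")
# -> CR block latency: 2.50 ms, current block latency: 3.00 ms
```

The statistics are cumulative since instance startup, so in practice take the difference between two snapshots (as Statspack does) before dividing.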

Backup and Recovery

• RAC and RMAN integrate well
• Conceptually the same as single instance
• Archived logs should be on CFS and readable by all nodes
• RMAN connects to a single instance as the target
• Channels can be allocated on all nodes for performance
• RMAN will autolocate data files, control files and archived logs
• Recovery may be tricky; bring up only one instance

RMAN: Backups

• racnode2:oracle> rman target /
connected to target database: WMST (DBID=957552121)
RMAN> @bck.sh
• select inst_id, module from gv$session where module like 'back%';
INST_ID MODULE
2 backup incr datafile: ORA_DISK_2
1 backup incr datafile: ORA_DISK_1
2 rows selected
• The script bck.sh:
# two automatic disk channels, each connected to a different instance
configure device type disk parallelism 2;
configure channel 1 device type disk format '…..'
connect 'sys/XXXX@wmst1';
configure channel 2 device type disk format '…..'
connect 'sys/XXXX@wmst2';
backup incremental level 0 database;
# automatic (configured) channels are released by RMAN; no RELEASE CHANNEL needed

Reverse Key Indexes

• To preserve the B-tree structure, keys are inserted into specific blocks
• Many rows are inserted into this block
• Many nodes may insert into the same block
• OPS introduced reverse key indexes (RKI) to avoid this contention
• More prominent if sequences are used
• Including dbms_utility.current_instance in the key keeps each instance's inserts in separate leaf blocks, avoiding this contention
• Contention reduces scalability
• Contention may lead to GCS and GES waits
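Why reversing the key helps: consecutive sequence values share their leading bytes, so they all sort into the same rightmost leaf block. Reversing the bytes makes the leading byte differ from one value to the next. The sketch assumes keys are 4-byte big-endian integers. Oracle's internal NUMBER encoding is different, but byte reversal has the same spreading effect.

```python
# Sequential keys vs. byte-reversed keys (4-byte big-endian assumed).

def key_bytes(value: int) -> bytes:
    return value.to_bytes(4, "big")


def reverse_key(value: int) -> bytes:
    # Reverse the byte order, as a reverse key index does per column.
    return key_bytes(value)[::-1]


if __name__ == "__main__":
    for v in range(256, 260):  # e.g. values handed out by a sequence
        print(v, key_bytes(v).hex(), reverse_key(v).hex())
```

The normal keys 256 to 259 all begin with byte 00, while their reversed forms begin with 00, 01, 02 and 03, so concurrent inserts from several nodes land in different parts of the index. The trade-off is that range predicates on the key can no longer use the index order.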

Parallelism in RAC

• When dealing with large volumes of data
• When we have extra CPU and memory
• /*+ parallel (ps_item,2,2) */
• The QC (query coordinator) will always be on the initiating node
• Slaves can be on other nodes too
• Configured using instance_groups and parallel_instance_group

System Change Numbers

• Generated globally; unique across all instances
• Written to redo logs for synchronization
• Two schemes: Lamport / Broadcast_on_commit
• Depends on max_commit_propagation_delay (mcpd)
• Lamport (9i) / Latch_free SCN scheme 2 (10g):
- default, mcpd = 700 centiseconds
- no overhead; all messages contain the SCN
- not suitable for rapid and simultaneous update/query
• Broadcast_on_commit (mcpd = 0): some overhead, as on commit LGWR sends a message to the other nodes
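The idea behind the Lamport scheme can be shown with the textbook Lamport logical clock. Every inter-instance message carries the sender's SCN, and a receiver that is behind jumps ahead of it. The class below is that generic algorithm, offered as an analogy. It is not Oracle's SCN implementation, and InstanceClock is a hypothetical name.

```python
# Generic Lamport logical clock as an analogy for the Lamport SCN scheme.

class InstanceClock:
    def __init__(self, name: str, scn: int = 0) -> None:
        self.name = name
        self.scn = scn

    def local_event(self) -> int:
        """A local change (e.g. a commit) takes the next SCN."""
        self.scn += 1
        return self.scn

    def send(self) -> int:
        """Piggy-back the current SCN on an outgoing message."""
        return self.local_event()

    def receive(self, msg_scn: int) -> int:
        """Move past any SCN seen in an incoming message."""
        self.scn = max(self.scn, msg_scn) + 1
        return self.scn


if __name__ == "__main__":
    a, b = InstanceClock("A", 100), InstanceClock("B", 10)
    m = a.send()          # A's SCN travels with the message
    print(b.receive(m))   # B jumps ahead of A's SCN
```

Because SCNs only spread when messages flow, a quiet instance can lag behind for a while. That is why the slide calls the scheme unsuitable for rapid, simultaneous update/query across nodes, and why max_commit_propagation_delay bounds how long that lag may last.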

Freelist Group vs ASSM

• Primarily to avoid buffer busy waits
• Freelist groups are especially useful for RAC:
- reduce contention on the segment header
- space usage may not be very efficient
• Starting with 9i, ASSM may be a good solution:
- internal and automatic maintenance
- good when nodes are added
- not tunable; space overhead; slower for full table scans (FTS)
- contradictory results

Other Features

• LogMiner: no continuous_mine
• Enterprise Manager
• DBMS_JOB: instance is a parameter
• Checkpoints: local / global
• Log switches
• Crontabs
• Sequences: CACHE + NOORDER
• GV$ views: catclust.sql, inst_id
• External files, utl_file
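Why a CACHE + NOORDER sequence scales in RAC but does not hand out globally ordered values: each instance reserves its own range of numbers from the shared sequence and serves NEXTVAL locally until the range runs out. The class names are hypothetical. The cache size of 20 matches Oracle's default CACHE value.

```python
# Toy model of a cached, unordered sequence shared by several instances.

class SharedSequence:
    """Stands in for the sequence's high-water mark in the data dictionary."""

    def __init__(self, start: int = 1) -> None:
        self.next_free = start

    def reserve(self, cache: int) -> range:
        block = range(self.next_free, self.next_free + cache)
        self.next_free += cache
        return block


class InstanceSequenceCache:
    def __init__(self, seq: SharedSequence, cache: int = 20) -> None:
        self.seq, self.cache = seq, cache
        self.values = iter(())

    def nextval(self) -> int:
        try:
            return next(self.values)
        except StopIteration:
            # Local cache exhausted: reserve a fresh range from the shared sequence.
            self.values = iter(self.seq.reserve(self.cache))
            return next(self.values)


if __name__ == "__main__":
    seq = SharedSequence()
    inst1, inst2 = InstanceSequenceCache(seq), InstanceSequenceCache(seq)
    print([inst1.nextval(), inst2.nextval(), inst1.nextval()])  # -> [1, 21, 2]
```

Values stay unique, but calls interleaved across instances come back out of order (1, 21, 2). Only one trip to the shared sequence is needed per 20 values per instance, which is where the scalability comes from.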
Questions & Answers

Questions?
Thank You

Please complete the evaluation form


“Interesting Facts about RAC”
Paper # 227
Ashok Singh
Fastenal, Winona, MN 55987
asingh@fastenal.com
Disclaimer

The content provided as papers and presentations from the IOUG conferences is copyrighted by the authors of information, and has been licensed to the IOUG. It is only authorized for the personal use of IOUG members and IOUG conference attendees directly through the IOUG web site. Downloading the files, placing them on other web sites or sharing them with other individuals is prohibited.
