Data Guard Basics
Agenda
Site Failures
- Power failure
- Air conditioning failure
- Flooding
- Fire
- Storm damage
- Hurricane
- Earthquake
- Terrorism
- Sabotage
- Plane crash
- Planned maintenance
- HUMAN ERROR
[Diagram: redo shipped from the primary instance/database at Site 1 to the standby instance/database at Site 2]
Physical Standby
- Technology introduced in Oracle 7.2; marketed as Data Guard in Oracle 8.1.7 and above
- Standby is an identical copy of the primary database
- Redo changes are transported from the primary to the standby and applied on the standby (Redo Apply)
- Operations can be switched to the standby
  - Planned (switchover / switchback)
  - Unplanned (failover)
- Failover time depends on various factors
  - Rate of redo generation / size of redo logs
  - Redo transport / apply configuration
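As a minimal sketch, Redo Apply on a mounted physical standby is typically started with the following SQL*Plus command (run as SYSDBA on the standby):

```sql
-- On the standby (mounted, not open): start Redo Apply in the background
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```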
Logical Standby
- Introduced in Oracle 9.2
- Contains a subset of database objects
- Redo copied from the primary to the standby
- Changes converted into logical change records (LCRs)
- Logical change records applied on the standby (SQL Apply)
- Standby database can be opened for updates
  - Can modify propagated objects
  - Can create new indexes on propagated objects
- May need a larger system for a logical standby
  - LCR apply can be less efficient than redo apply
  - Array updates on the primary become single-row updates on the standby
Protection Modes
There are three protection modes:
- Maximum protection - zero data loss
  - Redo synchronously transported to the standby database
  - Redo must be applied to at least one standby before transactions on the primary can commit
  - Processing on the primary is suspended if no standby is available
- Maximum availability - minimal data loss
  - Similar to maximum protection mode
  - If no standby database is available, processing continues on the primary
- Maximum performance (default)
  - Redo asynchronously shipped to the standby database
  - If no standby database is available, processing continues on the primary
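A hedged example of changing the protection mode on the primary (the exact keyword matches the mode names above; SYNC redo transport must already be in place for the two stricter modes):

```sql
-- Run on the primary, normally while mounted;
-- also valid: MAXIMIZE PROTECTION / MAXIMIZE PERFORMANCE
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE AVAILABILITY;
```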
Redo Transport
- ARCH background process
  - Copies completed redo log files to the standby
- LGWR background process - modes are:
  - ASYNC - asynchronous
    - Oracle 10.1 and below: redo written by LGWR to a dedicated area in the SGA, read from the SGA by the LNSn background process
    - Oracle 10.2 and above: redo written by LGWR to local disk, read from disk by the LNSn background process
  - SYNC - synchronous
    - Redo written to the standby by LGWR - modes are:
      - AFFIRM - wait for confirmation that redo has been written to disk
      - NOAFFIRM - do not wait
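As a sketch of the corresponding configuration (10g-style syntax; the service name `stby` is a hypothetical Oracle Net alias for the standby):

```sql
-- Asynchronous shipping via LNSn, no disk-write confirmation
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR ASYNC NOAFFIRM'
  SCOPE=BOTH;
-- Synchronous alternative, required for the stricter protection modes:
-- ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby LGWR SYNC AFFIRM';
```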
[Diagram: ARCn redo transport - ARCn on the primary ships archived logs via LOG_ARCHIVE_DEST_n to RFS on the standby; MRP (physical) or LSP (logical) applies the changes]
[Diagram: LGWR ASYNC redo transport - LNSn ships redo to RFS on the standby while ARCn archives locally via LOG_ARCHIVE_DEST_1]
[Diagram: LGWR SYNC redo transport - LGWR passes redo to LNSn, which ships it synchronously to RFS on the standby]
Role Transitions
There are two types of role transition:
- Switchover
  - Planned failover to the standby database
  - Original primary becomes the new standby
  - Original standby becomes the new primary
  - No data loss
  - Can switch back at any time
- Failover
  - Unplanned failover to the standby database
  - Original standby becomes the new primary
  - Original primary may need to be rebuilt
  - Possible data loss
[Diagram: switchover - the primary and standby databases exchange roles]
[Diagram: failover - the primary database becomes unavailable and the standby database becomes the new primary]
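The two role transitions can be sketched in SQL*Plus as follows (10g-style physical standby commands; order matters, primary first):

```sql
-- Switchover:
-- On the primary:
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;
-- On the old standby, once end-of-redo has been received and applied:
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

-- Failover (possible data loss), on the standby:
-- ALTER DATABASE RECOVER MANAGED STANDBY DATABASE FINISH;
-- ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;
```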
Read-Only Mode
- Physical standby database can be opened in read-only mode
  - (Managed) recovery must be suspended
- Reports can use temporary tablespaces
  - Sorts
  - Temporary tables
- Reports cannot modify permanent objects
- Failover times may be affected
  - Suspended redo must be applied
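A minimal sketch of the read-only cycle on the standby:

```sql
-- Pause Redo Apply and open read-only for reporting
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE OPEN READ ONLY;
-- When reporting is finished: restart in MOUNT mode, then resume apply
-- ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;
```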
Delayed Redo Application
- A delay in redo application can be configured
- Redo is transported immediately
  - Provides protection against site failure
- Redo is not applied immediately
  - Provides protection against human error
  - Increases potential failover times
In Oracle 10.1 and above, flashback database can be used as an alternative to delayed redo application.
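As a sketch, the delay is set with the DELAY attribute of the archive destination (value in minutes; `stby` is a hypothetical service name), and the flashback alternative is enabled on the standby:

```sql
-- Ship immediately, but apply on the standby only after 4 hours
ALTER SYSTEM SET log_archive_dest_2 = 'SERVICE=stby DELAY=240' SCOPE=BOTH;
-- Or, from 10.1, rely on flashback database instead of a delay:
-- ALTER DATABASE FLASHBACK ON;    -- while mounted, on the standby
```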
Data Guard Broker
- Introduced in Oracle 9.2; stable in Oracle 10.2 and above
- Managed using the DGMGRL utility
- Contains the Data Guard configuration
- Adds a layer of complexity
- Used by Enterprise Manager to manage the standby
- Mandatory for some new functionality, e.g. Fast-Start Failover
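A hedged DGMGRL sketch for creating a broker configuration (`prim` and `stby` are hypothetical DB_UNIQUE_NAMEs / connect identifiers):

```sql
DGMGRL> CONNECT sys@prim
DGMGRL> CREATE CONFIGURATION 'dgconfig' AS
          PRIMARY DATABASE IS 'prim' CONNECT IDENTIFIER IS prim;
DGMGRL> ADD DATABASE 'stby' AS CONNECT IDENTIFIER IS stby
          MAINTAINED AS PHYSICAL;
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
```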
[Diagram: Fast-Start Failover - primary database on Node 1 at Site1, standby database on Node 2 at Site2, observer at Site3]
Fast-Start Failover
- Detects failure of the primary database
- Automatically fails over to a nominated standby database
- Requirements include:
  - Flashback logging must be configured
  - DGMGRL must be used
  - Observer process running at a third, independent site
    - Highly available in Oracle 11.1 and above
  - MAXIMUM AVAILABILITY protection mode
    - Standby database archive log destination must be configured as LGWR SYNC
  - MAXIMUM PERFORMANCE protection mode
    - Oracle 11.1 and above
- Primary database can potentially be reinstated automatically
  - Using flashback logs
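A hedged DGMGRL sketch for enabling Fast-Start Failover once a broker configuration exists (database names and threshold value are assumptions):

```sql
DGMGRL> EDIT DATABASE 'prim' SET PROPERTY FastStartFailoverTarget = 'stby';
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;   -- run from the third, independent site
```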
Advantages
- No interconnect network required between sites
- No storage network required between sites
- RAC licences not required if each site is single-instance
Disadvantages
- Active / passive
- Requires an Enterprise Edition licence
- Remaining infrastructure must also fail over
  - Network
  - Application tier
  - Clients
Snapshot Standby
- Standby can be converted to a snapshot standby
- Can be opened in read-write mode (for testing)
- Redo transport continues; redo apply is delayed
- Standby can subsequently be converted back to a physical standby
Active Data Guard
- Separately licensed option
- Updates applied to the primary can be read immediately on the standby databases
- Standby database can be opened in read-only mode while redo continues to be applied
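Both 11g features above reduce to a few statements on the standby; a sketch:

```sql
-- Snapshot standby round trip (standby mounted):
ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;   -- open read-write for testing
-- ... run tests, then discard changes and resume redo apply:
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;

-- Active Data Guard (separately licensed): open read-only,
-- then restart Redo Apply with real-time apply
-- ALTER DATABASE OPEN READ ONLY;
-- ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
--   USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```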
Licensing
- Standby database nodes must be fully licensed
- Same metric as the primary (named user, CPU, etc.)
Standard Edition
- Cannot use Data Guard
- Use user-defined scripts to transport redo
- Use automatic recovery to apply redo
- Manually resolve archive log gaps
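A hypothetical sketch of such a user-defined script; the hostnames, paths and heredoc are assumptions, not a tested or complete solution:

```bash
#!/bin/sh
# Ship archived logs to a Standard Edition "standby" and apply them.
ARCH_DIR=/u01/arch          # hypothetical archive log destination
STANDBY=standby-host        # hypothetical standby hostname
# Copy any new archived logs to the standby host
rsync -a "$ARCH_DIR"/ "$STANDBY":"$ARCH_DIR"/
# Apply them on the standby using SQL*Plus automatic recovery
ssh "$STANDBY" "sqlplus -s / as sysdba <<'EOF'
RECOVER AUTOMATIC STANDBY DATABASE;
CANCEL
EOF"
```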
Enterprise Edition
- Use managed recovery to apply redo
- Use Fetch Archive Log (FAL) to resolve archive log gaps
- Additional licences required for Active Data Guard
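Gap resolution via FAL is configured on the standby with two initialization parameters; a sketch with hypothetical service names:

```sql
-- On the standby: where to fetch missing archived logs from,
-- and which destination they should be sent back to
ALTER SYSTEM SET fal_server = 'prim' SCOPE=BOTH;
ALTER SYSTEM SET fal_client = 'stby' SCOPE=BOTH;
```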
SAN-level replication technologies
- NetApp SnapMirror, MetroCluster
- EMC SRDF, MirrorView
- HP StorageWorks
Redo log replication technologies
- Quest SharePlex
- Many sites run physical standbys
  - Well-proven technology
  - Spare capacity on the standby is often used for development or testing during normal operations
- Relatively few sites run a logical standby
  - Streams is much more popular
- Many sites enable flashback logging
  - In both development and production environments
- Very few sites use automatic failover
- Very few sites are working with Oracle 11g yet
  - Consequently, none are using Active Data Guard
Failover times
- Normally dependent on management decisions
  - Usually some investigation before failover
- Time to fail over the database is minimal (5-10 minutes)
- Time to fail over the infrastructure can be hours
  - Network configuration
  - DNS
  - Application / web servers
  - Clients
- Failover SLAs are often up to 48 hours
Rebuild times
- Can take minutes using flashback logging
- Can take much longer depending on the reason for the failover