Oow2018 Plannedmaintenance

The document discusses strategies for achieving Continuous Availability in Oracle databases, emphasizing that maintenance can occur without impacting users. It outlines the importance of proper configuration, automation of maintenance tasks, and the use of services for location transparency to ensure seamless operations. Key techniques include draining sessions, utilizing application continuity, and automating maintenance processes to minimize downtime and errors during scheduled maintenance events.

No Outages:

Maintenance without Impact

Troy Anthony, Ian Cookson, Carol Colrain


Oracle Database Development
October, 2018

Copyright © 2018, Oracle and/or its affiliates. All rights reserved. |


Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle’s products remains at the sole discretion of Oracle.



Program Agenda

1 Continuous Availability
2 Are your building blocks in place?
3 Automating Maintenance
4 Patching OJVM
5 Customer Story - Epsilon



From High Availability to Continuous Availability
High Availability | Continuous Availability
• Minimizes downtime | • No downtime for users
• In-flight work is lost | • In-flight work is preserved
• Rolling maintenance at DB | • Maintenance is hidden
• Predictable runtime performance | • Predictable performance
• Errors may be visible | • Errors visible only if unrecoverable
• Designed for single failure | • Designed for multiple failures
• Basic HA building blocks | • Builds on top of HA

Continuous Availability
Continuous Availability is not Absolute Availability.
Probable outages and maintenance events at the database level are masked from the
application, which continues to operate with no errors and within the specified response
time objectives while processing these events.
Key points:
• Planned maintenance and likely unplanned outages are hidden from applications
• There is neither data loss nor data inconsistency, guaranteed
• Majority of work (% varies by customer) completes within recovery time SLA
• May appear as a slightly delayed execution

Many customers are achieving Continuous Availability today.


No Outages: Maintenance without Impact
There is no reason for users to see downtime during scheduled database maintenance.
• Service is unavailable
• Application owners unable to agree on maintenance windows
• Long running jobs see errors
• DBAs and engineers work off hours
• Application and middleware components need to be restarted

Example: Family Holidays website is down
"Our online transaction services are currently unavailable. Our server may be temporarily down or we may be performing routine maintenance functions scheduled every Sunday from 12 a.m. to 5 a.m. (Eastern Standard Time). We apologize for any inconvenience."


How Do We Solve This?

• Move work to different instance/database with no errors reported to applications
• Transparent to applications and mid-tiers
• Support all server side maintenance
– Patches, PSUs, upgrades, repairs, changes, unplug/plug, migration, expansion, h/w replacement
• Configure once, same for all commands


Are Your Building Blocks in Place?



What Needs to be Configured?

Which Server Stack for me? | RAC or RAC One, Active Data Guard, GoldenGate, GDS
Flex ASM | All databases on Flex ASM
Services | Services for Location Transparency
Continuous Connections | Connections appear continuous
Draining | FAN or 18c Database for draining
In-flight work continues | Application Continuity
SLAs | Drain in a timely manner


Draining and Failover Locally – Switchover between sites
Production Site
Fast Application Notification
– Notify draining, failover, load balancing
Transparent Application Continuity
– Application HA
RAC
– Online Rolling Maintenance
– Scalability
– Server HA
RAC One
– Online Rolling Maintenance
– Server HA
Global Data Services
– Cross Site Placement

Between sites
Active Data Guard
– Scheduled switchover
– Data Protection, DR
– Query Offload
Data Guard
– Scheduled switchover
– Data Protection, DR
GoldenGate
– Scheduled switchover
– Active-active replication
– Heterogeneous
Sharding
– Massive OLTP

[Diagram: drain within RAC at each site; switchover between sites with ADG]


TIP: Use Services for Location Transparency
Services provide a “dial in number” for your application
• Regardless of location, application keeps the name
• Moving, reshaping, prioritizing controls how a service is offered
• Batch and OLTP separated
• DB and PDB names for admin only

[Diagram: RAC instances on Node 1 and Node 2 offering the OLTP service and the Batch service]


TIP: Drain sessions before maintenance

Drain connections where applications do not notice
• Web requests
• Transaction boundaries
• Connection tests
• Connection pools
• Custom rules


Moving Work Since Oracle Database 10.2
Repeat for each service, allowing time to drain
• Stop service
srvctl stop service -db .. -instance .. -service ..

• Relocate service
srvctl relocate service -db .. -service .. -oldinst .. -newinst ..
srvctl relocate service -db .. -service .. -currentnode .. -targetnode ..

Wait for sessions to drain…

For remaining sessions, stop transactional
exec DBMS_SERVICE.disconnect_session('… your service ..', DBMS_SERVICE.POST_TRANSACTION);

Stop the instances using your preferred tool
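The relocate-then-wait sequence above can be scripted. The following is a minimal Python sketch, not a definitive implementation: it only builds the `srvctl relocate service` argument list using the flags shown on the slide, and it polls a caller-supplied `count_sessions` function, which is a hypothetical hook (for example, a query you write against your own session view). The names `orcl`, `gold`, `orcl1` and `orcl2` are placeholder values.

```python
import shlex
import time
from typing import Callable, List


def relocate_command(db: str, service: str, old_inst: str, new_inst: str) -> List[str]:
    """Build the srvctl relocate call from the slide as an argument list."""
    return ["srvctl", "relocate", "service", "-db", db, "-service", service,
            "-oldinst", old_inst, "-newinst", new_inst]


def wait_for_drain(count_sessions: Callable[[], int], timeout_s: float,
                   poll_s: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until no sessions remain or the timeout passes.

    count_sessions is a hypothetical hook supplied by the caller.
    Returns the number of sessions still connected when polling stops.
    """
    waited = 0.0
    remaining = count_sessions()
    while remaining > 0 and waited < timeout_s:
        sleep(poll_s)
        waited += poll_s
        remaining = count_sessions()
    return remaining


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run on a host where srvctl is installed.
    print(shlex.join(relocate_command("orcl", "gold", "orcl1", "orcl2")))
```

If `wait_for_drain` returns a non-zero count, the remaining sessions are the ones the slide handles with the transactional disconnect step.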
TIP: Use Oracle Pools – Full Lifecycle
Drain and Rebalance

Applications using …
• Oracle: WebLogic Active GridLink, UCP, ODP.NET managed and unmanaged, OCI Session Pool, Tuxedo, CMAN TDM
• 3rd party App Servers using UCP: IBM WebSphere, Apache Tomcat, NEC WebOTX, Red Hat JBoss, Spring

DBA Step
srvctl [relocate|stop] service (no -force)

Sessions Drain
• Immediately: new work is redirected
• Gradually: active sessions are released when returned to pools

[Diagram: FAN Planned event triggers drain & rebalance]


TIP: UCP with other Java-based Application Servers
A simple data source replacement

• IBM WebSphere
• Apache Tomcat
• NEC WebOTX
• Red Hat JBoss
• Spring



TIP: Configure TNS for Continuous Connections
Everything Oracle uses FAN
• JDBC Universal Connection Pool
• OCI/OCCI driver
• ODP.NET Unmanaged Provider (OCI)
• ODP.NET Managed Provider (C#)
• OCI Session Pool
• WebLogic Active GridLink
• Tuxedo
• JDBC Thin Driver
• CMAN in Traffic Director mode

FAN is Auto-Configured
(DESCRIPTION =
  (CONNECT_TIMEOUT=90)(RETRY_COUNT=20)(RETRY_DELAY=3)
  (TRANSPORT_CONNECT_TIMEOUT=3)
  (ADDRESS_LIST =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST=primary-scan)(PORT=1521)))
  (ADDRESS_LIST =
    (LOAD_BALANCE=on)
    (ADDRESS = (PROTOCOL = TCP)(HOST=standby-scan)(PORT=1521)))
  (CONNECT_DATA=(SERVICE_NAME=gold)))
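When the same descriptor is needed for several services, it can be generated rather than hand-edited. This is a small Python sketch under stated assumptions: the function name `connect_descriptor` and its parameters are illustrative, and the default values are simply the ones shown on the slide, not tuning advice.

```python
from typing import Sequence


def connect_descriptor(service: str, scan_hosts: Sequence[str], port: int = 1521,
                       connect_timeout: int = 90, retry_count: int = 20,
                       retry_delay: int = 3,
                       transport_connect_timeout: int = 3) -> str:
    """Render a connect descriptor with one ADDRESS_LIST per SCAN host."""
    address_lists = "".join(
        "(ADDRESS_LIST=(LOAD_BALANCE=on)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port})))"
        for host in scan_hosts
    )
    return (
        "(DESCRIPTION="
        f"(CONNECT_TIMEOUT={connect_timeout})(RETRY_COUNT={retry_count})"
        f"(RETRY_DELAY={retry_delay})"
        f"(TRANSPORT_CONNECT_TIMEOUT={transport_connect_timeout})"
        f"{address_lists}"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


if __name__ == "__main__":
    print(connect_descriptor("gold", ["primary-scan", "standby-scan"]))
```

Listing the primary SCAN first and the standby SCAN second mirrors the slide: connection attempts retry across both sites, so the same descriptor keeps working after a switchover.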


TIP: Are You Receiving FAN Notifications?
• Create a FAN callout in GRID_HOME/racg/usrco
• Download FANwatcher from OTN oracle.com/goto/ac

FANwatcher
..
VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb2 instance=orcl1 database=orcl
db_domain= host=sun01 status=down reason=USER timestamp=2014-07-30 12:02:51 timezone=-07:00
VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb10 instance=orcl1 database=orcl
db_domain= host=sun01 status=down reason=USER timestamp=2014-07-30 12:02:52 timezone=-07:00
VERSION=1.0 event_type=SERVICE service=orcl_swing_pdb10 database=orcl db_domain= host=sun01
status=down reason=USER
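A callout or test harness usually needs these payloads as structured fields. The sketch below is one way to parse the key=value format shown above in Python; the function names are illustrative, and treating `status=down` with `reason=USER` as a planned (user-initiated) event is an assumption based on the sample events here rather than a complete reading of the FAN specification.

```python
import re
from typing import Dict

# A key, "=", then a value that runs until the next " key=" or end of text.
# The value may be empty (db_domain=) or contain spaces (timestamp=...).
_PAIR = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|\s*$)")


def parse_fan_event(payload: str) -> Dict[str, str]:
    """Split a FAN payload into a dict of its key=value fields."""
    one_line = " ".join(payload.split())  # undo line wrapping
    return {m.group(1): m.group(2) for m in _PAIR.finditer(one_line)}


def is_planned_down(event: Dict[str, str]) -> bool:
    """True for a down event that was initiated by a user (assumed planned)."""
    return event.get("status") == "down" and event.get("reason", "").upper() == "USER"


if __name__ == "__main__":
    sample = (
        "VERSION=1.0 event_type=SERVICEMEMBER service=orcl_swing_pdb2 "
        "instance=orcl1 database=orcl db_domain= host=sun01 status=down "
        "reason=USER timestamp=2014-07-30 12:02:51 timezone=-07:00"
    )
    event = parse_fan_event(sample)
    print(event["service"], event["timestamp"], is_planned_down(event))
```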



Draining by the Oracle Database
Connection Tests, All Drivers

Plan B
• Stop or relocate services/PDBs
• Database marks sessions to drain
• Application or mid-tier tests connections: the application “tests” the connection
• Database responds connection is bad
• New work continues on another connection

Tip: enable in DBMS_APP_CONT, view in DBA_CONNECTION_TESTS
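The mid-tier side of this flow is a test-on-borrow check. The following Python sketch shows the idea only: `BorrowTestingPool`, `FakeConnection` and `is_usable` are hypothetical stand-ins, loosely analogous to the `isUsable` style tests that application servers run, and no real driver API is used.

```python
from collections import deque
from typing import Callable, Deque, Iterable


class FakeConnection:
    """Stand-in for a driver connection; 'usable' mimics the test result."""

    def __init__(self, usable: bool) -> None:
        self.usable = usable

    def is_usable(self) -> bool:
        return self.usable


class BorrowTestingPool:
    """Tests each idle connection before handing it to the application."""

    def __init__(self, idle: Iterable[FakeConnection],
                 connect: Callable[[], FakeConnection]) -> None:
        self._idle: Deque[FakeConnection] = deque(idle)
        self._connect = connect
        self.discarded = 0

    def borrow(self) -> FakeConnection:
        while self._idle:
            conn = self._idle.popleft()
            if conn.is_usable():
                return conn
            # The test failed (for example, the session was marked to drain):
            # drop the connection and try the next one.
            self.discarded += 1
        # Nothing usable is idle, so open a new connection; with services
        # relocated, this lands on a surviving instance.
        return self._connect()


if __name__ == "__main__":
    pool = BorrowTestingPool([FakeConnection(False), FakeConnection(True)],
                             connect=lambda: FakeConnection(True))
    conn = pool.borrow()
    print(conn.is_usable(), pool.discarded)
```

The application never sees the failed test: the bad connection is discarded inside `borrow`, which is why draining at a connection test is invisible to users.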
TIP: Enable Connection Tests for Application Servers
Application Server | Test Name | Connection Test to DB
Oracle WebLogic – Generic & Multi data sources | TestConnectionsOnReserve | isUsable
Oracle WebLogic – Generic & Multi data sources | TestConnectionsOnCreate | SQL – SELECT 1 FROM DUAL
Oracle WebLogic Active GridLink | embedded | isUsable
IBM WebSphere | PreTest Connections | SQL – SELECT 1 FROM DUAL
Red Hat JBoss | check-valid-connection-sql | SQL – SELECT COUNT(*) FROM DUAL
Apache Tomcat | TestonBorrow, TestonRelease | SQL – SELECT 1 FROM DUAL
ODP.NET Unmanaged | Connection.status | OCI_ATTR_SERVER_STATUS



TIP: Enable Connection Tests for Applications
Application | Condition | Connection Test to DB
eBusiness Suite | Connection borrowed from WebLogic | TestConnectionsOnReserve with "BEGIN NULL; END;"
Fusion Applications | Connection returned to WebLogic and C++ pools and checked | TestConnectionsOnReserve with isValid; OCIPing; OCI_ATTR_SERVER_STATUS
Siebel | Connection requested | OCI_ATTR_SERVER_STATUS
PeopleSoft | Connection requested | OCIPing
Customer | Custom pool with metadata table; checks status every 60s | OCI_ATTR_SERVER_STATUS



Draining by the Oracle Database and Drivers
End of Request

Plan B continued
• Stop or relocate services/PDBs
• Database & driver mark sessions to drain
• Application or mid-tier ends a request: request boundaries are sent by all Oracle 12c pools & JDK9
• 18c Database or the 18c drivers close the connection after end request
• New work continues on another connection

Tip: enable in DBMS_APP_CONT, view in DBA_CONNECTION_TESTS
DBA Operations Simplified

• Group operations: pdb, instance, node, or database
• New service attributes:
– drain_timeout (seconds)
– stopoption (immediate, transactional)


DBA commands: Group commands
Relocate all services by database/node/pdb
srvctl relocate service -database .. -instance .. -drain_timeout .. -stopoption [immediate|transactional]
srvctl relocate service -node .. -drain_timeout .. -stopoption ..
srvctl stop service -pdb .. -drain_timeout .. -stopoption ..

Start/Stop everything at a node
srvctl stop service -node <node_name> -drain_timeout .. -stopoption ..
srvctl stop database -node <node_name> -drain_timeout .. -stopoption ..

Data Guard Switchover
Switchover to <db_resource_name> [wait [xx]];


Drain… Connect… Failover
[Timeline diagram: TPM on Node 1 and Node 2 over time, with the drain timeout marked]
(1) Drain Node 1
(2) Connect to Node 2
(3) Terminate in-flight requests (at the drain timeout)
(4) Replay in-flight requests
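The four steps can be read as a simple rule: a request that finishes within the drain timeout drains cleanly, and one that does not is terminated and then replayed on the other node. The Python sketch below illustrates that rule only; the request names and durations are made up, and `plan_drain` is a hypothetical helper, not part of any Oracle tool.

```python
from typing import Dict, List


def plan_drain(remaining_s: Dict[str, float],
               drain_timeout_s: float) -> Dict[str, List[str]]:
    """Classify in-flight requests by whether they finish before the timeout.

    remaining_s maps a request name to the seconds it still needs.
    """
    drained = sorted(name for name, secs in remaining_s.items()
                     if secs <= drain_timeout_s)
    replayed = sorted(name for name, secs in remaining_s.items()
                      if secs > drain_timeout_s)
    return {"drained": drained, "terminated_then_replayed": replayed}


if __name__ == "__main__":
    in_flight = {"web-1": 2, "batch-3": 90, "report-7": 400}
    print(plan_drain(in_flight, drain_timeout_s=120))
```

Choosing the drain timeout is therefore a trade-off: a longer timeout lets more requests finish on their own, while a shorter one relies more on replay to hide the termination.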


My application has not drained on time
TIP: Use Transparent Application Continuity 18c
• Hides errors, timeouts, and maintenance
• No application knowledge or changes to use
• Rebuilds session state & in-flight transactions
• Adapts as applications change: protected for the future

On 12c, use Application Continuity

[Diagram: database request with errors/timeouts hidden by TAC]
TIP: Some Applications have Built-in Read-Only Failover
Example Applications
• Siebel
• PeopleSoft
• JD Edwards
• Informatica

Set Once and Done
• Recommended TNS + STOPOPTION = Transactional, Drain_Timeout = n
• FAN is autoconfigured for unplanned
If you have TAF SELECT now
TIP: TAF SELECT with Transaction Guard – 12.2

With stopoption transactional:
• COMMIT_OUTCOME=true for Transaction Guard
• FAILOVER_RESTORE = LEVEL1

For applications that do not change session state after connect

[Diagram: database request disconnects; TAF + TG restore a new session + common states]


Align Application Timeouts
TIP: Nothing to do when Active/Active RAC
TIP: Application Timeout ≥ Drain + Switch to DG

[Diagram: on the RAC Primary, stop or relocate service and drain work; DG Broker switchover to the RAC Standby; work drains to the switchover]


Automating Maintenance



Improving the Maintenance Timeline

Timeline phases: Preparation, then Maintenance Window

Eliminate Steps
Eliminating unnecessary steps is the best solution whenever possible, such as backing up the existing home.

No Risk
Doing things outside the maintenance window is the preferred solution for necessary steps such as:
• real-time checks
• create gold image
• out of place distribution

Performance Risk
Doing things during this phase is acceptable for any necessary steps that cannot be done ahead of time, e.g.:
• final pre-check
• restart on new home
• switch service
• drain service

Shrink as much as possible
Move tasks to the left


Software Maintenance in the Time of Cloud
[Diagram: multiple datacenters, a proprietary cloud and an OpenStack cloud]

Challenges
• Complex, error-prone process
• Maintaining standardization
• Minimizing downtime
• Takes too much time and effort


Rapid Home Provisioning
[Diagram: an RHP Server holds a Gold Image Repository (11.2.0.4, 12.1.0.2, 12.2.0.1, 18c) and provisions a local target and connected targets (11.2, 12.1, 12.2, 18c)]
RHP Support
Database & Grid Infrastructure
• Versions: 11.2.0.3, 11.2.0.4, 12.1.0.2, 12.2.0.1, 18c
• Single Instance
• Oracle Restart
• Oracle RAC One
• Oracle RAC
• Non-CDB, CDB/PDB

Rapid Home Provisioning
• Generic Software
• Multi-OS
• Data Guard Aware
• Customizable

[Diagram: bare metal (BM) and virtual machine (VM) targets]


OJVM Patching



OJVM PSU – Flow chart

RAC Rolling Install Process for the "Oracle JavaVM Component Database PSU" (OJVM PSU) Patches (Doc ID 2217053.1)



TIP: OJVM Rolling
An Automated Approach to patching the OJVM

• RAC will coordinate the apply of the OJVM patch across all nodes in a rolling fashion
• Non-OJVM work not affected
• RAC tracks sessions that use OJVM in the Database
• Phased approach: prepare, execute update, post-update
• During the execute update phase OJVM work is blocked
– < 10 seconds brownout

[Diagram: RAC rolling; nodes move from Prepared to Exec Update; OJVM work is blocked while non-OJVM work continues]
©2014 Epsilon. Private & Confidential

Hiding Scheduled Maintenance at Epsilon

Epsilon at a Glance

• Epsilon is comprised of three divisions that deliver a global platform for customer experience marketing
• More than 3,500 associates and 37 offices worldwide
• Largest permission-based e-mailer in the world, delivering over 40 billion emails annually
• World’s leading source of data with information covering over 250 million consumers and 22 million businesses
• More than 2,000 global clients, including 26 of the Fortune 100

Clients include:
• 9 out of 10 Top Banks
• 8 out of 10 Top Retailers
• The Top 10 Pharmaceutical Companies
• The Top 10 Automotive Companies

©2014 Epsilon Data Management, LLC. Private & Confidential
High Level Business Requirements

• New client opportunity with extreme performance and availability requirements


• Real time POS integration with over 10000 sites
• Decrease time to market
• Real-time monitoring and reporting of system performance and health
• Need to run OLTP, batch and reporting workload concurrently against real time
data without impacting user experience
• Support over 2000 real-time transactions per second with SLA of less than 100 ms
and 99.95% availability
• Less than 8 hours RPO and RTO
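To make the availability and throughput targets concrete, the short Python sketch below converts them into budgets. It is plain arithmetic on the figures stated above; the function names are illustrative, and the in-flight estimate uses Little's law (concurrency = arrival rate × response time) as a rough upper bound at the SLA limit, which is an added assumption rather than a figure from the case study.

```python
def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = 365 * 24) -> float:
    """Hours of downtime permitted per period at the given availability."""
    return period_hours * (1 - availability_pct / 100)


def requests_in_flight(tps: float, response_time_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return tps * response_time_s


if __name__ == "__main__":
    # 99.95% over a year leaves roughly 4.38 hours for all outages combined.
    print(round(allowed_downtime_hours(99.95), 2))
    # 2000 TPS at the 100 ms limit means about 200 requests in flight at once.
    print(round(requests_in_flight(2000, 0.100)))
```

A budget of a few hours per year is quickly consumed by conventional maintenance windows, which is why hiding scheduled maintenance matters for this SLA.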

Previous State and Challenges
Previous state
• Dedicated connection model from application server to database
• Application was using ODAC 11gR5
• Database hardware platform using HP DL980 server with PCIe Flash storage and Hitachi storage array
• Database version 11gR2 (11.2.0.3)

Challenges
• No draining was available using dedicated connection model
• Every planned maintenance/unplanned outage needs application layer restart – a major pain point
• Every application executable needs FAN notification port in odp.net
• Due to the large number of dedicated connections, application server CPU utilization was high
• No support for commit outcome in case of failure in 11g
• Running mixed workload was not very user friendly on non-engineered system, as there is no IO prioritization in a traditional storage array
• Premier support of 11.2.0.3 was coming to an end (Aug 2015)

Current State and Resolutions
Current state
• Used connection pool instead of dedicated connection
• Application is now using ODAC 12cR3
• Database hardware platform using Exadata X4-2 half rack
• Database version 12cR1 (12.1.0.2 BP10)

Resolutions
• Connection pool drains quickly after receiving FAN events
• Application server restart is no longer required for planned maintenance or unplanned outage of the Oracle stack – a big relief
• Only one port required for ONS remote communication (6200) in 12c instead of many ports in 11g
• CPU utilization reduced by 40% in application servers after connection pool implementation
• Fast Application Notification and Application Continuity provide a robust error handling mechanism
• Exadata is an ideal platform for running mixed workload, with both CPU and IO prioritization handling
• Moved to a supported release good for next 3 years

Technology Stack

• Exadata X4-2 half rack for both Primary and DR site
• Web servers
• Application server uses:
– ODP.NET (ODAC 12cR3)
– WebLogic Server using Active GridLink
• Oracle Database 12c (12.1.0.2)
– Real Application Clusters (RAC)
– Data Guard
– Fast Application Notification (FAN)
– Application Continuity (AC)
– Transaction Guard (TG)
• Database backup uses ZFS Backup Appliance (ZS3-BA)

Normal Operation : Application service placement

[Diagram: Primary Site and Standby Site. Each site has a web server cluster that makes web service calls to an application server cluster (ODAC 12cR3; OLTP, Batch and Read-only workloads) and a two-node RAC 12.1.0.2 database (Primary DB, Standby DB). The OLTP, Batch and Report services are placed on the RAC nodes.]

Scheduled maintenance: Application service placement

[Diagrams, two successive slides: the same Primary Site and Standby Site layout during scheduled maintenance. Fast Application Notification is sent to the application server clusters (ODAC 12cR3) as the OLTP, Batch and Report services are moved between RAC Node 1 and RAC Node 2 (12.1.0.2) of the Primary DB.]

Case study – Transaction response time
[Chart not recoverable from the extracted text]

Case study – Overall transaction response time
[Chart not recoverable from the extracted text]

Business Benefits

• Scheduled maintenance of Oracle technology stack can be done without disrupting business user experience.
• Application restart is no longer required for scheduled maintenance, which is a major relief.
• Usage of connection pool reduces CPU utilization of middle tier servers by 40%
• Database and operating system can be patched periodically without taking any system downtime (meets security compliance as well as uptime SLA)

We have similar success for Java based applications using WebLogic Server Active GridLink to hide scheduled maintenance operations without any application code change.

More Successes

• Application Continuity for ODP.NET to hide unplanned outages and scheduled maintenance without any application code change
• Application Continuity for Java with WebLogic Server Active GridLink to hide unplanned outages and scheduled maintenance without any application code change
