Wp-Site Recovery With Nakivo Backup Replication
Site Recovery: DR Automation and Orchestration
White Paper
[Figure: company outcomes after a disaster. Surviving labels: 6% Successfully Recover, 51%, 100%, Never Reopen]
The larger the company, the costlier the downtime and, thus, the greater the importance
of preparedness for unforeseen circumstances ranging from natural disasters to simple
hardware failures. Knowing how to minimize (or even prevent) the negative impacts of such
disaster scenarios lays the foundation for a resilient and successful company.
This White Paper explores disaster recovery planning as a means of mitigating the negative
consequences of disasters, with a special focus on how NAKIVO’s advanced Site Recovery
functionality can assist in this area.
1 “Management information systems for the information age” Haag, Cummings & McCubbrey (2005)
Ensuring Availability
Information technology has been instrumental in shaping the modern business landscape. Every
part of a modern company’s infrastructure is connected with IT in one way or another, and
constant internet access is essential. Email, VoIP, CRM, and instant messaging
must be online at all times just for the business to survive, let alone prosper. That is why
traditional high availability (i.e., 90% availability) is no longer enough today. A 90% availability
rate might seem high, but it would mean that your customers cannot use your services for
36.5 days per year.
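The 36.5-day figure follows directly from the availability percentage. As a minimal sketch (the function name and the extra availability levels are illustrative assumptions, not taken from the paper), the conversion can be written as:

```python
def annual_downtime_days(availability_percent: float, days_per_year: float = 365.0) -> float:
    """Return the expected downtime, in days per year, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    # The unavailable fraction of the year is simply (100 - availability) / 100.
    return (100.0 - availability_percent) / 100.0 * days_per_year


# 90% is the figure used in the text; the other levels are shown for comparison only.
for level in (90.0, 99.0, 99.9, 99.99):
    days = annual_downtime_days(level)
    print(f"{level}% availability: {days:.3f} days ({days * 24:.2f} hours) of downtime per year")
```

Running the same calculation for higher availability levels shows how quickly the tolerated downtime shrinks as each extra "nine" is added.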
At the same time, many companies operate across several locations, with data centers spread
all over the world, which makes ensuring the availability of the entire infrastructure extremely
difficult and costly. Simply ensuring the protection of data, such as files or directories, is not
enough. Complex applications responsible for business-critical processes are now running on
virtual machines (VMs). If those applications are disrupted for any reason and you are unable
to restore them, then your business may be at risk.
As such, ensuring high availability for the always-on business is a considerable challenge from
the IT perspective. However, even though constant availability might be difficult to achieve in
itself, there is still another thing to consider – business continuity.
Revenue Loss
Heavy dependence on IT infrastructure and technology certainly has the potential to
give companies a competitive advantage, but this reliance can also destroy the business
outright. Consider the following statistics, drawn from Aberdeen Group²:
In terms of revenue loss, for small businesses, an hour of downtime costs around $8,580,
all things considered.
For medium-sized companies, the losses are greater; they can amount to $215,637 per
hour.
Large enterprises can experience revenue loss that reaches $686,250 per single hour of
downtime.
[Figure: Revenue loss per hour of downtime (USD). Small companies: $8,580; medium-sized companies: $215,637; large enterprises: $686,250.]
The length of an average outage is around 18.5 hours. Using the estimates above, this would
mean around $159,000 to $3,989,000 in lost revenue for SMBs and more than $12 million for
enterprises.
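The per-outage estimates are the hourly loss multiplied by the average outage length. A minimal sketch of that arithmetic, using only the hourly figures and the 18.5-hour average quoted above (the names in the code are illustrative):

```python
# Hourly revenue loss figures quoted in the text (USD per hour of downtime).
HOURLY_LOSS_USD = {
    "small business": 8_580,
    "medium-sized company": 215_637,
    "large enterprise": 686_250,
}

# Average outage length quoted in the text.
AVERAGE_OUTAGE_HOURS = 18.5


def outage_cost(hourly_loss_usd: float, outage_hours: float = AVERAGE_OUTAGE_HOURS) -> float:
    """Estimated revenue lost during a single outage of the given length."""
    return hourly_loss_usd * outage_hours


for company_type, hourly_loss in HOURLY_LOSS_USD.items():
    print(f"{company_type}: ${outage_cost(hourly_loss):,.0f} per average outage")
```

Substituting your own hourly loss and expected outage duration gives a first-order estimate of what a single incident would cost your business.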
2 “Building a Fast Lane to Better Data Center Performance” Aberdeen Group (2016)
Productivity Loss
In addition to the catastrophic losses in terms of revenue, certain disasters can also impact
productivity. If your entire business workflow has stopped as a result of a disaster, then your
employees cannot do their jobs, which might mean that critical business operations cannot
be performed. In the US alone, productivity drops caused by failures translate into over $250
billion in losses annually³.
Loss of Customers
Customers expect your company to deliver the necessary goods and services at any time
and under any circumstances. High levels of competition on the market have driven down
the prices on services, improved their quality, and, most importantly, vastly increased the
expectations that a customer has. When a company is facing a disaster, the customer won’t
wait for it to recover; they’ll simply move on and choose one of its competitors.
The situation gets even worse in cases where the customers are losing their own money
because of the downtime. Companies that plan to stay in the game abide by the age-old
insight: “Retaining customers may be costly, but re-acquiring them can be tremendously more
expensive.”
Customers choose a company if they can reliably get the necessary services or goods
whenever and wherever they need them. Service outages can be
either prevented or swiftly recovered from. Even hurricanes and volcanic eruptions can be worked
around if the company sets its mind to providing truly continuous delivery of services.
Therefore, these circumstances should never be an excuse for not providing the services the
customer is paying for.
This is where companies face an important challenge – choosing the appropriate disaster
recovery (DR) solution.
3 “Health and productivity among U.S. workers” Commonwealth Fund (2005)
Relying on DRaaS
Disaster Recovery as a Service (DRaaS) places the protection of your IT infrastructure in the
hands of a company that specializes in DR. One of the main benefits of this type of DR is that
you avoid needing to create and maintain your own DR environment. This option typically costs
much less than organizing DR yourself. Furthermore, your backups can be stored in the cloud,
which increases the probability of successfully restoring the necessary data in case of disaster.
Two of the biggest concerns associated with DRaaS are the security and privacy of the
company’s critical data when it is stored on the cloud servers of the third-party service
provider. You may never know if unauthorized personnel can gain access to your financial
records, important documents, etc.
To avoid these risks, companies often choose to manage their own DR. This involves having
a DR location (preferably geographically distant from the production site) and purchasing
a reliable DR software solution.
In-house DR Planning
Before choosing your DR solution, you must ensure that your company has taken all the
necessary steps to allow for a seamless DR process. These may include identifying key
business services potentially affected by disasters, performing a risk assessment and impact
analysis, determining recovery time objectives (RTOs) and recovery point objectives (RPOs),
designating a DR site for network/data failover, and performing regular testing. Performing
these steps takes a considerable amount of time and resources. However, this groundwork
benefits the whole process in the long run and helps you calculate the total cost of DR later on.
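An RPO expresses how much data, measured in time, the business can afford to lose. Once a target is chosen, a backup or replication schedule can be checked against it. The following is a minimal sketch under the assumption that you have a list of timestamps for existing recovery points; the function names are hypothetical and do not correspond to any product API.

```python
from datetime import datetime, timedelta
from typing import Iterable


def worst_case_data_loss(recovery_points: Iterable[datetime], now: datetime) -> timedelta:
    """Largest gap between consecutive recovery points, including the gap up to `now`.

    If a disaster struck at the worst possible moment, this is how much
    recent data (measured in time) would be lost.
    """
    points = sorted(recovery_points)
    if not points:
        raise ValueError("at least one recovery point is required")
    gaps = [later - earlier for earlier, later in zip(points, points[1:])]
    gaps.append(now - points[-1])  # time elapsed since the newest recovery point
    return max(gaps)


def meets_rpo(recovery_points: Iterable[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(recovery_points, now) <= rpo


# Example: recovery points every four hours, checked at 10:30.
points = [datetime(2018, 1, 1, 0), datetime(2018, 1, 1, 4), datetime(2018, 1, 1, 8)]
now = datetime(2018, 1, 1, 10, 30)
print(meets_rpo(points, now, rpo=timedelta(hours=4)))  # schedule satisfies a 4-hour RPO
print(meets_rpo(points, now, rpo=timedelta(hours=1)))  # but not a 1-hour RPO
```

A tighter RPO therefore requires more frequent recovery points, which is one of the inputs to the DR cost calculation discussed next.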
Understanding DR Costs
If you choose to handle disaster recovery by yourself, you will incur corresponding costs. As
mentioned, you can choose to put your DR operations in the hands of a DRaaS provider. If you
understand and choose to accept the risks mentioned above (or if you find a DRaaS provider
you can trust wholeheartedly), then the cost of DR can be made significantly lower.
On the other hand, if you choose the more reliable and personal approach of managing DR by
yourself, the following should be accounted for:
Among the many approaches, there is one strategy particularly well suited to decreasing
the costs of your DR plan and increasing its effectiveness: virtualization.
Advantages of Virtualization
Virtualization has garnered much attention in recent years due to its efficiency as
well as its cost-effectiveness. One of the key benefits of virtualization is that you can significantly
reduce the amount of additional hardware needed by making efficient use of your existing
hardware. Indeed, by relying on virtualization one can reduce both server energy consumption
and floor space requirements by over 80%⁴. Additionally, backing up and restoring virtualized data
is much easier, since the server files are encapsulated in a single image file. This is especially useful
for DR, as your entire virtualized environment can be quickly restored at an off-site location.
Choosing to proceed with virtualization can make the DR process significantly easier.
However, you must still choose an appropriate DR solution that not only fits your budget, but
is versatile and reliable enough to work for any type of disaster.
Furthermore, the DR software chosen must be flexible enough to orchestrate a disaster recovery
plan of any complexity at any time. The solution should also allow you to constantly keep track of,
update, and test the DR plan whenever you like without disrupting the production servers.
NAKIVO Backup & Replication can accommodate the DR needs of any business with its
advanced Site Recovery functionality.
4 “Gartner Outlines Seven Practical Ways to Save Costs in the Data Center” Pettey and Meulen (2009)
You could create a multi-layered Site Recovery job for complex situations when you have
to deal with the consequences of a major disaster. This job could, for example, start or
stop specific jobs (e.g. replication jobs), run specific scripts when you need to fine-tune the
process, attach specific repositories for archival purposes, and even launch a different Site
Recovery job if necessary.
There are several things you may want to consider when it comes to Site Recovery
orchestration. Planning should be the first step as it may have a direct influence on the
complexity of your recovery workflow and testing procedure.
A comprehensive emergency Site Recovery job should include Automated Failover – a step
that transfers the workloads in your production environment to your VM replicas at the DR
location. However, in order for this to work, those replicas must be created beforehand.
Only after you have replicated the VMs that will be involved in the DR process can
you move on to creating a Site Recovery workflow.
The replication job wizard can guide you through the job creation process, asking you to
specify the necessary details along the way. While creating the replication job(s), please
consider the following:
The container and the datastore for the replicas. These are crucial, as these replicas
are going to run the workloads after Failover and should therefore be at a location
separate from the production site.
The Network Mapping and Re-IP rules. Source and target network parameters often differ.
Network Mapping can ensure that VMs are connected to the right network upon failover.
The Re-IP feature can automatically assign new IPs to replicas at the DR location, following a
simple set of rules that you input. You can also create a virtual isolated network for testing the
Site Recovery job. Note that Network Mapping and Re-IP can also be configured as part of Site
Recovery job creation if it has the Failover action. In case of any conflicts, the rules for the Site
Recovery job overrule the individual rules for a replication job.
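To make the idea of Network Mapping and Re-IP rules more concrete, here is a minimal sketch of how such rules could be modeled. It is not NAKIVO's implementation or API: the network names, subnets, and function name are hypothetical, and the rule format (map a source subnet onto a target subnet while keeping the host part of the address) is an assumption chosen for illustration.

```python
import ipaddress


def re_ip(address: str, source_net: str, target_net: str) -> str:
    """Map an address from source_net to the same host offset in target_net.

    Addresses that do not belong to source_net are returned unchanged.
    """
    ip = ipaddress.ip_address(address)
    src = ipaddress.ip_network(source_net)
    dst = ipaddress.ip_network(target_net)
    if src.version != dst.version or src.prefixlen != dst.prefixlen:
        raise ValueError("source and target networks must have the same version and prefix length")
    if ip not in src:
        return address
    offset = int(ip) - int(src.network_address)  # position of the host inside the subnet
    return str(dst.network_address + offset)


# Hypothetical network mapping rules (source network name -> target network name).
replication_job_map = {"Prod-LAN": "DR-LAN-A", "Prod-DMZ": "DR-DMZ"}
site_recovery_job_map = {"Prod-LAN": "DR-LAN-B"}

# On conflict, the Site Recovery job's rule wins over the replication job's rule.
effective_map = {**replication_job_map, **site_recovery_job_map}

print(effective_map["Prod-LAN"])
print(re_ip("192.168.1.25", "192.168.1.0/24", "10.30.1.0/24"))
```

The dictionary merge mirrors the precedence described above: when both a replication job and a Site Recovery job define a rule for the same network, the Site Recovery job's rule is the one that takes effect.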
For example, you can create a small Site Recovery job using two actions: Check condition
and Send email. By scheduling this job to run every 5 minutes, you can tell the solution to
periodically scan your virtual environment to check if your VMs are reachable and send you
an e-mail in case there are any problems so that you can launch your primary Site Recovery
job with Failover.
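The logic of such a monitoring job can be sketched outside the product as a "check, then alert" routine. The code below is only an illustration of the pattern, not the behavior of the Check condition or Send email actions themselves; the host names, the TCP port used as a reachability test, and all function names are assumptions.

```python
import socket
from typing import Callable, Iterable, List


def tcp_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_and_alert(
    hosts: Iterable[str],
    is_reachable: Callable[[str], bool],
    send_alert: Callable[[str], None],
) -> List[str]:
    """Probe every host and send one alert listing the unreachable ones."""
    unreachable = [host for host in hosts if not is_reachable(host)]
    if unreachable:
        send_alert("Unreachable VMs: " + ", ".join(unreachable))
    return unreachable


# Demonstration with a fake probe so that the example runs without a network.
# In practice you might pass tcp_reachable and a function that sends an email,
# and call check_and_alert from a scheduler every 5 minutes.
fake_status = {"vm-web-01": True, "vm-db-01": False}
alerts: List[str] = []
down = check_and_alert(fake_status, lambda host: fake_status[host], alerts.append)
print(down, alerts)
```

Keeping the probe and the notification as injectable functions makes the routine easy to test and lets you swap in whichever reachability check suits your environment.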
Also, by including specific actions in your Site Recovery jobs, you can account for
situations that other DR solutions would not. To illustrate, you can include a Stop VMs action in
the job to stop all of the non-critical VMs at the DR location, thus freeing up
valuable CPU and RAM resources for the upcoming Failover.
To help connect all these actions together, the Site Recovery functionality uses Action Options.
These options are present when configuring every single step of your Site Recovery job. They
allow you to decide how NAKIVO Backup & Replication should act in different scenarios:
Run this action in: Here, you can determine if the solution should run this action in
production mode only, in testing mode only, or in both modes. This allows you to fine-
tune your jobs for specific purposes or make them more general.
Waiting behavior: You can decide whether NAKIVO Backup & Replication should wait
for the step to complete before proceeding or move on to the next step immediately after
initiating the action.
Error handling: This is where you determine how the solution should handle any error
that can arise during the set action. You can have the product either stop and fail the job
if there are any issues or proceed to the next step despite them.
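The three Action Options can be thought of as per-step flags that a job runner consults before and after each action. The following is a simplified model of that idea, written for illustration only; the class, field, and mode names are hypothetical and do not reflect how NAKIVO Backup & Replication is implemented.

```python
import threading
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    name: str
    run: Callable[[], None]
    modes: frozenset = frozenset({"production", "testing"})  # "Run this action in"
    wait: bool = True            # "Waiting behavior": wait for completion or move on
    stop_on_error: bool = True   # "Error handling": fail the job or proceed


def run_job(actions: List[Action], mode: str) -> List[Tuple[str, str]]:
    """Run the actions in order and return a log of (action name, status) pairs."""
    log: List[Tuple[str, str]] = []
    background: List[threading.Thread] = []
    for action in actions:
        if mode not in action.modes:
            log.append((action.name, "skipped"))
            continue
        if not action.wait:
            # Start the action and proceed to the next step immediately.
            thread = threading.Thread(target=action.run)
            thread.start()
            background.append(thread)
            log.append((action.name, "started"))
            continue
        try:
            action.run()
            log.append((action.name, "ok"))
        except Exception as exc:
            log.append((action.name, f"failed: {exc}"))
            if action.stop_on_error:
                log.append(("job", "failed"))
                break
    for thread in background:
        thread.join()  # let background actions finish before returning
    return log


def failing_script() -> None:
    raise RuntimeError("script exited with code 1")


job = [
    Action("Stop non-critical VMs", lambda: None),
    Action("Run script", failing_script, stop_on_error=False),
    Action("Failover", lambda: None),
    Action("Send email", lambda: None, modes=frozenset({"production"})),
]
for entry in run_job(job, mode="testing"):
    print(entry)
```

In this example the failing script does not stop the job because its error handling is set to proceed, and the email step is skipped because it is configured for production mode only.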
Keep in mind that if your Site Recovery job includes Failover or Failback actions, you may
want to configure Network Mapping and Re-IP options to allow for a seamless transition of
workloads to VM replicas at the DR site.
At the end of the process you can configure test scheduling to make the whole testing procedure
fully automated or allow it to run only on demand. Additionally, in the Options section you can
set the RTO goal for the Site Recovery job, which will be useful for testing later on.
As mentioned earlier, each separate action can be configured to run in production mode, in
testing mode, or in both. If you have left the default options for the actions that you created,
all actions will run in both modes, so your entire Site Recovery job can be run in testing or in
production mode.
You can initiate a test run by selecting Test site recovery job in the small menu that pops up
when you choose to run the Site Recovery job. You can also set a different RTO
for the test run if needed.
NAKIVO Backup & Replication sends you a comprehensive report on the test run of
the Site Recovery job if you have enabled this option in the job settings. Analyzing the
report is critical for seeing whether all actions in the job completed properly, how long they took,
and whether you met your RTO goal in this test run. If there are problems,
you can update the job at any time to accommodate your needs.
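The kind of analysis a test report enables can be illustrated with a short calculation. This sketch assumes that the actions ran one after another and that you have their durations at hand; the action names, durations, and function name are made up for the example and are not taken from an actual report.

```python
from datetime import timedelta
from typing import Dict


def evaluate_rto(step_durations: Dict[str, timedelta], rto: timedelta) -> dict:
    """Compare the total duration of sequential steps with the RTO goal."""
    if not step_durations:
        raise ValueError("at least one step is required")
    total = sum(step_durations.values(), timedelta())
    slowest = max(step_durations, key=step_durations.get)
    return {"total": total, "rto_met": total <= rto, "slowest_step": slowest}


# Hypothetical durations from a test run.
test_run = {
    "Stop non-critical VMs": timedelta(minutes=2),
    "Failover": timedelta(minutes=12),
    "Run script": timedelta(minutes=3),
}
result = evaluate_rto(test_run, rto=timedelta(minutes=15))
print(result["total"], result["rto_met"], result["slowest_step"])
```

Here the steps take 17 minutes in total against a 15-minute goal, so the slowest step is the natural place to start when adjusting the job.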
Also, note that if any Failover took place during the test, the corresponding Failback is
automatically carried out after testing is complete, and all workloads return to their original
location. Testing mode is designed to be non-disruptive and does not affect the production
IT infrastructure, provided that it is run in an isolated network.
Emergency Failover. With this option, the solution transfers your workloads to the replicas
immediately. Select this option when you need to run the Site Recovery
job urgently because of an unexpected disaster and have no time to spare for a final data
sync.
The main difference between running a Site Recovery job with the Failover action in production
mode and running it in testing mode is that no automated Failback takes place after the job
has finished. To perform the Failback later on, you need to create a separate Site
Recovery job for this purpose.
Performing Failback
Failback is the process of moving the workloads back from the replicas to their source
VMs. This action can be included in a Site Recovery job and used whenever
necessary.
When you configure the Failback action, you can choose the location of the Failback.
By default, this is the original source location (e.g., your main office or data center).
However, should the production site still be unavailable (e.g., as a result of a fire burning down
the whole office), you can transfer the workloads to a new long-term location instead.
Before you proceed with Failback, however, you may want to consider performing reverse
replication. This means creating a replication job and selecting the VM replicas at
the DR location as your source VMs. This procedure may be crucial for syncing the data
between VMs before the Failback.
Conclusion
In today’s fast-paced business world, successful companies operate on an “always-on” basis.
Ensuring availability as well as business continuity is crucial for any company that wants to
retain its customers and avoid losing revenue. Considering that a disaster, whether a natural
disaster or a hardware failure, can occur at any point in time, virtualizing your infrastructure
and creating a strong disaster recovery plan are integral to the survival of your business.
NAKIVO Backup & Replication v8.0 introduces the advanced Site Recovery functionality
that redefines traditional DR with its versatility and reliability. Site Recovery is extremely
scalable and flexible, letting you have multiple Site Recovery jobs active to accommodate
every conceivable scenario. You can perform non-disruptive testing and carry out planned
migration, as well as build a Site Recovery job that you activate for fast recovery after a
disaster. NAKIVO Backup & Replication with Site Recovery functionality can be your personal
solution for achieving the “always-on” status your business needs to stay competitive.
About NAKIVO
The winner of a “Best of VMworld 2018” and the Gold Award for Data Protection, NAKIVO is a US
corporation dedicated to developing the ultimate VM backup and site recovery solution. With 20
consecutive quarters of double-digit growth, 5-star online community reviews, 97.3% customer
satisfaction with support, and more than 10,000 deployments worldwide, NAKIVO delivers an
unprecedented level of protection for VMware, Hyper-V, and Amazon EC2 environments.
As a unique feature, NAKIVO Backup & Replication runs natively on leading storage systems
including QNAP, Synology, ASUSTOR, Western Digital, and NETGEAR to deliver up to 2X
performance advantage. The product also offers support for high-end deduplication appliances
including Dell/EMC Data Domain and NEC HYDRAstor. Being one of the fastest-growing data
protection software vendors in the industry, NAKIVO provides a data protection solution for
major companies such as Coca-Cola, Honda, and China Airlines, and works with over
3,000 channel partners in 137 countries worldwide. Learn more at www.nakivo.com
© 2018 NAKIVO, Inc. All rights reserved. All trademarks are the property of their respective owners.