Data Protection and Disaster Recovery
Nutanix Best Practices
BP-2005
Copyright
Copyright 2019 Nutanix, Inc.
Nutanix, Inc.
1740 Technology Drive, Suite 150
San Jose, CA 95110
All rights reserved. This product is protected by U.S. and international copyright and intellectual
property laws.
Nutanix is a trademark of Nutanix, Inc. in the United States and/or other jurisdictions. All other
marks and names mentioned herein may be trademarks of their respective companies.
Contents
1. Executive Summary.................................................................................6
2. Introduction.............................................................................................. 7
2.1. Audience.........................................................................................................................7
2.2. Purpose.......................................................................................................................... 7
5. Deployment Overview........................................................................... 13
5.1. Native Nutanix Snapshots............................................................................................13
5.2. Two-Way Mirroring....................................................................................................... 13
5.3. Many-to-One.................................................................................................................13
5.4. To the Cloud.................................................................................................................14
5.5. Single-Node Backup.................................................................................................... 14
5.6. Leap: DR Orchestration............................................................................................... 15
7. Protection Domains...............................................................................19
7.1. Consistency Groups ................................................................................................... 19
11. Bandwidth............................................................................................. 40
11.1. Seeding.......................................................................................................................40
15. Conclusion............................................................................................46
Appendix..........................................................................................................................47
Best Practices Checklist......................................................................................................47
PowerShell Scripts.............................................................................................................. 52
About the Author................................................................................................................. 57
About Nutanix...................................................................................................................... 57
List of Figures................................................................................................................ 58
List of Tables.................................................................................................................. 59
1. Executive Summary
The Nutanix Enterprise Cloud is a hyperconverged infrastructure system delivering storage,
compute, and virtualization services for any application. Designed for supporting multiple
virtualized environments, including Nutanix AHV, VMware ESXi, Microsoft Hyper-V, and Citrix
Hypervisor, Nutanix invisible infrastructure is exceptionally robust and provides many ways to
achieve your required recovery point objectives (RPOs).
Enterprises are increasingly vulnerable to data loss and downtime during disasters as they come
to rely on virtualized applications and infrastructure that their legacy data protection and disaster
recovery (DR) solutions can no longer adequately support. This best practices guide discusses
the optimal configuration for achieving data protection using the DR capabilities integrated
into Acropolis and the Leap DR orchestration features available both on-premises and in Xi.
Whatever your use case, you can protect your applications with drag-and-drop functionality.
The Nutanix Prism UI facilitates seamless management to configure the shortest recovery time
objectives (RTOs) possible, so customers can build out complex DR workflows at a moment’s
notice. With Leap built in, Prism Central allows you to apply protection policies across all of
your managed clusters. Once the business has decided on the required RPO, you can activate
recovery plans to validate, test, migrate, and fail over in a seamless fashion. Recovery plans can
protect availability zones both on-premises and hosted in Xi.
As application requirements change and grow, Nutanix can easily adapt to business needs.
Nutanix is uniquely positioned to protect and operate in environments with minimal administrative
effort because of its web-scale architecture and commitment to enterprise cloud operations.
2. Introduction
2.1. Audience
We intend this guide for IT administrators and architects who want more information about the
data protection and disaster recovery features built into the Nutanix Enterprise Cloud. Consumers
of this document should have basic familiarity with Acropolis.
2.2. Purpose
This document provides best practice guidance for implementing data protection solutions on
Nutanix systems running Acropolis 5.10. We present the following concepts:
• Scalable metadata.
• Backup.
• Crash-consistent versus application-consistent snapshots.
• Protection domains.
• Protection policies.
• Recovery plans.
• Scheduling snapshots and asynchronous replication.
• Sizing disk space for local snapshots and replication.
• Scheduling lightweight snapshots (LWS) and near-sync replication.
• Sizing disk space for LWS and near-sync replication.
• Determining bandwidth requirements.
• File-level restore.
Table 1: Document Version History

Version Number   Published        Notes
1.0              December 2014    Original publication.
2.0              March 2016       Updated recommendations for current best practices throughout.
2.1              June 2016        Updated Backup and Disaster Recovery on Remote Sites section.
2.2              July 2016        Updated bandwidth sizing information.
2.3              December 2016    Updated for AOS 5.0.
2.4              May 2017         Updated information on sizing SSD space on a remote cluster.
3.0              December 2017    Updated for AOS 5.5.
3.1              September 2018   Updated overview and Remote Site Setup section.
4.0              December 2018    Updated for AOS 5.10 and Xi Leap.
4.1              February 2019    Updated Sizing Space section and Leap product details.
In asynchronous replication, every node can replicate four files at a time, up to an aggregate of
100 MB/s. Thus, in a four-node configuration, the cluster can replicate 400 MB/s, or 3.2 Gb/s.
As you grow the cluster, the virtual storage controllers keep replication traffic distributed. In many-
to-one deployments, as when remote branch offices communicate with a main datacenter, the
main datacenter can use all its available resources to handle increased replication load from the
branch offices. When the main site is scalable and reliable, administrators don't have multiple
replication targets to maintain, monitor, and manage. You can protect both VMs and volume
groups with asynchronous replication.
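To put these figures in context, here is a minimal PowerShell sketch (the function name and parameters are ours, not a Nutanix cmdlet) that multiplies the per-node limit by the node count to estimate aggregate asynchronous replication throughput:

# Rough throughput scaling estimate; assumes the ~100 MB/s-per-node figure cited above.
function Get-ReplicationThroughputEstimate {
    param([int]$NodeCount, [int]$PerNodeMBps = 100)
    $totalMBps = $NodeCount * $PerNodeMBps
    [pscustomobject]@{
        Nodes          = $NodeCount
        ThroughputMBps = $totalMBps
        ThroughputGbps = [math]::Round(($totalMBps * 8) / 1000, 2)   # MB/s to Gb/s
    }
}

# Example: a four-node cluster yields 400 MB/s, or 3.2 Gb/s.
Get-ReplicationThroughputEstimate -NodeCount 4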
Near-sync replication offers unbounded throughput. Because all writes go to SSD, we want to make
sure that the performance tier does not fill up. Near-sync replication, which covers both VMs and
volume groups, is supported for bidirectional replication between two clusters.
Nutanix also provides cross-hypervisor disaster recovery natively via asynchronous replication.
Existing vSphere clusters can target AHV-based clusters as their DR and backup targets. Thanks
to true VM mobility, Nutanix customers can place their workloads on the platform that best meets
their needs.
5. Deployment Overview
Nutanix meets real-world requirements with native backup and replication infrastructure and
management features that support a wide variety of enterprise topologies.
5.3. Many-to-One
In a many-to-one or hub-and-spoke architecture, you can replicate workloads running on sites
A and B, for example, to a central site C. Centralizing replication to a single site may improve
operational efficiency for geographically dispersed environments. Remote and branch offices
(ROBO) are a classic many-to-one topology use case.
For application-consistent snapshots, VSS directs applications to quiesce and flush their caches
so everything is at the same point. VSS freezes write I/O while the native Nutanix snapshot takes
place, so all data and metadata are written in a consistent manner. Once the Nutanix snapshot
completes, VSS thaws the system and allows queued writes to occur. Application-consistent
snapshots do not capture the OS memory during this process.
Requirements for Nutanix VSS snapshots:
• Configure an external cluster IP address.
• Guest VMs should be able to reach the external cluster IP on port 2074.
• Guest VMs should have an empty IDE CD-ROM for attaching NGT.
• Only available for ESXi and AHV.
• Virtual disks must use the SCSI bus type.
• For near-sync support, NGT version 1.3 must be installed.
• VSS is not supported with near-sync, or if the VM has any delta disks (hypervisor snapshots).
• Only available for these supported versions:
⁃ Windows 7
⁃ Windows Server 2008 R2 and later
⁃ CentOS 6.5 and 7.0
⁃ Red Hat Enterprise Linux (RHEL) 6.5 and 7.0
⁃ Oracle Linux 6.5 and 7.0
⁃ SUSE Linux Enterprise Server (SLES) 11 SP4 and 12
• VSS must be running on the guest VM. Check the PowerShell Scripts section of the Appendix
for a script that verifies whether the service is running.
• The guest VM must support the use of VSS writers. Check the PowerShell Scripts section of
the Appendix for a script that ensures VSS writer stability (Windows only).
• VSS is not supported for volume groups.
• You cannot include volume groups in a protection domain that is configured for Metro
Availability.
• You cannot include volume groups in a protected VStore.
• You cannot use Nutanix native snapshots to protect VMs on which VMware fault tolerance is
enabled.
For ESXi, if you haven’t installed NGT, the process falls back to using VMware Tools. Because the
VMware Tools method creates and deletes an ESXi-based snapshot whenever it creates a native
Nutanix snapshot, it generates more I/O stress. To eliminate this stress, we strongly recommend
installing NGT.
Best practices:
• Schedule application-consistent snapshots during off-peak hours. NGT takes less time to
quiesce applications than VMware Tools, but application-consistent snapshots still take longer
than crash-consistent snapshots.
• Increase cluster heartbeat settings when using Windows Server Failover Cluster.
• To avoid accidental cluster failover when performing a vMotion, follow VMware best practices
to increase heartbeat probes:
⁃ Change the tolerance of missed heartbeats from the default of 5 to 10.
⁃ Increase the number to 20 if your servers are on different subnets.
⁃ If you’re running Windows Server 2012, adjust the RouteHistoryLength to double the
CrossSubnetThreshold value.
# Run from one of the Windows failover cluster nodes (requires the FailoverClusters module).
Import-Module FailoverClusters
(Get-Cluster).SameSubnetThreshold = 10    # missed heartbeats tolerated on the same subnet
(Get-Cluster).CrossSubnetThreshold = 20   # missed heartbeats tolerated across subnets
(Get-Cluster).RouteHistoryLength = 40     # Windows Server 2012: double the CrossSubnetThreshold value
VSS on Hyper-V
Hyper-V on Nutanix supports VSS only through third-party backup applications, not through native Nutanix snapshots.
Because the Microsoft VSS framework requires a full share backup for every virtual disk
contained in the share, Nutanix recommends limiting the number of VMs on any container
utilizing VSS backup.
Best practices:
• Create different containers for VMs needing VSS backup support. Do not exceed 50 VMs on
each container.
• Create a separate large container for crash-consistent VMs.
7. Protection Domains
A protection domain is a group of VMs or volume groups that you can snapshot locally or replicate
to one or more clusters once you have configured a remote site. Prism Element uses protection
domains when replicating between remote sites.
Best practices:
• Protection domain names must be unique across sites.
• No more than 200 VMs per protection domain.
⁃ VMware Site Recovery Manager and Metro Availability protection domains are limited to
3,200 files.
⁃ Near-sync is not currently supported with VMware Site Recovery Manager and Metro
Availability.
⁃ No more than 10 VMs per protection domain with LWS.
• Group VMs with similar RPO requirements.
⁃ Near-sync can only have one schedule, so be sure to place near-sync VMs in their own
protection domain.
Address
Use the external cluster IP as the address for the remote site. The external cluster IP is highly
available because it is a virtual IP address that fronts all of the virtual storage controllers. You can
configure the external cluster IP in the Prism UI under cluster details.
Other recommendations include:
• Try to keep both sites at the same Acropolis version. If both sites require compression, both
must have the compression feature licensed and enabled.
• Open the following ports between both sides: 2009 TCP, 2020 TCP, 9440 TCP, and 53 UDP. If
using the SSH tunnel described below, also open 22. Use the external cluster IP address for
source and destination. Cloud Connect uses a port between 3000–3099, but that setup occurs
automatically. All CVM IPs must be allowed to pass replication traffic between sites with the
ports detailed above. To simplify firewall rules, you can use the proxy described below.
Enable Proxy
The enable proxy option redirects all egress remote replication traffic through one node. It’s
important to note that this remote site proxy is different from the Prism proxy. With “enable proxy”
selected, replication traffic goes to the remote site proxy, which then forwards it to other nodes in
the cluster. This arrangement significantly reduces the number of firewall rules you need to set up
and maintain.
Best practice:
• Use proxy in conjunction with the external address.
SSH Tunnel
An SSH tunnel is a point-to-point connection—one node in the primary cluster connects to a node
in the remote cluster. By enabling proxy, we force replication traffic to go over this node pair. You
can use the SSH tunnel between Cloud Connect and physical Nutanix clusters when you can’t
set up a virtual private network (VPN) between the two clusters. We recommend using an SSH
tunnel only as a fallback option when a VPN is not possible.
Best practices:
• To use SSH tunnel, select enable proxy.
• Open port 22 between external cluster IPs.
• Only use SSH tunnel for testing—not production. Use a VPN between remote sites or a Virtual
Private Cloud (VPC) with Amazon Web Services.
Capabilities
The disaster recovery option requires that both sites either support cross-hypervisor disaster
recovery or have the same hypervisor. Today, Nutanix supports only ESXi and AHV for cross-
hypervisor disaster recovery with full snapshots. When using the backup option, the sites can use
different hypervisors, but you can’t restore VMs on the remote side. The backup option is also
used when backing up to AWS and Azure.
Bandwidth Throttling
Max bandwidth is set to throttle traffic between sites when no network device can limit replication
traffic. The max bandwidth option allows for different settings throughout the day, so you can
assign a max bandwidth policy when your sites are busy with production data and disable the
policy when they’re not as busy. Max bandwidth does not imply a maximum observed throughput.
When talking with your networking teams, it’s important to note that this setting is in MB/s, not
Mb/s. Near-sync does not currently honor maximum bandwidth thresholds.
Remote Container
VStore name mapping identifies the container on the remote cluster used as the replication
target. When establishing the VStore mapping, we recommend creating a new, separate remote
container with no VMs running on it on the remote side. This configuration allows the hypervisor
administrator to distinguish failed-over VMs quickly and to apply policies on the remote side easily
in case of a failover.
Best practices:
• Create a new remote container as the target for the VStore mapping.
• If many clusters are backing up to one destination cluster, use only one destination container if
the source containers have similar advanced settings.
• Enable MapReduce compression if licensing permits.
• If you are using vCenter Server to manage both the primary and remote sites, do not have
storage containers with the same name on both sites.
If the aggregate incoming bandwidth required to maintain the current change rate is less than
500 Mb/s, we recommend skipping the performance tier. This setting saves your flash for other
workloads while also saving on SSD write endurance.
To skip the performance tier, use the following command from the nCLI:
Network Mapping
Acropolis supports network mapping for disaster recovery migrations moving to and from AHV.
Best practice:
• Whenever you delete or change the network attached to a VM specified in the network map,
modify the network map accordingly.
You can create multiple schedules for a protection domain (PD) using full snapshots, and you can
have multiple protection domains. The figure above shows seven daily snapshots, four weekly
snapshots, and three monthly snapshots to cover a three-month retention policy. This policy is
more efficient in managing metadata on the cluster than using a daily snapshot with a 180-day
retention policy would be.
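The arithmetic behind that comparison is straightforward; this short sketch (variable names are ours) simply counts the snapshots the cluster must track under each policy:

# Tiered retention: seven daily, four weekly, and three monthly snapshots.
$tieredSnapshots = 7 + 4 + 3      # 14 snapshots covering roughly three months

# Flat retention: one daily snapshot kept for 180 days.
$dailySnapshots = 180             # 180 snapshots for the same protection domain

"Tiered policy tracks $tieredSnapshots snapshots; a flat daily policy tracks $dailySnapshots."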
Best practices:
• Stagger replication schedules across PDs. If you have a PD starting at the top of the hour,
stagger the PDs by half of the most commonly used RPO. The goal is to spread out replication
impact on performance and bandwidth.
• Configure snapshot schedules to retain the lowest number of snapshots while still meeting the
retention policy, as shown in the previous figure.
Remote snapshots implicitly expire based on how many snapshots there are and how frequently
they are taken. For example, if you take daily snapshots and keep a maximum of five, on the
sixth day the first snapshot expires. At that point, you can’t recover from the first snapshot
because the system deletes it automatically.
In case of a prolonged network outage, Nutanix always retains the last snapshot to ensure that
you don’t ever lose all of the snapshots. You can modify the retention schedule from nCLI by
changing the min-snap-retention-count. This value ensures that you retain at least the specified
number of snapshots, even if all of the snapshots have reached the expiry time. This setting
works at the PD level.
Limitations:
• Linked clone VMs (typically nonpersistent View desktops) are not supported.
• Metro and SRM-protected containers are not supported.
• Don’t configure near-sync for Hyper-V.
• Ensure that you have enough bandwidth to support the change rate.
• Deduplication on the source container is not supported.
• Enabling near-sync requires at least three nodes in the cluster at both the primary and remote
sites.
• Do not enable near-sync on a cluster where you have any node with more than 40 TB of
storage (either all SSDs, or a combination of SSDs and HDDs).
• Minimum 1.2 TB SSDs are needed in hybrid systems; 1.9 TB SSDs are preferred.
Best practice:
• Limit the number of VMs to 10 or fewer per protection domain. If it’s possible to do so,
maintaining one VM per protection domain can help you transition back to near-sync if you run
out of LWS reserve storage.
• You can safely create new entities in either or both availability zones as long as you do not
assign the same name to entities in both availability zones. After the connectivity issue is
resolved, force synchronization from the availability zone in which you created entities.
• If one of the availability zones becomes unavailable, or if a service in the paired availability
zone is down, perform a forced sync from the paired availability zone after the issue is
resolved.
Xi Leap limitations:
Xi Leap does not allow you to create a recovery plan in the following scenario:
• The recovery network that you specify in the recovery plan exists with the same name on
multiple clusters, but the networks with the same name have different IP address spaces.
• If you add a VM to multiple recovery plans and perform failover simultaneously on those
recovery plans, each recovery plan creates an instance of the VM at the recovery location.
You must manually clean up the additional instances.
Best practices:
• For on-premises availability zones, create a nonroutable network for testing failovers.
• Run the Validate workflow after making changes to recovery plans.
• After running the Test workflow, run the Clean-Up workflow instead of manually deleting VMs.
• A recovery plan should cover a maximum of 200 VMs at any one time.
• Maximum of 20 categories in a recovery plan.
• Maximum of 20 stages in a recovery plan.
• Maximum of 15 categories per stage in a recovery plan.
• Maximum of 5 recovery plans can be executed in parallel.
You can look at your backups and compare the incremental difference between them to find
the change rate. You could also take a conservative approach and start with a low snapshot
frequency and a short expiry policy, then gauge the size difference between backups before
consuming too much space.
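As a simple illustration of that approach (the sizes and variable names here are hypothetical), you can average the sizes of recent incremental backups to estimate the change rate per interval:

# Hypothetical incremental backup sizes (GB) observed over the last few intervals.
$incrementalSizesGB = 32, 38, 35, 36

# Average them to estimate the change rate per backup interval.
$changeRatePerIntervalGB = ($incrementalSizesGB | Measure-Object -Average).Average
"Estimated change rate: $changeRatePerIntervalGB GB per interval"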
Using the local snapshot reserve formula presented above, assuming for demonstration
purposes that the change rate is 35 GB of data every six hours and that we keep ten snapshots:
snapshot reserve = (frequency of snapshots * change rate per frequency) +
(change rate per frequency * # of snapshots in a full curator scan * 0.1)
= (10 * 35,980 MB) + (35,980 MB * 1 * 0.1)
= 359,800 + (35,980 * 1 * 0.1)
= 359,800 + 3,598
= 363,398 MB
= 363 GB
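The following PowerShell sketch captures the same arithmetic (the function name and parameter names are ours); the defaults match the worked example above, with 35,980 MB of change per six-hour interval, ten retained snapshots, and one snapshot per full curator scan:

function Get-SnapshotReserveMB {
    param(
        [int]$SnapshotsRetained = 10,
        [double]$ChangeRatePerIntervalMB = 35980,
        [int]$SnapshotsPerCuratorScan = 1
    )
    # snapshot reserve = (snapshots retained * change per interval) +
    #                    (change per interval * snapshots per curator scan * 0.1)
    $reserve = ($SnapshotsRetained * $ChangeRatePerIntervalMB) +
               ($ChangeRatePerIntervalMB * $SnapshotsPerCuratorScan * 0.1)
    return $reserve
}

# Example: returns 363,398 MB, or roughly 363 GB, matching the calculation above.
Get-SnapshotReserveMB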
For the minimum amount of space needed at the remote site, 130 percent of the protection
domain's size is a good average to work from.
If the remote target is also running a workload, note that incoming replication uses the
performance tier. If you’re using a hybrid cluster, be sure to size for the additional hot data. You
can also skip the performance tier by creating a separate container for incoming replication and
following the steps provided in the Remote Container section above.
= 8,550 MB ≈ 8.6 GB
If we were running a 3460 hybrid system with two 1.9 TB SSDs, our LWS reserve would be 945
GB: we apply the 7 percent of the SSD space used by the extent store after we have accounted
for the rest of the system. With an LWS cluster reserve of 945 GB, this system has plenty of room
for the workload as well as for additional business-critical applications.
As we discussed in the Scheduling LWS and Near-Sync Replication section above, a telescopic
schedule would have a total of six hourly, seven daily, four weekly, and one monthly snapshots.
You only need to account for additional garbage space for the dailies due to the higher frequency.
Using the change rate above, we can calculate each separate schedule and add them all
together. Because full snapshots occur less often than LWS, we don’t have to be concerned with
overwrites and can use the original 35 GB change rate for every six hours.
For most small and medium businesses, a daily change of 140 GB would be considered high.
This near-sync example highlights the difference between keeping a lot of snapshots around and
keeping only the 10 snapshots in the full snapshot example.
11. Bandwidth
You must have enough available bandwidth to keep up with the replication schedule. If you are
still replicating when the next snapshot is scheduled, the current replication job finishes first.
The newest outstanding snapshot then replicates next, so the most recent data reaches the remote side first. To
help replication run faster when you have limited bandwidth, you can seed data on a secondary
cluster at the primary site before shipping that cluster to the remote site.
11.1. Seeding
To seed data for a new site:
• Set up a secondary cluster with local IPs at the primary site.
• Enable compression on the remote site within the protection domain.
• Set the initial retention time to “3 months.”
• Once setup completes, reconfigure the secondary cluster with remote IPs.
• Shut down the secondary cluster and ship it to the remote site.
• Power on the remote cluster and update the remote site on the primary cluster to the new IP.
If you are not able to seed the protection domain at the local site, you can create the remote
cluster as a normal install and turn on compression over the wire. Manually create a one-time
replication with retention time set to “3 months.” We recommend this retention time setting due to
the extra time it takes to replicate the first data set across the wire.
To figure out the needed throughput, you must know your RPO. If you set the RPO to one hour,
you must be able to replicate the changes within that time.
Assuming you know your change rate based on incremental backups or local snapshots, you can
calculate the bandwidth needed. The next example uses a change rate of 15 GB and an RPO
of one hour. We do not use deduplication in the calculation, partly so the dedupe savings can
serve as a buffer in the overall calculation, and partly because the one-time cost for deduped
data going over the wire has less of an impact once the data is present at the remote site. We’re
assuming an average of 30 percent bandwidth savings for compression on the wire.
Bandwidth needed = (RPO change rate * (1 - compression on wire savings %)) / RPO
Example:
(15 GB * (1 - 0.3)) / 3,600 s
= (15 GB * 0.7) / 3,600 s
= 10.5 GB / 3,600 s
= (10.5 * 1,000 MB) / 3,600 s (converting to MB/s)
= 10,500 MB / 3,600 s
= 2.92 MB/s
Bandwidth needed = 2.92 MB/s * 8 = 23.33 Mb/s
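The same calculation expressed as a small PowerShell sketch (function and parameter names are ours); the defaults match the example of 15 GB changed per one-hour RPO with 30 percent compression on the wire:

function Get-ReplicationBandwidth {
    param(
        [double]$ChangeRateGB = 15,
        [int]$RpoSeconds = 3600,
        [double]$CompressionSavings = 0.3
    )
    # Bandwidth needed = (RPO change rate * (1 - compression savings)) / RPO
    $mbPerSecond = ($ChangeRateGB * (1 - $CompressionSavings) * 1000) / $RpoSeconds
    [pscustomobject]@{
        MBps = [math]::Round($mbPerSecond, 2)        # about 2.92 MB/s
        Mbps = [math]::Round($mbPerSecond * 8, 2)    # about 23.33 Mb/s
    }
}

Get-ReplicationBandwidth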
You can test a VM at the remote site without breaking replication using the restore or clone
functionality.
Figure 13: Cloning a VM for Testing Using the Local Snapshot Browser at the Remote Site
Using the local snapshot browser on the inactive protection domain at the remote site, choose
the restore option to clone a VM to the datastore. Add a prefix to the VM’s path.
Best practices:
• When activating PDs on the remote site, use intelligent placement for Hyper-V and DRS for
ESXi clusters. Intelligent placement evenly spreads out the VMs on boot during a failover.
Acropolis powers on VMs uniformly at boot time.
• Install Nutanix Guest Tools (NGT) on machines using volume groups.
• Configure the data services cluster IP on the remote cluster.
Note: Nutanix AHV guest VMs and Nutanix Files are not certified for backup via
Cohesity because of the current Cohesity architectural implementation. Issues
related to Cohesity backups and configuration are not supported by Nutanix. For a list
of backup partners currently supported by and certified with Nutanix AHV, refer to the
Nutanix Technology Alliance Partners program.
15. Conclusion
Nutanix offers granular data protection based on the required recovery point objectives across
many different deployment models. As your application requirements change and your cluster
grows, you have the flexibility to add and remove VMs from protection domains. Sizing capacity
and bandwidth are key to achieving optimal data protection and maintaining cluster health.
Snapshot frequency and the daily change rate affect the capacity and bandwidth needed
between sites to meet the needs of the business.
With data protection features that are purpose-built, fully integrated, and 100 percent software
defined, Nutanix provides the ultimate in adaptability and flexibility for meeting your enterprise’s
backup and recovery needs.
Appendix
Best Practices Checklist
c. After running the Test workflow, run the Clean-Up workflow instead of manually deleting
VMs.
d. A recovery plan should cover a maximum of 200 VMs at any one time.
e. Maximum of 20 categories in a recovery plan.
f. Maximum of 20 stages in a recovery plan.
g. Maximum of 15 categories per stage in a recovery plan.
h. Maximum of 5 recovery plans can be executed in parallel.
16. Network Mapping
a. Set up administrative distances on VLANs for subnets that will completely fail over. If
you don’t set up administrative distances, shut down the VLAN on the source side after
failover if the VPN connection is maintained between the two sites. If you’re failing over to
a new subnet, set up the subnet beforehand so you can test the routing.
b. The prefix length for network mappings at the source and the destination must be the
same.
c. If you’re not using Nutanix IPAM, you must install the NGT software package to maintain a
static address.
d. To maintain a static address for Linux VMs that aren’t using Nutanix IPAM, the VMs must
have the NetworkManager command-line tool (nmcli) version 0.9.10.0 or later installed.
Additionally, you must use NetworkManager to manage the network for the Linux VMs.
To enable NetworkManager on a Linux VM, set the value of the NM_CONTROLLED field
to yes in the interface configuration file (for example, in CentOS, the file is /etc/sysconfig/
network-scripts/ifcfg-eth0). After setting the field, restart the network service on the VM.
17. Xi Leap Hypervisor Support
a. Xi Leap only supports clusters running AHV.
18. Xi Leap Virtual Machine Configuration Restrictions
a. Cannot power on VMs configured with a GPU resource.
b. Cannot power on VMs configured with four vNUMA sockets.
19. Single-Node Backup
a. Combined, all protection domains should be under 30 VMs.
b. Limit backup retention to a three-month policy. A recommended policy would be seven
daily, four weekly, and three monthly backups.
c. Only map an NX-1155 to one physical cluster.
d. Snapshot schedule should be greater than or equal to six hours.
e. Turn off deduplication.
20. Cloud Connect
a. Try to limit each protection domain to one VM to speed up restores. This approach also
saves money, as it limits the amount of data going across the WAN.
b. The RPO should not be lower than four hours.
PowerShell Scripts
Check for VSS Service
This PowerShell script checks whether the VSS service is running as required for application-
consistent snapshots.
# Load the Nutanix cmdlets; make sure your local version matches the cluster version.
Add-PSSnapin NutanixCmdletsPSSnapin

# Connect to the Nutanix cluster of your choice; try to use the external cluster IP.
Connect-NutanixCluster -AcceptInvalidSSLCerts -Server External_cluster_IP -UserName admin

# Get a list of all consistency groups.
$pdvss = Get-NTNXProtectionDomainConsistencyGroup

# Collect the names of every consistency group taking application-consistent snapshots.
$appConsistentVM = @()
foreach ($vssVM in $pdvss)
{
    if ($vssVM.appConsistentSnapshots)
    {
        $appConsistentVM += $vssVM.consistencyGroupName
    }
}

# Check the VSS service on each machine (assumes the consistency group name matches the VM hostname).
Get-Service -Name VSS -ComputerName $appConsistentVM |
    Format-Table -Property MachineName, Status, Name, DisplayName -AutoSize
Output properties: WriterName, StateID, StateDesc, LastError
.Link
https://superwidgets.wordpress.com/category/powershell/
.Notes
Function by Sam Boutros
v1.0 - 09/17/2014
#>
[CmdletBinding(SupportsShouldProcess=$true,ConfirmImpact='Low')]
Param(
[Parameter(Mandatory=$false,
ValueFromPipeLine=$true,
ValueFromPipeLineByPropertyName=$true,
Position=0)]
[ValidateNotNullorEmpty()]
[String[]]$ComputerName = $env:COMPUTERNAME
)
$Writers = @()
$k = 0
foreach ($Computer in $ComputerName) {
try {
Write-Verbose "Getting VssWriter information from computer $Computer"
$k++
$Progress = "{0:N0}" -f ($k*100/$ComputerName.count)
Write-Progress -Activity "Processing computer $Computer ... $k out of $($ComputerName.count) computers" `
    -PercentComplete $Progress -Status "Please wait" -CurrentOperation "$Progress% complete"
$RawWriters = Invoke-Command -ComputerName $Computer -ErrorAction Stop -ScriptBlock {
return (VssAdmin List Writers)
}
for ($i=0; $i -lt ($RawWriters.Count-3)/6; $i++) {
$Writer = New-Object -TypeName psobject
$Writer| Add-Member "ComputerName" $Computer
$Writer| Add-Member "WriterName" $RawWriters[($i*6)+3].Split("'")[1]
$Writer| Add-Member "StateID" $RawWriters[($i*6)+6].SubString(11,1)
About Nutanix
Nutanix makes infrastructure invisible, elevating IT to focus on the applications and services that
power their business. The Nutanix Enterprise Cloud OS leverages web-scale engineering and
consumer-grade design to natively converge compute, virtualization, and storage into a resilient,
software-defined solution with rich machine intelligence. The result is predictable performance,
cloud-like infrastructure consumption, robust security, and seamless application mobility for a
broad range of enterprise applications. Learn more at www.nutanix.com or follow us on Twitter
@nutanix.
List of Figures
Figure 1: Nutanix Enterprise Cloud................................................................................... 9
Figure 12: Example Snapshot Schedule: Taking a Snapshot at Noon and 6 PM............ 36
Figure 13: Cloning a VM for Testing Using the Local Snapshot Browser at the Remote Site.............42
List of Tables
Table 1: Document Version History................................................................................... 7