Me5 Sra Ug
May 2023
Rev. A01
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid
the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2022 - 2023 Dell Inc. or its subsidiaries. All rights reserved. Dell Technologies, Dell, and other trademarks are trademarks of Dell Inc. or its
subsidiaries. Other trademarks may be trademarks of their respective owners.
Contents
Chapter 3: Troubleshooting
Chapter 1: Installing and configuring the SRA
The Dell PowerVault ME5 Series Storage Replication Adapter (SRA) for vSphere enables full-featured use of the VMware
vCenter Site Recovery Manager (SRM) version 6.5 or later. Combining the ME5 Series storage system’s replication functionality
with the vCenter SRM, the SRA provides an automated solution for implementing and testing disaster recovery between
geographically separated sites. It also enables you to use SRM for planned migrations between two sites.
Topics:
• About VMware Site Recovery Manager
• Protected sites and recovery sites
• SRM requirements
• Configuring the ME5 Series storage systems
• Install SRM software
• Install the SRA software
• Configure SRM
Planned migration
Planned migration is the orderly decommissioning of virtual machines at the protected site and commissioning of equivalent
machines at the recovery site. For planned migration to succeed, both sites must be up and fully functioning.
Disaster recovery
Disaster recovery is similar to planned migration except that it does not require both sites to be up. During a disaster recovery
operation, failures of operations on the protected site are reported but otherwise ignored.
SRM coordinates the recovery process with the underlying replication mechanisms to ensure that the virtual machines at the
protected site are shut down cleanly (in the event that the protected site virtual machines are still available) and the replicated
virtual machines can be powered up. Recovery of protected virtual machines to the recovery site is guided by a recovery plan
that specifies the order in which virtual machines are started up. The recovery plan also specifies network parameters, such as
IP addresses, and can contain user-specified scripts that can be executed to perform custom recovery actions.
After a recovery has been performed, the running virtual machines are no longer protected. To address this reduced protection,
SRM supports a reprotect operation for virtual machines. The reprotect operation reverses the roles of the two sites after the
original protected site is back up. The site that was formerly the recovery site becomes the protected site and the site that was
formerly the protected site becomes the recovery site.
SRM enables you to test recovery plans. You can conduct tests using a temporary copy of the replicated data in a way that does
not disrupt ongoing operations at either site. You can conduct tests after a reprotect has been done to confirm that the new
protected/recovery site configuration is valid.
SRM requirements
A typical SRM configuration involves two geographically separated sites with TCP/IP connectivity, the protected site and the
recovery site. The protected site is the site that is being replicated to the recovery site for disaster recovery. Each site contains
a Dell PowerVault ME5 Series storage system, VMware ESX servers, a Virtual Center (vCenter) server, and an SRM server
running the SRM.
After you have set up the protected site and the recovery site, and installed the necessary infrastructure for networking
between the two sites, you can install and configure the software. For more information, see Configuring the ME5 Series
storage systems.
Before configuring the SRA, fully complete basic storage system configuration. This ensures that the storage system
name, user credentials, and IP addresses for both storage systems are set. The arrays cannot be registered with SRM
until peer connections have been established between the local and remote storage systems. The SRA uses the same
user credentials for both the local and remote storage system. If necessary, create a new user for the SRA with these
attributes:
3. Use the PowerVault Manager to configure replication functionality, following the instructions in the replication section of the
Administrator’s Guide, including the following settings for SRA:
● snapshot-count: 3 (or higher)
● snapshot-history: both
● queue-policy: queue-latest
● (optional) snapshot-basename: same-as-volume-name
NOTE: Setting the basename as indicated makes troubleshooting easier because replication snapshots will have the
same name as the base volume with _nnnn appended (indicating the replication generation number).
4. Use the PowerVault Manager to perform at least one replication from the protected site to the recovery site. Doing so
ensures that, in the event of a disaster that disables the protected site, damages hardware, or damages files, SRM can use
the most recently replicated copy at the recovery site for disaster recovery. When using scheduled replications, verify that
the source of the most recent replication was in a valid state.
NOTE: If desired, see Best practices for additional setup information.
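The replication settings listed in step 3 can be double-checked before the arrays are registered with SRM. The following Python sketch assumes the settings have already been collected by hand into a dictionary keyed by the setting names above. It does not query the storage system, and the function name check_replication_settings is hypothetical.

```python
# Hypothetical helper: compares replication-set settings with the values
# this guide lists for SRA use. The input dictionary is assumed to have been
# filled in by the administrator; no storage-system interface is called.

def check_replication_settings(settings):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []

    # snapshot-count must be 3 or higher.
    count = settings.get("snapshot-count")
    if not isinstance(count, int) or count < 3:
        problems.append("snapshot-count must be 3 or higher")

    # snapshot-history must be set to 'both'.
    if settings.get("snapshot-history") != "both":
        problems.append("snapshot-history must be 'both'")

    # queue-policy must be set to 'queue-latest'.
    if settings.get("queue-policy") != "queue-latest":
        problems.append("queue-policy must be 'queue-latest'")

    return problems
```

An empty list means the three settings match the values above. The optional snapshot-basename setting is deliberately not checked.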
Configure SRM
After you have both SRM and SRA installed, the Getting Started tab of the main SRM window guides you through the steps
necessary to configure it. For detailed SRM configuration instructions, see the VMware publication Site Recovery Manager
Administration guide.
Configuring ME5 Series storage systems in SRM requires the following:
● The IP addresses of the ME5 Series storage systems. You must enter the IP address for each controller in both the local and
remote storage systems.
● A user name and password that are valid in both the local and remote storage systems. This is the user name and password
as configured in the PowerVault Manager.
Make the following changes to your SRM settings:
● Set storageProvider.autoResignatureMode to 1 (required).
● Set storageProvider.hostRescanRepeatCnt to 2 (required).
● Set Storage.commandTimeout to 1200 seconds (recommended).
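As a cross-check of the values above, the following Python sketch compares a dictionary of SRM setting values against the required and recommended values. It assumes the current values have been copied by hand from SRM into the dictionary. It does not call any SRM or vSphere interface, and the names REQUIRED, RECOMMENDED, and review_srm_settings are hypothetical.

```python
# Values taken from the list above. Everything else in this block is an
# illustrative assumption, not part of SRM or the SRA.
REQUIRED = {
    "storageProvider.autoResignatureMode": 1,
    "storageProvider.hostRescanRepeatCnt": 2,
}
RECOMMENDED = {
    "Storage.commandTimeout": 1200,  # seconds
}


def review_srm_settings(current):
    """Return (errors, warnings) for a dict of setting name -> value."""
    # Required settings that are missing or different are reported as errors.
    errors = [
        f"{name} should be {want}, found {current.get(name)!r}"
        for name, want in REQUIRED.items()
        if current.get(name) != want
    ]
    # Recommended settings that differ are reported as warnings only.
    warnings = [
        f"{name} is recommended to be {want}, found {current.get(name)!r}"
        for name, want in RECOMMENDED.items()
        if current.get(name) != want
    ]
    return errors, warnings
```

Separating errors from warnings mirrors the distinction the list makes between required and recommended values.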
Automatic failover
SRM automates the execution of recovery plans to ensure accurate and consistent execution. Through the vCenter Server you
can gain full visibility and control of the process, including the status of each step, progress indicators, and detailed descriptions
of any error that occurs.
When an actual SRM failover is requested in the event of a disaster, the SRA performs the following steps:
1. Select the replicated volumes.
2. Identify and remove any incomplete remote copies that are in progress and present the most recently completed Remote
Copy as a primary volume.
3. Convert remote volumes into primary volumes and configure authentication for ESXi hosts to mount them.
If an actual failover does not run to completion for any reason, you can request the failover again as many times as needed to
complete the run. For example, if only one volume failed to restore because a normal snapshot was present, you can delete the
snapshot manually and then request the failover again.
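The retry behavior described above can be pictured as a simple loop. This Python sketch only illustrates the idea and is not how SRM or the SRA is implemented. The run_failover and fix_problem arguments are hypothetical callables supplied by the caller, standing in for requesting the failover in SRM and for a manual correction such as deleting a leftover snapshot.

```python
def failover_with_retries(run_failover, fix_problem, max_attempts=3):
    """Repeat a failover request until no volumes are reported as failed.

    run_failover() is assumed to return a list of volumes that failed to
    restore (an empty list means success). fix_problem(volume) represents
    a manual correction for one volume. Returns the number of attempts
    used, or raises RuntimeError if max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        failed = run_failover()
        if not failed:
            return attempt
        # Correct each reported volume before requesting the failover again.
        for volume in failed:
            fix_problem(volume)
    raise RuntimeError(f"failover still incomplete after {max_attempts} attempts")
```

The attempt limit is an arbitrary choice for the sketch. The guide itself places no limit on how many times a failover can be requested.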
Automated failback
You can set up an automated failback workflow to return the entire environment to the primary site from the recovery site.
The failback happens after the reprotection has ensured that data replication and synchronization are established to the original
primary site.
Automated failback runs the same workflow that was used to migrate the environment to the protected site. It ensures that
the critical systems encapsulated by the recovery plan are returned to their original environment. The workflow executes only if
reprotection is successfully completed. Failback is only available with storage system replication.
Failback ensures the following:
● All virtual machines that were initially migrated to the recovery site are moved back to the primary site.
● Environments that require that disaster recovery testing be done with live environments with genuine migrations can be
returned to their initial site.
● Simplified recovery processes enable a return to standard operations after a failure.
● Failover can be done in case of disaster or in case of planned migration.
Troubleshooting
Table 1. SRA error messages and suggested actions (continued)
Message number | Message | Suggested action
1020 | Could not find peer volume for local volume {localsn}. | Ensure that the specified volume has been set up as part of a replication set.
1021 | Invalid or missing parameters in SRM '{cmd}' request received by the SRA. | Verify that the replication sets, remote systems, and SRM configuration are correct.
1022 | Invalid or unknown ArrayId '{ArrayId}' in {cmd} request. | Ensure that the storage controller system names and IP addresses have not been reconfigured since SRM was configured.
1023 | Failed to open lock file {filename}. | Check file and directory permissions for the specified filename.
1024 | Unknown or missing DeviceId parameter '{DeviceId}' in {command} request. | Verify that SRM and the SRA are configured correctly. Also check the health of storage system and network paths between the SRM host and both storage systems.
1025 | No valid sync point found for volume {vol} during the {command} operation. | The operation failed on this volume because no valid sync point exists for the volume. In PowerVault Manager, use the Snapshots table to verify that the specified volume has been completely replicated from the protected site. For more information, see the Administrator’s Guide.
1026 | Timed out waiting for replication set for volume {volume} to transition to conflict status on storage system {arrayname} at {file}:{line}. | Verify that the specified volume has been created on the storage system and retry the operation.
1027 | The SRA syncOnce command timed out waiting for replication images for volume(s) [{volumes}] to start on the storage system. | Check to make sure that the storage system is healthy, and repeat the operation if necessary to ensure that the volumes are replicated.
1028 | No SRA snapshot found for volume '{DeviceId}' in {command} request. | The SRA failed to export the snapshot in a previous testFailoverStart operation, or the snapshot has already been removed, or the snapshot was not found due to a problem communicating with the management port on the storage system.
1029 | An existing SRA snapshot {snapshot} must be removed before the testFailoverStart function can be performed on {volume}. | Remove snapshot volume {snapshot} before trying the test failover operation again.
1030 | reverseReplication cannot be performed on target volume {volume} because original protected volume {target} is still mapped on the remote storage system {remoteArray} | Ensure that both storage systems ({localArray} and {remoteArray}) and their corresponding SRM servers are running and manageable over the network.
1101 | Failed to log in to storage system at {url} ({response}) | Ensure that storage system IP addresses are configured correctly and that the storage system is reachable from the SRM host. Also, if any storage system IP addresses have changed, it may be necessary to delete and recreate the remote system definitions on one or both storage systems.
1102 | Execution of command “{cmd}” failed on storage system at {ipAddr}: {err} | If the error message did not specify the reason for the failure, open the specified address with a web browser to check the health of the storage system.
1103 | No IP addresses specified for MC for command “{cmd}” | Verify that the IP addresses for the storage system are configured correctly on the storage system and on the host.
1104 | Response from storage system at {ipAddr} did not include status indication. | Check the health of the storage system and restart the management controller if necessary.
1105 | Failed to run command “{cmd}” on storage system at {system}: {err} | Verify the IP address configuration on the storage system and on the host, and check network connectivity.
2001 | Volume {volume}({name}) is already unmapped. | SRM requested that a volume be prepared for failover, but the volume is already prepared.
2002 | No data found for {volume}replication image {imageSn} ({err}). | Verify that replication has started for volume {volume}.
2003 | querySyncStatus: No data found for replication image {imageSn} for volume {vol} ({err}). | Verify that replication has started for the specified volume.
NOTE: You can expect to see certain errors in the log file when commands are executed to ensure that volumes are in
a particular state if the volumes are already in that state. These errors are -3395 (Replication is not active
on this secondary volume) and -10306 (Unable to set the specified volume as the primary
volume because the specified volume is already a primary volume). You can safely disregard these
error messages if they occur under these circumstances.
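When reviewing errors collected from the log file, the two codes named in the note above can be set aside first. The Python sketch below assumes each log entry has already been reduced to an integer error code. The SRA log format is not described in this guide, so no parsing is attempted, and the names BENIGN_CODES and actionable_codes are hypothetical.

```python
# Error codes that the note above says can be disregarded when they occur
# because a volume was already in the requested state.
BENIGN_CODES = {-3395, -10306}


def actionable_codes(codes):
    """Return the codes that are not in BENIGN_CODES, in their original order.

    Assumption: the caller has confirmed that any benign code occurred under
    the circumstances described in the note; this function cannot tell.
    """
    return [code for code in codes if code not in BENIGN_CODES]
```

For instance, given the codes -3395, 1023, and -10306, only 1023 would be left for follow-up using Table 1.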
Chapter 4: Best practices
Specific guidelines and recommendations for using the SRA and replication software in conjunction with the VMware SRM
disaster recovery solution include the following:
● Prepare a plan in advance for how you will re-establish replication schedules in the event of a site failover. After performing
a reverse-replication operation, you must set up replication schedules in order to ensure periodic replication of data from the
new source volumes back to the original source site. Alternatively, you can initiate replication manually if appropriate.
● Try to group virtual machines with related backup requirements or schedules on the same datastore volume, since replication
occurs on a per-volume basis. For example, if some virtual machines do not need to be replicated to a remote site, or need to
be replicated less frequently, do not store them on the same datastore volume as virtual machines which must be replicated
frequently, to avoid replicating data unnecessarily.
● The SRA supports replication only between identical hardware models. For example, replication between an iSCSI-only
system and an FC-only system is not supported.
● Avoid mapping replication volumes to LUN 0. Because special management functionality is assigned to LUN 0, dynamically
mapping and unmapping LUNs can cause issues. You can map volumes to LUN 0 if those volumes are not expected to be
mapped and unmapped automatically the way replication volumes are, such as local datastores that are not replicated.
● Replication volumes should be mapped with the same LUN number on all hosts.
● Do not use the same LUN number for different volumes that are mapped to different hosts.
● Failover operations will cause read-write host mappings for replication volumes to be converted to read-only, and restoring
replication will convert all read-only mappings for the same volume to read-write. Be careful not to create read-only
mappings of replication volumes such as for data mining purposes. If a read-only mapping of a replication volume is required,
consider creating a non-replicated hardware or software snapshot of the volume.
● The SRA might create host entries on the storage system to keep track of remote IP or FC addresses. Do not delete host
entries whose names start with “SRA”. However, you may rename them to be more descriptive.
● Replication set basenames for replicated volumes (assigned when creating the replication set) should be no more than 23
bytes long to allow for suffixes to be appended when creating replication snapshots. 23 bytes allows for up to 23 ASCII
characters, but non-ASCII UTF-8 characters require more than one byte each.
● Do not change the name of replication snapshots or change replication-set basenames except to conform to these best
practices. The SRA depends on consistency between the replication-set basenames and the snapshot names.
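Two of the guidelines above lend themselves to simple automated checks: the 23-byte limit on replication-set basenames and the LUN numbering rules for replication volumes. The Python sketch below is an illustration only. The function names and the shape of the mappings dictionary are assumptions, and the data is expected to be gathered by the administrator rather than read from the storage system.

```python
MAX_BASENAME_BYTES = 23  # limit stated in the guideline above


def basename_fits(basename):
    """Return True if the basename is at most 23 bytes when encoded as UTF-8."""
    return len(basename.encode("utf-8")) <= MAX_BASENAME_BYTES


def lun_problems(mappings):
    """Check LUN numbering for replication volumes.

    mappings is assumed to look like {host: {volume: lun}} and to contain
    replication volumes only. Returns a list of problem descriptions.
    """
    problems = []
    luns_by_volume = {}
    volumes_by_lun = {}
    for host, volumes in mappings.items():
        for volume, lun in volumes.items():
            # Replication volumes should not be mapped to LUN 0.
            if lun == 0:
                problems.append(f"{volume} is mapped to LUN 0 on {host}")
            luns_by_volume.setdefault(volume, set()).add(lun)
            volumes_by_lun.setdefault(lun, set()).add(volume)
    # A volume should use the same LUN number on all hosts.
    for volume, luns in sorted(luns_by_volume.items()):
        if len(luns) > 1:
            problems.append(f"{volume} uses different LUNs across hosts: {sorted(luns)}")
    # The same LUN number should not be used for different volumes.
    for lun, volumes in sorted(volumes_by_lun.items()):
        if len(volumes) > 1:
            problems.append(f"LUN {lun} is used for different volumes: {sorted(volumes)}")
    return problems
```

Measuring the encoded length rather than the character count matters because, as noted above, non-ASCII UTF-8 characters occupy more than one byte each.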