
The Association of System Performance Professionals

The Computer Measurement Group, commonly called CMG, is a not-for-profit, worldwide organization of data processing professionals committed to the measurement and management of computer systems. CMG members are primarily concerned with performance evaluation of existing systems to maximize performance (e.g., response time, throughput) and with capacity management, where planned enhancements to existing systems or the design of new systems are evaluated to find the necessary resources required to provide adequate performance at a reasonable cost.

This paper was originally published in the Proceedings of the Computer Measurement Group’s 1997 International Conference.

For more information on CMG please visit http://www.cmg.org

Copyright Notice and License

Copyright 1997 by The Computer Measurement Group, Inc. All Rights Reserved. Published by The Computer Measurement Group, Inc. (CMG), a non-profit Illinois membership corporation. Permission to reprint in whole or in any part may be granted for educational and scientific purposes upon written application to the Editor, CMG Headquarters, 151 Fries Mill Road, Suite 104, Turnersville, NJ 08012.

BY DOWNLOADING THIS PUBLICATION, YOU ACKNOWLEDGE THAT YOU HAVE READ, UNDERSTOOD AND AGREE TO BE BOUND BY THE
FOLLOWING TERMS AND CONDITIONS:

License: CMG hereby grants you a nonexclusive, nontransferable right to download this publication from the CMG Web site for personal use on a single
computer owned, leased or otherwise controlled by you. In the event that the computer becomes dysfunctional, such that you are unable to access the
publication, you may transfer the publication to another single computer, provided that it is removed from the computer from which it is transferred and its use
on the replacement computer otherwise complies with the terms of this Copyright Notice and License.

Concurrent use on two or more computers or on a network is not allowed.

Copyright: No part of this publication or electronic file may be reproduced or transmitted in any form to anyone else, including transmittal by e-mail, by file
transfer protocol (FTP), or by being made part of a network-accessible system, without the prior written permission of CMG. You may not merge, adapt,
translate, modify, rent, lease, sell, sublicense, assign or otherwise transfer the publication, or remove any proprietary notice or label appearing on the
publication.

Disclaimer; Limitation of Liability: The ideas and concepts set forth in this publication are solely those of the respective authors, and not of CMG, and CMG
does not endorse, approve, guarantee or otherwise certify any such ideas or concepts in any application or usage. CMG assumes no responsibility or liability
in connection with the use or misuse of the publication or electronic file. CMG makes no warranty or representation that the electronic file will be free from
errors, viruses, worms or other elements or codes that manifest contaminating or destructive properties, and it expressly disclaims liability arising from such
errors, elements or codes.

General: CMG reserves the right to terminate this Agreement immediately upon discovery of violation of any of its terms.

RELOCATING A DATA CENTER USING EXTENDED DISTANCE SRDF:
A USER EXPERIENCE

Mark P. Grinnell
EMC Corporation

Jim Feurig
Chrysler Corporation

Symmetrix Remote Data Facility (SRDF) is a remote DASD copy feature that can be used
to relocate a data center to a new site using T1 or T3 communication facilities. This paper
describes a 2.3TB relocation accomplished in late 1996 using four (4) 60 mile T3 links,
along with the DASD project planning activities used to support the move. A model to
predict the communication link throughput is presented.

OVERVIEW

Symmetrix Remote Data Facility (SRDF) is an EMC DASD to DASD remote copy feature that can be used to relocate a data center to a new location using T3, T1 or other communication links (ESCON, ATM, etc.). SRDF allows the new site to be tested with current data prior to the final cut over, as well as minimizing the outage required at cut over time. This paper describes the major planning activities and events associated with a 2.3TB data center relocation performed in late 1996 using four (4) 60 mile T3 communication links.

SRDF was selected as the method of data center relocation over more traditional methods because the on-line application environment could not sustain an extended outage. A one (1) hour cut over was the objective and was accomplished on Thanksgiving day to minimize the impact on users located in the United States.

This paper is intended for project planner level readers and is organized into the following sections:

• Planning
• Testing
• Operational Issues
• Cut Over Day
• Recommendations

PLANNING

SRDF requires EMC DASD at both the existing site as well as the new data center location. The steps used to accomplish the relocation include:

• New and existing site DASD configuration
• Communication link configuration
• New site DASD copy and refresh strategy
• Elapsed time for sync prior to final cut over

New and Existing Site DASD Configuration

Building the Existing DASD Site Configuration

This project began with a DASD configuration of fourteen (14) controllers containing approximately 2.3TB, or 840 volumes, of non-EMC 3390-3 type DASD.

To simplify the relocation, it was decided that both the existing and new site should have identical DASD configurations. Performance data was analyzed to determine the existing CPU channel utilization and write content of the existing DASD configuration. Channel busy is important because SRDF requires that "front end" ESCON adapters (EAs), which normally would support CPU channel traffic, be reconfigured as remote adapters (RAs). RAs support write traffic over the communication links to DASD at the new site. The effect is to reduce the front end bandwidth of the DASD, and since the relocation was to take place over a period of nearly two (2) months, performance degradation due to excessive CPU channel busy was a concern. The write content data was used to balance the write activity over the communication links. SRDF does RA load balancing within the Symmetrix box itself but does not balance across DASD boxes.

The analysis used averages of each of three (3) eight (8) hour shifts. The "evening shift", or 16:00 to 24:00, was the busiest and was chosen as the design basis. The total workload across all DASD controllers was:




• 4000 I/O per second
• 22% writes
• 3.4msec CONNECT per I/O

Using this workload, a five (5) box Symmetrix 5500 model 9M56 solution was chosen and is shown in Figure 1. Each 5500 can support up to eight (8) concurrent I/O activities. With two (2) of the eight (8) ESCON adapters (EAs) in each box reconfigured as remote adapters (RAs), there were thirty (30) total EAs to support CPU channel traffic and ten (10) RAs to support T3 link traffic. Each of the 5500-9M56 boxes was configured with 168 3390-3 type volumes, which resulted in 476GB of usable, fully RAID-1 protected storage.
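A quick arithmetic check of that configuration (ours, not from the paper, using the 2.838GB 3390-3 volume size cited in the link discussion below):

# Configuration arithmetic check: five 5500-9M56 boxes, each with
# 168 3390-3 volumes of 2.838GB each, fully RAID-1 protected.
vol_gb = 2.838
per_box_gb = 168 * vol_gb     # ~476.8GB usable per box
print(per_box_gb)             # ~476GB, as stated
print(5 * per_box_gb / 1000)  # ~2.38TB total, covering the 2.3TB moved
print(5 * 168)                # 840 volumes, matching the existing site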

Figure 1: Data Center Relocation, Symm/Data Switch Configuration. Five EMC 5500-9M56 boxes of 476GB each (boxes #1 through #5) connect through Data Switch 9800 protocol converters and link adapters to four T3 circuits (Circuit IDs #1 through #4), carrying approximately 220, 220, 180 and 230 write I/Os per second respectively. Total workload: 4000 I/O per second, 22% writes.

Averaging the CPU workload evenly over the thirty (30) channel paths, we get an acceptable path busy value of 45% for the busiest eight (8) hour shift. The same calculation with the RAs reconfigured as EAs, representing the environment after the relocation, reduced the average path busy to 34%.

Symmetrix Data Migration Services (SDMS) and some manual movement were used to copy data to the Symmetrix DASD. The actual copy process was performed over the period of three (3) weekends, taking advantage of normal early Sunday morning outage windows. The combination or "blend" of approximately three (3) existing donor DASD controllers to each recipient DASD box was done considering write content as well as individual box channel busy, as stated above.

Building the New Site DASD Configuration

While it is possible to do some re-mapping of devices between the existing and new site using SRDF, it adds complexity and was avoided. Because this project required a "clean slate" design of the existing site, it was relatively easy to make the new site an exact replica of the existing site. The new site consisted of the same five (5) box configuration, and each new site box had an identical volume map to the corresponding existing site box. All DASD volumes were mirrored to the remote/new site, including page and public volumes. The relatively short distance of this move allowed these high write content volumes to be remotely mirrored without special consideration. Long distance moves, or those with low capacity communication link configurations (such as T1 links), may require different procedures to handle individual write intensive volumes.
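As a quick sanity check, the 45% and 34% busy values above follow directly from the shift-average workload. A minimal sketch of the arithmetic (ours, not from the paper):

# Channel path busy check: total ESCON CONNECT time demanded per second,
# spread evenly over the available CPU channel paths.
IO_PER_SEC = 4000          # design workload, busiest eight hour shift
CONNECT_MSEC = 3.4         # CONNECT time per I/O

def avg_path_busy(num_paths):
    # CONNECT seconds demanded per wall-clock second, per path
    return IO_PER_SEC * CONNECT_MSEC / 1000.0 / num_paths

print(avg_path_busy(30))   # 0.45 -> 45% busy during the move (10 EAs lent to SRDF)
print(avg_path_busy(40))   # 0.34 -> 34% busy after the RAs revert to EAs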




Communication link configuration

Communication link planning is critical to a successful SRDF data center relocation project. The communication links used for SRDF are dedicated to DASD to DASD communication and cannot be shared with other workloads. Considerations for link planning should include:

• link throughput
• link redundancy
• link installation lead time and availability

Before these topics are covered, a brief and high level discussion of communication link theory is appropriate. Just to position some of the link technologies, consider Table 1, which shows the link capacities of ESCON, T3 and T1 and the amount of time it would take to copy a single 2.838GB 3390-3 type volume.

Table 1: Time to Copy a 3390-3 Volume

Link    Speed            Copy Time
ESCON   17 mbytes/sec    2.8 min
T3      4.5 mbytes/sec   10.8 min
T1      0.15 mbytes/sec  323 min

Communication link service time can be broken into three (3) separate components:

Data transfer: this component is dependent on the link "unit of work" or block size as well as the speed of the link itself. For example, transfer of a 56kbyte block at 4.5mbytes/sec T3 speed will take 12.2 milliseconds. Link data compression would help by reducing this component of link service time. For T1 links, data transfer is the dominant component: a 56kbyte block takes about 365 msec to transfer at the 0.15mbyte/sec speed of a T1. Data compression is VERY helpful on T1 links at any distance. Data compression will help T3 links as well, but it will lose importance as the distances increase and as block sizes decrease.

Propagation delay: this component is related to the effective distance of the circuit and is calculated as 1 millisecond for every 125 miles of circuit distance. At long T3 distances, this component will be the major contributor to overall link service time. For T1 links, the data transfer component is so long (i.e., 365 milliseconds for 56kbytes) that the propagation delay is usually very small in comparison. The actual physical distance "as the crow flies" between sites will never be the effective circuit distance because of delays added by the communication carrier equipment. For planning purposes, use of a 100% increase above air mile distance is not unreasonable.

Link overhead: this value is technology dependent and is usually on the order of 1 msec for each pass through a link attachment device (in our case, a Data Switch/General Signal Network 9800 Max protocol converter). An SRDF link I/O requires two (2) passes (a send and an acknowledgment), so a 4 msec addition to link service time for a single SRDF T3 transmission is typical. Again, for T3 at short distances, this is not a trivial value and is the reason that a single SRDF remote adapter (RA) cannot keep a T3 link fully utilized. As distances get longer, the 4 msec value becomes less and less important. For T1 links, this value is virtually insignificant when compared to the 365 msec data transfer component. This is why a single RA can in most cases fully utilize a T1 link.

Link service time is the sum of the above components. Effective link throughput is a direct result of link service time and may be calculated by:

1. Link I/Os per second = 1 / link service time

Since each SRDF link I/O (in adaptive mode) represents a 56kbyte unit of work, the effective link throughput can be calculated by:

2. Effective link throughput = link I/Os per second * 56kbytes

A fully utilized 4.5mbyte/sec T3 will support about 82 of these 56kbyte link I/Os a second. A T1, on the other hand, will support about 2.5 56kbyte I/Os per second.

These simple equations can be used to estimate how many remote adapters (RAs) should be configured to maximize utilization of the communication link. Links are expensive and should be configured to run as close to rated bandwidth as possible. Fully utilizing the link will ALWAYS require a minimum of two (2) RAs per link, regardless of distance, because of the link overhead component described above. Figure 2 shows a graphical representation of T3 link throughput vs. circuit distance in an SRDF environment. This figure shows that as the distance increases, more RAs are required to fully utilize the link. Table 2 shows the resulting link utilization as a function of distance and RA configuration. Both Figure 2 and Table 2 demonstrate that a single RA cannot keep a T3 busy: they show a maximum throughput of 3.5MB/sec, which wastes 25% of a T3.

Find a CMG regional meeting near you at www.cmg.org/regions


Learn the basics and latest aspects of IT Service Management at CMG's Annual Conference - www.cmg.org/conference

With this as background we can now move on to our 60 mile T3 relocation project.

Figure 2: Effective link mbytes/sec versus circuit mile distance (0 to 5000 miles) for one (1), two (2), three (3) and four (4) remote adapters (RAs) multiplexed on a single T3 link; adding RAs raises the curve toward 100% link utilization. Assumptions: 4.5MB/sec T3 speed, no link compression, 56kbyte track, adaptive SRDF mode, 4.0 msec link overhead.

Table 2: Single T3 Link Utilization vs. Distance and RA Configuration

CIRCUIT MILES   1RA   2RA   3RA   4RA
0               75%   100%  100%  100%
100             68%   100%  100%  100%
200             63%   100%  100%  100%
500             50%   100%  100%  100%
750             43%   86%   100%  100%
1000            38%   76%   100%  100%
1300            33%   66%   100%  100%
1500            30%   61%   91%   100%
2000            25%   50%   76%   100%
2500            22%   44%   65%   87%
3000            19%   38%   57%   76%
3500            17%   34%   51%   67%
5000            13%   25%   38%   51%

Note: Table 2 assumptions: 56kbyte unit of work, no compression, 4.5mbyte/sec T3 link and 4 msec overhead.
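These values follow directly from equations 1 and 2. The sketch below is our own rendering of the model, not code from the paper; in particular, it counts the propagation delay for both the send and the acknowledgment pass, which is the reading that reproduces the published table.

# T3 link throughput model of equations 1 and 2, under the Figure 2 /
# Table 2 assumptions: 56kbyte adaptive-mode unit of work, no compression,
# 4.5mbyte/sec T3 (about 82 link I/Os per second at 100%), 12.2 msec data
# transfer, 4 msec link overhead, and 1 msec propagation per 125 circuit
# miles, counted on both the send and the acknowledgment.
DATA_TRANSFER_MSEC = 12.2    # 56kbytes at 4.5mbytes/sec
LINK_OVERHEAD_MSEC = 4.0     # two passes through the protocol converters
FULL_T3_IOS_PER_SEC = 82.0   # 56kbyte link I/Os on a fully utilized T3

def link_service_time_msec(circuit_miles):
    # Link service time = data transfer + propagation delay + link overhead.
    propagation = 2.0 * (circuit_miles / 125.0)   # send plus acknowledgment
    return DATA_TRANSFER_MSEC + propagation + LINK_OVERHEAD_MSEC

def t3_utilization(circuit_miles, num_ras):
    # Equation 1: each RA completes 1 / (link service time) link I/Os per second.
    ios_per_ra = 1000.0 / link_service_time_msec(circuit_miles)
    # Equation 2, expressed as utilization: RAs multiplex on the link until
    # the T3 itself saturates at about 82 I/Os (4.5mbytes) per second.
    return min(1.0, num_ras * ios_per_ra / FULL_T3_IOS_PER_SEC)

# Spot-check against Table 2 (values land within a point of the table):
for miles in (0, 500, 1000, 2000, 5000):
    print(miles, [round(100 * t3_utilization(miles, n)) for n in (1, 2, 3, 4)])

At zero distance the 16.2 msec service time limits a single RA to about 62 I/Os per second, or roughly 3.5MB/sec, which is the 75% ceiling shown in the first row.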

Link Throughput

Link throughput is important because it will determine the elapsed time of the following events:

• initial population of the new site DASD
• resync of the new site DASD after testing
• delay of new site IPL at final cut over

SRDF has an adaptive copy mode which enables link writes to occur without any impact to local DASD response time. Write I/Os to the local DASD are not elongated by SRDF in this mode, and each results in a 56kbyte link "unit of work". This mode is intended to be used for bulk data copy activities such as those associated with data center relocations.

For our 60 circuit mile data center relocation, we had four (4) T3 links and eight (8) active RA link paths, as shown in Figure 1. Because our distance was so short, we were able to fully utilize our T3 links. It is not until the circuit mile distance exceeds 500 miles that a third RA should be considered for input to a single T3 link.




The initial population of the new site DASD will be determined by both the link throughput as well as the write content of the workload during the copy. The SRDF network configuration chosen allowed for only eight (8) DASD link (RA) connections (see Figure 1 for link configuration). Because we had five (5) DASD boxes with ten (10) RA connections, two (2) of the five (5) DASD boxes could have only a single active RA path. We did configure each box with two (2) RA paths, but chose boxes #1 and #3 to have only a single active connection. These boxes were chosen because of their low write workload compared to the other boxes. It turns out this configuration was more than adequate from a link throughput perspective, but it left us vulnerable to a T3 outage on the boxes with a single RA. As luck would have it, we did in fact experience a significant T3 down event on the DAY OF THE MOVE! Lesson learned: ALWAYS provide a minimum of two (2) active RAs per DASD box.

Link Redundancy

Link redundancy is important to minimize exposure to link outages. SRDF has the ability to recover automatically following a link down event and resume the copy process without requiring a full resync. Also, all the DASD volumes within a single box share all the box link paths (RAs), equally balancing the workload across them, and switch to the surviving RA or link automatically in the event of a failure.

It is recommended that "diverse routing" be requested to minimize the likelihood of a single event disabling multiple links. Additionally, using multiple communication link vendors could improve the overall link availability. Communication links are most definitely the most "fragile" component of the configuration. It is also possible to request communication links be available "on demand" if they are needed, which greatly reduces their cost.

Whenever possible, configure each DASD box to multiple diversely routed communication links.

Link Installation Lead Time and Availability

Link availability and lead time can easily become the critical path in the project. Links can take months to order and install, and depending on the geographic locations of the desired service, communication links may not be available at all. Lesson learned: check with your communication vendor(s) early and often during the project.

New site DASD copy and refresh strategy

The time required to initially populate the new site will be dependent on the link throughput and the update rate at the existing/production site. Write workload ongoing at the existing site is "working against" the remote copy process and will elongate the synchronization time. In fact, if the links are too long or too slow (i.e., T1), the remote site may never fully catch up until the local site is brought down. Estimating the copy time is in fact one of the most difficult planning tasks. We usually know the write content of the workload, but the locality of reference is an unknown and potentially very important variable. Four (4) T3s at 60 miles provided sufficient bandwidth to overwhelm the write workload at the existing/local site for our project. However, for long distance moves, or for T1 moves, the local host update rate can approach, or even surpass for a time, the capacity of the SRDF communication links. Data compression is also an unknown that can affect the time to copy. Our experience has been that counting on benefit from T3 link compression is risky and not recommended. T1 links seem to be more "compression friendly" and, as previously mentioned, compression is a great benefit for T1 data movement at any distance. We've seen at least 2:1 compression on T1 links with SRDF adaptive mode.

The link throughput value should be constant once installed and should only fluctuate if the links were to go down. The existing/local site update rate is of course a variable. Our four (4) T3 configuration, with 2.3TB of data located 60 circuit miles away, copied to the new site in about 44 hours. If there were no updates, our copy would have taken about 36 hours.

In order to IPL and test the new site, SRDF must be suspended. The normal status of the remote SRDF DASD is read only and must be changed to write enabled following suspension of SRDF. Ideally, the local site should be shut down (or the SRDF mode could be switched to SYNCHRONOUS mode during a quiet time) and the updates allowed to flow to the remote site, providing a perfect point in time copy at the remote site. After synchronization, both the local and remote sites can be IPL'd to support production at the local site and testing at the remote site. After testing, SRDF can be resumed and only the updates that have occurred since the suspend are copied to the desired site. Updates may be kept on a volume level basis at either site. For simplicity, we elected to discard all updates made at the new/remote site.

Our suspend/resume strategy was to suspend SRDF on Sunday during the normally scheduled outage window and allow test processing to occur at the new site during the entire week. On Friday we resumed SRDF at 18:00 to "get caught back up" in time for the Sunday outage. At that time we would again suspend SRDF. We had sufficient T3 bandwidth to allow this approach so we could get back into sync over the weekend, providing a refreshed copy at the new site. Obviously, during the time SRDF is suspended, updates are accumulating on the local/production site. Updates are maintained at the DASD track level and are referred to as "invalid" if the update has not yet been received and acknowledged by the remote site.
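As a rough cross-check on those copy-time figures (our arithmetic, not the paper's):

# No-update copy time: four T3 links running at the full 4.5mbytes/sec
# rated speed move 18mbytes/sec in aggregate. Live production updates
# stretched the actual elapsed time beyond this ideal to about 44 hours.
data_mbytes = 2.3 * 1_000_000                  # 2.3TB in mbytes (decimal units)
aggregate_mb_sec = 4 * 4.5                     # four fully utilized T3s
print(data_mbytes / aggregate_mb_sec / 3600)   # ~35.5 hours, vs "about 36"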

Figure 3 shows an invalid track history profile for one of the weeks during the testing period.

Figure 3: Testing Support. Invalid track history, Sun Nov 3 to Sat Nov 10, 1996, plotted per box (Box #2 through Box #5) from 0 to 4,000,000 invalid tracks against time and date. Annotations: SUSPEND SUN NOV 3; RESUME FRI NOV 9 18:00; LAST BOX SYNC 12:00 SAT NOV 10.

Note: Data from Box #1 was not available.

Figure 3 shows how quickly the remote copy can get out of sync when SRDF is suspended. A similar behavior is observed if the communication link throughput is insufficient to keep up with the write workload. This could happen at longer distances, or at any distance with a T1 link move. Notice how the shape of the curve tends to flatten out as time progresses. This is because the "locality" of the writes results in updates of tracks that are already invalid, which therefore do not increment the invalid track count.

It is important to understand just how difficult it COULD be to maintain data currency between the local and remote copies of the DASD. Recall our design workload was:

• 4,000 I/O per second
• 880 write I/O per second (22%)

This is a POTENTIAL update workload of 880 tracks per second across our five (5) boxes. If we were unfortunate enough to have these writes be so random that they always invalidated a new track, we would never catch up. This is because the capacity of our T3s was:

• four (4) T3 links
• each T3 at 100% moves 82 tracks/sec
• 328 tracks per second total "burn off" capability
• only 37% of potential update rate (328/880)

Fortunately, experience has shown writes tend to be very localized and therefore "re-hit" tracks that are already invalid. Figure 3 can be used to calculate the highest invalid track accumulation rate, which always occurs right after SRDF has been suspended and the invalid track count is low. Our project indicated that invalid track accumulation rates of 40,000 to 70,000 tracks per hour were typical. This rate would fall off as the box became more and more invalid. We have seen values like this for other projects as well. Comparing a 50,000 track per hour box average invalidation rate to the "burn off" capability of our T3 links:

• each box invalidation rate = 50,000 tracks/hr
• total invalidation rate = 250,000 tracks/hr
• convert to seconds = 69 tracks/sec
• total write workload = 880 per second
• 92% of writes re-hit an invalid track (= 1 - 69/880)
• 8% "effectiveness" of writes (= 69/880)
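Written out as a quick script, with the catch-up estimate this kind of rate implies (our sketch of the same arithmetic):

# Burn-off arithmetic from the bullets above. Localized writes mean only
# a fraction of write I/Os invalidate a new track, so the links can keep
# up even though the raw write rate exceeds the link capacity.
TRACKS_PER_T3 = 82                 # 56kbyte tracks/sec on a 100% busy T3
burn_off = 4 * TRACKS_PER_T3       # 328 tracks/sec across four T3 links
writes = 880                       # potential invalidations/sec (22% of 4000)

effective = 5 * 50_000 / 3600.0    # five boxes at 50,000 tracks/hr -> ~69/sec
print(burn_off / writes)           # 0.37: raw capacity is 37% of potential rate
print(effective / writes)          # ~0.08: only ~8% of writes hit a new track

# The same numbers give a resync estimate: invalid tracks / net burn-off.
# For example, 500,000 invalid tracks at ~60 tracks/sec is about 2.3 hours.
print(500_000 / 60 / 3600.0)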




This is obviously very good news, but keep in mind that the potential of your workload to overwhelm the link "burn off" capability is always there. Pay particular attention to batch workloads and HSM activity such as defrags or database reorgs. These activities can quickly invalidate a lot of data and should be avoided near cut over or synchronization prior to a test.

Elapsed time for sync prior to final cut over

This is always the million dollar question when performing a data center move using SRDF. ANY communication link configuration will "work", that is, get the data from site A to site B. The real question is how long the time period will be from shutdown at the existing production site until the DASD at the new site is in sync and can be IPL'd. This is NOT a trivial question, and it is particularly difficult when:

• communication links are underconfigured
• distance is very long
• T1 links are being used

In this case we had plenty of bandwidth, and within an hour from shutdown we were IPLing the new site. Invalid track counts are available from the Symmetrix service processor or through the SRDF Host Component software product, which runs on a CPU that has connection to the Symmetrix. A batch job which records internal Symmetrix statistics, such as invalid track counts and RA link I/O rates, can be set up to run on periodic intervals to study the environment. In fact, configuring the Symmetrix for SRDF and collecting internal data BEFORE the communication links are installed could greatly assist with link sizing and synchronization estimates.

TESTING

SRDF SUSPEND/RESUME

One of the benefits of using SRDF to relocate a data center is the ability to test the new environment prior to the cut over. Our project included four (4) weeks of testing prior to the move. During the week, SRDF was suspended, allowing the new site DASD to be write enabled and application testing with "old data" to occur at the new site. This testing at the new site was performed concurrently with production workload at the existing site. SRDF in adaptive mode does NOT affect response time at the existing site. At the end of the week SRDF was resumed, with the production updates flowing from the existing site to the new site. It is not necessary to do a full resync in this configuration; only the data that has been changed needs to be copied. ALL DASD updates made at the new site are discarded each time this refresh occurs, so any permanent changes (MVS, network, etc.) must be made at the existing site.

Figure 3 shows that over the period of a Sunday through Friday, 25% to 35% of the data in each box was changed, or "invalidated", by production updates performed at the existing site. Figure 3 also shows there were significant differences between boxes #2 through #5, which was expected from review of the write I/O rates to each box. At Friday 18:00 the links were resumed, and the slowest box to "sync up" took until 12:00 noon Saturday (the following day), that being one of the boxes with only a single T3 link. Since the suspend activities occurred very early Sunday morning, the T3 link throughput was more than adequate to support this testing strategy.

OPERATIONAL ISSUES

Handling cartridge tape during a move, and testing prior to the move, is always a challenge. This section describes how tapes were managed for our project.

The existing production data center had a cart library of over 35,000 volumes stored in seven silos. These carts had to be relocated to the new site, approximately a 17 mile drive. The relocation began about a month before the data center move. As the day of the move approached, the identification of carts eligible for early relocation began. One month before the relocation, all carts not accessed in the last 30 days were ejected from the silos. These carts were placed in racks and transported to the new location. Thus, in theory, the moved carts would not be needed until the next "monthly" processing window occurred. The carts that remained in the silos would be the ones required for daily, weekly, etc. processing. As the relocation date neared, about 10 days before the move, all carts not accessed in the last five days were ejected from the silos. These carts were kept at the existing/production site, stacked on the floor by volume serial number. There remained about 4,000 carts in the silos. Over the next few days, the theory of ejection by access age was tested frequently. It had been expected that those carts on the floor would be accessed, and they were. The sequence of those volumes was soon corrupted. In addition, there were frequent trips to the new site to retrieve some of the older carts for processing. Some of those trips were in support of capacity planning jobs being run to assist in the planning for the move. Operations also identified those carts required for "critical" jobs and migrated those carts to a single silo in preparation for their relocation. On the night of the move, all batch processing was to be completed by midnight. About two hours before that, the ejection of the remaining 4,000-5,000 carts began.




The physical movement required that silos be staffed at both sites. As the carts were unloaded at the original site, they were packed and trucked to the new site. The staff at the new site loaded the new silos. This unload, truck, and load process had to be completed so that batch processing could resume by 8:00 am. Because silos require about two hours to "audit" their contents, "critical" carts were loaded in the first silo to ensure their availability at start up time. The rest of the carts were loaded as soon as they arrived from the old data center. All the carts were loaded and the "audit" processing completed by the target time.

CUT OVER DAY

What should have been routine became very exciting when we lost three (3) of the four (4) T3 links the morning of the move. Fortunately, SRDF has the ability to recover from a link down event without requiring a full resync. Without this capability, the move would have been postponed. Within a few hours the two links supporting boxes #4 and #5 were restored. But the single T3 circuit supporting box #1 remained down, and invalid track counts were mounting to above 500,000. At 60 tracks per second we predicted almost 2.5 hours to get back into sync once the link came back. By 7:00 pm the link was restored and synchronization was established. The production side was taken down at midnight and we were IPLing the new site by 1:00 am the following morning. Lesson learned: have a plan for when circuits go down!!! Know your circuit IDs and your problem escalation procedure. Configure the communication links with availability in mind as well as throughput.

RECOMMENDATIONS

SRDF is one strategy that can be used to minimize the outage required for a data center relocation and provide for a smooth cut over by supporting the testing process. Summarizing the major issues:

1. Configure local and remote DASD boxes identically when possible to minimize complexity. Try to copy ALL volumes using remote copy to ensure all the required data and DASD volumes are present in the new site.

2. Analyze the front end DASD channel path busy WITH the remote adapter (RA) hardware configured, to minimize the likelihood of excessive channel busy prior to the move.

3. Use large DASD boxes to allow for consolidation and sharing of communication link resources.

4. Use the remote adapter (RA) to T3 configurations versus circuit distance shown in Figure 2 and Table 2 to maximize communication link utilization.

5. To assist with link capacity planning, consider reconfiguring existing Symmetrix subsystems for SRDF and collecting invalid track count statistics BEFORE the link configuration is finalized.

6. To minimize exposure to link outages, avoid configuring only a single remote adapter (RA) per DASD box.

7. Request diverse routing for the communication links and consider using multiple communication carrier vendors.

8. Consider ordering spare communication links that may be used, if needed, for additional capacity. These can be obtained at a greatly reduced cost.

9. Carefully document link connections and circuit IDs to assist with outage problem determination.

10. Develop a suspend/resume strategy which will support the testing process without allowing the new site copy to become so outdated that it requires an extended amount of time to be resynchronized. The strategy will depend almost entirely on the circuit distance and type (i.e., T3, T1, etc.).

11. Be extremely careful when applying changes to DASD at the new site, such as network parameters, because they could be lost when SRDF is resumed.

