IBM FlashSystem 5045 Redbook
Redbooks
IBM Redbooks
August 2023
SG24-8543-00
Note: Before using this information and the product it supports, read the information in “Notices” on
page xxix.
Contents
Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxiii
Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxvii
Notices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxix
Trademarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxx
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
Authors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxi
Now you can become a published author, too! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
Comments welcome. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
Stay connected to IBM Redbooks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxxv
Figures
1-49 Front view of an IBM FlashSystem 5045 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
1-50 Rear view of an IBM FlashSystem 5045 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
1-51 View of available connectors and LEDs on an IBM FlashSystem 5045 single canister . . 85
1-52 Easy Tier concept . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
1-53 IBM Storage Virtualize GUI dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
1-54 IBM Storage Insights dashboard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
1-55 IBM FlashCore Module (NVMe) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
1-56 Storage technologies versus latency for Intel drives. . . . . . . . . . . . . . . . . . . . . . . . . 105
1-57 “Star” and “Cascade” modes in a three-site solution. . . . . . . . . . . . . . . . . . . . . . . . . 113
2-1 ISL data flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
2-2 Single-switch SAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2-3 Core-edge topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2-4 Edge-core-edge topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128
2-5 Full mesh topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
2-6 IBM Storage Virtualize as a SAN bridge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
2-7 Storage and hosts attached to the same SAN switch . . . . . . . . . . . . . . . . . . . . . . . . . 131
2-8 Edge-core-edge segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
2-9 SAN Volume Controller 2145-SV1 rear port view . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
2-10 SV2/SA2 node layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2-11 SV3 node layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
2-12 Port location in the IBM FlashSystem 9200 rear view. . . . . . . . . . . . . . . . . . . . . . . . 136
2-13 Port location in IBM FlashSystem 9500 rear view. . . . . . . . . . . . . . . . . . . . . . . . . . . 136
2-14 IBM FlashSystem 7200 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
2-15 IBM FlashSystem 7300 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
2-16 IBM FlashSystem 5100 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
2-17 IBM FlashSystem 5200 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2-18 IBM FlashSystem 5015 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2-19 IBM FlashSystem 5035 rear view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
2-20 IBM Storage Virtualize NPIV Port WWPN. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2-21 IBM Storage Virtualize NPIV Failover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
2-22 IBM Storage Virtualize output of the lstargetportfc command. . . . . . . . . . . . . . . . . . 142
2-23 SAN Volume Controller model 2145-SV1 port distribution . . . . . . . . . . . . . . . . . . . . 142
2-24 SAN Volume Controller model SV3 port distribution. . . . . . . . . . . . . . . . . . . . . . . . . 143
2-25 IBM FlashSystem 9200 port distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
2-26 Port masking configuration on SVC or IBM FlashSystem with 16 ports . . . . . . . . . . 144
2-27 Port masking configuration on IBM FlashSystem or SVC with 24 ports . . . . . . . . . . 145
2-28 IBM Storage Virtualize Portsets overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
2-29 Listing the available ports and portsets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
2-30 How to assign ports pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
2-31 Output of the IBM Storage Virtualize lstargetportfc command . . . . . . . . . . . . . . . . . 153
2-32 Back-end storage zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
2-33 V5000 zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
2-34 Dual core zoning schema . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
2-35 ISL traffic overloading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2-36 XIV port cabling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
2-37 IBM FlashSystem A9000 connectivity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
2-38 IBM FlashSystem A9000 grid configuration cabling . . . . . . . . . . . . . . . . . . . . . . . . . 163
2-39 Connecting IBM FlashSystem A9000 fully configured as a back-end controller. . . . 164
2-40 V7000 connected as a back-end controller. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
2-41 IBM FlashSystem 9100 as a back-end controller . . . . . . . . . . . . . . . . . . . . . . . . . . . 166
2-42 IBM FlashSystem 900 connectivity to a SAN Volume Controller cluster . . . . . . . . . 167
6-39 Clustered or multinode systems with a single inter-site link with only one link . . . . . 467
6-40 Dual links with two replication portsets on each system configured . . . . . . . . . . . . . 469
6-41 Clustered/multinode systems with dual inter-site links between the two systems . . 470
6-42 Multiple IP partnerships with two links and only one I/O group. . . . . . . . . . . . . . . . . 472
6-43 Multiple IP partnerships with two links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474
6-44 1-Gbps port throughput trend . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
6-45 Volume mirroring overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 477
6-46 Attributes of a volume and volume mirroring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 481
6-47 IOgrp feature example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 485
6-48 Possible pool and policy relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
6-49 Region mapping with bitmap. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 495
7-1 Typical concept scheme HyperSwap configuration with IBM Storage Virtualize . . . . 506
7-2 IBM Storage Virtualize HyperSwap in a storage failure scenario . . . . . . . . . . . . . . . . 507
7-3 IBM FlashSystem HyperSwap in a site failure scenario . . . . . . . . . . . . . . . . . . . . . . . 508
7-4 Initializing the first node of a HyperSwap system . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
7-5 IP quorum network layout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 513
7-6 HyperSwap volume UID . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 516
8-1 SCSI ID assignment on volume mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 536
8-2 Dual Port 100 GbE adapter placement on IBM FlashSystem Storage 7300 . . . . . . . 548
8-3 Dual Port 100 GbE adapter placement on IBM Storage FlashSystem 9500 . . . . . . . 549
8-4 Dual Port 100 GbE adapter placement on SAN Volume Controller node SV3 . . . . . . 549
9-1 Email users showing customizable notifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 552
9-2 Call Home with cloud services configuration window . . . . . . . . . . . . . . . . . . . . . . . . . 553
9-3 SNMP configuration summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
9-4 SNMP server configuration window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554
9-5 Syslog servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
9-6 Pool threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
9-7 VDisk threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 556
9-8 Monitoring/Performance overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
9-9 Workload metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 558
9-10 Management GUI Dashboard view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 559
9-11 Authentication in REST API Explorer: Token displayed in the Response body . . . . 560
9-12 The lsnodestats command for node ID 1 (fc_mb) with JSON results in response body . . 561
9-13 Easy Tier Data Movement window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 562
9-14 Easy Tier Movement description window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563
9-15 Easy Tier Composition report window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
9-16 Easy Tier Composition Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 564
9-17 Workload skew: Single tier pool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
9-18 Workload skew: Multitier configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 566
9-19 IBM Spectrum Control Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
9-20 Key Performance Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 569
9-21 Write response Time by I/O Group > 5 ms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 570
9-22 IBM Storage Insights registration window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574
9-23 IBM Storage Insights or IBM Storage Insights for IBM Spectrum Control registration options . . . 575
9-24 Registration login window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 575
9-25 Creating an IBM account . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
9-26 IBMid account privacy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
9-27 IBM Storage Insights registration form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 577
9-28 IBM Storage Insights initial setup guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
9-29 IBM Storage Insights Deployment Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 578
9-30 IBM Storage Insights info event after the si_tenant_id was added to Cloud Call Home . . 579
9-31 Storage Insights - Add Call Home with cloud service device . . . . . . . . . . . . . . . . . . 579
9-32 Select Operating System window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
9-33 Data collector license agreement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 582
9-34 Downloading the data collector in preparation for its installation . . . . . . . . . . . . . . . 582
9-35 Data collector installation on a Linux host. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
9-36 Adding storage systems to IBM Storage Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . 583
9-37 Operations Dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
9-38 NOC dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 585
9-39 Block Storage Systems table view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 586
9-40 Advisor Insights window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
9-41 Understanding capacity information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 588
9-42 Capacity terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
9-43 Usable capacity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
9-44 Capacity Savings window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 591
9-45 Sidebar > Pools > Properties > Properties for Pool . . . . . . . . . . . . . . . . . . . . . . . . . 592
9-46 Sidebar > Pools > MDisks by Pools > Properties > More details . . . . . . . . . . . . . . . 593
9-47 Easy Tier Overallocation Limit GUI support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 594
9-48 IBM Spectrum Control overview page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
9-49 IBM Storage Insights overview page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
9-50 Block Storage Systems overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 596
9-51 Capacity overview of Storage System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
9-52 Used Capacity. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597
9-53 Example of Adjusted Used Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598
9-54 Capacity limit example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
9-55 Capacity-to-Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 599
9-56 Zero Capacity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 603
9-57 IBM Spectrum Control Alert policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
9-58 IBM Storage Insights Alert policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608
9-59 All alert policies in IBM Spectrum Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
9-60 Copying a policy in IBM Spectrum Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 609
9-61 Copy Policy window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
9-62 New policy with inherited alert definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
9-63 Choosing the required alert definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
9-64 Alert parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 611
9-65 Setting up the Warning level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
9-66 Setting up the informational threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
9-67 IBM Spectrum Control notification settings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 612
9-68 System Health state of management GUI Dashboard . . . . . . . . . . . . . . . . . . . . . . . 613
9-69 Expanded Hardware Components view for a SAN Volume Controller Cluster . . . . . 614
9-70 Expanded Hardware Components view for IBM FlashSystem 9100 . . . . . . . . . . . . 614
9-71 Prioritizing tiles that need attention . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614
9-72 Dashboard entry point drills down to the event log . . . . . . . . . . . . . . . . . . . . . . . . . . 615
9-73 Events by Priority . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615
9-74 IBM Spectrum Control Dashboard summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 616
9-75 IBM Spectrum Control Block Storage Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
9-76 Detailed Block Storage System view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
9-77 Offline volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
9-78 Marking the status as acknowledged . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
9-79 Error status cleared. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
9-80 IBM Storage Insights dashboard showing a volume error . . . . . . . . . . . . . . . . . . . . 619
9-81 Actions available from the Volume tile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
9-82 IBM Spectrum Control: Export Performance Data . . . . . . . . . . . . . . . . . . . . . . . . . . 630
9-83 IBM Spectrum Control: Export Performance Data - Advanced Export . . . . . . . . . . . 630
9-84 IBM Spectrum Control: Package files example. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
9-85 Selecting Block Storage Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 631
9-86 Selecting Export Performance Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 632
9-87 CSM sessions preparing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 633
9-88 CSM sessions that are prepared and 100% synced . . . . . . . . . . . . . . . . . . . . . . . . . 633
9-89 CSM automatic restart is disabled by default . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634
9-90 Secondary consistency warning when automatic restart is enabled. . . . . . . . . . . . . 635
10-1 Example for restricted view for ownership groups . . . . . . . . . . . . . . . . . . . . . . . . . . 642
10-2 Current software version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
10-3 Up-to-date software version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
10-4 Fix Central download . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
10-5 Upload package manually. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 648
10-6 Unhide GUI buttons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 649
10-7 Transfer update package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
10-8 Initiating transfer of the update package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 650
10-9 IBM Storage Virtualize Upgrade Test Utility by using the GUI . . . . . . . . . . . . . . . . . 653
10-10 Example result of the IBM Storage Virtualize Upgrade Test Utility . . . . . . . . . . . . . 654
10-11 Installing patch using the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
10-12 list installed patches in GUI. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660
10-13 Drive firmware upgrade. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
10-14 Drive update test result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 662
10-15 IBM FlashSystem RCL Schedule Service page . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
10-16 RCL Product type page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 664
10-17 Timeframe selection page. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665
10-18 RCL Time selection page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 665
10-19 RCL booking information page . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 666
10-20 IBM Storage Virtualize performance statistics (IOPS) . . . . . . . . . . . . . . . . . . . . . . 676
10-21 Distribution of controller resources before and after I/O throttling. . . . . . . . . . . . . . 684
10-22 Creating a volume throttle in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
10-23 Creating a host throttle in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 686
10-24 Creating a host cluster throttle in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
10-25 Creating a storage pool throttle in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 687
10-26 Creating a system offload throttle in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 688
10-27 Poorly formatted SAN diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
10-28 Brocade SAN Health Options window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
10-29 Creating a subscription to IBM Storage Virtualize notification . . . . . . . . . . . . . . . . 699
11-1 Events icon in the GUI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 703
11-2 GUI Dashboard displaying system health events and hardware components . . . . . 704
11-3 System Health expanded section in the dashboard . . . . . . . . . . . . . . . . . . . . . . . . . 705
11-4 Recommended actions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705
11-5 Monitoring → Events window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
11-6 Properties and Sense Data for an event . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 708
11-7 Upload Support Package window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 712
11-8 Upload Support Package details. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 713
11-9 PBR replication status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
11-10 RPO statuses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 741
11-11 Remote Support options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 746
11-12 Call Home Connect Cloud . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 748
11-13 Asset summary dashboard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 749
11-14 List of configured assets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 750
11-15 Call Home Connect Cloud details window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 751
11-16 IBM Storage Insights versus IBM Storage Insights Pro . . . . . . . . . . . . . . . . . . . . . 753
Tables
Examples
9-5 Latency reported in milliseconds (ms) with microsecond (µs) granularity. . . . . . . . . . 559
9-6 REST API clients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 560
9-7 Command to add the SI tenant ID through the CLI. . . . . . . . . . . . . . . . . . . . . . . . . . . 579
9-8 CMMVC9305E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 579
9-9 Getting the value filtered from lssystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
9-10 Physical_capacity and physical_free_capacity from lssystem command . . . . . . . . . 590
9-11 Physical_capacity from lssystem command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 590
9-12 Deduplication and compression savings and used capacity. . . . . . . . . . . . . . . . . . . 591
9-13 CLI output example for lsportstats command to show the TX & RX power . . . . . . . 616
9-14 CLI example to change the interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 629
10-1 The lssystem command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 646
10-2 Copying the upgrade test utility to IBM Storage Virtualize . . . . . . . . . . . . . . . . . . . . 654
10-3 Ensure that files are uploaded . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
10-4 Upgrade test by using the CLI . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655
10-5 Verify software version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 658
10-6 Installing a patch on all nodes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659
10-7 Verify patch installation using lsservicestatus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 660
10-8 Removing patch on single nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
10-9 Removing all patches from a node in the system . . . . . . . . . . . . . . . . . . . . . . . . . . . 661
10-10 Listing the firmware level for drives 0, 1, 2, and 3. . . . . . . . . . . . . . . . . . . . . . . . . . 663
10-11 Output of the pcmpath query WWPN command . . . . . . . . . . . . . . . . . . . . . . . . . . . 667
10-12 Output of the lshost <hostname> command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
10-13 Cross-referencing information with SAN switches . . . . . . . . . . . . . . . . . . . . . . . . . 668
10-14 Results of running the lshostvdiskmap command. . . . . . . . . . . . . . . . . . . . . . . . . . 669
10-15 Mapping the host to I/O groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 672
10-16 Creating a throttle by using the mkthrottle command in the CLI . . . . . . . . . . . . . . . 685
10-17 Sample svcconfig backup command output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 694
10-18 Saving the configuration backup files to your workstation . . . . . . . . . . . . . . . . . . . 695
11-1 The svc_livedump command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 714
11-2 preplivedump and lslivedump commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
11-3 Output for the multipath -ll command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 715
11-4 Output of “esxcli storage core path” list command . . . . . . . . . . . . . . . . . . . . . . . . . . 718
11-5 Output of esxcli storage core path list -d <naaID> . . . . . . . . . . . . . . . . . . . . . . . . . . 719
11-6 Output for esxcli storage nmp device list . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 720
11-7 The triggerdrivedump command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 721
11-8 The lshost command. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
11-9 The lshost <host_id_or_name> command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 723
11-10 The lsfabric -host <host_id_or_name> command. . . . . . . . . . . . . . . . . . . . . . . . . . 724
11-11 Incorrect WWPN zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
11-12 Correct WWPN zoning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
11-13 lsportstats command output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 730
11-14 Issuing a lsmdisk command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 732
11-15 Output of the svcinfo lscontroller command . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 734
11-16 Determining the ID for the MDisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 736
Notices
This information was developed for products and services offered in the US. This material might be available
from IBM in other languages. However, you may be required to own a copy of the product or product version in
that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user’s responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not grant you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, MD-NC119, Armonk, NY 10504-1785, US
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM websites are provided for convenience only and do not in any
manner serve as an endorsement of those websites. The materials at those websites are not part of the
materials for this IBM product and use of those websites is at your own risk.
IBM may use or distribute any of the information you provide in any way it believes appropriate without
incurring any obligation to you.
The performance data and client examples cited are presented for illustrative purposes only. Actual
performance results may vary depending on specific configurations and operating conditions.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
Statements regarding IBM’s future direction or intent are subject to change or withdrawal without notice, and
represent goals and objectives only.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to actual people or business enterprises is entirely
coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are
provided “AS IS”, without warranty of any kind. IBM shall not be liable for any damages arising out of your use
of the sample programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines
Corporation, registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at “Copyright
and trademark information” at https://www.ibm.com/legal/copytrade.shtml
The following terms are trademarks or registered trademarks of International Business Machines Corporation,
and might also be trademarks or registered trademarks in other countries.
AIX® IBM® Interconnect®
Db2® IBM Cloud® PowerHA®
DS8000® IBM FlashCore® Redbooks®
Easy Tier® IBM FlashSystem® Redbooks (logo) ®
FICON® IBM Research® Service Request Manager®
FlashCopy® IBM Security® Storwize®
Guardium® IBM Spectrum® Tivoli®
HyperSwap® IBM Z® XIV®
Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks
of Intel Corporation or its subsidiaries in the United States and other countries.
The registered trademark Linux® is used pursuant to a sublicense from the Linux Foundation, the exclusive
licensee of Linus Torvalds, owner of the mark on a worldwide basis.
Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States,
other countries, or both.
Java, and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its
affiliates.
Ansible, OpenShift, and Red Hat are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in
the United States and other countries.
UNIX is a registered trademark of The Open Group in the United States and other countries.
VMware, VMware vCenter Server, VMware vSphere, and the VMware logo are registered trademarks or
trademarks of VMware, Inc. or its subsidiaries in the United States and/or other jurisdictions.
Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM® Redbooks® publication captures several best practices and describes the
performance gains that can be achieved by implementing the IBM Storage FlashSystem and
IBM SAN Volume Controller (SVC) products running IBM Storage Virtualize 8.6. These
practices are based on field experience.
This book highlights configuration guidelines and best practices for the storage area network
(SAN) topology, clustered system, back-end storage, storage pools and managed disks
(MDisks), volumes, Remote Copy services, and hosts.
This book is intended for experienced storage, SAN, IBM Storage FlashSystem, and SVC
administrators and technicians. Understanding this book requires advanced knowledge of
these environments.
IBM Storage rebranding: In January 2023, IBM announced that it would be renaming its
Spectrum software-defined storage products to IBM Storage products. This change was
made to simplify the product portfolio and make it easier for customers to find the products
they need. For example, IBM Spectrum Virtualize was renamed to IBM Storage Virtualize.
You will likely find documentation under both the Spectrum and Storage names for some
time, as the transition to the new names will take place over a period of several months.
However, all future documentation will be under the IBM Storage name.
Authors
This book was produced by a team of specialists from around the world.
Thanks to the authors of the previous edition of this book: Performance and Best Practices
Guide for IBM Spectrum Virtualize 8.5, SG24-8521, published on 09 August 2022:
Andy Haerchen, Ashutosh Pathak, Barry Whyte, Cassio Alexandre de Aguiar, Fabio
Trevizan de Oliveira, Luis Eduardo Silva Viera, Mahendra S Brahmadu, Mangesh M
Shirke, Nezih Boyacioglu, Stephen Solewin, Thales Ferreira, Uwe Schreiber and
Youssef Largou
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us!
We want our books to be as helpful as possible. Send us your comments about this book or
other IBM Redbooks publications in one of the following ways:
Use the online Contact us review Redbooks form found at:
ibm.com/redbooks
Send your comments in an email to:
redbooks@us.ibm.com
Mail your comments to:
IBM Corporation, IBM Redbooks
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400
Chapter 1. Introduction and system overview
Note: For more information, see this IBM FlashSystem Storage portfolio web page and the
IBM SAN Volume Controller web page.
With the introduction of the IBM Storage family, the software that runs on IBM SAN Volume
Controller and on IBM Storage FlashSystem (IBM FlashSystem) products is called IBM
Storage Virtualize. The name of the underlying hardware platform remains intact.
IBM FlashSystem storage systems and IBM SAN Volume Controllers are built with
award-winning IBM Storage Virtualize software that simplifies infrastructure and eliminates
the differences in management, function, and even hybrid multicloud support.
IBM Storage Virtualize is an offering that has been available for years for the IBM SAN
Volume Controller and IBM FlashSystem family of storage solutions. It provides an ideal way
to manage and protect huge volumes of data from mobile and social applications, enable
rapid and flexible cloud services deployments, and deliver the performance and scalability
that is needed to gain insights from the latest analytics technologies.
Note: This edition of this IBM Redbooks publication covers the systems that can run IBM
Storage Virtualize V8.6. Some products that are listed in the book are no longer sold by IBM
(End of Marketing (EOM)) but can still run the V8.6 software. Where this is applicable, it is
mentioned in the text.
Table 1-1 shows the IBM Storage Virtualize V8.6 supported product list and whether each
product is still currently sold or is EOM.
FS5000 (FS5015, FS5035)    2072, 4680    2N2, 2N4, 3N2, 3N4    Current Product
Note: These benefits are not a complete list of features and functions that are available
with IBM Storage Virtualize software.
Applications typically read and write data as vectors of bytes or records. However, storage
presents data as vectors of fixed-size blocks (512 bytes or, on newer devices, 4096 bytes per
block).
The file, record, and namespace virtualization and file and record subsystem layers convert
records or files that are required by applications to vectors of blocks, which are the language
of the block virtualization layer. The block virtualization layer maps requests of the higher
layers to physical storage blocks, which are provided by storage devices in the block
subsystem.
Each of the layers in the storage domain abstracts away complexities of the lower layers and
hides them behind an easy-to-use, standard interface that is presented to upper layers. The
resultant decoupling of logical storage space representation and its characteristics that are
visible to servers (storage consumers) from underlying complexities and intricacies of storage
devices is a key concept of storage virtualization.
The focus of this publication is block-level virtualization at the block virtualization layer,
which is implemented by IBM as IBM Storage Virtualize software that is running on IBM SAN
Volume Controller and the IBM FlashSystem family. The IBM SAN Volume Controller is
implemented as a clustered appliance in the storage network layer. The IBM FlashSystems
are deployed as modular storage systems that can virtualize their internally and externally
attached storage.
IBM Storage Virtualize uses the Small Computer System Interface (SCSI) protocol to
communicate with its clients. It presents storage space as SCSI logical units (LUs), which are
identified by SCSI logical unit numbers (LUNs).
Note: Although LUs and LUNs are different entities, the term LUN in practice often is used
to refer to a logical disk; that is, an LU.
Most applications do not directly access storage; instead, they work with files or records using
a file system. However, the operating system of a host must convert these abstractions to the
language of storage; that is, vectors of storage blocks that are identified by logical block
addresses (LBAs) within an LU.
Inside IBM Storage Virtualize, each of the externally visible LUs is internally represented by a
volume, which is an amount of storage that is taken out of a storage pool. Storage pools are
made of managed disks (MDisks); that is, they are LUs that are presented to the storage
system by external virtualized storage or arrays that consist of internal disks. LUs that are
presented to IBM Storage Virtualize by external storage often correspond to RAID arrays that
are configured on that storage.
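To make this hierarchy concrete, the following minimal command sketch shows how a storage
pool and a volume might be created from discovered MDisks by using the IBM Storage
Virtualize command-line interface (CLI). The pool, MDisk, and volume names are placeholders;
verify the actual MDisk names with lsmdisk before running similar commands in your
environment.

# List the MDisks (back-end LUs or internal arrays) that the system discovered
lsmdisk
# Create a storage pool with a 1 GiB extent size from four example MDisks
mkmdiskgrp -name Pool0 -ext 1024 -mdisk mdisk0:mdisk1:mdisk2:mdisk3
# Create a 100 GiB volume in the pool, owned by I/O group 0
mkvdisk -name vol0 -mdiskgrp Pool0 -iogrp 0 -size 100 -unit gb
# Verify the new volume and the pool that backs it
lsvdisk vol0

The volume vol0 can then be mapped to a host, while the physical placement of its extents
remains under the control of IBM Storage Virtualize.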
The hierarchy of objects, from a file system block down to a physical block on a physical drive,
is shown in Figure 1-1.
With storage virtualization, you can manage the mapping between logical blocks within an LU
that is presented to a host, and blocks on physical drives. This mapping can be as simple or
as complicated as required by a use case. A logical block can be mapped to one physical
block or for increased availability, multiple blocks that are physically stored on different
physical storage systems, and in different geographical locations.
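One concrete expression of this mapping is volume mirroring, which keeps two physical copies
of a volume in different storage pools that can be backed by different storage systems. A
minimal sketch, assuming the volume vol0 from the earlier example and a second pool named
Pool1 that already exists (names are illustrative):

# Add a second, synchronized copy of vol0 in another storage pool
addvdiskcopy -mdiskgrp Pool1 vol0
# List the copies of the volume and monitor the initial synchronization
lsvdiskcopy vol0
lsvdisksyncprogress vol0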
IBM Storage Virtualize uses the concept of an extent, which is a group of physical blocks, as a
management construct to allow great flexibility in the use of available storage.
Importantly, the mapping can be dynamic: With IBM Easy Tier®, IBM Storage Virtualize can
automatically change underlying storage to which an extent is mapped to better match a
host’s performance requirements with the capabilities of the underlying storage systems.
IBM Storage Virtualize gives a storage administrator a wide range of options to modify
volume characteristics: from volume resize to mirroring, creating a point-in-time (PiT) copy
with IBM FlashCopy®, and migrating data across physical storage systems.
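A hedged sketch of two of these operations follows, reusing the illustrative vol0 and Pool1
objects from the previous examples. The FlashCopy target volume and mapping names are
also placeholders.

# Transparently migrate vol0 to a different storage pool and monitor progress
migratevdisk -mdiskgrp Pool1 -vdisk vol0
lsmigrate
# Create a point-in-time copy of vol0: a same-size target volume and a FlashCopy mapping
mkvdisk -name vol0_pit -mdiskgrp Pool1 -iogrp 0 -size 100 -unit gb
mkfcmap -name vol0_map -source vol0 -target vol0_pit -copyrate 50
startfcmap -prep vol0_map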
Importantly, all the functions that are presented to the storage users are independent from the
characteristics of the physical devices that are used to store data. This decoupling of the
storage feature set from the underlying hardware and ability to present a single, uniform
interface to storage users that masks underlying system complexity is a powerful argument
for adopting storage virtualization with IBM Storage Virtualize.
Although the result is similar (the data block is written to two different arrays), the effort that is
required for per-host configuration is disproportionately larger than for a centralized solution
with organization-wide storage virtualization that is done on a dedicated system and
managed from a single GUI.
Note: IBM Real-time Compression (RtC) is only available for earlier generation
engines. The newer IBM SAN Volume Controller engines (SV3, SV2, and SA2) do not
support RtC; however, they support software compression through Data Reduction
Pools (DRP).
Summary
Storage virtualization is a fundamental technology that enables the realization of flexible and
reliable storage solutions. It helps enterprises to better align IT architecture with business
requirements, simplify their storage administration, and facilitate their IT departments efforts
to meet business demands.
IBM Storage Virtualize running on IBM SAN Volume Controller and IBM FlashSystem family
is a mature, 12th-generation virtualization solution that uses open standards and complies
with the SNIA storage model. All products use in-band block virtualization engines that move
the control logic (including advanced storage functions) from a multitude of individual storage
devices to a centralized entity in the storage network.
IBM Storage Virtualize can improve the use of your storage resources, simplify storage
management, and improve the availability of business applications.
Figure 1-2 shows the feature set that is provided by the IBM Storage Virtualize systems.
The following major approaches are used today for the implementation of block-level
aggregation and virtualization:
Symmetric: In-band appliance
Virtualization splits the storage that is presented by the storage systems into smaller
chunks that are known as extents. These extents are then concatenated by using various
policies to make virtual disks or volumes. With symmetric virtualization, host systems can
be isolated from the physical storage. Advanced functions, such as data migration, can run
without reconfiguring the host.
With symmetric virtualization, the virtualization engine is the central configuration point for
the SAN. The virtualization engine directly controls access to the storage and data that is
written to the storage. As a result, locking functions that provide data integrity and
advanced functions (such as cache and Copy Services) can be run in the virtualization
engine. Therefore, the virtualization engine is a central point of control for device and
advanced function management.
Symmetric virtualization includes some disadvantages. The main disadvantage that is
associated with symmetric virtualization is scalability. Scalability can cause poor
performance because all input/output (I/O) must flow through the virtualization engine.
To solve this problem, you can use an n-way cluster of virtualization engines that includes
failover capability.
You can scale the extra processor power, cache memory, and adapter bandwidth to
achieve the level of performance that you want. More memory and processing power are
needed to run advanced services, such as Copy Services and caching. IBM SAN Volume
Controller uses symmetric virtualization. Single virtualization engines, which are known as
nodes, are combined to create clusters. Each cluster can contain 2 - 8 nodes.
Asymmetric: Out-of-band or controller-based
With asymmetric virtualization, the virtualization engine is outside the data path and
performs a metadata-style service. The metadata server contains all of the mapping and
the locking tables, and the storage devices contain only data. In asymmetric virtual
storage networks, the data flow is separated from the control flow.
A separate network or SAN link is used for control purposes. Because the control flow is
separated from the data flow, I/O operations can use the full bandwidth of the SAN.
Asymmetric virtualization can have the following disadvantages:
– Data is at risk to increased security exposures, and the control network must be
protected with a firewall.
– Metadata can become complicated when files are distributed across several devices.
– Each host that accesses the SAN must know how to access and interpret the
metadata. Therefore, specific device drivers or agent software must be running on
each of these hosts.
– The metadata server cannot run advanced functions, such as caching or Copy
Services, because it only “knows” about the metadata and not the data.
By using a controller-based approach, you replace a controller and implicitly replace your entire
virtualization solution. In addition to replacing the hardware, other actions (such as updating
or repurchasing the licenses for the virtualization feature, and advanced copy functions) might
be necessary.
Servers and applications remain online, data migration occurs transparently on the
virtualization platform, and licenses for virtualization and copy services require no update. No
other costs are incurred when disk subsystems are replaced.
Only the fabric-based appliance solution provides an independent and scalable virtualization
platform that can provide enterprise-class Copy Services and is open to future interfaces and
protocols. By using the fabric-based appliance solution, you can choose the disk subsystems
that best fit your requirements, and you are not locked into specific SAN hardware.
For these reasons, IBM chose the SAN-based appliance approach with inline block
aggregation for the implementation of storage virtualization with IBM Storage Virtualize.
On the SAN storage that is provided by the disk subsystems, IBM SAN Volume Controller
offers the following services:
– Creates a single pool of storage
– Provides LU virtualization
– Manages logical volumes
– Mirrors logical volumes
IBM SAN Volume Controller running IBM Storage Virtualize V8.6 also provides the following
functions:
Large scalable cache
Copy Services
IBM FlashCopy (PiT copy) function, including thin-provisioned FlashCopy to make multiple
targets affordable
IBM Transparent Cloud Tiering (TCT) function that enables IBM SAN Volume Controller to
interact with cloud service providers (CSPs)
Metro Mirror (MM), which is a synchronous copy
Global Mirror (GM), which is an asynchronous copy
Policy-based replication
Safeguarded Copy
Inline Data Corruption Detection
Data migration
Storage space efficiency (thin-provisioning, compression, and deduplication)
IBM Easy Tier to automatically migrate data between storage types of different
performance that is based on disk workload
Encryption of external attached storage
Supports IBM HyperSwap®
Supports VMware vSphere Virtual Volumes (VVOLs) and Microsoft Offloaded Data
Transfer (ODX)
Direct attachment of hosts
Hot spare nodes with a standby function of single or multiple nodes
Containerization connectivity with Container Storage Interface (CSI), which enables
supported storage to be used as persistent storage in container environments
Hybrid Multicloud function with IBM Storage Virtualize for Public Cloud
Within this software release, IBM SAN Volume Controller also supports iSCSI networks. This
feature enables the hosts and storage systems to communicate with IBM SAN Volume
Controller to build a storage virtualization solution.
It is important that hosts cannot see or operate on the same volume (LUN) that is mapped to
the IBM SAN Volume Controller. Although a set of LUNs can be mapped to IBM SAN Volume
Controller, and a separate set of LUNs can be mapped directly to one or more hosts, care
must be taken to ensure that a separate set of LUNs is always used.
The zoning capabilities of the SAN switch must be used to create distinct zones to ensure that
this rule is enforced. SAN fabrics can include standard FC, FC-NVMe, FCoE, iSCSI over
Ethernet, or possible future types.
IBM Storage Virtualize 8.6.0 also supports the following NVMe protocols:
NVMe/FC
– Supported since 8.2.1 [4Q18]
– Supported with 16/32 Gb FC adapters
– Supports SLES/RH/ESX/Windows as initiators
– Will support 64 Gb FC adapters in a future release
NVMe/RoCE
– Supported since 8.5.0 [1Q22]
– Supports RoCE (Mellanox CX-4/CX-6) adapters with 25Gb/100Gb speeds on storage
(target) side
– Supports SLES/RH/ESX as host initiators OS, with RoCE 25/40/100Gb (Mellanox
CX-4/CX-5/CX-6), and Broadcom adapters
Figure 1-4 on page 12 shows a conceptual diagram of a storage system that uses IBM SAN
Volume Controller. It also shows several hosts that are connected to a SAN fabric or local
area network (LAN).
In practical implementations that have HA requirements (most of the target clients for IBM
SAN Volume Controller), the SAN fabric cloud represents a redundant SAN. A redundant SAN
consists of a fault-tolerant arrangement of two or more counterpart SANs, which provide
alternative paths for each SAN-attached device.
Figure 1-4 IBM SAN Volume Controller conceptual and topology overview
Both scenarios (the use of a single network and the use of two physically separate networks)
are supported for iSCSI-based and LAN-based access networks to IBM SAN Volume
Controller. Redundant paths to volumes can be provided in both scenarios.
For simplicity, Figure 1-4 shows only one SAN fabric and two zones: host and storage. In a
real environment, it is a best practice to use two redundant SAN fabrics. IBM SAN Volume
Controller can be connected to up to four fabrics.
A clustered system of IBM SAN Volume Controller nodes that are connected to the same
fabric presents logical disks or volumes to the hosts. These volumes are created from
managed LUNs or MDisks that are presented by the storage systems.
As explained in 1.4, “IBM SAN Volume Controller family” on page 24, hosts are not permitted
to operate on the RAID LUNs directly. All data transfer happens through the IBM SAN Volume
Controller nodes. This flow is referred to as symmetric virtualization.
For iSCSI-based access, the use of two networks and separating iSCSI traffic within the
networks by using a dedicated virtual local area network (VLAN) path for storage traffic
prevents any IP interface, switch, or target port failure from compromising the iSCSI
connectivity across servers and storage controllers.
The following major software and hardware changes are included in Version 8.6.0:
For further information about these features, see this section in the IBM Storage Virtualize
documentation pages What’s new in v8.6.0.
The following major software and hardware changes are included in Version 8.5.0:
Support for:
– The new IBM FlashSystem 9500, 7300, and IBM SAN Volume Controller SV3 systems,
including:
• 100 Gbps Ethernet adapter
• 48 NVMe drives per distributed RAID-6 array (IBM FlashSystem 9500 only)
• Secure boot drives
– Multifactor authentication and single sign-on
– NVMe host attachment over RDMA
– Fibre Channel port sets
– I/O stats in microseconds
– Domain names for IP replication
– IBM Storage Virtualize 3-Site Orchestrator version 4.0
– Support for increased number of hosts per I/O group
Improved distributed RAID array recommendations
Improved default time for updates
Updates to OpenStack support summary
IBM FlashSystem 9500, 7300, and IBM SAN Volume Controller SV3
The following new hardware platforms were released with IBM Storage Virtualize V8.5.0.
For more information about these new hardware platforms, see 1.5, “IBM SAN Volume
Controller models” on page 44, and 1.6, “IBM FlashSystem family” on page 49:
IBM FlashSystem 9500
The IBM FlashSystem 9500 is a 4U control enclosure and contains up to 48 NVMe-attached IBM FlashCore® Modules or other self-encrypted NVMe-attached SSDs.
The NVMe-attached drives in the control enclosures provide significant performance
improvements compared to SAS-attached flash drives. The system supports 2U and 5U
all-Flash SAS attached expansion enclosure options.
IBM FlashSystem 7300
The IBM FlashSystem 7300 is a 2U dual controller that contains up to 24 NVMe-attached IBM FlashCore Modules or other self-encrypted NVMe-attached SSDs or Storage Class Memory (SCM) drives.
IBM FlashSystem 7300 system also supports 2U and 5U SAS-attached expansion
enclosure options.
IBM SAN Volume Controller SV3
The IBM SAN Volume Controller SV3 system is a 2U single controller that combines software and hardware into a comprehensive, modular appliance that provides symmetric virtualization. SV3 nodes run in pairs to form an I/O group, which is the building block for any IBM SAN Volume Controller based virtualization setup.
One of the key concepts of multifactor authentication is that each factor comes from a different category; that is, something the user knows, has, or is.
Single Sign-on (SSO) authentication requires users to register their credentials only once
when the user signs on to the application for the first time. The user information is stored at
the Identity Provider (IdP) that manages the user credentials and determines whether the
user is required to authenticate again.
trends in bandwidth and CPU usage. IBM Storage Virtualize V8.5.0 gives you the ability to see granularity down to microsecond intervals, which was not available on previous code levels.
Safeguarded Copy
Safeguarded Copy is a virtual air gap mechanism that uses FlashCopy functions to take
immutable copies. This feature aids in the recovery from ransomware or internal “bad actors”
who seek to destroy data.
Note: The Safeguarded Copy function is available with IBM Storage Virtualize software
8.5.0, but is not supported for the FlashSystem 5000, Storwize V5030E, Storwize V7000
Gen2, and Storwize V7000 Gen2+ models.
Safeguarded Copy is a feature with which you can create point-in-time copies (granularity) of active production data that cannot be altered or deleted; that is, immutable or protected copies. A user with the correct privileged access is required to modify the Safeguarded Copy expiration settings (separation of duties). Lastly, Safeguarded Copy uses copy management software for testing and ease of recovery of copies.
Focusing mainly on the feature set from a customer’s perspective, consider the following
three pillars, as shown in Figure 1-5 on page 20:
Separation of Duties
Protected Copies
Automation
Figure 1-5 shows the following IBM Storage Virtualize Safeguarded Copy Data Resilience
examples:
Separation of duties
Traditional backup and restore capabilities are normally storage administrator controlled
and do not protect against intentional (for example, rogue employee) or non-intentional
attacks.
Primary and backup are treated differently. Protecting current backups, or securing and
hardening current backup, does not solve the problem.
Protected copies of the data
These backups must be immutable; that is, hidden, non-addressable, unable to be altered or deleted, and usable only after recovery.
Copies must deliver a higher level of security while meeting industry and business
regulations.
Automation:
– Initially setting and managing of policies (number of copies, retention period)
– Automating, managing, and restoring of those copies.
Volume mobility
Volume mobility is similar to a nondisruptive volume move between I/O groups, except that you are migrating a volume between systems (for example, from IBM SAN Volume Controller to FS9500). It allows a user to nondisruptively move data between systems that do not natively cluster. This feature is a major benefit for upgrading and changing Storage Virtualize systems.
Volume mobility uses enhancements to SCSI (ALUA) path states. The migration is based on
Remote Copy (Metro Mirror) functions.
The IBM STaaS offering is a pure OpEx solution and does not require initial capital.
With the IBM STaaS offering, the customer makes the following decisions:
Which tier level is needed.
How much storage capacity is needed.
For how long the customer wants to use this offering.
Which connection type is required.
Which encryption option is needed.
Figure 1-6 on page 22 shows the STaaS tiers available for IBM FlashSystems.
Feature codes to support these models are related to the setup type, annual capacity growth,
and options for Ethernet, encryption, and decreasing capacity.
9601 models
The following 9601 models are available:
Balanced performance: BT1, BT2, BT3, BT4, and BT5
Premium performance: MT1, MT2, MT3, MT4, and MT5
Extreme performance: HT1, HT2, HT3, HT4, and HT5
Note: The numeric value in the model number is the duration of the STaaS contract in
years.
The IBM STaaS offering also includes the new Storage Expert Care Premium Level of
service, which features the resource of a dedicated Technical Account Manager (TAM),
enhanced response to severity 1 and 2 issues, and predictive support by way of IBM Storage
Insights.
For more information about the STaaS offering, see IBM Storage as a Service Offering Guide,
REDP-5644.
Figure 1-7 shows the complete IBM SAN Volume Controller family that supports the IBM
Storage Virtualize V8.6 software.
1.4.1 Components
IBM SAN Volume Controller provides block-level aggregation and volume management for
attached disk storage. In simpler terms, IBM SAN Volume Controller manages several
back-end storage controllers or locally attached disks.
IBM SAN Volume Controller maps the physical storage within those controllers or storage
systems into logical disk images, or volumes, that can be seen by application servers and
workstations in the SAN. It logically sits between hosts and storage systems. It presents itself
to hosts as the storage provider (target) and to storage systems as one large host (initiator).
The SAN is zoned such that the application servers cannot “see” the back-end storage or
controller. This configuration prevents any possible conflict between IBM SAN Volume
Controller and the application servers that are trying to manage the back-end storage.
The IBM SAN Volume Controller is based on the components that are described next.
1.4.2 Nodes
Each IBM SAN Volume Controller hardware unit is called a node. Each node is an individual
server in an IBM SAN Volume Controller clustered system on which the Storage Virtualize
software runs. The node provides the virtualization for a set of volumes, cache, and copy
services functions.
The IBM SAN Volume Controller nodes are deployed in pairs (io_group), and one or multiple pairs constitute a clustered system, or system. A system can consist of a minimum of one pair and a maximum of four pairs.
One of the nodes within the system is known as the configuration node. The configuration
node manages the configuration activity for the system. If this node fails, the system chooses
a new node to become the configuration node.
Because the active nodes are installed in pairs, each node provides a failover function to its
partner node if a node fails.
A specific volume is always presented to a host server by a single I/O group of the system.
The I/O group can be changed.
When a host server performs I/O to one of its volumes, all the I/Os for a specific volume are
directed to one specific I/O group in the system. Under normal conditions, the I/Os for that
specific volume are always processed by the same node within the I/O group. This node is
referred to as the preferred node for this specific volume.
Both nodes of an I/O group act as the preferred node for their own specific subset of the total
number of volumes that the I/O group presents to the host servers. However, both nodes also
act as failover nodes for their respective partner node within the I/O group. Therefore, a node
takes over the I/O workload from its partner node when required.
In an IBM SAN Volume Controller-based environment, the I/O handling for a volume can switch between the two nodes of the I/O group. Therefore, it is a best practice to connect servers to two different fabrics through different FC host bus adapters (HBAs) and to use multipath drivers to provide redundancy.
The IBM SAN Volume Controller I/O groups are connected to the SAN so that all application servers that access volumes from an I/O group can reach that group. Up to 512 host server objects can be defined per I/O group. The host server objects can access volumes that are provided by this specific I/O group.
If required, host servers can be mapped to more than one I/O group within the IBM SAN
Volume Controller system. Therefore, they can access volumes from separate I/O groups.
You can move volumes between I/O groups to redistribute the load between the I/O groups.
Modifying the I/O group that services the volume can be done concurrently with I/O
operations if the host supports nondisruptive volume moves.
It also requires a rescan at the host level to ensure that the multipathing driver is notified that
the allocation of the preferred node changed, and the ports (by which the volume is
accessed) changed. This modification can be done in the situation where one pair of nodes
becomes overused.
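The following commands sketch a typical nondisruptive volume move between I/O groups, assuming a host that supports it; the volume and I/O group names are placeholders, and the exact procedure for your host type is described in IBM Documentation:

   addvdiskaccess -iogrp io_grp1 vol01    (add the new I/O group to the volume access set)
   movevdisk -iogrp io_grp1 vol01         (move the caching I/O group and preferred node)
   (rescan the paths on the host so that the multipath driver discovers the new paths)
   rmvdiskaccess -iogrp io_grp0 vol01     (remove the old I/O group after the host sees the new paths)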
1.4.4 System
The system or clustered system consists of 1 - 4 I/O groups. Specific configuration limitations are then set for the entire system. For example, the maximum number of volumes that is supported per system is 10,000, and the maximum capacity of MDisks that is supported is ~28 PiB (32 PB) per system.
All configuration, monitoring, and service tasks are performed at the system level.
Configuration settings are replicated to all nodes in the system. To facilitate these tasks, a
management IP address is set for the system.
A process is provided to back up the system configuration data on to storage so that it can be
restored if a disaster occurs. This method does not back up application data. Only the
IBM SAN Volume Controller system configuration information is backed up.
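For example, the configuration backup can be triggered from the CLI; the file name shown is the usual default and might differ on your code level:

   svcconfig backup    (writes svc.config.backup.xml on the configuration node)

Copy the resulting file off the system (for example, with scp) and keep it with your disaster recovery documentation.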
For remote data mirroring, two or more systems must form a partnership before relationships
between mirrored volumes are created.
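As an illustration, an FC partnership and a Metro Mirror relationship might be created as follows; the system and volume names are placeholders, and the partnership must also be created on the remote system:

   mkfcpartnership -linkbandwidthmbits 4000 -backgroundcopyrate 50 REMOTE_SYS
   mkrcrelationship -master vol_db01 -aux vol_db01_dr -cluster REMOTE_SYS -name rel_db01

Add the -global parameter to mkrcrelationship to create a Global Mirror relationship instead of a Metro Mirror relationship.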
For more information about the maximum configurations that apply to the system, I/O group,
and nodes, see this IBM Support web page.
1.4.5 MDisks
The SAN Volume Controller system and its I/O groups view the storage that is presented to
them by the back-end storage system as several disks or LUNs, which are known as MDisks.
Because IBM SAN Volume Controller does not attempt to provide recovery from physical disk
failures within the back-end storage system, an MDisk must be provisioned from a RAID
array.
These MDisks are placed into storage pools where they are divided into several extents. The
application servers do not “see” the MDisks at all. Rather, they see logical disks, which are
known as volumes. These disks are presented by the IBM SAN Volume Controller I/O groups
through the SAN or LAN to the servers.
For information about the system limits and restrictions, see this IBM Support web page.
When an MDisk is presented to the IBM SAN Volume Controller, it can have one of the following statuses:
Unmanaged MDisk
An MDisk is reported as unmanaged when it is not a member of any storage pool. An
unmanaged MDisk is not associated with any volumes and has no metadata that is stored
on it.
IBM SAN Volume Controller does not write to an MDisk that is in unmanaged mode except
when it attempts to change the mode of the MDisk to one of the other modes. IBM SAN
Volume Controller can see the resource, but the resource is not assigned to a storage
pool.
Managed MDisk
Managed mode MDisks are always members of a storage pool, and they contribute
extents to the storage pool. Volumes (if not operated in image mode) are created from
these extents. MDisks that are operating in managed mode might have metadata extents
that are allocated from them and can be used as quorum disks. This mode is the most
common and normal mode for an MDisk.
Tier
It is likely that the MDisks (LUNs) that are presented to the IBM SAN Volume Controller
system have different characteristics because of the disk or technology type on which they
are placed. The following tier options are available:
tier0_flash
tier1_flash
tier_enterprise
tier_nearline
tier_scm
The default value for a newly discovered unmanaged MDisk is enterprise. You can change
this value by running the chmdisk command.
The tier of external MDisks is not detected automatically and is set to enterprise. If the
external MDisk is made up of flash drives or nearline Serial Attached SCSI (SAS) drives and
you want to use IBM Easy Tier, you must specify the tier when adding the MDisk to the
storage pool or run the chmdisk command to modify the tier attribute.
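For example, to place an external MDisk into the flash tier before Easy Tier uses it (the MDisk name is a placeholder):

   chmdisk -tier tier1_flash mdisk5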
1.4.6 Cache
The primary benefit of storage cache is to improve I/O response time. Reads and writes to a
magnetic disk drive experience seek time and latency time at the drive level, which can result
in 1 ms - 10 ms of response time (for an enterprise-class disk).
The IBM SAN Volume Controller Model SV3 features 512 GB of memory with options for
1536 GB of memory in a 2U 19-inch rack mount enclosure.
The IBM SAN Volume Controller provides a flexible cache model as described next.
Cache is allocated in 4 kibibyte (KiB) segments. A segment holds part of one track. A track is
the unit of locking and destaging granularity in the cache. The cache virtual track size is
32 KiB (eight segments).
A track might be only partially populated with valid pages. The IBM SAN Volume Controller combines writes up to a 256 KiB track size if the writes are in the same tracks before destaging. For example, if 4 KiB is written into a track and another 4 KiB is written to another location in the same track, the two writes are destaged together.
Therefore, the blocks that are written from the IBM SAN Volume Controller to the disk
subsystem can be any size of 512 bytes - 256 KiB. The large cache and advanced cache
management algorithms enable it to improve the performance of many types of underlying
disk technologies.
The IBM SAN Volume Controller capability to manage, in the background, the destaging operations that are incurred by writes, while still ensuring full data integrity, helps it achieve good performance.
The cache is separated into two layers: upper cache and lower cache. Figure 1-8 shows the
separation of the upper and lower cache.
The upper cache delivers the following functions, which enable the IBM SAN Volume
Controller to streamline data write performance:
Fast write response times to the host by being as high up in the I/O stack as possible
Partitioning
Combined, the two levels of cache also deliver the following functions:
Pins data when the LUN goes offline.
Provides:
– Enhanced statistics for IBM Storage Control and IBM Storage Insights
– Trace for debugging
Reports medium errors.
Depending on the size, age, and technology level of the disk storage system, the total
available cache in the IBM SAN Volume Controller nodes can be larger, smaller, or about the
same as the cache that is associated with the disk storage.
Because hits to the cache can occur in the IBM SAN Volume Controller or the back-end
storage system level of the overall system, the system as a whole can take advantage of the
larger amount of cache wherever the cache is available.
In addition, regardless of their relative capacities, both levels of cache tend to play an
important role in enabling sequentially organized data to flow smoothly through the system.
The IBM SAN Volume Controller cannot increase the throughput potential of the underlying
disks in all cases because this increase depends on the underlying storage technology and
the degree to which the workload exhibits hotspots or sensitivity to cache size or cache
algorithms.
However, the write cache is still limited to a maximum of 12 GB, and the compression cache to a maximum of 34 GB. The remaining installed cache is used as read cache (including allocation for features, such as IBM FlashCopy, GM, or MM). Data reduction pools share memory with the main I/O process.
The nodes are split into groups where the remaining nodes in each group can communicate
with each other, but not with the other group of nodes that were formerly part of the system. In
this situation, some nodes must stop operating and processing I/O requests from hosts to
preserve data integrity while maintaining data access. If a group contains less than half the
nodes that were active in the system, the nodes in that group stop operating and processing
I/O requests from hosts.
It is possible for a system to split into two groups with each group containing half the original
number of nodes in the system. A quorum disk determines which group of nodes stops
operating and processing I/O requests. In this tiebreaker situation, the first group of nodes
that accesses the quorum disk is marked as the owner of the quorum disk. As a result, all of
the nodes that belong to the owner group continue to operate as the system and handle all
I/O requests.
If the other group of nodes cannot access the quorum disk or discover that the quorum disk is
owned by another group of nodes, it stops operating as the system and does not handle I/O
requests. A system can have only one active quorum disk that is used for a tiebreaker
situation. However, the system uses three quorum disks to record a backup of the system
configuration data that is used if a disaster occurs. The system automatically selects one
active quorum disk from these three disks.
The other quorum disk candidates provide redundancy if the active quorum disk fails before a
system is partitioned. To avoid the possibility of losing all of the quorum disk candidates with a
single failure, assign quorum disk candidates on multiple storage systems.
If possible, the IBM SAN Volume Controller places the quorum candidates on separate
storage systems. However, after the quorum disk is selected, no attempt is made to ensure
that the other quorum candidates are presented through separate storage systems.
Quorum disk placement verification and adjustment to separate storage systems (if
possible) reduce the dependency from a single storage system, and can increase the
quorum disk availability.
You can list the quorum disk candidates and the active quorum disk in a system by running
the lsquorum command.
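For example (the output columns vary by code level):

   lsquorum    (lists the quorum index, status, object ID, type, and whether the candidate is active)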
When the set of quorum disk candidates is chosen, it is fixed. However, a new quorum disk
candidate can be chosen in one of the following conditions:
When the administrator requests that a specific MDisk becomes a quorum disk by running the chquorum command (see the example that follows this list).
When an MDisk that is a quorum disk is deleted from a storage pool.
When an MDisk that is a quorum disk changes to image mode.
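A brief sketch of assigning a specific MDisk as a quorum candidate; the MDisk name and quorum index are placeholders, and the exact parameters are described in the chquorum command reference:

   chquorum -mdisk mdisk9 2    (use mdisk9 as quorum candidate index 2)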
For DR purposes, a system must be regarded as a single entity so that the system and the
quorum disk can be collocated.
Special considerations are required for the placement of the active quorum disk for a
stretched, split cluster or split I/O group configurations. For more information, see this IBM
Documentation web page.
Important: Running an IBM SAN Volume Controller system without a quorum disk can
seriously affect your operation. A lack of available quorum disks for storing metadata
prevents any migration operation.
Mirrored volumes can be taken offline if no quorum disk is available. This behavior occurs
because the synchronization status for mirrored volumes is recorded on the quorum disk.
During the normal operation of the system, the nodes communicate with each other. If a node
is idle for a few seconds, a heartbeat signal is sent to ensure connectivity with the system. If a
node fails for any reason, the workload that is intended for the node is taken over by another
node until the failed node is restarted and readmitted into the system (which happens
automatically).
If the Licensed Internal Code on a node becomes corrupted, which results in a failure, the
workload is transferred to another node. The code on the failed node is repaired, and the
node is readmitted into the system (which is an automatic process).
IP quorum configuration
In a stretched configuration or IBM HyperSwap configuration, you must use a third,
independent site to house quorum devices. To use a quorum disk as the quorum device, this
third site must use FC or IP connectivity together with an external storage system. In a local environment, no extra hardware or networking, such as FC or SAS-attached storage, is required beyond what is normally provisioned within a system.
To use an IP-based quorum application as the quorum device for the third site, no FC
connectivity is used. Java applications are run on hosts at the third site. However, strict
requirements on the IP network and some disadvantages with the use of IP quorum
applications exist.
Unlike quorum disks, all IP quorum applications must be reconfigured and redeployed to
hosts when certain aspects of the system configuration change. These aspects include
adding or removing a node from the system, or when node service IP addresses are
changed.
For stable quorum resolutions, an IP network must provide the following requirements:
Connectivity from the hosts to the service IP addresses of all nodes. If IP quorum is
configured incorrectly, the network must also deal with the possible security implications of
exposing the service IP addresses because this connectivity also can be used to access
the service GUI.
Port 1260 is used by IP quorum applications to communicate from the hosts to all nodes.
The maximum round-trip delay must not exceed 80 ms, which means 40 ms each
direction.
A minimum bandwidth of 2 MBps for node-to-quorum traffic.
Even with IP quorum applications at the third site, quorum disks at site one and site two are
required because they are used to store metadata. To provide quorum resolution, run the
mkquorumapp command or use the GUI in Settings → Systems → IP Quorum to generate a
Java application that is then copied to and run on a host at a third site. A maximum of five
applications can be deployed.
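A minimal sketch of deploying an IP quorum application from the CLI; the copy step and Java invocation are typical, but the details might differ in your environment:

   mkquorumapp    (generates the ip_quorum.jar application on the system)
   (copy ip_quorum.jar to the third-site host, for example with scp)
   java -jar ip_quorum.jar    (run on the third-site host and keep it running, for example as a service)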
At any point, an MDisk can be a member of only one storage pool, except for image mode volumes.
Figure 1-9 shows the relationships of the IBM SAN Volume Controller entities to each other.
Figure 1-9 Overview of an IBM SAN Volume Controller clustered system with an I/O group
Each MDisk capacity in the storage pool is divided into several extents. The size of the extent
is selected by the administrator when the storage pool is created and cannot be changed
later. The size of the extent is 16 MiB - 8192 MiB.
It is a best practice to use the same extent size for all storage pools in a system. This
approach is a prerequisite for supporting volume migration between two storage pools. If the
storage pool extent sizes are not the same, you must use volume mirroring to copy volumes
between pools.
The IBM SAN Volume Controller limits the number of extents in a system to 2^22 (approximately 4 million).
Because the number of addressable extents is limited, the total capacity of an IBM SAN
Volume Controller system depends on the extent size that is chosen by the IBM SAN Volume
Controller administrator.
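For example, with a 1024 MiB extent size, the system can address at most approximately 2^22 x 1 GiB = 4 PiB; larger extent sizes raise this ceiling proportionally, up to the system maximum.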
1.4.9 Volumes
Volumes are logical disks that are presented to the host or application servers by the IBM
SAN Volume Controller.
Sequential
A sequential volume is where the extents are allocated sequentially from one MDisk to the
next MDisk (see Figure 1-11).
Image mode
Image mode volumes (see Figure 1-12) are special volumes that have a direct relationship
with one MDisk. The most common use case of image volumes is a data migration from
your old (typically nonvirtualized) storage to the IBM SAN Volume Controller based
virtualized infrastructure.
When the image mode volume is created, a direct mapping is made between extents that are
on the MDisk and the extents that are on the volume. The LBA x on the MDisk is the same as
the LBA x on the volume, which ensures that the data on the MDisk is preserved as it is
brought into the clustered system.
Because some virtualization functions are not available for image mode volumes, it is useful
to migrate the volume into a new storage pool. After the migration completion, the MDisk
becomes a managed MDisk.
If you add an MDisk that contains any historical data to a storage pool, all data on the MDisk
is lost. Ensure that you create image mode volumes from MDisks that contain data before
adding MDisks to the storage pools.
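A brief sketch of importing an existing LUN as an image mode volume; the pool, I/O group, MDisk, and volume names are placeholders:

   mkvdisk -mdiskgrp Pool_Migration -iogrp io_grp0 -vtype image -mdisk mdisk10 -name legacy_vol01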
Easy Tier monitors the host I/O activity and latency on the extents of all volumes with the
Easy Tier function that is turned on in a multitier storage pool over a 24-hour period. Then, it
creates an extent migration plan that is based on this activity, and then, dynamically moves
high-activity or hot extents to a higher disk tier within the storage pool. It also moves extents
whose activity dropped off or cooled down from the high-tier MDisks back to a lower-tiered
MDisk.
Easy Tier supports the new SCM drives with a new tier that is called tier_scm.
Turning on or off Easy Tier: The Easy Tier function can be turned on or off at the storage
pool level and the volume level.
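For example, Easy Tier can be controlled at the pool and volume levels from the CLI; the object names are placeholders:

   chmdiskgrp -easytier auto Pool0         (let the system manage tiering for the pool)
   chvdisk -easytier off vol_archive01     (exclude a specific volume from Easy Tier migration)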
The automatic load-balancing function is enabled by default on each volume and cannot be
turned off by using the GUI. This load-balancing feature is not considered an Easy Tier
function, although it uses the same principles.
The management GUI supports monitoring Easy Tier data movement in graphical reports.
The data in these reports helps you understand how Easy Tier manages data between the
different tiers of storage, how tiers within pools are used, and the workloads among the
different tiers. Charts for data movement, tier composition, and workload skew comparison
can be downloaded as comma-separated value (CSV) files.
You can also offload the statistics file from the IBM SAN Volume Controller nodes and use the IBM Storage Tier Advisor Tool (STAT) to create a summary report. The STAT can be downloaded at no cost from this web page.
1.4.11 Hosts
A host is a logical object that represents a list of worldwide port names (WWPNs), NVMe
qualified names (NQNs), or iSCSI or iSER names that identify the interfaces that the host
system uses to communicate with the IBM SAN Volume Controller. Fibre Channel
connections use WWPNs to identify host interfaces to the system. iSCSI or iSER names can
be iSCSI qualified names (IQNs) or extended unique identifiers (EUIs). NQNs are used to
identify hosts that use FC-NVMe connections.
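For example, FC and iSCSI host objects might be defined as follows; the names, WWPNs, and IQN are placeholders:

   mkhost -name esx01 -fcwwpn 10000090FA123456:10000090FA123457
   mkhost -name linux01 -iscsiname iqn.1994-05.com.redhat:linux01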
Node failover can be handled without having a multipath driver that is installed on the iSCSI
server. An iSCSI-attached server can reconnect after a node failover to the original target IP
address, which is now presented by the partner node. To protect the server against link
failures in the network or HBA failures, a multipath driver must be used.
N_Port ID Virtualization (NPIV) is a method for virtualizing a physical Fibre Channel port that is used for host I/O. When NPIV is enabled, the partner node takes over the WWPN of the failing node. This takeover allows for rapid recovery of in-flight I/O when a node fails. In addition, path failures that occur because of an offline node are masked from host multipathing.
Host cluster
A host cluster is a group of logical host objects that can be managed together. For example,
you can create a volume mapping that is shared by every host in the host cluster. Host
objects that represent hosts can be grouped in a host cluster and share access to volumes.
New volumes can also be mapped to a host cluster, which simultaneously maps that volume
to all hosts that are defined in the host cluster.
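A minimal sketch of creating a host cluster and a shared volume mapping; the object names are placeholders:

   mkhostcluster -name esx_cluster01
   addhostclustermember -host esx01 esx_cluster01
   mkvolumehostclustermap -hostcluster esx_cluster01 vol_datastore01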
1.4.12 Array
An array is an ordered configuration, or group, of physical devices (drives) that is used to
define logical volumes or devices. An array is a type of MDisk that is made up of disk drives
(these drives are members of the array). A Redundant Array of Independent Disks (RAID) is a
method of configuring member drives to create high availability (HA) and high-performance
systems. The system supports nondistributed and distributed array configurations.
In nondistributed arrays, entire drives are defined as “hot-spare” drives. Hot-spare drives are
idle and do not process I/O for the system until a drive failure occurs. When a member drive
fails, the system automatically replaces the failed drive with a hot-spare drive. The system
then resynchronizes the array to restore its redundancy.
In contrast, all member drives within a distributed array have a rebuild area that is reserved for drive failures. All the drives in an array can process I/O data, which provides faster rebuild times when a drive fails. The RAID level provides different degrees of redundancy and performance; it also determines the number of members in the array.
1.4.13 Encryption
The IBM SAN Volume Controller provides optional encryption of data at rest, which protects
against the potential exposure of sensitive user data and user metadata that is stored on
discarded, lost, or stolen storage devices. Encryption of system data and system metadata is
not required; therefore, system data and metadata are not encrypted.
Planning for encryption involves purchasing a licensed function and then activating and
enabling the function on the system.
To encrypt data that is stored on drives, the nodes that are capable of encryption must be
licensed and configured to use encryption. When encryption is activated and enabled on the
system, valid encryption keys must be present on the system when the system unlocks the
drives or the user generates a new key.
Encryption keys can be managed by an external key server, or they can be stored on USB flash drives that are attached to a minimum of one of the nodes. Since Version 8.1, IBM Storage Virtualize supports a combination of external key server and USB key repositories.
IBM Security Guardium Key Lifecycle Manager is an IBM solution that provides the infrastructure and processes to locally create, distribute, back up, and manage the lifecycle of encryption keys and certificates. Before activating and enabling encryption, you must determine the method of accessing key information during times when the system requires an encryption key to be present.
When Security Key Lifecycle Manager is used as a key manager for the IBM SAN Volume
Controller encryption, you can encounter a deadlock situation if the key servers are running
on encrypted storage that is provided by the IBM SAN Volume Controller. To avoid a deadlock
situation, ensure that the IBM SAN Volume Controller can communicate with an encryption
server to get the unlock key after a power-on or restart scenario. Up to four Security Key
Lifecycle Manager servers are supported.
Although both Thales CipherTrust Manager and Gemalto KeySecure key servers support the same type of configurations, you need to ensure that you complete the prerequisites on these key servers before you can enable encryption on the system.
Data encryption is protected by the Advanced Encryption Standard (AES) algorithm that uses
a 256-bit symmetric encryption key in XTS mode, as defined in the Institute of Electrical and
Electronics Engineers (IEEE) 1619-2007 standard as XTS-AES-256.1 That data encryption
key is protected by a 256-bit AES key wrap when it is stored in nonvolatile form.
Another data security enhancement, which is delivered with the Storage Virtualize 8.4.2 code
and above, is the new Safeguarded Copy function that can provide protected read-only air
gap copies of volumes. This enhancement gives the customer effective data protection
against cyber attacks.
For more information, see IBM FlashSystem Safeguarded Copy Implementation Guide,
REDP-5654.
1 https://ieeexplore.ieee.org/document/4493450
The iSCSI function is a software function that is provided by the IBM Storage Virtualize
software, not hardware. In Version 7.7, IBM introduced software capabilities to enable the
underlying virtualized storage to attach to IBM SAN Volume Controller by using the iSCSI
protocol.
The iSCSI protocol enables the transportation of SCSI commands and data over an IP
network (TCP/IP), which is based on IP routers and Ethernet switches. iSCSI is a block-level
protocol that encapsulates SCSI commands. Therefore, it uses an IP network rather than FC
infrastructure.
The major functions of iSCSI include encapsulation and the reliable delivery of command descriptor block (CDB) transactions between initiators and targets through the IP network, especially over a potentially unreliable IP network.
Every iSCSI node in the network must have the following iSCSI components:
An iSCSI name is a location-independent, permanent identifier for an iSCSI node. An
iSCSI node has one iSCSI name, which stays constant for the life of the node. The terms
initiator name and target name also refer to an iSCSI name.
An iSCSI address specifies the iSCSI name of an iSCSI node and a location of that node.
The address consists of a hostname or IP address, a TCP port number (for the target),
and the iSCSI name of the node. An iSCSI node can have any number of addresses,
which can change at any time, particularly if they are assigned by way of Dynamic Host
Configuration Protocol (DHCP). An IBM SAN Volume Controller node represents an iSCSI
node and provides statically allocated IP addresses.
IBM SAN Volume Controller models SV3, SV2, and SA2 support 25 Gbps Ethernet adapters that provide iSCSI and iSCSI Extensions over RDMA (iSER) connections. The IBM SAN Volume Controller model SV3 also supports 100 Gbps Ethernet adapters.
iSER is a network protocol that extends the iSCSI protocol to use RDMA. You can implement
RDMA-based connections that use Ethernet networking structures and connections without
upgrading hardware. As of this writing, the system supports RDMA-based connections with
RDMA over Converged Ethernet (RoCE) or Internet-Wide Area RDMA Protocol (iWARP).
For host attachment, these 25 Gbps adapters support iSCSI and RDMA-based connections;
however, for external storage systems, only iSCSI connections are supported through these
adapters. When the 25 Gbps adapter is installed on nodes in the system, RDMA technology
can be used for node-to-node communications.
Note: The 100 Gbps adapter on the IBM SAN Volume Controller model SV3 supports iSCSI. However, the performance is limited to 25 Gbps per port.
Data reduction pools (DRPs) are a new type of storage pool that implements various techniques, such as thin-provisioning, compression, and deduplication, to reduce the amount of physical capacity that is required to store data. Savings in storage capacity requirements translate into a reduction in the cost of storing the data.
By using DRPs, you can automatically de-allocate and reclaim the capacity of
thin-provisioned volumes that contain deleted data and enable this reclaimed capacity to be
reused by other volumes. Data reduction provides more capacity from compressed volumes
because of the implementation of the new log-structured array.
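As an illustration, a DRP and a space-efficient volume might be created as follows; the names and sizes are placeholders, and MDisks or drives must still be added to the pool:

   mkmdiskgrp -name DRPool0 -ext 1024 -datareduction yes
   mkvolume -name vol_app01 -pool DRPool0 -size 100 -unit gb -compressed -deduplicated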
Deduplication
Data deduplication is one of the methods of reducing storage needs by eliminating redundant
copies of data. Data reduction is a way to decrease the storage disk infrastructure that is
required, optimize the usage of storage disks, and improve data recovery infrastructure
efficiency.
Existing data or new data is standardized into chunks that are examined for redundancy. If
data duplicates are detected, pointers are shifted to reference a single copy of the chunk, and
the duplicate data sets are then released.
To estimate potential capacity savings that data reduction can provide on the system, use the
Data Reduction Estimation Tool (DRET). This tool scans target workloads on all attached
storage arrays, consolidates these results, and generates an estimate of potential data
reduction savings for the entire system.
The DRET is available for download at this IBM Support web page.
1.4.16 IP replication
IP replication was introduced in Version 7.2. It enables data replication between
IBM Storage Virtualize family members. IP replication uses the IP-based ports of the cluster
nodes.
The IP replication function is transparent to servers and applications in the same way as
traditional FC-based mirroring. All remote mirroring modes (MM, GM, and GMCV and the
new policy-based replication) are supported.
The configuration of the system is straightforward. IBM Storage Virtualize family systems
normally “find” each other in the network and can be selected from the GUI.
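For example, an IP partnership might be created as follows; the remote cluster IP address and bandwidth values are placeholders, the same command must also be run on the remote system, and newer code levels add portset-related parameters:

   mkippartnership -type ipv4 -clusterip 203.0.113.10 -linkbandwidthmbits 1000 -backgroundcopyrate 50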
IP connections that are used for replication can have long latency (the time to transmit a
signal from one end to the other), which can be caused by distance or by many “hops”
between switches and other appliances in the network. Traditional replication solutions
transmit data, wait for a response, and then transmit more data, which can result in network
utilization as low as 20% (based on IBM measurements). In addition, this scenario worsens
the longer the latency.
Bridgeworks SANSlide technology, which is integrated with the IBM Storage Virtualize family,
requires no separate appliances and incurs no extra cost and configuration steps. It uses
artificial intelligence (AI) technology to transmit multiple data streams in parallel, adjusting
automatically to changing network environments and workloads.
SANSlide improves network bandwidth usage up to 3x. Therefore, customers can deploy a
less costly network infrastructure, or take advantage of faster data transfer to speed
replication cycles, improve remote data currency, and enjoy faster recovery.
Copy services functions are implemented within a single IBM SAN Volume Controller, or
between multiple members of the IBM Storage Virtualize family.
The copy services layer sits above and operates independently of the function or
characteristics of the underlying disk subsystems that are used to provide storage resources
to an IBM SAN Volume Controller.
Synchronous remote copy ensures that updates are committed at the primary and secondary volumes before the application considers the updates complete. Therefore, the secondary volume is fully up-to-date if it is needed in a failover. However, the application is fully exposed to the latency and bandwidth limitations of the communication link to the secondary volume. In a truly remote situation, this extra latency can have a significant adverse effect on application performance.
Special configuration guidelines exist for SAN fabrics and IP networks that are used for data
replication. Consider the distance and available bandwidth of the intersite links.
A function of Global Mirror for low bandwidth was introduced in IBM Storage Virtualize 6.3. It uses change volumes that are associated with the primary and secondary volumes. These point-in-time copies are used to record changes to the remote copy volumes. A FlashCopy map exists between the primary volume and its change volume, and between the secondary volume and its change volume. This function is called Global Mirror with change volumes (cycling mode).
Figure 1-13 shows an example of this function where you can see the relationship between
volumes and change volumes.
In asynchronous remote copy, the application receives acknowledgment that the write is complete before the write is committed at the secondary volume. Therefore, on a failover, specific updates (data) might be missing at the secondary volume.
The application must have an external mechanism for recovering the missing updates, if
possible. This mechanism can involve user intervention. Recovery on the secondary site
involves starting the application on this recent backup, and then rolling forward or backward to
the most recent commit point.
Policy-based replication uses volume groups to automatically deploy and manage replication.
This feature significantly simplifies configuring, managing, and monitoring replication between
two systems. Policy-based replication simplifies asynchronous replication with the following
key advantages:
– Uses volume groups instead of consistency groups. With volume groups, all volumes are replicated based on the assigned policy.
– Simplifies administration by removing the need to manage relationships and change volumes.
– Automatically manages provisioning on the remote system.
– Supports easier visualization of replication during a site failover.
– Automatically notifies you when the recovery point objective (RPO) is exceeded.
– Easy-to-understand status and alerts on the overall health of replication.
To learn more about concepts and objects that are related to PBR, see the Policy-based
replication section in the IBM Documentation pages.
An IBM Redpaper publication about policy-based replication and its implementation is also available: see REDP-5704.
FlashCopy
FlashCopy is sometimes described as an instance of a time-zero (T0) copy or a point-in-time
copy technology.
FlashCopy can be performed on multiple source and target volumes. FlashCopy enables
management operations to be coordinated so that a common single PiT is chosen for copying
target volumes from their respective source volumes.
With IBM Storage Virtualize, multiple target volumes can undergo FlashCopy from the same
source volume. This capability can be used to create images from separate PiTs for the
source volume, and to create multiple images from a source volume at a common PiT.
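A brief sketch of a basic FlashCopy mapping; the volume names, mapping name, and copy rate are placeholders, and the target volume must already exist and match the source size:

   mkfcmap -source vol_db01 -target vol_db01_t0 -copyrate 50 -name map_db01_t0
   startfcmap -prep map_db01_t0    (prepare the cache and trigger the point-in-time copy)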
Reverse FlashCopy enables target volumes to become restore points for the source volume
without breaking the FlashCopy relationship, and without waiting for the original copy
operation to complete. IBM Storage Virtualize supports multiple targets and multiple rollback
points.
Most customers aim to integrate the FlashCopy feature for PiT copies and quick recovery of
their applications and databases. An IBM solution for this goal is provided by IBM Storage
Protect and IBM Copy Data Management. For more information, see this IBM Storage web
page.
TCT uses IBM FlashCopy techniques that provide full and incremental snapshots of several volumes. Snapshots are encrypted and compressed before being uploaded to the cloud. Reverse operations are also supported within that function. When a set of data is transferred out to the cloud, the volume snapshot is stored as object storage.
IBM Cloud Object Storage uses an innovative approach and a cost-effective solution to store
a large amount of unstructured data. It also delivers mechanisms to provide security services,
HA, and reliability.
The management GUI provides an easy-to-use initial setup, advanced security settings, and audit logs that record all backup and restore operations to the cloud.
For more information about IBM Cloud Object Storage, see this IBM Cloud web page.
When a HyperSwap topology is configured, each node, external storage system, and host in the system configuration must be assigned to one of the sites in the topology. Both nodes of an I/O group must be at the same site. This site must be the same site as the external storage systems that provide the managed disks to that I/O group.
When managed disks are added to storage pools, their site attributes must match. This
requirement ensures that each copy in a HyperSwap volume is fully independent and is at a
distinct site.
When the system is configured between two sites, HyperSwap volumes have a copy at one
site and a copy at another site. Data that is written to the volume is automatically sent to both
copies. If one site is no longer available, the other site can provide access to the volume. If
ownership groups are used to manage access to HyperSwap volumes, both volume copies
and users who access them must be assigned to the same ownership group.
A 2-site HyperSwap configuration can be extended to a third site for DR that uses the IBM
Storage Virtualize 3-Site Orchestrator. For more information, see IBM Storage Virtualize
3-Site Replication, SG24-8504.
Figure 1-15 shows the rear view of the IBM SAN Volume Controller SV3.
Figure 1-16 shows the internal hardware components of an IBM SAN Volume Controller SV3 node canister. To the left is the front of the canister, where the fan modules are located, followed by two Ice Lake CPUs and the Dual Inline Memory Module (DIMM) slots. The battery backup units and the PCIe adapter cages are shown on the right side. Each of these adapter cages holds two PCIe adapter cards, except for cage number 2, which is dedicated to the compression card.
Figure 1-16 IBM SAN Volume Controller SV3 internal hardware components
Figure 1-17 shows the internal architecture of the IBM SAN Volume Controller SV3 model.
You can see that the PCIe switch is still present, but has no outbound connections because
these models do not support any internal drives. The PCIe switch is used for internal
functions and monitoring purposes within the IBM SAN Volume Controller enclosure.
Figure 1-18 IBM SAN Volume Controller SV2 and SA2 front view
Figure 1-19 shows the rear view of the IBM SAN Volume Controller SV2 / SA2.
Figure 1-19 IBM SAN Volume Controller SV2 and SA2 rear view
Figure 1-20 shows the internal hardware components of an IBM SAN Volume Controller SV2
and SA2 node canister. To the left is the front of the canister where fan modules and battery
backup are located, followed by two Cascade Lake CPUs and Dual Inline Memory Module
(DIMM) slots, and PCIe risers for adapters on the right.
Figure 1-21 shows the internal architecture of the IBM SAN Volume Controller SV2 and SA2
models. You can see that the PCIe switch is still present, but has no outbound connections
because these models do not support any internal drives. The PCIe switch is used for internal
monitoring purposes within the IBM SAN Volume Controller enclosure.
Figure 1-21 IBM SAN Volume Controller SV2 and SA2 internal architecture
Note: IBM SAN Volume Controller SV3, SV2, and SA2 do not support any type of expansion enclosures.
More information: For the most up-to-date information about features, benefits, and
specifications of the IBM SAN Volume Controller models, see this web page.
The information in this book is valid at the time of this writing and covers IBM Storage
Virtualize V8.6. However, as IBM SAN Volume Controller matures, expect to see new
features and enhanced specifications.
Processor: SV2: two Intel Cascade Lake 5218 Series (Gold), 16 cores, 2.30 GHz; SA2: two Intel Cascade Lake 4208 Series (Silver), 8 cores, 2.10 GHz; SV3: two Intel Ice Lake 4189 Series (Gold), 24 cores, 2.4 GHz
I/O ports and management: SV2 and SA2: four 10 Gb Ethernet ports for 10 Gb iSCSI connectivity and system management; SV3: two 1 Gb Ethernet ports for system management only (non-iSCSI ports)
USB ports: SV2: 2; SA2: 2; SV3: 1
Integrated battery units: SV2: 1; SA2: 1; SV3: 2
The following optional features are available for IBM SAN Volume Controller SV2 and SA2:
A 768 GB cache upgrade
A 4-port 16 Gb FC/FC over NVMe adapter for 16 Gb FC connectivity
A 4-port 32 Gb FC/FC over NVMe adapter for 32 Gb FC connectivity
A 2-port 25 Gb iSCSI/iSER/RDMA over Converged Ethernet (RoCE)
A 2-port 25 Gb iSCSI/iSER/internet Wide-area RDMA Protocol (iWARP)
The SV2 and SA2 systems have dual CPU sockets and three adapter slots along with four
10-GbE RJ45 ports on board.
Note: IBM SAN Volume Controller models SA2 and SV2 do not support FCoE.
The following optional features are available for IBM SAN Volume Controller SV3:
A 1536 GB cache upgrade
A 4-port 32 Gb FC/FC over NVMe adapter for 32 Gb FC connectivity
A 2-port 25 Gb iSCSI/iSER/RDMA over Converged Ethernet (RoCE)
A 2-port 25 Gb iSCSI/iSER/internet Wide-area RDMA Protocol (iWARP)
A 2-port 100 Gb NVMe/iSCSI/RDMA over Converged Ethernet (RoCE)
Note: The 25 and 100 Gb adapters are NVMe capable; however, to support NVMe, a
software dependency exists (at the time of this writing). Therefore, NVMe/NVMeoF is not
supported on these cards.
All Ethernet cards can be used with the iSCSI protocol. 25 Gb iWARP Ethernet cards can
also be used for clustering. 25 Gb and 100 Gb RoCE Ethernet cards can be used for NVMe
RDMA.
The comparison of current and previous models of IBM SAN Volume Controller is shown in
Table 1-3. Expansion enclosures are not included in the list.
2145-SV2 128 - 768 16 and 32 25, 50, and 100 Intel Xeon Cascade Lake 06 March 2020
2147-SV2 128 - 768 16 and 32 25, 50, and 100 Intel Xeon Cascade Lake 06 March 2020
2145-SA2 128 - 768 16 and 32 25, 50, and 100 Intel Xeon Cascade Lake 06 March 2020
2147-SA2 128 - 768 16 and 32 25, 50, and 100 Intel Xeon Cascade Lake 06 March 2020
2145-SV3 512 - 1536 32 25 and 100 by way of PCIe adapters only Intel Xeon Ice Lake 08 March 2022
2147-SV3 512 - 1536 32 25 and 100 by way of PCIe adapters only Intel Xeon Ice Lake 08 March 2022
Note: IBM SAN Volume Controller SV3, SV2, and SA2 do not support any type of SAS
expansion enclosures.
IBM FlashSystem 5015, IBM FlashSystem 5035, IBM FlashSystem 5045, and IBM FlashSystem 5200 deliver entry enterprise solutions. IBM FlashSystem 7200 and 7300 provide midrange enterprise solutions. IBM FlashSystem 9200 and 9500, plus the rack-based IBM FlashSystem 9200R and 9500R, provide four high-end enterprise solutions.
Although all the IBM FlashSystem family systems are running the same IBM Storage
Virtualize software, the feature set that is available with each of the models is different.
Note: The IBM FlashSystem 9100 Models AF7, AF8, UF7, and UF8 plus IBM
FlashSystem 7200 Model 824 and U7C are no longer sold by IBM, but are included here
for completeness because they support IBM Storage Virtualize V8.6 software.
For more information about the complete IBM FlashSystem family, see the IBM FlashSystem Family Data Sheet.
IBM Storage Expert Care is designed to simplify and standardize the support approach across the IBM FlashSystem portfolio to keep customer systems operating at peak performance.
The Storage Expert Care offering was originally released with the IBM FlashSystem 5200 and now also covers the IBM FlashSystem 7200, 7300, 9200/R, and 9500/R.
Customers can now choose their preferred level of support from up to three tiers
(product-dependent), each priced as a simple percentage of the hardware sales price. This
feature allows for easy, straightforward quoting from a single system.
These three tiers allow customers to select the best level of required service to support their
environment, ranging from base level service, through to premium-enhanced service. This
Storage Expert Care offering is designed to improve product resiliency and reliability and
reduce the operational costs that are associated with managing and maintaining increasingly
complex and integrated IT environments.
Figure 1-23 shows a summary of the Storage Expert Care Tier Levels.
Figure 1-23 Storage Expert Care tier levels for IBM FlashSystem 5015 and IBM FlashSystem 5045
Note: Not all geographies and regions offer all the Storage Expert Care levels of support. If
the Storage Expert Care is not announced in a specific country, the traditional warranty
and maintenance options are still offered.
For more information about in which countries it is applicable, see the following
announcement letters:
FS5200 Announcement Letter
FS7200 Announcement Letter
FS9200 Announcement Letter
FS7300 and FS9500 Announcement Letters
To support the new Storage Expert Care offering on the older IBM FlashSystem 7200 and 9200 systems, new machine types and models were introduced for these products.
Table 1-4 lists the comparison of the old machine types with the traditional warranty and
maintenance offering and the new Storage Expert Care offering.
Table 1-5 lists the software PIDs and SWMA feature codes that must be added to the order,
depending on the required level of cover.
When selecting the level of Storage Expert Care, you also must select the duration of the
contract, which can be 1 - 5 years. You also can opt for committed maintenance service levels
(CMSL).
The contract and duration have their own machine types and models (in addition to the hardware machine type and model that are listed in Table 1-5):
FS7200:
– 4665-P01-05 for Premium
– 4665-Pxx for Premium with CMSL
FS9200:
– 4673-P01-05 for Premium
– 4673-Pxx for Premium with CMSL
For example, an FS9200 with Premium Expert care for three years is 4673-PX3, where:
P: Premium Level service
X: Reserved for committed services (CMSL) if added to the expert care contract
3: Denotes a three-year contract (if 0, no committed services were purchased)
For more information about IBM Storage Expert Care, see the following IBM Documentation
web pages:
Figure 1-24 shows the IBM FlashSystem 9500 front and rear views.
As shown in Figure 1-24 on page 55, the IBM FlashSystem 9500 enclosure consists of
redundant PSUs, node canisters, and fan modules to provide redundancy and HA.
Figure 1-26 shows the internal hardware components of a node canister. On the left is the
front of the canister, where the NVMe drives and fan modules are installed, followed by two
Ice Lake CPUs and memory DIMM slots, and Peripheral Component Interconnect® Express
(PCIe) cages for the adapters on the right. The dual battery backup units are in the center,
between the PCIe adapter cages.
Note: There are new rules for the plugging of the NVMe drives in the control enclosure.
See the “IBM FlashSystem 9500 NVMe drive options” on page 59.
An IBM FlashSystem 9500 clustered system can contain up to two IBM FlashSystem 9500
systems and up to 3,040 drives in expansion enclosures. The following clustering rules must
be considered:
IBM FlashSystem 9500 systems can be clustered only with another IBM FlashSystem
9500.
IBM FlashSystem 9500 systems cannot be clustered with existing IBM FlashSystem 9200
or IBM FlashSystem 7200 or 7300 systems.
The IBM FlashSystem 9500 control enclosure node canisters are configured for active-active
redundancy. The node canisters provide a web interface, Secure Shell (SSH) access, and
Simple Network Management Protocol (SNMP) connectivity through external Ethernet
interfaces. By using the web and SSH interfaces, administrators can monitor system
performance and health metrics, configure storage, and collect support data, among other
features.
Note: A new machine type 4983 (models AH8 and UH8) is being introduced that is identical to the 4666, except that it is sold with Licensed Internal Code (LIC), in line with the other products in the FlashSystem product portfolio. This ensures that all features except encryption are included in the product price.
IBM Storage Insights monitors the system and reports the capacity that is used beyond the base 35%, which is then billed on a capacity-used basis. You can grow or shrink usage, and pay only for the configured capacity.
The IBM FlashSystem Utility Model is provided for customers who can benefit from a variable
capacity system, where billing is based on actual provisioned space only. The hardware is
leased through IBM Global Finance on a three-year lease, which entitles the customer to use
approximately 30 - 40% of the total system capacity at no extra cost (depending on the
individual customer contract). If storage needs increase beyond that initial capacity, usage is
billed based on the average daily provisioned capacity per terabyte per month, on a quarterly
basis.
The system monitors daily provisioned capacity and averages those daily usage rates over
the month term. The result is the average daily usage for the month.
If a customer uses 45 TB, 42.5 TB, and 50 TB in three consecutive months, IBM Storage Insights calculates the overage as listed in Table 1-6, rounding to the nearest terabyte.
Average monthly usage 45 TB; base capacity 40.25 TB; overage 4.75 TB; billed 5 TB
Average monthly usage 42.5 TB; base capacity 40.25 TB; overage 2.25 TB; billed 2 TB
Average monthly usage 50 TB; base capacity 40.25 TB; overage 9.75 TB; billed 10 TB
The total capacity that is billed at the end of the quarter is 17 TB in this example.
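The quarterly overage arithmetic can be illustrated with a short sketch in Python. It is a minimal illustration only: the 40.25 TB base value is taken from the example above and is not a fixed product constant, and real billing is handled by IBM Storage Insights.

# Minimal sketch of the quarterly overage billing arithmetic described above.
# The base capacity (40.25 TB) is taken from the example; real contracts vary.
BASE_TB = 40.25

def quarterly_overage(monthly_usage_tb):
    """Return the billed overage in TB for each month and the quarterly total."""
    billed = [round(max(usage - BASE_TB, 0)) for usage in monthly_usage_tb]
    return billed, sum(billed)

months, total = quarterly_overage([45, 42.5, 50])
print(months, total)   # [5, 2, 10] 17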
Flash drive expansions can be ordered with the system in all supported configurations.
Table 1-7 lists the feature codes that are associated with the IBM FlashSystem 9500 Utility
Model UH8 billing.
Table 1-7 IBM FlashSystem 9500 Utility Model UH8 billing feature codes
Feature code Description
These features are used to purchase the variable capacity that is used in the IBM FlashSystem 9500 Utility Models. The features (#AE00, #AE01, and #AE02) provide terabytes of capacity beyond the base subscription on the system. Usage is based on the average capacity that is used per month. The usage for the prior three months is totaled, and the corresponding number of #AE00, #AE01, and #AE02 features is ordered quarterly.
Partially populated control enclosures must follow drive slot plugging rules to ensure the best possible operating conditions for the drives.
Figure 1-27 on page 60 shows the logical NVMe drive placement, starting from the center of the enclosure (slot 12) on the upper 24 slots. Any slots that do not have an NVMe drive present must have a blank filler installed.
Figure 1-28 shows the actual drive population with numbering. It shows how the drives are populated from the center out and then distributed between the top and bottom rows as the number of drives increases over time.
Note: The layout in Figure 1-28 is split at slots 12 and 13 for better clarity on this page, but in reality slots 1 to 24 and slots 25 to 48 are contiguous.
1.7.3 IBM FlashSystem 9000 Expansion Enclosure Models AFF and A9F
IBM FlashSystem 9500 Model AH8 and IBM FlashSystem 9500 Utility Model UH8 support
the expansion enclosures IBM FlashSystem 9000 Models AFF and A9F.
For more information, see 1.9, “IBM FlashSystem 9000 Expansion Enclosure Models AFF
and A9F” on page 64.
Note: The IBM FlashSystem 9500 Model AH8 and IBM FlashSystem 9500 Model UH8 can
support a maximum of one IBM FlashSystem 9000 Model A9F dense expansion or three
IBM FlashSystem 9000 Model AFF enclosures per chain.
The IBM FlashSystem 9500R Rack Solution system features a dedicated FC network for
clustering and optional expansion enclosures, which are delivered assembled in a rack.
Available with two clustered IBM FlashSystem 9500 systems and up to four expansion enclosures, it can be ordered as an IBM FlashSystem 9502R, with the last number denoting the two AH8 control enclosures in the rack.
The final configuration occurs on site following the delivery of the systems. More components
can be added to the rack after delivery to meet the growing needs of the business.
Note: Other than the IBM FlashSystem 9500 control enclosures and its expansion
enclosures, the extra components of this solution are not covered under Storage Expert
Care. Instead, they have their own warranty, maintenance terms, and conditions.
Rack rules
The IBM FlashSystem 9500R Rack Solution product represents a limited set of possible
configurations. Each IBM FlashSystem 9500R Rack Solution order must include the following
components:
Two 4666 Model AH8 control enclosures.
Two IBM SAN24B-6 or two IBM SAN32C-6 FC switches.
The following optional expansion enclosures are available by way of MES only; they cannot be ordered with the machine as a new build:
– 0 - 4 4666 Model AFF expansion enclosures, with no more than one expansion enclosure per Model AH8 control enclosure and no mixing with the 9848/4666 Model A9F expansion enclosures.
– 0 - 2 4666 Model A9F expansion enclosures, with no more than one expansion enclosure per Model AH8 control enclosure and no mixing with 9848/4666 Model AFF expansion enclosures.
One 7965-S42 rack with the suitable power distribution units (PDUs) that are required to
power components within the rack.
All components in the rack must include feature codes #FSRS and #4651.
For Model AH8 control enclosures, feature codes #AL01 and #AL02 are assigned in capacity order, with the first and largest capacity enclosure taking #AL01. The 4666 / 4983 Model AH8 control enclosure with #AL01 also must include #AL0R.
Following the initial order, each 4666 Model AH8 control enclosure can be upgraded through
a MES.
More components can be ordered separately and added to the rack within the configuration
limitations of the IBM FlashSystem 9500 system. Customers must ensure that the space,
power, and cooling requirements are met. If assistance is needed with the installation of these
additional components beyond the service that is provided by your IBM System Services
Representative (IBM SSR), IBM Lab Services are available.
Note: A new machine type 4983 model AH8 is being introduced that is physically identical to the 4666, except that it is sold with Licensed Internal Code (LIC), in line with the other products in the FlashSystem product line. This ensures that all features except encryption are included in the product price.
Table 1-8 lists the IBM FlashSystem 9500R Rack Solution combinations, the MTMs, and their
associated feature codes.
The following expansion enclosures are available by way of MES order only.
Key to figures
The key to the symbols that are used in the figures in this section are listed in Table 1-9.
FC SWn: FC switch n of 2. These switches are both 8977-T32 or both 8960-F24.
PDU A, PDU B: PDUs. Both have the same rack feature code: #ECJJ, #ECJL, #ECJN, or #ECJQ.
Figure 1-29 shows the legend that is used to denote the component placement and
mandatory gaps for the figures that show the configurations.
Figure 1-30 shows the standard IBM FlashSystem 9500R Rack Solution configuration in the
rack.
Figure 1-30 IBM FlashSystem 9500R Rack Solution configuration in the rack
Minimum configuration
Consider the following points:
Control enclosures (CTL) 1 and 2 are mandatory.
The product includes cables that are suitable for inter-system FC connectivity. You must
order extra cables for host and Ethernet connectivity.
The PDUs and power cabling that are needed depend on which expansion enclosures are ordered.
Two PDUs with nine C19 outlets are required. Each PDU also has three C13 outlets on the forward-facing side.
FC SW1 and FC SW2 are a pair of IBM SAN32C-6 or IBM SAN24B-6 FC switches.
You can allocate different amounts of storage (drives) to each CTL component.
A gap of 1U is maintained below the expansion area to allow for power cabling routing.
For more information about the FC cabling at the rear of the IBM FlashSystem 9500R Rack
Solution, see this IBM Documentation web page.
An intermix of capacity drives can be used in any drive slot. The following attachment rules
are applicable for each SAS chain:
IBM FlashSystem 9200: Up to 10 IBM FlashSystem 9000 Expansion Enclosure Model
AFF enclosures can be attached to the control enclosure to a total of 240 drives
maximum.
IBM FlashSystem 9500: Up to three IBM FlashSystem 9000 Expansion Enclosure Model
AFF enclosures can be attached. This configuration provides extra capacity with a
maximum of 72 drives.
Figure 1-31 shows the front view of the IBM FlashSystem 9000 Expansion Enclosure Model
AFF.
An intermix of capacity drives is allowed in any drive slot and the following attachment rules
are applicable for each SAS chain:
IBM FlashSystem 9200: Up to four IBM FlashSystem 9000 Expansion Enclosure Model
A9F enclosures can be attached to the control enclosure to a total of 368 drives maximum.
IBM FlashSystem 9500: One IBM FlashSystem 9000 Expansion Enclosure Model A9F
can be attached. This configuration provides extra capacity with a maximum of 92 drives.
Figure 1-32 shows the front view of the IBM FlashSystem 9000 Expansion Enclosure Model
A9F.
Figure 1-32 IBM FlashSystem 9000 Expansion Enclosure Model A9F front view
An example of chain weight 4.5 with two IBM FlashSystem 9000 Expansion Enclosure Model
AFF enclosures and one IBM FlashSystem 9000 Expansion Enclosure Model A9F enclosure
all correctly cabled is shown in Figure 1-33, which shows an IBM FlashSystem 9200 system
connecting through SAS cables to the expansion enclosures while complying with the
maximum chain weight.
Figure 1-33 IBM FlashSystem 9200 system that is connected to expansion enclosure
An example of chain weights 3 and 2.5 with three IBM FlashSystem 9000 Expansion
Enclosure Model AFF enclosures and one IBM FlashSystem 9000 Expansion Enclosure
Model A9F enclosure all correctly cabled is shown in Figure 1-34, which shows an IBM
FlashSystem 9500 system connecting through SAS cables to the expansion enclosures while
complying with the maximum chain weight.
Figure 1-34 IBM FlashSystem 9500 system that is connected to expansion enclosure
Note: The expansion enclosure rules are the same for the FlashSystem 9500 machine type 4983, which is functionally equivalent to the 4666 except that it is sold with Licensed Internal Code (LIC).
For more information about V8.6.0x configuration and limit restrictions, see the following IBM
Support web page:
IBM FlashSystem 9500
IBM FlashSystem 7300 has a new machine type of 4657. This new machine type includes the
Storage Virtualize component as Licensed Machine Code (LMC); therefore, it does not
require the purchase of separate software maintenance (SWMA). IBM FlashSystem 7300 still
requires licenses for the external virtualization of storage.
Figure 1-35 shows the front and rear views of the IBM FlashSystem 7300 system.
As shown in Figure 1-36, the IBM FlashSystem 7300 enclosure consists of redundant PSUs,
node canisters, and fan modules to provide redundancy and HA.
Figure 1-37 shows the internal hardware components of a node canister. On the left side is the front of the canister, where the fan modules and battery backup are installed, followed by two Cascade Lake CPUs, Dual Inline Memory Module (DIMM) slots, and PCIe risers for adapters on the right side.
For more information about the supported drive types, see 1.17, “IBM FlashCore Module
drives, NVMe SSDs, and SCM drives” on page 103.
An IBM FlashSystem 7300 clustered system can contain up to four IBM FlashSystem 7300
systems and up to 3,040 drives. IBM FlashSystem 7300 systems can be added only into
clustered systems that include other IBM FlashSystem 7300 systems.
The variable capacity billing uses IBM Storage Insights to monitor the system use, which
allows allocated storage use that is above a base subscription rate to be billed per TB, per
month.
Allocated storage is identified as storage that is allocated to a specific host (and unusable to
other hosts), whether data is written or not written. For thin-provisioning, the data that is
written is considered used. For thick provisioning, total allocated volume space is considered
used.
1.10.2 IBM FlashSystem 7000 Expansion Enclosure 4657 Models 12G, 24G,
and 92G
The following types of expansion enclosures are available:
IBM FlashSystem 7000 LFF Expansion Enclosure 4657 Model 12G
IBM FlashSystem 7000 SFF Expansion Enclosure 4657 Model 24G
IBM FlashSystem 7000 LFF Expansion Enclosure 4657 Model 92G
Figure 1-38 IBM FlashSystem 7000 LFF Expansion Enclosure Model 12G
The SFF expansion enclosure is a 2U enclosure that includes the following components:
A total of 24 2.5-inch drives (hard disk drives [HDDs] or SSDs).
Two Storage Bridge Bay (SBB)-compliant Enclosure Services Manager (ESM) canisters.
Two fan assemblies, which mount between the drive midplane and the node canisters.
Each fan module is removable when the node canister is removed.
Two power supplies.
An RS232 port on the back panel (3.5 mm stereo jack), which is used for configuration
during manufacturing.
Figure 1-39 Front view of an IBM FlashSystem 7000 SFF expansion enclosure
Each dense drawer can hold up to 92 drives that are positioned in four rows of 14 and three rows of 12 mounted drive assemblies. Two secondary expander modules (SEMs) are centrally located in the chassis: one SEM addresses 54 drive ports, and the other addresses 38 drive ports.
The drive slots are numbered 1 - 14, starting from the left rear slot and working from left to
right, back to front.
Each canister in the dense drawer chassis features two SAS ports numbered 1 and 2. The
use of SAS port1 is mandatory because the expansion enclosure must be attached to an
IBM FlashSystem 7300 node or another expansion enclosure. SAS connector 2 is optional
because it is used to attach to more expansion enclosures.
Each IBM FlashSystem 7300 system can support up to four dense drawers per SAS chain.
For example, you can combine seven 24G and one 92G expansions (7x1 + 1x2.5 = 9.5 chain
weight), or two 92G enclosures, one 12G, and four 24G (2x2.5 + 1x1 + 4x1 = 10 chain
weight).
An example of chain weight 4.5 with one 24G, one 12G, and one 92G enclosures, all correctly
cabled, is shown in Figure 1-42.
Figure 1-42 Connecting FS7300 SAS cables while complying with the maximum chain weight
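As an illustration of the chain-weight arithmetic above, the following minimal Python sketch checks whether a proposed mix of expansion enclosures fits on one SAS chain. The weights (1 for the 2U 12G and 24G models, 2.5 for the 92G) and the maximum of 10 per chain are taken from the examples above; verify the current limits in IBM Documentation for your configuration.

# Minimal sketch: validate an FS7300 SAS chain against the maximum chain weight.
# Weights per the examples above: 12G = 1, 24G = 1, 92G = 2.5; assumed maximum = 10.
CHAIN_WEIGHTS = {"12G": 1.0, "24G": 1.0, "92G": 2.5}
MAX_CHAIN_WEIGHT = 10.0

def chain_weight(enclosures):
    """Sum the chain weight for a dict such as {'24G': 7, '92G': 1}."""
    return sum(CHAIN_WEIGHTS[model] * count for model, count in enclosures.items())

for combo in ({"24G": 7, "92G": 1}, {"92G": 2, "12G": 1, "24G": 4}):
    weight = chain_weight(combo)
    print(combo, weight, "OK" if weight <= MAX_CHAIN_WEIGHT else "exceeds limit")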
This system also has the flexibility and performance of flash and Non-Volatile Memory
Express (NVMe) end to end, the innovation of IBM FlashCore technology, and Storage Class
Memory (SCM) to help accelerate your business execution.
The innovative IBM FlashSystem family is based on a common storage software platform,
IBM Storage Virtualize, that provides powerful all-flash and hybrid-flash solutions that offer
feature-rich, cost-effective, and enterprise-grade storage solutions.
Its industry-leading capabilities include a wide range of data services that can be extended to
more than 500 heterogeneous storage systems, including the following examples:
Automated data movement
Synchronous and asynchronous copy services on-premises or to the public cloud
HA configurations
Storage automated tiering
Data reduction technologies, including deduplication
Available on IBM Cloud and Amazon Web Services (AWS), IBM Storage Virtualize for Public
Cloud works with IBM FlashSystem 5200 to deliver consistent data management between
on-premises storage and public cloud. You can move data and applications between
on-premises and public cloud, implement new DevOps strategies, use public cloud for DR
without the cost of a second data center, or improve cyber resiliency with “air gap” cloud
snapshots.
IBM FlashSystem 5200 offers world-class customer support, product upgrades, and other
programs. Consider the following examples:
IBM Storage Expert Care service and support are simple. You can easily select the level of support and period that best fits your needs, with predictable and up-front pricing that is a fixed percentage of the system cost.
Note: For more information, see 1.6.1, “Storage Expert Care” on page 50.
The IBM Data Reduction Guarantee helps reduce planning risks and lower storage costs
with baseline levels of data compression effectiveness in IBM Storage Virtualize-based
offerings.
The IBM Controller Upgrade Program enables customers of designated all-flash IBM
storage systems to reduce costs while maintaining leading-edge controller technology for
essentially the cost of ongoing system maintenance.
The IBM FlashSystem 5200 control enclosure supports up to 12 2.5-inch NVMe-capable flash
drives in a 1U high form factor.
Note: The IBM FlashSystem 5200 control enclosure supports the new IBM FCM3 drives if it is running IBM Storage Virtualize software V8.5 or later. These new drives feature the same capacities as the previous FCM2 drives, but have a higher internal compression ratio of up to 3:1. Therefore, they can effectively store more data, assuming that the data is compressible.
One standard model of IBM FlashSystem 5200 (4662-6H2) and one utility model (4662-UH6)
are available.
Figure 1-43 shows the IBM FlashSystem 5200 control enclosure front view with 12 NVMe
drives and a 3/4 ISO view.
Figure 1-43 IBM FlashSystem 5200 control enclosure front and 3/4 ISO view
Table 1-10 lists the host connections, drive capacities, features, and standard options with
IBM Storage Virtualize that are available on IBM FlashSystem 5200.
Table 1-10 IBM FlashSystem 5200 host, drive capacity, and functions summary
Feature or function Description
Control enclosure supported drives (12 maximum):
- 2.5-inch NVMe self-compressing FCMs: 4.8 TB, 9.6 TB, 19.2 TB, and 38.4 TB
- NVMe flash drives: 800 GB, 1.92 TB, 3.84 TB, 7.68 TB, and 15.36 TB
- NVMe storage-class memory drives: 375 GB, 750 GB, 800 GB, and 1.6 TB
SAS expansion enclosures (760 drives per control enclosure, 1,520 per clustered system; Model 12G 2U 12 drives, Model 24G 2U 24 drives, Model 92G 5U 92 drives):
- 2.5-inch flash drives supported: 800 GB, 1.6 TB, 1.92 TB, 3.84 TB, 7.68 TB, 15.36 TB, and 30.72 TB
- 2.5-inch disk drives supported: 600 GB, 900 GB, 1.2 TB, 1.8 TB, and 2.4 TB 10 K SAS disk; 2 TB 7.2 K nearline SAS disk
- 3.5-inch disk drives supported: 4 TB, 6 TB, 8 TB, 10 TB, 12 TB, 14 TB, 16 TB, and 18 TB 7.2 K nearline SAS disk
For more information about configuration and restrictions, see this IBM Support web page.
The following 2.5-inch SFF flash drives are supported in the expansion enclosures:
400 and 800 GB
1.6, 1.92, 3.2, 3.84, 7.68, 15.36, and 30.72 TB
The following 3.5-inch LFF flash drives are supported in the expansion enclosures:
1.6, 1.92, 3.2, 3.84, 7.68, 15.36, and 30.72 TB
3.5-inch SAS disk drives (Model 12G):
– 900 GB, 1.2 TB, 1.8 TB, and 2.4 TB 10,000 rpm
– 4 TB, 6 TB, 8 TB, 10 TB, 12 TB, 14 TB, and 16 TB 7,200 rpm
3.5-inch SAS drives (Model 92G):
– 1.6 TB, 1.92 TB, 3.2 TB, 3.84 TB, 7.68 TB, 15.36 TB, and 30.72 TB flash drives
– 1.2 TB, 1.8 TB, and 2.4 TB 10,000 rpm
– 6 TB, 8 TB, 10 TB, 12 TB, 14 TB, and 16 TB 7,200 rpm
2.5-inch SAS disk drives (Model 24G):
– 900 GB, 1.2 TB, 1.8 TB, and 2.4 TB 10,000 rpm
– 2 TB 7,200 rpm
2.5-inch SAS flash drives (Model 24G):
– 400 and 800 GB
– 1.6, 1.92, 3.2, 3.84, 7.68, 15.36, and 30.72 TB
IBM FlashSystem 5000 is a member of the IBM FlashSystem family of storage solutions. It
delivers increased performance and new levels of storage efficiency with superior ease of
use. This entry storage solution enables organizations to overcome their storage challenges.
The solution includes technology to complement and enhance virtual environments, which
delivers a simpler, more scalable, and cost-efficient IT infrastructure. IBM FlashSystem 5000
features two node canisters in a compact, 2U 19-inch rack mount enclosure.
Figure 1-44 shows the IBM FlashSystem 5015, 5035 and 5045 SFF control enclosure front
view.
Figure 1-44 IBM FlashSystem 5015, 5035 and 5045 SFF control enclosure front view
Figure 1-45 shows the IBM FlashSystem 5015, 5035 and 5045 LFF control enclosure front
view.
Figure 1-45 IBM FlashSystem 5015, 5035 and 5045 LFF control enclosure front view
Table 1-11 lists the model comparison chart for the IBM FlashSystem 5000 family.
Table 1-11 Machine type and model comparison for the IBM FlashSystem 5000
MTM Full name
4680-2P2 IBM FlashSystem 5015 LFF control enclosure (with Storage Expert Care)
4680-2P4 IBM FlashSystem 5015 SFF control enclosure (with Storage Expert Care)
4680-3P2 IBM FlashSystem 5045 LFF control enclosure (with Storage Expert Care)
4680-3P4 IBM FlashSystem 5045 SFF control enclosure (with Storage Expert Care)
4680-12H IBM FlashSystem 5000 LFF expansion enclosure (with Storage Expert Care)
4680-24H IBM FlashSystem 5000 SFF expansion enclosure (with Storage Expert Care)
4680-92H IBM FlashSystem 5000 High-Density LFF expansion enclosure (with Storage
Expert Care)
Note: IBM FlashSystem 5015/5035 (M/T 2072) systems must use the 2072 XXG models of expansion enclosures. Similarly, the 5015/5045 (M/T 4680) systems must use the 4680 XXH models of expansion enclosures. The two types cannot be intermixed.
Table 1-12 lists the host connections, drive capacities, features, and standard options with
IBM Storage Virtualize that are available on IBM FlashSystem 5015.
Table 1-12 IBM FlashSystem 5015 host, drive capacity, and functions summary
Feature / Function Description
Control enclosure and SAS expansion enclosure supported drives: for SFF enclosures, see Table 1-13 on page 79; for LFF enclosures, see Table 1-14 on page 80
Table 1-13 lists the 2.5-inch supported drives for IBM FlashSystem 5000 family.
Table 1-13 2.5-inch supported drives for the IBM FlashSystem 5000 family
2.5-inch (SFF) Capacity
Table 1-14 lists the 3.5-inch supported drives for IBM FlashSystem 5000 family.
Table 1-14 3.5-inch supported drives for the IBM FlashSystem 5000 family
3.5-inch (LFF) Speed Capacity
The IBM FlashSystem 5000 expansion enclosures are available in the following form factors:
- 2U 12-drive large form factor (LFF) Model 12H
- 2U 24-drive small form factor (SFF) Model 24H
- 5U 92-drive high-density (HD) LFF Model 92H
Available with the IBM FlashSystem 5035 model, DRPs help transform the economics of data storage. When applied to new or existing storage, they can increase usable capacity while maintaining consistent application performance. DRPs can help eliminate or drastically reduce costs for storage acquisition, rack space, power, and cooling, and can extend the useful life of storage assets.
Table 1-15 lists the host connections, drive capacities, features, and standard options with
IBM Storage Virtualize that are available on IBM FlashSystem 5035.
Table 1-15 IBM FlashSystem 5035 host, drive capacity, and functions summary
Feature / Function Description
Control enclosure and SAS expansion enclosure supported drives: for SFF enclosures, see Table 1-13 on page 79; for LFF enclosures, see Table 1-14 on page 80
For more information about configuration and restrictions, see this IBM Support web page.
This next section provides hardware information about the IBM FlashSystem 5035 models.
The IBM FlashSystem 5035 control enclosure features the following components:
Two node canisters, each with a six-core processor
32 GB cache (16 GB per canister) with optional 64 GB cache (32 GB per canister)
10 Gb iSCSI (copper) connectivity standard with optional 16 Gb FC, 12 Gb SAS, 10 Gb
iSCSI (optical), or 25 Gb iSCSI (optical)
12 Gb SAS port for expansion enclosure attachment
12 slots for 3.5-inch LFF SAS drives (Model 3N2) and 24 slots for 2.5-inch SFF SAS drives
(Model 3N4)
2U, 19-inch rack mount enclosure with 100 - 240 V AC or -48 V DC power supplies
The IBM FlashSystem 5035 control enclosure models offer the highest level of performance,
scalability, and functions and include the following features:
Support for 760 drives per system with the attachment of eight IBM FlashSystem 5000
High-Density LFF expansion enclosures and 1,520 drives with a two-way clustered
configuration
DRPs with deduplication, compression,2 and thin provisioning for improved storage
efficiency
Encryption of data-at-rest that is stored within the IBM FlashSystem 5035 system
Figure 1-46 shows the IBM FlashSystem 5035 SFF control enclosure with 24 drives.
Figure 1-47 shows the rear view of an IBM FlashSystem 5035 control enclosure.
Figure 1-48 shows the available connectors and LEDs on a single IBM FlashSystem 5035 canister.
Figure 1-48 View of available connectors and LEDs on an IBM FlashSystem 5035 single canister
For more information about configuration and restrictions, see this IBM Support web page.
Safeguarded Copy
FlashCopy 2 with internal scheduler
IBM FlashSystem 5045 also has a new machine type and models to support the IBM Storage Expert Care service offerings:
4680-3P2 (2U 12 Drive Large Form Factor LFF)
4680-3P4 (2U 24 Drive Small Form Factor SFF)
The IBM FlashSystem 5000 expansion enclosures are available in the following form factors:
- 2U 12-drive large form factor (LFF) Model 12H
- 2U 24-drive small form factor (SFF) Model 24H
- 5U 92-drive high-density (HD) LFF Model 92H
Available with the IBM FlashSystem 5045 model, DRPs help transform the economics of data storage. When applied to new or existing storage, they can increase usable capacity while maintaining consistent application performance. DRPs can help eliminate or drastically reduce costs for storage acquisition, rack space, power, and cooling, and can extend the useful life of storage assets.
Table 1-16 lists the host connections, drive capacities, features, and standard options with
IBM Storage Virtualize that are available on IBM FlashSystem 5045.
Table 1-16 IBM FlashSystem 5045 host, drive capacity, and functions summary
Feature / Function Description
Control enclosure and SAS expansion enclosure supported drives: for SFF enclosures, see Table 1-13 on page 79; for LFF enclosures, see Table 1-14 on page 80. Note: Support for Enterprise 15 K rpm drives is removed for IBM FlashSystem 5045.
IBM FlashSystem 5045 does not cluster with the original IBM FlashSystem 5035 or with any models other than IBM FlashSystem 5045. Otherwise, in common with FlashSystem 5035, a maximum of two I/O groups per cluster is allowed. IBM FlashSystem 5045 also reduces the number of Volume Group extents from 4 million to 1 million.
IBM FlashSystem 5045 uses a Licensed Machine Code (LMC) model similar to that used by FlashSystem 5200. This is a change from FlashSystem 5035: all feature software (apart from encryption) is included with the base license. Encryption requires a hardware license key.
In keeping with current trends, the expansion chain weight is reduced from 20U in FlashSystem 5035 to 12U for FlashSystem 5045.
This means that a single chain can contain up to two 5U92 expansions, up to six 2U12/24 expansions, or a combination of 5U and 2U enclosures not exceeding 12U in total. This reduces the maximum number of expansion drive slots per chain, per I/O group, and per system:
Per chain: 2x92 + 1x24 + 24 = 232 drives
Per I/O group: two chains per I/O group, so 4x92 + 2x24 + 24 = 440 drives
Per system: two I/O groups, so 2x440 = 880 drives
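The drive-slot totals above can be reproduced with a short Python sketch. It assumes the 12U chain limit, 92 slots per 5U92 enclosure, 24 or 12 slots per 2U expansion, and that the final +24 in each calculation represents the SFF control enclosure drive slots, which are counted once per I/O group.

# Minimal sketch of the FlashSystem 5045 drive-slot arithmetic described above.
# Assumptions: 12U per chain, 2 chains per I/O group, 2 I/O groups per system,
# and 24 SFF drive slots in the control enclosure (counted once per I/O group).
EXPANSION_SLOTS_PER_CHAIN = 2 * 92 + 1 * 24      # two 5U92 plus one 2U24 = 12U
CONTROL_ENCLOSURE_SLOTS = 24

per_chain = EXPANSION_SLOTS_PER_CHAIN + CONTROL_ENCLOSURE_SLOTS
per_io_group = 2 * EXPANSION_SLOTS_PER_CHAIN + CONTROL_ENCLOSURE_SLOTS
per_system = 2 * per_io_group

print(per_chain, per_io_group, per_system)       # 232 440 880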
For more information about configuration and restrictions, see this IBM Support web page.
This next section provides hardware information about the IBM FlashSystem 5045 models.
The IBM FlashSystem 5045 control enclosure features the following components:
Two node canisters, each with a six-core processor
32 GB cache (16 GB per canister) with optional 64 GB cache (32 GB per canister)
10 Gb iSCSI (copper) connectivity standard with optional 16 Gb FC, 12 Gb SAS, 10 Gb
iSCSI (optical), or 25 Gb iSCSI (optical)
12 Gb SAS port for expansion enclosure attachment
12 slots for 3.5-inch LFF SAS drives (Model 3N2) and 24 slots for 2.5-inch SFF SAS drives
(Model 3N4)
2U, 19-inch rack mount enclosure with 100 - 240 V AC power supplies
The IBM FlashSystem 5045 control enclosure models offer the highest level of performance,
scalability, and functions and include the following features:
Support for 760 drives per system with the attachment of eight IBM FlashSystem 5000
High-Density LFF expansion enclosures and 1,520 drives with a two-way clustered
configuration
DRPs with deduplication, compression,3 and thin provisioning for improved storage
efficiency
Encryption of data-at-rest that is stored within the IBM FlashSystem 5045 system
Figure 1-49 shows the IBM FlashSystem 5045 SFF control enclosure with 24 drives.
Figure 1-50 shows the rear view of an IBM FlashSystem 5045 control enclosure.
Figure 1-51 shows the available connectors and LEDs on a single IBM FlashSystem 5045 canister.
Figure 1-51 View of available connectors and LEDs on an IBM FlashSystem 5045 single canister
For more information about configuration and restrictions, see this IBM Support web page.
3 Deduplication and compression require 64 GB of system cache.
In a tiered storage pool, IBM Easy Tier acts to identify this skew and automatically place data
in the suitable tier to take advantage of it. By moving the hottest data onto the fastest tier of
storage, the workload on the remainder of the storage is reduced. By servicing most of the
application workload from the fastest storage, Easy Tier acts to accelerate application
performance.
Easy Tier is a performance optimization function that automatically migrates extents that
belong to a volume among different storage tiers based on their I/O load. The movement of
the extents is online and unnoticed from a host perspective.
As a result of extent movement, the volume no longer has all its data in one tier, but rather in
two or three tiers. Each tier provides optimal performance for the extent, as shown in
Figure 1-52.
Easy Tier monitors the I/O activity and latency of the extents on all Easy Tier enabled storage
pools to create heat maps. Based on these maps, Easy Tier creates an extent migration plan
and promotes (or moves) high activity or hot extents to a higher disk tier within the same
storage pool. It also demotes extents whose activity dropped off, or cooled, by moving them
from a higher disk tier managed disk (MDisk) back to a lower tier MDisk.
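The promote/demote idea can be illustrated with a minimal sketch: extents are ranked by observed I/O activity, the hottest extents are planned for migration to a faster tier, and the coolest to a slower tier. The thresholds, data structures, and tier names here are purely illustrative; the real Easy Tier algorithm, its heat maps, and its migration pacing are internal to IBM Storage Virtualize.

# Illustrative sketch only: rank extents by I/O activity and build a simple
# migration plan (promote hot extents, demote cold ones). The thresholds and
# tier names are hypothetical, not the actual Easy Tier implementation.
TIERS = ["scm", "flash", "enterprise", "nearline"]   # fastest to slowest

def plan_migrations(extents, hot_threshold=1000, cold_threshold=10):
    """extents: list of dicts like {'id': 1, 'tier': 'enterprise', 'iops': 1500}."""
    plan = []
    for ext in extents:
        tier_index = TIERS.index(ext["tier"])
        if ext["iops"] >= hot_threshold and tier_index > 0:
            plan.append((ext["id"], ext["tier"], TIERS[tier_index - 1]))   # promote
        elif ext["iops"] <= cold_threshold and tier_index < len(TIERS) - 1:
            plan.append((ext["id"], ext["tier"], TIERS[tier_index + 1]))   # demote
    return plan

sample = [{"id": 1, "tier": "enterprise", "iops": 1500},
          {"id": 2, "tier": "flash", "iops": 2}]
print(plan_migrations(sample))   # [(1, 'enterprise', 'flash'), (2, 'flash', 'enterprise')]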
Storage pools that contain only one tier of storage also can benefit from Easy Tier if they
include multiple disk arrays (or MDisks). Easy Tier has a balancing mode: It moves extents
from busy disk arrays to less busy arrays of the same tier, which balances I/O load.
All MDisks (disk arrays) belong to one of the tiers. They are classified as SCM, Flash,
Enterprise, or NL tier.
For more information about the Easy Tier reports, see this IBM Documentation web page.
For example, the most common use case is a host application, such as VMware, which frees
storage in a file system. Then, the storage controller can perform functions to optimize the
space, such as reorganizing the data on the volume so that space is better used.
When a host allocates storage, the data is placed in a volume. To return the allocated space
to the storage pools, the SCSI UNMAP feature is used. UNMAP enables host operating
systems to deprovision storage on the storage controller so that the resources can
automatically be freed in the storage pools and used for other purposes.
A DRP increases infrastructure capacity use by using new efficiency functions and reducing
storage costs. By using the end-to-end SCSI UNMAP function, a DRP can automatically
de-allocate and reclaim the capacity of thin-provisioned volumes that contain deleted data so
that this reclaimed capacity can be reused by other volumes.
At its core, a DRP uses a Log Structured Array (LSA) to allocate capacity. An LSA enables a
tree-like directory to be used to define the physical placement of data blocks that are
independent of size and logical location. Each logical block device has a range of logical block
addresses (LBAs), starting from 0 and ending with the block address that fills the capacity.
When written, you can use an LSA to allocate data sequentially and provide a directory that
provides a lookup to match an LBA with a physical address within the array. Therefore, the
volume that you create from the pool to present to a host application consists of a directory
that stores the allocation of blocks within the capacity of the pool.
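A minimal sketch of the log-structured allocation idea follows: writes are appended sequentially to back-end capacity, and a directory maps each logical block address to its physical location, so the logical layout is independent of the physical one. This is a conceptual illustration only, not the DRP on-disk format or metadata structure.

# Conceptual sketch of a log-structured array (LSA): data is appended
# sequentially and a directory maps logical block addresses (LBAs) to
# physical offsets. Not the actual DRP metadata layout.
class TinyLSA:
    def __init__(self):
        self.log = []          # sequential physical placement
        self.directory = {}    # LBA -> index into the log

    def write(self, lba, block):
        self.directory[lba] = len(self.log)   # an overwrite just remaps the LBA
        self.log.append(block)

    def read(self, lba):
        index = self.directory.get(lba)       # directory lookup, then data read
        return self.log[index] if index is not None else b"\x00"

lsa = TinyLSA()
lsa.write(100, b"old")
lsa.write(100, b"new")                        # rewrite appends; directory moves
print(lsa.read(100))                          # b'new'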
In DRPs, the maintenance of the metadata results in I/O amplification. I/O amplification
occurs when a single host-generated read or write I/O results in more than one back-end
storage I/O request because of advanced functions. A read request from the host results in
two I/O requests: a directory lookup and a data read. A write request from the host results in
three I/O requests: a directory lookup, a directory update, and a data write. This aspect must
be considered when sizing and planning your data-reducing solution.
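The effect of this metadata overhead on back-end I/O can be estimated with simple arithmetic, as in the following sketch. It applies the 2x read and 3x write factors stated above to a host workload mix; caching, coalescing, and other optimizations in the real system reduce the actual amplification, so treat this only as a sizing illustration.

# Rough sizing illustration of DRP I/O amplification: a host read becomes
# 2 back-end I/Os (directory lookup + data read) and a host write becomes
# 3 back-end I/Os (directory lookup + directory update + data write).
READ_FACTOR = 2
WRITE_FACTOR = 3

def backend_iops(host_iops, read_ratio):
    """Estimate back-end IOPS for a host workload with the given read ratio."""
    reads = host_iops * read_ratio
    writes = host_iops * (1 - read_ratio)
    return reads * READ_FACTOR + writes * WRITE_FACTOR

print(backend_iops(10_000, 0.7))   # 10,000 host IOPS at 70% read -> 23,000.0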
Standard pools, which make up a classic solution that also is supported by the
IBM FlashSystem and IBM SAN Volume Controller system, do not use LSA. A standard pool
works as a container that receives its capacity from MDisks (disk arrays), splits it into extents
of the same fixed size, and allocates extents to volumes.
Standard pools do not cause I/O amplification and require less processing resource usage
compared to DRPs. In exchange, DRPs provide more flexibility and storage efficiency.
Table 1-17 lists the volume capacity saving types that are available with standard pools and
DRPs.
Best practice: If you want to use deduplication, create thin-provisioned compressed and
deduplicated volumes.
This book provides only an overview of DRP aspects. For more information, see Introduction
and Implementation of Data Reduction Pools and Deduplication, SG24-8430.
In IBM Storage Virtualized systems, each volume includes virtual capacity and real capacity
parameters:
Virtual capacity is the volume storage capacity that is available to a host. It is used by the
host operating system to create a file system.
Real capacity is the storage capacity that is allocated to a volume from a pool. It shows the
amount of space that is used on a physical storage volume.
Fully allocated
Fully allocated volumes are created with the same amount of real capacity and virtual
capacity. This type uses no storage efficiency features.
When a fully allocated volume is created in a DRP, it bypasses the LSA structure and works in the same manner as in a standard pool; therefore, it has no effect on processing and provides no data reduction options at the pool level.
When fully allocated volumes are used on the IBM Storage Virtualized systems with FCM
drives (whether a DRP or standard pool is used), capacity savings are achieved by
compressing data with hardware compression that runs on the FCM drives. Hardware
compression on FCM drives is always on and cannot be turned off. This configuration
provides maximum performance in combination with outstanding storage efficiency.
Thin-provisioned
A thin-provisioned volume presents a different capacity to mapped hosts than the capacity
that the volume uses in the storage pool. Therefore, real and virtual capacities might not be
equal. The virtual capacity of a thin-provisioned volume is typically significantly larger than its
real capacity. As more information is written by the host to the volume, more of the real
capacity is used. The system identifies read operations to unwritten parts of the virtual
capacity, and returns zeros to the server without the use of any real capacity.
In a shared storage environment, thin provisioning is a method for optimizing the use of
available storage. Thin provisioning relies on the allocation of blocks of data on demand,
versus the traditional method of allocating all of the blocks up front. This method eliminates
almost all white space, which helps avoid the poor usage rates that occur in the traditional
storage allocation method where large pools of storage capacity are allocated to individual
servers but remain unused (not written to).
Freeing space from the hosts is a process that is called UNMAP. A host can issue SCSI UNMAP commands when the user deletes files on a file system, which results in the freeing of all of the capacity that is allocated within that unmapped range.
A thin-provisioned volume in a standard pool does not return unused capacity to the pool with
SCSI UNMAP.
The IBM FlashSystem and IBM SAN Volume Controller family DRP compression is based on
the Lempel-Ziv lossless data compression algorithm that operates by using a real-time
method. When a host sends a write request, the request is acknowledged by the write cache
of the system and then, staged to the DRP.
As part of its staging, the write request passes through the compression engine and is stored
in a compressed format. Therefore, writes are acknowledged immediately after they are
received by the write cache with compression occurring as part of the staging to internal or
external physical storage. This process occurs transparently to host systems, which makes
them unaware of the compression.
The tool scans target workloads on various earlier storage arrays (from IBM or another
company), merges all scan results and then, provides an integrated system-level data
reduction estimate.
Both tools are available as stand-alone, host-based utilities that can analyze data on IBM or
third-party storage devices. For more information, see this IBM Support web page.
Deduplication can be configured with thin-provisioned and compressed volumes in DRPs for
added capacity savings. The deduplication process identifies unique chunks of data, or byte
patterns, and stores a signature of the chunk for reference when writing new data chunks.
If the new chunk’s signature matches an existing signature, the new chunk is replaced with a small reference that points to the stored chunk. The matches are detected when the data is written. The same byte pattern might occur many times, which greatly reduces the amount of data that must be stored.
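The signature-matching idea can be sketched as follows: each incoming chunk is hashed, and if the hash was seen before, the chunk is stored as a reference to the existing copy. The chunk size, hash choice, and data structures here are illustrative; the actual deduplication granularity and signature handling are internal to DRPs.

# Illustrative deduplication sketch: hash each chunk and store a reference
# when the signature matches an existing chunk. Chunk size and hash are
# arbitrary choices for the example, not the DRP implementation.
import hashlib

CHUNK_SIZE = 8192
store = {}          # signature -> chunk data (written once)
references = []     # per-chunk signatures, in write order

def write(data):
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        signature = hashlib.sha256(chunk).hexdigest()
        if signature not in store:
            store[signature] = chunk          # new unique chunk
        references.append(signature)          # a duplicate becomes a reference

write(b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE)
print(len(references), len(store))            # 4 chunks written, 2 stored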
Compression and deduplication are not mutually exclusive: One, both, or none of the features
can be enabled. If the volume is deduplicated and compressed, data is deduplicated first and
then, compressed. Therefore, deduplication references are created on the compressed data
that is stored on the physical domain.
Encryption is performed by the IBM FlashSystem or IBM SAN Volume Controller controllers
for data that is stored:
Within the entire system
The IBM FlashSystem control enclosure
All attached expansion enclosures
As externally virtualized by the IBM FlashSystem or IBM SAN Volume Controller storage
systems
Encryption is the process of encoding data so that only authorized parties can read it. Data
encryption is protected by the Advanced Encryption Standard (AES) algorithm that uses a
256-bit symmetric encryption key in XTS mode, as defined in the IEEE 1619-2007 standard
and NIST Special Publication 800-38E as XTS-AES-256.
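The XTS-AES-256 mode named above can be exercised directly with the Python cryptography package, as in the following hedged sketch. It demonstrates the standard algorithm only (a 512-bit key, that is, two 256-bit keys, plus a 16-byte tweak per data unit); it is not how the FlashSystem hardware or software encryption engines are implemented, and key management on the product is handled by its own infrastructure.

# Demonstration of XTS-AES-256 with the 'cryptography' package; illustrates
# the algorithm only, not the product's encryption or key management.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(64)                      # 512 bits = two 256-bit AES keys for XTS
tweak = (0).to_bytes(16, "little")        # per-data-unit tweak (for example, a sector number)
cipher = Cipher(algorithms.AES(key), modes.XTS(tweak))

plaintext = b"data at rest...."           # XTS needs at least one 16-byte block
encryptor = cipher.encryptor()
ciphertext = encryptor.update(plaintext) + encryptor.finalize()
decryptor = cipher.decryptor()
assert decryptor.update(ciphertext) + decryptor.finalize() == plaintext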
Two types of encryption are available on devices that are running IBM Storage Virtualize:
hardware and software. Which method is used for encryption is chosen automatically by the
system based on the placement of the data:
Hardware encryption: Data is encrypted by IBM FlashCore module (FCM) hardware and
SAS hardware for expansion enclosures. It is used only for internal storage (drives).
Software encryption: Data is encrypted by using the nodes’ CPU (the encryption code
uses the AES-NI CPU instruction set). It is used only for external storage that is virtualized
by the IBM FlashSystem and IBM SAN Volume Controller managed storage systems.
Both methods of encryption use the same encryption algorithm, key management
infrastructure, and license.
Note: Only data-at-rest is encrypted. Host-to-storage communication and data that is sent
over links that are used for remote mirroring are not encrypted.
The IBM FlashSystem also supports self-encrypting drives, in which data encryption is
completed in the drive.
Before encryption can be enabled, ensure that a license was purchased and activated.
Another data security enhancement that is delivered with the IBM Storage Virtualize code is the Safeguarded Copy feature, which can provide protected, read-only, logical air gap copies of volumes. This enhancement gives customers effective data protection against cyberattacks.
VVOLs simplify operations through policy-driven automation that enables more agile storage
consumption for VMs and dynamic adjustments in real time when they are needed. They also
simplify the delivery of storage service levels to individual applications by providing finer
control of hardware resources and native array-based data services that can be instantiated
with VM granularity.
With VVOLs, VMware offers a paradigm in which an individual VM and its disks, rather than a
logical unit number (LUN), becomes a unit of storage management for a storage system. It
encapsulates VDisks and other VM files, and natively stores the files on the storage system.
By using a special set of APIs that are called vSphere APIs for Storage Awareness (VASA),
the storage system becomes aware of the VVOLs and their associations with the relevant
VMs. Through VASA, vSphere and the underlying storage system establish a two-way
out-of-band communication to perform data services and offload certain VM operations to the
storage system. For example, some operations, such as snapshots and clones, can be
offloaded.
For more information about VVOLs and the actions that are required to implement this feature
on the host side, see the VMware website. Also, see IBM Storage Virtualize and VMware:
Integrations, Implementation and Best Practices, SG24-8549.
IBM support for VASA is provided by IBM Storage Connect, which enables communication between the VMware vSphere infrastructure and the IBM FlashSystem system. The IBM FlashSystem administrator can assign ownership of VVOLs to IBM Storage Connect by creating a user with the VASA Provider security role.
Although the system administrator can complete specific actions on volumes and pools that
are owned by the VASA Provider security role, IBM Storage Connect retains management
responsibility for VVOLs. For more information about IBM Storage Connect, see the
IBM Storage Connect documentation.
Note: At the time of this writing, VVOLs are not supported on DRPs. However, they are still
valid with Version 8.6.0.
The IBM storage systems use a GUI with the same look and feel across all platforms for a
consistent management experience. The GUI includes an improved overview dashboard that
provides all of the information in an easy-to-understand format and enables visualization of
effective capacity. By using the GUI, you can quickly deploy storage and manage it efficiently.
Figure 1-53 shows the Storage Virtualize GUI dashboard view. This default view is displayed
after the user logs on to the system.
The IBM FlashSystem storage systems and the IBM SAN Volume Controller also provide a
CLI, which is useful for advanced configuration and scripting.
The systems support SNMP, email notifications that use Simple Mail Transfer Protocol
(SMTP), and syslog redirection for complete enterprise management access.
The IBM Call Home function opens a service alert if a serious error occurs in the system,
which automatically sends the details of the error and contact information to IBM Service
Personnel.
If the system is eligible for support, a Cognitive Support Program (CSP) ticket is automatically
created and assigned to the suitable IBM Support team. The information that is provided to
IBM is an excerpt from the event log that contains the details of the error, and customer
contact information from the system.
IBM Service Personnel contact the customer and arrange service on the system, which can
greatly improve the speed of resolution by removing the need for the customer to detect the
error and raise a support call themselves.
The system supports the following methods to transmit notifications to the support center:
Call Home with cloud services
Call Home with cloud services sends notifications directly to a centralized file repository
that contains troubleshooting information that is gathered from customers. Support
personnel can access this repository and automatically be assigned issues as problem
reports.
This method of transmitting notifications from the system to support removes the need for
customers to create problem reports manually. Call Home with cloud services also
eliminates email filters dropping notifications to and from support, which can delay
resolution of problems on the system.
This method sends notifications only to the predefined support center.
Call Home with email notifications
Call Home with email notification sends notifications through a local email server to
support and local users or services that monitor activity on the system. With email
notifications, you can send notifications to support and designate internal distribution of
notifications, which alerts internal personnel about potential problems. Call Home with
email notifications requires configuring at least one email server, and local users.
However, external notifications to the support center can be dropped if filters on the email
server are active. To eliminate this problem, Call Home with email notifications is not
recommended as the only method to transmit notifications to the support center. Call
Home with email notifications can be configured with cloud services.
IBM highly encourages all customers to take advantage of the Call Home feature so that you
and IBM can collaborate for your success.
For more information about the features and functions of both IBM Call Home methods, see
this IBM Support web page.
Call Home Connect Cloud works offline, too, although real-time updates are not available while offline. For clients who want a more tailored mobile experience, a mobile companion app, Call Home Connect Anywhere, is available for Android and iOS devices.
For more information about Call Home Connect Cloud (CHCC), see Introducing IBM Call Home Connect Cloud.
You can view data from the perspectives of the servers, applications, and file systems. Two
versions of IBM Storage Insights are available: IBM Storage Insights and IBM Storage
Insights Pro.
When you order any IBM FlashSystem storage system or IBM SAN Volume Controller, IBM
Storage Insights is available at no extra cost. With this version, you can monitor the basic
health, status, and performance of various storage resources.
Note: With some models of Storage Virtualize systems that offer the Premium Storage
Expert Care level of support, Storage Insights Pro is included as part of the offering. For
more information, see 1.6.1, “Storage Expert Care” on page 50.
IBM Storage Insights is a part of the monitoring portfolio and helps to ensure the continued availability of IBM FlashSystem storage and IBM SAN Volume Controller systems.
The tool provides a single dashboard that gives you a clear view of all your IBM block and file
storage and some other storage vendors (the IBM Storage Insights Pro version is required to
view other storage vendors’ storage). You can make better decisions by seeing trends in
performance and capacity. With storage health information, you can focus on areas that need
attention.
When IBM Support is needed, IBM Storage Insights simplifies uploading logs, speeds
resolution with online configuration data, and provides an overview of open tickets, all in one
place.
For IBM Storage Insights to operate, a lightweight data collector must be deployed in your
data center to stream only system metadata to your IBM Cloud instance. The metadata flows
in one direction: from your data center to IBM Cloud over HTTPS.
The application data that is stored on the storage systems cannot be accessed by the data
collector. In IBM Cloud, your metadata is AES256-encrypted and protected by physical,
organizational, access, and security controls.
For more information about IBM Storage Insights, see the following websites:
IBM Storage Insights Fact Sheet
Functional demonstration environment (requires an IBMid)
IBM Storage Insights security information
IBM Storage Insights registration
The RESTful API server does not consider transport security (such as Secure Sockets Layer (SSL)), but instead assumes that requests are started from a local, secured server. The HTTPS protocol provides privacy through data encryption. The RESTful API provides more security by requiring command authentication, which persists for 2 hours of activity or 30 minutes of inactivity, whichever occurs first.
Uniform Resource Locators (URLs) target different node objects on the system. The HTTPS
POST method acts on command targets that are specified in the URL. To make changes or
view information about different objects on the system, you must create and send a request to
the system. You must provide specific elements for the RESTful API server to receive and transform the request into a command.
To interact with the system by using the RESTful API, make an HTTPS command request
with a valid configuration node URL destination. Open TCP port 7443, include the keyword rest, and then use the following URL format for all requests:
https://system_node_ip:7443/rest/command
Where:
system_node_ip is the system IP address, which is the address that is taken by the
configuration node of the system.
The port number is always 7443 for the IBM Storage Virtualize RESTful API.
rest is a keyword.
command is the target command object (such as auth or lseventlog with any
parameters). The command specification uses the following format:
command_name,method="POST",headers={'parameter_name': 'parameter_value',
'parameter_name': 'parameter_value',...}
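As an illustration only, the following sketch shows a minimal RESTful API interaction with curl. It assumes a superuser account, a configuration node IP address of 192.0.2.10, and the documented X-Auth-Username, X-Auth-Password, and X-Auth-Token headers; verify the headers and endpoints against the documentation for your code level before use.

# Authenticate to obtain a token (valid for 2 hours of activity or 30 minutes of inactivity)
curl -k -X POST https://192.0.2.10:7443/rest/auth -H "X-Auth-Username: superuser" -H "X-Auth-Password: your_password"

# Use the returned token to run a command target, such as lseventlog
curl -k -X POST https://192.0.2.10:7443/rest/lseventlog -H "X-Auth-Token: token_returned_by_auth"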
Volume mirroring
By using volume mirroring, a volume can have two physical copies in one IBM Storage
System. Each volume copy can belong to a different pool and use a different set of capacity
saving features.
When a host writes to a mirrored volume, the system writes the data to both copies. When a
host reads a mirrored volume, the system picks one of the copies to read. If one of the
mirrored volume copies is temporarily unavailable, the volume remains accessible to servers.
The system remembers which areas of the volume are written, and resynchronizes these
areas when both copies are available.
You can create a volume with one or two copies, and you can convert a non-mirrored volume into a mirrored volume by adding a copy. When a copy is added in this way, the system
synchronizes the new copy so that the new copy is the same as the existing volume. Servers
can access the volume during this synchronization process.
Volume mirroring can be used to migrate data to or from an IBM Storage System running IBM Storage Virtualize. For example, you can start with a non-mirrored image mode volume in the migration pool, and then add a copy to that volume in the destination pool on internal
storage. After the volume is synchronized, you can delete the original copy that is in the
source pool. During the synchronization process, the volume remains available.
Volume mirroring also is used to convert fully allocated volumes to use data reduction
technologies, such as thin-provisioning, compression, or deduplication, or to migrate volumes
between storage pools.
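As a hedged CLI sketch of this technique, the following commands add a second copy of an existing volume in another pool and remove the original copy after synchronization; the volume name, pool name, and copy ID are assumptions for this example.

# Add a mirrored copy of volume DB_VOL01 in pool Pool_Internal
addvdiskcopy -mdiskgrp Pool_Internal DB_VOL01

# Monitor synchronization progress of the new copy
lsvdisksyncprogress DB_VOL01

# After the copies are synchronized, remove the original copy (copy ID 0 in this example)
rmvdiskcopy -copy 0 DB_VOL01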
FlashCopy
The FlashCopy or snapshot function creates a point-in-time (PiT) copy of data that is stored
on a source volume to a target volume. FlashCopy is sometimes described as an instance of
a time-zero (T0) copy. Although the copy operation takes some time to complete, the resulting
data on the target volume is presented so that the copy appears to occur immediately, and all
data is available immediately. Advanced functions of FlashCopy allow operations to occur on
multiple source and target volumes.
Management operations are coordinated to provide a common, single PiT for copying target
volumes from their respective source volumes to create a consistent copy of data that spans
multiple volumes.
The function also supports multiple target volumes to be copied from each source volume,
which can be used to create images from different PiTs for each source volume.
FlashCopy is used to create consistent backups of dynamic data and test applications, and to
create copies for auditing purposes and for data mining. It can be used to capture the data at
a specific time to create consistent backups of dynamic data. The resulting image of the data
can be backed up; for example, to a tape device, object storage, or another disk- or flash-based storage technology. When the copied data is on tape, the data on the FlashCopy target disks becomes redundant and can be discarded.
FlashCopy can perform a restore from any FlashCopy mapping. Therefore, you can restore
(or copy) from the target to the source of your regular FlashCopy relationships. When
restoring data from FlashCopy, this method can be qualified as reversing the direction of the
FlashCopy mappings. This approach can be used for various applications, such as recovering
a production database application after an errant batch process that caused extensive
damage.
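The following hedged CLI sketch creates and starts a single FlashCopy mapping; the source and target volume names, the mapping name, and the background copy rate are assumptions, and the target volume must already exist and match the size of the source.

# Create a FlashCopy mapping from SRC_VOL to TGT_VOL with a background copy rate of 50
mkfcmap -source SRC_VOL -target TGT_VOL -name fcmap_example -copyrate 50

# Prepare and start the mapping to take the point-in-time copy
startfcmap -prep fcmap_example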
Remote mirroring
You can use the remote mirroring (also referred to as Remote Copy [RC]) function to set up a
relationship between two volumes, where updates made to one volume are mirrored on the
other volume. The volumes can be on two different systems (intersystem) or on the same
system (intrasystem).
For an RC relationship, one volume is designated as the primary and the other volume is
designated as the secondary. Host applications write data to the primary volume, and
updates to the primary volume are copied to the secondary volume. Normally, host
applications do not run I/O operations to the secondary volume.
Metro Mirror (MM)
Provides a synchronous copy of a source volume on a target volume. When a host writes to the primary volume, the data is also written to the secondary volume before the write completes to the host, and an initial background copy synchronizes existing data to the secondary
volume, which ensures that both volumes have identical data when the copy operation
completes. After the initial copy operation completes, the MM function always maintains a
fully synchronized copy of the source data at the target site. The MM function supports
copy operations between volumes that are separated by distances up to 300 km
(186.4 miles).
For DR purposes, MM provides the simplest way to maintain an identical copy on the
primary and secondary volumes. However, as with all synchronous copies over remote
distances, host application performance can be affected. This performance effect is
related to the distance between primary and secondary volumes and depending on
application requirements, its use might be limited based on the distance between sites.
Global Mirror (GM)
Provides a consistent copy of a source volume on a target volume. The data is written to
the target volume asynchronously and the copy is continuously updated. When a host
writes to the primary volume, a confirmation of I/O completion is received before the write
operation completes for the copy on the secondary volume. Because of this situation, the
copy might not contain the most recent updates when a DR operation is completed.
If a failover operation is started, the application must recover and apply any updates that
were not committed to the secondary volume. If I/O operations on the primary volume are
paused for a short period, the secondary volume can become a match of the primary
volume. This function is comparable to a continuous backup process in which the last few
updates are always missing. When you use GM for DR, you must consider how you want
to handle these missing updates.
The secondary volume is generally less than 1 second behind the primary volume, which
minimizes the amount of data that must be recovered if a failover occurs. However, a
high-bandwidth link must be provisioned between the two sites.
Global Mirror with Change Volumes (GMCV)
Enables support for GM with a higher recovery point objective (RPO) by using change
volumes. This function is for use in environments where the available bandwidth between
the sites is smaller than the update rate of the replicated workload.
With GMCV, or GM with cycling, change volumes must be configured for the primary and
secondary volumes in each relationship. A copy is taken of the primary volume in the
relationship to the change volume. The background copy process reads data from the
stable and consistent change volume and copies the data to the secondary volume in the
relationship.
CoW technology is used to maintain the consistent image of the primary volume for the
background copy process to read. The changes that occurred while the background copy
process was active also are tracked. The change volume for the secondary volume also
can be used to maintain a consistent image of the secondary volume while the
background copy process is active.
GMCV provides fewer requirements to inter-site link bandwidth than other RC types. It is
mostly used when link parameters are insufficient to maintain the RC relationship without
affecting host performance.
Note: Although all three types of RC are supported to work over an IP link, the
recommended type is GMCV.
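As a sketch only, the following CLI commands create and start Metro Mirror and Global Mirror relationships between a local and a remote system; the volume names, relationship names, and remote system name are assumptions, and a partnership between the two systems must already exist.

# Metro Mirror (synchronous) relationship between local MASTER_VOL and remote AUX_VOL
mkrcrelationship -master MASTER_VOL -aux AUX_VOL -cluster REMOTE_SYSTEM -name mm_rel01

# Global Mirror (asynchronous) relationship: add the -global flag
mkrcrelationship -master MASTER_VOL -aux AUX_VOL -cluster REMOTE_SYSTEM -global -name gm_rel01

# Start replication for a relationship
startrcrelationship mm_rel01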
Note: IBM FlashSystems 5015, 5035 and 5045 do not support policy-based replication.
With policy-based replication, you can replicate data between systems with minimal management, significantly higher throughput, and reduced latency compared to the remote-copy function. A replication policy has the following properties:
A replication policy can be assigned to one or more volume groups.
Replication policies cannot be changed after they are created. If changes are required, a
new policy can be created and assigned to the associated volume group.
Each system supports up to a maximum of 32 replication policies.
– Replication policies
• Replication policies define the replication settings that are assigned to the volume
groups. Replication policies replicate the volume groups and ensure that consistent
data is available on the production and recovery system.
– Volume groups for policy-based replication
• Policy-based replication uses volume groups and replication policies to
automatically deploy and manage replication. Policy-based replication significantly
simplifies configuring, managing, and monitoring replication between two systems.
– Partnerships
• Two-site partnerships replicate volume data that is on one system to a remote
system. Two-site partnerships are required for policy-based replication.
Partnerships can be used for migration, 3-site replication, and disaster recovery
situations.
For more information about policy-based replication (PBR), see Getting started with policy-based replication.
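The general configuration flow is to create a replication policy, place volumes into a volume group, and assign the policy to that group. The following CLI lines are a hedged sketch only; the exact commands and parameters vary by code level, and the policy and volume group names are assumptions.

# List existing replication policies on the system
lsreplicationpolicy

# Assign an existing replication policy to a volume group (names are examples)
chvolumegroup -replicationpolicy policy_prod VG_PROD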
1.16.2 HyperSwap
The IBM HyperSwap function is a HA feature that provides dual-site, active-active access to a
volume. It is available on systems that can support more than one I/O group.
With HyperSwap, a fully independent copy of the data is maintained at each site. When data
is written by hosts at either site, both copies are synchronously updated before the write
operation is completed. The HyperSwap function automatically optimizes to minimize data
that is transmitted between two sites, and to minimize host read and write latency.
If the system or the storage at either site goes offline and an online and accessible up-to-date
copy is left, the HyperSwap function can automatically fail over access to the online copy. The
HyperSwap function also automatically resynchronizes the two copies when possible.
The relationships provide access to whichever copy is up to date through a single volume,
which has a unique ID. This volume is visible as a single object across both sites (I/O groups),
and is mounted to a host system.
A 2-site HyperSwap configuration can be extended to a third site for DR that uses the IBM
Storage Virtualize 3-Site Orchestrator.
IBM Storage Virtualize 3-Site Orchestrator coordinates replication of data for disaster
recovery and high availability scenarios between systems that are on three geographically
dispersed sites. IBM Storage Virtualize 3-Site Orchestrator is a command-line based
application that runs on a separate Linux host that configures and manages supported
replication configurations on IBM Storage Virtualize products.
The HyperSwap function works with the standard multipathing drivers that are available on
various host types, with no extra host support that is required to access the highly available
volume. Where multipathing drivers support Asymmetric Logical Unit Access (ALUA), the
storage system tells the multipathing driver which nodes are closest to it and must be used to
minimize I/O latency. You tell the storage system which site a host is connected to, and it
configures host pathing optimally.
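As a hedged outline only (exact parameters depend on the code level and on how sites and pools are defined), configuring HyperSwap typically involves setting the system topology, assigning sites to hosts, and creating HA volumes that span both sites; the site, host, pool, and volume names below are assumptions.

# Set the system topology to HyperSwap after sites are defined
chsystem -topology hyperswap

# Tell the system which site a host is connected to (site and host names are examples)
chhost -site site1 ESX_HOST01

# Create a HyperSwap volume with a copy in a pool at each site
mkvolume -pool Pool_Site1:Pool_Site2 -size 100 -unit gb -name HA_VOL01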
Note: The SCM drives and XL FCM drives require IBM Storage Virtualize V8.3.1 or later to
be installed on the IBM FlashSystem control enclosure.
The following IBM FlashSystem products can support all three versions of these drives:
9500
9500R Rack Solution
9200
9200R Rack Solution
7300
7200
5200
5100
Figure 1-55 shows an FCM (NVMe) with a capacity of 19.2 TB. The first generation of FCM
drives were built by using 64-layer Triple Level Cell (TLC) flash memory and an Everspin
MRAM cache into a U.2 form factor. FCM generation 2 and 3 are built using 96-layer Quad
Level Cell (QLC) flash memory. These later generations also reformat some of the flash
capacity to a pseudo-SLC mode (pSLC) to improve performance, reduce latency and provide
a dynamic read cache on the device.
FCM drives are designed for high parallelism and optimized for 3D QLC and updated FPGAs.
IBM also enhanced the FCM drives by adding read cache to reduce latency on highly
compressed pages. Also added was four-plane programming to lower the overall power
during writes. FCM drives offer hardware-assisted compression of up to 3:1 and are FIPS 140-2 compliant.
FCM drives carry IBM Variable Stripe RAID (VSR) at the FCM level and use DRAID to protect
data at the system level. VSR and DRAID together optimize RAID rebuilds by off-loading
rebuilds to DRAID, and they offer protection against FCM failures.
NVMe supports up to 64 K I/O queues, and each queue can support up to 64 K entries. Earlier generations of SAS and Serial Advanced Technology Attachment (SATA) support a single queue with only 254 and 32 entries, respectively, and use many more CPU cycles to access data. NVMe therefore handles more workload for the same infrastructure footprint.
Storage-class memory
SCM drives use persistent memory technologies that improve endurance and reduce the
latency of flash storage device technologies. All SCM drives use the NVMe architecture.
IBM Research® is actively engaged in researching these new technologies.
For more information about nanoscale devices, see this IBM Research web page.
For a comprehensive overview of the flash drive technology, see the SNIA Educational
Library web page.
IBM supports SCM class drives. Table 1-20 lists the SCM drive size options.
* These drives are not sold anymore, but they are still supported.
Easy Tier supports the SCM drives with a new tier that is called tier_scm.
Note: The SCM drive type supports only DRAID 6, DRAID 5, DRAID 1, and traditional RAID (TRAID) 1 or 10.
Applications typically read and write data as vectors of bytes or records. However, storage presents data as vectors of blocks of a constant size (512 bytes or, in newer devices, 4096 bytes per block).
The file, record, and namespace virtualization and file and record subsystem layers convert
records or files that are required by applications to vectors of blocks, which are the language
of the block virtualization layer. The block virtualization layer maps requests of the higher
layers to physical storage blocks, which are provided by storage devices in the block
subsystem.
Each of the layers in the storage domain abstracts away complexities of the lower layers and
hides them behind an easy to use, standard interface that is presented to upper layers. The
resultant decoupling of logical storage space representation and its characteristics that are
visible to servers (storage consumers) from underlying complexities and intricacies of storage
devices is a key concept of storage virtualization.
The focus of this publication is block-level virtualization at the block virtualization layer, which
is implemented by IBM as IBM Storage Virtualize software that is running on an IBM SAN
Volume Controller and the IBM FlashSystem family. The IBM SAN Volume Controller is
implemented as a clustered appliance in the storage network layer. The IBM FlashSystem
storage systems are deployed as modular systems that can virtualize their internally and
externally attached storage.
IBM Storage Virtualize uses the SCSI protocol to communicate with its clients and presents
storage space as SCSI logical units (LUs), which are identified by SCSI LUNs.
Note: Although LUs and LUNs are different entities, the term LUN in practice is often used
to refer to a logical disk, that is, an LU.
Although most applications do not directly access storage but work with files or records, the
operating system of a host must convert these abstractions to the language of storage; that is,
vectors of storage blocks that are identified by LBAs within an LU.
Inside IBM Storage Virtualize, each of the externally visible LUs is internally represented by a
volume, which is an amount of storage that is taken out of a storage pool. Storage pools are
made of MDisks; that is, they are LUs that are presented to the storage system by external
virtualized storage or arrays that consist of internal disks. LUs that are presented to
IBM Storage Virtualize by external storage usually correspond to RAID arrays that are
configured on that storage.
With storage virtualization, you can manage the mapping between logical blocks within an LU
that is presented to a host and blocks on physical drives. This mapping can be as simple or as
complicated as required. A logical block can be mapped to one physical block, or for
increased availability, multiple blocks that are physically stored on different physical storage
systems, and in different geographical locations.
Importantly, the mapping can be dynamic: With Easy Tier, IBM Storage Virtualize can
automatically change underlying storage to which groups of blocks (extent) are mapped to
better match a host’s performance requirements with the capabilities of the underlying
storage systems.
IBM Storage Virtualize gives a storage administrator various options to modify volume
characteristics, from volume resize to mirroring, creating a point-in-time (PiT) copy with
FlashCopy, and migrating data across physical storage systems.
Importantly, all the functions that are presented to the storage users are independent from the
characteristics of the physical devices that are used to store data. This decoupling of the
storage feature set from the underlying hardware and ability to present a single, uniform
interface to storage users that masks underlying system complexity is a powerful argument
for adopting storage virtualization with IBM Storage Virtualize.
IBM FlashSystem and IBM SAN Volume Controller families are scalable solutions running on
a HA platform that can use diverse back-end storage systems to provide all the benefits to
various attached hosts.
You can use IBM Storage Systems running IBM Storage Virtualize to preserve your
investments in storage, centralize management, and make storage migrations easier with
storage virtualization and Easy Tier. Virtualization helps insulate applications from changes
that are made to the physical storage infrastructure.
To verify whether your storage can be virtualized by IBM FlashSystem or IBM SAN Volume
Controller, see the IBM System Storage Interoperation Center (SSIC).
All the IBM Storage Systems that are running IBM Storage Virtualize can migrate data from
external storage controllers, including migrating from any other IBM or third-party storage
systems. IBM Storage Virtualize uses the functions that are provided by its external
virtualization capability to perform the migration. This capability places external LUs under the
control of an IBM FlashSystem or IBM SAN Volume Controller system. Then, hosts continue
to access them through the IBM FlashSystem system or IBM SAN Volume Controller system,
which acts as a proxy.
The GUI of the IBM Storage Systems that are running IBM Storage Virtualize provides a
storage migration wizard, which simplifies the migration task. The wizard features intuitive
steps that guide users through the entire process.
Note: The IBM FlashSystem 5015, 5035 and 5045 systems do not support external
virtualization for any other purpose other than data migration.
Summary
Storage virtualization is a fundamental technology that enables the realization of flexible and
reliable storage solutions. It helps enterprises to better align their IT architecture with
business requirements, simplify their storage administration, and facilitate their IT departments' efforts to meet business demands.
IBM Storage Virtualize that is running on the IBM FlashSystem family is a mature,
11th-generation virtualization solution that uses open standards and complies with the SNIA
storage model. All the products are appliance-based storage, and use in-band block
virtualization engines that move the control logic (including advanced storage functions) from
many individual storage devices to a centralized entity in the storage network.
IBM Storage Virtualize can improve the use of your storage resources, simplify storage
management, and improve the availability of business applications.
The IBM SAN Volume Controller is a collection of up to eight nodes, which are added in pairs
that are known as I/O groups. These nodes are managed as a set (system), and they present
a single point of control to the administrator for configuration and service activity.
The eight-node limit for an IBM SAN Volume Controller system is a limitation that is imposed
by the Licensed Internal Code, and not a limit of the underlying architecture. Larger system
configurations might be available in the future.
Although the IBM SAN Volume Controller code is based on a purpose-optimized Linux kernel,
the clustered system feature is not based on Linux clustering code. The clustered system
software within the IBM SAN Volume Controller (that is, the event manager cluster
framework) is based on the outcome of the COMPASS research project. It is the key element
that isolates the IBM SAN Volume Controller application from the underlying hardware nodes.
The clustered system software makes the code portable. It provides the means to keep the
single instances of the IBM SAN Volume Controller code that are running on separate
systems’ nodes in sync. Therefore, restarting nodes during a code upgrade, adding nodes,
removing nodes from a system, or failing nodes cannot affect IBM SAN Volume Controller
availability.
All active nodes of a system must know that they are members of the system. This knowledge
is especially important in situations where it is key to have a solid mechanism to decide which
nodes form the active system, such as the split-brain scenario where single nodes lose
contact with other nodes. A worst case scenario is a system that splits into two separate
systems.
Within an IBM SAN Volume Controller system, the voting set and a quorum disk are
responsible for the integrity of the system. If nodes are added to a system, they are added to
the voting set. If nodes are removed, they are removed quickly from the voting set. Over time,
the voting set and the nodes in the system can change so that the system migrates onto a
separate set of nodes from the set on which it started.
The IBM SAN Volume Controller clustered system implements a dynamic quorum. Following
a loss of nodes, if the system can continue to operate, it adjusts the quorum requirement so
that further node failure can be tolerated.
The node with the lowest Node Unique ID in a system becomes the boss node for the group of nodes. It determines (from the quorum rules) whether the nodes can operate as the system. This node also presents a maximum of two cluster IP addresses on one or both of its Ethernet ports to enable access for system management.
Stretched clusters are considered HA solutions because both sites work as instances of the
production environment (no standby location exists). Combined with application and
infrastructure layers of redundancy, stretched clusters can provide enough protection for data
that requires availability and resiliency.
In a stretched cluster configuration, nodes within an I/O group can be separated by a distance
of up to 10 km (6.2 miles) by using specific configurations. You can use FC inter-switch links
(ISLs) in paths between nodes of the same I/O group. In this case, nodes can be separated
by a distance of up to 300 km (186.4 miles); however, potential performance impacts can
result.
The site awareness concept enables more efficiency for host I/O traffic through the SAN, and
an easier host path management.
The use of an IP-based quorum application as the quorum device for the third site does not
require FC connectivity. Java applications run on hosts at the third site.
Note: Stretched cluster and ESC features are supported for IBM SAN Volume Controller
only. They are not supported for the IBM FlashSystem family of products.
For more information and implementation guidelines about deploying stretched cluster or
ESC, see IBM Spectrum Virtualize and SAN Volume Controller Enhanced Stretched Cluster
with VMware, SG24-8211.
1.19.4 Business continuity for FlashSystem and IBM SAN Volume Controller
In this section, we discuss business continuity for FlashSystem and IBM SAN Volume
Controller.
The HyperSwap feature provides HA volumes that are accessible through two sites at up to
300 km (186.4 miles) apart. A fully independent copy of the data is maintained at each site.
When data is written by hosts at either site, both copies are synchronously updated before the
write operation is completed. The HyperSwap feature automatically optimizes to minimize
data that is transmitted between sites and to minimize host read and write latency.
The HyperSwap feature works with the standard multipathing drivers that are available on various host types, with no other host support required to access the HA volume.
For more information about HyperSwap implementation use cases and guidelines, see the
following publications:
IBM Storwize V7000, Spectrum Virtualize, HyperSwap, and VMware Implementation,
SG24-8317
High Availability for Oracle Database with IBM PowerHA SystemMirror and IBM Spectrum
Virtualize HyperSwap, REDP-5459
IBM Spectrum Virtualize HyperSwap SAN Implementation and Design Best Practices,
REDP-5597
IBM Storage Virtualize V8.4 and above expands the three-site replication model to include
HyperSwap, which improves data availability options in three-site implementations. Systems
that are configured in a three-site topology have high DR capabilities, but a disaster might
take the data offline until the system can be failed over to an alternative site.
To better assist with three-site replication solutions, IBM Storage Virtualize 3-Site
Orchestrator coordinates replication of data for DR and HA scenarios between systems.
IBM Storage Virtualize 3-Site Orchestrator is a command-line based application that runs on
a separate Linux host that configures and manages supported replication configurations on
IBM Storage Virtualize products.
Figure 1-57 shows the two supported topologies for the three-site, replication-coordinated solutions.
For more information about this type of implementation, see Spectrum Virtualize 3-Site
Replication, SG24-8474.
1.19.7 Automatic hot spare nodes (IBM SAN Volume Controller only)
In previous stages of IBM SAN Volume Controller development, the scripted warm standby procedure enabled administrators to configure spare nodes in a cluster by using the concurrent hardware upgrade capability of transferring WWPNs between nodes. The system can automatically bring in a spare node to replace a failed node in a cluster or to substitute for a node during maintenance tasks, such as software upgrades. These extra nodes are called hot spare nodes.
Up to four nodes can be added to a single cluster. When the hot-spare node is used to
replace a node, the system attempts to find a spare node that matches the configuration of
the replaced node perfectly.
However, if a perfect match does not exist, the system continues the configuration check until matching criteria are found. The system uses the following criteria to determine suitable hot-spare nodes:
Requires an exact match:
– Memory capacity
– Fibre Channel port ID
– Compression support
– Site
Recommended to match, but can be different:
– Hardware type
– CPU count
– Number of Fibre Channel ports
If the criteria are not the same for both, the system uses lower criteria until the minimal
configuration is found. For example, if the Fibre Channel ports do not match exactly but all the
other required criteria match, the hot-spare node can still be used. The minimal configuration
that the system can use as a hot-spare node includes identical memory, site, Fibre Channel
port ID, and, if applicable, compression settings.
If the nodes on the system support and are licensed to use encryption, the hot-spare node
must also support and be licensed to use encryption.
The hot spare node essentially becomes another node in the cluster, but is not doing anything
under normal conditions. Only when it is needed does it use the N_Port ID Virtualization
(NPIV) feature of the Storage Virtualize virtualized storage ports to take over the job of the
failed node. It performs this takeover by moving the NPIV WWPNs from the failed node first to
the surviving partner node in the I/O group and then, over to the hot spare node.
Approximately 1 minute passes intentionally before a cluster swaps in a node to avoid any
thrashing when a node fails. In addition, the system must be sure that the node definitely
failed, and is not (for example) restarting. The cache is flushed while only one node is in the I/O group; the full cache is restored when the spare swaps in.
This entire process is transparent to the applications; however, the host systems notice a momentary path loss for each transition. The persistence of the NPIV WWPNs considerably lessens the multipathing effort on the host during path recovery.
Note: A warm start of active node (code assert or restart) does not cause the hot spare to
swap in because the restarted node becomes available within 1 minute.
The other use case for hot spare nodes is during a software upgrade. Normally, the only
impact during an upgrade is slightly degraded performance. While the node that is upgrading
is down, the partner in the I/O group writes through cache and handles both nodes’ workload.
Therefore, to work around this issue, the cluster uses a spare in place of the node that is
upgrading. The cache does not need to go into write-through mode and the period of
degraded performance from running off a single node in the I/O group is significantly reduced.
After the upgraded node returns, it is swapped back so that you roll through the nodes as
normal, but without any failover and failback at the multipathing layer. This process is handled
by the NPIV ports; therefore, the upgrades must be seamless for administrators who are
working in large enterprise IBM SAN Volume Controller deployments.
Note: After the cluster commits new code, it also automatically upgrades hot spares to
match the cluster code level.
This feature is available to IBM SAN Volume Controller only. Although IBM FlashSystem
systems can use NPIV and realize the general failover benefits, no hot spare canister or split
I/O group option is available for the enclosure-based systems.
You can maintain a chat session with the IBM SSR so that you can monitor this activity and
understand how to fix the problem yourself or enable them to fix it for you.
When you access the website, you sign in and enter a code that the IBM SSR provides to
you. This code is unique to each IBM Assist On-site session. A plug-in is downloaded to
connect you and your IBM SSR to the remote service session. The IBM Assist On-site tool
contains several layers of security to protect your applications and your computers. The
plug-in is removed after the next restart.
You also can use security features to restrict access by the IBM SSR. Your IBM SSR can
provide you with more information about the use of the tool.
The Remote Support Server provides predictive analysis of the status of the system that is running IBM Storage Virtualize software and assists administrators with troubleshooting and fix activities. Remote Support Assistance is available at no extra charge,
and no extra license is needed.
For more information about setting up Remote Support Assistance, see this IBM Support web
page.
Notifications are normally sent immediately after an event is raised. Each event that the system detects is assigned a notification type of Error, Warning, or Information. You can configure the system to send each type of notification to specific recipients.
You can use an SNMP manager to view the SNMP messages that IBM Storage Virtualize
sends. You can use the management GUI or the CLI to configure and modify your SNMP
settings.
The Management Information Base (MIB) file for SNMP can be used to configure a network management program to receive SNMP messages that are sent by IBM Storage Virtualize.
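For example, a hedged CLI sketch of adding an SNMP manager follows; the IP address and community name are assumptions, and the same settings are available in the management GUI.

# Define an SNMP manager to receive notifications (IP address and community are examples)
mksnmpserver -ip 192.0.2.50 -community public

# Verify the configured SNMP servers
lssnmpserver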
Syslog messages
The syslog protocol is a standard protocol for forwarding log messages from a sender to a
receiver on an IP network. The IP network can be Internet Protocol Version 4 (IPv4) or
Internet Protocol Version 6 (IPv6).
IBM SAN Volume Controller and IBM FlashSystem systems can send syslog messages that notify personnel about an event. The event messages can be sent in expanded or concise format. You can use a syslog manager to view the syslog messages that IBM SAN Volume Controller or IBM FlashSystem systems send.
IBM Storage Virtualize uses UDP to transmit the syslog message. You can use the
management GUI or the CLI to configure and modify your syslog settings.
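A minimal hedged example of adding a syslog server with the CLI follows; the receiver IP address is an assumption for this sketch.

# Forward event notifications to a syslog receiver at 192.0.2.60
mksyslogserver -ip 192.0.2.60

# Verify the syslog server definition
lssyslogserver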
To send email, you must configure at least one SMTP server. You can specify as many as five
other SMTP servers for backup purposes. The SMTP server must accept the relaying of email
from the IBM SAN Volume Controller clustered system IP address. Then, you can use the
management GUI or the CLI to configure the email settings, including contact information and
email recipients. Set the reply address to a valid email address.
Send a test email to check that all connections and infrastructure are set up correctly. You can
disable the Call Home function at any time by using the management GUI or CLI.
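The following CLI sketch shows one way to set up email notifications; the SMTP server address, contact details, and recipient address are assumptions, and the same settings can be configured in the management GUI.

# Define the SMTP server that relays email from the system (IP address is an example)
mkemailserver -ip 192.0.2.25 -port 25

# Set the contact information and reply address for Call Home email
chemail -reply storage.admin@example.com -contact "Storage Admin" -primary 555-0100 -location "DC1"

# Add a local recipient for error and warning notifications
mkemailuser -address storage.admin@example.com -error on -warning on

# Send a test email to validate the configuration
testemail -all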
Call Home with cloud services also eliminates email filters dropping notifications to and from
support, which can delay resolution of problems on the system. Call Home with cloud
services uses Representational State Transfer (RESTful) APIs, which are a standard for
transmitting data through web services.
For new system installations, Call Home with cloud services is configured as the default
method to transmit notifications to support. When you update the system software, Call Home
with cloud services is also set up automatically. You must ensure that network settings are
configured to allow connections to the support center. This method sends notifications to only
the predefined support center.
To use Call Home with cloud services, ensure that all of the nodes on the system have
internet access, and that a valid service IP is configured on each node on the system. In
addition to these network requirements, you must configure suitable routing to the support
center through a domain name service (DNS) or by updating your firewall configuration so it
includes connections to the support center.
After a DNS server is configured, update your network firewall settings to allow outbound
traffic to esupport.ibm.com on port 443.
If not using DNS but you have a firewall to protect your internal network from outside traffic,
you must enable specific IP addresses and ports to establish a connection to the support
center. Ensure that your network firewall allows outbound traffic to the following IP addresses
on port 443:
129.42.21.70 (New)
129.42.56.189 **
129.42.60.189 **
Note: ** During 2023, some of these IP addresses are changing. For the latest updates and the new addresses, refer to this IBM Support Tip: IP Address Changes.
You can configure either of these methods or configure both for redundancy. DNS is the
preferred method because it ensures that the system can still connect to the IBM Support
center if the underlying IP addresses to the support center change.
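As a quick, hedged way to confirm that the required outbound path exists, you can test HTTPS connectivity to the support center from a workstation on the same network segment as the node service IPs (the storage system itself does not provide these client tools):

# Confirm that TCP port 443 to the support center is reachable through DNS
openssl s_client -connect esupport.ibm.com:443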
IBM highly encourages all customers to take advantage of the Call Home feature so that you
and IBM can collaborate for your success.
For more information about the features and functions of Call Home methods, see this IBM
Support web page.
1.21 Licensing
All IBM FlashSystem functional capabilities are provided through IBM Storage Virtualize
software. Each platform is licensed as described in the following sections.
1.21.2 Licensing IBM FlashSystem 9500/R, 9200/R, 7300, 7200, 5200 and 5045
The IBM FlashSystems 9500, 9500R, 9200, 9200R, 7300, 7200, 5200 and 5045 include
all-inclusive licensing for all functions except encryption (which is a country-limited feature
code) and external virtualization.
Note: All internal enclosures in the FS9xx0, 7300, 7200, and 5200 require a license. However, the new FS9500 (4983 only), FS7300, FS5200, and FS5045 software is Licensed Machine Code.
Any externally virtualized storage requires the External Virtualization license per storage
capacity unit (SCU) that is based on the tier of storage that is available on the external
storage system. In addition, if you use FlashCopy and Remote Mirroring on an external
storage system, you must purchase a per-tebibyte license to use these functions.
The SCU is defined in terms of the category of the storage capacity, as listed in Table 1-21.
Flash: All flash devices, other than SCM drives. One SCU equates to 1.18 TiB usable of Category 2 storage.
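As a worked example based on the Flash category ratio shown above, virtualizing 118 TiB of usable external all-flash capacity requires 118 / 1.18 = 100 SCUs of External Virtualization licensing; other storage categories use their own ratios as listed in Table 1-21.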
The encryption feature uses a key-based license that is activated by using an authorization
code. The authorization code is sent with the IBM FlashSystem Licensed Function
Authorization documents that you receive after purchasing the license.
The Encryption USB Flash Drives (Four Pack) feature or an external key manager, such as
the IBM Security Key Lifecycle Manager, are required for encryption keys management.
Each function is licensed to an IBM FlashSystem 5000 control enclosure. It covers the entire
system (control enclosure and all attached expansion enclosures) if it consists of one I/O
group. If the IBM FlashSystem 5030 / 5035 systems consists of two I/O groups, two keys are
required.
The following functions require a license key before they can be activated on the system:
Easy Tier
Easy Tier automatically and dynamically moves frequently accessed data to flash
(solid-state) drives in the system, which results in flash drive performance without
manually creating and managing storage tier policies. Easy Tier makes it easy and
economical to deploy flash drives in the environment. In this dynamically tiered
environment, data movement is seamless to the host application, regardless of the
storage tier in which the data is stored.
Remote Mirroring
The Remote Mirroring (also known as remote copy [RC]) function enables you to set up a
relationship between two volumes so that updates that are made by an application to one
volume are mirrored on the other volume.
The license settings apply to only the system on which you are configuring license
settings. For RC partnerships, a license also is required on any remote systems that are in
the partnership.
FlashCopy upgrade
The FlashCopy upgrade extends the base FlashCopy function that is included with the
product. The base version of FlashCopy limits the system to 64 target volumes. With the
FlashCopy upgrade license activated on the system, this limit is removed. If you reach the
limit that is imposed by the base function before activating the upgrade license, you cannot
create more FlashCopy mappings.
To help evaluate the benefits of these new capabilities, Easy Tier and RC licensed functions
can be enabled at no extra charge for a 90-day trial. Trials are started from the
IBM FlashSystem management GUI and do not require any IBM intervention. When the trial
expires, the function is automatically disabled unless a license key for that function is installed
onto the machine.
If you use a trial license, the system warns you at regular intervals when the trial is about to
expire. If you do not purchase and activate the license on the system before the trial license
expires, all configurations that use the trial licenses are suspended.
Note: Encryption hardware feature is available on the IBM FlashSystem 5035 and 5045
only.
This encryption feature uses a key-based license and is activated with an authorization code.
The authorization code is sent with the IBM FlashSystem 5000 Licensed Function
Authorization documents that you receive after purchasing the license.
The Encryption USB flash drives (Four Pack) feature or IBM Security Key Lifecycle Manager
are required for encryption keys management.
This chapter provides guidance to connect IBM Storage Virtualize in a SAN to achieve a stable, redundant, resilient, scalable, and high-performing environment. Although this chapter does not describe how to design and build a flawless SAN from the beginning, you can consider the principles that are presented here when building your SAN.
For more information about SAN design and best practices, see SAN Fabric Administration
Best Practices Guide.
A topology is described in terms of how the switches are interconnected. There are several
different SAN topologies, such as core-edge, edge-core-edge, or full mesh. Each topology
has its utility, scalability, and cost, so one topology is a better fit for some SAN demands than
others. Independent of the environment demands, there are a few best practices that must be
followed to keep your SAN working correctly, performing correctly, and be redundant and
resilient.
IBM Storage Virtualize systems support end-to-end NVMe connectivity along with Small Computer System Interface (SCSI). NVMe is a high-speed transfer protocol that leverages the parallelism of solid-state drives (SSDs) and FlashCore Modules (FCMs) in IBM Storage Virtualize systems.
IBM Storage Virtualize systems support Ethernet connectivity by using 25 Gb and 100 Gb adapter options for internet Small Computer Systems Interface (iSCSI), iSCSI Extensions for RDMA (iSER) on RoCE or iWARP, or NVMe over RDMA. Each adapter supports a different use case. Both the 25 Gb and 100 Gb adapters support host attachment by using iSCSI and NVMe over RDMA (RoCE). iSER (RoCE or iWARP), clustering or HyperSwap, native IP replication, and external virtualization are supported only on a 25 Gb adapter.
Because most SAN installations continue to grow over the years, the leading SAN vendors design their products to support a certain amount of growth. Your SAN must be designed to accommodate both short-term and medium-term growth.
From the performance standpoint, the following topics must be evaluated and considered:
Host-to-storage fan-in fan-out ratios
Host to Inter-Switch Link (ISL) oversubscription ratio
Edge switch to core switch oversubscription ratio
Storage to ISL oversubscription ratio
From a scalability standpoint, ensure that your SAN can support the new storage and host
traffic. Make sure that the chosen topology can support growth in performance and port
density.
If new ports must be added to the SAN, you might need to drastically modify the SAN to
accommodate a larger-than-expected number of hosts or storage. Sometimes these changes
increase the number of hops on the SAN and so cause performance and ISL congestion
issues. For more information, see 2.1.2, “ISL considerations” on page 123.
Consider using SAN director-class switches. They reduce the number of switches in a SAN
and provide the best scalability that is available. Most of the SAN equipment vendors provide
high port density switching devices. With the MDS 9718 Multilayer Director, Cisco offers the industry's highest port density in a single chassis, with up to 768 16 Gb or 32 Gb ports. The Brocade UltraScale Inter-Chassis Link (ICL) technology enables you to create multichassis configurations with up to nine directors, or 4,608 16 Gb or 32 Gb ports.
IBM also offers the IBM Storage Networking SAN128B-7, a high-density 64 Gb switch that enables high-scale fabrics in less rack space.
Therefore, if possible, plan for the maximum size configuration that you expect your
IBM Storage Virtualize installation to reach. Planning for the maximum size does not mean
that you must purchase all the SAN hardware initially; it requires you to design only the SAN
to reach the expected maximum size.
Regardless of your SAN size, topology, or the size of your IBM Storage Virtualize installation,
consider applying the following best practices to your SAN ISL design:
Be aware of the ISL oversubscription ratio.
The standard recommendation is up to 7:1 (seven hosts that use a single ISL). However, it
can vary according to your SAN behavior. Most successful SAN designs are planned with
an oversubscription ratio of 7:1, and some extra ports are reserved to support a 3:1 ratio.
However, high-performance SANs start at a 3:1 ratio.
If you exceed the standard 7:1 oversubscription ratio, you must implement fabric bandwidth threshold alerts. If your ISL utilization exceeds 70%, schedule fabric changes to distribute the load. A worked example of the oversubscription calculation follows this list.
Avoid unnecessary ISL traffic.
Connect all IBM Storage Virtualize node ports in a clustered system to the same SAN
switches or directors as all the storage devices with which the clustered system of
IBM Storage Virtualize is expected to communicate. Conversely, storage traffic and
internode traffic should be avoided over an ISL (except during migration scenarios). For certain configurations, such as HyperSwap or a stretched cluster topology, it is a good practice to organize private SANs (for internode communication) and, if possible, to separate ISL communication.
Keep high-bandwidth use servers and I/O-intensive applications on the same SAN
switches as the IBM Storage Virtualize host ports. Placing these servers on a separate
switch can cause unexpected ISL congestion problems. Also, placing a high-bandwidth
server on an edge switch wastes ISL capacity.
Properly size the ISLs on your SAN. They must have adequate bandwidth and buffer
credits to avoid traffic or frames congestion. A congested ISL can affect the overall fabric
performance.
Always deploy redundant ISLs on your SAN. Using an extra ISL avoids congestion if an
ISL fails because of certain issues, such as a SAN switch line card or port blade failure.
Use the link aggregation features, such as Brocade Trunking or Cisco Port Channel to
obtain better performance and resiliency.
Avoid exceeding two hops between IBM Storage Virtualize and the hosts. More than two
hops are supported. However, when ISLs are not sized properly, more than two hops can
lead to ISL performance issues and buffer credit starvation (SAN congestion).
When sizing over two hops, consider that all the ISLs that go to the switch where
IBM Storage Virtualize is connected also handle the traffic that is coming from the switches
on the edges, as shown in Figure 2-1.
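As the worked example of the oversubscription calculation referenced earlier, assume an edge switch with 28 host ports running at 16 Gbps and two 32 Gbps ISLs to the core: the host-side bandwidth is 28 x 16 = 448 Gbps and the ISL bandwidth is 2 x 32 = 64 Gbps, which gives an oversubscription ratio of 448:64, or 7:1, right at the recommended limit. The port counts in this example are assumptions chosen only to illustrate the arithmetic.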
One of the advantages of a single-switch SAN is that no hop exists when all servers and storage systems are connected to the same switches.
A best practice is to use a multislot director-class single switch over setting up a core-edge
fabric that is made up solely of lower-end switches, as described in 2.1.1, “SAN performance
and scalability” on page 122.
The single-switch topology, as shown in Figure 2-2, has only one switch per fabric (two switches in total), so the IBM Storage Virtualize ports must be distributed equally across both fabrics.
Note: To correctly size your network, always calculate the short-term and mid-term growth to avoid running out of ports. In this topology, the port limit is based on the switch size. If other switches are added to the network, the topology type changes.
When IBM Storage Virtualize and the servers are connected to different switches, the hop
count for this topology is one.
Note: This topology is commonly used to easily grow your SAN network by adding edge
switches to the core. Consider the ISL ratio and usage of physical ports from the core
switch when adding edge switches to your network.
Figure 2-4 shows an edge-core-edge topology with two different edges, one of which is
exclusive for the IBM Storage Virtualize system and high-bandwidth servers. The other pair is
exclusively for servers.
Performance can be slightly affected if the number of hops increases, which depends on the
total number of switches and the distance between the host and the IBM Storage Virtualize
system.
Edge-core-edge fabrics allow better isolation between tiers. For more information, see 2.2.6,
“Device placement” on page 130.
Note: Each ISL uses one physical port. Depending on the total number of ports that each
switch has and the total number of switches, this topology uses several ports from your
infrastructure to be set up.
One of these topologies uses an IBM Storage Virtualize system as a multi-SAN device between two isolated SANs. This configuration is useful for storage migration or for sharing resources between SAN environments without merging them.
To use external storage with an IBM Storage Virtualize system, the external storage must be attached to the IBM Storage Virtualize system through the zoning configuration and set up as virtualized storage. This feature can be used for storage migration and decommissioning processes, and to speed up host migration. In some cases, depending on the external storage configuration, virtualizing external storage with an IBM Storage Virtualize system can increase performance because of the added cache capacity and processing power.
Figure 2-6 IBM Storage Virtualize system attached to two isolated SANs (SAN 1 and SAN 2), each built on Fibre Channel directors, with no ISLs between SAN 1 and SAN 2
In Figure 2-6, both SANs are isolated. When connected to both SAN networks, the
IBM Storage Virtualize system can allocate storage to hosts on both SAN networks. It is also
possible to virtualize storage from each SAN network. This way, you can have established
storage on SAN2 (SAN 2 in Figure 2-6) that is attached to the IBM Storage Virtualize system
and provide disks to hosts on SAN1 (SAN 1 in Figure 2-6). This configuration is commonly
used for migration purposes or in cases where the established storage has a lower
performance compared to the IBM Storage Virtualize system.
Keeping the traffic local to the fabric is a strategy to minimize the traffic between switches
(and ISLs) by keeping storages and hosts attached to the same SAN switch, as shown in
Figure 2-7.
Figure 2-7 Storage and hosts attached to the same SAN switch
This solution can work well in small- and medium-sized SANs. However, it is not as scalable
as other topologies that are available. The most scalable SAN topology is the
edge-core-edge, which is described in 2.2, “SAN topology-specific guidelines” on page 125.
In addition to scalability, this topology provides different resources to isolate the traffic and
reduce possible SAN bottlenecks. Figure 2-8 shows an example of traffic segmentation on
the SAN by using edge-core-edge topology.
Even when sharing core switches, it is possible to use virtual switches (see 2.2.7, “SAN
partitioning” on page 132) to isolate one tier from another one. This configuration helps avoid
traffic congestion that is caused by slow drain devices that are connected to the backup tier
switch.
As the number of available ports on a switch continues to grow, switch partitioning allows storage administrators to take advantage of high port density switches by dividing physical switches into different virtual switches. From a device perspective, SAN partitioning is transparent, so the same guidelines and practices that apply to physical switches also apply to virtual ones.
Although the main purposes of SAN partitioning are port consolidation and environment
isolation, this feature is also instrumental in the design of a business continuity solution that is
based on IBM Storage Virtualize.
SAN partitioning can be used to dedicate the IBM Storage Virtualize Fibre Channel (FC) ports
for internode communication, replication communication, and host to storage communication
along with IBM Storage Virtualize port masking.
Note: When director-class switches are used, use ports from different blades to seek load
balance and avoid a single point of failure.
For more information about IBM Storage Virtualize business continuity solutions, see
Chapter 7, “Ensuring business continuity” on page 501.
FC Host Bus Adapters (HBAs): Four Quad 16 Gb | Three Quad 16 Gb or 32 Gb (FC-NVMe supported) | Three Quad 16 Gb or 32 Gb (FC-NVMe supported) | Six Quad 32 Gb (FC-NVMe supported) or Three Quad 64 Gb
Ethernet I/O: Four Quad 10 Gb (iSCSI and FCoE) | One Dual 25 Gb (available up to three 25 Gb) | One Dual 25 Gb (available up to three 25 Gb) | Six Dual 25 Gb or Six Dual 100 Gb for iSCSI, iSER, RoCE, RoCEv2, and iWARP
This new port density expands the connectivity options and provides new ways to connect the
SVC to the SAN. This section describes some best practices and use cases that show how to
connect an SVC on the SAN to use this increased capacity.
Figure 2-10 shows the port locations for the SV2/SA2 nodes.
The IBM SAN Volume Controller SV3 can have up to six quad FC HBA cards (24 FC ports)
per node. Figure 2-11 shows the port location in the rear view of the 2145-SV3 node.
SVC SV3 supports eight Peripheral Component Interconnect Express (PCIe) slots that are
organized into cages, where three cages (cage1, cage2, and cage3) can be used to install
host and storage connectivity adapters:
Slots 1 - 2 (cage1), 5 - 6 (cage2), and 7 - 8 (cage3).
Slot 3 is for compression offload.
Slot 4 is empty.
For maximum redundancy and resiliency, spread the ports across different fabrics. Because
the port count varies according to the number of cards that is included in the solution, try to
keep the port count equal on each fabric.
Table 2-2 IBM FlashSystem 9200 maximum port connectivity options per enclosure
Connectivity type IBM FlashSystem 9200 ports per enclosure
SAS expansion ports Two Quad 12 Gb SAS (two ports active) (1x SAS expansion adapter per node canister)
Port connectivity options are significantly increased with IBM FlashSystem 9500 compared to
previous models. The number of connectivity ports per enclosure (two node canisters) for
IBM FlashSystem 9500 Model AH8 at code level 8.6.0 is listed in Table 2-3.
Table 2-3 IBM FlashSystem 9500 maximum port connectivity options per enclosure
Connectivity type IBM FlashSystem 9500 ports per enclosure
SAS expansion ports Two Quad 12 Gb SAS (two ports active) (1x SAS
expansion adapter per node canister)
Note:
IBM FlashSystem 9200 node canisters feature three PCIe slots where you can combine
the adapters as needed. If expansion enclosures are used, one of the slots must contain
the SAS expansion card. Two slots are then left for FC HBA cards or for iWARP or RoCE
Ethernet adapters.
The IBM FlashSystem 9500 node canister features eight PCIe slots where you can
combine the cards as needed. The table above shows the maximum combination for a
single connectivity type. For example, if 24 x 64 Gb Fibre Channel ports are required, that
configuration occupies all three cages in both node canisters (each cage with one 64 Gb
card installed), which eliminates the possibility of installing Ethernet adapters.
Figure 2-12 Port location in the IBM FlashSystem 9200 rear view
For maximum redundancy and resiliency, spread the ports across different fabrics. Because
the port count varies according to the number of cards that is included in the solution, try to
keep the port count equal on each fabric.
IBM FlashSystem 9500 has eight PCIe slots, of which six (in cage1, cage2, and cage3) can
be used for host and storage connectivity adapters:
Slots 1 - 2 (cage1), 5 - 6 (cage2), and 7 - 8 (cage3).
Slot 3 is for compression offload.
Slot 4 is empty.
Figure 2-13 shows the port location in the rear view of the IBM FlashSystem 9500 node
canister.
Table 2-4 IBM FlashSystem 7200 maximum port connectivity per enclosure
Feature IBM FlashSystem 7200 per enclosure
FC HBA 24 x 16 Gb FC/NVMeoF or 24 x 32 Gb FC/NVMeoF (that is, 3 x quad-port FC HBAs per node canister)
Table 2-5 lists the port connectivity options for IBM FlashSystem 7300.
Table 2-5 IBM FlashSystem 7300 maximum port connectivity per enclosure
Feature IBM FlashSystem 7300
IBM FlashSystem 7300 can have up to three quad FC HBA cards (12 FC ports) per node
canister. Figure 2-15 shows the port location in the rear view of the IBM FlashSystem 7300
node canister.
For maximum redundancy and resiliency, spread the ports across different fabrics. Because
the port count varies according to the number of cards that is included in the solution, try to
keep the port count equal on each fabric.
2.3.4 IBM FlashSystem 5100 and 5200 and IBM FlashSystem 5015 and 5035
controller ports
IBM FlashSystem 5xxx systems bring the simplicity and innovation of other family members.
The following tables list the port connectivity options for IBM FlashSystem 5100 (Table 2-6),
IBM FlashSystem 5200 (Table 2-7 on page 139), IBM FlashSystem 5015 (Table 2-8 on
page 139), and IBM FlashSystem 5035 (Table 2-9 on page 139).
Table 2-6 IBM FlashSystem 5100 port connectivity maximum per enclosure
Feature IBM FlashSystem 5100
Table 2-7 IBM FlashSystem 5200 port connectivity maximum per enclosure
Feature IBM FlashSystem 5200
IBM FlashSystem 5200 can have up to two dual FC HBA cards (four 32 Gb FC ports) per
node canister. Figure 2-17 on page 140 shows the port location in the rear view of the IBM
FlashSystem 5200 node canister.
IBM FlashSystem 5015 can have up to one quad FC HBA card (four 16 Gb FC ports) per
node canister. Figure 2-18 shows the port location in the rear view of the IBM FlashSystem
5015 node canister.
IBM FlashSystem 5035 can have up to one quad FC HBA card (four 16 Gb or 32 Gb FC
ports) per node canister. Figure 2-19 shows the port location in the rear view of the
IBM FlashSystem 5035 node canister.
Transitional mode on an IBM Storage Virtualize system can be used to change or update your
zoning from traditional physical WWPN-based zoning to NPIV WWPN-based zoning.
Figure 2-20 and Figure 2-21 represent the IBM Storage Virtualize NPIV port WWPN and
failover.
The lstargetportfc command output (see Figure 2-22 on page 142) from the IBM Storage
Virtualize command-line interface (CLI) shows that each port has three WWPNs: One is a
physical WWPN for SCSI connectivity; the second is an NPIV WWPN for SCSI connectivity;
and the third is an NPIV WWPN for NVMe connectivity.
Note: NPIV is not supported for Ethernet connectivity, such as FCoE, iSCSI, and iSER.
The same ports (port IDs) must be in the same fabric to fail over in a hardware failure. For
example:
Port IDs 1 and 3 of all nodes and spare nodes are part of the odd fabric.
Port IDs 2 and 4 of all nodes and spare nodes are part of the even fabric.
Use the Transitional mode to convert your host zoning from physical WWPN to virtual WWPN.
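The following sketch outlines one way that this conversion might be performed from the CLI. The lsiogrp, chiogrp, and lstargetportfc commands are standard IBM Storage Virtualize commands; the I/O group name and the ordering of the steps are illustrative only, so verify them against your code level before use.
IBM_FlashSystem:FS9xx0:superuser>lsiogrp io_grp0
(check the current fctargetportmode value for the I/O group)
IBM_FlashSystem:FS9xx0:superuser>chiogrp -fctargetportmode transitional io_grp0
(hosts can now log in to both the physical and the NPIV WWPNs; rezone the hosts
to the NPIV WWPNs that are reported by lstargetportfc)
IBM_FlashSystem:FS9xx0:superuser>chiogrp -fctargetportmode enabled io_grp0
(host I/O is now permitted only through the NPIV target ports)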
To simplify the SAN connection identification and troubleshooting, keep all odd ports on the
odd fabrics or “A” fabrics, and the even ports on the even fabric or “B” fabrics, as shown in
Figure 2-23, which shows the port arrangement for IBM Storage Virtualize different models.
The same port distribution can be used for other IBM Storage Virtualize models, for example,
IBM FlashSystem 9500 or 9500R, IBM FlashSystem 7300, and IBM FlashSystem 5015 or
5035.
Because of the increased port availability on IBM FlashSystem or SVC clusters and the
increased bandwidth of the 32 Gb and 64 Gb ports, it is not only possible but also sensible to
separate port utilization among host and storage, node-to-node (intracluster), and replication
traffic.
The example port utilization designations that are shown in the following figures can be used
for IBM Storage Virtualize systems with up to 16 ports per node (32 ports total per cluster,
Figure 2-26 on page 144) and up to 24 ports per node (48 ports total per cluster, Figure 2-27
on page 145).
Design your port designations as required by your workload and to achieve maximum
availability in case of a failure. Separating failure domains in this way also helps with
analysis and management of the system.
Fibre Channel port masking provides a tool for this kind of port communication segregation.
Two masks are available. The first mask designates ports for local node-to-node
communication, which is recommended in a multi-I/O-group configuration, and it can be
checked by running the lssystem command and reviewing the localfcportmask parameter.
The mask is read from right to left, so the rightmost bit in the mask corresponds to
fc_io_port_id 1, and so on. Setting the bit that corresponds to a port to 1 enables that port for
node-to-node (intracluster) communication, while ports whose bits are set to 0 are not used
for intracluster communication and carry only host and storage traffic.
The second mask assigns specific node ports to the replication function only (communication
with the nodes of a remote cluster). It is shown in the lssystem output as the
remotefcportmask parameter and is read in the same way as localfcportmask: setting the
corresponding bits to 1 enables those ports for replication traffic, and ports whose bits are
set to 0 are not used for replication and can communicate only with hosts or storage
(depending on the SAN zoning).
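The following sketch shows how these masks might be applied from the CLI. The chsystem parameters -localfcportmask and -remotefcportmask and the lssystem command are standard IBM Storage Virtualize commands, but the mask values are examples only: they assume a node with 16 FC ports where fc_io_port_id 3 and 4 carry node-to-node traffic and fc_io_port_id 7 and 8 carry replication traffic, so build your own masks to match your port assignment.
IBM_FlashSystem:FS9xx0:superuser>chsystem -localfcportmask 0000000000001100
IBM_FlashSystem:FS9xx0:superuser>chsystem -remotefcportmask 0000000011000000
IBM_FlashSystem:FS9xx0:superuser>lssystem
(check the local and remote FC port mask values in the output; the rightmost bit
of each mask corresponds to fc_io_port_id 1)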
Figure 2-26 Port masking configuration on SVC or IBM FlashSystem with 16 ports
Host and storage ports have different traffic behavior, so keeping host and storage traffic on
the same ports produces maximum port performance and utilization by benefiting from the
ports' full duplex bandwidth. For this reason, sharing host and storage traffic on the same
ports is a best practice. However, traffic segmentation can also provide some benefits in
terms of troubleshooting and host zoning management (see Figure 2-27 on page 145 for an
example of traffic segmentation). Consider, for example, SAN congestion conditions that are
caused by a slow-draining device.
In this case, segregating the ports simplifies the identification of the device that is causing the
problem while limiting the effects of the congestion to the hosts or back-end ports only.
Furthermore, dedicating ports for host traffic reduces the possible combinations of host
zoning and simplifies SAN management. As a best practice, implement port traffic
segmentation only in configurations with a higher port count.
Switch port buffer credits: For Stretched Cluster and IBM HyperSwap configurations that
do not use ISLs for the internode communication, it is a best practice to set the switch port
buffer credits to match the buffer credits of the IBM Storage Virtualize ports.
2.4 Zoning
This section describes the zoning recommendations for IBM Storage Virtualize systems.
Zoning an IBM Storage Virtualize cluster into a SAN fabric requires planning and following
specific guidelines.
Important: Errors that are caused by improper IBM Storage Virtualize system zoning are
often difficult to isolate, and the steps to fix them can affect the SAN environment.
Therefore, create your zoning configuration carefully.
The initial configuration for IBM Storage Virtualize requires the following separate zones:
Internode and intra-cluster zones
Replication zones (if replication is used)
Back-end storage to IBM Storage Virtualize zoning for external virtualization
Host to IBM Storage Virtualize zoning
Different guidelines must be followed for each zoning type, as described in 2.4.1, “Types of
zoning” on page 147.
Note: Although an internode and intra-cluster zone is not necessary for non-clustered
IBM Storage Virtualize systems (other than SVC), it is generally preferred to create one.
When switch-port based zoning is used, the ability to allow only specific hosts to connect to
an IBM Storage Virtualize cluster is lost.
Consider an NPV device, such as an access gateway that is connected to a fabric. If 14 hosts
are attached to that NPV device, and switch port-based zoning is used to zone the switch port
for the NPV device to IBM Storage Virtualize system node ports, all 14 hosts can potentially
connect to the IBM Storage Virtualize cluster, even if IBM Storage Virtualize is providing
storage for only four or five of those hosts.
However, the problem is exacerbated when the IBM Storage Virtualize NPIV feature is used in
transitional mode. In this mode, a host can connect to the physical and virtual WWPNs on the
cluster. With switch port zoning, this configuration doubles the connection count for each host
that is attached to the IBM Storage Virtualize cluster. This issue can affect the function of path
failover on the hosts by resulting in too many paths, and the IBM Storage Virtualize Cluster
can exceed the maximum host connection count on a large fabric.
If you have the NPIV feature enabled on your IBM Storage Virtualize system, you must use
WWPN-based zoning.
Zoning types: Avoid using a zoning configuration that includes a mix of port and WWPN
zoning. For NPIV configurations, host zoning must use the WWPN zoning type.
A best practice for traditional zone design calls for single initiator zoning, that is, a zone can
consist of many target devices, but only one initiator because target devices often wait for an
initiator device to connect to them, and initiators actively attempt to connect to each device to
which they are zoned. The single initiator approach removes the possibility of a misbehaving
initiator affecting other initiators.
The drawback to single initiator zoning is that on a large SAN that features many zones, the
SAN administrator’s job can be more difficult, and the number of zones on a large SAN can
exceed the zone database size limits.
Cisco and Brocade developed features that can reduce the number of zones by allowing the
SAN administrator to control which devices in a zone can communicate with other devices in
the zone. The features are called Cisco Smart Zoning and Brocade Peer Zoning, which are
supported by IBM Storage Virtualize systems.
Note: Brocade Traffic Isolation (TI) zoning is deprecated in Brocade Fabric OS 9.0. You
can still use TI zoning if you have existing zones, but you must keep at least one switch
running a pre-9.0 version of FOS in the fabric to make changes to the TI zones.
For more information about Smart Zoning, see this web page.
For more information about implementation, see this IBM Support web page.
For more information, see the section “Peer zoning” in Modernizing Your IT Infrastructure with
IBM b-type Gen 6 Storage Networking and IBM Spectrum Storage Products, SG24-8415.
Note: Use Smart and Peer Zoning for the host zoning only. Use traditional zoning for
intracluster, back-end, and intercluster zoning.
For systems with fewer than 64 hosts that are attached, zones that contain host HBAs must
contain no more than 40 initiators, including the ports that act as initiators, such as the IBM
Storage Virtualize based system ports that are target + initiator.
Therefore, a valid zone can be 32 host ports plus eight IBM Storage Virtualize based system
ports. Include only one port from each node in the I/O groups that are associated with this
host.
Note: Do not place more than one HBA port from the same host in the same zone. Also,
do not place dissimilar hosts in the same zone. Dissimilar hosts are hosts that are running
different operating systems or are different hardware products.
IBM Storage Virtualize supports Ethernet and FC portsets for host attachment. Back-end and
external storage connectivity, host attachment, and IP replication can be configured by using
IP portsets, and host attachment can be configured on FC portsets.
A host can access the storage only from those IP addresses or FC ports that are configured
on a portset and associated with that host.
Every portset is identified by a name. Two default portsets exist for host attachment: portset0
for Ethernet connectivity and portset64 for FC connectivity. The default portset3 is for the
storage port type that uses Ethernet connectivity, as shown in Figure 2-28.
A host should always be associated with the correct portset on the storage system. If a host
is zoned with ports that are not part of the portset that is associated with that host, the storage
system generates a wrong port login event (event ID 064002).
For more information about resolving a blocked FC login event, see Resolving a problem with
a blocked Fibre Channel login.
Note:
A host can be part of only one portset, but an FC port can be part of multiple portsets.
A correct zoning configuration, along with portsets, is needed to achieve the
recommended number of paths on a host.
Here is an example. Assume four hosts and an IBM FlashSystem storage system with a total
of eight Fibre Channel ports and four Ethernet ports. Two hosts use Fibre Channel
connectivity and two use Ethernet connectivity, and you want to keep the hosts separated, for
example, because of a difference in communication speed.
In this case, you can create two portsets for the Fibre Channel-connected hosts. As a first
step, create the portsets before defining the host objects on the storage system, and then
assign the Fibre Channel ports to the dedicated portsets. As mentioned in the note above, it
is possible to add all eight FC ports to one portset. However, to segregate the communication,
for example by communication speed or by failure domain, you can dedicate four FC ports to
one portset and four FC ports to the other. It is also important to make sure that the ports in
each portset belong to different fabrics for redundancy reasons, so that the four ports that are
dedicated to each portset can communicate with their hosts through redundant fabrics.
The same approach can be used for the Ethernet ports, which can be assigned to their own
dedicated portsets.
After the portsets are created and the ports are assigned, the host objects can be defined on
the storage system, and each host object should be assigned to one portset. Each host can
then communicate with the storage only through its dedicated ports. If communication speeds
differ, this separation can improve the reliability of communication, and if congestion or other
communication issues occur at some point, the segregation makes it easier to investigate and
identify the culprit.
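A minimal CLI sketch of this example follows, assuming a 2-node system where fc_io_port_id 1 and 2 serve the first pair of hosts and fc_io_port_id 3 and 4 serve the second pair. The mkportset, addfcportsetmember, and mkhost commands exist in IBM Storage Virtualize, but the portset names, host names, WWPNs, and the exact mkportset and mkhost parameters that are shown should be treated as assumptions to verify against your code level.
IBM_FlashSystem:FS9xx0:superuser>mkportset -name ITSO_FC_A -porttype fc
IBM_FlashSystem:FS9xx0:superuser>mkportset -name ITSO_FC_B -porttype fc
IBM_FlashSystem:FS9xx0:superuser>addfcportsetmember -portset ITSO_FC_A -fcioportid 1
IBM_FlashSystem:FS9xx0:superuser>addfcportsetmember -portset ITSO_FC_A -fcioportid 2
IBM_FlashSystem:FS9xx0:superuser>addfcportsetmember -portset ITSO_FC_B -fcioportid 3
IBM_FlashSystem:FS9xx0:superuser>addfcportsetmember -portset ITSO_FC_B -fcioportid 4
IBM_FlashSystem:FS9xx0:superuser>mkhost -name host1 -fcwwpn 10000090FA00AA01:10000090FA00AA02 -portset ITSO_FC_A
IBM_FlashSystem:FS9xx0:superuser>mkhost -name host2 -fcwwpn 10000090FA00BB01:10000090FA00BB02 -portset ITSO_FC_B
Because fc_io_port_id 1 and 3 are on the odd fabric and fc_io_port_id 2 and 4 are on the even fabric, each portset spans both fabrics and keeps the redundancy that is described above.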
IBM Storage Virtualize portsets can therefore be used for effective workload distribution,
proper failure domain segmentation, and host-based functional grouping (for example,
FC-SCSI, NVMe-oF, or performance). Figure 2-29 and Figure 2-30 show an example
configuration that uses portsets. Figure 2-29 on page 151 shows the initial configuration with
the Fibre Channel ports and the list of portsets on the system. Notice the portset itsoV860
that was added to the list.
IBM_FlashSystem:FS9xx0:superuser>lsportfc
id fc_io_port_id port_id type port_speed node_id node_name WWPN nportid status attachment cluster_use
adapter_location adapter_port_id
0 1 1 fc 16Gb 5 node1 5005076810110214 080100 active switch
local_partner 1 1
1 2 2 fc 16Gb 5 node1 5005076810120214 080000 active switch
local_partner 1 2
2 3 3 fc 16Gb 5 node1 5005076810130214 120100 active switch
local_partner 1 3
3 4 4 fc 16Gb 5 node1 5005076810140214 120000 active switch
local_partner 1 4
24 1 1 fc 16Gb 2 node2 5005076810110216 080300 active switch
local_partner 1 1
25 2 2 fc 16Gb 2 node2 5005076810120216 120200 active switch
local_partner 1 2
26 3 3 fc 16Gb 2 node2 5005076810130216 120300 active switch
local_partner 1 3
27 4 4 fc 16Gb 2 node2 5005076810140216 080200 active switch
local_partner 1 4
IBM_FlashSystem:FS9xx0:superuser>lsportset
id name type port_count host_count lossless owner_id owner_name port_type is_default
0 portset0 host 0 0 ethernet yes
1 portset1 replication 0 0 ethernet no
2 portset2 replication 0 0 ethernet no
3 portset3 storage 0 0 ethernet no
4 itsoV860 host 0 0 yes fc no
64 portset64 host 4 2 yes fc yes
Figure 2-30 shows how to assign the port pairs with fc_io_port_id 1 and fc_io_port_id 3 to the
itsoV860 portset group. It also lists those specific pairs.
IBM_FlashSystem:FS9xx0:superuser>addfcportsetmember -portset itsoV860 -fcioportid 1&&addfcportsetmember -portset
itsoV860 -fcioportid 3
IBM_FlashSystem:FS9xx0:superuser>lsportset
id name type port_count host_count lossless owner_id owner_name port_type is_default
0 portset0 host 0 0 ethernet yes
1 portset1 replication 0 0 ethernet no
2 portset2 replication 0 0 ethernet no
3 portset3 storage 0 0 ethernet no
4 itsoV860 host 2 0 yes fc no
64 portset64 host 4 2 yes fc yes
IBM_FlashSystem:FS9xx0:superuser>lstargetportfc -filtervalue "port_id=1"&&lstargetportfc -filtervalue "port_id=3"
id WWPN WWNN port_id owning_node_id current_node_id nportid host_io_permitted virtualized
protocol fc_io_port_id portset_count host_count active_login_count
1 5005076810110214 5005076810000214 1 5 5 080100 no no
scsi 1 0 0 1
2 5005076810150214 5005076810000214 1 5 5 080101 yes yes
scsi 1 2 2 2
3 5005076810190214 5005076810000214 1 5 5 080102 yes yes
nvme 1 2 0 0
73 5005076810110216 5005076810000216 1 2 2 080300 no no
scsi 1 0 0 2
74 5005076810150216 5005076810000216 1 2 2 080301 yes yes
scsi 1 2 2 0
75 5005076810190216 5005076810000216 1 2 2 080302 yes yes
nvme 1 2 0 0
id WWPN WWNN port_id owning_node_id current_node_id nportid host_io_permitted virtualized
protocol fc_io_port_id portset_count host_count active_login_count
7 5005076810130214 5005076810000214 3 5 5 120100 no no
scsi 3 0 0 1
8 5005076810170214 5005076810000214 3 5 5 120101 yes yes
scsi 3 2 2 2
9 50050768101B0214 5005076810000214 3 5 5 120102 yes yes
nvme 3 2 0 0
79 5005076810130216 5005076810000216 3 2 2 120300 no no
scsi 3 0 0 2
80 5005076810170216 5005076810000216 3 2 2 120301 yes yes
scsi 3 2 2 2
81 50050768101B0216 5005076810000216 3 2 2 120302 yes yes
nvme 3 2 0 0
Although portsets provide an additional layer for keeping the configuration clean and ordered,
and they improve the conditions for maintenance, it is still important to plan them carefully
and take the overall zoning into account.
Environments have different requirements, which means that the level of detail in the zoning
scheme varies among environments of various sizes. Therefore, ensure that you have an
easily understandable scheme with an appropriate level of detail. Then, make sure that you
use it consistently and adhere to it whenever you change the environment.
For more information about IBM Storage Virtualize system naming conventions, see 10.14.1,
“Naming conventions” on page 689.
Aliases
Use zoning aliases when you create your IBM Storage Virtualize system zones if they are
available on your specific type of SAN switch. Zoning aliases make your zoning easier to
configure and understand, and they reduce the possibility of errors. Table 2-10 shows some
alias name examples.
Note: In Table 2-10, not all ports have an example for aliases. NPIV ports can be used for
host attachment only, as shown in Figure 2-31. If you are using external virtualized
back-ends, use the physical port WWPN. For replication and inter-node, use the physical
WWPN. In the alias examples that are listed in Table 2-10, the N is for node, and all
examples are from node 1. An N2 example is FSx_N2P4_CLUSTER. The x represents the
model of your IBM Storage Virtualize system, for example, SVC or IBM FlashSystem 9200
or 9500, IBM FlashSystem 7300, or IBM FlashSystem 5015 or 5035.
One approach is to include multiple members in one alias because zoning aliases can
normally contain multiple members (similar to zones). This approach can help avoid some
common issues that are related to zoning and make it easier to maintain the port balance in a
SAN.
By creating template zones, you keep the number of paths on the host side at four for each
volume and maintain a good workload balance among the IBM Storage Virtualize ports.
Table 2-11 shows how the aliases are distributed if you create template zones as described in
the example.
The ports to be used for intracluster communication vary according to the machine type,
model number, and port count. For more information about port assignment
recommendations, see Figure 2-26 on page 144. Use ports from different adapters of each
node to achieve maximum redundancy when you create zones for node-to-node
communication.
NPIV configurations: On NPIV-enabled configurations, use the physical WWPN for the
intracluster zoning.
Only 16 port logins are allowed from one node to any other node in a SAN fabric. Ensure that
you apply the correct port masking to restrict the number of port logins. Without port masking,
any IBM Storage Virtualize system port that is a member of the same zone can be used for
intracluster communication, even ports that are members of IBM Storage Virtualize system to
host or IBM Storage Virtualize system to storage zones.
As a best practice, use FlashCopy mapping between a source and target on the same node
or I/O group to minimize cross-I/O-group communication. The same best practice applies to
change volumes (CVs) in Global Mirror with Change Volumes (GMCV) replication.
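As a minimal sketch of this practice (the volume, pool, and I/O group names are illustrative; lsvdisk, mkvdisk, and mkfcmap are standard IBM Storage Virtualize commands), create the FlashCopy target in the same I/O group as its source before you create the mapping:
IBM_FlashSystem:FS9xx0:superuser>lsvdisk srcvol01
(note the IO_group_name of the source volume, for example io_grp0)
IBM_FlashSystem:FS9xx0:superuser>mkvdisk -name tgtvol01 -mdiskgrp Pool0 -size 100 -unit gb -iogrp io_grp0
IBM_FlashSystem:FS9xx0:superuser>mkfcmap -source srcvol01 -target tgtvol01 -copyrate 50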
High availability (HA) solutions such as ESC and HyperSwap rely on node-to-node
communication between I/O group or nodes. In such scenarios, dedicate enough ports for
node-to-node and intracluster communication, which is used for metadata exchange and
mirroring write cache.
Configure a private SAN along with port masking over dark fibre links for HA (ESC or
HyperSwap) configurations to achieve maximum performance.
If a link or ISL between sites is not stable in an HA solution (for example, ESC or HyperSwap),
the node-to-node communication and the cluster also become unstable.
Note: To check whether the login limit is exceeded, count the number of distinct ways by
which a port on node X can log in to a port on node Y. This number must not exceed 16.
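One way to verify the login count is to review the fabric logins that the cluster itself reports. The lsfabric command is a standard IBM Storage Virtualize command; the node names and the use of grep to filter the output are illustrative assumptions.
IBM_FlashSystem:FS9xx0:superuser>lsfabric -node node1 | grep node2
(each returned line represents one login between a port on node1 and a port on
node2; the number of node-to-node logins must not exceed 16)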
For more information about port masking, see Chapter 8, “Hosts” on page 519.
A zone for each back-end storage to each IBM Storage Virtualize system node or canister
must be created in both fabrics, as shown in Figure 2-32. Doing so reduces the overhead that
is associated with many logins. The ports from the storage subsystem must be split evenly
across the dual fabrics.
Often, all nodes or canisters in an IBM Storage Virtualize system should be zoned to the
same ports on each back-end storage system, with the following exceptions:
When implementing ESC or HyperSwap configurations where the back-end zoning can be
different for the nodes or canisters according to the site definition (for more information,
see IBM Spectrum Virtualize and SAN Volume Controller Enhanced Stretched Cluster
with VMware, SG24-8211 and IBM Storwize V7000, Spectrum Virtualize, HyperSwap, and
VMware Implementation, SG24-8317).
When the SAN has a multi-core design that requires special zoning considerations, as
described in “Zoning to storage best practice” on page 156.
When two nodes or canisters are zoned to different sets of ports for the same storage system,
the IBM Storage Virtualize operation mode is considered degraded. The system then logs
errors that request a repair action. This situation can occur if incorrect zoning is applied to the
fabric.
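A quick way to confirm that every node sees the back-end storage through the same set of ports is to review the controller view on the IBM Storage Virtualize system. The lscontroller command is standard; the controller name and the exact field names can vary by code level, so treat this as a sketch.
IBM_FlashSystem:FS9xx0:superuser>lscontroller controller0
(in the detailed output, confirm that the degraded field shows no and that the
controller WWPNs report the expected path counts)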
Figure 2-33 shows a zoning example (that uses generic aliases) between a 2-node SVC and
an IBM Storwize V5000. Both SVC nodes can access the same set of Storwize V5000 ports.
Each storage controller or model has its own zoning and port placement best practices. The
generic guideline for all storage is to use the ports that are distributed between the redundant
storage components, such as nodes, controllers, canisters, and FC adapters (respecting the
port count limit, as described in “Back-end storage port count” on page 159).
The following sections describe the IBM Storage specific zoning guidelines. Storage vendors
other than IBM might have similar best practices. For more information, contact your vendor.
For more information about SAN design options, see 2.2, “SAN topology-specific guidelines”
on page 125.
This section describes best practices for zoning IBM Storage Virtualize system ports to
controller ports on each of the different SAN designs.
The high-level best practice is to configure zoning such that the SVC and IBM FlashSystem
ports are zoned only to the controller ports that are attached to the same switch. For
single-core designed fabrics, this practice is not an issue because only one switch is used on
each fabric to which the SVC, IBM FlashSystem, and controller ports are connected. For the
mesh and dual-core and other designs in which the IBM Storage Virtualize system is
connected to multiple switches in the same fabric, zoning might become an issue.
Figure 2-34 shows the best practice zoning on a dual-core fabric. You can see that two zones
are used:
Zone 1 includes only the IBM Storage Virtualize system and back-end ports that are
attached to the core switch on the left.
Zone 2 includes only the IBM Storage Virtualize system and back-end ports that are
attached to the core switch on the right.
Mesh fabric designs that have the IBM Storage Virtualize and controller ports that are
connected to multiple switches follow the same general guidelines. Failure to follow this best
practice might result in IBM Storage Virtualize system performance impacts to the fabric.
The design that is shown in Figure 2-35 violates the best practice of ensuring that the
IBM Storage Virtualize system and storage ports are connected to the same switches and of
zoning the ports as shown in Figure 2-34 on page 157. It also violates the best practice of
connecting the host ports (the GPFS cluster) to the same switches as the IBM Storage
Virtualize system where possible.
This design creates an issue with traffic that is traversing the ISL unnecessarily, as shown in
Figure 2-35. I/O requests from the GPFS cluster must traverse the ISL four times. This design
must be corrected such that the IBM Storage Virtualize system, controller, and GPFS cluster
ports are all connected to both core switches, and zoning is updated to be in accordance with
the example that is shown in Figure 2-34 on page 157.
Figure 2-35 shows a real-world customer SAN design. The effect of the extra traffic on the ISL
between the core switches from this design caused significant delays in command response
time from the GPFS cluster to the SVC or IBM FlashSystem and from the SVC to the
controller.
The SVC or IBM FlashSystem cluster also logged nearly constant errors against the
controller, including disconnecting from controller ports. The SAN switches logged frequent
link timeouts and frame drops on the ISL between the switches. Finally, the customer had
other devices sharing the ISL that were not zoned to the SVC or IBM FlashSystem. These
devices also were affected.
For example, at the time of writing, EMC DMX/Symmetrix, all Hitachi Data Systems (HDS)
storage, and SUN/HP use one WWNN per port. This configuration means that each port
appears as a separate controller to the SVC or IBM FlashSystem. Therefore, each port that is
connected to the IBM Storage Virtualize system means one WWPN and a WWNN increment.
IBM storage and EMC Clariion/VNX use one WWNN per storage subsystem, so each
appears as a single controller with multiple port WWPNs.
A best practice is to assign up to 16 ports from each back-end storage system to the SVC or
IBM FlashSystem cluster. The reason for this limit is that since version 8.5, the maximum
number of ports that is recognized by the IBM Storage Virtualize system for each WWNN is
16. Up to that limit, the more ports that are assigned, the more throughput is obtained.
In a situation where the back-end storage has direct-attached hosts, do not mix the host ports
with the IBM Storage Virtualize system ports. The back-end storage ports must be dedicated
to the IBM Storage Virtualize system. Sharing storage ports is acceptable only during
migration and for a limited time. If you intend to have some hosts that are permanently
attached directly to the back-end storage, you must separate the IBM Storage Virtualize
system ports from the host ports.
From a connectivity standpoint, four FC ports are available in each interface module for a total
of 24 FC ports in a fully configured XIV system. The XIV modules with FC interfaces are
present on modules 4 - 9. Partial rack configurations do not use all ports, even though they
might be physically present.
Table 2-12 lists the XIV port connectivity according to the number of installed modules.
Installed modules   Active FC ports   Active FC interface modules   Modules with active FC ports
6                   8                 2                             4 and 5
9                   16                4                             4, 5, 7, and 8
10                  16                4                             4, 5, 7, and 8
11                  20                5                             4, 5, 7, 8, and 9
12                  20                5                             4, 5, 7, 8, and 9
13                  24                6                             4, 5, 6, 7, 8, and 9
14                  24                6                             4, 5, 6, 7, 8, and 9
15                  24                6                             4, 5, 6, 7, 8, and 9
Note: If the XIV includes the capacity on demand (CoD) feature, all active FC interface
ports are usable at the time of installation regardless of how much usable capacity you
purchased. For example, if a 9-module system is delivered with six modules active, you
can use the interface ports in modules 4, 5, 7, and 8, although effectively three of the nine
modules are not yet activated through CoD.
To use the combined capabilities of SVC, IBM FlashSystem, and XIV, you must connect two
ports (one per fabric) from each interface module with the SVC or IBM FlashSystem ports.
For redundancy and resiliency purposes, select one port from each HBA that is present on
the interface modules. Use ports 1 and 3 because they are on different HBAs. By default,
port 4 is set as a SCSI initiator and dedicated to XIV replication.
Therefore, if you decide to use port 4 to connect to an SVC or IBM FlashSystem, you must
change its configuration from initiator to target. For more information, see IBM XIV Storage
System Architecture and Implementation, SG24-7659.
Figure 2-36 shows how to connect an XIV frame to an SVC storage controller.
A best practice for zoning is to create a single zone for each SVC or IBM FlashSystem node
on each SAN fabric. This zone must contain all ports from a single XIV and the SVC or
IBM FlashSystem node ports that are intended for host and back-end storage connectivity. All
nodes in an SVC or IBM FlashSystem cluster must see the same set of XIV host ports.
Figure 2-36 shows that a single zone is used for each XIV to SVC or IBM FlashSystem node.
For this example, the following zones are used:
Fabric A, XIV → SVC Node 1: All XIV fabric A ports to SVC node 1
Fabric A, XIV → SVC Node 2: All XIV fabric A ports to SVC node 2
Fabric B, XIV → SVC Node 1: All XIV fabric B ports to SVC node 1
Fabric B, XIV → SVC Node 2: All XIV fabric B ports to SVC node 2
For more information about other best practices and XIV considerations, see Chapter 3,
“Storage back-end” on page 189.
However, considering that any replication or migration is done through IBM Storage Virtualize,
ports 2 and 4 also can be used for IBM Storage Virtualize connectivity. Port 4 must be set to
target mode for replication or migration to work.
Assuming a dual fabric configuration for redundancy and resiliency purposes, select one port
from each HBA that is present on the grid controller. Therefore, a total of six ports (three per
fabric) are used.
Figure 2-37 shows a possible connectivity scheme for IBM SAN Volume Controller
2145-SV2/SA2 nodes and IBM FlashSystem A9000 systems.
The IBM FlashSystem A9000R system has more choices because many configurations are
available, as listed in Table 2-13.
Grid elements   FC ports
2               8
3               12
4               16
5               20
6               24
However, IBM Storage Virtualize can support only 16 WWPNs from any single WWNN. The
IBM FlashSystem A9000 or IBM FlashSystem A9000R system has only one WWNN, so you
are limited to 16 ports to any IBM FlashSystem A9000R system.
Table 2-14 shows the same table, but with columns added to show how many and which ports
can be used for connectivity. The assumption is a dual fabric, with ports 1 in one fabric, and
ports 3 in the other.
For the 4-grid element system, it is possible to attach 16 ports because that is the maximum
that IBM Storage Virtualize allows. For 5- and 6-grid element systems, it is possible to use
more ports up to the 16 maximum; however, that configuration is not recommended because
it might create unbalanced workloads on the grid controllers that have two ports attached.
Figure 2-38 shows a possible connectivity scheme for SVC 2145-SV2/SA2 nodes and
IBM FlashSystem A9000R systems with up to three grid elements.
Figure 2-39 shows a possible connectivity schema for SVC 2145-SV2/SA2 nodes and
IBM FlashSystem A9000R systems fully configured.
Figure 2-39 Connecting IBM FlashSystem A9000 fully configured as a back-end controller
For more information about IBM FlashSystem A9000 and A9000R implementation, see IBM
FlashSystem A9000 and A9000R Architecture and Implementation (Version 12.3.2),
SG24-8345.
Volumes that form the storage layer can be presented to the replication layer and are seen on
the replication layer as managed disks (MDisks), but not vice versa. That is, the storage layer
cannot see a replication layer’s MDisks.
The SVC is always in the replication layer and its layer cannot be changed, so you cannot
virtualize an SVC behind an IBM Storage Virtualize system. However, an IBM FlashSystem
can be changed from the storage layer to the replication layer and from the replication layer to
the storage layer.
If you want to virtualize one IBM FlashSystem behind another one, the IBM FlashSystem that
is used as external storage must be set to the storage layer, and the IBM FlashSystem that
performs the virtualization must be set to the replication layer.
The storage layer and replication layer have the following differences:
In the storage layer, an IBM Storage Virtualize family system has the following
characteristics and requirements:
– The system can complete Metro Mirror (MM) and Global Mirror (GM) replication with
other storage layer systems.
– The system can provide external storage for replication layer systems or SVC.
– The system cannot use another IBM Storage Virtualize family system that is configured
with the storage layer as external storage.
In the replication layer, an IBM Storage Virtualize family system has the following
characteristics and requirements:
– The system can complete MM and GM replication with other replication layer systems
or SVC.
– The system cannot provide external storage for a replication layer system or SVC.
– The system can use another IBM FlashSystem family system that is configured with a
storage layer as external storage.
Note: To change the layer, you must disable the visibility of every other IBM FlashSystem
or SVC on all fabrics. This process involves deleting partnerships, remote copy
relationships, and zoning between IBM FlashSystem and other IBM FlashSystem or SVC.
Then, run the chsystem -layer command to set the layer of the system.
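A minimal sketch of this procedure follows, assuming that all partnerships, remote copy relationships, and zoning visibility to other IBM FlashSystem or SVC systems were already removed as described in the note:
IBM_FlashSystem:FS9xx0:superuser>lssystem
(check the current value of the layer field, for example layer storage)
IBM_FlashSystem:FS9xx0:superuser>chsystem -layer replication
IBM_FlashSystem:FS9xx0:superuser>lssystem
(the layer field now shows replication, so this system can virtualize a storage
layer IBM FlashSystem as external storage)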
For more information about the storage layer, see this IBM Documentation web page.
To zone the IBM FlashSystem as a back-end storage controller of SVC, every SVC node must
access the same IBM FlashSystem ports as a minimum requirement. Create one zone per
SVC node per fabric to the same ports from an IBM FlashSystem storage.
Figure 2-40 shows a zone between a 16-port IBM Storwize or IBM FlashSystem and an SVC.
The ports from the Storwize V7000 system in Figure 2-40 on page 165 are split between both
fabrics. The odd ports are connected to Fabric A and the even ports are connected to Fabric
B. You also can spread the traffic across the IBM Storwize V7000 FC adapters on the same
canister.
However, it does not significantly increase the availability of the solution because the mean
time between failures (MTBF) of the adapters is not significantly less than that of the
non-redundant canister components.
Note: If you use an NPIV-enabled IBM FlashSystem system as back-end storage, only the
NPIV ports on the IBM FlashSystem system must be used for the storage back-end
zoning.
Connect as many ports as necessary to service your workload to the SVC. For more
information about back-end port limitations and best practices, see “Back-end storage port
count” on page 159.
Considering the IBM Storage Virtualize family configuration, the configuration is the same for
new IBM FlashSystem systems (see Figure 2-41, which shows an IBM FlashSystem 9100 as
an SVC back-end zone example).
The main advantage of integrating IBM FlashSystem 900 with IBM Storage Virtualize is to
combine the extreme performance of IBM FlashSystem with the IBM Storage Virtualize
enterprise-class solution with such features as tiering, mirroring, IBM FlashCopy, thin
provisioning, IBM Real-time Compression (RtC), and copy services.
Before starting, work closely with your IBM Sales, pre-sales, and IT architect to correctly size
the solution by defining the suitable number of IBM FlashSystem I/O groups or clusters and
FC ports that are necessary according to your servers and application workload demands.
To maximize the performance that you can achieve when deploying the IBM FlashSystem 900
with IBM Storage Virtualize, carefully consider the assignment and usage of the FC HBA
ports on IBM Storage Virtualize, as described in 2.3.2, “IBM FlashSystem 9200 and 9500
controller ports” on page 134. The IBM FlashSystem 900 ports must be dedicated to the
IBM Storage Virtualize workload, so do not mix direct-attached hosts on IBM FlashSystem
900 with IBM Storage Virtualize ports.
Connect IBM FlashSystem 900 to the SAN network by completing the following steps:
1. Connect the IBM FlashSystem 900 odd-numbered ports to the odd-numbered SAN fabric
(or SAN fabric A) and the even-numbered ports to the even-numbered SAN fabric (or SAN
fabric B).
2. Create one zone for each IBM Storage Virtualize node with all IBM FlashSystem 900 ports
on each fabric.
Figure 2-42 IBM FlashSystem 900 connectivity to a SAN Volume Controller cluster
After the IBM FlashSystem 900 is zoned to two SVC nodes, four zones exist with one zone
per node and two zones per fabric.
You can decide whether to share the IBM Storage Virtualize ports with other back-end
storage. However, it is important to monitor the buffer credit usage on the IBM Storage
Virtualize switch ports and, if necessary, modify the buffer credit parameters to properly
accommodate the traffic and avoid congestion issues.
For more information about IBM FlashSystem 900 best practices, see Chapter 3, “Storage
back-end” on page 189.
IBM DS8900F
The IBM DS8000 family is a high-performance, high-capacity, highly secure, and resilient
series of disk storage systems. The DS8900F family is the latest and most advanced of the
DS8000 series offerings to date. The HA, multiplatform support, including IBM Z®, and
simplified management tools help provide a cost-effective path to an on-demand world.
From a connectivity perspective, the DS8900F family is scalable. Two different types of host
adapters are available: 16 gigabit Fibre Channel (GFC) and 32 GFC. Both can auto-negotiate
their data transfer rate down to an 8 Gbps full-duplex data transfer. The 16 GFC and 32 GFC
host adapters are all 4-port adapters.
Tip: As a best practice for using 16 GFC or 32 GFC technology in DS8900F and IBM
Storage Virtualize, consider using the IBM Storage Virtualize maximum of 16 ports for the
DS8900F. Also, ensuring that more ranks can be assigned to the IBM Storage Virtualize
system than the number of slots that are available on that host ensures that the ports are
not oversubscribed.
A single 16 or 32 GFC host adapter does not provide full line rate bandwidth with all ports
active:
16 GFC host adapter: 3300 MBps read and 1730 MBps write
32 GFC host adapter: 6500 MBps read and 3500 MBps write
The DS8910F model 993 configuration supports a maximum of eight host adapters. The
DS8910F model 994 configurations support a maximum of 16 host adapters in the base
frame. The DS8950F model 996 configurations support a maximum of 16 host adapters in the
base frame and an extra 16 host adapters in the DS8950F model E96.
Host adapters are installed in slots 1, 2, 4, and 5 of the I/O bay. Figure 2-43 shows the
locations for the host adapters in the DS8900F I/O bay.
The system supports an intermix of both adapter types up to the maximum number of ports,
as listed in Table 2-15.
Important: Each of the ports on a DS8900F host adapter can be independently configured
for FCP or IBM FICON®. The type of port can be changed through the DS8900F Data
Storage Graphical User Interface (DS GUI) or by using Data Storage Command-Line
Interface (DS CLI) commands. To work with SAN and IBM Storage Virtualize, use the
Small Computer System Interface-Fibre Channel Protocol (SCSI-FCP) FC-switched
fabric. FICON is for IBM Z only.
For more information about DS8900F hardware, port, and connectivity, see IBM Storage
DS8900F Architecture and Implementation: Updated for Release 9.3.2, SG24-8456.
Despite the wide DS8900F port availability, when you attach a DS8900F series system to an
IBM Storage Virtualize system, use Disk Magic to determine how many host adapters are
required for your workload, and spread the ports across different HBAs for redundancy and
resiliency purposes. However, consider the following points as a starting point for a single
IBM Storage Virtualize cluster configuration:
For 16 or fewer arrays, use two host adapters - 8 FC ports.
Note: To check the current maximum limits, search for the term “and restrictions” for your
code level of IBM Storage Virtualize 8.6 at this IBM Support web page.
Note: Figure 2-44 is also a valid example that can be used for DS8900F to SVC connectivity.
In Figure 2-44, 16 ports are zoned to IBM Storage Virtualize, and the ports are spread across
the different HBAs that are available on the storage.
To maximize performance, the DS8900F ports must be dedicated to the IBM Storage
Virtualize connections. However, the IBM Storage Virtualize ports must be shared with hosts
so that you can obtain the maximum full duplex performance from these ports.
For more information about port usage and assignments, see 2.3.2, “IBM FlashSystem 9200
and 9500 controller ports” on page 134.
Create one zone per IBM Storage Virtualize system node per fabric. IBM Storage Virtualize
must access the same storage ports on all nodes. Otherwise, the DS8900F operation status
is set to Degraded on the IBM Storage Virtualize system.
After the zoning steps, you must configure the host connections by using the DS8900F
DS GUI or DS CLI commands for all IBM Storage Virtualize node WWPNs. This configuration
creates a single volume group that contains all of the IBM Storage Virtualize cluster ports.
For more information about volume group, host connection, and DS8000 administration, see
IBM Storage DS8900F Architecture and Implementation: Updated for Release 9.3.2,
SG24-8456.
The specific best practices to present DS8880 logical unit numbers (LUNs) as back-end
storage to the SVC are described in Chapter 3, “Storage back-end” on page 189.
This configuration provides four paths to each volume, with two preferred paths (one per
fabric) and two non-preferred paths. Four paths is the number of paths (per volume) that is
optimal for multipathing software, such as AIX Path Control Module (AIXPCM), Linux device
mapper, and the IBM Storage Virtualize system.
IBM Storage Virtualize supports 32 Gb and 64 Gb FC ports. New-generation Cisco and
Brocade switches also support 32 Gb FC port connectivity.
A multipath design balances the usage of physical resources. The application should be able
to continue to work at the required performance even if the redundant hardware fails. More
paths do not equate to better performance or HA. For example, if there are eight paths to a
volume and 250 volumes that are mapped to a single host, this configuration would generate
2000 active paths for a multipath driver to manage. Too many paths to a volume can cause
excessive I/O waits, resulting in application failures and, under certain circumstances, it can
reduce performance.
When the recommended number of paths to a volume is exceeded, sometimes path failures
are not recovered in the required amount of time.
It is a best practice to know the workload and application requirements and plan for the
number of paths. The number of paths to a host must not exceed eight. Using end-to-end
32 Gb FC ports, four paths for a volume is the suggested configuration.
A best practice is to keep paths to all available nodes in an I/O group to achieve maximum
availability.
NPIV consideration: All the recommendations in this section also apply to NPIV-enabled
configurations. For more information about the systems that are supported by NPIV, see
IBM SAN Volume Controller Configuration Limits.
Eight paths per volume are supported. However, this design provides no performance
benefits and in some circumstances can reduce performance. Also, it does not significantly
improve reliability or availability. However, fewer than four paths do not satisfy the minimum
redundancy, resiliency, and performance requirements.
To obtain the best overall performance of the system and to prevent overloading, the workload
on each IBM Storage Virtualize system’s ports must be equal. Having the same amount of
workload typically involves zoning approximately the same number of host FC ports to each
IBM Storage Virtualize system node FC port.
Use an IBM Storage Virtualize FC Portsets configuration along with zoning to distribute the
load equally on storage FC ports.
The reason for not assigning one HBA to each path is that the IBM Storage Virtualize I/O
group works as a cluster. When a volume is created, one node is assigned as the preferred
node and the other node serves solely as a backup node for that specific volume, so
assigning one HBA per path never balances the workload for that particular volume.
Therefore, it is better to balance the load by I/O group instead so that volumes are assigned
to nodes automatically.
Because the optimal number of volume paths is four, you must create two or more hosts on
the IBM Storage Virtualize system. During the volume assignment, alternate which volume is
assigned to one of the pseudo-hosts in a round-robin fashion.
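The following sketch illustrates the pseudo-host approach for a single server with four HBA ports. The mkhost and mkvdiskhostmap commands are standard IBM Storage Virtualize commands; the host names, WWPNs, and volume names are illustrative only.
IBM_FlashSystem:FS9xx0:superuser>mkhost -name server1_psA -fcwwpn 10000090FA00CC01:10000090FA00CC02
IBM_FlashSystem:FS9xx0:superuser>mkhost -name server1_psB -fcwwpn 10000090FA00CC03:10000090FA00CC04
(each pseudo-host contains two of the four server ports, one per fabric, so each
volume keeps four paths)
IBM_FlashSystem:FS9xx0:superuser>mkvdiskhostmap -host server1_psA vol001
IBM_FlashSystem:FS9xx0:superuser>mkvdiskhostmap -host server1_psB vol002
(alternate the pseudo-hosts for subsequent volume mappings in a round-robin fashion)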
Ensure that you follow these best practices when configuring your VMware ESX clustered
hosts:
Zone a single VMware ESX cluster in a manner that avoids ISL I/O traversing.
Spread multiple host clusters evenly across the IBM Storage Virtualize system node ports
and I/O groups.
Map LUNs and volumes evenly across zoned ports, alternating the preferred node paths
evenly for optimal I/O spread and balance.
Create separate zones for each host node in IBM Storage Virtualize and on the VMware
ESX cluster.
Create a pseudo-host in IBM Storage Virtualize host definitions that contain only two
virtual WWPNs (one from each fabric), as shown in Figure 2-48.
Map the LUNs or volumes to the pseudo-LPARs (active and inactive) in a round-robin
fashion.
Figure 2-48 shows the correct SAN connection and zoning for LPARs.
During Live Partition Mobility (LPM), both inactive and active ports are active. When LPM is
complete, the previously active ports show as inactive, and the previously inactive ports show
as active.
Figure 2-49 shows LPM from the hypervisor frame to another frame.
Note: During LPM, the number of paths double from four to eight. Starting with eight paths
per LUN or volume results in 16 unsupported paths during LPM, which can lead to I/O
interruption.
For more information about HSNs, see IBM Spectrum Virtualize: Hot-Spare Node and NPIV
Target Ports, REDP-5477.
For the HSN feature to be fully effective, you must enable the NPIV feature. In an
NPIV-enabled cluster, each physical port is associated with two WWPNs. When the port
initially logs in to the SAN, it uses the normal WWPN (primary port), which does not change
from previous releases or from NPIV-disabled mode. When the node completes its startup
and is ready to begin processing I/O, the NPIV target ports log on to the fabric with the
second WWPN.
Special zoning requirements must be considered when implementing the HSN function.
Similarly, when a spare node comes online, its primary ports are used for remote copy
relationships, so the spare node must be zoned with the remote cluster.
Note: Currently, the zoning configuration for spare nodes is not policed while the spare is
inactive, and no errors are logged if the zoning or back-end configuration is incorrect.
For IBM Storage Virtualize based back-end controllers, such as IBM Storwize V7000, it is a
best practice that the host clusters function is used, with each node forming one host within
this cluster. This configuration ensures that each volume is mapped identically to each
IBM Storage Virtualize node.
The back-end storage zones must be separate, even if the two clustered systems share a
storage subsystem. You must zone separate I/O groups if you want to connect them in one
clustered system. Up to four I/O groups can be connected to form one clustered system.
If you perform a migration into or out of the IBM Storage Virtualize system, make sure that the
LUN is removed from one place before it is added to another place.
The iSCSI protocol is based on TCP/IP, and the iSER protocol is an extension of iSCSI that
uses RDMA technology (RoCE or iWARP).
IBM Storage Virtualize provides three adapter options for Ethernet connectivity: 10 GbE,
25 GbE, and 100 GbE adapters.
A 100 GbE adapter supports only host attachment by using iSCSI and NVMe over RDMA and
NVMe/TCP on IBM FlashSystem 9500 and 9500R, IBM FlashSystem 7300, and SVC SV3.
These models do not support iSER (RoCE or iWARP) for host attachment.
A 25 GbE adapter can be used for clustering, HyperSwap, IP replication, and host attachment
by using iSCSI or iSER (RoCE or iWARP) on all IBM Storage Virtualize models.
For more information about configuring Linux and Windows hosts by using iSER connectivity,
see the following resources:
Windows host
Linux host
Of these options, the optical varieties of distance extension are preferred. IP-based distance
extension introduces more complexity, is less reliable, and has performance limitations.
However, in many cases optical distance extension is impractical because of cost or
unavailability.
If you use multiplexor-based distance extension, closely monitor your physical link error
counts in your switches. Optical communication devices are high-precision units. When they
shift out of calibration, you start to see errors in your frames.
FCIP is a tunneling technology, which means FC frames are encapsulated in the TCP/IP
packets. As such, it is not apparent to the devices that are connected through the FCIP link.
To use FCIP, you need some kind of tunneling device on both sides of the TCP/IP link that
integrates FC and Ethernet connectivity. Most of the SAN vendors offer FCIP capability
through stand-alone devices (multiprotocol routers) or by using blades that are integrated in
the director class product. IBM Storage Virtualize systems support FCIP connection.
An important aspect of the FCIP scenario is the IP link quality. With IP-based distance
extension, you must dedicate bandwidth to your FC to IP traffic if the link is shared with other
IP traffic. Even if the link between the two sites currently carries little traffic or is used only for
email, do not assume that this will always be the case. The design of FC is sensitive to congestion,
and you do not want a spyware problem or a DDOS attack on an IP network to disrupt
IBM Storage Virtualize.
Also, when you communicate with your organization’s networking architects, distinguish
between megabytes per second (MBps) and megabits per second (Mbps). In the storage
world, bandwidth often is specified in MBps, but network engineers specify bandwidth in
Mbps. If you fail to specify MB, you can end up with an impressive-sounding 155 Mbps OC-3
link, which supplies only 15 MBps or so to IBM Storage Virtualize. If you include the safety
margins, this link is not as fast as you might hope, so ensure that the terminology is correct.
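As a quick sanity check for such conversations, the following minimal Python sketch (an illustration only, not part of any product) converts a link rate that is quoted in Mbps to an approximate usable MBps figure. The 20% protocol-overhead allowance is an assumption for illustration; substitute your own measured overhead.

def usable_mbytes_per_sec(link_mbps: float, overhead: float = 0.20) -> float:
    """Convert a link rate in megabits per second (Mbps) to an approximate
    usable rate in megabytes per second (MBps), minus an assumed protocol overhead."""
    return link_mbps / 8 * (1 - overhead)

# An OC-3 link that is quoted as 155 Mbps yields roughly 15 MBps of usable bandwidth.
print(f"{usable_mbytes_per_sec(155):.1f} MBps")  # ~15.5 MBps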
Consider the following points when you are planning for your FCIP TCP/IP links:
For redundancy purposes, use as many TCP/IP links between sites as you have fabrics in
each site to which you want to connect. In most cases, there are two SAN FC fabrics in
each site, so you need two TCP/IP connections between sites.
Try to dedicate TCP/IP links only for storage interconnection. Separate them from other
LAN or wide area network (WAN) traffic.
Make sure that you have a service-level agreement (SLA) with your TCP/IP link vendor
that meets your needs and expectations.
If you do not use GMCV, make sure that you have sized your TCP/IP link to sustain peak
workloads.
Using IBM Storage Virtualize internal GM simulation options can help you test your
applications before production implementation. You can simulate the GM environment
within one SVC system without partnership with another one. To perform GM testing, run
the chsystem command with the following parameters:
– gminterdelaysimulation
– gmintradelaysimulation
For more information about GM planning, see Chapter 6, “Copy services” on page 363.
If you are not sure about your TCP/IP link security, enable Internet Protocol Security
(IPsec) on all the FCIP devices. IPsec is enabled at the Fabric OS level, so you do not
need any external IPsec appliances.
In addition to planning your TCP/IP link, consider adhering to the following best practices:
Set the link bandwidth and background copy rate of partnership between your replicating
IBM Storage Virtualize systems to a value lower than your TCP/IP link capacity. Failing to
set this rate can cause an unstable TCP/IP tunnel, which can lead to stopping all your
remote copy relations that use that tunnel.
The best case is to use GMCV when replication is done over long distances and
bandwidth is limited or the link is shared among multiple workloads.
Use compression on corresponding FCIP devices.
Use at least two ISLs from your local FC switch to local FCIP router.
On a Brocade SAN, use the Integrated Routing feature to avoid merging fabrics from both
sites.
Furthermore, to avoid single points of failure, multiple WDMs and physical links are
implemented. When implementing these solutions, particular attention must be paid in the
intercluster connectivity setup.
Important: HyperSwap and ESC clusters require implementing dedicated private fabrics
for the internode communication between the sites. For more information about the
requirements, see IBM Spectrum Virtualize HyperSwap SAN Implementation and Design
Best Practices, REDP-5597.
Consider a typical implementation of an ESC that uses ISLs, as shown in Figure 2-50.
Two possible configurations are available to interconnect the private SANs. In Configuration 1
(see Figure 2-51), one ISL per fabric is attached to each DWDM. In this case, the physical
paths Path A and Path B are used to extend both fabrics.
In Configuration 2 (see Figure 2-52), ISLs of fabric A are attached only to Path A, while ISLs
of fabric B are attached only to Path B. In this case, the physical paths are not shared
between the fabrics.
Figure 2-52 Configuration 2: Physical paths not shared among the fabrics
With Configuration 1, in a failure of one of the physical paths, both fabrics are simultaneously
affected, and a fabric reconfiguration occurs because of an ISL loss. This situation can lead to
a temporary disruption of the intracluster communication and in the worst case to a split-brain
condition. To mitigate this situation, link aggregation features such as Brocade ISL trunking
can be implemented.
With Configuration 2, a physical path failure leads to a fabric segmentation of one of the two
fabrics, leaving the other fabric unaffected. In this case, the intracluster communication would
be ensured through the unaffected fabric.
To enable native IP replication, IBM Storage Virtualize implements the Bridgeworks SANSlide
network optimization technology. For more information about this solution,
see IBM SAN Volume Controller and Storwize Family Native IP Replication, REDP-5103.
The main design point for the initial SANSlide implementation and subsequent
enhancements, including the addition of replication compression is to reduce link utilization to
allow the links to run closer to their respective line speed at distance and over poor quality
links. IP replication compression does not significantly increase the effective bandwidth of the
links beyond the physical line speed of the links.
If bandwidths are required that exceed the line speed of the physical links, alternative
technologies should be considered (such as FCIP), where compression is done in the tunnel
and often yields an increase in effective bandwidth of 2:1 or more.
The effective bandwidth of an IP link highly depends on latency and the quality of the link in
terms of the rate of packet loss. Even a small amount of packet loss and the resulting
retransmits will significantly degrade the bandwidth of the link.
Figure 2-53 shows the effects that distance and packet loss have on the effective bandwidth
of the links in MBps. Numbers reflect a pre-compression data rate with compression on and
50% compressible data. These numbers are as tested and can vary depending on specific
link and data characteristics.
To avoid any effects on ISL links and congestion on your SAN, do not put tape ports and
backup servers on different switches. Modern tape devices have high-bandwidth
requirements.
During your backup SAN configuration, use the switch virtualization to separate the traffic
type. The backup process has different frames than production and can affect performance.
Backup requests tend to use all available network resources to finish writing to their
destination target. Until the request is finished, the bandwidth is occupied and other frames
cannot access the network.
The difference between these two types of frames is shown in Figure 2-54.
Backup frames write data sequentially and release the path only after the write completes,
whereas production frames read and write data randomly and access the same physical path
constantly. If backup and production are set up on the same environment, production frames
(read/write) can run tasks only when the backup frames are complete, which causes latency in
your production SAN network.
Figure 2-55 shows one example of a backup and production SAN configuration to avoid
congestion because of high-bandwidth usage by the backup process.
Note: For more information about interoperability, see IBM System Storage Interoperation
Center (SSIC).
IBM Storage Virtualize is flexible as far as switch vendors are concerned. All the node
connections on an IBM Storage Virtualize clustered system must go to the switches of a
single vendor, that is, you must not have several nodes or node ports plugged into vendor A
and several nodes or node ports plugged into vendor B.
IBM Storage Virtualize supports some combinations of SANs that are made up of switches
from multiple vendors in the same SAN. However, this approach is not preferred in practice.
Despite years of effort, interoperability among switch vendors is less than ideal because FC
standards are not rigorously enforced. Interoperability problems between switch vendors are
notoriously difficult and disruptive to isolate. Also, it can take a long time to obtain a fix. For
these reasons, run multiple switch vendors in the same SAN only long enough to migrate
from one vendor to another, if this setup is possible with your hardware.
You can run a mixed-vendor SAN if you have agreement from both switch vendors that they
fully support attachment with each other. However, Brocade does not support interoperability
with any other vendors.
Interoperability between Cisco switches and Brocade switches is not recommended, except
during fabric migrations, and then only if you have a back-out plan in place. Also, when
connecting BladeCenter switches to a core switch, consider the use of the NPIV technology.
When you have SAN fabrics with multiple vendors, pay special attention to any particular
requirements. For example, observe from which switch in the fabric the zoning must be
performed.
At the time of writing, XIV, A9000, A9000R, and FlashSystem 900 are still supported.
This chapter also provides information about traditional quorum disks. For information about
IP quorum, see Chapter 7, “Ensuring business continuity” on page 501.
With a serial-attached Small Computer System Interface (SCSI) (SAS) attachment, flash
(solid-state drives (SSDs)), and hard disk drives (HDDs) are supported. The set of supported
drives depends on the platform.
The number of NVMe drive slots per platform is listed in Table 3-1.
NVMe storage devices are typically directly attached to a host system over a PCIe bus, that
is, the NVMe controller is contained in the storage device itself, alleviating the need for an
extra I/O controller between the CPU and the storage device. This architecture results in
lower latency, throughput scalability, and simpler system designs.
The NVMe protocol supports multiple I/O queues versus older SAS and Serial Advanced
Technology Attachment (SATA) protocols, which use only a single queue.
NVMe is a protocol like SCSI: it allows for discovery, error recovery, and read/write
operations. However, NVMe uses Remote Direct Memory Access (RDMA) over new or
existing physical transport layers, such as PCIe, FC, or Ethernet. The major advantage of an
NVMe-drive attachment is that it usually uses PCIe connectivity, so the drives are physically
connected to the CPU by a high-bandwidth PCIe connection rather than by using a “middle
man”, such as a SAS controller chip, which limits total bandwidth to what is available to the
PCIe connection into the SAS controller. Where a SAS controller might use 8 or 16 PCIe
lanes in total, each NVMe drive has its own dedicated pair of PCIe lanes, which means that a
single drive can achieve data rates in excess of multiple GiBps rather than hundreds of MiBps
when compared with SAS.
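The bandwidth difference can be illustrated with simple arithmetic. The following Python sketch uses approximate per-lane PCIe rates (about 1 GB/s for Gen3 and about 2 GB/s for Gen4) and an assumed 24-drive SAS configuration; these figures are illustrative assumptions, not product specifications.

# Approximate per-lane PCIe throughput (GB/s): assumed values for illustration.
PCIE_LANE_GBPS = {"gen3": 0.985, "gen4": 1.969}

def nvme_drive_bandwidth_gbps(lanes: int, gen: str) -> float:
    """Peak raw PCIe bandwidth that is available to a single NVMe drive."""
    return lanes * PCIE_LANE_GBPS[gen]

def sas_per_drive_bandwidth_gbps(controller_lanes: int, gen: str, drives: int) -> float:
    """Bandwidth per drive when all drives share one SAS controller's PCIe link."""
    return nvme_drive_bandwidth_gbps(controller_lanes, gen) / drives

# One dual-lane Gen4 NVMe drive versus 24 SAS drives behind an 8-lane Gen3 controller:
print(f"{nvme_drive_bandwidth_gbps(2, 'gen4'):.1f} GB/s per NVMe drive")              # ~3.9 GB/s
print(f"{sas_per_drive_bandwidth_gbps(8, 'gen3', 24) * 1000:.0f} MB/s per SAS drive")  # ~330 MB/s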
Overall latency can be improved by the adoption of larger parallelism and the modern device
drivers that are used to control NVMe interfaces. For example, NVMe over FC versus SCSI
over FC are both bound by the same FC network speeds and bandwidths. However, the
overhead on older SCSI device drivers (for example, reliance on kernel-based interrupt
drivers) means that the software functions in the device driver might limit its capability when
compared with an NVMe driver because an NVMe driver typically uses a polling loop
interface rather than an interrupt driven interface.
A polling interface is more efficient because the device itself looks for work to do and typically
runs in user space (rather than kernel space). Therefore, the interface has direct access to
the hardware. An interrupt-driven interface is less efficient because the hardware tells the
software when work must be done by pulling an interrupt line, which the kernel must
process and then hand control of the hardware to the software. Therefore, interrupt-driven
kernel drivers waste time switching between kernel and user space. As a result, all useful
work that is done is bound by the work that a single CPU core can handle. Typically, a single
hardware interrupt is owned by just one core.
All IBM Storage Virtualize FC and SAS drivers are implemented as polling drivers. Thus, on
the storage side, almost no latency is saved when you switch from SCSI to NVMe as a
protocol. However, the bandwidth increases are seen when a SAS controller is switched to a
PCIe-attached drive.
Table 3-2 lists the supported industry-standard NVMe drives on an IBM Storage Virtualize
system.
Industry-standard NVMe drives start at a smaller capacity point than FCM drives, which
allows for a smaller system.
Variable stripe redundant array of independent disks (RAID), or VSR, stripes data across
more granular, subchip levels, which allows for failing areas of a chip to be identified and
isolated without failing the entire chip. Asymmetric wear-leveling monitors the health of blocks
within the chips and tries to place “hot” data within the healthiest blocks to prevent the weaker
blocks from wearing out prematurely.
Note: At the time of writing, IBM is the only vendor to deliver VSR for multiple dimensions
of RAID protection while maintaining peak performance.
The multiple dimensions come from factoring in system-level RAID protection. The
advantage is that many of the things that would normally require intervention by
system-level RAID are not a problem for IBM solutions because they are dealt with at the
module level.
Bit errors that are caused by electrical interference are continually scanned for, and if any
errors are found, they are corrected by an enhanced Error Correcting Code (ECC) algorithm.
If an error cannot be corrected, then the IBM Storage Virtualize Storage system distributed
RAID (DRAID) layer is used to rebuild the data.
Note: NVMe FCMs use inline hardware compression to reduce the amount of physical
space that is required. Compression cannot be disabled. If the written data cannot be
compressed further or compressing the data causes it to grow in size, the uncompressed
data is written. In either case, because the FCM compression is done in the hardware,
there is no performance impact.
IBM Storage Virtualize FCMs are not interchangeable with the flash modules that are used in
IBM FlashSystem 900 storage enclosures because they have a different form factor and
interface.
Modules that are used in IBM FlashSystem 5100, 5200, 7200, 7300, 9100, 9200, and 9500
are a built-in 2.5-inch U2 dual-port form factor.
FCMs are available with a physical or usable capacity of 4.8, 9.6, 19.2, and 38.4 TB. The
usable capacity is a factor of how many bytes the flash chips can hold.
FCMs have a maximum effective capacity (or virtual capacity) beyond which they cannot be
filled. Effective capacity is the total amount of user data that can be stored on a module,
assuming that the compression ratio (CR) of the data is at least equal to (or higher than) the
ratio of effective capacity to usable capacity. Each FCM contains a fixed amount of space for
metadata, and the maximum effective capacity is the amount of data it takes to fill the
metadata space.
IBM Storage Virtualize 8.5 adds support for FCM3 with increased effective capacity and
compression.
Notes:
FCM3 is available only on IBM FlashSystem 5200, 7300, and 9500 with IBM Storage
Virtualize 8.5.0 and higher.
IBM FlashSystem 9500 with FCM3 19.2 TB and 38.4 TB drives uses 7 nm (nanometer)
technology and PCIe Gen 4 to increase the throughput.
With 8.5.2, the FS9500 can support 48 FCM3 XL (38.4 TB) drives.
The RAID bitmap space on the FS9500 is increased to 800 MB.
DRAID expansion can now be from 1 to 42 drives at a time (the previous maximum was 12).
IBM FlashSystem 5200, 7300, and 9500 do not support mixed FCM1, FCM2, and
FCM3 in an array.
An array with intermixed FCM1 and FCM2 drives performs like an FCM1 array.
A 4.8 TB FCM has a higher CR because it has the same amount of metadata space as the
9.6 TB module.
For more information about usable and effective capacities, see 3.1.3, “Internal storage
considerations” on page 199.
IBM FlashSystem supports SCM drives that are built on two different technologies:
3D XPoint technology, which was developed by Intel and Micron (Intel Optane
drives)
zNAND technology from Samsung (Samsung zSSD)
SCM drives have their own technology type and drive class in an IBM Storage Virtualize
configuration. They cannot intermix in the same array with standard NVMe or SAS drives.
Due to their speed, SCM drives are placed in a new top tier, which is ranked higher than
existing tier0_flash that is used for NVMe NAND drives.
When using Easy Tier, think about maximizing the capacity to get the most benefits (unless
the working set is small).
SCM with Easy Tier reduces latency, and in some cases improves input/output operations per
second (IOPS). If you want the benefits of SCM across all your capacity, then Easy Tier will
continually automatically move the hottest data onto the SCM tier and leave the rest of the
data on the lower tiers. This action can benefit DRPs when the metadata is moved to the
SCM drives.
If you have a particular workload that requires the best performance and lowest latency and it
fits into the limited SCM capacity that is available, then use SCM as a separate pool and pick
which workloads use that pool.
IBM FlashSystem 5015 and 5035 control enclosures have twelve 3.5-inch large form factor
(LFF) or twenty-four 2.5-inch small form factor (SFF) SAS drive slots. They can be scaled up
by connecting SAS expansion enclosures.
A single IBM FlashSystem 5100, 5200, 7200, 9100, or 9200 control enclosure supports the
attachment of up to 20 expansion enclosures with a maximum of 760 drives (748 drives for
IBM FlashSystem 5200), including NVMe drives in the control enclosure. By clustering control
enclosures, the size of the system can be increased to a maximum of 1520 drives for
IBM FlashSystem 5100, 2992 drives for IBM FlashSystem 5200, and 3040 drives for
IBM FlashSystem 7200, 9100, and 9200.
A single IBM FlashSystem 9500 control enclosure can support up to three IBM FlashSystem
9000 SFF expansion enclosures or one IBM FlashSystem 9000 LFF HD expansion enclosure
for a combined maximum of 232 NVMe and SAS drives per system. Intermixing of expansion
enclosures in a system is supported.
A single IBM FlashSystem 7300 Model 924 control enclosure can support up to 10
IBM FlashSystem 7000 expansion enclosures with a maximum of 440 drives per system.
IBM FlashSystem 7300 systems can be clustered to help deliver greater performance,
bandwidth, and scalability. A clustered IBM FlashSystem 7300 system can contain up to four
IBM FlashSystem 7300 systems and up to 1,760 drives. Maximum drives per control
enclosure:
IBM FlashSystem 5015 control enclosure supports up to 10 expansions and 392 drives
maximum.
Expansion enclosures are dynamically added without downtime, helping to quickly and
seamlessly respond to growing capacity demands.
The following types of SAS-attached expansion enclosures are available for the
IBM FlashSystem family:
2U 19-inch rack mount SFF expansion with 24 slots for 2.5-inch drives
2U 19-inch rack mount LFF expansion with 12 slots for 3.5-inch drives (not available for
IBM FlashSystem 9x00)
5U 19-inch rack mount LFF high-density expansion enclosure with 92 slots for 3.5-inch
drives
Different expansion enclosure types can be attached to a single control enclosure and
intermixed with each other.
Note: Intermixing expansion enclosures with machine type (MT) 2077 and MT 2078 is not
allowed.
IBM FlashSystem 5035, 5045, 5100, 5200, 7200,7300, 9100, 9200, and 9500 control
enclosures have two SAS chains for attaching expansion enclosures. Keep both SAS chains
equally loaded. For example, when attaching ten 2U enclosures, connect half of them to chain
1 and the other half to chain 2.
The number of drive slots per SAS chain is limited to 368. To achieve this goal, you need four
5U high-density enclosures. Table 3-5, Table 3-6 on page 198 and Table 3-7 on page 198 list
the maximum number of drives that are allowed when different enclosures are attached and
intermixed. For example, if three 5U enclosures of an IBM FlashSystem 9200 system are
attached to a chain, you cannot connect more than two 2U enclosures to the same chain, and
you get 324 drive slots as the result.
Table 3-5 Maximum drive slots per SAS expansion chain for IBM FlashSystem
5U expansions   2U expansions: 0    1    2    3    4    5    6    7    8    9    10
4                              368  --   --   --   --   --   --   --   --   --   --
Table 3-6 Maximum drive slots per SAS expansion chain for IBM FlashSystem 9500
5U expansions   2U expansions: 0    1    2    3
0                              0    24   48   72
1                              92   -    -    -
Table 3-7 Maximum drive slots per SAS expansion chain for IBM FlashSystem 7300
5U expansions   2U expansions: 0    1    2    3    4    5
0                              0    24   48   72   96   120
1                              92   116  140  -    -    -
2                              184  -    -    -    -    -
Table 3-8 Maximum drive slots per SAS expansion chain for IBM FlashSystem 5045
5U expansions   2U expansions: 0    1    2    3    4    5    6
0                              0    24   48   72   96   120  144
1                              92   116  --   --   --   --   --
2                              184  208  --   --   --   --   --
IBM FlashSystem 5015, 5035 and 5045 node canisters have on-board SAS ports for
expansions. IBM FlashSystem 5100, 5200, 7200, 9100, and 9200 need a 12 Gb SAS
interface card to be installed in both nodes of a control enclosure to attach SAS expansions.
For best performance results, plan to operate your storage system with ~85% or less physical
capacity used. Flash drives depend on free pages being available to process new write
operations and to quickly process garbage collection.
Without some level of free space, the internal operations to maintain drive health and host
requests might over-work the drive, which causes the software to proactively fail the drive, or
a hard failure might occur in the form of the drive becoming write-protected (zero free space
left). The free space helps in rebuild scenarios where the drives have plenty of room to get
background pre-erase workloads done as data is written to the drives and general write
amplification occurs.
If you are using data reduction, then regardless of the technology that you choose, it is a best
practice to keep the system below ~85% to allow it to respond to sudden changes in the rate
of data reduction (such as host encryption being enabled). Also, as you run the system close
to full, the garbage-collection function is working harder while new writes are processed,
which might slow down the system and increase latency to the host.
Note: For more information about physical flash provisioning, see this IBM Support web
page.
Intermix rules
Drives of the same form factor and connector type can be intermixed within an expansion
enclosure.
For systems that support NVMe drives, NVMe and SAS drives can be intermixed in the same
system. However, NVMe drives can exist only in the control enclosure, and SAS drives can
exist only in SAS expansion enclosures.
Within an NVMe control enclosure, NVMe drives of different types and capacities can be
intermixed. Industry-standard NVMe drives and SCMs can be intermixed with FCMs.
For more information about rules for mixing different drives in a single DRAID array, see
“Drive intermix rules” on page 209.
Formatting
Drives and FCMs must be formatted before they can be used. The format that you use is
important because when an array is created, its members must have zero used capacity.
Drives automatically are formatted when they become a candidate.
An FCM is expected to format in under 70 seconds. Formatting an SCM drive takes much
longer than an FCM or industry-standard NVMe drive. On Intel Optane, drive formatting can
take 15 minutes.
While a drive is formatting, it appears as an offline candidate. If you attempt to create an array
before formatting is complete, the create command is delayed until all formatting is done.
After formatting is done, the command completes.
If a drive fails to format, it goes offline. If so, a manual format is required to bring it back online.
The command-line interface (CLI) scenario is shown in Example 3-1.
Securely erasing
All SCM, FCM, and industry-standard NVMe drives that are used in the system are
self-encrypting. For SAS drives, encryption is performed by an SAS chip in the control
enclosure.
For industry-standard NVMe drives, SCMs, and FCMs, formatting the drive completes a
cryptographic erase of the drive. After the erasure, the original data on that device becomes
inaccessible and cannot be reconstructed.
To securely erase a SAS or NVMe drive, use the chdrive -task erase <drive_id> command.
The methods and commands that are used to securely delete data from drives enable the
system to be used in compliance with European Regulation EU2019/424.
capacity 20.0TB
...
write_endurance_used 0
write_endurance_usage_rate
replacement_date
transport_protocol nvme
compressed yes
physical_capacity 4.36TB
physical_used_capacity 138.22MB
effective_used_capacity 3.60GB
Both examples show the same 4.8 TB FCM with a maximum effective capacity of 20 TiB (or
21.99 TB).
To calculate the actual CR, divide the effective used capacity by the physical used capacity.
Here, we have 3.60/0.134 = 26.7, so written data is compressed 26.7:1 (highly compressible).
Physical used capacity is expected to be nearly the same on all modules in one array.
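The ratio can be reproduced with a short calculation. The following Python sketch is illustrative only; the input values mirror the effective_used_capacity and physical_used_capacity fields in the output above, and the conversion assumes binary (GiB and MiB) units.

def compression_ratio(effective_used_gib: float, physical_used_mib: float) -> float:
    """Compression ratio = effective used capacity / physical used capacity."""
    physical_used_gib = physical_used_mib / 1024  # MiB -> GiB
    return effective_used_gib / physical_used_gib

# Values from the output above: 3.60 GB effective, 138.22 MB physical.
print(f"{compression_ratio(3.60, 138.22):.1f}:1")  # roughly 26.7:1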
When FCMs are used, data CRs should be thoroughly planned and monitored.
If highly compressible data is written to an FCM, it still becomes full when it reaches the
maximum effective capacity. Any spare data space remaining is used to improve the
performance of the module and extend the wear-leveling.
Example: A total of 20 TiB of data that is compressible 10:1 is written to a 4.8 TB module.
The maximum effective capacity of the module is 21.99 TB, which equals 20 TiB.
After 20 TiB of data is written, the module is 100% full for the array because it has no free
effective (logical) capacity. At the same time, the data uses only 2 TiB of the physical
capacity. The remaining 2.36 TiB cannot be used for host writes, only for drive internal
tasks and to improve the module’s performance.
If non-compressible or low-compressible data is written, the module fills until the maximum
physical capacity is reached.
Example: A total of 20 TiB of data that is compressible about 1.2:1 is written to a 19.2 TB
module. The module’s maximum effective capacity is 43.99 TB, which equals 40 TiB. The
module’s usable capacity is 19.2 TB = 17.46 TiB.
After 20 TiB is written, only 50% of the effective capacity is used. However, with 1.2:1
compression, the data occupies 16.7 TiB of physical capacity, which makes the module
physically 95% full and potentially affects the module’s performance.
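Both scenarios follow the same logic: an FCM is full when either its effective (logical) capacity or its physical capacity is exhausted, whichever comes first. The following Python sketch illustrates that logic; the capacities and compression ratios are the assumed figures from the examples above, not values read from a system.

def fcm_fill_levels(written_tib: float, ratio: float,
                    effective_cap_tib: float, physical_cap_tib: float) -> dict:
    """Return the effective and physical fill levels (percent) of an FCM.

    written_tib       -- host data written (TiB)
    ratio             -- data compression ratio (for example, 10 for 10:1)
    effective_cap_tib -- maximum effective (logical) capacity of the module
    physical_cap_tib  -- usable physical capacity of the module
    """
    effective_pct = written_tib / effective_cap_tib * 100
    physical_pct = (written_tib / ratio) / physical_cap_tib * 100
    return {"effective_full_pct": round(effective_pct, 1),
            "physical_full_pct": round(physical_pct, 1)}

# 4.8 TB module (4.36 TiB usable, 20 TiB effective), 20 TiB written at 10:1:
print(fcm_fill_levels(20, 10, 20, 4.36))    # effective 100%, physical ~46%
# 19.2 TB module (17.46 TiB usable, 40 TiB effective), 20 TiB written at 1.2:1:
print(fcm_fill_levels(20, 1.2, 40, 17.46))  # effective 50%, physical ~95%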
Pool-level and array-level warnings can be set to alert and prevent compressed drive overfill.
If the drive write workload is continuously higher than the specified DWPD, the system alerts
that the drive is wearing faster than expected. Because DWPD is accounted for during system
sizing, this alert usually means that the workload on the array differs from what was expected
and must be revised.
DWPD numbers are important with SSD drives of smaller sizes. With drive capacities below
1 TB, it is possible to write the total capacity of a drive several times a day. When a single
SSD provides tens of terabytes, it is unlikely that you can overrun the DWPD measurement.
Example: A 3.84 TB read-intensive (RI) SAS drive is rated for 1 DWPD, which
means 3,840,000 MB of data can be written on it each day. Each day has 24x60x60 =
86400 seconds, so 3840000/86400 = 44.4 MBps of average daily write workload is
required to reach 1 DWPD.
Total cumulative writes over a 5-year period are 3.84 x 1 DWPD x 365 x 5 = 6.8 PB.
FCM2 /FCM3 drives are rated with two DWPD over five years, which is measured in
usable capacity. Therefore, if the data is compressible (for example, 2:1), the DWPD
doubles.
Example: A 19.2 TB FCM is rated for 2 DWPD. Its effective capacity is nearly
44 TB = 40 TiB, so if you use 2.3:1 compression, to reach the DWPD limit, the average
daily workload over 5 years must be around 1 GBps. Total cumulative writes over a
5-year period are more than 140 PB.
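The arithmetic behind these examples can be summarized in the following Python sketch, which is illustrative only. It assumes that the DWPD rating applies to the usable (decimal TB) capacity and that, for FCMs, compressible data effectively multiplies the allowed host write rate, as described above.

def dwpd_write_rate_mbps(capacity_tb: float, dwpd: float, compression: float = 1.0) -> float:
    """Average host write rate (MBps) that is needed to reach the DWPD rating.

    capacity_tb -- usable drive capacity in decimal TB
    dwpd        -- rated drive writes per day
    compression -- data compression ratio (FCM only); 1.0 for plain SSDs
    """
    mb_per_day = capacity_tb * 1_000_000 * dwpd * compression
    return mb_per_day / 86_400  # seconds per day

# 3.84 TB RI SAS SSD at 1 DWPD: ~44 MBps sustained, all day, every day.
print(f"{dwpd_write_rate_mbps(3.84, 1):.1f} MBps")
# 19.2 TB FCM at 2 DWPD with 2.3:1 compressible data: roughly 1 GBps.
print(f"{dwpd_write_rate_mbps(19.2, 2, 2.3) / 1000:.2f} GBps")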
The system monitors the number of writes for each drive that supports the DWPD parameter,
and it logs a warning event if this amount is above the DWPD rating for the specific drive type.
It is acceptable for the write endurance usage rate to raise warnings, which indicate that the
write data rate exceeds the expected threshold for the drive type, during the initial phase of
system implementation or during stress testing. Afterward, when the system’s workload has
stabilized, the system recalculates the usage rate and removes the warnings. The calculation is
based on a long-run average, so it can take up to one month for the warnings to be
automatically cleared.
Cumulative writes that are based on possible DWPD numbers are listed in Table 3-9. It
provides an overview of the total cumulative writes over a 5-year period with various DWPD
numbers.
To understand the value behind FCMs, consider the following comparison of three drives that
share similar-sized physical capacities:
A 3.84 TB NVMe SSD and a 4.8 TB FCM. Because of the no-penalty compression that is
built into the FCM3, it delivers up to 3.75X the cumulative capacity.
A 7.68 TB NVMe SSD and a 9.6 TB FCM. Once again, the built-in compression means
that the FCM3 delivers 3.75X the cumulative capacity.
A 15.36 TB NVMe SSD and a 19.2 TB FCM. With FCM3, we achieve about 3.75X the
cumulative capacity of NVMe SSDs.
So, the DWPD measurement is largely irrelevant for FCMs and large SSDs.
For best practices and DRAID configuration, see 3.2.2, “Array considerations” on page 206.
Note: IBM SV2, SA2, and SV3 nodes do not support internal storage.
3.2 Arrays
To use internal IBM Storage Virtualize drives in storage pools and provision their capacity to
hosts, the drives must be joined in RAID arrays to form array-type managed disks (MDisks).
Table 3-10 lists the RAID type and level support on different IBM Storage Virtualize systems.
The table columns are: IBM FlashSystem Family, RAID 0, RAID 1/10, RAID 5, RAID 6,
DRAID 1/10, DRAID 5, and DRAID 6.
NVMe FCMs that are installed in an IBM Storage Virtualize system can be aggregated into
DRAID 6, DRAID 5, or DRAID 1. Traditional RAID (TRAID) levels are not supported on FCMs.
Some limited RAID configurations do not allow large drives. For example, DRAID 5 cannot be
created with any drive type if drive capacities are equal to or greater than 8 TB. Creating such arrays
is blocked intentionally to prevent long rebuilding times.
As drive capacities increase, the rebuild time that is required after a drive failure increases
significantly. Together with the fact that with larger capacities the chance for a previously
unreported (and uncorrectable) medium error increases, customers that configure DRAID 5
arrays on newer platforms or products or with newer drive models are more likely to have a
second drive failure or a medium error that is found during rebuild, which would result in an
unwanted Customer Impact Event (CIE), and potentially a data loss.
Table 3-11 lists the supported drives, array types, and RAID levels.
RAID level
Consider the following points when determining which RAID level to use:
DRAID 6 is recommended for all arrays with more than six drives.
TRAID levels 5 and 6 are not supported on the current generation of IBM FlashSystem
because DRAID is superior to the TRAID levels.
For most use cases, DRAID 5 has no performance advantage compared to DRAID 6. At
the same time, DRAID 6 offers protection from the second drive failure, which is vital
because rebuild times are increasing together with the drive size. Because DRAID 6 offers
the same performance level but provides more data protection, it is the top
recommendation.
On platforms that support DRAID 1, DRAID 1 is the recommended RAID level for arrays
that consist of two or three drives.
DRAID 1 has a mirrored geometry. It consists of mirrors of two strips, which are exact
copies of each other. These mirrors are distributed across all array members.
For arrays with four or five members, it is a best practice to use DRAID 1 or DRAID 5, with
preference to DRAID 1 where it is available.
DRAID 5 provides a capacity advantage over DRAID 1 with same number of drives, at the
cost of performance. Particularly during rebuild, the performance of a DRAID 5 array is
worse than a DRAID 1 array with the same number of drives.
For arrays with six members, the choice is between DRAID 1 and DRAID 6.
On platforms that support DRAID 1, do not use TRAID 1 or RAID 10 because they do not
perform as well as the DRAID type.
On platforms that do not support DRAID 1, the recommended RAID level for NVMe SCM
drives is TRAID 10 for arrays of two drives, and DRAID 5 for arrays of four or five drives.
RAID configurations that differ from the recommendations that are listed here are not
available with the system GUI. If the wanted configuration is supported but differs from
these recommendations, arrays of required RAID levels can be created by using the
system CLI.
Notes:
DRAID 1 arrays are supported only for pools with extent sizes of 1024 MiB or
greater.
IBM Storage Virtualize 8.5 does not allow more than a single DRAID array that is
made of compressing drives (for example, FCM) in the same storage pool (MDisk
group).
Starting with IBM Storage Virtualize 8.5, shared pool bitmap memory for DRAID
arrays is adjusted automatically on creation or deletion of arrays. You do not need to
use the chiogrp command before running mkdistributedarray to create a
distributed array and add it to a storage pool.
Support for 48 NVMe drives per DRAID 6 array: For IBM FlashSystem 9500, there is
now enhanced support for 48 NVMe drives in the enclosure by using DRAID 6
technology. The system must be upgraded to 8.5.2 or a later release to create
distributed RAID arrays of more than 24 x 38.4 TB FlashCore Modules (for both one
or more arrays) in the same control enclosure. The following configurations are
supported:
– DRAID 6 arrays of NVMe drives support expansion up to 48 member drives,
including up to four distributed rebuild areas.
– DRAID 6 arrays of FCM NVMe drives support expansion up to 48 member
drives, including one distributed rebuild area.
– At the time of writing, DRAID 6 arrays of extra large (38.4 TB) physical capacity
FCM NVMe drives support up to 24 member drives, including one distributed
rebuild area.
RAID geometry
Consider the following points when determining your RAID geometry:
Data, parity, and spare space must be striped across the number of devices that is
available. The higher the number of devices, the lower the percentage of overall capacity
the spare and parity devices consume, and the more bandwidth that is available during
rebuild operations.
Fewer devices are acceptable for smaller capacity systems that do not have a
high-performance requirement, but solutions with a few large drives should be avoided.
Sizing tools must be used to understand performance and capacity requirements.
DRAID code makes full use of the multi-core environment, so splitting the same number of
drives into multiple DRAID arrays does not bring performance benefits compared to a
single DRAID array with the same number of drives. Maximum system performance can
be achieved from a single DRAID array. Recommendations that were given for TRAID, for
example, to create four or eight arrays to spread load across multiple CPU threads, do not
apply to DRAID.
Consider the following guidelines to achieve the best rebuild performance in a DRAID
array:
– For FCMs and industry-standard NVMe drives, the optimal number of drives in an array
is 16 - 24. This limit ensures a balance between performance, rebuild times, and
usable capacity. An array of NVMe drives cannot have more than 24 members.
Notes:
The optimal number of drives in an IBM FlashSystem 9500 is 16 - 48.
IBM FlashSystem 9500 machine type 4666 delivers 48 drives in a single 4U form
factor.
– For SAS HDDs, configure at least 40 drives to the array rather than create many
DRAID arrays with much fewer drives in each one to achieve the best rebuild times. A
typical best benefit is approximately 48 - 64 HDD drives in a single DRAID 6.
– For SAS SSD drives, the optimal array size is 24 - 36 drives per DRAID 6 array.
– For SCM, the maximum number of drives in an array is 12.
Distributed spare capacity, or rebuild areas, are configured with the following guidelines:
– DRAID 1 with two members: The only DRAID type that is allowed to not have spare
capacity (zero rebuild areas).
– DRAID 1 with 3 - 16 members: The array must have only one rebuild area.
– DRAID 5 or 6: The minimum recommendation is one rebuild area every 36 drives. One
rebuild area per 24 drives is optimal.
– Arrays with FCM drives cannot have more than one rebuild area per array.
The DRAID stripe width is set during array creation and indicates the width of a single unit
of redundancy within a distributed set of drives. Reducing the stripe width does not enable
the array to tolerate more failed drives. DRAID 6 does not get more redundancy than is
determined for DRAID 6 regardless of the width of a single redundancy unit.
A reduced width increases capacity overhead, but also increases rebuild speed because
there is a smaller amount of data that the RAID must read to reconstruct the missing data.
For example, a rebuild on a DRAID with a 14+P+Q geometry (width = 16) would be slower
or have a higher write penalty than a rebuild on a DRAID with the same number of drives
but with a 3+P+Q geometry (width = 5). In return, usable capacity for an array with a width
= 5 is smaller than for an array with a width = 16.
The default stripe width settings (12 for DRAID 6) provide an optimal balance between
those parameters.
The array strip size must be 256 KiB. With IBM Storage Virtualize code releases before
8.4.x, it was possible to choose 128 - 256 KiB if the DRAID member drive size was below
4 TB. From 8.4.x and later, you can create arrays with only a 256 KiB strip size.
Arrays that were created on previous code levels with a strip size 128 KiB are fully
supported.
The stripe width and strip size (both) determine the Full Stride Write (FSW) size. With
FSW, data does not need to be read in a stride, so the RAID I/O penalty is greatly
reduced.
For better performance, it is often said that you should set the host file system block size to
the same value as the FSW size or a multiple of the FSW stripe size. However, the
IBM Storage Virtualize cache is designed to perform FSW whenever possible, so no
difference is noticed in the performance of the host in most scenarios.
For fine-tuning for maximum performance, adjust the stripe width or host file system block
size to match each other. For example, for a 2 MiB host file system block size, the best
performance is achieved with an 8+P+Q DRAID 6 array (eight data strips x 256 KiB strip
size, with an array stripe width = 10).
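For reference, the FSW size follows directly from the geometry: the number of data strips (the stripe width minus the two parity strips for DRAID 6) multiplied by the strip size. The following Python sketch, which is an illustration only, reproduces the 2 MiB example above.

def fsw_size_kib(stripe_width: int, strip_size_kib: int = 256, parity_strips: int = 2) -> int:
    """Full Stride Write size in KiB for a DRAID 6 geometry.

    stripe_width   -- width of one redundancy unit (data plus parity strips)
    strip_size_kib -- strip size in KiB (256 KiB on current code levels)
    parity_strips  -- 2 for DRAID 6
    """
    return (stripe_width - parity_strips) * strip_size_kib

# An 8+P+Q geometry (stripe width = 10) gives an FSW of 2048 KiB = 2 MiB,
# which matches a 2 MiB host file system block size.
print(fsw_size_kib(10))  # 2048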
A DRAID distinguishes non-critical and critical rebuilds. If a single drive fails in a DRAID 6
array, the array still has redundancy, and a rebuild is performed with limited throughput to
minimize the effect of the rebuild workload to an array’s performance.
If an array has no more redundancy (which resulted from a single drive failure in DRAID 5 or
double drive failure in DRAID 6), a critical rebuild is performed. The goal of a critical rebuild is
to recover redundancy as fast as possible. A critical rebuild is expected to perform nearly
twice as fast as a non-critical one.
When a failed drive that was an array member is replaced, the system includes it back to an
array. For this process, the drive must be formatted first, which might take some time for an
FCM or SCM.
If the drive was encrypted in another array, it comes up as failed because this system does
not have the required keys. The drive must be manually formatted to make it a candidate.
Note: An FCM drive that is a member of a RAID array must not be reseated unless you are
directly advised to do so by IBM Support. Reseating FCM drives that are still in use by an
array can cause unwanted consequences.
RAID expansion
Consider the following points for RAID expansion:
You can expand distributed arrays to increase the available capacity. As part of the
expansion, the system automatically migrates data for optimal performance for the new
expanded configuration. Expansion is non-disruptive and compatible with other functions,
such as IBM Easy Tier and data migrations.
New drives are integrated and data is restriped to maintain the algorithm placement of
stripes across the existing and new components. Each stripe is handled in turn, that is, the
data in the existing stripe is redistributed to ensure the DRAID protection across the new
larger set of component drives.
Only the number of member drives and rebuild areas can be increased. RAID level and
RAID stripe width stay as they were set during array creation. If you want to change the
stripe width for better capacity efficiency, you must create an array, migrate the data, and
then expand the array after deleting the original array.
The RAID-member count cannot be decreased. It is not possible to shrink an array.
DRAID 5, DRAID 6, and DRAID 1 can be expanded. TRAID arrays do not support
expansion.
Only one expansion process can run on an array at a time. During a single expansion, up to
12 drives can be added.
Only one expansion per storage pool is allowed, with a maximum of four per system.
Once expansion is started, it cannot be canceled. You can only wait for it to complete or
delete an array.
As the array capacity increases, it becomes available to the pool as expansion progresses.
There is no need to wait for expansion to be 100% complete because added capacity can
be used while expansion is still in progress.
When you expand an FCM array, the physical capacity is not immediately available, and
the availability of new physical capacity does not track with logical expansion progress.
Array expansion is a process that is designed to run in the background. It can take a
significant amount of time.
Array expansion can affect host performance and latency, especially when expanding an
array of HDDs. Do not expand an array when the array has over 50% load. If you do not
reduce the host I/O load, the amount of time that is needed to complete the expansion
increases greatly.
Array expansion is not possible when an array is in write-protected mode because it is full
(out of physical capacity). Any capacity issues must be resolved first.
Creating a separate array can be an alternative for DRAID expansion.
For example, if you have a DRAID 6 array of 40 NL-SAS drives and you have 24 new
drives of the same type, the following options are available:
– Perform two DRAID expansions by adding 12 drives in one turn. With this approach,
the configuration is one array of 64 drives; however, the expansion process might take
a few weeks for large capacity drives. During that time, the host workload must be
limited, which can be unacceptable.
– Create a separate 24-drive DRAID 6 array and add it to the same pool as a 40-drive
array. The result is that you get two DRAID 6 arrays with different performance
capabilities, which is suboptimal. However, the back-end performance-aware cache
and Easy Tier balancing can compensate for this flaw.
RAID capacity
Consider the following points when determining RAID capacity:
If you are planning only your configuration, use the IBM Storage Modeler tool, which is
available for IBM Business Partners.
If your system is deployed, you can use the lspotentialarraysize CLI command to
determine the capacity of a potential array for a specified drive count, drive class, and
RAID level in the specified pool.
To get the approximate amount of available space in a DRAID 6 array, use the following
formula:
Array Capacity = D / ((W * 256) + 16) * ((N - S) * (W - 2) * 256)
D Drive capacity
N Drive count
S Rebuild areas (spare count)
W Stripe width
Example: For the capacity of a DRAID 6 array of sixteen 9.6 TB FCMs, use the
following values:
D = 9.6 TB = 8.7 TiB
N = 16
S=1
W = 12
Array capacity = 8.7 TiB / ((12*256)+16) * ((16-1) * (12-2) * 256) = 8.7 TiB / 3088 *
38400 = 108.2 TiB
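The same approximation can be scripted to compare geometries quickly. The following Python sketch simply implements the formula above; it is not a replacement for the lspotentialarraysize command or the IBM Storage Modeler tool.

def draid6_capacity_tib(drive_tib: float, drive_count: int,
                        rebuild_areas: int, stripe_width: int) -> float:
    """Approximate usable capacity (TiB) of a DRAID 6 array.

    Implements: D / ((W * 256) + 16) * ((N - S) * (W - 2) * 256)
    """
    d, n, s, w = drive_tib, drive_count, rebuild_areas, stripe_width
    return d / ((w * 256) + 16) * ((n - s) * (w - 2) * 256)

# Sixteen 9.6 TB (8.7 TiB) FCMs, one rebuild area, default stripe width of 12:
print(f"{draid6_capacity_tib(8.7, 16, 1, 12):.1f} TiB")  # ~108.2 TiB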
To minimize the risk of an out-of-space condition, ensure that the following tasks are done:
The data CR is known and accounted for when planning an array’s physical and effective
capacity.
Monitor the array’s free space and avoid filling it up with more than 85% of physical
capacity.
To monitor arrays, use IBM Spectrum Control or IBM Storage Insights with configurable
alerts. For more information, see Chapter 9, “Implementing a storage monitoring system”
on page 551.
The IBM Storage Virtualize Storage GUI and CLI displays the used and available effective
and physical capacities. For examples, see Figure 3-2 and Example 3-3.
If the used physical capacity of the array reaches 99%, IBM Storage Virtualize raises event
ID 1241: 1% physical space left for compressed array. This event is a call for
immediate action.
To prevent running out of space, one or a combination of the following corrective actions
must be taken:
– Add storage to the pool and wait while data is balanced between the arrays by Easy
Tier.
– Migrate volumes with extents on the MDisk that is running low on physical space to
another storage pool or migrate extents from the array that is running low on physical
space to other MDisks that have sufficient extents.
– Delete or migrate data from the volumes by using a host that supports UNMAP
commands. IBM Storage Virtualize Storage system issues UNMAP to the array and
space is released.
For more information about out-of-space recovery, see this IBM Support web page.
Arrays are most in danger of running out of space during a rebuild or when they are
degraded. DRAID spare capacity, which is distributed across array drives, remains free
during normal DRAID operation, thus reducing overall drive fullness. If the array capacity
is 85% full, each array FCM is used for less than that because of the spare space reserve.
When a DRAID is rebuilding, this space is used.
After the rebuild completes, the extra space is filled and the drives might be truly full,
resulting in high levels of write amplification and degraded performance. In the worst case
(for example, if the array is more than 99% full before rebuild starts), there is a chance that
the rebuild might cause a physical out-of-space condition.
This section covers aspects of planning and managing external storage that is virtualized by
IBM Storage Virtualize.
External back-end storage can be connected to IBM Storage Virtualize with FC (SCSI) or
iSCSI. NVMe-FC back-end attachment is not supported because it provides no performance
benefits for IBM Storage Virtualize. For more information, see “The NVMe protocol” on
page 190.
On IBM FlashSystem 5010 and 5030 and IBM FlashSystem 5015, 5035 and 5045,
virtualization is allowed only for data migration. Therefore, these systems can be used to
externally virtualize storage as an image mode device for the purposes of data migration, not
for long-term virtualization.
An MDisk path that is presented to the storage system for all system nodes must meet the
following criteria:
The system node is a member of a storage system.
The system node has FC or iSCSI connections to the storage system port.
The system node has successfully discovered the LU.
The port selection process has not caused the system node to exclude access to the
MDisk through the storage system port.
When the IBM Storage Virtualize node canisters select a set of ports to access the storage
system, the two types of path selection that are described in the next sections are supported
to access the MDisks. A type of path selection is determined by external system type and
cannot be changed.
To determine which algorithm is used for a specific back-end system, see IBM System
Storage Interoperation Center (SSIC), as shown in Figure 3-3.
With a round-robin compatible storage controller, there is no need to create as many volumes
as there are storage FC ports anymore. Every volume and MDisk uses all the available
IBM Storage Virtualize ports.
Additionally, the round-robin path selection improves resilience to certain storage system
failures. For example, if one of the back-end storage system FC ports has performance
problems, the I/O to MDisks is sent through other ports. Moreover, because I/Os to MDisks
are sent through all back-end storage FC ports, the port failure can be detected more quickly.
Best practice: If you have a storage system that supports the round-robin path algorithm,
you should zone as many FC ports as possible from the back-end storage controller.
IBM Storage Virtualize supports up to 16 FC ports per storage controller. For FC port
connection and zoning guidelines, see your storage system documentation.
Example 3-4 shows a storage controller that supports round-robin path selection.
With storage subsystems that use active-passive type systems, IBM Storage Virtualize
accesses an MDisk LU through one of the ports on the preferred controller. To best use the
back-end storage, make sure that the number of LUs that is created is a multiple of the
connected FC ports and aggregate all LUs to a single MDisk group.
Example 3-5 shows a storage controller that supports the MDisk group balanced path
selection.
Example 3-5 MDisk group balanced path selection (no round-robin enabled) storage controller
IBM_IBM FlashSystem:IBM FlashSystem 9100-ITSO:superuser>lsmdisk 5
id 5
name mdisk5
...
preferred_WWPN
active_WWPN 20110002AC00C202 <<< indicates MDisk group balancing
If your back-end system has homogeneous storage, create the required number of RAID
arrays (RAID 6 or RAID 10 are recommended) with an equal number of drives. The type and
geometry of an array depends on the back-end controller vendor’s recommendations. If your
back-end controller can spread the load stripe across multiple arrays in a resource pool (for
example, by striping), create a single pool and add all the arrays there.
On back-end systems with mixed drives, create a separate resource pool for each type of
drive (HDD or SSD). Keep the drive type in mind because you must assign the correct tier for
an MDisk when it is used by IBM Storage Virtualize.
Create a set of fully allocated logical volumes from the back-end system storage pool (or
pools). Each volume is detected as an MDisk on IBM Storage Virtualize. The number of
logical volumes to create depends on the type of drives that are used by your back-end
controller.
For optimal performance, HDDs need 8 - 10 concurrent I/O at the device, which does not
change with drive rotation speed. Make sure that, in a highly loaded system, any given
IBM Storage Virtualize MDisk can queue approximately eight I/Os per back-end system
drive.
The IBM Storage Virtualize queue depth per MDisk is approximately 60. The exact maximum
on a real system might vary depending on the circumstances. However, for this calculation, it
does not matter.
The queue depth per MDisk leads to the HDD Rule of 8. According to this rule, to
achieve eight I/Os per drive with a queue depth of 60 per MDisk from IBM Storage Virtualize, a
back-end array of 60/8 = 7.5 (approximately eight) physical drives is optimal, or one logical
volume for every eight drives in an array.
Example: The back-end controller to be virtualized is an IBM Storwize V5030 with 64 nearline
serial-attached SCSI (NL-SAS) 8 TB drives. Following the Rule of 8, create 64 / 8 = 8 logical
volumes, which IBM Storage Virtualize detects as eight MDisks.
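The Rule of 8 can be expressed as a small helper. The following Python sketch is illustrative only and assumes the approximate queue depth of 60 per MDisk and the target of eight concurrent I/Os per HDD that are described above.

def hdd_logical_volume_count(drive_count: int,
                             mdisk_queue_depth: int = 60,
                             ios_per_drive: int = 8) -> int:
    """Number of back-end logical volumes (MDisks) to create for an HDD array,
    based on the Rule of 8: roughly one volume per eight drives."""
    drives_per_mdisk = round(mdisk_queue_depth / ios_per_drive)  # 60 / 8 = 7.5, about 8
    return max(1, drive_count // drives_per_mdisk)

# 64 NL-SAS drives behind a Storwize V5030: create 8 logical volumes (MDisks).
print(hdd_logical_volume_count(64))  # 8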
For all-flash back-end arrays, a best practice is to create at least 32 logical volumes from the
array capacity to keep the queue depths high enough and spread the work across the
virtualizer resources.
For IBM FlashSystem 9500 with the capacity and performance that it provides, you should
consider creating 64 logical volumes from the array capacity.
For smaller setups with a low number of SSDs, this number can be reduced to 16 logical
volumes (which results in 16 MDisks) or even eight volumes.
With high-end controllers, queue depth per MDisk can be calculated by using the following
formula:
Q = ((P x C) / N) / M
Q Calculated queue depth for each MDisk.
P Number of back-end controller host ports (unique worldwide port names (WWPNs))
that are zoned to IBM FlashSystem (minimum is 2 and maximum is 16).
C Maximum queue depth per WWPN, which is 1000 for controllers, such as XIV Gen3
or DS8000.
N Number of nodes in the IBM Storage Virtualize cluster (2, 4, 6, or 8).
M Number of volumes that are presented by a back-end controller and detected as
MDisks.
For a result of Q = 60, calculate the number of volumes that is needed to create as M = (P x
C) / (N x Q), which can be simplified to M = (16 x P) / N.
Example: A 4-node IBM FlashSystem 9200 is used with 12 host ports on the IBM XIV
Gen3 System.
By using the previous formula, we must create M = (16 x 12) / 4 = 48 volumes on the
IBM XIV Gen3 to obtain a balanced high-performing configuration.
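As a cross-check of this calculation, the formula can be written as a short Python sketch. The per-WWPN queue depth of 1000 and the target queue depth of 60 per MDisk are the values given above; the function name is only illustrative.

# M = (P x C) / (N x Q): back-end volumes needed for a target queue depth per MDisk.
def volumes_to_create(ports: int, nodes: int,
                      queue_per_wwpn: int = 1000,
                      target_q_per_mdisk: int = 60) -> float:
    return (ports * queue_per_wwpn) / (nodes * target_q_per_mdisk)

# 4-node cluster with 12 XIV Gen3 host ports
print(volumes_to_create(ports=12, nodes=4))  # 50.0; the simplified rule 16 x 12 / 4 gives 48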
The implementation steps for thin-provisioned MDisks are the same as for fully allocated
storage controllers. Extreme caution should be used when planning capacity for such
configurations.
The nominal capacity from a compression- and deduplication-enabled storage system is not
fixed; it varies based on the nature of the data. Always use a conservative data reduction ratio
for the initial configuration.
Using an unsuitable (too optimistic) ratio for capacity assignment can cause an out-of-space situation. If the
MDisks do not provide enough capacity, IBM Storage Virtualize disables access to all the
volumes in the storage pool.
Consider the following example, in which a 5:1 data reduction ratio is assumed but the data achieves only 3:1:
Physical Capacity: 20 TB.
Calculated capacity: 20 TB x 5 = 100 TB.
The volume that is assigned from the compression- or deduplication-enabled storage
subsystem to IBM Storage Virtualize or IBM FlashSystem is 100 TB.
Real usable capacity: 20 TB x 3 = 60 TB.
If the hosts attempt to write more than 60 TB data to the storage pool, the storage
subsystem cannot provide any more capacity. Also, all volumes that are used as
IBM Storage Virtualize or FlashSystem MDisks and all related pools go offline.
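The same arithmetic is shown in the following minimal Python sketch. The 5:1 and 3:1 ratios are the assumed and achieved ratios from the example; the variable names are only illustrative.

# Compare capacity provisioned under an assumed ratio with what the data achieves.
physical_tb = 20
assumed_ratio = 5.0    # optimistic planning ratio
achieved_ratio = 3.0   # ratio the data actually achieves

provisioned_tb = physical_tb * assumed_ratio   # 100 TB presented to the virtualizer
real_usable_tb = physical_tb * achieved_ratio  # 60 TB that can really be stored

print(f"Provisioned {provisioned_tb} TB, really usable {real_usable_tb} TB")
print(f"Writes beyond {real_usable_tb} TB take the MDisks and pool offline")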
A best practice is to have an emergency plan and know the steps to recover from an “Out Of
Physical Space” situation on the back-end controller. The plan must be prepared during the
initial implementation phase.
IBM FlashSystem A9000 and A9000R systems always have data reduction on, and because
of the grid architecture, they can use all the resources of the grid for the active I/Os. Data
reduction should be done at the IBM FlashSystem A9000 or A9000R system, and not at the
SVC.
If the XIV Gen3 is a Model 314, it is preferable to do the compression in the XIV system
because there are more resources in the grid that are assigned to the compression task.
However, if operational efficiency is more important, you can choose to enable compression
in the SVC.
For existing systems, you should evaluate whether you need to move to DRP to get the
benefits of deduplication or that hardware compression can meet your needs.
In this regard, the basis for virtualization begins with the physical drives of DS8000, which are
mounted in storage enclosures. Virtualization builds on the physical drives as a series of
layers:
Array sites
Arrays
Ranks
Extent pools
Logical volumes
Logical subsystems
Array sites are the building blocks that are used to define arrays, which are data storage systems for block-based, file-based, or object-based storage. Instead of storing data on a server, storage arrays use multiple drives that are managed by a centralized controller and can store a huge amount of data.
In general terms, eight identical drives that have the same capacity, speed, and drive class
comprise the array site. When an array is created, the RAID level, array type, and array
configuration are defined. RAID 5, RAID 6, and RAID 10 levels are supported.
Important: Normally, RAID 6 is highly preferred and is the default when using the Data Storage Graphical Interface (DS GUI). With large drives in particular, the RAID rebuild times (after a drive failure) become longer. Using RAID 6 reduces the danger of data loss due to a second drive failure during a rebuild. For more information, see this IBM Documentation web page.
A rank, which is a logical representation for the physical array, is relevant for IBM Storage
Virtualize because of the creation of a fixed-block (FB) pool for each array that you want to
virtualize. Ranks in DS8000 are defined in a one-to-one relationship to arrays. It is for this
reason that a rank is defined as using only one array.
An extent pool or storage pool in DS8000 is a logical construct to add the extents from a set of
ranks, forming a domain for extent allocation to a logical volume.
In summary, a logical volume consists of a set of extents from one extent pool or storage pool. DS8900F supports up to 65,280 logical volumes.
A logical volume that is composed of fixed-block (FB) extents is called a logical unit number (LUN). An
FB LUN consists of one or more 1 GiB (large) extents, or one or more 16 MiB (small) extents
from one FB extent pool. A LUN is not allowed to cross extent pools. However, a LUN can
have extents from multiple ranks within the same extent pool.
Important: DS8000 Copy Services does not support FB logical volumes larger than 2 TiB.
Therefore, you cannot create a LUN that is larger than 2 TiB if you want to use Copy
Services for the LUN, unless the LUN is integrated as MDisks in an IBM FlashSystem. Use
IBM Storage Virtualize Copy Services instead. Based on these considerations, the maximum
LUN sizes to create for a DS8900F and present to IBM FlashSystem are as follows:
16 TB LUN with large extents (1 GiB)
16 TB LUN with small extents (16 MiB) for DS8880F with R8.5 or later and for DS8900F
R9.0 or later
Logical subsystems (LSSs) are another logical construct, and they are mostly used with FB
volumes. A maximum of 255 LSSs can exist on a DS8900F. For more information, see
this IBM Documentation web page.
The concepts of virtualization of DS8900F for IBM Storage Virtualize are shown in Figure 3-4.
Connectivity considerations
The number of DS8000 ports that are used is at least eight. With large and
workload-intensive configurations, consider using more ports, up to 16, which is the maximum
that is supported by IBM FlashSystem.
Generally, use ports from different host adapters and, if possible, from different I/O
enclosures. This configuration is also important because during a DS8000 Licensed Internal
Code (LIC) update, a host adapter port might need to be taken offline. This configuration
allows the IBM Storage Virtualize I/O to survive a hardware failure on any component on the
storage area network (SAN) path.
For more information about SAN best practices and connectivity, see Chapter 2, “Storage
area network guidelines” on page 121.
Defining storage
To optimize DS8000 resource utilization, use the following guidelines:
Distribute capacity and workload across device adapter (DA) pairs.
Balance the ranks and extent pools between the two DS8000 internal servers to support
the corresponding workloads on them.
Spread the logical volume workload across the DS8000 internal servers by allocating the
volumes equally on rank groups 0 and 1.
Use as many disks as possible. Avoid idle disks, even if all storage capacity is not to be
used initially.
Consider using multi-rank extent pools.
Stripe your logical volume across several ranks, which is the default for multi-rank extent
pools.
The DS8000 series controllers assign server (controller) affinity to ranks when they are added
to an extent pool. Ranks that belong to an even-numbered extent pool have an affinity to
Server 0, and ranks that belong to an odd-numbered extent pool have an affinity to Server 1.
Figure 3-5 shows an example of a configuration that results in a 50% reduction in available
bandwidth.
Arrays on each of the DA pairs are accessed only by one of the adapters. In this case, all
ranks on DA pair 0 are added to even-numbered extent pools, which means that they all have
an affinity to Server 0. Therefore, the adapter in Server 1 is sitting idle. Because this condition
is true for all four DA pairs, only half of the adapters are actively performing work. This
condition can also occur on a subset of the configured DA pairs.
Example 3-6 shows the invalid configuration, as depicted in the CLI output of the lsarray and
lsrank commands. The arrays that are on the same DA pair contain the same group number
(0 or 1), meaning that they have affinity to the same DS8000 series server. Here, Server 0 is
represented by Group 0, and Server 1 is represented by Group 1.
As an example of this situation, consider arrays A0 and A4, which are attached to DA pair 0.
In this example, both arrays are added to an even-numbered extent pool (P0 and P4) so that
both ranks have affinity to Server 0 (represented by Group 0), which leaves the DA in Server
1 idle.
Example 3-6 Command output for the lsarray and lsrank commands
dscli> lsarray -l
Date/Time: Oct 20, 2016 12:20:23 AM CEST IBM DSCLI Version: 7.8.1.62 DS: IBM.2107-75L2321
Array State Data RAID type arsite Rank DA Pair DDMcap(10^9B) diskclass
===================================================================================
A0 Assign Normal 5 (6+P+S) S1 R0 0 146.0 ENT
A1 Assign Normal 5 (6+P+S) S9 R1 1 146.0 ENT
A2 Assign Normal 5 (6+P+S) S17 R2 2 146.0 ENT
A3 Assign Normal 5 (6+P+S) S25 R3 3 146.0 ENT
A4 Assign Normal 5 (6+P+S) S2 R4 0 146.0 ENT
A5 Assign Normal 5 (6+P+S) S10 R5 1 146.0 ENT
A6 Assign Normal 5 (6+P+S) S18 R6 2 146.0 ENT
A7 Assign Normal 5 (6+P+S) S26 R7 3 146.0 ENT
dscli> lsrank -l
Date/Time: Oct 20, 2016 12:22:05 AM CEST IBM DSCLI Version: 7.8.1.62 DS: IBM.2107-75L2321
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0 Normal Normal A0 5 P0 extpool0 fb 779 779
R1 1 Normal Normal A1 5 P1 extpool1 fb 779 779
R2 0 Normal Normal A2 5 P2 extpool2 fb 779 779
R3 1 Normal Normal A3 5 P3 extpool3 fb 779 779
R4 0 Normal Normal A4 5 P4 extpool4 fb 779 779
R5 1 Normal Normal A5 5 P5 extpool5 fb 779 779
R6 0 Normal Normal A6 5 P6 extpool6 fb 779 779
R7 1 Normal Normal A7 5 P7 extpool7 fb 779 779
Figure 3-6 shows a configuration that balances the workload across all four DA pairs.
Figure 3-7 shows a correct configuration, as depicted in the CLI output of the lsarray and
lsrank commands. The output shows that this configuration balances the workload across all
four DA pairs with an even balance between odd and even extent pools. The arrays that are
on the same DA pair are split between groups 0 and 1.
Therefore, when sizing the number of LUNs or MDisks to present to IBM Storage Virtualize, the suggestion is to present at least 2 - 4 volumes per path. So, using the
maximum of 16 paths, create 32, 48, or 64 DS8000 volumes. For this configuration,
IBM Storage Virtualize maintains a good queue depth.
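A minimal Python sketch of this sizing rule follows. The 2 - 4 volumes-per-path guideline and the 16-path maximum are taken from the text; the function name is only illustrative.

# DS8000 LUN count: present at least 2 - 4 volumes per path to the virtualizer.
def ds8000_volume_range(paths: int) -> tuple[int, int]:
    return 2 * paths, 4 * paths

low, high = ds8000_volume_range(16)  # maximum of 16 paths
print(low, high)                     # 32 to 64 volumes (32, 48, or 64 in the text)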
To maintain the highest flexibility and for easier management, large DS8000 extent pools are
beneficial. However, if the DS8000 installation is dedicated to shared-nothing environments,
such as Oracle ASM, IBM Db2® warehouses, or IBM General Parallel File System (GPFS),
use the single-rank extent pools.
LUN masking
For a storage controller, all IBM Storage Virtualize nodes must detect the same set of LUs
from all target ports that are logged in. If the target ports are visible to the nodes or canisters
that do not have the same set of LUs assigned, IBM Storage Virtualize treats this situation as
an error condition and generates error code 1625.
You must validate the LUN masking from the storage controller and then confirm the correct
path count from within IBM Storage Virtualize.
The DS8000 series controllers perform LUN masking that is based on the volume group.
Example 3-7 shows the output of the showvolgrp command for volume group V0, which
contains 16 LUNs that are presented to a 2-node IBM Storage Virtualize cluster.
Example 3-8 shows output for the lshostconnect command from the DS8000 series. In this
example, four ports of the two-node cluster are assigned to the same volume group (V0), so
they are assigned to the same four LUNs.
In Example 3-8, you can see that only the IBM Storage Virtualize WWPNs are assigned
to V0.
Attention: Data corruption can occur if the same LUN is assigned to IBM Storage
Virtualize nodes and other devices, such as hosts that are attached to DS8000.
Next, you see how IBM Storage Virtualize detects these LUNs if the zoning is properly
configured. The MDisk Link Count (mdisk_link_count) represents the total number of MDisks
that are presented to the IBM Storage Virtualize cluster by that specific controller.
Example 3-9 shows the general details of the output storage controller by using the system
CLI.
An example of the preferred configuration is shown in Figure 3-8. Four storage pools or extent
pools (one even and one odd) of DS8900F are joined into one IBM Storage Virtualize storage
pool.
Figure 3-8 Four DS8900F extent pools as one IBM Storage Virtualize storage pool
To determine how many logical volumes must be created to present to IBM Storage Virtualize
as MDisks, see 3.3.2, “Guidelines for creating an optimal back-end configuration” on
page 216.
From an IBM Storage Virtualize perspective, an XIV type 281x controller can consist of more
than one WWPN. However, all are placed under one WWNN that identifies the entire XIV
system.
When implemented in this manner, statistical metrics are more effective because
performance can be collected and analyzed on the IBM Storage Virtualize node level.
A detailed procedure to create a host on XIV is available in IBM XIV Gen3 with IBM System
Storage SAN Volume Controller and Storwize V7000, REDP-5063.
Volume considerations
As modular storage, XIV storage can be available in a minimum of six modules and up to a
maximum of 15 modules in a configuration. Each additional module added to the
configuration increases the XIV capacity, CPU, memory, and connectivity.
Figure 3-9 on page 230 shows how XIV configurations vary according to the number of
modules that are on the system.
Although XIV has its own queue depth characteristics for direct host attachment, the best
practices that are described in 3.3.2, “Guidelines for creating an optimal back-end
configuration” on page 216 are preferred when you virtualize XIV with IBM Storage Virtualize.
Table 3-12 lists the suggested volume sizes and quantities for IBM Storage Virtualize on the
XIV systems with different drive capacities.
Other considerations
Consider the following restrictions when using the XIV system as back-end storage for
IBM Storage Virtualize:
Volume mapping
When mapping a volume, you must use the same LUN ID for all IBM Storage Virtualize
nodes. Therefore, map the volumes to the cluster, not to individual nodes.
XIV storage pools
When creating an XIV storage pool, define the Snapshot Size as 0. Snapshot space does
not need to be reserved because it is not recommended that you use XIV snapshots on
LUNs mapped as MDisks. The snapshot functions should be used on the
IBM Storage Virtualize level.
Because all LUNs on a single XIV system share performance and capacity characteristics,
use a single IBM Storage Virtualize storage pool for a single XIV system.
Thin provisioning
XIV thin-provisioning pools are not supported by IBM Storage Virtualize. Instead, you must
use a regular pool.
Copy functions for XIV models
You cannot use advanced copy functions, such as taking a snapshot and remote
mirroring, for XIV models with disks that are managed by IBM Storage Virtualize.
For more information about the configuration of XIV behind IBM FlashSystem, see IBM XIV
Gen3 with IBM System Storage SAN Volume Controller and Storwize V7000, REDP-5063.
There are several considerations when you are attaching an IBM FlashSystem A9000 or
A9000R system as a back-end controller.
Volume considerations
IBM FlashSystem A9000 and A9000R designate resources to data reduction, and because
this designation is always on, it is advised that data reduction be done only in
IBM FlashSystem A9000 or A9000R and not in the IBM Storage Virtualize cluster. Otherwise,
when IBM FlashSystem A9000 or A9000R tries to reduce the data, unnecessary additional
latency occurs.
Estimated data reduction is important because it helps determine volume size. Always try to
use a conservative data-reduction ratio when attaching IBM FlashSystem A9000 or A9000R
because the storage pool goes offline if the back-end storage runs out of capacity.
The remaining usable capacity can be added to the storage pool after the system reaches a
stable data reduction ratio.
Depending on the grid configuration, zone port 1 on the first half of the controllers and port 3 on the second half: controllers 1 - 4 (port 1) and 5 - 8 (port 3); controllers 1 - 5 (port 1) and 6 - 10 (port 3); or controllers 1 - 6 (port 1) and 7 - 12 (port 3).
It is important not to run out of hard capacity on the back-end storage because doing so takes
the storage pool offline. It is important to closely monitor the IBM FlashSystem A9000 or
A9000R. If you start to run out of space, you can use the migration functions of IBM Storage
Virtualize to move data to another storage system.
The biggest concern about the number of volumes is to ensure that there is adequate queue
depth. Given that the maximum volume size on the IBM FlashSystem A9000 or A9000R is
1 PB and you are ensuring two volumes per path, you should be able to create a few larger
volumes and still have good queue depth and not have numerous volumes to manage.
Other considerations
IBM Storage Virtualize can detect that the IBM FlashSystem A9000 controller is using
deduplication technology and show that the Deduplication attribute of the MDisk is Active.
Deduplication status is important because it allows IBM Storage Virtualize to enforce the
following restrictions:
Storage pools with deduplicated MDisks should contain only MDisks from the same
IBM FlashSystem A9000 or IBM FlashSystem A9000R storage controller.
Deduplicated MDisks cannot be mixed in an Easy Tier enabled storage pool.
3.4.4 Considerations for IBM FlashSystem 5000, 5100, 5200, 7200, 7300, 9100,
9200, and 9500 and IBM SVC SV1, SV2, and SV3
Recommendations that are described in this section apply to a solution with the
IBM FlashSystem family or IBM Storwize family system that is virtualized by IBM Storage
Virtualize.
Connectivity considerations
It is expected that N_Port ID Virtualization (NPIV) is enabled on both systems: the system
that is virtualizing storage, and the system that works as a back-end. Zone “host” or “virtual”
WWPNs of the back-end system to physical WWPNs of the front-end or virtualizing system.
For more information about SAN and zoning best practices, see Chapter 2, “Storage area
network guidelines” on page 121.
System layers
IBM Storage Virtualize systems have a concept of system layers. There are two layers:
storage and replication. Systems that are configured into the storage layer can work as back-end storage. Systems that are configured into the replication layer can virtualize other IBM Storage Virtualize clusters and use them as back-end controllers.
Systems that are configured with the same layer can be replication partners. Systems in different layers cannot.
For more information, including instructions and limitations, see this IBM Documentation web page.
Four adapter cages (two per controller), 4/32, 66%: Recommended for high IOPS workloads and systems with more than 16 drives, or remote copy or HyperSwap with a low host port count. An extra memory upgrade should also be used.
Six adapter cages (three per controller), 6/48, 100%: Recommended for higher port count host attach and high-bandwidth workloads. An extra memory upgrade should also be used.
Four adapter cages, 4/32, 66%: Recommended for high IOPS workloads and systems, or remote copy or HyperSwap with a low host port count. An extra memory upgrade should be used.
Six adapter cages, 6/48, 100%: Recommended for high host fan-in attach and high-bandwidth workloads. An extra memory upgrade should also be used.
Automatic configuration
IBM FlashSystem family systems can be automatically configured for optimal performance as
a back-end storage behind SVC.
The automatic configuration wizard must be used on a system that has no volumes, pools, or host objects configured. The wizard configures internal storage devices, creates volumes, and maps them to the host object that represents the SVC.
Internal storage that is attached to the back-end system must be joined into RAID arrays. You
might need one or more DRAID 6 arrays, depending on the number and the type of available
drives. For RAID recommendations, see 3.2.2, “Array considerations” on page 206.
Consider creating a separate disk pool for each type (tier) of storage and use the Easy Tier
function on a front-end system. Front-end IBM FlashSystem family systems cannot monitor
the Easy Tier activity of the back-end storage.
If Easy Tier is enabled on front- and back-end systems, they independently rebalance the hot
areas according to their own heat map. This process causes a rebalance over a rebalance.
Such a situation can eliminate the performance benefits of extent reallocation. For this
reason, Easy Tier must be enabled only on one level (preferably the front end).
For more information about recommendations about Easy Tier with external storage, see 4.2,
“Knowing your data and workload” on page 250.
For most use cases, standard pools are preferred to data-reduction pools on the back-end
storage. If planned, the front end performs reduction. Data reduction on both levels is not
recommended because it adds processing overhead and does not result in capacity savings.
If Easy Tier is disabled on the back-end as advised here, the back-end IBM FlashSystem pool
extent size is not a performance concern.
Consider enabling host UNMAP support to achieve better capacity management if the system
that is going to be virtualized meets the following qualifications:
Contains FCMs.
Contains flash only (no HDDs).
Consider leaving host UNMAP disabled to protect a virtualized system from being
over-loaded if you are going to virtualize a hybrid system and the storage that will be
virtualized uses HDDs.
To turn on or off host UNMAP support, use the chsystem CLI command. For more information,
see this IBM Documentation web page.
Volume considerations
Volumes in IBM FlashSystem can be created as striped or sequential. The general rule is to
create striped volumes. Volumes on a back-end system must be fully allocated.
For all-flash solutions, create 32 volumes from the available pool capacity, which can be
reduced to 16 or even 8 for small arrays (for example, if you have 16 or fewer flash drives in a
back-end pool). For FCM arrays, the number of volumes is also governed by load distribution.
32 volumes out of a pool with an FCM array is recommended.
When choosing volume size, consider which system (front-end or back-end) performs the
compression. If data is compressed and deduplicated on the front-end IBM Storage Virtualize
system, FCMs cannot compress it further, which results in a 1:1 CR.
Therefore, the back-end volume size is calculated from the pool physical capacity that is
divided by the number of volumes (16 or more).
Example: Assume that you have an IBM FlashSystem 9200 with twenty-four 19.2 TB
modules. This configuration provides a raw disk capacity of 460 TB, with 10+P+Q DRAID 6
and one distributed spare, and the physical array capacity is 365 TB or 332 TiB.
Because it is not recommended to provision more than 85% of physical flash, we have
282 TiB. Because we do not expect any compression on FCM (the back-end is getting data
that is compressed by upper levels), we provision storage to an upper level and assume
1:1 compression, which means we create 32 volumes of 282 TiB / 32 = 8.8 TiB each.
If IBM Storage Virtualize is not compressing data, space savings are achieved with FCM
hardware compression. Use compression-estimation tools to determine the expected CR and
use a smaller ratio for further calculations (for example, if you expect 4.5:1 compression, use
4.3:1). Determine the volume size by using the calculated effective pool capacity.
Example: Assume that you have an IBM FlashSystem 7300 with twelve 9.6 TB modules.
This configuration provides raw disk capacity of 115 TB, with 9+P+Q DRAID 6 and one
distributed spare. The physical capacity is 85 TB or 78 TiB.
Because it is not recommended to provision more than 85% of a physical flash, we have
66 TiB. The Comprestimator shows that we can achieve a 3.2:1 CR; rounding down and assuming 3:1, we have 66 TiB x 3 = 198 TiB of effective capacity.
Create 16 volumes of 198 TiB / 16 = 12.4 TiB each. If the CR is higher than expected, we
can create and provision more volumes to the front end.
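The two worked examples follow the same calculation, which is sketched below in Python. The 85% provisioning guideline and the compression ratios are taken from the examples; the function name is only illustrative.

# Size of each back-end volume to create from an FCM-based array.
def backend_volume_size_tib(physical_tib: float, volumes: int,
                            compression_ratio: float = 1.0,
                            provision_limit: float = 0.85) -> float:
    effective_tib = physical_tib * provision_limit * compression_ratio
    return effective_tib / volumes

# Front end compresses, so assume 1:1 on the FCMs (FlashSystem 9200 example)
print(round(backend_volume_size_tib(332, volumes=32), 1))                        # ~8.8 TiB
# FCMs compress; estimator shows 3.2:1, rounded down to 3:1 (FlashSystem 7300 example)
print(round(backend_volume_size_tib(78, volumes=16, compression_ratio=3.0), 1))  # ~12.4 TiB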
When you configure IBM FlashSystem 900 as a back-end for IBM Storage Virtualize systems,
you must remember the considerations that are described in this section.
Defining storage
IBM FlashSystem 900 supports up to 12 IBM MicroLatency modules. IBM MicroLatency
modules are installed in IBM FlashSystem 900 based on the following configuration
guidelines:
A minimum of four MicroLatency modules must be installed in the system. RAID 5 is the
only supported configuration for IBM FlashSystem 900.
The system supports configurations of 4, 6, 8, 10, and 12 MicroLatency modules in
RAID 5.
All MicroLatency modules that are installed in the enclosure must be identical in capacity
and type.
For optimal airflow and cooling, if fewer than 12 MicroLatency modules are installed in the
enclosure, populate the module bays beginning in the center of the slots and adding the
modules on either side until all 12 slots are populated.
The array configuration is performed during system setup. The system automatically creates
MDisks or arrays and defines the RAID settings based on the number of flash modules in the
system. The default supported RAID level is RAID 5.
Volume considerations
To fully use all IBM Storage Virtualize system resources, create 32 volumes (or 16 volumes if
IBM FlashSystem 900 is not fully populated). This way, all CPU cores, nodes, and FC ports of
the virtualizer are fully used.
However, one important factor must be considered when volumes are created from a pure
IBM FlashSystem 900 MDisks storage pool. IBM FlashSystem 900 can process I/Os much
faster than traditional storage. Sometimes, it is even faster than cache operations because
with cache, all I/Os to the volume must be mirrored to another node in I/O group.
This operation can take as much as 1 millisecond while I/Os that are issued directly (which
means without cache) to IBM FlashSystem 900 can take 100 - 200 microseconds. So, in
some rare use cases, it might be recommended to disable the IBM Storage Virtualize cache
to optimize for maximum IOPS.
3.4.6 Path considerations for third-party storage with EMC VMAX, EMC
PowerMAX, and Hitachi Data Systems
Many third-party storage options are available and supported. This section describes the
multipathing considerations for EMC VMAX, EMC PowerMax, and Hitachi Data Systems
(HDS).
Most storage controllers, when presented to IBM Storage Virtualize, are recognized as a
single worldwide node name (WWNN) per controller. However, for some EMC VMAX,
EMC PowerMAX, and HDS storage controller types, the system recognizes each port as a
different WWNN. For this reason, each storage port, when zoned to IBM Storage Virtualize,
appears as a different external storage controller.
Note: This section does not cover IP-attached quorum disks. For more information about
IP-attached quorum disks, see Chapter 7, “Ensuring business continuity” on page 501.
After internal drives are prepared to be added to an array or external MDisks become
managed, a small portion of the capacity is reserved for quorum data. The size is less than
0.5 GiB for a drive and not less than one pool extent for an MDisk.
Three devices from all available internal drives and managed MDisks are selected for the
quorum disk role. They store system metadata, which is used for cluster recovery after a
disaster. Although only three devices are designated as quorum disks, capacity for quorum data is reserved on every eligible device because the designation might change (for example, if a quorum disk has a physical failure).
Only one of those disks is selected as the active quorum disk. It is used as a tie-breaker. If as
a result of a failure the cluster is split in half and both parts lose sight of each other (for
example, the inter-site link failed in a HyperSwap cluster with two I/O groups), they appeal to
the tie-breaker active quorum device. The half of the cluster nodes that can reach and reserve
the quorum disk after the split occurs lock the disk and continue to operate. The other half
stops its operation. This design prevents both sides from becoming inconsistent with each
other.
The storage device must match the following criteria to be considered a quorum candidate:
An internal drive or module must follow these rules:
– It must be a member of an array or be a candidate.
– It must not be in the “Unused” state.
An external MDisk must follow these rules:
– It must be in the “Managed” state. MDisks that are in the “Unmanaged” or “Image” states cannot be quorum disks.
– It can be provisioned over FC only.
– It must be presented by a disk subsystem (LUN) that is supported as a quorum disk.
The system uses the following rules when selecting quorum devices:
Fully connected candidates are preferred over partially connected candidates.
In a multiple-enclosure environment, MDisks are preferred over drives.
If there is only one control enclosure and no external storage in the cluster, drives are preferred over MDisks and are considered first.
Drives from a different control enclosure are preferred over a second drive from the same
enclosure.
If the IBM Storage Virtualize system contains more than one I/O group, at least one of the
candidates from each group is selected.
NVMe drives are preferred over SAS drives.
NVMe drives in a control enclosure are chosen rather than an SAS expansion drive.
To become an active quorum device (tie-breaker device), the storage must be visible to all
nodes in a cluster.
To list the IBM Storage Virtualize Storage quorum devices, run the lsquorum command, as
shown in Example 3-10.
To move the quorum assignment, use the chquorum command. The command is not supported on NVMe drives, so you can move the quorum assignment off an NVMe drive, but not onto one.
Standard pools have been available since the initial release of IBM Storage Virtualize in 2003
and can include fully allocated or thin-provisioned volumes.
Note: With the current hardware and software generation, support for volume-level compression in standard pools is withdrawn. Standard pools cannot be configured to use IBM Real-time Compression (RtC). Only IBM FlashCore Module (FCM) level compression is available.
Data reduction pools were introduced with Storage Virtualize release 8.1.0. DRPs increase
infrastructure capacity usage by employing new efficiency functions and reducing storage
costs. The pools enable you to automatically de-allocate (unmap) and reclaim the capacity of
thin-provisioned and compressed volumes that contain deleted data. In addition, the pools
enable this reclaimed capacity to be reused by other volumes. Data reduction pools allow
volume-level compression and cross-volume deduplication.
Either pool type can be made up of different tiers. A tier defines a performance characteristic
of that subset of capacity in the pool. Every pool supports three tier types (fastest, average,
and slowest). The tiers and their usage are managed automatically by the Easy Tier function
inside the pool.
Either pool type can have child pools. With standard pools, child pools allow you to dedicate (reserve) a part of the pool capacity to a subset of volumes. With DRPs, child pools are quotaless. With both pool types, a child pool can have a throttle setting (performance limit) and a provisioning policy that are independent of its parent pool.
You can create child pools within a standard pool. A pool, which contains child pools is
referred to as a parent pool. A parent pool has all the capabilities and functions of a regular
pool, but a part of its capacity is reserved for a child. A child pool is a logical subdivision of a
storage pool or MDisk group. Like a parent pool, a child pool supports volume creation and
migration.
When you create a child pool in a standard parent pool, you must specify a capacity limit for
the child pool. This limit allows for a quota of capacity to be allocated to the child pool. This
capacity is reserved for the child pool and subtracts from the available capacity in the parent
pool.
A child pool inherits its tier setting from the parent pool. Changes to a parent’s tier setting are
inherited by child pools. The child pool also inherits Easy Tier status, pool status, capacity
information, and back-end storage information. The I/O activity of a parent pool is the sum of
the I/O activity of itself and the child pools.
Parent pools
Parent pools receive their capacity from MDisks. To track the space that is available on an
MDisk, the system divides each MDisk into chunks of equal size. These chunks are called
extents and they are indexed internally. The choice of extent size affects the total amount of
storage that is managed by the system. The extent size remains constant throughout the
lifetime of the parent pool.
All MDisks in a pool are split into extents of the same size. Volumes are created from the
extents that are available in the pool. You can add MDisks to a pool at any time to increase
the number of extents that are available for new volume copies or to expand volume copies.
The system automatically balances volume extents between the MDisks to provide the best
performance to the volumes by using the Easy Tier function.
Choose your extent size wisely according to your future needs. A small extent size limits your
overall usable capacity, but a larger extent size can waste storage. For example, if you select
an extent size of 8 GiB but then create only a 6 GiB volume, one entire extent is allocated to
this volume (8 GiB) and 2 GiB is unused.
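The extent-size trade-off can be made concrete with a minimal Python sketch. The 8 GiB extent and 6 GiB volume are the values from the example; the function name is only illustrative.

import math

def allocated_gib(volume_gib: float, extent_gib: float) -> float:
    """Capacity taken from the pool: volumes are allocated in whole extents."""
    return math.ceil(volume_gib / extent_gib) * extent_gib

used = allocated_gib(6, 8)                             # 6 GiB volume, 8 GiB extents
print(used, "GiB allocated,", used - 6, "GiB unused")  # 8 GiB allocated, 2 GiB unused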
When you create or manage a standard pool, consider the following general guidelines:
An MDisk can be associated with only one pool.
You can add only MDisks that are in unmanaged mode to a parent pool. When MDisks are
added to a parent pool, their mode changes from unmanaged to managed.
Ensure that all MDisks that are allocated to the same tier of a parent pool have the same
redundant array of independent disks (RAID) level. This configuration ensures that the
same resiliency is maintained across that tier. Similarly, for performance reasons, do not
mix RAID types within a tier. The performance of all volumes is reduced to the lowest
achiever in the tier, and a mismatch of tier members can result in I/O convoying effects
where everything is waiting on the slowest member.
You can delete MDisks from a parent pool under the following conditions:
– The volumes are not using any of the extents that are on the MDisk.
– Enough free extents are available elsewhere in the pool to move extents that are in use
from this MDisk.
If the parent pool is deleted, you cannot recover the mapping that existed between extents
that are in the pool or the extents that the volumes use. If the parent pool includes
associated child pools, you must delete the child pools first and return their extents to the
parent pool. After the child pools are deleted, you can delete the parent pool. The MDisks
that were in the parent pool are returned to unmanaged mode and can be added to other
parent pools. Because the deletion of a parent pool can cause a loss of data, you must
force the deletion if volumes are associated with it.
If you force-delete a pool, all volumes in that pool are deleted, even if they are mapped
to a host and are still in use. Use extreme caution when force-deleting pool objects
because volume-to-extent mapping cannot be recovered after the delete is processed.
Force-deleting a storage pool is possible only with command-line interface (CLI) tools.
For more information, see the man page for the rmmdiskgrp command.
You should specify a warning capacity for a pool. A warning event is generated when the
amount of space that is used in the pool exceeds the warning capacity. The warning
threshold is especially useful with thin-provisioned volumes that are configured to
automatically use space from the pool.
Volumes are associated with just one pool, except during any migration between parent
pools. However, volume copies of the same volume can be in different pools.
Volumes that are allocated from a parent pool are by default striped across all the storage
assigned into that parent pool. Wide striping can provide performance benefits.
You cannot use the volume migration functions to migrate volumes between parent pools
that feature different extent sizes. However, you can use volume mirroring to move data to
a parent pool that has a different extent size.
When you delete a pool with mirrored volumes, consider the following points:
– If the volume is mirrored and the synchronized copies of the volume are all in the same
pool, the mirrored volume is destroyed when the storage pool is deleted.
– If the volume is mirrored and a synchronized copy exists in a different pool, the volume
copy remains after the pool is deleted.
You might not be able to delete a pool or child pool if Volume Delete Protection is enabled. In
version 8.3.1 and later, Volume Delete Protection is enabled by default. However, the
granularity of protection is improved: You can now specify Volume Delete Protection to be
enabled or disabled on a per-pool basis rather than on a system basis as was previously the
case.
Child pools
Instead of being created directly from MDisks, child pools are created from existing capacity
that is allocated to a parent pool. As with parent pools, volumes can be created that
specifically use the capacity that is allocated to the child pool. Child pools are like parent
pools with similar properties and can be used for volume copy operation.
Child pools are created with fully allocated physical capacity, that is, the physical capacity that
is applied to the child pool is reserved from the parent pool, as though you created a fully
allocated volume of the same size in the parent pool.
The allocated capacity of the child pool must be smaller than the free capacity that is available
to the parent pool. The allocated capacity of the child pool is no longer reported as the free
space of its parent pool. Instead, the parent pool reports the entire child pool as used
capacity. You must monitor the used capacity (instead of the free capacity) of the child pool.
When you create or work with a child pool, consider the following information:
As with parent pools, you can specify a warning threshold that alerts you when the
capacity of the child pool is reaching its upper limit. Use this threshold to ensure that
access is not lost when the capacity of the child pool is close to its allocated capacity.
Ensure that any child pools that are associated with a parent pool have enough capacity
for the volumes that are in the child pool before removing MDisks from a parent pool. The
system automatically migrates all extents that are used by volumes to other MDisks in the
parent pool to ensure that data is not lost.
You cannot shrink the capacity of a child pool to less than its real capacity. The system
also resets the warning level when the child pool is shrunk, and issues a warning if the
level is reached when the capacity is shrunk.
On systems with encryption enabled, child pools can be created to migrate existing
volumes in a non-encrypted pool to encrypted child pools. When you create a child pool
after encryption is enabled, an encryption key is created for the child pool even when the
parent pool is not encrypted. Then, you can use volume mirroring to migrate the volumes
from the non-encrypted parent pool to the encrypted child pool.
The system supports migrating a copy of volumes between child pools within the same
parent pool or migrating a copy of a volume between a child pool and its parent pool.
Migrations between a source and target child pool with different parent pools are not
supported. However, you can migrate a copy of the volume from the source child pool to its
parent pool. Then, the volume copy can be migrated from the parent pool to the parent
pool of the target child pool. Finally, the volume copy can be migrated from the target
parent pool to the target child pool.
Migrating a volume between a parent pool and a child pool (with the same encryption key
or no encryption) results in a nocopy migration. The data does not move. Instead, the
extents are reallocated to the child or parent pool and the accounting of the used space is
corrected.
Child pools are created automatically by an IBM Spectrum Connect vSphere API for
Storage Awareness (VASA) client to implement VMware vSphere Virtual Volumes
(VVOLs).
A throttle can be assigned to a child pool to limit its I/O rate or data rate. A child pool throttle can be used to restrict the set of volumes in it from using more performance resources than is desired.
A child pool’s provisioning policy can differ from the policy that is assigned to its parent.
A child pool can be assigned to an ownership group.
In standard pools, thin-provisioned volumes are created as a specific volume type, that is,
based on capacity-savings criteria. These properties are managed at the volume level. The
virtual capacity of a thin-provisioned volume is typically larger than its real capacity. Each
system uses the real capacity to store data that is written to the volume, and metadata that
describes the thin-provisioned configuration of the volume. As more information is written to
the volume, more of the real capacity is used.
The system identifies read operations to unwritten parts of the virtual capacity and returns
zeros to the server without the usage of any real capacity. For more information about storage
system, pool, and volume capacity metrics, see Chapter 9, “Implementing a storage
monitoring system” on page 551.
Thin-provisioned volumes can also help simplify server administration. Instead of assigning a
volume with some capacity to an application and increasing that capacity as the needs of the
application change, you can configure a volume with a large virtual capacity for the
application. Then, you can increase or shrink the real capacity as the application needs
change, without disrupting the application or server.
It is important to monitor physical capacity if you want to provide more space to your hosts
than is physically available in the pool and pool’s back-end storage. For more information
about monitoring the physical capacity of your storage and an explanation of the difference
between thin provisioning and over-allocation, see 9.4, “Creating alerts for IBM Storage
Control and IBM Storage Insights” on page 606.
If you use the compression functions that are provided by the FCM modules in your system as
a mechanism to add data reduction to a standard pool while maintaining the maximum
performance, take care to understand the capacity reporting, in particular if you want to thin
provision on top of the FCMs.
The FCM RAID array reports its written capacity limit, which can be as large as 4:1 to its
physical capacity. This capacity is the maximum that can be stored on the FCM array.
However, it might not reflect the compression savings that you achieve with your data.
You must first understand your expected compression ratio (CR). In an initial deployment, plan on approximately 50% of the estimated savings.
For example, you have 100 TiB of physical usable capacity in an FCM RAID array before
compression. Your comprestimator results show savings of approximately 2:1, which
suggests that you can write 200 TiB of volume data to this RAID array.
Start at 150 TiB of volumes that are mapped to hosts. Monitor the real compression rates and
usage and over time add in the other 50 TiB of volume capacity. Be sure to leave spare space
for unexpected growth, and consider the guidelines that are outlined in 3.2, “Arrays” on
page 204.
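The staged approach can be expressed as a minimal Python sketch. The 100 TiB array, the 2:1 estimate, and the initial 50%-of-savings planning factor are taken from the example; the variable names are only illustrative.

# Staged provisioning on top of FCM compression.
physical_tib = 100
estimated_cr = 2.0                                    # comprestimator estimate (2:1)

max_writable_tib = physical_tib * estimated_cr        # 200 TiB if the savings hold
estimated_savings = max_writable_tib - physical_tib   # 100 TiB of expected savings
initial_tib = physical_tib + 0.5 * estimated_savings  # plan for ~50% of the savings first

print(initial_tib)  # 150.0 TiB of volumes to map initially; grow as real savings are confirmed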
If you often over-provision your hosts at much higher rates, you can use a standard pool and
create thin-provisioned volumes in that pool. However, be careful that you do not run out of
physical capacity. You now must monitor the back-end array used capacity. In essence, you
are double accounting with the thin provisioning, that is, expecting 2:1 on the FCM
compression, and then whatever level you over-provision at the volumes.
If you know that your hosts rarely grow to use the provisioned capacity, this process can be
safely done. However, the risk comes from run-away applications (writing large amounts of
capacity) or an administrator suddenly enabling application encryption and writing to fill the
entire capacity of the thin-provisioned volume.
DRPs also use all the thin-provisioning and data-efficiency features that you expect from IBM
Storage Virtualize storage to potentially reduce your capital expenditure (CapEx) and
operational expenditure (OpEx). All these benefits extend to over 500 heterogeneous storage
arrays from multiple vendors.
DRPs were designed with space reclamation being a fundamental consideration. DRPs
provide the following benefits:
Log Structured Array (LSA) allocation (redirect on all overwrites)
Garbage collection to free whole extents
Fine-grained (8 KB) chunk allocation/de-allocation within an extent
End-to-end support for unmap commands with automatic space reclamation
Support for compression
Support for deduplication
Support for traditional fully allocated volumes
Data reduction increases storage efficiency and reduces storage costs, especially for flash storage. Data reduction reduces the amount of data that is stored on external storage systems and internal drives by compressing and deduplicating capacity and reclaiming capacity that is no longer in use.
Object-based access control (OBAC) or multi-tenancy can be applied to DRP child pools or
volumes because OBAC requires a child pool to function.
The internal layout of a DRP is different from a standard pool. A standard pool creates volume
objects within the pool. Some fine-grained internal metadata is stored within a
thin-provisioned or real-time-compressed volume in a standard pool. Overall, the pool
contains volume objects.
A DRP reports volumes to the user in the same way as a standard pool. All volumes in a
single DRP use the same Customer Data Volume to store their data. Therefore, deduplication
is possible across volumes in a single DRP. There is a Directory Volume for each user volume
that is created within the pool. The directory points to grains of data that is stored in the
Customer Data Volume.
Other internal volumes are created, one per DRP. There is one Journal Volume per I/O group
that can be used for recovery purposes and to replay metadata updates if needed. There is
one Reverse Lookup Volume per I/O group that is used by garbage collection.
Figure 4-1 shows the difference between DRP volumes and volumes in standard pools.
The Customer Data Volume normally uses greater than 97% of pool capacity. The I/O pattern
is a large sequential write pattern (256 KB) that is coalesced into full stride writes, and you
typically see a short, random read pattern.
Directory Volumes occupy approximately 1% of pool capacity. They typically have a short
4 KB random read/write I/O. The Journal Volume occupies approximately 1% of pool capacity,
and shows large sequential write I/O (256 KB typically).
Journal Volumes are only read for recovery scenarios (for example, T3 recovery). Reverse
Lookup Volumes are used by the garbage-collection process and occupy less than 1% of pool
capacity. Reverse Lookup Volumes have a short, semi-random read/write pattern.
The primary task of garbage collection (see Figure 4-2 on page 249) is to reclaim space, that
is, to track all the regions that were invalidated and make this capacity usable for new writes.
As a result of compression and deduplication, when you overwrite a host-write, the new data
does not always use the same amount of space that the previous data used. This issue leads
to the writes always occupying new space on back-end storage while the old data is still in its
original location.
Stored data is divided into regions. As data is overwritten, a record is kept of which areas of
those regions were invalidated. Regions that have many invalidated parts are potential
candidates for garbage collection. When most of a region has invalidated data, it is
inexpensive to move the remaining data to another location, which frees the whole region.
DRPs include built-in services to enable garbage collection of unused blocks. Therefore,
many smaller unmaps end up enabling a much larger chunk (extent) to be freed back to the
pool. Trying to fill small holes is inefficient because too many I/Os are needed to keep reading
and rewriting the directory. Therefore, garbage collection waits until an extent has many small
holes, then moves the remaining valid data out of the extent, compacting and rewriting it elsewhere. When an extent is empty, it can be freed back to the virtualization layer (and to the back end with unmap) or reused for new writes (or rewrites).
The reverse lookup metadata volume tracks the extent usage, or more importantly the holes
that are created by overwrites or unmaps. Garbage collection looks for extents with the most
unused space. After a whole extent has all valid data moved elsewhere, it can be freed back
to the set of unused extents in that pool or reused for new written data.
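The selection of collection candidates can be sketched conceptually in Python: pick the extent with the most invalidated capacity so that the least valid data must be moved. This is an illustration of the idea only, not the product implementation, and the names are illustrative.

# Pick the extent with the most invalidated (unused) capacity as the next GC candidate.
def best_gc_candidate(extents: dict) -> str:
    """extents maps extent-id -> {'size': total bytes, 'valid': bytes still referenced}."""
    return max(extents, key=lambda e: extents[e]["size"] - extents[e]["valid"])

extents = {
    "ext0": {"size": 1024, "valid": 1000},  # mostly valid data: expensive to collect
    "ext1": {"size": 1024, "valid": 64},    # mostly holes: cheap to free
}
print(best_gc_candidate(extents))  # ext1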
Garbage collection needs free regions to move data during its operation, so it is suggested
that you size pools to keep a specific amount of free capacity available. This best practice
ensures that there is some free space for garbage collection. For more information, see 4.4.6,
“Understanding capacity use in a data reduction pool” on page 274.
A fully allocated volume uses the entire capacity of the volume. When the volume is created,
the space is reserved (used) from the DRP and not available for other volumes in the DRP.
Data is not deduplicated or compressed in a fully allocated volume. Similarly, because the
volume does not use the internal fine-grained allocation functions, technically it operates in
the same way as a fully allocated volume in a standard pool.
Deduplication in DRP runs on a pool basis, so deduplication is performed across all the
volumes in a single data reduction pool. The DRP first looks for deduplication matches, and
then it compresses the data before writing to the storage.
It is not recommended to use thin-only volumes in a DRP on an FCM back end. All hardware platforms that support FCMs have compression acceleration hardware, so the performance penalty for compression is minimal. At the same time, using pool-level compression on top of FCMs simplifies capacity monitoring. Therefore, thin-only volumes provide no practical benefit in such configurations.
Note: When the back-end storage is thin-provisioned or data-reduced, the GUI does not offer the option to create thin-only volumes in a DRP. The GUI aims to comply with this best practice because a thin-only volume on an FCM back end can cause capacity-monitoring difficulties. You can still use the CLI to create volumes with this capacity-savings type if required.
In Storage Virtualize, both FCM compression and DRP compression are inline, which means
that it happens when the data is being written, rather than an attempt to compress data as a
background task.
Data compression techniques depend on the type of data that must be compressed and on
the needed performance. Effective compression savings generally rely on the accuracy of
your planning and understanding whether the specific data is compressible or not.
The compression is lossless, that is, data is compressed without losing any of the data. The
original data can be recovered after the compress or expand cycle. Good compression
savings might be achieved in the data types such as:
Virtualized Infrastructure
Database and data warehouse
Home directory, shares, and shared project data
CAD/CAM
Oil and gas data
Log data
Software development
Text and some picture files
However, if the data is already compressed, in some cases the savings are smaller or even negative. Pictures (for example, GIF, JPG, and PNG), audio (MP3 and WMA), video (AVI and MPG), and even compressed database data might not be good candidates for compression.
Table 4-1 lists the compression savings that are expected for common data types, for example:
Databases: up to 80%
Email: up to 80%
If the data is encrypted by host or application before it is written to the Storage Virtualize
system, it cannot be compressed. Compressing already encrypted data does not result in
much savings because it contains pseudo-random data. The compression algorithm relies on
patterns to gain efficient size reduction. Because encryption destroys such patterns, the
compression algorithm would be unable to provide much data reduction.
Note: Saving assumptions that are based on the type of data are imprecise. Therefore,
you should determine compression savings with the proper tools. See 4.2.3, “Data
reduction estimation tools” on page 252 for tools information.
Deduplication is done by using hash tables to identify previously written copies of data. If
duplicates are found, instead of writing the data to disk, the algorithm references the
previously found data.
Deduplication uses 8 KiB deduplication grains and an SHA-1 hashing algorithm.
DRPs build 256 KiB chunks of data consisting of multiple de-duplicated and compressed
8 KiB grains.
DRPs write contiguous 256 KiB chunks for efficient write streaming with the capability for
cache and RAID to operate on full stride writes.
DRPs deduplicate the data first and then compress it.
The scope of deduplication is within a DRP within an I/O Group.
Some environments have data with high deduplication savings and are therefore good
candidates for deduplication. Good deduplication savings are typically achieved in virtual
desktop infrastructure (VDI) and some virtual machine (VM) environments.
IBM provides the Data Reduction Estimator Tool (DRET) to help determine the deduplication
capacity-saving benefits.
Comprestimator
Comprestimator is available in the following ways:
As a stand-alone, host-based CLI utility. It can be used to estimate the expected
compression for block volumes where you do not have an IBM Storage Virtualize product
providing those volumes.
Integrated into IBM Storage Virtualize code.
Host-based Comprestimator
The Comprestimator is a host-based CLI utility that can be used to estimate the expected
compression rate for block devices. The tool can be downloaded from the IBM Support web
page, which also provides detailed instructions on how to run it.
Integrated Comprestimator
IBM Storage Virtualize also features an integrated Comprestimator tool that is available
through the management GUI and CLI. If you want to apply capacity efficiency features to
volumes that already exist in the system, you can use this tool to evaluate whether
compression will generate capacity savings.
To access the Comprestimator tool in the management GUI, select Volumes → Volumes.
If you want to analyze all the volumes in the system, select Actions → Capacity Savings →
Estimate Compression Savings.
If you want to evaluate only the capacity savings of selected volumes, select a list of volumes
and select Actions → Capacity Savings → Analyze, as shown in Figure 4-3.
To display the results of the capacity savings analysis, select Actions → Capacity
Savings → Download Savings Report, as shown in Figure 4-3, or enter the command
lsvdiskanalysis in the CLI, as shown in Example 4-1.
total_savings 578.04GB
total_savings_ratio 96.33
margin_of_error 4.97
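If you prefer the CLI, the analysis can also be triggered and monitored from the command line. The
following sequence is a sketch, with an illustrative volume name; availability of the commands
depends on your code level:
analyzevdisk volume01 (start the analysis for a single volume)
analyzevdiskbysystem (start the analysis for all volumes in the system)
lsvdiskanalysisprogress (check the progress of the analysis)
lsvdiskanalysis volume01 (display the estimation results for a volume)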
You can customize the Volume view to view estimation results in a convenient way to help
make your decision, as shown in Figure 4-4.
Comprestimator is always enabled and running in the background, so you can view the expected
capacity savings in the main dashboard view, pool views, and volume views. However, on
older code levels it must be started manually by triggering the estimate or analyze tasks.
DRET uses advanced mathematical and statistical algorithms to perform an analysis with a
low memory “footprint”. The utility runs on a host that can access the devices to be analyzed.
It performs only read operations, so it has no effect on the data that is stored on the device.
Depending on the configuration of the environment, in many cases the DRET is used on more
than one host to analyze more data types.
It is important to understand block device behavior when analyzing traditional (fully allocated)
volumes. Traditional volumes that were created without initially zeroing the device might
contain traces of old data on the block device level. Such data is not accessible or viewable on
the file system level. When the DRET is used to analyze such volumes, the expected
reduction results reflect the savings rate to be achieved for all the data on the block device
level, including traces of old data.
Regardless of the block device type being scanned, it is also important to understand a few
principles of common file system space management. When files are deleted from a file
system, the space they occupied before the deletion becomes free and available to the file
system. The freeing of space occurs even though the data on disk was not removed, but the
file system index and pointers were updated to reflect this change.
When DRET is used to analyze a block device that is used by a file system, all underlying
data in the device is analyzed regardless of whether this data belongs to files that were
already deleted from the file system. For example, you can fill a 100 GB file system and use
100% of the file system, and then delete all the files in the file system, which makes it 0%
used. When scanning the block device that is used for storing the file system in this example,
DRET (or any other utility) can access the data that belongs to the files that are deleted.
To reduce the impact of the block device and file system behavior, it is recommended that you
use DRET to analyze volumes that contain as much active data as possible rather than
volumes that are mostly empty of data. This usage increases the accuracy level and reduces
the risk of analyzing old data that is deleted, but might still have traces on the device.
Note: According to the results of the DRET, use DRPs to use the available data
deduplication savings unless performance requirements exceed what DRP can deliver.
Do not enable deduplication if the data set is not expected to provide deduplication
savings.
Block size
The concept of a block size is simple, but its effect on storage performance can be distinct:
larger blocks generally affect performance more than smaller blocks. Understanding and
considering block sizes in the design, optimization, and operation of the storage system
leads to more predictable behavior of the entire environment.
Note: Where possible, limit the maximum transfer size that is sent to IBM FlashSystem
to no more than 256 KiB. This limitation is a best practice and not specific to only DRP.
It is important to understand the application response time requirements, rather than only the
internal response time, together with the required IOPS or throughput. Typical online
transaction processing (OLTP) applications require high IOPS and low latency.
Do not prioritize capacity over performance when designing or planning a storage solution.
Even if the capacity is sufficient, the environment can suffer from low performance.
Deduplication and compression might satisfy capacity needs, but the solution must still deliver
robust application performance.
To size an IBM Storage Virtualize environment, your IBM account team or IBM Business
Partner must access IBM Storage Modeller (StorM). The tool can be used to determine
whether the system can provide suitable bandwidth and latency with the requested data
efficiency features enabled.
The main best practices in the storage pool planning activity are described in this section.
Most of these practices apply to both standard and DRP pools, except where otherwise
specified. For more specific best practices for DRPs, see 4.4, “Data reduction pools best
practices” on page 268.
Depending on the requirements and configuration, you can also configure multiple pools, and
pools of a different type can coexist on a single system.
Consider the information that is provided below to select the proper pool type for your data.
Note: The previous generation of IBM Storage Virtualize hardware supported the Real-time
Compression (RtC) method for standard pools. With the current hardware generation, it is
no longer available.
The capacity savings method is assigned on a per-volume basis in the pool. Volumes with
different methods can be mixed in a single pool without any limitations.
If the system is equipped with FlashCore Modules, compression on FCM level is always
enabled and works independently of the pool type. This means that even with standard pools,
data will be compressed on FCMs.
The performance of fully allocated (FA) volumes is the same in both pool types, so either
pool type can be used for them.
However, the thin provisioning implementation differs: in DRP, thin provisioning is provided
by the DRP's log-structured array (LSA) data structures and carries significant computation
and I/O overhead compared to standard pools. Consider the following points if you plan to use
only thin provisioning without pool-level compression or deduplication:
On entry-level systems that do not support compression: use thin provisioning in standard
pools.
On entry-level systems that do not have compression acceleration hardware, which is
configured with all-flash drives and where a flash-tier latency is expected: use thin
provisioning in standard pools.
To automate volume creation with the required capacity savings method, the system provides
the concept of provisioning policies. A provisioning policy can be assigned to a pool of any
type or to its child pool, and it enforces the capacity savings method that is configured in
the policy on all volumes that are newly created in the pool.
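The following commands are a hedged sketch of creating and assigning a provisioning policy; the
policy and pool names are illustrative, and the exact parameters can differ between code levels, so
verify them against the CLI reference for your system:
mkprovisioningpolicy -name compressed_policy -capacitysaving compressed
chmdiskgrp -provisioningpolicy compressed_policy Pool0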
If you plan to use DRP, be aware of the extra limitations that it implies:
The minimum extent size is 1024 MB (except on FlashSystem 9500, where only 4096 MB and
8192 MB are supported). For standard pools, the extent size can be smaller.
There is a limit on the total thin-provisioned and compressed capacity for all volumes in a
single data reduction pool. Standard pools have a limit only for a single volume. For
details, refer to the Configuration Limits and Restrictions web page for your hardware
platform.
When a user deletes a file on a host, the operating system issues unmap commands for the
blocks that made up the file. A large amount of capacity can be freed if the user deletes (or
moves with Storage vMotion) a volume on a host datastore. This process might result in many
contiguous blocks being freed. Each of these contiguous blocks results in an unmap command
that is sent to the LUN that is provisioned on a storage device.
Data reduction pools support end-to-end unmap functions. Space that is deallocated by the
hosts with an unmap command results in the reduction of the used space in the volume and
the pool.
In a DRP, when a system receives an unmap command, the result is that the capacity that is
allocated within that contiguous chunk is freed. After the capacity is marked as free, it is
accounted as the pool's reclaimable capacity. Similarly, deleting a volume at the DRP level
marks all the volume's capacity as reclaimable. The garbage collection process, which runs in
the background, works with the reclaimable capacity and issues unmap commands to the system's
back-end (array or externally virtualized system) to free up unused space.
If a volume that receives an unmap command belongs to a standard pool, the command is
accepted, but the thin-provisioned volume's used capacity does not decrease, and the capacity
is not released to the pool's free space.
If back-end storage does not support unmap, commands that are sent by a host are also
accepted and processed. Internally, the system issues a write_same command of zeroes to
the corresponding back-end storage LBAs. This behavior is required by the SCSI standard
because the storage system must ensure that a subsequent read to an unmapped region returns
zeros and not any other data. For particular storage devices, such as nearline serial-attached
SCSI (NL-SAS) drives, the write_same commands can create a workload that is higher than
the device can sustain, which results in performance problems.
Back-end unmap ensures that physical capacity on the FCM arrays is freed when a host
deletes its data. It also helps to ensure good MDisk performance: flash drives can reuse the
space for wear-leveling and to maintain a healthy capacity of “pre-erased” (ready to be used)
blocks.
In virtualization scenarios, for example in configurations with an IBM SVC virtualizing IBM
FlashSystem in the back-end, SVC will issue unmap or write_same commands to the
virtualized system if the system announces unmap support during SCSI discovery.
Important: A standard pool does not shrink its used space as the result of processing a host
unmap command. However, the back-end used capacity shrinks if the back-end is
space-efficient (for example, if the back-end is an array of FCMs).
For this reason, even with a standard pool it is still beneficial to have host unmap support
enabled for systems that use FCMs. If a system contains not only FCMs but a mix of
storage, thorough planning is required.
By default, host-based unmap support is disabled on all products other than the
IBM FlashSystem 9000 series, because those systems are all-flash. On systems that can be
ordered in hybrid configurations, you might need to enable host unmap support.
Back-end unmap is enabled by default on all products, and best practice is to leave it enabled.
You can check how much unmap processing is occurring on a per volume or per-pool basis
by using the performance statistics. This information can be viewed with IBM Spectrum
Control or IBM Storage Insights.
Performance monitoring helps you notice possible effects. If the unmap workload is affecting
performance, consider the data rates that are observed and take the necessary steps. For
example, it is expected to see GiBps of unmap traffic if you deleted many volumes.
You can throttle the amount of “host offload” operations (such as the SCSI UNMAP) by using
the per-node settings for offload throttle. For example:
mkthrottle -type offload -bandwidth 500
You can also stop the system from announcing unmap support to one or more host systems,
if you are not able to stop any host from issuing unmap commands by using host-side
settings. To modify unmap support for a particular host, change the host type:
chhost -type generic_no_unmap <host_id_or_name>
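To verify the result of these changes, you can list the configured throttles and review the host
properties; a short sketch, assuming that the throttle and host objects already exist:
lsthrottle
lshost <host_id_or_name>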
If you experience severe performance problems as a result of unmap operations, you can
disable unmap on the entire system or on a per-host basis.
If other important factors do not lead you to choose standard pools, then DRPs are the right
choice. Using DRPs can increase storage efficiency and reduce costs because it reduces the
amount of data that is stored on hardware and reclaims previously used storage resources
that are no longer needed by host systems.
Also, DRPs provide great flexibility for future use because they add the ability of compression
and deduplication of data at the volume level in a specific pool, even if these features are
initially not used at creation time.
Note: We recommend the use of DRP pools with fully allocated volumes if the restrictions
on capacity do not affect your environment.
By design, IBM Storage Virtualize systems take the entire storage pool offline if a single
MDisk in that storage pool goes offline, which means that the storage pool’s MDisk quantity
and size define the failure domain. Reducing the hardware failure domain for back-end
storage is only part of your considerations. When you are determining the storage pool layout,
you must also consider application boundaries and dependencies to identify any availability
benefits that one configuration might have over another one.
Sometimes, reducing the hardware failure domain, such as placing the volumes of an
application into a single storage pool, is not always an advantage from the application
perspective. Alternatively, splitting the volumes of an application across multiple storage pools
increases the chances of having an application outage if one of the storage pools that is
associated with that application goes offline.
Finally, increasing the number of pools to reduce the failure domain is not always a viable
option. An example of such a situation is a system with a low number of physical drives:
creating multiple arrays to place them into different pools reduces the usable space because
of spare and protection capacity.
When virtualizing external storage, the failure domain is defined by the external storage itself
rather than by the pool definition on the front-end system. For example, if you provide 20
MDisks from external storage and all of these MDisks are using the same physical arrays, the
failure domain becomes the total capacity of these MDisks, no matter how many pools you
have distributed them across.
The following actions are the starting best practices when planning storage pools for
availability:
Create a separate pool for arrays that are created from drives connected to a single
control enclosure (I/O group). A pool can contain MDisks from different I/O groups in an
enclosure-based system, but this approach increases the failure domain size and dramatically
increases inter-node traffic traversing the SAN. In an SVC environment, storage is normally
connected to all I/O groups, so this is not a concern.
Create a separate pool for each FCM array. Only a single FCM array is allowed in a pool.
Create separate pools for internal storage and external storage, unless you are creating a
hybrid pool that is managed by Easy Tier (see 4.3.6, “External pools” on page 267).
Create a storage pool for each external virtualized storage subsystem, unless you are
creating a hybrid pool that is managed by Easy Tier (see 4.3.6, “External pools” on
page 267).
Note: If capacity from different external storage is shared across multiple pools,
provisioning groups are created.
IBM Storage Virtualize detects that a resource (MDisk) shares its physical storage with
other MDisks and monitors provisioning group capacity. MDisks in a single provisioning
group should not be shared between storage pools because capacity consumption on
one pool can affect free capacity on other pools. The system detects this condition and
shows that the pool contains shared resources.
For Easy Tier enabled storage pools, always allow free capacity for Easy Tier to deliver
better performance.
Consider implementing child pools when you must have a logical division of your volumes
for each application set. There are cases where you want to subdivide a storage pool but
maintain many MDisks in that pool. Child pools allow logical subdivision of a pool.
Throttles and provisioning policies can be set independently per child pool. In standard
pools, capacity thresholds for child pools can also be set.
When you select storage subsystems, the decision often comes down to the ability of the
storage subsystem to be reliable and resilient and to meet application requirements.
Although IBM Storage Virtualize can add data redundancy for virtualized external storage by
using advanced features, the availability characteristics of the storage subsystem's
controllers have the most impact on the overall availability of the data that is virtualized.
However, IBM Storage Virtualize DRAID can also utilize multiple threads. This means that full
system performance can be achieved with a single DRAID array that joins, for example, all
FCM drives in a FlashSystem 9200. There is no need to create multiple DRAID arrays, provided
that the arrays conform to the best practices for your drive type, as listed in 3.2.2, “Array
considerations” on page 206.
The following actions are the starting best practices when planning storage pools for
performance:
Create a dedicated storage pool with dedicated resources if there is a specific
performance application request.
When using external storage in an Easy Tier enabled pool, do not intermix MDisks in the
same tier with different performance characteristics.
In an IBM FlashSystem clustered environment, create storage pools with IOgrp or Control
Enclosure affinity: use only arrays or MDisks that are supplied by the internal storage
that is directly connected to the SAS chain of a single IOgrp. This configuration avoids
unnecessary IOgrp-to-IOgrp communication traversing the storage area network (SAN)
and consuming Fibre Channel (FC) bandwidth.
Note: In IBM FlashSystem setup with multiple IO groups, do not mix MDisks from
different control enclosures (I/O groups) in a single pool.
For Easy Tier enabled storage pools, always allow free capacity for Easy Tier to deliver
better performance.
Consider implementing child pools when you must have a logical division of your volumes
for each application set. Cases often exist where you want to subdivide a storage pool but
maintain many MDisks in that pool. Child pools are logically like storage pools and allow
you to subdivide a parent pool into one or more smaller pools. Thresholds and throttles
can be set independently per child pool.
Cache partitioning
The system's write cache is partitioned, and each pool is assigned to a single cache partition. The
system automatically defines a logical cache partition per storage pool. Child pools do not
count toward cache partitioning.
A cache partition is a logical threshold that stops a single partition from consuming the entire
cache resource. This partition is provided as a protection mechanism and does not affect
performance in normal operations. Only when a storage pool becomes overloaded does the
partitioning activate and essentially slow down write operations in the pool to the same speed
that the back-end can handle. Overloaded means that the front-end write throughput is
greater than the back-end storage can sustain. This situation should be avoided.
Table 4-2 shows the upper limit of write cache data that any one partition, or storage pool, can
occupy.
Table 4-2   Upper limit of write cache data per storage pool
Number of storage pools    Upper limit of write cache data
1                          100%
2                          66%
3                          40%
4                          30%
5 or more                  25%
You can think of the rule as follows: no single partition can occupy more than its upper limit
of the cache capacity with write data.
In recent versions of IBM Spectrum Control, the fullness of the cache partition is reported and
can be monitored. You should not see partitions reaching 100% full. If you do, then it suggests
the corresponding storage pool is in an overload situation, and the workload should be moved
from that pool or extra storage capability should be added to that pool.
The terminology and its reporting in the GUI changed in recent versions, and they are listed in
Table 4-3.
Table 4-3   Capacity terminology
Previous term       Current term           Description
Physical capacity   Usable capacity        The amount of capacity that is available for storing data on a
                                           system, pool, array, or MDisk after formatting and RAID
                                           techniques are applied.
Volume capacity     Provisioned capacity   The total capacity of all volumes in the system.
N/A                 Written capacity       The total capacity that is written to the volumes in the system,
                                           which is shown as a percentage of the provisioned capacity and
                                           reported before any data reduction.
The usable capacity describes the amount of capacity that can be written to on the system,
and it includes any back-end data reduction (that is, it is the “virtual” capacity that the
back-end reports to the system).
Note: With DRP, capacity reporting is more complex than with standard pools. For details,
refer to 4.4, “Data reduction pools best practices” on page 268.
For FCMs, the usable capacity is the maximum capacity that can be written to the system.
However, for the smaller capacity drives (4.8 TB), the reported usable capacity is 20 TiB. The
actual usable capacity might be lower because of the actual data reduction that is achieved
from the FCM compression.
Plan to achieve the default 2:1 compression, which is approximately an average of 10 TiB of
usable space. Careful monitoring of the actual data reduction should be considered if you
plan to provision to the maximum stated usable capacity when the small capacity FCMs are
used.
The larger FCMs, 9.6 TB and above, report just over 2:1 usable capacity: 22 TiB, 44 TiB, and
88 TiB for the 9.6 TB, 19.2 TB, and 38.4 TB modules, respectively.
The provisioned capacity shows the total provisioned capacity in terms of the volume
allocations. This capacity is the “virtual” capacity that is allocated to fully allocated and
thin-provisioned volumes. Therefore, it is the theoretical capacity that could be written if
all volumes were filled to 100%.
The written capacity is the actual amount of data that is written into the provisioned capacity:
For fully allocated volumes, the written capacity is always 100% of the provisioned
capacity.
For thin-provisioned volumes (including data reduced volumes), the written capacity is the
actual amount of data that the host writes to the volumes.
The final set of capacity numbers relates to the data reduction, which is reported in two ways:
As the savings from DRP (compression and deduplication) that is provided at the DRP
level, as shown in Figure 4-6.
As the FCM compression.
Extent sizes can be 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, or 8192 MB. However, the
choice of sizes can be limited by the pool type and by the hardware platform. For example,
DRP supports only extent sizes of 1024 MB or larger on any platform, and FlashSystem 9500
allows you to choose only between 4096 and 8192 MB. For the exact capacity limits, refer to
the Configuration Limits and Restrictions web page for your hardware platform.
The extent size cannot be changed after a pool is created; it must remain constant through the
lifetime of the pool.
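For example, a DRP with the commonly recommended 4 GB extent size could be created as follows.
This is a sketch only, and the pool name is illustrative:
mkmdiskgrp -name DRP0 -ext 4096 -datareduction yes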
Limitation: Extent-based migrations from standard pools to DRPs are not supported
unless the volume is fully allocated.
Availability considerations
The external storage virtualization feature provides many advantages through consolidation
of storage. You must understand the availability implications that storage component failures
can have on availability domains within a system.
IBM Storage Virtualize offers significant performance benefits through its ability to stripe
across back-end storage volumes. However, consider the effects that various configurations
have on availability:
When you select MDisks for a storage pool, performance is often the primary
consideration. However, in many cases, the availability of the configuration is traded for
little or no performance gain.
The system takes the entire storage pool offline if a single MDisk in that storage pool goes
offline. Consider an example where you have 40 external arrays of 1 TB each for a total
capacity of 40 TB, with all 40 arrays in the same storage pool. In this case, you place the
entire 40 TB of capacity at risk if one of the 40 arrays fails (which causes the storage pool
to go offline). If you instead spread the 40 arrays over several storage pools, the effect of
an array failure (an offline MDisk) affects less storage capacity, which limits the failure
domain.
To ensure optimum availability to well-designed storage pools, consider the following best
practices:
It is recommended that each storage pool contains only MDisks from a single storage
subsystem. An exception exists when you are working with Easy Tier hybrid pools. For
more information, see 4.7, “Easy Tier and tiered and balanced storage pools” on
page 294.
It is suggested that each storage pool contains only MDisks from a single storage tier
(SSD or flash, enterprise, or NL-SAS) unless you are working with Easy Tier hybrid pools.
For more information, see 4.7, “Easy Tier and tiered and balanced storage pools” on
page 294.
IBM Storage Virtualize does not provide any physical-level data redundancy for virtualized
external storage. The availability characteristics of the storage subsystems’ controllers have
the most impact on the overall availability of the data that is virtualized by
IBM Storage Virtualize.
Performance considerations
Using the following external virtualization capabilities, you can boost the performance of the
back-end storage systems:
Using wide-striping across multiple arrays
Adding more read/write cache capability
Wide-striping can add approximately 10% extra I/O performance (IOPS) to the back-end system
by using these mechanisms.
Another factor is that virtualized-storage subsystems can be scaled up or scaled out. For
example, IBM System Storage DS8000 series is a scale-up architecture that delivers the best
performance per unit, and the IBM FlashSystem series can be scaled out with enough units to
deliver the same performance.
With a virtualized system, there is debate whether to scale out back-end systems or add them
as individual systems behind IBM Storage Virtualize. Either case is valid. However, adding
individual controllers is likely to allow IBM Storage Virtualize to generate more I/O based on
queuing and port-usage algorithms. It is recommended that you add each controller
enclosure (I/O group) of an IBM FlashSystem back-end as its own controller, that is, do not
cluster IBM FlashSystem when it acts as an external storage controller behind another
IBM Storage Virtualize product, such as SVC.
Adding each controller (I/O group) of an IBM FlashSystem back-end as its own controller
adds more management IP addresses and configurations. However, this action provides the
best scalability in terms of overall solution performance.
All storage subsystems possess an inherent failure rate. Therefore, the failure rate of a
storage pool becomes the failure rate of the storage subsystem times the number of units.
The back-end storage access is controlled through MDisks where the IBM Storage Virtualize
systems act like a host to the back-end controller. Just as you must consider volume queue
depths when accessing storage from a host, these systems must calculate queue depths to
maintain high throughput capability while ensuring the lowest possible latency.
For more information about the queue depth algorithm and the rules about how many MDisks
to present for an external pool, see “Volume considerations” on page 229, which describes
how many volumes to create on the back-end controller (that are seen as MDisks by the
virtualizing controller) based on the type and number of drives or flash modules.
Important: If you plan to use DRP with deduplication and compression that is enabled with
FCM storage, assume zero extra compression from the FCMs, that is, use the reported
physical or usable capacity from the RAID array as the usable capacity in the pool and
ignore the maximum effective capacity.
The reason for assuming zero extra compression from the FCMs is because the DRP
function is sending compressed data to the FCMs, which cannot be further compressed.
Therefore, the data reduction (effective) capacity savings are reported at the front-end pool
level and the back-end pool capacity is almost 1:1 for the physical capacity.
Some small amount of other compression savings might be seen because of the
compression of the DRP metadata on the FCMs.
The main point to consider is whether the data is deduplicable. Tools are available to provide
estimation of the deduplication ratio. For more information, see “Determining whether your
data is a deduplication candidate” on page 252.
With the industry standard NVMe drives, which do not perform inline compression, similar
considerations apply:
Data is deduplicable. In this case, the recommendation is to use a compressed and
deduplicated volume type. The DRP compression technology has more than enough
compression bandwidth for these purposes, so compression should always be done.
Data is not deduplicable. In this case, the recommendation is to use only a compressed
volume type. The internal compression technology provides enough compression
bandwidth.
Note: In general, avoid creating DRP volumes that are only thin-provisioned and
deduplicated. When using DRP volumes, they should be either fully allocated, or
deduplicated and compressed.
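Following this guidance, a compressed and deduplicated volume can be created in a DRP as shown in
the following sketch; the volume name, pool name, and size are illustrative:
mkvolume -name vol_dedup01 -pool DRP0 -size 500 -unit gb -compressed -deduplicated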
Various configuration items affect the performance of compression on the system. To attain
high compression ratios and performance on your system, ensure that the following
guidelines are met:
For performance-optimized configurations, aim to use FCM compression as the only
capacity efficiency method.
Never use DRP on both front-end (virtualizing) and back-end (virtualized) systems
concurrently (DRP over DRP). It is recommended to use DRP at the virtualizer level rather
than the back-end storage because this approach simplifies capacity management and
reporting.
For storage behind the virtualizer, best practice is to provision fully allocated volumes. By
running in this configuration, you ensure the following items:
The virtualizer understands the real physical capacity that is available and can warn you
about and avoid out-of-space situations (where access is lost due to no space).
Capacity monitoring can be performed on the virtualizer level because it sees the true
physical and effective capacity usage.
The virtualizer performs efficient data reduction on previously unreduced data. Generally,
the virtualizer has offload hardware and more CPU resources than the back-end storage
because it does not need to deal with RAID and other such considerations.
If you cannot avoid back-end data reduction (for example, the back-end storage controller
cannot disable its data reduction features), ensure that you follow these best practices:
Do not excessively overprovision the physical capacity on the back-end:
– For example, assume that you have 100 TiB of real capacity. Start by presenting only
100 TiB of volumes to IBM Storage Virtualize. Monitor the actual data reduction on the
back-end controller. If your data is reducing well over time, increase the capacity that is
provisioned to the virtualizer.
– This approach ensures that you can monitor and validate your data reduction rates and
avoids a panic if you do not achieve the expected rates and presented too much
capacity to the system.
Ensure that capacity usage is carefully and constantly monitored on both levels. Set up
alerting and have an emergency plan for the situation when back-end device is close to
being out of physical space.
Important: Never run DRP on top of DRP. This approach is wasteful and causes
performance problems without extra capacity savings.
DRP restrictions
Consider the following important restrictions when planning for a DRP implementation:
The maximum number of supported DRPs in the system is four.
VVOLs are not currently supported in DRP.
Volume shrinking is not supported in DRP with thin or compressed volumes.
Non-Disruptive Volume Movement (NDVM) is not supported by DRP volumes.
The volume copy split of a volume mirror in a different I/O group is not supported for DRP
thin-provisioned or compressed volumes.
Image and sequential mode virtual disks (VDisks) are not supported in DRP.
Extent level migration is not allowed between DRPs (or between a DRP and a standard
pool) unless volumes are fully allocated.
Volume migration for any volume type is permitted between a quotaless child and its
parent DRP pool.
There is a limit of a maximum of 128 K extents per customer data volume per I/O group:
– Therefore, the pool extent size dictates the maximum physical capacity in a pool after
data reduction.
– Use a 4 GB extent size or greater.
The recommended pool size is at least 20 TB.
Use less than 1 PB per I/O group.
Your pool should be no more than 85% occupied.
At the time of writing, the maximum number of extents that are supported for a data volume is
128 K. As shown in Figure 4-1 on page 248, one data volume is available per pool.
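As a rough illustration of how the extent size caps the capacity: with a 4 GB extent size, 128 K
extents allow approximately 128 K x 4 GB = 512 TB of data per data volume, that is, per I/O group,
whereas a 1024 MB extent size caps it at approximately 128 TB. These figures are an approximation;
confirm the exact values in Table 4-4 and on the Configuration Limits and Restrictions web page for
your platform.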
Table 4-4 lists the maximum size per pool, by extent size and I/O group number.
Table 4-4 Pool size by extent size and I/O group number
Extent size    Max size with one I/O group    Max size with two I/O groups    Max size with three I/O groups    Max size with four I/O groups
Considering that the extent size cannot be changed after the pool is created, it is
recommended that you carefully plan the extent size according to the environment capacity
requirements. For most of the configurations, an extent size of 4 GB is recommended for DRP.
Table 4-5 Minimum recommended pool size by extent size and I/O group number
Extent size    Min size with one I/O group    Min size with two I/O groups    Min size with three I/O groups    Min size with four I/O groups
The values that are reported in Table 4-5 on page 272 represent the minimum required
capacity for a DRP to create a single volume.
This garbage-collection process requires a certain amount of free space to work efficiently.
For this reason, it is recommended to keep at least 15% free space in a DRP pool.
Host-based migration
Host-based migration uses operating system features or software tools that run on the hosts
to move data concurrently with normal host operations. VMware vMotion and AIX Logical
Volume Mirroring (LVM) are two examples of these features. When you use this approach, a
specific amount of capacity on the target pool is required to provide the migration target
volumes.
With volume mirroring, the throughput of the migration activity can be adjusted at a volume
level by specifying the Mirror Sync Rate parameter. Therefore, if performance is affected, the
migration speed can be lowered or even suspended.
Note: Volume mirroring supports only two copies of a volume. If a configuration uses both
copies, one of the copies must be removed first before you start the migration.
The volume copy split of a volume mirror in a different I/O group is not supported for a
thin-provisioned or compressed volume in a DRP.
Data reduction pools use a Log Structured Array (LSA) mechanism to store the data of
thin-provisioned and compressed volumes. When a host overwrites a block of data on its
volume, the LSA does not write the new data to the same place as the old block. Instead, it
writes the data to a new area and marks the old block as garbage. The same process occurs
when a host sends unmap (deallocate) commands after it deletes a part of its data: those
blocks are also marked as garbage.
When you delete a thin-provisioned or compressed volume in a DRP, its capacity does not
immediately return as free. In DRP, volume deletion is an asynchronous process, which
includes the following phases:
– Each grain must be inspected to determine whether the volume that is being deleted was
the last one that referenced this grain (for deduplicated data).
All the blocks that are marked as garbage are accounted as the pool's reclaimable capacity.
A DRP accounts for its capacity in the following categories:
Reduced customer data   The data that is written to the DRP, in compressed and deduplicated form.
Fully allocated data    The amount of capacity that is allocated to fully allocated volumes
                        (assumed to be 100% written).
Free                    The amount of free space that is not in use by any volume.
Reclaimable data        The amount of data in the pool that is marked as garbage, but not yet
                        accounted as free. It is either old (overwritten) data, data that is
                        unmapped by a host, or data that is associated with recently deleted
                        volumes. It becomes free after it is collected by the garbage collection
                        process.
After a reasonable usage period, the DRP can show up to only 15 - 20% of overall free space,
and a significant part of the capacity as reclaimable. The garbage collection algorithm must
balance the need to free space with the overhead of performing garbage collection.
Therefore, the incoming write/overwrite rates and any unmap operations dictate how much
“reclaimable space” is present at any time.
The system dynamically adjusts the target rate of garbage collection to balance the overhead
of collection against the amount of reclaimable space and to maintain a suitable amount of
free space.
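To see how much of a pool's capacity is currently free versus reclaimable, you can review the
detailed pool view; a sketch, assuming a pool named DRP0 and a code level that reports the
reclaimable capacity field:
lsmdiskgrp DRP0
Review the free_capacity and reclaimable_capacity fields in the output, or use the GUI pool
properties view.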
Data reduction pool attempts to maintain a sensible amount of free space. If there is little free
space and you delete many volumes, the garbage-collection code might trigger a large
amount of back-end data movement and might result in performance issues.
Important: Use caution when using up all or most of the free space with fully allocated
volumes. Garbage collection requires free space to coalesce data blocks into whole
extents and free capacity. If little free space is available, the garbage collector must work
harder to free space.
It might be worth creating some “airbag” fully allocated volumes in a DRP. This type of
volume reserves some space that you can quickly return to the free space resources if you
reach a point where you are almost out of space, or when garbage collection is struggling
to free capacity efficiently.
Consider these points:
– This type of volume should not be mapped to hosts.
– This type of volume should be labeled, for example,
“RESERVED_CAPACITY_DO_NOT_USE”.
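A minimal sketch of creating such a reserved, fully allocated volume follows; the name, pool, and
size are illustrative, and because no capacity savings flag is specified, the volume is created as
fully allocated:
mkvolume -name RESERVED_CAPACITY_DO_NOT_USE -pool DRP0 -size 500 -unit gb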
Note: In DRP, a volume's used capacity and tier used capacity are not reported per volume.
These values are reported only at the parent pool level because of the complexities of
deduplication capacity reporting.
Important: When adding external MDisks, the system does not know to which tier the
MDisk belongs. Ensure that you specify or change the tier type to match the tier type of the
MDisk.
This specification is vital to ensure that Easy Tier keeps a pool as a single tier pool and
balances across all MDisks, or Easy Tier adds the MDisk to the correct tier in a multitier
pool.
Failure to set the correct tier type creates a performance problem that might be difficult to
diagnose in the future.
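For example, an external MDisk can be assigned to the correct tier with the chmdisk command, as
shown in the following sketch; the MDisk name and tier value are illustrative (typical tier values
include tier0_flash, tier1_flash, tier_enterprise, and tier_nearline):
chmdisk -tier tier_enterprise mdisk7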
Adding MDisks to storage pools is a simple task, but it is suggested that you perform some
checks in advance, especially when adding external MDisks.
In IBM Storage Virtualize, there is a feature that tests an MDisk automatically for reliable
read/write access before it is added to a storage pool. Therefore, user action is not required.
The test fails under the following conditions:
One or more nodes cannot access the MDisk through the chosen controller port.
I/O to the disk does not complete within a reasonable time.
The SCSI inquiry data that is provided for the disk is incorrect or incomplete.
The IBM Storage Virtualize cluster suffers a software error during the MDisk test.
Image-mode MDisks are not tested before they are added to a storage pool because an
offline image-mode MDisk does not take the storage pool offline. Therefore, the suggestion
here is to use a dedicated storage pool for each image mode MDisk. This best practice makes
it easier to discover what the MDisk is going to be virtualized as, and reduces the chance of
human error.
Persistent reserve
A common condition where external MDisks can be configured by IBM FlashSystem and SVC
but cannot perform read/write is when a persistent reserve is left on a LUN from a previously
attached host.
In this condition, rezone the back-end storage and map it back to the host that is holding the
reserve. Alternatively, map the back-end storage to another host that can remove the reserve
by using a utility such as the Microsoft Windows SDD Persistent Reserve Tool.
When multiple tiers of storage are on the same system, you might also want to indicate the
storage tier in the name. For example, you can use R5 and R10 to differentiate RAID levels, or
you can use T1, T2, and so on, to indicate the defined tiers.
Best practice: For MDisks, use a naming convention that associates the MDisk with its
corresponding controller and array within the controller, such as DS8K_<extent pool name
or id>_<volume id>.
Note: The removal of MDisks occurs only if sufficient space is available to migrate the
volume data to other extents on other MDisks that remain in the storage pool. After you
remove the MDisk from the storage pool, it takes time to change the mode from managed to
unmanaged, depending on the size of the MDisk that you are removing.
When you remove the MDisk made of internal disk drives from the storage pool, the MDisk is
deleted. This process also deletes the array on which this MDisk was built, and converts all
drives that were included in this array to a candidate state. You can now use those disk drives
to create another array of a different size and RAID type, or you can use them as hot spares.
When an external MDisk is deleted from a pool, it changes to the unmanaged state. After that,
you can use the back-end controller management interface to unmap and remove the MDisk.
DRP restriction: The lsmdiskextent command does not provide accurate extent usage
for thin-provisioned or compressed volumes on DRPs.
Specify the -force flag on the rmmdisk command, or select the corresponding option in the
GUI. Both actions cause the system to automatically move all used extents on the MDisk to
the remaining MDisks in the storage pool. Even if you use the -force flag, the system still
performs the appropriate checks to make sure that you can safely remove the MDisks from a pool.
Alternatively, you might want to manually perform the extent migrations. Otherwise, the
automatic migration randomly allocates extents to MDisks (and areas of MDisks). After all the
extents are manually migrated, the MDisk removal can proceed without the -force flag.
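A hedged sketch of the manual approach: first list which volumes have extents on the MDisk that is
being removed, and then migrate a number of extents to a specific target MDisk. The MDisk names,
volume name, and extent count are illustrative, and as noted above, lsmdiskextent is not accurate
for thin-provisioned or compressed DRP volumes:
lsmdiskextent mdisk5
migrateexts -source mdisk5 -target mdisk6 -exts 100 -vdisk volume01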
If the MDisk was named by using the best practices, the correct LUNs are easier to identify.
However, ensure that the identification of LUNs that are being unmapped from the controller
match the associated MDisk on IBM FlashSystem and SVC by using the Controller LUN
Number field and the unique identifier (UID) field.
The UID is unique across all MDisks on all controllers. However, the controller LUN is unique
only within a specified controller and for a certain host. Therefore, when you use the controller
LUN, check that you are managing the correct storage controller and that you are looking at
the mappings for the correct IBM Storage Virtualize host object.
Tip: Renaming your back-end storage controllers also helps you with MDisk identification.
DS8000 LUN
The LUN ID uniquely identifies LUNs only within the same storage controller. If multiple
storage devices are attached to the same IBM Storage Virtualize cluster, the LUN ID must be
combined with the worldwide node name (WWNN) attribute to uniquely identify LUNs within
the IBM Storage Virtualize.
To get the WWNN of the DS8000 controller, take the first 16 digits of the MDisk UID and
change the first digit from 6 to 5, such as 6005076305ffc74c to 5005076305ffc74c. When
detected as IBM FlashSystem and SVC ctrl_LUN_#, the DS8000 LUN is decoded as
40XX40YY00000000, where XX is the logical subsystem (LSS) and YY is the LUN within the LSS.
As detected by the DS8000, the LUN ID is the four digits starting from the 29th digit, as shown
in Example 4-4.
In Example 4-4, you can identify the MDisk that is supplied by the DS8000, which is LUN ID
1007.
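As an illustration of the decoding rule with a hypothetical value: a ctrl_LUN_# of 4010400700000000
decodes to LSS 10 and LUN 07 within that LSS, which corresponds to DS8000 LUN ID 1007.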
2. To identify your LUN, in the volumes by Hosts view, expand your IBM FlashSystem and
SVC host group and then review the LUN column.
3. The MDisk UID field consists of part of the controller WWNN from bits 2 - 13. You might
check those bits by using the lscontroller command.
4. The correlation can now be performed by taking the first 16 bits from the MDisk UID field:
– Bits 1 - 13 refer to the controller WWNN.
– Bits 14 - 16 are the XIV volume serial number (897) in hexadecimal format (resulting in
381 hex).
– The conversion is
0017380002860381000000000000000000000000000000000000000000000000.
Where:
• The controller WWNN (bits 2 - 13) is 0017380002860.
• The XIV volume serial number that is converted to hex is 381.
5. To correlate the IBM Storage Virtualize ctrl_LUN_#:
a. Convert the XIV volume number to hexadecimal format.
b. Check the last 3 bits from the IBM Storage Virtualize ctrl_LUN_#.
In this example, the number is 0000000000000002, as shown in Figure 4-8.
3. At the virtualizer, review the MDisk details and compare the MDisk UID field with the IBM
FlashSystem Volume UID, as shown in Figure 4-10. The first 32 bits should be the same.
Figure 4-10 SAN Volume Controller MDisk details for IBM FlashSystem volumes
4. Double-check that the virtualizer ctrl_LUN_# is the IBM FlashSystem SCSI ID number in
hexadecimal format. In this example, the number is 0000000000000005.
If the LUN is mapped back with different attributes, IBM Storage Virtualize recognizes this
MDisk as a new MDisk. In this case, the associated storage pool does not come back online.
Consider this situation for storage controllers that support LUN selection because selecting a
different LUN ID changes the UID. If the LUN was mapped back with a different LUN ID, it
must be mapped again by using the previous LUN ID. The previous ID can be found in the
IBM Storage Virtualize configuration backup files (for information about how to create them,
refer to Implementation Guide for IBM Storage FlashSystem and IBM SAN Volume Controller:
Updated for IBM Storage Virtualize Version 8.6, SG24-8542) or in the back-end system logs.
The first MDisk to allocate an extent from is chosen in a pseudo-random way rather than
always starting from the same MDisk. The pseudo-random algorithm avoids the situation
where the “striping effect” inherent in a round-robin algorithm places the first extent for many
volumes on the same MDisk.
Placing the first extent of a number of volumes on the same MDisk might lead to poor
performance for workloads that place a large I/O load on the first extent of each volume or
that create multiple sequential streams.
However, this allocation pattern is unlikely to remain for long because Easy Tier balancing
moves the extents to balance the load evenly across all MDisks in the tier. The hot and cold
extents are also moved between tiers.
In a multitier pool, the middle tier is used by default for new volume creation. If free space is
not available in the middle tier, the cold tier is used if it exists. If the cold tier does not exist, the
hot tier is used. For more information about Easy Tier, see 4.7, “Easy Tier and tiered and
balanced storage pools” on page 294.
DRP restriction: With compressed and deduplicated volumes on DRP, the extent
distribution cannot be checked across the MDisks. Initially, only a minimal number of
extents are allocated to the volume, based on the rsize parameter.
Note: Hot Spare Nodes (HSNs) also need encryption licenses if they are to be used to
replace the failed nodes that support encryption.
It is not possible to convert the existing data to an encrypted copy in-place. You can use the
volume migration function to migrate the data to an encrypted storage pool or encrypted child
pool. Alternatively, you can also use the volume mirroring function to add a copy to an
encrypted storage pool or encrypted child pool and delete the unencrypted copy after the
migration.
Before you activate and enable encryption, you must determine the method of accessing key
information during times when the system requires an encryption key to be present. The
system requires an encryption key to be present during the following operations:
System power-on
System restart
User-initiated rekey operations
System recovery
For more information about configuration details about IBM Storage Virtualize encryption, see
Implementation Guide for IBM Storage FlashSystem and IBM SAN Volume Controller:
Updated for IBM Storage Virtualize Version 8.6, SG24-8542.
Both methods protect against the potential exposure of sensitive user data that is stored on
discarded, lost, or stolen media. Both methods can facilitate the warranty return or disposal of
hardware. The method that is used for encryption is chosen automatically by the system
based on the placement of the data.
Figure 4-11 shows encryption placement in the lower layers of the IBM Storage Virtualize
software stack.
Figure 4-11 Encryption placement in lower layers of the IBM FlashSystem software stack
Hardware encryption
Hardware encryption has the following characteristics:
The algorithm is built into the SAS chip for all SAS-attached drives, or built into the drive
itself for NVMe-attached drives (FCM, industry-standard NVMe, and SCM).
No system overhead.
Only available to direct-attached SAS or NVMe disks.
Can be enabled only when you create internal arrays.
Child pools cannot be encrypted if the parent storage pool is not encrypted.
Child pools are automatically encrypted if the parent storage pool is encrypted, but can
have different encryption keys.
DRP child pools can use only the same encryption key as their parent.
However, if you want to create encrypted child pools from an unencrypted storage pool that
contains a mix of internal arrays and external MDisks, the following restrictions apply:
The parent pool must not contain any unencrypted internal arrays.
All system nodes must have an activated encryption license.
Note: An encrypted child pool that is created from an unencrypted parent storage pool
reports as unencrypted if the parent pool contains unencrypted internal arrays. Remove
these arrays to ensure that the child pool is fully encrypted.
The general rule is to not mix different types of MDisks in a storage pool unless you intend to
use the Easy Tier tiering function. In this scenario, the internal arrays must be encrypted if
you want to create encrypted child pools from an unencrypted parent storage pool. All
methods of encryption use the same encryption algorithm, the same key management
infrastructure, and the same license.
Example 4-6   Command to declare or identify a self-encrypted MDisk from a virtualized external storage
IBM_2145:ITSO_A:superuser>chmdisk -encrypt yes mdisk0
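After the change, the encryption state of the MDisk can be confirmed from the detailed MDisk view;
a sketch, assuming the MDisk is named mdisk0 and that your code level reports the encrypt field:
lsmdisk mdisk0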
Two options are available for accessing key information on USB flash drives:
USB flash drives are left inserted in the system always.
If you want the system to restart automatically, a USB flash drive must be left inserted in
all the nodes on the system. When you power on, then all nodes have access to the
encryption key. This method requires that the physical environment where the system is
located is secure. If the location is secure, it prevents an unauthorized person from making
copies of the encryption keys, stealing the system, or accessing data that is stored on the
system.
USB flash drives are not left inserted into the system except as required.
For the most secure operation, do not keep the USB flash drives inserted into the nodes
on the system. However, this method requires that you manually insert the USB flash
drives that contain copies of the encryption key into the nodes during operations in which
the system requires an encryption key to be present. USB flash drives that contain the keys
must be stored securely to prevent theft or loss.
Key servers
Key servers have the following characteristics:
Physical access to the system is not required to process a rekeying operation.
Support for businesses that have security requirements to not use USB ports.
Strong key generation.
Key self-replication and automatic backups.
Implementations follow an open standard that aids in interoperability.
Audit details.
Ability to administer access to data separately from storage devices.
Encryption key servers create and manage encryption keys that are used by the system. In
environments with many systems, key servers distribute keys remotely without requiring
physical access to the systems. A key server is a centralized system that generates, stores,
and sends encryption keys to the system. If the key server provider supports replication of
keys among multiple key servers, you can specify up to four key servers (one master and three clones) that connect to the system over either a public network or a separate private network.
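A minimal sketch of listing the key servers that are configured on the system from the CLI follows; the output columns vary by code level.
lskeyserver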
You can adjust the validity period to comply with specific security policies, and always match the certificate validity period on the IBM FlashSystem, SVC, and IBM Security Guardium Key Lifecycle Manager servers. A mismatch causes a certificate authorization error and leads to unnecessary certificate exchange.
Figure 4-13 shows the default certificate type and validity period on IBM Storage Virtualize.
Figure 4-14 shows the default certificate type and validity period on an IBM Security Key
Lifecycle Manager server.
Figure 4-14 Creating a self-signed certificate: IBM Security Key Lifecycle Manager server
By default, in IBM Storage Virtualize, the IBM Spectrum_VIRT group name is predefined in the
encryption configuration wizard. IBM Spectrum_VIRT contains all the keys for the managed
IBM FlashSystem and SVC. However, it is possible to use different device groups if they are
GPFS device-based, for example, one device group for each environment (production or
disaster recovery (DR)). Each device group maintains its own key database, and this
approach allows more granular key management.
Also, the rekey process creates a configuration on the IBM Security Guardium Key Lifecycle Manager server, and it is important not to wait for the next replication window but to manually synchronize the configuration to the extra key servers (clones). Otherwise, an error message is generated by the IBM Storage Virtualize system, which indicates that the key is missing on the clones.
Figure 4-17 on page 293 shows the keys that are associated with a device group. In this
example, the SG247933_BOOK device group contains one encryption-enabled system, and it
has three associated keys. Only one of the keys is activated, and the other two were
deactivated after the rekey process.
SG247933_BOOK
uuid = KEY-8a89d57-15bf8f41-cea6-4df3-8f4e-be0c36318615
alias = mmm008a89d57000000870
key algorithm = AES
key store name = defaultKeyStore
key state = ACTIVE
creation date = 18/11/2017, 01:43:27 Greenwich Mean Time
expiration date = null
wsadmin>print AdminTask.tklmKeyList('[-uuid
KEY-8a89d57-74edaef9-b6d9-4766-9b39-7e21d9911011]')
CTGKM0001I Command succeeded.
uuid = KEY-8a89d57-74edaef9-b6d9-4766-9b39-7e21d9911011
alias = mmm008a89d5700000086e
key algorithm = AES
key store name = defaultKeyStore
key state = DEACTIVATED
creation date = 17/11/2017, 20:07:19 Greenwich Mean Time
expiration date = 17/11/2017, 23:18:37 Greenwich Mean Time
wsadmin>print AdminTask.tklmKeyList('[-uuid
KEY-8a89d57-ebe5d5a1-8987-4aff-ab58-5f808a078269]')
uuid = KEY-8a89d57-ebe5d5a1-8987-4aff-ab58-5f808a078269
alias = mmm008a89d5700000086f
key algorithm = AES
key store name = defaultKeyStore
key state = DEACTIVATED
creation date = 17/11/2017, 23:18:34 Greenwich Mean Time
expiration date = 18/11/2017, 01:43:32 Greenwich Mean Time
Note: The initial configuration, such as certificate exchange and TLS configuration, is
required only on the master IBM Security Key Lifecycle Manager server. The restore or
replication process duplicates all the required configurations to the clone servers.
If encryption was enabled on a pre-7.8.0 code level system and the system is updated to later
releases, you must run a USB rekey operation to enable key server encryption. Run the
chencryption command before you enable key server encryption. To perform a rekey
operation, run the commands that are shown in Example 4-9.
Example 4-9 Commands to enable the key server encryption option on a system upgraded from
pre-7.8.0
chencryption -usb newkey -key prepare
chencryption -usb newkey -key commit
By implementing an evolving artificial intelligence (AI)-like algorithm, Easy Tier moves the most frequently accessed blocks of data to the lowest-latency device. Therefore, it provides a substantial performance improvement for a comparatively small investment in SSD and flash capacity.
The industry has moved on in the more than 10 years since Easy Tier was first introduced. The current cost of SSD and flash-based technology means that more users can deploy all-flash environments.
HDD-based large capacity NL-SAS drives are still the most cost-effective online storage
devices. Although SSD and flash ended the 15K RPM drive market, it has yet to reach a price
point that competes with NL-SAS for lower performing workloads. The use cases for Easy
Tier changed, and most deployments now use either all-flash or “flash and trash” approaches,
with 10% or more flash capacity and the remainder using NL-SAS.
Easy Tier also provides balancing within a tier. This configuration ensures that no one single
component within a tier of the same capabilities is more heavily loaded than another one. It
does so to maintain an even latency across the tier and help to provide consistent and
predictable performance.
As the industry strives to develop technologies that can enable higher throughput and lower
latency than even flash, Easy Tier continues to provide user benefits. For example, SCM
technologies, which were introduced to IBM FlashSystem in 2020, now provide lower latency
than even flash, but as with flash when it was first introduced, at a considerably higher cost of
acquisition per gigabyte.
Choosing the correct mix of drives and data placement is critical to achieve optimal
performance at the lowest cost. Maximum value can be derived by placing “hot” data with high
I/O density and low response time requirements on the highest tier, while targeting lower tiers
for “cooler” data, which is accessed more sequentially and at lower rates.
Easy Tier dynamically automates the ongoing placement of data among different storage
tiers. It can be enabled for internal and external storage to achieve optimal performance.
The Easy Tier feature that is called storage pool balancing automatically moves extents
within the same storage tier from overloaded to less loaded MDisks. Storage pool balancing
ensures that your data is optimally placed among all disks within storage pools.
Storage pool balancing is designed to balance extents between MDisks within the same tier of a pool to improve overall system performance and avoid overloading a single MDisk in the pool.
However, Easy Tier considers only performance, and it does not consider capacity. Therefore, if two FCM arrays are in a pool and one of them is nearly out of space and the other is empty, Easy Tier does not attempt to move extents between the arrays.
For this reason, if you must increase capacity, it is recommended that you expand the existing array rather than add another FCM array.
Easy Tier reduces the I/O latency for hot spots, but it does not replace storage cache. Both Easy Tier and storage cache address a similar access-latency problem, but they weigh locality of reference, recency, and frequency differently in their algorithms. Because Easy Tier monitors I/O performance from the device end (after cache), it can address performance issues that the cache cannot solve, and it complements the overall storage system performance.
The primary benefit of Easy Tier is to reduce latency for hot spots, but this feature includes an
added benefit where the remaining “medium” (that is, not cold) data has less contention for its
resources and performs better as a result (that is, lower latency).
Easy Tier can be used in a single tier pool to balance the workload across storage MDisks. It
ensures an even load on all MDisks in a tier or pool. Therefore, bottlenecks and convoying
effects are removed when striped volumes are used. In a multitier pool, each tier is balanced.
In general, the storage environment’s I/O is monitored at a volume level, and the entire
volume is always placed inside one suitable storage tier. Determining the amount of I/O,
moving part of the underlying volume to an appropriate storage tier, and reacting to workload
changes is too complex for manual operation. It is in this situation that the Easy Tier feature
can be used.
Easy Tier is a performance optimization function that automatically migrates extents that
belong to a volume between different storage tiers (see Figure 4-18) or the same storage tier.
Because this migration works at the extent level, it is often referred to as sublogical unit
number (LUN) migration. Movement of the extents is dynamic, nondisruptive, and not visible
from the host perspective. As a result of extent movement, the volume no longer has all its
data in one tier; rather, it is in two or three tiers, or balanced between MDisks in the same tier.
You can enable Easy Tier on a per volume basis, except for non-fully allocated volumes in a
DRP where Easy Tier is always enabled. It monitors the I/O activity and latency of the extents
on all Easy Tier enabled volumes.
Based on the performance characteristics, Easy Tier creates an extent migration plan and
dynamically moves (promotes) high activity or hot extents to a higher disk tier within the same
storage pool. Generally, a new migration plan is generated on a stable system once every 24
hours. Instances might occur when Easy Tier reacts within 5 minutes, for example, when
detecting an overload situation.
It also moves (demotes) extents whose activity dropped off, or cooled, from higher disk tier
MDisks back to a lower tier MDisk. When Easy Tier runs in a storage pool rebalance mode, it
moves extents from busy MDisks to less busy MDisks of the same type.
Note: Image mode and sequential volumes are not candidates for Easy Tier automatic
data placement because all extents for those types of volumes must be on one specific
MDisk, and they cannot be moved.
With these three tier classifications (hot, middle, and cold), an Easy Tier pool can be optimized.
Internal processing
The Easy Tier function includes the following four main processes:
I/O monitoring
This process operates continuously and monitors volumes for host I/O activity. It collects
performance statistics for each extent, and derives averages for a rolling 24-hour period of
I/O activity.
Easy Tier makes allowances for large block I/Os; therefore, it considers only I/Os of up
to 64 kilobytes (KiB) as migration candidates.
This process is efficient and consumes negligible processing resources of the system’s
nodes.
Data Placement Advisor (DPA)
The DPA uses workload statistics to make a cost-benefit decision about which extents are candidates for migration to a higher performance tier. This process also identifies extents that can be migrated back to a lower tier.
Data Migration Planner (DMP)
By using the extents that the DPA identified, the DMP builds the extent migration plan for the storage pool.
Data Migrator (DM)
This process involves the actual movement, or migration, of the volume's extents up to, or down from, a higher disk tier. The extent migration rate is capped so that a maximum of approximately 12 GiB is migrated every 5 minutes.
Note: You can increase the target migration rate to 48 GiB every 5 minutes by temporarily
enabling accelerated mode. For more information, see “Easy Tier acceleration” on
page 315.
When active, Easy Tier performs the following actions across the tiers:
Promote
Moves the hotter extents to a higher performance tier with available capacity. Promote
occurs within adjacent tiers.
Demote
Demotes colder extents from a higher tier to a lower tier. Demote occurs within adjacent
tiers.
Swap
Exchanges a cold extent in an upper tier with a hot extent in a lower tier.
Warm demote
Prevents performance overload of a tier by demoting a warm extent to a lower tier. This
process is triggered when the bandwidth or IOPS exceeds a predefined threshold. If you see these operations, it is an indication that you should add more capacity to the higher tier.
Warm promote
This feature addresses the situation where a lower tier suddenly becomes active. Instead
of waiting for the next migration plan, Easy Tier can react immediately. Warm promote acts
in a similar way to warm demote. If the 5-minute average performance shows that a layer
is overloaded, Easy Tier immediately starts to promote extents until the condition is
relieved. This action is often referred to as “overload protection”.
Cold demote
Demotes inactive (or cold) extents that are on a higher performance tier to its adjacent
lower-cost tier. Therefore, Easy Tier automatically frees extents on the higher storage tier
before the extents on the lower tier become hot. Only supported between HDD tiers.
Expanded cold demote
Demotes appropriate sequential workloads to the lowest tier to better use nearline disk
bandwidth.
Auto rebalance
Redistributes extents within a tier to balance usage across MDisks for maximum
performance. This process moves hot extents from high-use MDisks to low-use MDisks,
and exchanges extents between high-use MDisks and low-use MDisks.
Space reservation demote
Introduced in version 8.4.0 to prevent out-of-space conditions on thin-provisioned
(compressed) back-ends, Easy Tier stops the migration of new data into a tier and, if necessary, migrates extents to a lower tier.
Easy Tier attempts to migrate the most active volume extents up to an SSD first.
If a new migration plan is generated before the completion of the previous plan, the previous
migration plan and queued extents that are not yet relocated are abandoned. However,
migrations that are still applicable are included in the new plan.
Note: Extent migration occurs only between adjacent tiers. For example, in a three-tiered
storage pool, Easy Tier does not move extents from the flash tier directly to the nearline tier
and vice versa without moving them first to the enterprise tier.
The Easy Tier extent migration types are shown in Figure 4-19.
Easy Tier is a licensed feature on some IBM Storage Virtualize hardware platforms. If the
license is not present and Easy Tier is set to Auto or On, the system runs in Measure mode.
Options: The Easy Tier function can be turned on or off at the storage pool level and at the
volume level, except for non fully allocated volumes in a DRP where Easy Tier is always
enabled.
Measure mode
Easy Tier can be run in an evaluation or measurement-only mode, and it collects usage
statistics for each extent in a storage pool where the Easy Tier value is set to Measure.
This collection is typically done for a single-tier pool so that the benefits of adding more
performance tiers to the pool can be evaluated before any major hardware acquisition.
The heat and activity of each extent can be viewed in the GUI by selecting Monitoring →
Easy Tier Reports.
Automatic mode
In Automatic mode, the storage pool parameter -easytier auto must be set, and the volumes
in the pool must have -easytier set to on.
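As a minimal sketch, assuming a hypothetical pool named Pool0 and a volume named vol0, these settings can be applied from the CLI as follows:
chmdiskgrp -easytier auto Pool0
chvdisk -easytier on vol0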
The behavior of Easy Tier depends on the pool configuration. Consider the following points:
If the pool contains only MDisks with a single tier type, the pool is in balancing mode.
If the pool contains MDisks with more than one tier type, the pool runs automatic data
placement and migration in addition to balancing within each tier.
Dynamic data movement is transparent to the host server and application users of the data,
other than providing improved performance. Extents are automatically migrated, as explained
in “Implementation rules” on page 310.
There might be situations where the Easy Tier setting is “auto” but the system is running in
monitoring mode only, for example, with unsupported tier types or if you have not enabled the
Easy Tier license. For more information, see Table 4-9 on page 306.
The GUI provides the same reports that are available in Measure mode and, in addition, provides the data movement report, which shows a breakdown of the actual migration events that are triggered by Easy Tier. These migrations are reported in terms of the migration types, as described in "Internal processing" on page 297.
For example, when Easy Tier detects an unsupported set of tier types in a pool, as outlined in
Table 4-9 on page 306, using the “on” mode forces Easy Tier to the active state, and it
performs to the best of its ability. The system raises an alert. There is an associated Directed
Maintenance Procedure that guides you to fix the unsupported tier types.
Important: Avoid creating a pool with more than three tiers. Although the system attempts
to create generic hot, medium, and cold “buckets”, you might end up with Easy Tier
running in measure mode only.
These configurations are unsupported because they can cause performance problems in
the long term, for example, disparate performance within a single tier.
The ability to override Automatic mode is provided to enable temporary migration from an
older set of tiers to new tiers, which must be rectified as soon as possible.
Balancing maintains equivalent latency across all MDisks in a tier, which can result in different capacity usage across the MDisks. However, performance balancing is preferred over capacity balancing in most cases.
The process automatically balances existing data when new MDisks are added into an
existing pool, even if the pool contains only a single type of drive.
Balancing is automatically active on all storage pools no matter what the Easy Tier setting is.
For a single tier pool, the Easy Tier state reports as Balancing.
Note: Storage pool balancing can be used to balance extents when mixing different-sized
disks of the same performance tier. For example, when adding larger capacity drives to a
pool with smaller capacity drives of the same class, storage pool balancing redistributes
the extents to leverage the extra performance of the new MDisks.
The storage pool Easy Tier setting and the number of tiers in the storage pool determine the resulting status for each volume copy Easy Tier setting (a). For volume copies with Easy Tier set to On, a storage pool setting of Off results in the Inactive (b) status, a setting of Measure results in Measured (c), a setting of Auto or On with a single tier results in Balanced (d), and a setting of Auto or On with two or more tiers results in Active (e). Unsupported four-tier and five-tier configurations with the On setting are forced to the Active (f) status.
a. If the volume copy is in image or sequential mode or being migrated, the volume copy Easy
Tier status is Measured rather than Active.
b. When the volume copy status is Inactive, no Easy Tier functions are enabled for that volume
copy.
c. When the volume copy status is Measured, the Easy Tier function collects usage statistics for
the volume, but automatic data placement is not active.
d. When the volume copy status is Balanced, the Easy Tier function enables performance-based
pool balancing for that volume copy.
e. When the volume copy status is Active, the Easy Tier function operates in automatic data
placement mode for that volume.
f. When five-tier (or some four-tier) configurations are used and Easy Tier is in the On state, Easy Tier is forced to operate but might not behave exactly as expected. For more information, see Table 4-9 on page 306.
Note: The default Easy Tier setting for a storage pool is Auto, and the default Easy Tier
setting for a volume copy is On. Therefore, Easy Tier functions, except for pool
performance balancing, are disabled for storage pools with a single tier. Automatic data
placement mode is enabled by default for all striped volume copies in a storage pool with
two or more tiers.
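You can verify the resulting state for a particular volume from the CLI. As a minimal sketch with a hypothetical volume name, the detailed volume view includes the easy_tier and easy_tier_status fields for each volume copy:
lsvdisk vol0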
The type of disk and RAID geometry that is used by internal or external MDisks define their
expected performance characteristics. These characteristics are used to help define a tier
type for each MDisk in the system.
Five tier types can be assigned. The tables in this section use the numbers from this list as a
shorthand for the tier name:
tier_scm Represents SCM MDisks.
tier0_flash Represents enterprise flash technology, including FCM.
tier1_flash Represents lower performing Tier1 flash technology (lower drive writes
per day (DWPD)).
tier_enterprise Represents enterprise HDD technology (both 10 K and 15 K RPM).
tier_nearline Represents nearline HDD technology (7.2 K RPM).
Attention: As described in 4.7.5, “Changing the tier type of an MDisk” on page 307, the system does not automatically detect the type of external MDisks. Instead, all external MDisks are initially put into the enterprise tier by default. The administrator must then manually change the MDisk tier and add the MDisks to storage pools.
MDisks that are used in a single-tier storage pool should have the same hardware
characteristics. These characteristics include the same RAID type, RAID array size, disk type,
disk RPM, and controller performance characteristics.
For external MDisks, attempt to create all MDisks with the same RAID geometry (number of
disks). If this approach is not possible, you can modify the Easy Tier load setting to manually
balance the workload, but you must be careful. For more information, see “MDisk Easy Tier
load” on page 315.
For internal MDisks, the system can cope with different geometries because the number of
drives is reported to Easy Tier, which then uses the Overload Protection information to
balance the workload, as described in 4.7.6, “Easy Tier overload protection” on page 307.
Figure 4-21 on page 305 shows a scenario in which a storage pool is populated with three
different MDisk types:
One belonging to an SSD array
One belonging to a SAS HDD array
One belonging to an NL-SAS HDD array
Although Figure 4-21 on page 305 shows RAID 5 arrays, other RAID types can be used.
Note: If you add MDisks to a pool and they have (or you assign) more than three tier types,
Easy Tier tries to group two or more of the tier types into a single “bucket” and use them
both as either the “middle” or “cold” tier. The groupings are described in Table 4-9 on
page 306.
However, overload protection and pool balancing might result in a bias on the load being
placed on those MDisks despite them being in the same “bucket”.
Each unsupported four-tier or five-tier combination is grouped into hot, middle, and cold buckets by tier number (1 = tier_scm through 5 = tier_nearline): the hot bucket holds tier 1 or tier 2, the middle bucket groups tiers from the 2 - 4 range, and the cold bucket groups tiers from the 3 - 5 range, depending on the combination.
For more information about the tier descriptions, see 4.7.4, “MDisk tier types” on page 302.
If you create a pool with all five tiers or one of the unsupported four-tier pools and Easy Tier is
set to “auto” mode, Easy Tier enters “measure” mode and measures the statistics but does
not move any extents. To return to a supported tier configuration, remove one or more
MDisks.
Important: Avoid creating a pool with more than three tiers. Although the system attempts
to create “buckets”, the result might be that Easy Tier runs in Measure mode only.
Attention: Before you force Easy Tier to run in this mode, you must have a full
understanding of the implications of doing so. Be very cautious about using this mode.
The on setting is provided to allow temporary migrations where you cannot avoid creating one
of these unsupported configurations. The implications are that long-term use in this mode can
cause performance issues due to the grouping of unlike MDisks within a single Easy Tier tier.
For these configurations, Easy Tier uses the mappings that are shown in Table 4-10.
In each of these forced configurations, tier 1 is used as the hot tier, and the remaining tiers are grouped into the middle and cold buckets.
Attention: When adding external MDisks to a pool, validate that the tier_type setting is
correct. Incorrect tier_type settings can cause performance problems, for example, if you
inadvertently create a multitier pool.
Internal MDisks are automatically created with the correct tier_type because the system is
aware of the drives that are used to create the RAID array and so can set the correct
tier_type automatically.
The tier_type can be set when adding an MDisk to a pool, or you can later change the tier of
an MDisk by using the CLI, as shown in Example 4-10.
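As a hedged sketch, assuming a hypothetical external MDisk named mdisk7 that is backed by Tier 1 flash storage, the tier can be changed as follows:
chmdisk -tier tier1_flash mdisk7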
Note: Changing the tier type is possible only for external MDisks, not for arrays.
It is also possible to change the external MDisk tier by using the GUI.
This change happens online and has no effect on hosts or the availability of the volumes.
Therefore, Easy Tier implements overload protection to ensure that it does not move too
much workload onto the hot tier. If this protection is triggered, no other extents are moved
onto that tier while the overload is detected. Extents can still be swapped, so if one extent
becomes colder and another hotter, they can be swapped.
To implement overload protection, Easy Tier must understand the capabilities of an MDisk.
For internal MDisks, this understanding is handled automatically because the system can
instruct Easy Tier about the type of drive and RAID geometry (for example, 8+P+Q);
therefore, the system can calculate the expected performance ceiling for any internal MDisk.
With external MDisks, the only detail that Easy Tier knows is the storage controller type. The system knows whether the controller is an enterprise, midrange, or entry-level system, and it can make some assumptions about the load that the controller can handle.
However, external MDisks cannot automatically have their MDisk tier type or “Easy Tier Load”
defined. Set the tier type manually and (if needed), modify the load setting. For more
information about Easy Tier loads, see “MDisk Easy Tier load” on page 315.
Overload protection is also used by the “warm promote” function. If Easy Tier detects a
sudden change on a cold tier in which a workload is causing overloading of the cold tier
MDisks, it can quickly react and recommend migration of the extents to the middle tier. This
feature is useful when provisioning new volumes that overrun the capacity of the middle tier or
when no middle tier is present, for example, with flash and nearline only configurations.
Arrays that are created from self-compressing drives have a written capacity limit (virtual
capacity before compression) that is higher than the array’s usable capacity (physical
capacity). Writing highly compressible data to the array means that the written capacity limit
can be reached without running out of usable capacity. However, if data is not compressible or
the compression ratio is low, it is possible to run out of usable capacity before reaching the
written capacity limit of the array, which means the amount of data that is written to a
self-compressing array must be controlled to prevent the array from running out of space.
Without a maximum overallocation limit, Easy Tier scales the usable capacity of the array
based on the actual compression ratio of the data that is stored on the array at a point in time.
Easy Tier migrates data to the array and might use a large percentage of the usable capacity
in doing so, but it stops migrating to the array when the array comes close to running out of
usable capacity. Then, it might start migrating data away from the array again to free space.
However, Easy Tier migrates storage only at a slow rate, which might not keep up with changes to the compression ratio within the tier. When Easy Tier swaps extents or data is overwritten by hosts, compressible data might be replaced with data that is less compressible, which increases the amount of usable capacity that is consumed by extents. This situation might result in self-compressing arrays running out of space, which can cause a loss of access to data until the condition is resolved.
You can specify the maximum overallocation ratio for pools that contain FCM arrays to
prevent out-of-space scenarios. The value acts as a multiplier of the physically available
space in self-compressing arrays. The allowed values are a percentage in the range of 100%
(default) to 400% or off. Setting the value to off disables this feature.
When the limit is set, Easy Tier scales the available usable capacity of self-compressing
arrays by using the specified overallocation limit and adjusts the migration plan to make sure
the fullness of these arrays stays below the maximum overallocation. Specify the maximum
overallocation limit based on the estimated lowest compression ratio of the data that is written
to the pool.
The default setting of 100% allows no Easy Tier overallocation on new pools, which is the most conservative and safe option. By increasing the limit, you allow Easy Tier to put more data into the FCM tier and benefit from its performance, but at the same time the chance of getting into an “out of physical space” (OOPS) situation if the compression ratio suddenly drops is much higher.
Easy Tier attempts to migrate the extents to another extent within the same tier. However, if
there is not enough space in the same tier, Easy Tier picks the highest-priority tier with free
capacity. Table 4-11 describes the migration target-tier priorities.
The tiers are chosen to optimize the typical migration cases, for example, replacing the
enterprise HDD tier with Tier 1 flash arrays or replacing nearline HDDs with Tier 1 flash
arrays.
Important: Easy Tier uses the extent migration capabilities of IBM Storage Virtualize.
These migrations require free capacity because an extent is first cloned to a new extent
before the old extent is returned to the free capacity in the relevant tier.
A minimum of 16 free extents per tier is recommended for Easy Tier to operate. However, if only 16 extents are available, Easy Tier can move at most 16 extents at a time.
Easy Tier and storage pool balancing do not function if you allocate 100% of the storage
pool to volumes.
Implementation rules
Remember the following implementation and operational rules when you use Easy Tier:
Easy Tier automatic data placement is not supported on image mode or sequential
volumes. I/O monitoring for such volumes is supported, but you cannot migrate extents on
these volumes unless you convert image or sequential volume copies to striped volumes.
Automatic data placement and extent I/O activity monitors are supported on each copy of
a mirrored volume. Easy Tier works with each copy independently of the other copy.
If possible, the system creates volumes or expands volumes by using extents from MDisks
from the enterprise tier. However, if necessary, it uses extents from MDisks from the flash
tier.
Do not provision 100% of Easy Tier enabled pool capacity. Reserve at least 16 extents for
each tier for the Easy Tier movement operations.
When a volume is migrated out of a storage pool that is managed with Easy Tier, Easy Tier
automatic data placement mode is no longer active on that volume. Automatic data
placement is turned off while a volume is being migrated, even when it is between pools that
both have Easy Tier automatic data placement enabled. Automatic data placement for the
volume is reenabled when the migration is complete.
Limitations
When you use Easy Tier, consider the following limitations:
Removing an MDisk by using the -force parameter.
When an MDisk is deleted from a storage pool with the -force parameter, extents in use
are migrated to MDisks in the same tier as the MDisk that is being removed, if possible. If
insufficient extents exist in that tier, extents from another tier are used.
Migrating extents.
When Easy Tier automatic data placement is enabled for a volume, you cannot use the
migrateexts CLI command on that volume.
Migrating a volume to another storage pool.
When a system migrates a volume to a new storage pool, Easy Tier automatic data
placement between the two tiers is temporarily suspended. After the volume is migrated to
its new storage pool, Easy Tier automatic data placement resumes for the moved volume, if
appropriate.
When the system migrates a volume from one storage pool to another one, Easy Tier
attempts to migrate each extent to an extent in the new storage pool from the same tier as
the original extent. In several cases, such as where a target tier is unavailable, another tier
is used based on the priority rules that are outlined in 4.7.8, “Removing an MDisk from an
Easy Tier pool” on page 309.
It is unlikely that all the data that is contained in an extent has the same I/O workload, and as
a result, the same temperature. Therefore, moving a hot extent likely also moves data that is
not hot. The overall Easy Tier efficiency to put hot data in the correct tier is then inversely
proportional to the extent size.
However, Easy Tier efficiency is not the only factor that is considered when choosing the
extent size. Manageability and capacity requirement considerations also must be accounted
for.
Generally, use the default 1 GB (standard pool) or 4 GB (DRP) extent size for Easy Tier
enabled configurations.
In general, using Easy Tier at the highest level, that is, the virtualizer, is recommended. So, it
is a best practice to disable Easy Tier on back-end systems, but to leave it enabled on the
virtualizer.
Important: Never run tiering at two levels. Doing so causes thrashing and unexpected
heat and cold jumps at both levels.
Although both of these options provide benefits in terms of performance, they have different characteristics.
Option 2, especially with DS8000 as the back-end, offers some advantages compared to
option 1. For example, when external storage is used, the virtualizer uses generic
performance profiles to evaluate the workload that can be placed on a specific MDisk, as
described in “MDisk Easy Tier load” on page 315. These profiles might not match the
back-end capabilities, which can lead to a resource usage that is not optimized.
However, this problem rarely occurs with option 2 because the performance profiles are
based on the real back-end configuration.
Easy Tier continuously moves extents across the tiers (and within the same tier) and attempts to optimize performance. As a result, the amount of data that is written to the back-end (and therefore the compression ratio) can fluctuate unpredictably over time, even though the data is not modified by the user.
This situation means that the optimized extent distribution on the primary system can differ considerably from the one that is on the secondary system. The extent reallocation that is based on the workload learning on the primary system is not sent to the secondary system, so the same extent optimization that is based on the primary workload pattern cannot currently be applied on both systems.
In a DR situation with a failover from the primary site to a secondary site, the extent
distribution of the volumes on the secondary system is not optimized to match the primary
workload. Easy Tier relearns the production I/O profile and builds a new extent migration plan
on the secondary system to adapt to the new production workload.
The secondary site eventually achieves the same optimization and level of performance as on
the primary system. This task takes a little time, so the production workload on the secondary
system might not run at its optimum performance during that period. The Easy Tier
acceleration feature can be used to mitigate this situation. For more information, see “Easy
Tier acceleration” on page 315.
IBM Storage Virtualize remote copy configurations that use the nearline tier at the secondary
system must be carefully planned, especially when practicing DR by using FlashCopy. In
these scenarios, FlashCopy often starts just before the beginning of the DR test. It is likely
that the FlashCopy target volumes are in the nearline tier because of prolonged inactivity.
When the FlashCopy starts, an intensive workload often is added to the FlashCopy target
volumes because of both the background and foreground I/Os. This situation can easily lead
to overloading, and then possibly performance degradation of the nearline storage tier if it is
not correctly sized in terms of resources.
Even if data exists and the write is an overwrite, the new data is not written in that place.
Instead, the new write is appended at the end and the old data is marked as needing garbage
collection. This process provides the following advantages:
Writes to a DRP volume are always treated as sequential. Therefore, all the 8 KB chunks can be built into a larger 256 KB chunk, and the writes can be destaged from cache as full-stripe writes or as a 256 KB sequential stream of smaller writes.
Easy Tier with DRP gives the best performance both in terms of RAID on back-end
systems and on flash, where it becomes easier for the flash device to perform its internal
garbage collection on a larger boundary.
To improve the Easy Tier efficiency with this write workload profile, the system records metadata about how frequently certain areas of a volume are overwritten. The Easy Tier algorithm was modified to bin-sort the chunks into a heat map in terms of rewrite activity and then group commonly rewritten data onto a single extent. This method ensures that Easy Tier operates correctly for read/write data when data reduction is used.
Before DRP, write operations to compressed volumes had a lower value to the Easy Tier
algorithms because writes were always to a new extent, so the previous heat was lost. Now,
we can maintain the heat over time and ensure that frequently rewritten data is grouped. This
process also aids the garbage-collection process where it is likely that large contiguous areas
are garbage that is collected together.
Consider the following sample multi-tier configurations that address some of most common
requirements. The same benefits can be achieved by adding SCM to the configuration. In
these examples, the top flash tier can be replaced with an SCM tier, or SCM can be added as
the hot tier and the corresponding medium and cold tiers are shifted down to drop the coldest
tier:
5% SCM and 95% FCM
This configuration provides FCM capacity efficiency and performance, with a performance boost from placing DRP metadata and other frequently accessed metadata on the SCM tier.
20 - 50% flash and 50 - 80% nearline
This configuration provides a mix of storage for latency-sensitive and capacity-driven workloads. It provides reduced costs with performance that is comparable to a single-tier flash solution.
10 - 20% flash and 80 - 90% enterprise
This configuration provides flash-like performance with reduced costs.
5% Tier 0 flash, 15% Tier 1 flash, and 80% nearline
This configuration provides flash-like performance with reduced costs.
3 - 5% flash and 95 - 97% enterprise
This configuration provides improved performance compared to a single-tier solution. All
data is ensured to have at least enterprise performance. It also removes the requirement
for overprovisioning for high-access density environments.
3 - 5% flash, 25 - 50% enterprise, and 40 - 70% nearline
This configuration provides improved performance and density compared to a single-tier
solution. It also provides significant reduction in environmental costs.
This setting can be changed online without affecting host or data availability. To turn on or off
Easy Tier acceleration mode, run the following command:
chsystem -easytieracceleration <on/off>
This setting cannot be changed for internal MDisks (an array) because the system can
determine the exact load that an internal MDisk can handle based on the type of drive (HDD
or SSD), the number of drives, and type of RAID in use per MDisk.
For an external MDisk, Easy Tier uses specific performance profiles based on the
characteristics of the external controller and on the tier that is assigned to the MDisk. These
performance profiles are generic, which means that they do not account for the actual
back-end configuration. For example, the same performance profile is used for a DS8000 with
300 GB 15 K RPM drives and for one with 1.8 TB 10 K RPM drives.
This feature is provided for advanced users to change the Easy Tier load setting to better
align it with a specific external controller configuration.
Note: The load setting is used with the MDisk tier type setting to calculate the number of
concurrent I/Os and expected latency from the MDisk. Setting this value incorrectly or by
using the wrong MDisk tier type can have a detrimental effect on overall pool performance.
The following values can be set to each MDisk for the Easy Tier load:
Default
Low
Medium
High
Very high
The system uses a default setting based on the controller performance profile and the MDisk
tier setting of the presented MDisks.
Change the default setting to any other value only when you are certain that an MDisk is underutilized and can handle more load, or that the MDisk is overutilized and the load should be lowered. Change this setting to Very high only for SSD and flash MDisks.
This setting can be changed online without affecting the host or data availability.
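As a hedged sketch, assuming a hypothetical external MDisk named mdisk12 that you verified can handle more load, the load setting can be raised one step from the CLI (this sketch assumes the lowercase CLI value names default, low, medium, high, and very_high):
chmdisk -easytierload high mdisk12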
Note: After changing the load setting, note the old and new settings and record the date
and time of the change. Use IBM Storage Insights to review the performance of the pool in
the coming days to ensure that you have not inadvertently degraded the performance of
the pool.
You can also gradually increase the load setting and validate that with each change you
are seeing an increase in throughput without a corresponding detrimental increase in
latency (and vice versa if you are decreasing the load setting).
Chapter 5. Volumes
In an IBM Storage Virtualize system, a volume is a logical disk that the system presents to
attached hosts. This chapter describes the various types of volumes and provides guidance
about managing their properties.
For best performance, spread the host workload over multiple volumes.
Volumes in standard pools of an IBM SAN Volume Controller (SVC) or IBM FlashSystem can
feature the following attributes that affect where the extents are allocated:
Striped
A volume that is striped at the extent level. The extents are allocated from each managed
disk (MDisk) that is in the storage pool. This volume type is the most frequently used
because each I/O to the volume is spread across external storage MDisks.
Sequential
A volume on which extents are allocated sequentially from one MDisk. This type of volume
is rarely used because a striped volume is better suited to most cases.
Image
Image-mode volumes are special volumes that have a direct relationship with one MDisk.
If you have an MDisk that contains data that you want to merge into the clustered system,
you can create an image-mode volume. When you create an image-mode volume, a direct
mapping is made between extents that are on the MDisk and extents that are on the
volume. The MDisk is not virtualized. The logical block address (LBA) x on the MDisk is
the same as LBA x on the volume.
When you create an image-mode volume copy, you must assign it to a storage pool. An
image-mode volume copy must be at least one extent in size. The minimum size of an
image-mode volume copy is the extent size of the storage pool to which it is assigned.
The extents are managed in the same way as other volume copies. When the extents are
created, you can move the data onto other MDisks that are in the storage pool without
losing access to the data. After you move one or more extents, the volume copy becomes
a virtualized disk, and the mode of the MDisk changes from image to managed.
When you create a volume, it takes some time to format it completely (depending on the volume size). The syncrate parameter of the volume specifies the volume copy synchronization rate, and it can be modified to accelerate the completion of the format process.
For example, the initialization of a 1 TB volume can take more than 120 hours to complete with the default syncrate value of 50, or approximately 4 hours if you manually set the syncrate to 100 (see the sketch that follows this list). If you increase the syncrate to accelerate the volume initialization, remember to reduce it again afterward to avoid issues the next time that you use volume mirroring to perform a data migration of that volume.
For more information about creating a thin-provisioned volume, see 5.3, “Thin-provisioned
volumes” on page 321.
Each volume is associated with an I/O group and has a preferred node inside that I/O
group. When creating a volume on an SVC, consider balancing volumes across the I/O
groups to balance the load across the cluster. When creating a volume on a clustered
IBM FlashSystem, ensure each MDisk group completely resides in one IBM FlashSystem.
If a host can access only one I/O group, the volume must be created in the I/O group to
which the host has access.
Also, it is possible to define a list of I/O groups through which a volume can be made accessible to hosts. It is a best practice that a volume is accessible to hosts through the caching I/O group only. You can have more than one I/O group in the access list of a volume in some scenarios with specific requirements, such as when a volume is migrated to another I/O group.
Tip: Migrating volumes across I/O groups can be a disruptive action. Therefore, specify
the correct I/O group at the time the volume is created.
By default, the preferred node, which owns a volume within an I/O group, is selected on a round-robin basis. Although it is not easy to estimate the workload when the volume is created, distribute the workload evenly across the nodes within an I/O group.
Except in a few cases, the cache mode of a volume is set to read/write. For more
information, see 5.12, “Volume cache mode” on page 355.
A volume occupies an integer number of extents, but its length does not need to be an
integer multiple of the extent size. However, the length does need to be an integer multiple
of the block size. Any space that is left over between the last logical block in the volume
and the end of the last extent in the volume is unused.
The maximum number of volumes per I/O group and system is listed in the “Configurations
Limits and Restrictions” section for your system’s code level at the following IBM Support
web pages:
– IBM SAN Volume Controller
– IBM FlashSystem 9500
– IBM FlashSystem 9100 and 9200
– IBM FlashSystem 7200 and 7300
– IBM FlashSystem 50x5 and 5200
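As referenced in the syncrate discussion earlier in this list, the following minimal sketch temporarily raises the synchronization rate for a hypothetical volume named vol0 and then restores the default value afterward:
chvdisk -syncrate 100 vol0
chvdisk -syncrate 50 vol0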
Note: We do not recommend using thin-provisioned volumes in a DRP with IBM FlashCore
Module (FCM).
In standard pools, thin-provisioned volumes are created based on capacity savings criteria.
These properties are managed at the volume level. However, in DRPs, all the benefits of
thin-provisioning are available to all the volumes that are assigned to the pool. For the
thin-provisioned volumes in DRPs, you can configure compression and data deduplication on
these volumes, which increases the capacity savings for the entire pool.
You can enhance capacity efficiency for thin-provisioned volumes by monitoring the hosts’
usage of capacity. When the host indicates that the capacity is no longer needed, the space is
released and can be reclaimed by the DRP. Standard pools do not have these functions.
Real capacity defines how much disk space from a pool is allocated to a volume. Virtual
capacity is the capacity of the volume that is reported to the hosts. A volume’s virtual capacity
is typically larger than its real capacity. However, as data continues to be written to the
volume, that difference diminishes.
Each system uses the real capacity to store data that is written to the volume and metadata
that describes the thin-provisioned configuration of the volume. As more information is written
to the volume, more of the real capacity is used. The system identifies read operations to
unwritten parts of the virtual capacity and returns zeros to the server without using any real
capacity.
If you select the autoexpand feature, the IBM Storage Virtualize system automatically adds a
fixed amount of real capacity to the thin volume as required. Therefore, the autoexpand
feature attempts to maintain a fixed amount of unused real capacity for the volume. We
recommend the usage of autoexpand by default to avoid volume-offline issues.
This amount of extra real capacity is known as the contingency capacity. The contingency
capacity is initially set to the real capacity that is assigned when the volume is created. If the
user modifies the real capacity, the contingency capacity is reset to be the difference between
the used capacity and real capacity.
A volume that is created without the autoexpand feature (and therefore has a zero
contingency capacity) goes offline when the real capacity is used. In this case, it must be
expanded.
When creating a thin-provisioned volume with compression and deduplication enabled, you
must be careful about out-of-space issues in the volume and pool where the volume is
created. Set the warning threshold notification in the pools that contain thin-provisioned
volumes, and in the volume.
Warning threshold: When you are working with thin-provisioned volumes, enable the
warning threshold (by using email or a Simple Network Management Protocol (SNMP)
trap) in the storage pool. If the autoexpand feature is not used, you also must enable the
warning threshold on the volume level. If the pool or volume runs out of space, the volume
goes offline, which results in a loss of access.
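As a minimal sketch with hypothetical names, an 80% warning threshold can be set on a thin-provisioned volume and on its storage pool as follows:
chvdisk -warning 80% thinvol0
chmdiskgrp -warning 80% Pool0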
If you do not want to be concerned with monitoring volume capacity, it is highly recommended
that the autoexpand option is enabled. Also, when you create a thin-provisioned volume, you
must specify the space that is initially allocated to it (by using the -rsize option in the CLI) and
the grain size.
By default, -rsize (or real capacity) is set to 2% of the volume virtual capacity, and grain size
is 256 KiB. These default values, with the autoexpand enabled and warning disabled options,
work in most scenarios. Some instances exist in which you might consider using different
values to suit your environment.
Example 5-2 shows the command to create a volume with the suitable parameters.
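The following command is a hedged sketch of such a volume creation, using the default values that are described above and hypothetical pool and volume names:
mkvdisk -mdiskgrp Pool0 -iogrp 0 -size 500 -unit gb -rsize 2% -autoexpand -grainsize 256 -name thinvol0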
IBM Storage Virtualize systems support compressed volumes in a DRP only. A DRP also
reclaims capacity that is not used by hosts if the host supports SCSI unmap commands. When
these hosts issue SCSI unmap commands, a DRP reclaims the released capacity.
Compressed volumes in DRPs do not display their individual compression ratio (CR). The
pool’s used capacity before reduction indicates the total amount of data that is written to
volume copies in the storage pool before data reduction occurs. The pool’s used capacity
after reduction is the space that is used after thin provisioning, compression, and
deduplication. This compression solution provides nondisruptive conversion between
compressed and decompressed volumes.
If you are planning to virtualize volumes that are connected to your hosts directly from any storage subsystem and you want an estimate of the space savings that are likely to be achieved, run the IBM Data Reduction Estimator Tool (DRET).
DRET is a CLI- and host-based utility that can be used to estimate an expected compression
rate for block devices. This tool also can evaluate capacity savings by using deduplication. For
more information, see the IBM Data Reduction Estimator Tool for SVC, Storwize and FlashSystem products web page.
IBM Storage Virtualize systems also include an integrated Comprestimator tool, which is
available through the management GUI and CLI. If you are considering applying compression
on noncompressed volumes in an IBM FlashSystem, you can use this tool to evaluate
whether compression generates enough capacity savings.
For more information, see 4.2.3, “Data reduction estimation tools” on page 252.
As shown in Figure 5-5, customize the Volume view to see the compression savings for a
compressed volume and estimated compression savings for a noncompressed volume that
you are planning to migrate.
The deduplication process identifies unique chunks of data (or byte patterns) and stores a
signature of the chunk for reference when writing new data chunks. If the new chunk’s
signature matches an existing signature, the new chunk is replaced with a small reference
that points to the stored chunk. The same byte pattern can occur many times, which results in the amount of data that must be stored being greatly reduced.
If a volume is configured with deduplication and compression, data is deduplicated first and
then compressed. Therefore, deduplication references are created on the compressed data
that is stored on the physical domain.
The scope of deduplication is all deduplicated volumes in the same pool, regardless of the
volume’s preferred node or I/O group.
To create a thin-provisioned volume that uses deduplication and does not require a
provisioning policy, enter the command into the CLI that is shown in Example 5-3.
To create a compressed volume that uses deduplication and does not require a provisioning policy, enter the command that is shown in Example 5-4.
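The following commands are a minimal sketch of both forms, with a hypothetical pool name and hypothetical volume names:
mkvolume -pool Pool0 -size 100 -unit gb -thin -deduplicated -name dedupvol0
mkvolume -pool Pool0 -size 100 -unit gb -compressed -deduplicated -name compdedupvol0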
To maximize the space that is available for the deduplication database, the system distributes
it between all nodes in the I/O groups that contain deduplicated volumes. Each node holds a
distinct portion of the records that are stored in the database.
Depending on the data type that is stored on the volume, the capacity savings can be
significant. Examples of use cases that typically benefit from deduplication are backup
servers and virtual environments with multiple virtual machines (VMs) running the same
operating system.
In both cases, it is expected that multiple copies of identical files exist, such as components of
the standard operating system or applications that are used in the organization. Data that is
encrypted or compressed at the file-system level does not benefit from deduplication.
Deduplication works by finding patterns, and encryption essentially works by obfuscating
whatever patterns might exist in the data.
If you want to evaluate whether savings are realized by migrating a set of volumes to
deduplicated volumes, you can use DRET. For more information about DRET, see 4.2.3,
“Data reduction estimation tools” on page 252.
Consider the following properties of thin-provisioned volumes that are useful to understand:
– When the used capacity first exceeds the volume warning threshold, an event is raised,
which indicates that more real capacity is required. The default warning threshold value is
80% of the volume capacity. To disable warnings, specify 0%.
– Compressed volumes include an attribute called decompressed used capacity (for
standard pools) and used capacity before reduction (for a DRP). These values are the
used capacities before compression or data reduction, and they are used to calculate the CR.
If you run out of space on a volume or storage pool, the host that uses the affected volumes
cannot perform new write operations to these volumes. Therefore, an application or database
that is running on this host becomes unavailable.
In a storage pool with only fully allocated volumes, the storage administrator can easily
manage the used and available capacity in the storage pool as the used capacity grows when
volumes are created or expanded.
However, in a pool with thin-provisioned volumes, the used capacity increases any time that
the host writes data. For this reason, the storage administrator must consider capacity
planning carefully. It is critical to put in place volume and pool capacity monitoring.
Tools such as IBM Spectrum Control and IBM Storage Insights can display the capacity of a
storage pool in real time and graph how it grows over time. These tools are important
because they help you predict when the pool will run out of space.
IBM Storage Virtualize also alerts you by including an event in the event log when the storage
pool reaches the configured threshold, which is called the warning level. The GUI sets this
threshold to 80% of the capacity of the storage pool by default.
By using enhanced Call Home and IBM Storage Insights, IBM can now monitor and flag
systems that have low capacity. This ability can result in a support ticket being generated and
the client being contacted.
Note: This protection likely solves most immediate problems. However, after you are
informed that you are running out of space, you have only a limited amount of time to react.
Have a plan in place and understand the next steps.
Note: The following policies use arbitrary numbers. These numbers are designed to
make the suggested policies more readable. We do not provide any recommended
numbers to insert into these policies because they are determined by business risk,
and this consideration is different for every client.
– Manage free space such that enough free capacity always is available for your
10 largest volumes to reach 100% full without running out of free space.
– Never overallocate more than 200%. For example, if you have 100 TB of capacity in the
storage pool, the sum of the volume capacities in the same pool must not exceed
200 TB.
– Always start the process of adding capacity when the storage pool reaches 70% full.
Grain size
The grain size is defined when the thin-provisioned volume is created. The grain size can be
set to 32 KB, 64 KB, 128 KB, or 256 KB (default). The grain size cannot be changed after the
thin-provisioned volume is created.
Smaller grain sizes can save more space, but they have larger directories. For example, if you
select 32 KB for the grain size, the volume size cannot exceed 260,000 GB. Therefore, if you
are not going to use the thin-provisioned volume as a FlashCopy source or target volume, use
256 KB by default to maximize performance.
Thin-provisioned volume copies in DRPs have a grain size of 8 KB. This predefined value
cannot be set or changed.
If you are planning to use thin-provisioning with FlashCopy, the grain size for FlashCopy
volumes can be only 64 KB or 256 KB. In addition, to achieve the best performance, the grain
size for the thin-provisioned volume and the FlashCopy mapping must be the same. For this
reason, it is not recommended to use thin-provisioned volumes in DRPs as FlashCopy source
or target volumes.
When a server writes to a mirrored volume, the system writes the data to both copies. When
a server reads a mirrored volume, the system picks one of the copies to read. If one of the
mirrored volume copies is temporarily unavailable (for example, because the storage system
that provides the pool is unavailable), the volume remains accessible to servers. The system
remembers which areas of the volume are written and resynchronizes these areas when both
copies are available.
You can create a volume with one or two copies, and you can convert a non-mirrored volume
into a mirrored volume by adding a copy. When a copy is added in this way, the system
synchronizes the new copy so that it is the same as the existing volume. Servers can access
the volume during this synchronization process.
You can convert a mirrored volume into a nonmirrored volume by deleting one copy or by
splitting one copy to create a non-mirrored volume.
The volume copy can be any type: image, striped, or sequential. The volume copy can use
thin-provisioning or compression to save capacity. If the copies are in DRPs, you also can
apply deduplication to the volume copies to increase the capacity savings.
When you create a mirrored volume, the two copies can use different capacity reduction
attributes. You can add a deduplicated volume copy in a DRP to a volume in a standard pool.
You can use this method to migrate volumes to DRPs.
After a volume mirror is synchronized, a mirrored copy can become unsynchronized if it goes
offline and write I/O requests must be processed, or if a mirror fast failover occurs. The fast
failover isolates the host systems from temporarily slow-performing mirrored copies, at the
cost of a short interruption to redundancy.
Note: For standard volumes, the primary copy is formatted before it is synchronized to the
other volume copies. The -syncrate parameter of the mkvdisk command controls the
format and synchronization speed.
You can create a mirrored volume by using the Mirrored option in the Create Volume window,
as shown in Figure 5-6. To display the Volume copy type selection, ensure that Advanced
settings mode is enabled.
You can convert a non-mirrored volume into a mirrored volume by adding a copy, as shown in
Figure 5-7 on page 332.
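For reference, a copy can also be added from the CLI by using the addvdiskcopy command. The following sketch uses a hypothetical pool and volume name:
addvdiskcopy -mdiskgrp Pool1 volume01
The new copy is created in Pool1 and is synchronized in the background while servers continue to access volume01.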
However, in rare cases, the sequence can take more than 20 seconds to complete. When the
I/O ending sequence completes, the volume mirror configuration is updated to record that the
slow copy is now no longer synchronized. When the configuration updates finish, the write I/O
can be completed on the host system.
Volume mirroring stops using the slow copy for 4 - 6 minutes; subsequent I/O requests
are satisfied by the remaining synchronized copy. During this time, synchronization is
suspended. Also, the volume’s synchronization progress shows less than 100% and
decreases if the volume receives more host writes. After the copy suspension completes,
volume mirroring synchronization resumes and the slow copy starts synchronizing.
If another I/O request times out on the unsynchronized copy during the synchronization,
volume mirroring again stops using that copy for 4 - 6 minutes. If a copy is always slow,
volume mirroring attempts to resynchronize the copy every 4 - 6 minutes, and another I/O
timeout occurs each time.
The copy is not used for another 4 - 6 minutes and becomes progressively unsynchronized.
Synchronization progress gradually decreases as more regions of the volume are written.
If write fast failovers occur regularly, an underlying performance problem might exist within the
storage system that is processing I/O data for the mirrored copy that became
unsynchronized. If one copy is slow because of storage system performance, multiple copies
on different volumes are affected. The copies might be configured from the storage pool that
is associated with one or more storage systems. This situation indicates possible overloading
or other back-end performance problems.
When you run the mkvdisk command to create a volume, the mirror_write_priority
parameter is set to latency by default. Fast failover is enabled. However, fast failover can be
controlled by changing the value of the mirror_write_priority parameter on the chvdisk
command. If the mirror_write_priority is set to redundancy, fast failover is disabled.
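For example, the following sketch (with a hypothetical volume name) switches a volume between the two behaviors:
chvdisk -mirrorwritepriority redundancy volume01
chvdisk -mirrorwritepriority latency volume01
The redundancy setting favors keeping both copies synchronized; the latency setting favors host response time.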
The system applies a full SCSI initiator-layer error recovery procedure (ERP) for all mirrored
write I/O. If one copy is slow, the ERP can take up to 5 minutes. If the write operation is still
unsuccessful, the copy is taken offline. Carefully consider whether maintaining redundancy or
fast failover and host response time (at the expense of a temporary loss of redundancy) is
more important.
Note: Mirrored volumes can be taken offline if no quorum disk is available. This behavior
occurs because the synchronization status for mirrored volumes is recorded on the
quorum disk. To protect against mirrored volumes being taken offline, follow the guidelines
for setting up quorum disks.
The system submits a host read I/O request to one copy of a volume at a time. If that request
succeeds, the system returns the data. If it is not successful, the system retries the request to
the other copy volume.
With read fast failovers, when the primary-for-read copy becomes slow for read I/O, the system
fails over to the other copy. The system then tries the other copy first for read I/O during
the following 4 - 6 minutes. After that period, the system reverts to reading the original
primary-for-read copy.
During this period, if read I/O to the other copy also is slow, the system reverts immediately.
Also, if the primary-for-read copy changes, the system reverts to try the new primary-for-read
copy. This issue can occur when the system topology changes or when the primary or local
copy changes. For example, in a standard topology, the system normally tries to read the
primary copy first. If you change the volume’s primary copy during a read fast failover period,
the system reverts to read the newly set primary copy immediately.
The read fast failover function is always enabled on the system. During this process, the
system does not suspend the volumes or make the copies out of sync.
Therefore, before you perform maintenance on a storage system that might affect the data
integrity of one copy, it is important to check that both volume copies are synchronized. Then,
remove that volume copy before you begin the maintenance.
HyperSwap is a system topology that enables high availability (HA) and disaster recovery
(DR) (HADR) between I/O groups at different locations. Before you configure HyperSwap
volumes, the system topology must be configured for HyperSwap and sites must be defined.
Figure 5-8 shows an overall view of HyperSwap that is configured with two sites.
In the management GUI, HyperSwap volumes are configured by specifying volume details,
such as quantity, capacity, name, and the method for saving capacity. As with basic volumes,
you can choose compression or thin-provisioning to save capacity on volumes.
For thin-provisioning or compression, you can also select to use deduplication for the volume
that you create. For example, you can create a compressed volume that also uses
deduplication to remove duplicated data.
The method for capacity savings applies to all HyperSwap volumes and copies that are
created. The volume location displays the site where copies are located, based on the
configured sites for the HyperSwap system topology. For each site, specify a pool and I/O
group that are used by the volume copies that are created on each site. If you select to
deduplicate volume data, the volume copies must be in DRPs on both sites.
The management GUI creates a HyperSwap relationship and change volumes (CVs)
automatically. HyperSwap relationships manage the synchronous replication of data between
HyperSwap volume copies at the two sites.
If your HyperSwap system supports self-compressing FCMs and the base volume is fully
allocated in a DRP, the corresponding CV is created with compression enabled. If the base
volume is in a standard pool, the CV is created as a thin-provisioned volume.
You can specify a consistency group (CG) that contains multiple active-active relationships to
simplify management of replication and provide consistency across multiple volumes. A CG is
commonly used when an application spans multiple volumes. CVs maintain a consistent copy
of data during resynchronization. CVs allow an older copy to be used for DR if a failure
occurred on the up-to-date copy before resynchronization completes.
You can also use the mkvolume CLI command to create a HyperSwap volume. The command
also defines the pools and sites for the HyperSwap volume copies and creates the active-active
relationship and CVs automatically.
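A minimal sketch of such a command follows, assuming hypothetical pool names Site1Pool and Site2Pool that are assigned to the two sites; the colon-separated pool list places one copy in each pool:
mkvolume -name hsvol01 -size 100 -unit gb -pool Site1Pool:Site2Pool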
You can see the relationship between the master and auxiliary volume in a 2-site HyperSwap
topology in Figure 5-9.
For more information about HyperSwap volumes, see 7.6, “HyperSwap internals” on
page 515.
You can assign ownership of VVOLs to IBM Spectrum Connect by creating a user with the
vSphere API for Storage Awareness (VASA) Provider security role. IBM Spectrum Connect
provides communication between the VMware vSphere infrastructure and the system.
Although you can complete specific actions on volumes and pools that are owned by the
VASA Provider security role, IBM Spectrum Connect retains management responsibility for
VVOLs.
When VVOLs are enabled on the system, a utility volume is created to store metadata for the
VMware vCenter applications. You can select a pool to provide capacity for the utility volume.
With each new volume that is created by the VASA provider, VMware vCenter defines a few
kilobytes of metadata that are stored on the utility volume.
The utility volume can be mirrored to a second storage pool to ensure that the failure of a
storage pool does not result in loss of access to the metadata. Utility volumes are exclusively
managed by the VASA provider and cannot be deleted or mapped to other host objects.
Figure 5-10 provides a high-level overview of the key components that enable the VVOL
management framework.
To start using VVOLs, complete the following steps on the IBM Storage Virtualize system
before you configure any settings within the IBM Spectrum Connect server:
1. Enable VVOLs on the system:
a. In the management GUI, select Settings → System → VMware Virtual Volumes
(vVols) and click On.
b. Select the pool where the utility volume is stored. If possible, store a mirrored copy of
the utility volume in a second storage pool that is in a separate failure domain. The
utility volume cannot be created in a DRP.
c. Create a user for IBM Spectrum Connect to communicate with IBM FlashSystem, as
shown in Figure 5-11.
2. Create the user account for IBM Spectrum Connect and the user group with the VMware
VASA Provider role if they were not set in the previous step:
a. Create a user group by selecting Access → Users by Group → Create User Group.
Enter the user group name, select VASA Provider for the role, and click Create.
b. Create the user account by selecting Access → Users by Group, select the user
group that was created in step a, and click Create User. Enter the name of the user
account, select the user group with VASA Provider role, enter a valid password for the
user, and click Create.
3. For each ESXi host server to use VVOLs, create a host object:
a. In the management GUI, select Hosts → Hosts → Add Host.
b. Enter the name of the ESXi host server, enter the connection information, select VVOL
for the host type, and then click Add Host.
c. If the ESXi host was previously configured, the host type can be changed by modifying
the ESXi host type.
Note: The user account with the VASA Provider role is used only by the IBM Spectrum
Connect server to access the IBM Storage Virtualize system and run the automated tasks
that are required for vVols. Users must not directly log in to the management GUI or CLI
with this type of account and complete system tasks unless directed to do so by
IBM Support.
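The following CLI sketch corresponds to steps 2a and 2b. The group name, user name, and password are hypothetical, and the role keyword should be verified against your code level:
mkusergrp -name VASAProviderGrp -role VasaProvider
mkuser -name spectrumconnect -usergrp VASAProviderGrp -password <password>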
With TCT, the system supports connections to CSPs and the creation of cloud snapshots of
any volume or volume group on the system. Cloud snapshots are point-in-time copies of
volumes that are created and transferred to cloud storage that is managed by a CSP.
A cloud account defines the connection between the system and a supported CSP. It also
must be configured before data can be transferred to or restored from the cloud storage. After
a cloud account is configured with the CSP, you determine which volumes you want to create
cloud snapshots of and enable TCT on those volumes.
A cloud account is an object on the system that represents a connection to a CSP by using a
particular set of credentials. These credentials differ depending on the type of CSP that is
being specified. Most CSPs require the hostname of the CSP and an associated password.
Some CSPs also require certificates to authenticate users of the cloud storage. Public clouds
use certificates that are signed by well-known certificate authorities.
Private CSPs can use a self-signed certificate or a certificate that is signed by a trusted
certificate authority. These credentials are defined on the CSP and passed to the system
through the administrators of the CSP.
A cloud account defines whether the system can successfully communicate and authenticate
with the CSP by using the account credentials.
If the system is authenticated, it can access cloud storage to copy data to the cloud storage or
restore data that is copied to cloud storage back to the system. The system supports one
cloud account to a single CSP. Migration between providers is not supported.
– Only one operation (cloud snapshot, restore, or snapshot deletion) is allowed at a time on
a cloud volume.
– Cloud volume traffic is allowed only through management interfaces (1 G or 10 G).
When the snapshot version is restored to a new volume, you can use the restored data
independently of the original volume from which the snapshot was created. If the new volume
exists on the system, the restore operation uses the unique identifier (UID) of the new volume.
If the new volume does not exist on the system, you must choose whether to use the UID
from the original volume or create a UID. If you plan to use the new volume on the same
system, use the UID that is associated with the snapshot version that is being restored.
When migrating from image to managed or vice versa, the command varies, as shown in
Table 5-1.
Migrating a volume from one storage pool to another one is nondisruptive to the host
application that uses the volume. Depending on the workload of the IBM Storage Virtualize
system, performance might be slightly affected.
The migration of a volume from one storage pool to another storage pool by using the
migratevdisk command is allowed only if both storage pools have the same extent size.
Volume mirroring can be used if a volume must be migrated between storage pools with
different extent sizes. Also, you can use the migratevdisk command to migrate a volume to
or from a DRP only if the volume is fully allocated.
Example 5-5 shows the migratevdisk command that can be used to migrate an image-type
volume to a striped-type volume. The command also can be used to migrate a striped-type
volume to a striped-type volume.
This command migrates the volume Migrate_sample to the storage pool MDG1DS4K, and uses
four threads when migrating. Instead of using the volume name, you can use its ID number.
You can monitor the migration process by using the lsmigrate command, as shown in
Example 5-6.
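Based on the description above, the commands take the following general form (a sketch, not the numbered examples themselves), using the volume, pool, and thread values from the text:
migratevdisk -mdiskgrp MDG1DS4K -threads 4 -vdisk Migrate_sample
lsmigrate
Running lsmigrate with no parameters lists the progress of all active migrations.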
A volume is represented only as an image-type volume after it reaches the state where it is a
straight-through mapping. An image-type volume cannot be expanded.
Image-type disks are used to migrate data to IBM FlashSystem and migrate data out of
virtualization. In general, the reason for migrating a volume to an image-type volume is to
move the data on the disk to a non-virtualized environment.
If the migration is interrupted by a cluster recovery, the migration resumes after the recovery
completes.
The MDisk that is specified as the target must be in an unmanaged state at the time that the
command is run. Running this command results in the inclusion of the MDisk into the
user-specified storage pool.
Remember: This command cannot be used if the source volume copy is in a child pool or
if the target MDisk group that is specified is a child pool. This command does not work if
the volume is fast formatting.
The migratetoimage command fails if the target or source volume is offline. Correct the offline
condition before attempting to migrate the volume.
If the volume (or volume copy) is a target of a FlashCopy mapping with a source volume in an
active-active relationship, the new MDisk group must be in the same site as the source
volume. If the volume is in an active-active relationship, the new MDisk group must be in the
same site as the source volume. Also, the site information for the MDisk being added must be
defined and match the site information for other MDisks in the storage pool.
Note: You cannot migrate a volume or volume image between storage pools if cloud
snapshot is enabled on the volume.
An encryption key cannot be used when migrating an image-mode MDisk. To use encryption
(when the MDisk has an encryption key), the MDisk must be self-encrypting before
configuring the storage pool.
The migratetoimage command is useful when you want to use your system as a data mover.
For more information about the requirements and specifications for the migratetoimage
command, see this IBM Documentation web page.
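A hedged sketch of the command follows; the volume, MDisk, and pool names are hypothetical:
migratetoimage -vdisk volume01 -mdisk mdisk5 -mdiskgrp ImagePool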
To migrate from a thin-provisioned volume to a fully allocated volume, the process is similar:
1. Add a target fully allocated copy.
2. Wait for synchronization to complete.
3. Remove the source thin-provisioned copy.
In both cases, if you set the autodelete option to yes when creating the volume copy, the
source copy is automatically deleted, and you can skip the third step in both processes. The
best practice for this type of migration is to try not to overload the systems with a high
syncrate or with too many migrations at the same time.
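As a sketch of this approach with a hypothetical pool and volume name, a single command can cover steps 1 and 3 when -autodelete is specified:
addvdiskcopy -mdiskgrp Pool1 -autodelete volume01
The fully allocated copy is created in Pool1, and the original thin-provisioned copy is deleted automatically after synchronization completes.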
The syncrate parameter specifies the copy synchronization rate. A value of zero prevents
synchronization. The default value is 50. The supported -syncrate values and their
corresponding rates are listed in Table 5-2.
Value Data copied per second
01 - 10 128 KB
11 - 20 256 KB
21 - 30 512 KB
31 - 40 1 MB
41 - 50 2 MB
51 - 60 4 MB
61 - 70 8 MB
71 - 80 16 MB
81 - 90 32 MB
91 - 100 64 MB
101 - 110 128 MB
111 - 120 256 MB
121 - 130 512 MB
131 - 140 1 GB
141 - 150 2 GB
We recommend modifying the syncrate only after monitoring the overall bandwidth and
latency. If performance is not affected by the migration, increase the syncrate so that the
synchronization completes within the allotted time.
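For example, the synchronization rate of an existing mirrored volume can be adjusted nondisruptively (the volume name is hypothetical):
chvdisk -syncrate 80 volume01
Per the table above, a value of 80 attempts to copy approximately 16 MB per second per volume.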
You also can use volume mirroring when you migrate a volume from a non-virtualized storage
device to an IBM Storage Virtualize system. As you can see in Figure 5-13 on page 345, you
first must attach the storage to the IBM Storage Virtualize system (in this instance, an SVC),
which requires some downtime because the hosts need to stop I/O, rediscover the volume
through the IBM Storage Virtualize system, and then resume access.
After the storage is correctly attached to the IBM Storage Virtualize system, map the
image-type volumes to the hosts so that the hosts recognize the volumes as though they were
accessed through the non-virtualized storage device. Then, you can restart the applications.
After that process completes, you can use volume mirroring to migrate the volumes to a
storage pool with managed MDisks, which creates striped-type copies of each volume in this
target pool. Data synchronization between the volume copies then starts in the background.
Figure 5-14 shows two examples of how you can use volume mirroring to convert volumes to
a different type or migrate volumes to a different type of pool.
You also can move compressed or thin-provisioned volumes in standard pools to DRPs to
simplify management of reclaimed capacity. The DRP tracks the unmap operations of the
hosts and reallocates capacity automatically. The system supports volume mirroring to create
a copy of the volume in the new DRP without disrupting host operations.
Deleting a volume copy in a DRP is a background task and can take a significant amount of
time. During the deletion process, the deleting copy is still associated with the volume and a
new volume mirror cannot be created until the deletion is complete. If you want to use volume
mirroring again on the same volume without waiting for the delete, split the copy to be deleted
to a new volume before deleting it.
You also can migrate data between node-based systems and enclosure-based systems.
Unlike replication remote copy types, nondisruptive system migration does not require a
remote mirroring license before you can configure a remote copy relationship that is used for
migration.
There are some configuration and host operating system restrictions that are documented in
the Configuration Limits and Restrictions document under Volume Mobility.
Prerequisites
The following prerequisites must be met for nondisruptive system migration:
– Both systems are running version 8.4.2 or later.
– An FC or IP partnership exists between the two systems that you want to migrate volumes
between. The maximum supported round-trip time (RTT) between the two systems is
3 milliseconds. Ensure that the partnership has sufficient bandwidth to support the write
throughput for all the volumes that you are migrating. For more information, see the
mkfcpartnership command for creating FC partnerships and the mkippartnership
command for creating IP partnerships.
– Any hosts that are mapped to volumes that you are migrating are correctly zoned to both
systems. Hosts must appear in an online state on both systems.
Note: The volumes must be the same size. If the GUI window does not show the
expected auxiliary volume, check the size by running the lsvdisk -unit b <volume
name or id> command.
Note: Data is copied to the target system at the lowest of partnership background copy
rate or relationship bandwidth. The relationship bandwidth default is 25 MBps per
relationship and can be increased by running the chsystem
-relationshipbandwidthlimit <new value in MBps> command if needed.
11.Create host mappings to the auxiliary volumes on the remote system. Ensure that all
auxiliary volumes are mapped to the same hosts that were previously mapped to the
master volumes on the older system.
12.Ensure that the Host Bus Adapters (HBAs) in all hosts that are mapped to the volume are
rescanned to ensure that all new paths are detected to the auxiliary volumes. Record the
current path states on any connected hosts. Identify the worldwide port names (WWPNs)
that are used for the active and standby (ghost) paths.
13.In the management GUI, select Copy Services → Remote Copy → Independent
Relationship. Right-click the migration relationship and select Switch Direction. This
action reverses the copy direction of the relationship and switches the active and standby
paths, which results in all host I/O being directed to the new volume.
14.Validate that all hosts use the new paths to the volume by verifying that the paths that were
reporting as standby (or ghost) are now reporting active. Verify that all previously active
paths are now reporting standby (or ghost).
Note: Do not proceed if the added standby paths are not visible on the host. Standby
paths might be listed under a different name on the host, such as “ghost” paths. Data
access can be affected if all standby paths are not visible to the hosts when the
direction is switched on the relationship.
15.Validate that all hosts use the target volume for I/O and verify that no issues exist.
16.On the original source system that was used in the migration, select Hosts → Hosts.
Right-click the hosts and select Unmap Volumes. Verify the number of volumes that are
being unmapped, and then click Unmap.
17.On the original source system, select Volumes → Volumes. Right-click the volumes and
select Delete. Verify the number of volumes that are being deleted and click Continue.
The volume migration process is complete.
Depending on the amount of data that is being migrated, the process can take some time.
Note: Data is copied to the target system at the lowest of the partnership background
copy rate or the relationship bandwidth. The relationship bandwidth default is 25 MBps
per relationship, which can be increased by running the chsystem
-relationshipbandwidthlimit <new value in MBps> command.
Attention: Do not proceed if the added standby paths are not visible on the host.
Standby paths might be listed under a different name on the host, such as “ghost”
paths. Data access can be affected if all standby paths are not visible to the hosts when
the direction is switched on the relationship.
8. Switch the direction of the relationship so that the auxiliary volume on the target system
becomes the primary source for host I/O operations by running the switchrcrelationship
-primary aux migrationrc command, where migrationrc indicates the name of the
relationship. This command reverses the copy direction of the relationship and switches
the active and standby paths, which results in all host I/O being directed to the auxiliary
volume.
9. Validate that all hosts use the new paths to the volume by verifying that the paths
previously reporting as standby (or ghost) are now reporting active.
10.Verify that all previously active paths are now reporting standby (or ghost).
11.Validate that all hosts use the target volume for I/O and verify that no issues exist.
12.On the original source system, unmap hosts from the original volumes by entering the
rmvdiskhostmap -host host1 sourcevolume command, where sourcevolume is the name
of the original volume that was migrated.
13.On the original source system, delete the original source volumes by entering the
rmvolume sourcevolume command, where sourcevolume is the name of the original
volume that was migrated.
Preferred node assignment is normally automatic. The system selects the node in the I/O
group that includes the fewest volumes. However, the preferred node can be specified or
changed, if needed.
All modern multipathing drivers support Asymmetric Logical Unit Access (ALUA). This access
allows the storage to mark certain paths as preferred (paths to the preferred node). ALUA
multipathing drivers acknowledge preferred pathing and send I/O to the other node only if the
preferred node is not accessible.
Figure 5-15 shows write operations from a host to two volumes with different preferred nodes.
Figure 5-15 Write operations from a host through different preferred nodes for each volume
When debugging performance problems, it can be useful to review the Non-Preferred Node
Usage Percentage metric in IBM Spectrum Control or IBM Storage Insights. I/O to the
non-preferred node might cause performance problems for the I/O group, which can be
identified by these tools.
For more information about this performance metric and more in IBM Spectrum Control, see
this IBM Documentation web page.
Changing the preferred node of a volume within an I/O group or to another I/O group is a
nondisruptive process.
This operation can be done by using the CLI and GUI, but if you have only one I/O group, this
operation is not possible by using the GUI. To change the preferred node within an I/O group
by using the CLI, run the following command:
movevdisk -node <node_id or node_name> <vdisk_id or vdisk_name>
Some limitations exist in moving a volume across I/O groups, which is called Non-Disruptive
Volume Movement (NDVM). These limitations are mostly in host cluster environments. You
can check their compatibility at the IBM System Storage Interoperation Center (SSIC)
website.
Note: These migration tasks can be nondisruptive if performed correctly and the hosts that
are mapped to the volume support NDVM. The cached data that is held within the system
first must be written to disk before the allocation of the volume can be changed.
Modifying the I/O group that services the volume can be done concurrently with I/O
operations if the host supports nondisruptive volume move. The move also requires a rescan at
the host level to ensure that the multipathing driver is notified that the allocation of the preferred
node changed and that the ports by which the volume is accessed changed. This type of move
can be useful when one pair of nodes becomes over-used.
If any host mappings are available for the volume, the hosts must be members of the target
I/O group or the migration fails.
Verify that you created paths to I/O groups on the host system. After the system successfully
adds the new I/O group to the volume’s access set and you moved the selected volumes to
another I/O group, detect the new paths to the volumes on the host.
The commands and actions on the host vary depending on the type of host and the
connection method that is used. These steps must be completed on all hosts to which the
selected volumes are currently mapped.
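A hedged CLI sketch of this flow, with hypothetical I/O group and volume names (verify the commands on your code level), adds the new I/O group to the volume's access set, moves the volume, and then removes the old access:
addvdiskaccess -iogrp io_grp1 volume01
movevdisk -iogrp io_grp1 volume01
rmvdiskaccess -iogrp io_grp0 volume01
Perform the host-level rescan between the movevdisk and rmvdiskaccess steps so that the multipathing driver detects the new paths before the old ones are removed.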
Note: If the selected volume is performing quick initialization, this wizard is unavailable
until quick initialization completes.
For example, volumes that are used for backup or archive operations can have I/O-intensive
workloads, potentially taking bandwidth from production volumes. A volume throttle can be
used to limit I/Os for these types of volumes so that I/O operations for production volumes are
not affected.
When deciding between using IOPS or bandwidth as the I/O governing throttle, consider the
disk access pattern of the application. Database applications often issue many I/O operations
but transfer only a relatively small amount of data. In this case, setting an I/O governing throttle
that is based on MBps does not achieve the expected result. Therefore, it is better to set an
IOPS limit.
In contrast, a streaming video application often issues few I/O operations but transfers a large
amount of data. Unlike the database example, defining an I/O throttle that is based on IOPS
does not achieve a good result. For a streaming video application, it is better to set an MBps limit.
You can edit the throttling value in the menu, as shown in Figure 5-17.
Figure 5-18 shows both the bandwidth and IOPS parameters that can be set.
Note: The mkthrottle command can be used to create throttles for volumes, hosts, host
clusters, pools, or system offload commands.
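For example, the throttle that is shown in Example 5-7 could have been created and later adjusted with commands of the following form (a sketch; the object name matches the example output):
mkthrottle -type vdisk -iops 1000 -bandwidth 100 -vdisk Vol01
chthrottle -iops 2000 throttle0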
When the IOPS limit is configured on a volume and it is smaller than 100 IOPS, the throttling
logic rounds it to 100 IOPS. Even if the throttle is set to a value smaller than 100 IOPS, the
throttling occurs at 100 IOPS.
After any of the commands that were described thus far are used to set volume throttling, a
throttle object is created. Then, you can list your created throttle objects by using the
lsthrottle command and change their parameters with the chthrottle command.
Example 5-7 shows some command examples.
superuser>lsthrottle throttle0
id 0
throttle_name throttle0
object_id 52
object_name Vol01
throttle_type vdisk
IOPs_limit 1000
bandwidth_limit_MB 100
For more information and the procedure to set volume throttling, see IBM Documentation.
By default, when a volume is created, the cache mode is set to readwrite. Disabling cache
can affect performance and increase read/write response time.
Figure 5-19 shows write operation behavior when a volume cache is activated (readwrite).
Figure 5-20 shows a write operation behavior when volume cache is deactivated (none).
In most cases, the volume with readwrite cache mode is recommended because disabling
cache for a volume can result in performance issues to the host. However, some specific
scenarios exist in which it is recommended to disable the readwrite cache.
You might use cache-disabled (none) volumes when you have remote copy or FlashCopy in a
back-end storage controller and these volumes are virtualized in IBM FlashSystem devices
as image-mode virtual disks (VDisks). Another possible use of cache-disabled volumes is when
existing intellectual capital is embedded in Copy Services automation scripts. Keep the usage
of cache-disabled volumes to a minimum for normal workloads.
You can also use cache-disabled volumes to control the allocation of cache resources. By
disabling the cache for specific volumes, more cache resources are available to cache I/Os to
other volumes in the same I/O group. An example of this usage is a non-critical application
that uses volumes in MDisks from all-flash storage.
By default, volumes are created with cache mode enabled (read/write); however, you can
specify the cache mode when the volume is created by using the -cache option.
The cache mode of a volume can be concurrently changed (with I/O) by using the chvdisk
command or using the GUI by selecting Volumes → Volumes → Actions → Cache Mode.
Figure 5-21 on page 357 shows the editing cache mode for a volume.
The CLI does not fail I/O to the user, and the command is allowed to run on any volume. If
used correctly without the -force flag, the command does not result in a corrupted volume
because the cache is flushed, and cached data is destaged or discarded as required, when
the user disables the cache on a volume.
Example 5-8 shows an image-mode volume, VDISK_IMAGE_1, whose cache parameter was
changed after it was created.
cache none
.
lines removed for brevity
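A hedged sketch of the commands that produce and revert this state, using the volume name from the example above:
chvdisk -cache none VDISK_IMAGE_1
chvdisk -cache readwrite VDISK_IMAGE_1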
In an environment with copy services (FlashCopy, MM, GM, and volume mirroring) and typical
workloads, disabling IBM FlashSystem cache is detrimental to overall performance.
Attention: Carefully evaluate the effect to the entire system with quantitative analysis
before and after making this change.
To prevent an active volume from being deleted unintentionally, administrators must enable
volume protection. They also can specify a period that the volume must be idle before it can
be deleted. If volume protection is enabled and the period is not expired, the volume deletion
fails, even if the -force parameter is used.
When you delete a volume, the system verifies whether it is a part of a host mapping,
FlashCopy mapping, or remote copy relationship. In these cases, the system fails to delete
the volume unless the -force parameter is specified. However, if volume protection is
enabled, the -force parameter does not delete a volume if it had I/O activity within the
number of minutes that is defined as the volume protection duration.
Note: The -force parameter overrides the volume dependencies, not the volume
protection setting. Volume protection must be disabled to permit a volume or host-mapping
deletion if the volume had recent I/O activity.
If you want volume protection enabled in your system but disabled in a specific storage pool,
run the following command:
chmdiskgrp -vdiskprotectionenabled no <pool_name_or_ID>
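Conversely, system-wide volume protection and its duration are set with the chsystem command. The following sketch uses a 60-minute value as an example only (verify the parameter names on your code level):
chsystem -vdiskprotectionenabled yes -vdiskprotectiontime 60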
You can also manage volume protection in the GUI by selecting Settings → System →
Volume Protection, as shown in Figure 5-22.
Expanding a volume
You can expand volumes for the following reasons:
– To increase the available capacity on a specific volume that is mapped to a host.
– To increase the size of a volume to make it match the size of the source or master volume
so that it can be used in a FlashCopy mapping or MM relationship.
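From the CLI, a volume is expanded with the expandvdisksize command; the following sketch uses a hypothetical volume name and a 10 GB increment:
expandvdisksize -size 10 -unit gb volume01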
Shrinking a volume
Volumes can be reduced in size, if necessary. If a volume does not contain any data, it is
unlikely that you will encounter any issues when shrinking its size. However, if a volume is in
use and contains data, do not shrink its size because IBM Storage Virtualize is unaware of
whether it is removing used or unused capacity.
Attention: When you shrink a volume, capacity is removed from the end of the disk,
whether or not that capacity is in use. Even if a volume includes free capacity, do not assume
that only unused capacity is removed when you shrink a volume.
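If you must shrink a volume, the CLI command takes the following form; the volume name is hypothetical and the size is the amount of capacity to remove:
shrinkvdisksize -size 10 -unit gb volume01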
Figure 5-24 shows the Modify Volume Capacity window, which you can use to shrink
volumes.
This chapter provides an overview and best practices of IBM Storage Virtualize Copy
Services capabilities, including FlashCopy, Metro Mirror (MM) and Global Mirror (GM), and
volume mirroring.
6.1.1 FlashCopy
FlashCopy is a function that you can use to create a point-in-time copy of one of your
volumes. This function is helpful when performing backups or application testing. These
copies can be cascaded on one another, read from, written to, and even reversed. These
copies can conserve storage (if needed) by being space-efficient copies that record only
items that changed from the originals instead of full copies.
Primarily, you use this function to insulate hosts from the failure of a storage pool and from the
failure of a back-end disk subsystem. During a storage pool failure, the system continues to
provide service for the volume from the other copy on the other storage pool, with no
disruption to the host.
You can also use volume mirroring to change the capacity savings of a volume and migrate
data between storage pools of different extent sizes and characteristics.
You can use FlashCopy to solve critical and challenging business needs that require
duplication of data of your source volume. Volumes can remain online and active while you
create consistent copies of the data sets. Because the copy is performed at the block level, it
operates below the host operating system and its cache. Therefore, the copy is not apparent
to the host.
Important: Because FlashCopy operates at the block level below the host operating
system and cache, those levels do need to be flushed for consistent FlashCopy copies.
While the FlashCopy operation is performed, the source volume is stopped briefly to initialize
the FlashCopy bitmap, and then input/output (I/O) can resume. Although several FlashCopy
options require the data to be copied from the source to the target in the background, which
can take time to complete, the resulting data on the target volume is presented so that the
copy appears to complete immediately.
This process is performed by using a bitmap (or bit array) that tracks changes to the data after
the FlashCopy is started, and an indirection layer that enables data to be read from the
source volume transparently.
The business applications for FlashCopy are wide-ranging. In the following sections, a short
description of the most common use cases is provided.
After the FlashCopy is performed, the resulting image of the data can be backed up to tape as
though it were the source system. After the copy to tape is complete, the image data is
redundant, and the target volumes can be discarded. For time-limited applications, such as
these examples, “no copy” or incremental FlashCopy is used most often. The usage of these
methods puts less load on your infrastructure.
When FlashCopy is used for backup purposes, the target data usually is managed as
read-only at the operating system level. This approach provides extra security by ensuring
that your target data was not modified and remains true to the source.
This approach can be used for various applications, such as recovering your production
database application after an errant batch process that caused extensive damage.
Best practices: Although restoring from FlashCopy is quicker than a traditional tape
media restore, do not use restoring from FlashCopy as a substitute for good archiving
practices. Instead, keep one to several iterations of your FlashCopy copies so that you can
near-instantly recover your data from the most recent history. Keep your long-term archive
for your business.
In addition to the restore option, which copies the original blocks from the target volume to
modified blocks on the source volume, the target can be used to perform a restore of
individual files. To do that task, you must make the target available on a host. Do not make the
target available to the source host because seeing duplicates of disks causes problems for
most host operating systems. Copy the files to the source by using the normal host data copy
methods for your environment.
Use case: FlashCopy can be used to migrate volumes from and to data reduction pools
(DRPs), which do not support extent-based migrations.
This method differs from the other migration methods, which are described later in this
chapter. Common uses for this capability are host and back-end storage hardware refreshes.
Create a FlashCopy of your source and use it for your testing. This copy is a duplicate of your
production data down to the block level so that even physical disk IDs are copied. Therefore, it
is impossible for your applications to tell the difference.
Cyber resiliency
FlashCopy is the foundation of the IBM Storage Virtualize Safeguarded Copy function that can
create cyber-resilient point-in-time copies of volumes that cannot be changed or deleted
through user errors, malicious actions, or ransomware attacks. For more information, see 6.3,
“Safeguarded Copy” on page 321.
To start a FlashCopy operation, a relationship between the source and the target volume
must be defined. This relationship is called FlashCopy mapping.
FlashCopy mappings can be stand-alone or a member of a consistency group (CG). You can
perform the actions of preparing, starting, or stopping FlashCopy on either a stand-alone
mapping or a CG.
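A minimal sketch of defining a CG and a mapping from the CLI follows; the volume and object names are hypothetical:
mkfcconsistgrp -name FC_CG_1
mkfcmap -source vol_S1 -target vol_T1 -consistgrp FC_CG_1 -copyrate 50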
Note: Starting from IBM Storage Virtualize 8.4.2, the maximum number of FlashCopy
mappings per system is 15 864.a
a. Applies only to IBM Storwize V7000 Gen3, IBM FlashSystem 7200, 9100, and 9200, and
IBM SAN Volume Controller DH8, SV1, SA2, SV2, and SV3.
A FlashCopy mapping has a set of attributes and settings that define the characteristics and
the capabilities of the FlashCopy.
Background copy
The background copy rate is a property of a FlashCopy mapping that you use to specify
whether a background physical copy of the source volume to the corresponding target volume
occurs. A value of 0 disables the background copy. If the FlashCopy background copy is
disabled, only data that changed on the source volume is copied to the target volume. A
FlashCopy with background copy disabled is also known as no-copy FlashCopy.
The benefit of using a FlashCopy mapping with background copy enabled is that the target
volume becomes a real clone (independent from the source volume) of the FlashCopy
mapping source volume after the copy completes. When the background copy function is not
performed, the target volume remains a valid copy of the source data while the FlashCopy
mapping remains in place.
Valid values for the background copy rate are 0 - 150. The background copy rate can be
defined and changed dynamically for individual FlashCopy mappings.
Table 6-1 lists the relationship of the background copy rate value to the attempted amount of
data to be copied per second.
Table 6-1 Relationship between the rate and data rate per second
Value Data copied per second
01 - 10 128 KB
11 - 20 256 KB
21 - 30 512 KB
31 - 40 1 MB
41 - 50 2 MB
51 - 60 4 MB
61 - 70 8 MB
71 - 80 16 MB
81 - 90 32 MB
91 - 100 64 MB
101 - 110 128 MB
111 - 120 256 MB
121 - 130 512 MB
131 - 140 1024 MB
141 - 150 2048 MB
Note: To ensure optimal performance of all IBM Storage Virtualize features, it is a best
practice to not exceed a copyrate value of 130.
When CGs are used, the FlashCopy commands are issued to the CGs. The groups
simultaneously perform the operation on all FlashCopy mappings that are contained within
the CGs.
The figure illustrates a consistency group (FC_CG_1) that contains two point-in-time FlashCopy
mappings: FC_Mapping 1 copies production volume S1 (FC_Source_1) to backup volume T1
(FC_Target_1), and FC_Mapping 2 copies production volume S2 (FC_Source_2) to backup
volume T2 (FC_Target_2).
Incremental FlashCopy
By using Incremental FlashCopy, you can reduce the required time to copy. Also, because
less data must be copied, the workload that is put on the system and the back-end storage is
reduced.
Incremental FlashCopy does not require that you copy an entire disk source volume
whenever the FlashCopy mapping is started. Instead, only the changed regions on source
volumes are copied to target volumes, as shown in Figure 6-3.
If the FlashCopy mapping was stopped before the background copy completed, then when
the mapping is restarted, the data that was copied before the mapping stopped will not be
copied again. For example, if an incremental mapping reaches 10 percent progress when it is
stopped and then it is restarted, that 10 percent of data will not be recopied when the
mapping is restarted, assuming that it was not changed.
A difference value is provided in the query of a mapping, which makes it possible to know how
much data changed and therefore must be copied when the Incremental FlashCopy mapping is
restarted. The difference value is the percentage (0 - 100 percent) of data that changed and
that must be copied to the target volume to get a fully independent copy of the source volume.
Up to 256 different mappings are possible for each source volume. These mappings are
independently controllable from each other. Multiple target FlashCopy mappings can be
members of the same or different CGs. In cases where all the mappings are in the same CG,
the result of starting the CG will be to FlashCopy to multiple identical target volumes.
Cascaded FlashCopy
With Cascaded FlashCopy, you can have a source volume for one FlashCopy mapping and
as the target for another FlashCopy mapping, which is referred to as a Cascaded FlashCopy.
This function is illustrated in Figure 6-5.
Reverse FlashCopy
Reverse FlashCopy enables FlashCopy targets to become restore points for the source
without breaking the FlashCopy relationship, and without having to wait for the original copy
operation to complete. It can be used with the Multiple Target FlashCopy to create multiple
rollback points.
A key advantage of the Multiple Target reverse FlashCopy function is that the reverse
FlashCopy does not destroy the original target. This feature enables processes that are using
the target, such as a tape backup, to continue uninterrupted. IBM Storage Virtualize also
allows you to create an optional copy of the source volume to be made before the reverse
copy operation starts. This ability to restore back to the original source data can be useful for
diagnostic purposes.
Thin-provisioned FlashCopy
When a new volume is created, you can designate it as a thin-provisioned volume, and it has
a virtual capacity and a real capacity.
Virtual capacity is the volume storage capacity that is available to a host. Real capacity is the
storage capacity that is allocated to a volume copy from a storage pool. In a fully allocated
volume, the virtual capacity and real capacity are the same. However, in a thin-provisioned
volume, the virtual capacity can be much larger than the real capacity.
The virtual capacity of a thin-provisioned volume is typically larger than its real capacity. On
IBM Storage Virtualize systems, the real capacity is used to store data that is written to the
volume, and metadata that describes the thin-provisioned configuration of the volume. As
more information is written to the volume, more of the real capacity is used.
Thin-provisioned volumes also can help to simplify server administration. Instead of assigning
a volume with some capacity to an application and increasing that capacity following the
needs of the application if those needs change, you can configure a volume with a large
virtual capacity for the application. Then, you can increase or shrink the real capacity as the
application’s needs change, without disrupting the application or server.
When you configure a thin-provisioned volume, you can use the warning level attribute to
generate a warning event when the used real capacity exceeds a specified amount or
percentage of the total real capacity. For example, if you have a volume with 10 GB of total
capacity and you set the warning to 80 percent, an event is registered in the event log when
you use 80 percent of the total capacity. This technique is useful when you need to control
how much of the volume is used.
If a thin-provisioned volume does not have enough real capacity for a write operation, the
volume is taken offline and an error is logged (error code 1865, event ID 060001). Access to
the thin-provisioned volume is restored by either increasing the real capacity of the volume or
increasing the size of the storage pool on which it is allocated.
You can use thin volumes for Cascaded FlashCopy and Multiple Target FlashCopy. It is also
possible to mix thin-provisioned with normal volumes. Thin-provisioning can be used for
incremental FlashCopy too, but using thin-provisioned volumes for incremental FlashCopy
makes sense only if the source and target are thin-provisioned.
A FlashCopy mapping has an attribute that represents the state of the mapping. The
FlashCopy states are the following ones:
Idle_or_copied
Copying
Stopped
Stopping
Suspended
Preparing
Prepared
Idle_or_copied
Read/write caching is enabled for both the source and the target. A FlashCopy mapping
exists between the source and target, but the source and target behave as independent
volumes in this state.
Copying
The FlashCopy indirection layer (see “Indirection layer” on page 306) governs all I/O to the
source and target volumes while the background copy is running. The background copy
process is copying grains from the source to the target. Reads/writes run on the target as
though the contents of the source were instantaneously copied to the target during the
running of the startfcmap or startfcconsistgrp command. The source and target can be
independently updated. Internally, the target depends on the source for certain tracks.
Read/write caching is enabled on the source and the target.
Stopped
The FlashCopy was stopped either by a user command or by an I/O error. When a FlashCopy
mapping is stopped, the integrity of the data on the target volume is lost. Therefore, while the
FlashCopy mapping is in this state, the target volume is in the Offline state. To regain access
to the target, the mapping must be started again (the previous point-in-time is lost) or the
FlashCopy mapping must be deleted. The source volume is accessible, and read/write
caching is enabled for the source. In the Stopped state, a mapping can either be prepared
again or deleted.
Stopping
The mapping is transferring data to a dependent mapping. The behavior of the target volume
depends on whether the background copy process completed while the mapping was in the
Copying state. If the copy process completed, the target volume remains online while the
stopping copy process completes. If the copy process did not complete, data in the cache is
discarded for the target volume. The target volume is taken offline, and the stopping copy
process runs. After the data is copied, a stop complete asynchronous event notification is
issued. The mapping moves to the Idle_or_copied state if the background copy completed or to
the Stopped state if the background copy did not complete. The source volume remains
accessible for I/O.
Suspended
The FlashCopy was in the Copying or Stopping state when access to the metadata was lost.
As a result, both the source and target volumes are offline and the background copy process
halted. When the metadata becomes available again, the FlashCopy mapping returns to the
Copying or Stopping state. Access to the source and target volumes is restored, and the
background copy or stopping process resumes. Unflushed data that was written to the source
or target before the FlashCopy was suspended is pinned in cache until the FlashCopy
mapping leaves the Suspended state.
Preparing
The FlashCopy is preparing the mapping. While in this state, data from cache is destaged to
disk and a consistent copy of the source exists on disk. Now, cache is operating in
write-through mode and writes to the source volume experience more latency. The target
volume is reported as online, but it does not perform reads or writes. These reads/writes are
failed by the Small Computer System Interface (SCSI) front end.
Before starting the FlashCopy mapping, it is important that any cache at the host level, for
example, buffers on the host operating system or application, are instructed to flush any
outstanding writes to the source volume. Performing the cache flush that is required as part of
the startfcmap or startfcconsistgrp command causes I/Os to be delayed while waiting for
the cache flush to complete. To overcome this problem, FlashCopy supports the
prestartfcmap or prestartfcconsistgrp commands. These commands prepare for a
FlashCopy start while still allowing I/Os to continue to the source volume.
In the Preparing state, the FlashCopy mapping is prepared by the following steps:
1. Flush any modified write data that is associated with the source volume from the cache.
Read data for the source is left in the cache.
2. Place the cache for the source volume into write-through mode so that subsequent writes
wait until data is written to disk before completing the write command that is received from
the host.
3. Discard any read or write data that is associated with the target volume from the cache.
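As a sketch of this flow, assuming a mapping that is named fcmap0, the prepare and start can be driven from the CLI as follows:
   prestartfcmap fcmap0
   lsfcmap fcmap0
   startfcmap fcmap0
The lsfcmap output should show the prepared status before startfcmap is issued. The same flow applies to CGs by using the prestartfcconsistgrp and startfcconsistgrp commands.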
Prepared
While in the Prepared state, the FlashCopy mapping is ready to perform a start. While the
FlashCopy mapping is in this state, the target volume is in the Offline state. In the Prepared
state, writes to the source volume experience more latency because the cache is operating in
write-through mode.
Figure 6-6 represents the FlashCopy mapping state diagram. It illustrates the states in which a
mapping can exist and the events that cause a state change.
A FlashCopy bitmap takes up the bitmap space in the memory of the I/O group that must be
shared with bitmaps of other features (such as remote copy bitmaps, volume mirroring
bitmaps, and redundant array of independent disks (RAID) bitmaps).
Indirection layer
The FlashCopy indirection layer governs the I/O to the source and target volumes when a
FlashCopy mapping is started. This process is done by using a FlashCopy bitmap. The
purpose of the FlashCopy indirection layer is to enable both the source and target volumes for
read/write I/O immediately after FlashCopy starts.
The following description illustrates how the FlashCopy indirection layer works when a
FlashCopy mapping is prepared and then started.
When a FlashCopy mapping is prepared and started, the following sequence is applied:
1. Flush the write cache to the source volume or volumes that are part of a CG.
2. Put the cache into write-through mode on the source volumes.
3. Discard the cache for the target volumes.
4. Establish a sync point on all of the source volumes in the CG (creating the FlashCopy
bitmap).
5. Ensure that the indirection layer governs all the I/O to the source volumes and target.
6. Enable the cache on source volumes and target volumes.
FlashCopy provides the semantics of a point-in-time copy that uses the indirection layer,
which intercepts I/O that is directed at either the source or target volumes. The act of starting
a FlashCopy mapping causes this indirection layer to become active in the I/O path, which
occurs automatically across all FlashCopy mappings in the CG. Then, the indirection layer
determines how each I/O is routed based on the following factors:
The volume and the logical block address (LBA) to which the I/O is addressed.
Its direction (read or write).
The indirection layer allows an I/O to go through the underlying volume, which preserves the
point-in-time copy. To do that task, the IBM Storage Virtualize code uses two mechanisms:
Copy-on-write (CoW): With this mechanism, when a write operation occurs in the source
volume, a portion of data (grain) containing the data to be modified is copied to the target
volume before the operation completion.
Redirect-on-write (RoW): With this mechanism, when a write operation occurs in the
source volume, the data to be modified is written in another area, leaving the original data
unmodified to be used by the target volume.
IBM Storage Virtualize implements the CoW and RoW logic transparently to the user with the
aim of optimizing performance and capacity. By using the RoW mechanism, the
performance can improve by reducing the number of physical I/Os for the write operations
while a significant capacity savings can be achieved by improving the overall deduplication
ratio.
The RoW mechanism was introduced with IBM Storage Virtualize 8.4 and is used when the
following conditions are met:
Source and target volumes in the same pool.
Source and target volumes in the same I/O group.
The pool that contains the source and target volumes must be a DRP.
Source and target volumes do not participate in a volume mirroring relationship.
Source and target volumes are not fully allocated.
In all the cases in which the RoW is not applicable, the CoW is used.
The following list summarizes how the indirection layer handles host reads and writes to the
source and target volumes, depending on whether the grain was already copied:
– Source volume, grain not yet copied: A read is serviced from the source volume. A write
causes the grain to be copied to the most recently started target for this source, and then
the write is applied to the source.
– Target volume, grain not yet copied: A read is serviced from the oldest of any newer targets
for this source in which this grain already was copied; otherwise, it is serviced from the
source. A write is held while the dependency target volumes are checked to see whether
the grain was copied. If the grain is not already copied to the next oldest target for this
source, the grain is copied to the next oldest target. Then, the write is applied to the target.
– Target volume, grain already copied: A read is serviced from the target volume, and a write
is applied to the target volume.
The CoW process might introduce significant latency into write operations. To isolate the
active application from this additional latency, the FlashCopy indirection layer is placed
logically between the upper and lower cache. Therefore, the additional latency that is
introduced by the CoW process is encountered only by the internal cache operations, and not
by the application.
The logical placement of the FlashCopy indirection layer is shown in Figure 6-8.
Also, preparing FlashCopy is fast because the upper cache write data does not have to go
directly to back-end storage, but to the lower cache layer only.
As a result of this sequence of events, the configuration in Figure 6-9 has the following
characteristics:
Target 1 depends on Target 2 and Target 3. It remains dependent until all of Target 1 is
copied. No target depends on Target 1, so the mapping can be stopped without needing to
copy any data to maintain the consistency in the other targets.
Target 2 depends on Target 3, and remains dependent until all of Target 2 is copied. Target
1 depends on Target 2, so if this mapping is stopped, the cleanup process is started to
copy all data that is uniquely held on this mapping (that is, ty) to Target 1.
Target 3 does not depend on any target, but it has Target 1 and Target 2 depending on it,
so if this mapping is stopped, the cleanup process is started to copy all data that is
uniquely held on this mapping (that is, tz) to Target 2.
The following FlashCopy configuration limits apply:
– FlashCopy targets per source: 256. This is the maximum number of FlashCopy mappings
that can exist with the same source volume.
– FlashCopy mappings per system: 15864. This is the maximum number of FlashCopy
mappings per system.
– FlashCopy CGs per system: 500. This maximum is an arbitrary limit that is policed by the
software.
– FlashCopy volume space per I/O group: 4096 TB. This maximum is a limit on the quantity
of FlashCopy mappings that use bitmap space from one I/O group.
– FlashCopy mappings per CG: 512. This limit is due to the time that is taken to prepare a
CG with many mappings.
Configuration limits: The configuration limits always change with the introduction of new
hardware and software capabilities. For more information about the latest configuration
limits, see this IBM Support web page.
The total amount of cache memory that is reserved for the FlashCopy bitmaps limits the
amount of capacity that can be used as a FlashCopy target. Table 6-4 shows the relationship
of bitmap space to FlashCopy address space, depending on the size of the grain and the kind
of FlashCopy service being used.
Table 6-4 Relationship of bitmap space to FlashCopy address space for the I/O group
The table columns are: Copy service; Grain size (KB); and the volume capacity for the
specified I/O group that 1 MB of memory provides.
Mapping consideration: For multiple FlashCopy targets, you must consider the number of
mappings. For example, for a mapping with a 256 KB grain size, 8 KB of memory allows
one mapping between a 16 GB source volume and a 16 GB target volume. Alternatively,
the same amount of memory and the same grain size allows two mappings, between one
8 GB source volume and two 8 GB target volumes. You need to consider the total target
capacity when doing bitmap calculations, not the source.
When you create a FlashCopy mapping, if you specify an I/O group other than the I/O
group of the source volume, the memory accounting goes toward the specified I/O group,
not toward the I/O group of the source volume.
The default amount of memory for FlashCopy is 20 MB. This value can be increased or
decreased by using the chiogrp command or through the GUI. The maximum amount of
memory that can be specified for FlashCopy is 2048 MB (512 MB for 32-bit systems). The
maximum combined amount of memory across all copy services features is 2600 MB
(552 MB for 32-bit systems).
Bitmap allocation: When creating a FlashCopy mapping, you can optionally specify the
I/O group where the bitmap is allocated. If you specify an I/O group other than the
I/O group of the source volume, the memory accounting goes toward the specified I/O
group, not toward the I/O group of the source volume. This option can be useful when an
I/O group is exhausting the memory that is allocated to the FlashCopy bitmaps and no
more free memory is available in the I/O group.
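A minimal CLI sketch of these two adjustments follows; the object names and sizes are examples only and must be adapted to your configuration:
   chiogrp -feature flash -size 40 io_grp0
   mkfcmap -source SRC_VOL -target TGT_VOL -copyrate 50 -grainsize 256 -iogrp io_grp1
The first command increases the FlashCopy bitmap memory of io_grp0 to 40 MB. The second command creates a mapping whose bitmap is accounted against io_grp1 rather than against the I/O group of the source volume.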
The following restrictions apply when expanding or shrinking volumes that are defined in a
FlashCopy mapping:
– Target volumes cannot be shrunk.
– A source volume can be shrunk, but only to the largest starting size of a target volume
(in multiple target or cascaded mappings) when in the copying or stopping state.
– Source and target volumes must be same size when the mapping is prepared or
started.
– Source and target volumes can be expanded in any order except for incremental
FlashCopy, where the target volume must be expanded before the source volume can
be expanded.
In a cascading FlashCopy, the grain size of all the FlashCopy mappings that participate
must be the same.
In a multi-target FlashCopy, the grain size of all the FlashCopy mappings that participate
must be the same.
In a reverse FlashCopy, the grain size of all the FlashCopy mappings that participate must
be the same.
No FlashCopy mapping can be added to a CG while the FlashCopy mapping status is
Copying.
No FlashCopy mapping can be added to a CG while the CG status is Copying.
Using CGs is restricted when using cascading FlashCopy. A CG starts FlashCopy
mappings at the same point in time. Within the same CG, it is not possible to have
mappings with these conditions:
– The source volume of one mapping is the target of another mapping.
– The target volume of one mapping is the source volume for another mapping.
These combinations are not useful because, within a CG, mappings cannot be started in a
defined order, which renders the content of the target volume undefined. For example, it is not
possible to determine whether the first mapping was started before the mapping that uses the
first mapping’s target volume as its source volume.
Even if it were possible to ensure the order in which the mappings are established within a
CG, the result is equal to multiple target FlashCopy (two volumes holding the same target
data for one source volume). In other words, a cascade is useful for copying volumes in a
certain order (and copying the changed content targets of FlashCopy copies) rather than
at the same time in an undefined order (from within one single CG).
Source and target volumes can be used as primary in a remote copy relationship. For
more information about the FlashCopy and the remote copy possible interactions, see
“Interaction between remote copy and FlashCopy” on page 362.
FlashCopy presets
The IBM Storage Virtualize GUI provides three FlashCopy presets (Snapshot, Clone, and
Backup) to simplify the more common FlashCopy operations. Figure 6-10 shows the preset
selection window in the GUI.
Although these presets meet most FlashCopy requirements, they do not support all possible
FlashCopy options. If more specialized options are required that are not supported by the
presets, the options must be performed by using command-line interface (CLI) commands.
This section describes the three preset options and their use cases.
Snapshot
This preset creates a CoW or RoW point-in-time copy. For more information about RoW
prerequisites, see “Indirection layer” on page 306.
The snapshot is not intended to be an independent copy. Instead, the copy is used to
maintain a view of the production data at the time that the snapshot is created. Therefore, the
snapshot holds only the data from regions of the production volume that have changed since
the snapshot was created. Because the snapshot preset uses thin-provisioning, only the
capacity that is required for the changes is used.
A typical use case for the snapshot is when the user wants to produce a copy of a volume
without affecting the availability of the volume. The user does not anticipate many changes to
be made to the source or target volume. A significant proportion of the volumes remains
unchanged.
By ensuring that only changes require a copy of data to be made, the total amount of disk
space that is required for the copy is reduced. Therefore, many snapshot copies can be used
in the environment.
Snapshots are useful for providing protection against corruption or similar issues with the
validity of the data. However, they do not provide protection from physical controller failures.
Snapshots can also provide a vehicle for performing repeatable testing (including “what-if”
modeling that is based on production data) without requiring a full copy of the data to be
provisioned.
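From the CLI, a snapshot-like mapping can be approximated as shown in the following sketch; the volume and pool names are assumptions, the target must match the virtual size of the source (100 GB in this example), and the GUI preset might apply additional options:
   mkvdisk -name SRC_VOL_SNAP -mdiskgrp Pool0 -iogrp io_grp0 -size 100 -unit gb -rsize 2% -autoexpand
   mkfcmap -source SRC_VOL -target SRC_VOL_SNAP -copyrate 0 -name snap_map01
   startfcmap -prep snap_map01
With a copy rate of 0, no background copy runs, so only the grains that change on the source or target consume real capacity on the thin-provisioned target.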
Clone
The clone preset creates a replica of the volume, which can then be changed without
affecting the original volume. After the copy completes, the mapping that was created by the
preset is automatically deleted.
A typical use case for the clone is when users want a copy of the volume that they can
modify without affecting the original volume. After the clone is established, there is no
expectation that it is refreshed, or that there is any further need to reference the original
production data again. If the source is thin-provisioned, the automatically created target is
also thin-provisioned.
Backup
The backup preset creates a point-in-time replica of the production data. After the copy
completes, the backup view can be refreshed from the production data, with minimal copying
of data from the production volume to the backup volume.
The backup preset can be used when the user wants to create a copy of the volume that can
be used as a backup if the source becomes unavailable. This unavailability can happen
during loss of the underlying physical controller. The user plans to periodically update the
secondary copy, and does not want to suffer from the resource demands of creating a copy
each time.
Incremental FlashCopy times are faster than full copy, which helps to reduce the window
where the new backup is not yet fully effective. If the source is thin-provisioned, the
automatically created target is also thin-provisioned in this option.
Another use case, which the preset name does not suggest, is to create and maintain (periodically
refresh) an independent image. This image can be subjected to intensive I/O (for example,
data mining) without affecting the source volume’s performance.
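A rough CLI equivalent of the backup preset uses an incremental mapping, as in the following sketch; the volume and mapping names are assumptions:
   mkfcmap -source SRC_VOL -target BKP_VOL -copyrate 50 -incremental -name bkp_map01
   startfcmap -prep bkp_map01
After the first full copy completes, issuing startfcmap -prep bkp_map01 again refreshes the backup by copying only the grains that changed since the previous start.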
Thin-provisioning considerations
When creating FlashCopy with thin-provisioned target volumes, the no-copy option often is
used. The real size of a thin-provisioned volume is an attribute that defines how much
physical capacity is reserved for the volume. The real size can vary 0 - 100% of the virtual
capacity.
Sizing consideration
When thin-provisioned FlashCopy is used, an estimation of the physical capacity
consumption is required. Consider that while a FlashCopy is active, the thin-provisioned
target volume allocates physical capacity whenever a grain is modified for the first time on
source or target volume.
The following factors must be considered so that an accurate sizing can be completed:
The FlashCopy duration in terms of seconds (D).
The write operation per second (W).
The grain size in terms of KB (G).
The rewrite factor (R). This factor represents the average chance, as a percentage, that a
write operation reoccurs in a grain that was already written.
Although the first three factors are easy to assess, the rewrite factor can be only roughly
estimated because it depends on the workload type and the FlashCopy duration. The used
capacity (CC) of a thin-provisioned target volume of C size while the FlashCopy is active can
be estimated by using the following equation:
CC = min{(W - W x R) x G x D,C}
For example, consider a 100 GB volume that has FlashCopy active for 3 hours (10,800
seconds) with a grain size of 64 KB. Consider also a write workload of 100 input/output
operations per second (IOPS) with a rewrite factor of 85% (85% of writes occur on the same
grains). In this case, the estimation of the used capacity is:
CC = (100 - 85) x 64 x 10,800 = 10,368,000 KB = 9.88 GB
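The same estimate can be scripted. The following POSIX shell sketch reproduces the arithmetic with the values from the example above:
   # Estimate the used capacity (in KB) of a thin-provisioned FlashCopy target
   D=10800   # FlashCopy duration in seconds
   W=100     # write operations per second
   G=64      # grain size in KB
   R=85      # rewrite factor in percent
   echo "$(( (W - W * R / 100) * G * D )) KB"
The result, 10368000 KB, matches the 9.88 GB figure that is calculated above (subject to the min{} cap at the volume size).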
In incremental FlashCopy, the modified data is identified by using the bitmaps. The amount of
data to be copied when refreshing the mapping depends on the grain size. If the grain size is
64 KB, as compared to 256 KB, there might be less data to copy to get a fully independent
copy of the source again.
The exception is where the thin-provisioned target volume is going to become a production
volume (and likely to be subjected to ongoing heavy I/O). In this case, the 256 KB
thin-provisioned grain size is preferable because it provides better long-term I/O performance
at the expense of a slower initial copy.
Cascading FlashCopy and Multiple Target FlashCopy require that all the mappings that are
participating in the FlashCopy chain feature the same grain size. For more information, see
“FlashCopy general restrictions” on page 312.
Table 6-5 shows how the back-end I/O operations are distributed across the nodes:
– Read from the source volume: the preferred node in the source volume’s I/O group
performs the back-end I/O, whether or not the grain is already copied.
– Read from the target volume: if the grain is already copied, the preferred node in the target
volume’s I/O group performs the back-end I/O. If the grain is not yet copied, the preferred
node in the source volume’s I/O group performs it.
– Write to the source volume: if the grain is already copied, the preferred node in the source
volume’s I/O group performs the back-end I/O. If the grain is not yet copied, the preferred
node in the source volume’s I/O group performs the read and the write, and the preferred
node in the target volume’s I/O group performs the write to the target.
– Write to the target volume: if the grain is already copied, the preferred node in the target
volume’s I/O group performs the back-end I/O. If the grain is not yet copied, the preferred
node in the source volume’s I/O group performs the read, and the preferred node in the
target volume’s I/O group performs the write.
The data transfer among the source and the target volume’s preferred nodes occurs through
the node-to-node connectivity. Consider the following volume placement alternatives:
Source and target volumes use the same preferred node.
In this scenario, the node that is acting as preferred node for the source and target
volumes manages all the read/write FlashCopy operations. Only resources from this node
are used for the FlashCopy operations, and no node-to-node bandwidth is used.
Source and target volumes use different preferred nodes.
In this scenario, both nodes that are acting as preferred nodes manage read/write
FlashCopy operations according to the previously described scenarios. The data that is
transferred between the two preferred nodes goes through the node-to-node network.
Both alternatives that are described have advantages and disadvantages, but in general
option 1 (source and target volumes use the same preferred node) is preferred. Consider the
following exceptions:
A clustered IBM FlashSystem system with multiple I/O groups in HyperSwap, where the
source volumes are evenly spread across all the nodes.
In this case, the preferred node placement should follow the location on site B, and then
the target volumes preferred node must be in site B. Placing the target volumes preferred
node in site A causes the redirection of the FlashCopy write operation through the
node-to-node network.
A clustered IBM FlashSystem system with multiple control enclosures, where the source
volumes are evenly spread across all the canisters.
In this case, the preferred node placement should follow the location of source and target
volumes on the internal storage. For example, if the source volume is on the internal
storage that is attached to control enclosure A and the target volume is on internal storage
that is attached to control enclosure B, then the target volumes preferred node must be in
one canister of control enclosure B. Placing the target volumes preferred node on control
enclosure A causes the redirection of the FlashCopy write operation through the
node-to-node network.
DRP-optimized snapshots: To use the RoW capability that was introduced with
IBM Storage Virtualize 8.4, check the volume placement restrictions that are described in
“Indirection layer” on page 306.
If the copy process cannot achieve these goals, it starts contending for resources with the
foreground I/O (that is, the I/O that is coming from the hosts). As a result, both background
copy and foreground I/O tend to see an increase in latency and a reduction in throughput
compared to the situation where the bandwidth is not limited. Degradation is graceful. Both
background copy and foreground I/O continue to progress, and do not stop, hang, or cause
the node to fail.
To avoid any impact on the foreground I/O, that is, on the hosts’ response time, carefully plan
the background copy activity by accounting for the overall workload running in the systems.
The background copy basically reads/writes data to managed disks (MDisks). Usually, the
most affected component is the back-end storage. CPU and memory are not normally
significantly affected by the copy activity.
The theoretical added workload due to the background copy is easily estimable. For example,
starting 20 FlashCopy copies, each with a background copy rate of 70 (a data rate goal of
8 MBps per mapping), adds a maximum throughput of 160 MBps for the reads and 160 MBps
for the writes.
The source and target volumes distribution on the back-end storage determines where this
workload is going to be added. The duration of the background copy depends on the amount
of data to be copied. This amount is the total size of volumes for full background copy or the
amount of data that is modified for incremental copy refresh.
Performance monitoring tools like IBM Spectrum Control can be used to evaluate the existing
workload on the back-end storage in a specific time window. By adding this workload to the
foreseen background copy workload, you can estimate the overall workload running toward
the back-end storage. Disk performance simulation tools, like Disk Magic or IBM Storage
Modeller (StorM), can be used to estimate the effect, if any, of the added back-end workload
to the host service time during the background copy window. The outcomes of this analysis
can provide useful hints for the background copy rate settings.
When performance monitoring and simulation tools are not available, use a conservative and
progressive approach. Consider that the background copy setting can be modified at any
time, even when the FlashCopy already started. The background copy process can be
stopped by setting the background copy rate to 0.
Initially set the background copy rate value to add a limited workload to the back end (for
example, less than 100 MBps). If no effects on hosts are noticed, the background copy rate
value can be increased. Do this process until you see negative effects. The background copy
rate setting follows an exponential scale, so changing, for example, from 50 to 60 doubles the
data rate goal from 2 MBps to 4 MBps.
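For example, the copy rate of an existing mapping can be paused and then raised on the fly; the mapping name is an assumption:
   chfcmap -copyrate 0 fcmap0
   chfcmap -copyrate 60 fcmap0
Setting the rate to 0 pauses the background copy, and a value of 60 corresponds to a data rate goal of 4 MBps per mapping. The chfcmap command also accepts the -cleanrate parameter to adjust the cleaning rate.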
An interaction occurs between the background copy rate and the cleaning rate settings:
Background copy = 0 and cleaning rate = 0
No background copy or cleaning take place. When the mapping is stopped, it goes into the
stopping state and a cleaning process starts with the default cleaning rate, which is 50 or
2 MBps.
Background copy > 0 and cleaning rate = 0
The background copy takes place at the background copy rate, but no cleaning process
starts. When the mapping is stopped, it goes into the stopping state, and a cleaning
process starts with the default cleaning rate (50 or 2 MBps).
Background copy = 0 and cleaning rate > 0
No background copy takes place, but the cleaning process runs at the cleaning rate. When
the mapping is stopped, the cleaning completes (if not yet completed) at the cleaning rate.
Background copy > 0 and cleaning rate > 0
The background copy takes place at the background copy rate, but no cleaning process
starts. When the mapping is stopped, it goes into the stopping state, and a cleaning
process starts with the specified cleaning rate.
Regarding the workload considerations for the cleaning process, the same guidelines as for
background copy apply.
Both of these layers have various levels and methods of caching data to provide better speed.
Because IBM Storage Virtualize and FlashCopy sit below these layers, they are unaware of
the cache at the application or operating system layers.
To ensure the integrity of the copy that is made, it is necessary to flush the host operating
system and application cache for any outstanding reads or writes before the FlashCopy
operation is performed. Failing to flush the host operating system and application cache
produces what is referred to as a crash consistent copy.
The resulting copy requires the same type of recovery procedure, such as log replay and file
system checks, that is required following a host crash. FlashCopy copies that are
crash-consistent often can be used following file system and application recovery procedures.
Note: Although a best practice to perform FlashCopy is to flush the host cache first, some
companies, such as Oracle, support using snapshots without it, as described in Very Large
Database (VLDB) Backup & Recovery Best Practices.
Various operating systems and applications provide facilities to stop I/O operations and
ensure that all data is flushed from the host cache. If these facilities are available, they can be
used to prepare for a FlashCopy operation. When this type of facility is not available, the host
cache must be flushed manually by quiescing the application and unmounting the file system
or drives.
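As a hypothetical Linux example, a mounted file system can be quiesced around the FlashCopy start by using fsfreeze; the mount point and mapping name are assumptions:
   fsfreeze -f /mnt/appdata
   startfcmap -prep app_map01
   fsfreeze -u /mnt/appdata
The fsfreeze commands are issued on the host and the startfcmap command is issued on the storage system. The freeze blocks new writes and flushes dirty file system buffers, so the mapping captures a file-system-consistent image; the application itself might still need to be quiesced separately.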
Best practice: From a practical standpoint, when you have an application that is backed
by a database and you want to make a FlashCopy of that application’s data, it is sufficient
in most cases to use the write-suspend method that is available in most modern
databases. You can use this method because the database maintains strict control
over I/O.
The alternative is to flush data from both the application and the backing database, which
remains the suggested method because it is safer. However, the write-suspend method can be
used when such facilities do not exist or when your environment is time sensitive.
The integration of IBM Copy Services Manager (IBM CSM) automates the management of
Safeguarded backups and the restore and recovery of data. IBM CSM automates the creation
of Safeguarded backups according to the schedule that is defined in a Safeguarded policy.
IBM CSM supports testing, restoring, and recovering operations with Safeguarded backups.
Catastrophic: You can recover the entire environment back to the point in time of the copy
because it is the only recovery option.
Offline backup: You can perform an offline backup of data from a consistent point-in-time
copy to build a second line of defense, which provides a greater retention period and
increased isolation and security.
Immutability
Immutability is defined by how easy it is to change, corrupt, or destroy data. Protection
against all forms of corruption becomes more critical because in addition to hardware or
software failures, corruption can be caused by inadvertent user error, malicious intent, or
cyberattack.
To keep data safe, IBM Storage Virtualize end-to-end enterprise cyber resiliency features
provide many options for protecting systems and data from user errors, malicious destruction,
and ransomware attacks.
Because an unrelenting tide of data breaches is driving increased interest in providing secure
authentication across multicloud environments, IBM Storage Virtualize offers the powerful
data security function of IBM Safeguarded Copy. With this new technology, businesses can
prevent data tampering or deletion for any reason by enabling the creation of immutable
point-in-time copies of data for a production volume.
Isolation
Isolation is a term that means that the protected copies of data are isolated from the active
production data so that they cannot be corrupted by a compromised host system.
Safeguarded Backups are invisible to hackers, and are hidden and protected from being
modified or deleted by user error, malicious destruction, or ransomware attacks.
The data can be used only after a Safeguarded Backup is recovered to a separate recovery
volume. Recovery volumes can be accessed by using a recovery system that you use to
restore production data. Safeguarded Backups are a trusted and secure source of data that
can be used for forensic analysis or a surgical or catastrophic recovery.
Figure 6-11 IBM Safeguarded Copy provides logical corruption protection for sensitive point-in-time
copies of data
In this scenario, we have a Safeguarded Copy configuration with the production volumes, the
recovery volumes, and five Safeguarded Backups (SGC1 to SGC5), with SGC5 as the most
recent one, representing five recovery points. The recovery process (to point in time SGC2)
consists of the following steps:
1. IBM Storage Virtualize establishes a FlashCopy from the production volumes to the
recovery volumes, which makes the recovery volumes identical to the production volumes.
2. IBM Storage Virtualize creates a recovery bitmap that indicates all data that was changed
since SGC2 and must be referenced from the logs SGC5, SGC4, SGC3, and SGC2,
rather than from the production volumes.
3. If the recovery system reads data from a recovery volume, IBM Storage Virtualize examines
the recovery bitmap and decides whether it must fetch the requested data from production
volumes or from one of the CG logs.
Note: Safeguarded Copy is not a direct replacement for FlashCopy, and both can be used
as part of a cyber resilience solution. Unlike with FlashCopy, the recovery data is not
stored in separate regular volumes, but in a storage space that is called Safeguarded
Backup Capacity.
Recovering a backup
The following considerations apply to recovering a backup:
Recover is used when you want to test or run forensics against a backup.
When you issue the Recover Backup command, IBM CSM creates a set of new R1
recovery volumes that are used to create an image of the data that is contained in the
backup.
Customers can attach hosts to the R1 volumes through IBM CSM, or the volumes can be
attached directly through IBM Storage Virtualize.
IBM CSM creates the R1 volumes with a name that is the concatenation of the source
volume name and the time of the backup. With this approach, you can do quick filtering in
the IBM Storage Virtualize GUI.
By default, the recovery volumes are created in the Source pool for the H1 volumes, but on
the Recover Options tab customers can select an alternative pool where the R1 volumes
will be created.
When creating R1 volumes, a provisioning policy can be used on the pool to define the
default characteristics of the volume. By default, the volumes are created as thin volumes.
Restoring a backup
The following considerations apply to restoring a backup:
Restore is used when you want to restore your source production volumes (H1) to the
image of the data in the backup.
Although not required, it is a best practice that you recover a backup, and test the contents
before issuing the restore backup command.
The restore backup replaces all content on the H1 source volumes with the content in the
backup.
For added protection, you can enable Dual Control in IBM CSM to ensure that two users
approve a restore before it occurs.
After a restore is issued, the session details panel indicates the time to which the H1s
were restored.
Note: IBM CSM is required to automate the creation, maintenance, restore, recovery, and
deletion of the Safeguarded copies. IBM CSM must be licensed.
Whether building out a new solution or adding to an existing HADR solution, consider the
following points if you must plan and prepare for the use cases that are described in 6.3.1,
“Safeguarded Copy use cases” on page 321:
Identify the potential problem or exposure that you must solve, for example, protect
against inadvertent deletion, malicious destruction, selective manipulation, or ransomware
attack.
Identify the data that you need to protect, for example, mission- or business-critical only or
the entire environment or a subset.
Specify the frequency that you need to take Safeguarded backup copies, for example, take
one every 10 or 30 minutes or every 1, 3, 5, or 24 hours.
Identify how long you need to keep the Safeguarded backup copies, for example, keep
them for 1 day, 2 days, 5 days, 1 week, or 1 month.
Determine how to perform validation and forensics for logical corruption detection.
Decide on the servers that you will use for validation and forensics, and which ones you
intend to use for recovery and run production after the data is recovered.
Plan and prepare for surgical recovery and catastrophic recovery.
Determine the frequency of the offline backups, retaining period, backups location, and
required speed of recovery requirements from offline backups.
Determine the current or planned HADR solution and how it is managed, for example, by
using IBM CSM or a script.
Decide how Safeguarded backup copies should be implemented, for example, at the
production data center, the DR data center, both, or at another site.
Decide whether virtual or physical isolation is required.
Be aware of the IBM Storage Virtualize maximum object and FlashCopy limits
(256 FlashCopy mappings per source volume).
If a Safeguarded Copy Backup Policy creates a Safeguarded backup copy every hour and
keeps the copies safe for 12 days, you would hit the 256 maximum FlashCopy mappings
in 256 hours from the scheduled start time (1 backup every hour is 24 backups per day,
and for 12 days that is 24 x 12 = 288 backups, which exceeds the 256 limit).
Determine the current or planned HADR solution. The type of isolation determines the
topology, for example, 2-site, 3-site, and so on.
The IBM FlashSystem Cyber Vault solution complements IBM Safeguarded Copy.
IBM FlashSystem Cyber Vault automatically scans the copies that are created regularly by
Safeguarded Copy and looks for signs of data corruption that might be introduced by malware
or ransomware. This scan serves two purposes:
It can help identify a classic ransomware attack rapidly after it starts.
It can help identify which data copies were not affected by an attack.
Armed with this information, customers are positioned to more quickly identify that an attack
is underway and rapidly identify and recover a clean copy of their data. For more information,
see IBM Spectrum Virtualize, IBM FlashSystem, and IBM SAN Volume Controller Security
Feature Checklist, REDP-5678.
MM is designed for metropolitan distances with a zero recovery point objective (RPO) to
achieve zero data loss. This objective is achieved with a synchronous copy of volumes. Writes
are not acknowledged to host until they are committed to both storage systems. By definition,
any vendors’ synchronous replication makes the host wait for write I/Os to complete at both
the local and remote storage systems. Round-trip replication network latencies are added to
source volume response time.
GM technologies are designed to minimize the effect of network latency on the source volume by
replicating data asynchronously. IBM Storage Virtualize provides two types of asynchronous
mirroring technology:
Standard GM
GMCV
With GM, writes are acknowledged as soon as they can be committed to the local storage
system. At the same time, they are sequence-tagged and passed on to the replication
network. This technique allows GM to be used over longer distances. By definition, any
vendors’ asynchronous replication results in an RPO greater than zero. However, for GM, the
RPO is small, typically anywhere from several milliseconds to some number of seconds.
Although GM is asynchronous, it tries to achieve near-zero RPO. Hence, the network and the
remote storage system must be able to cope with peaks in traffic.
GMCV can replicate point-in-time copies of volumes. This option generally requires lower
bandwidth because it is the average rather than the peak throughput that must be
accommodated. The RPO for GMCV is higher than that of traditional GM.
Starting with IBM Storage Virtualize 8.4.2, the Nondisruptive Volume Migration capability was
introduced. This feature uses the remote copy capabilities to transparently move host
volumes between IBM Storage Virtualize based systems.
Partnerships are established between two systems by issuing the mkfcpartnership (for an
FC-based partnership) or mkippartnership (for an IP-based partnership) command once
from each end of the partnership. The following parameters must be specified:
The management IP address of the remote system (for IP-based partnerships; both IPv4
and IPv6 are supported).
The remote system name (or ID).
The link bandwidth (in Mbps).
The background copy rate, as a percentage of the link bandwidth. This parameter
determines the maximum speed of the initial synchronization and of the resynchronization
of the relationships.
In addition to the background copy rate setting, the initial synchronization speed can be
adjusted at the relationship level with the relationship_bandwidth_limit parameter. This
system-wide parameter sets the maximum bandwidth that can be used to initially synchronize
a single relationship.
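A hedged sketch of creating the partnership from one end and capping the single-relationship synchronization bandwidth follows; the system name, IP address, and values are assumptions:
   mkfcpartnership -linkbandwidthmbits 2000 -backgroundcopyrate 50 REMOTE_SYSTEM
   mkippartnership -type ipv4 -clusterip 192.0.2.10 -linkbandwidthmbits 1000 -backgroundcopyrate 50
   chsystem -relationshipbandwidthlimit 25
Only one of the first two commands is used, depending on whether the partnership is FC-based or IP-based, and it must be repeated on the remote system. The relationship bandwidth limit is expressed in MBps.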
After the initial synchronization is complete, you can change the copy direction (see
Figure 6-14) by switching the roles of the primary and secondary. The ability to change roles
is used to facilitate DR.
Figure 6-14 Changing the copy direction: the master and auxiliary volumes exchange the primary and
secondary roles, which reverses the copy direction
Attention: When the direction of the relationship is changed, the primary and secondary
roles of the volumes are altered, which changes the read/write properties of the volume.
The master volume takes on a secondary role and becomes read-only, and the auxiliary
volume takes on the primary role and facilitates read/write access.
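A minimal sketch of switching the copy direction of an existing relationship; the relationship name is an assumption:
   switchrcrelationship -primary aux RC_REL01
After the command completes, the auxiliary volume acts as the primary and services read/write I/O, and the master volume becomes read-only.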
Consistency groups
A CG is a collection of relationships that can be treated as one entity. This technique is used
to preserve write order consistency across a group of volumes that pertain to one application,
for example, a database volume and a database log file volume.
After a remote copy relationship is added into a CG, you cannot manage the relationship in
isolation from the CG. Issuing any command that can change the state of the relationship fails
if it is run on an individual relationship that is already part of a CG. For example, a
stoprcrelationship command that is issued against such a relationship fails because the system
knows that the relationship is part of a CG.
Like remote copy relationships, a remote copy CG also assigns the role of master to the source
storage system and auxiliary to the target storage system.
Consistency group consideration: A CG relationship does not have to directly match the
I/O group number at each site. A CG that is owned by I/O group 1 at the local site does not
have to be owned by I/O group 1 at the remote site. If you have more than one I/O group at
either site, you can create the relationship between any two I/O groups. This technique
spreads the workload, for example, from local I/O group 1 to remote I/O group 2.
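As an illustration, a CG can be created and an existing stand-alone relationship can be moved into it with commands such as the following ones; the names are assumptions:
   mkrcconsistgrp -cluster REMOTE_SYSTEM -name CG_DB01
   chrcrelationship -consistgrp CG_DB01 RC_REL01
   startrcconsistgrp -primary master CG_DB01
From this point on, the relationship is started, stopped, and switched only as part of CG_DB01.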
Streams
CGs can be used as a way to spread replication workload across multiple streams within a
partnership.
Any volume that is not in a CG also goes into stream 0. You might want to consider creating an
empty CG 0 so that stand-alone volumes do not share a stream with active CG volumes.
You can optimize your streams by creating more CGs. Within each stream, each batch of
writes must be processed in tag sequence order and any delays in processing any particular
write also delays the writes behind it in the stream. Having more streams (up to 16) reduces
this kind of potential congestion.
Layer concept
The layer is an attribute of IBM Storage Virtualize based systems that allows you to create
partnerships among different IBM Storage Virtualize products. The key points concerning
layers are listed here:
SVC is always in the Replication layer.
By default, IBM FlashSystem products are in the Storage layer. A user can change it to the
Replication layer.
A system can form partnerships with only systems in the same layer.
An SVC can virtualize an IBM FlashSystem system only if the IBM FlashSystem is in the
Storage layer.
An IBM FlashSystem system in the Replication layer can virtualize an IBM FlashSystem
system in the Storage layer.
Generally, changing the layer is performed only at initial setup time or as part of a major
reconfiguration. To change the layer of an IBM Storage Virtualize system, the system must
meet the following preconditions:
The IBM Storage Virtualize system must not have IBM Spectrum Virtualize, Storwize, or
IBM FlashSystem host objects that are defined, and it must not be virtualizing any other
IBM Storage Virtualize system.
The IBM Storage Virtualize system must not be visible to any other IBM Storage Virtualize
system in the SAN fabric.
The IBM Storage Virtualize system must not have any system partnerships defined. If it is
already using MM or GM, the existing partnerships and relationships must be removed
first.
Changing an IBM Storage Virtualize system from the Storage layer to the Replication layer
can be performed only by using the CLI. After you are certain that all the preconditions are
met, issue the following command to change the layer from Storage to Replication:
chsystem -layer replication
Partnership topologies
IBM Storage Virtualize allows various partnership topologies, as shown in Figure 6-16. Each
box represents an IBM Storage Virtualize based system.
The set of systems that are directly or indirectly connected form the connected set. A system
can be connected to up to three remote systems. No more than four systems can be in the
same connected set.
Star topology
A star topology can be used to share a centralized DR system (three in this example) with up
to three other systems, for example, replicating 1 → 3, 2 → 3, and 4 → 3.
Ring topology
A ring topology (three or more systems) can be used to establish a one-in, one-out
implementation. For example, the implementation can be 1 → 2, 2 → 3, 3 → 1 to spread
replication loads evenly among three systems.
Linear topology
A linear topology of two or more sites is also possible. However, it is simpler to create
partnerships between system 1 and system 2, and separately between system 3 and system
4.
Mesh topology
A fully connected mesh topology is where every system has a partnership to each of the three
other systems. This topology allows flexibility in that volumes can be replicated between any
two systems.
Topology considerations:
Although systems can have up to three partnerships, any one volume can be part of
only a single relationship. You cannot establish a multi-target remote copy relationship
for a specific volume. However, three-site replication is possible with IBM Storage
Virtualize 3-site replication. For more information, see IBM Spectrum Virtualize 3-Site
Replication, SG24-8504.
Although various topologies are supported, it is advisable to keep your partnerships as
simple as possible, which in most cases means system pairs or a star.
Considering that within a single system a remote copy does not protect data in a disaster
scenario, this capability has no practical use except for functional testing. For this reason,
intrasystem remote copy is not officially supported for production data.
If the primary volume fails completely for any reason, MM is designed to ensure that the
secondary volume holds the same data as the primary did at the time of failure.
MM provides the simplest way to maintain an identical copy on both the primary and
secondary volumes. However, as with any synchronous copy over long distance, there can be
a performance impact to host applications due to network latency.
MM supports relationships between volumes that are up to 300 kilometers (km) apart.
Latency is an important consideration for any MM network. With typical fiber optic round-trip
latencies of 1 millisecond (ms) per 100 km, you can expect a minimum of 3 ms extra latency
due to the network alone on each I/O if you are running across the 300 km separation.
For a write to be considered as committed, the data must be written in both local and remote
systems cache. De-staging to disk is a natural part of I/O management, but it is not generally
in the critical path for an MM write acknowledgment.
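A minimal MM sketch follows; the volume, system, and relationship names are assumptions:
   mkrcrelationship -master DB_VOL -aux DB_VOL_DR -cluster REMOTE_SYSTEM -name MM_DB01
   startrcrelationship MM_DB01
Because the -global parameter is not specified, the relationship is synchronous, that is, an MM relationship.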
If the primary volume fails for any reason, GM ensures that the secondary volume holds the
same data as the primary did at a point a short time before the failure. That short period of
data loss is typically 10 ms - 10 seconds, but varies according to individual circumstances.
GM is an asynchronous remote copy technique, where foreground writes at the local system
and mirrored foreground writes at the remote system are not wholly independent of one
another. The IBM Storage Virtualize implementation of GM uses algorithms to maintain a
consistent image at the target volume always.
This consistent image is achieved by identifying sets of I/Os that are active concurrently at the
source, assigning an order to those sets, and applying these sets of I/Os in the assigned
order at the target. The multiple I/Os within a single set are applied concurrently.
The process that marshals the sequential sets of I/Os operates at the remote system, and
therefore is not subject to the latency of the long-distance link.
Figure 6-18 on page 339 shows that a write operation to the master volume is acknowledged
to the host that issues the write before the write operation is mirrored to the cache for the
auxiliary volume.
With GM, the write is confirmed to the host server before the write completes at the auxiliary
volume. The GM function identifies sets of write I/Os that are
active concurrently at the primary volume. Then, it assigns an order to those sets and applies
these sets of I/Os in the assigned order at the auxiliary volume.
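A corresponding GM sketch follows; the names are assumptions:
   mkrcrelationship -master APP_VOL -aux APP_VOL_DR -cluster REMOTE_SYSTEM -global -name GM_APP01
   startrcrelationship GM_APP01
The -global parameter makes the relationship asynchronous.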
Further writes might be received from a host when the secondary write is still active for the
same block. In this case, although the primary write might complete, the new host write on the
auxiliary volume is delayed until the previous write is completed. Finally, any delay in step 2
on page 339 is reflected in the write-delay on the primary volume.
Write ordering
Many applications that use block storage must survive failures, such as a loss of power or a
software crash. They also must not lose data that existed before the failure. Because many
applications must perform many update operations in parallel to that storage block,
maintaining write ordering is key to ensuring the correct operation of applications after a
disruption.
An application that performs a high volume of database updates is often designed with the
concept of dependent writes. Dependent writes ensure that an earlier write completes before
a later write starts. Reversing the order of dependent writes can undermine the algorithms of
the application and lead to problems, such as detected or undetected data corruption.
To handle this situation, IBM Storage Virtualize uses a write ordering algorithm while sending
data to remote site by using remote copy. Each write gets tagged in the primary storage
cache for its ordering. By using this order, data is sent to the remote site and committed on
the target or remote storage.
Colliding writes
Colliding writes are defined as new write I/Os that overlap existing active write I/Os.
The original GM algorithm required only a single write to be active on any 512-byte LBA of a
volume. If another write was received from a host while the auxiliary write was still active, the
new host write was delayed until the auxiliary write was complete (although the master write
might complete). This restriction was needed if a series of writes to the auxiliary had to be
retried (which is known as reconstruction). Conceptually, the data for reconstruction comes
from the master volume.
If multiple writes might be applied to the master for a sector, only the most recent write had
the correct data during reconstruction. If reconstruction was interrupted for any reason, the
intermediate state of the auxiliary became inconsistent.
Applications that deliver such write activity do not achieve the performance that GM is
intended to support. A volume statistic is maintained about the frequency of these collisions.
The original GM implementation was modified to allow multiple writes to a single location to
be outstanding in the GM algorithm.
A need still exists for master writes to be serialized. The intermediate states of the master
data must be kept in a nonvolatile journal while the writes are outstanding to maintain the
correct write ordering during reconstruction. Reconstruction must never overwrite data on the
auxiliary with an earlier version. The colliding writes of volume statistic monitoring are now
limited to those writes that are not affected by this change.
The following numbers correspond to the numbers that are shown in Figure 6-19:
1. A first write is performed from the host to LBA X.
2. A host receives acknowledgment that the write is complete, even though the mirrored write
to the auxiliary volume is not yet complete.
The first two actions (1 and 2) occur asynchronously with the first write.
3. A second write is performed from the host to LBA X. If this write occurs before the host
receives acknowledgment (2), the write is written to the journal file.
4. A host receives acknowledgment that the second write is complete.
MM and GM both require the bandwidth to be sized to meet the peak workload. GMCV must
be sized to meet only the average workload across a cycle period.
Figure 6-20 shows a high-level conceptual view of GMCV. GMCV uses FlashCopy to maintain
image consistency and isolate host volumes from the replication process.
GMCV sends only one copy of a changed grain that might have been rewritten many times
within the cycle period.
If the primary volume fails completely for any reason, GMCV ensures that the secondary
volume holds the same data as the primary volume did at a specific point in time. That period
of data loss is typically 5 minutes - 24 hours, but varies according to the design choices that
you make.
A CV holds point-in-time copies of 256 KB grains. If any of the disk blocks in a grain change, that grain is copied to the CV to preserve its contents. CVs are also maintained at the secondary site so that a consistent copy of the volume is always available, even while the secondary volume is being updated.
Figure 6-21 on page 343 shows how a CV is used to preserve a point-in-time data set, which
is then replicated to a secondary site. The data at the secondary site is in turn preserved by a
CV until the next replication cycle completes.
GMCV FlashCopy mapping note: GMCV FlashCopy mappings are not standard
FlashCopy volumes and are not accessible for general use. They are internal structures
that are dedicated to supporting GMCV.
Specifying or taking the default none means that GM acts in its traditional mode without CVs.
Specifying multi means that GM starts cycling based on the cycle period, which defaults to 300 seconds. The valid range is 60 - 86,400 seconds (that is, from one minute to one day).
If all the changed grains cannot be copied to the secondary site within the specified time, then
the replication takes as long as it needs, and it starts the next replication when the earlier one
completes. You can choose to implement this approach by deliberately setting the cycle
period to a short period, which is a perfectly valid approach. However, the shorter the cycle
period, the less opportunity there is for peak write I/O smoothing, and the more bandwidth
you need.
The -cyclingmode setting can be changed only when the GM relationship is in a stopped
state.
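For illustration, the cycling mode and cycle period can be set only while the relationship is stopped. A minimal command sketch follows; the relationship name rc_rel0 and the 600-second cycle period are assumptions, not recommended values:

stoprcrelationship rc_rel0
# Change volumes must be assigned before the relationship is started in multi mode
# (see the change volume sketch later in this chapter)
chrcrelationship -cyclingmode multi rc_rel0
chrcrelationship -cycleperiodseconds 600 rc_rel0
startrcrelationship rc_rel0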
If a cycle completes within the specified cycle period, then the RPO is not more than twice the cycle period. However, if the cycle does not complete within the cycle period, then the RPO is not more than the sum of the last two cycle times.
The current RPO can be determined by looking at the freeze_time attribute in the lsrcrelationship command output. The freeze time is the timestamp of the last primary CV that completed copying to the secondary site. Note the following example:
1. The cycle period is the default of 5 minutes. The cycle is triggered at 6:00 AM and completes at 6:03 AM. The freeze time is 6:00 AM, and the RPO is 3 minutes.
2. The cycle starts again at 6:05 AM. The RPO now is 5 minutes. The cycle is still running at
6:12 AM, and the RPO is now up to 12 minutes because 6:00 AM is still the freeze time of
the last complete cycle.
3. At 6:13 AM, the cycle completes and the RPO now is 8 minutes because 6:05 AM is the
freeze time of the last complete cycle.
4. Because the cycle period was exceeded, the cycle immediately starts again.
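As a hedged illustration, the freeze time can be read from the detailed relationship view; the relationship name rc_rel0 and the timestamp format are assumptions:

lsrcrelationship rc_rel0
# Look for the freeze_time attribute in the output, for example:
#   freeze_time 2023/09/20/06/05/00
# The current RPO is approximately the current time minus the freeze_time value.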
The MM, GM, GMCV, and HyperSwap Copy Services functions create remote copy or remote
replication relationships between volumes or CGs. If the secondary volume in a remote copy
relationship becomes unavailable to the primary volume, the system maintains the
relationship. However, the data might become out of sync when the secondary volume
becomes available.
CVs can be used to maintain a consistent image of the secondary volume. HyperSwap
relationships and GM relationships with cycling mode set to Multiple must always be
configured with CVs. MM and GM with cycling mode set to None can optionally be configured
with CVs.
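As a minimal sketch, CVs might be assigned to an existing relationship with commands of the following form; the volume names master_cv and aux_cv and the relationship name rc_rel0 are assumptions:

# On the system that owns the master volume
chrcrelationship -masterchange master_cv rc_rel0
# On the system that owns the auxiliary volume
chrcrelationship -auxchange aux_cv rc_rel0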
When a secondary CV is configured, the relationship between the primary and secondary
volumes does not stop if the link goes down or the secondary volume is offline. The
relationship does not go in to the Consistent stopped status. Instead, the system uses the
secondary CV to automatically copy the previous consistent state of the secondary volume.
The relationship automatically moves to the Consistent copying status as the system
resynchronizes and protects the consistency of the data. The relationship status changes to
Consistent synchronized when the resynchronization process completes. The relationship
automatically resumes replication after the temporary loss of connectivity.
Terminology
The inter-system network is specified in terms of latency and bandwidth. These parameters
define the capabilities of the link regarding the traffic that it can carry. They must be chosen so that they support all forms of traffic, including mirrored foreground writes, background copy writes, and inter-system heartbeat messaging (node-to-node communication).
Link latency is the time that is taken by data to move across a network from one location to
another one. It is measured in milliseconds. The latency measures the time that is spent to
send the data and to receive the acknowledgment back (round-trip time (RTT)).
Link bandwidth is the capacity of the network to move data, measured in megabits per second (Mbps) or gigabits per second (Gbps).
Inter-system connectivity supports mirrored foreground and background I/O. A portion of the
link is also used to carry traffic that is associated with the exchange of low-level messaging
between the nodes of the local and remote systems. A dedicated amount of the link
bandwidth is required for the exchange of heartbeat messages and the initial configuration of
inter-system partnerships.
FC connectivity is the standard connectivity that is used for the remote copy inter-system
networks. It uses the FC protocol and SAN infrastructures to interconnect the systems.
The initiator starts by sending a read command (FCP_CMND) across the network to the target.
The target is responsible for retrieving the data and responding by sending the data
(FCP_DATA_OUT) to the initiator. Finally, the target completes the operation by sending the
command completed response (FCP_RSP). FCP_DATA_OUT and FCP_RSP are sent to the initiator
in sequence. Overall, one round trip is required to complete the read, so the read takes at
least one RTT plus the time for the data out.
Within the confines of a data center, where the latencies are measured in microseconds
(μsec), no issues exist. However, across a geographical network where the latencies are
measured in milliseconds (ms), the overall service time can be significantly affected.
Considering that the network delay over fiber optics is approximately 5 μsec per kilometer (km) (10 μsec RTT), the resulting minimum service time for each km of distance for a SCSI operation is 10 μsec for reads and 20 μsec for writes. For example, a SCSI write over 50 km has a minimum service time of 1,000 μsec (that is, 1 ms).
Figure 6-24 shows how a remote copy write operation is performed over an FC network.
When the remote copy is initialized, the target system (secondary system) sends a dummy
read command (FCP_CMND) to the initiator (primary system). This command waits on the
initiator until a write operation is requested.
When a write operation is started, the data is sent to the target as response of the dummy
read command (FCP_DATA_OUT). Finally, the target completes the operation by sending a new
dummy read command (FCP_CMND).
Overall, one round trip is required to complete the remote write by using this protocol, so
replicating a write takes at least one RTT plus the time for the data out.
Partnership type:             FC       1 Gbps IP    10 Gbps IP
Maximum round-trip latency:   250 ms   80 ms        10 ms
More configuration requirements and guidelines apply to systems that perform remote
mirroring over extended distances, where the RTT is greater than 80 ms. If you use remote
mirroring between systems with 80 - 250 ms round-trip latency, you must meet the following
extra requirements:
The RC buffer size setting must be 512 MB on each system in the partnership. This setting
can be accomplished by running the chsystem -rcbuffersize 512 command on each
system.
Two FC ports on each node that will be used for replication must be dedicated for
replication traffic. This configuration can be achieved by using SAN zoning and port
masking. Starting with IBM Storage Virtualize 8.5, a user can configure a remote copy
portset to achieve remote copy traffic isolation.
SAN zoning should be applied to provide separate intersystem zones for each local-remote I/O group pair that is used for replication. For more information about zoning guidelines, see “Remote system ports and zoning considerations” on page 355.
Inter-system heartbeat traffic (in Mbps) by the number of nodes in each system:
Local system    Remote system: Two nodes   Four nodes   Six nodes   Eight nodes
Two nodes       5              6            6           6
Four nodes      6              10           11          12
Six nodes       6              11           16          17
Eight nodes     6              12           17          21
These numbers represent the total traffic between the two systems when no I/O is occurring
to a mirrored volume on the remote system. Half of the data is sent by one system, and half of
the data is sent by the other system. The traffic is divided evenly over all available
connections. Therefore, if you have two redundant links, half of this traffic is sent over each
link during a fault-free operation.
If the link between the sites is configured with redundancy to tolerate single failures, size the
link so that the bandwidth and latency statements continue to be accurate even during single
failure conditions.
Consider that inter-system bandwidth should support the combined traffic of the following
items:
Mirrored foreground writes, as generated by your server applications at peak times
Background write synchronization, as defined by the GM bandwidth parameter
Inter-system communication (heartbeat messaging)
GM, which does not have write buffering resources, tends to mirror the foreground write when
it is committed in cache, so the bandwidth requirements are similar to MM.
For a proper bandwidth sizing with MM or GM, you must know your peak write workload at a granularity of at least a 5-minute interval. This information can be easily obtained from tools like IBM Spectrum Control. Finally, you must allow for background copy, intercluster communication traffic, and a safe margin for unexpected peaks and workload growth.
This example represents a general traffic pattern that might be common in many
medium-sized sites. Furthermore, 20% of bandwidth must be left available for the background
synchronization.
This calculation provides the dedicated bandwidth that is required. The user should consider
a safety margin plus growth requirement for the environment.
Peak Write I/O x Write I/O Size = Total Data Rate
Total Data Rate + 20% resync allowance + 5 Mbps heartbeat = Total Bandwidth Required
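A minimal worked sketch of this calculation follows (bash); the 4,000 IOPS peak and 16 KB write size are assumed example values, not measurements from this book:

#!/bin/bash
peak_write_iops=4000      # assumed peak write rate
write_size_kb=16          # assumed average write size

# Total data rate in Mbps (16 KB = 128 Kb per write; 1 Mb = 1,000 Kb used for simplicity)
data_rate_mbps=$(( peak_write_iops * write_size_kb * 8 / 1000 ))          # 512 Mbps

# Add the 20% resynchronization allowance and 5 Mbps of heartbeat traffic
total_mbps=$(( data_rate_mbps + (data_rate_mbps * 20 / 100) + 5 ))        # 619 Mbps

echo "Required inter-system bandwidth: ${total_mbps} Mbps"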
The GMCV network sizing is basically a tradeoff between RPO, journal capacity, and network bandwidth. A direct relationship exists between the RPO and the physical occupancy of the CVs: the lower the RPO, the less capacity is used by the CVs. However, a higher RPO usually requires less network bandwidth.
For a proper bandwidth sizing with GMCV, you must know your average write workload during
the cycle time. This information can be obtained easily from tools like IBM Spectrum Control.
Finally, you must consider the background resync workload, intercluster communication
traffic, and a safe margin for unexpected peaks and workload growth.
This example is intended to represent a general traffic pattern that might be common in many
medium-sized sites. Furthermore, 20% of bandwidth must be left available for the background
synchronization.
The central principle of sizing is that you need to know your write workload:
For MM and GM, you need to know the peak write workload.
For GMCV, you need to know the average write workload.
GMCV bandwidth: In the above examples, the bandwidth estimation for GMCV is based on the assumption that the write operations occur in such a way that a CV grain (which has a size of 256 KB) is completely changed before it is transferred to the remote site. In real life, this situation is unlikely to occur.
Usually, only a portion of a grain is changed during a GMCV cycle, but the transfer process always copies the whole grain to the remote site. This behavior can lead to an unforeseen overhead in the transfer bandwidth that, in an edge case, can be even higher than the bandwidth that is required for standard GM.
The two GM technologies, as previously described, use the available bandwidth in different
ways:
GM uses the amount of bandwidth that is needed to sustain the write workload of the
replication set.
GMCV uses the fixed amount of bandwidth as defined in the partnership as background
copy.
For this reason, during GMCV cycle-creation, a fixed part of the bandwidth is allocated for the
background copy and only the remaining part of the bandwidth is available for GM. To avoid
bandwidth contention, which can lead to a 1920 error (see 6.5.6, “1920 error” on page 375) or
delayed GMCV cycle creation, the bandwidth must be sized to consider both requirements.
Note: GM can use bandwidth that is reserved for background copy if it is not used by a
background copy workload.
Ideally, in these cases the bandwidth should be enough to accommodate the peak write
workload for the GM replication set plus the estimated bandwidth that is needed to fulfill the
RPO of GMCV. If these requirements cannot be met due to bandwidth restrictions, the option
with the least impact is to increase the GMCV cycle period, and then reduce the background
copy rate to minimize the chance of a 1920 error.
These considerations also apply to configurations where multiple IBM Storage Virtualize
based systems are sharing bandwidth resources.
Redundancy
The inter-system network must adopt the same policy toward redundancy as for the local and
remote systems to which it is connecting. The Inter-Switch Links (ISLs) must have
redundancy, and the individual ISLs must provide the necessary bandwidth in isolation.
When an FC network becomes congested, the FC switches stop accepting more frames until the congestion clears. They can also drop frames. Congestion can quickly move upstream in the fabric and prevent the end devices from communicating.
Of these options, the optical distance extension is the preferred method. IP distance
extension introduces more complexity, is less reliable, and has performance limitations.
However, optical distance extension can be impractical in many cases because of cost or
unavailability.
For more information about supported SAN routers and FC extenders, see
this IBM Documentation web page.
Hops
The hop count is not increased by the intersite connection architecture. For example, if you have a SAN extension that is based on DWDM, the DWDM components are transparent to the hop count. The hop count limit within a fabric is set by the operating system of the fabric devices (switch or director). It is used to derive a frame hold time value for each fabric device.
This hold time value is the maximum amount of time that a frame can be held in a switch
before it is dropped or the fabric busy condition is returned. For example, a frame might be
held if its destination port is unavailable. The hold time is derived from a formula that uses the
error detect timeout value and the resource allocation timeout value. Every extra hop adds
about 1.2 microseconds of latency to the transmission.
Currently, IBM Storage Virtualize Copy Services support three hops when protocol
conversion exists. Therefore, if you have DWDM extended between primary and secondary
sites, three SAN directors or switches can exist between the primary and secondary systems.
Buffer credits
SAN device ports need memory to temporarily store frames as they arrive, assemble them in
sequence, and deliver them to the upper layer protocol. The number of frames that a port can
hold is called its buffer credit. The FC architecture is based on a flow control that ensures a
constant stream of data to fill the available pipe.
When two FC ports begin a conversation, they exchange information about their buffer
capacities. An FC port sends only the number of buffer frames for which the receiving port
gives credit. This method avoids overruns and provides a way to maintain performance over
distance by filling the pipe with in-flight frames or buffers.
FC Flow Control: Each time that a port sends a frame, it increments BB_Credit_CNT and
EE_Credit_CNT by one. When it receives R_RDY from the adjacent port, it decrements
BB_Credit_CNT by one. When it receives ACK from the destination port, it decrements
EE_Credit_CNT by one.
The previous statements are true for a Class 2 service. Class 1 is a dedicated connection.
Therefore, BB_Credit is not important, and only EE_Credit is used (EE Flow Control).
However, Class 3 is an unacknowledged service, so it uses only BB_Credit (BB Flow
Control), but the mechanism is the same in all cases.
The number of buffers is an important factor in overall performance. You need enough buffers to ensure that the transmitting port can continue to send frames without stopping so that it can use the full bandwidth, which is especially important over distance. The total amount of buffer credit that is needed to optimize the throughput depends on the link speed and the average frame size.
For example, consider an 8 Gbps link connecting two switches that are 100 km apart. At
8 Gbps, a full frame (2148 bytes) occupies about 0.51 km of fiber. In a 100 km link, you can
send 198 frames before the first one reaches its destination. You need an ACK to go back to
the start to fill EE_Credit again. You can send another 198 frames before you receive the first
ACK.
You need at least 396 buffers to allow for nonstop transmission at 100 km distance. The
maximum distance that can be achieved at full performance depends on the capabilities of
the FC node that is attached at either end of the link extenders, which are vendor-specific. A
match should occur between the buffer credit capability of the nodes at either end of the
extenders.
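A small arithmetic sketch of this estimate follows (bash calling awk); the line rate and propagation speed are the approximations used above:

awk 'BEGIN {
  link_km = 100                    # link length in km
  line_rate_gbps = 8.5             # 8 Gb FC line rate with 8b/10b encoding
  frame_bits = 2148 * 10           # 2148-byte frame, 10 bits per byte on the wire
  light_km_per_us = 0.2            # propagation speed in fiber, about 200,000 km/s

  frame_km = (frame_bits / (line_rate_gbps * 1000)) * light_km_per_us
  frames_one_way = link_km / frame_km

  printf "Frame length on fiber: %.2f km\n", frame_km                        # ~0.51 km
  printf "Frames in flight one way: %.0f\n", frames_one_way                  # ~198
  printf "Minimum buffer credits (round trip): %.0f\n", 2 * frames_one_way   # ~396
}'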
The preferred practice for IBM Storage Virtualize is to provision dedicated node ports for local
node-to-node traffic (by using port masking) and isolate GM node-to-node traffic between the
local nodes from other local SAN traffic.
Remote port masking: To isolate the node-to-node traffic from the remote copy traffic, the
local and remote port masking implementation is preferable.
For zoning, the following rules for the remote system partnership apply:
For remote copy configurations where the round-trip latency between systems is less than
80 milliseconds, zone two FC ports on each node in the local system to two FC ports on
each node in the remote system.
For remote copy configurations where the round-trip latency between systems is more than 80 milliseconds, apply SAN zoning to provide separate intersystem zones for each local-remote I/O group pair that is used for replication, as shown in Figure 6-25.
N_Port ID Virtualization (NPIV): IBM Storage Virtualize systems with the NPIV feature
enabled provide virtual worldwide port names (WWPNs) for the host zoning. These
WWPNs are intended for host zoning only, and they cannot be used for the remote copy
partnership.
In the configuration that is shown in Figure 6-26 on page 357, the remote copy network is isolated in a replication SAN that interconnects Site A and Site B through a SAN extension infrastructure over two physical links. Assume that, for redundancy reasons, two ISLs per fabric are used for the replication SAN extension.
There are two possible configurations to interconnect the replication SANs. In configuration 1,
as shown in Figure 6-27, one ISL per fabric is attached to each physical link through xWDM
or FCIP routers. In this case, the physical paths Path A and Path B are used to extend both
fabrics.
In configuration 2, the physical paths are not shared between the fabrics, as shown in
Figure 6-28.
Figure 6-28 Configuration 2: Physical paths not shared among the fabrics
With configuration 1, in a failure of one of the physical paths, both fabrics are simultaneously affected, and a fabric reconfiguration occurs because of the ISL loss. This situation might lead to a temporary disruption of the remote copy communication and, in the worst case, to a partnership loss condition. To mitigate this situation, link aggregation features like Brocade ISL Trunking can be implemented.
With configuration 2, a physical path failure leads to a fabric segmentation of one of the two
fabrics, leaving the other fabric unaffected. In this case, the remote copy communication is
ensured through the unaffected fabric.
You should fully understand the implication of a physical path or xWDM or FCIP router loss in
the SAN extension infrastructure and implement the appropriate architecture to avoid a
simultaneous impact.
Remote copy (MM and GM) relationships per system: 10,000. This configuration can be any mix of MM and GM relationships.
Active-active relationships: 2,000. This is the limit for the number of HyperSwap volumes in a system.
Remote copy relationships per CG (<= 256 if GMCV relationships are configured): None. No limit is imposed beyond the remote copy relationships per system limit. Applies to GM and MM.
Total MM and GM volume capacity per I/O group: 2048 TB. This is the total capacity for all master and auxiliary volumes in the I/O group.
Inter-site links per IP partnership: 2. A maximum of two inter-site links can be used between two IP partnership sites.
Ports per node: 1. A maximum of one port per node can be used for IP partnership.
Like FlashCopy, remote copy services require memory to allocate the bitmap structures that
are used to track the updates while volumes are suspended or synchronizing. The default
amount of memory for remote copy services is 20 MB. This value can be increased or
decreased by using the chiogrp command. The maximum amount of memory that can be
specified for remote copy services is 512 MB. The grain size for the remote copy services is
256 KB.
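For example, the remote copy bitmap memory for an I/O group might be increased with a command of the following form; the 40 MB value and the I/O group name io_grp0 are assumptions:

# Display the current feature memory settings for the I/O group
lsiogrp io_grp0
# Increase the remote copy bitmap memory to 40 MB (the maximum is 512 MB)
chiogrp -feature remote -size 40 io_grp0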
Partnerships between systems for MM or GM replication can be used with both FC and native
Ethernet connectivity. Distances greater than 300 meters are supported only when using an
FCIP link or FC between source and target.
Table 6-9 shows the configuration limits for clustering and HyperSwap over FC and Ethernet.
Table 6-9 Configuration limits for clustering and HyperSwap over FC and Ethernet
Clustering over Fibre Channel: Yes (up to two I/O groups)
Clustering over 25-gigabit Ethernet (GbE): Yes (up to two I/O groups)
HyperSwap over Fibre Channel: Yes (up to two I/O groups)
HyperSwap over Ethernet (25 Gb only): Yes (up to two I/O groups)
Metro/Global Mirror replication over Fibre Channel: Yes
Metro/Global Mirror replication over Ethernet (10 Gb or 25 Gb): Yes
You can mirror intrasystem MM or GM only between volumes in the same I/O group.
With such configurations, it is a best practice to set the cleaning rate as needed. This best
practice also applies to Consistency Protection volumes and HyperSwap configurations.
However, some storage controllers can provide specific copy services capabilities that are not
available with the current version of IBM Spectrum Virtualize. IBM Storage Virtualize
addresses these situations by using cache-disabled image mode volumes that virtualize
LUNs that participate in the native back-end controller’s copy services relationships.
Keeping the cache disabled ensures data consistency throughout the I/O stack, from the host to the back-end controller. If the cache is left enabled on a volume, the underlying controller does not receive the write I/Os as the host writes them; IBM Storage Virtualize caches them and processes them later. This behavior can have further ramifications if a target host depends on the write I/Os from the source host as they are written.
Note: Native copy services are not supported on all storage controllers. For more
information about the known limitations, see this IBM Support web page.
As part of its copy services function, the storage controller might take a LUN offline or
suspend reads or writes. IBM Storage Virtualize does not recognize why this process
happens. Therefore, it might log errors when these events occur. For this reason, if
IBM Storage Virtualize must detect the LUN, ensure that the LUN remains in the unmanaged
state until full access is granted.
Native back-end controller copy services can also be used for LUNs that are not managed by
IBM Spectrum Virtualize. Accidental incorrect configurations of the back-end controller copy
services involving IBM Storage Virtualize attached LUNs can produce unpredictable results.
For example, if you accidentally use a LUN with IBM Storage Virtualize data on it as a
point-in-time target LUN, you can corrupt that data. Moreover, if that LUN was an MDisk in a managed-disk group with striped or sequential volumes on it, the MDisk group might be
brought offline. This situation makes all the volumes that belong to that group go offline,
leading to a widespread host access disruption.
Attention: Upgrading both systems concurrently is not monitored by the software upgrade
process.
Allow the software upgrade to complete on one system before you start it on the other
system. Upgrading both systems concurrently can lead to a loss of synchronization. In stress
situations, it can further lead to a loss of availability.
Usually, pre-existing remote copy relationships are unaffected by a software upgrade that is performed correctly. However, always check the target code release notes for special considerations about the copy services.
Although it is not a best practice, a remote copy partnership can be established with some
restrictions among systems with different IBM Storage Virtualize versions. For more
information, see this IBM Support web page.
Although they are defined at the system level, the partnership bandwidth and the background copy rate are evenly divided among the cluster’s I/O groups. The available bandwidth for the background copy can be used by either node or shared by both nodes within the I/O group. This bandwidth allocation is independent of the number of volumes for which a node is responsible. Each node divides its bandwidth evenly among the remote copy relationships whose volumes are associated with that node and that are performing a background copy.
The node-to-node in-flight write limit is determined by the number of nodes in the remote
system. The more nodes that exist at the remote system, the lower the limit is for the in-flight
write I/Os from a local node to a remote node. Less data can be outstanding from any one
local node to any other remote node. To optimize performance, GM volumes must have their
preferred nodes distributed evenly between the nodes of the systems.
The preferred node property of a volume helps to balance the I/O load between nodes in that
I/O group. This property is also used by remote copy to route I/O between systems.
The IBM Storage Virtualize node that receives a write for a volume is normally the preferred
node of the volume. For volumes in a remote copy relationship, that node is responsible for
sending that write to the preferred node of the target volume. The primary preferred node is
responsible for sending any writes that relate to the background copy. Again, these writes are
sent to the preferred node of the target volume.
Each node of the remote system has a fixed pool of remote copy system resources for each
node of the primary system. Each remote node has a separate queue for I/O from each of the
primary nodes. This queue is a fixed size and is the same size for every node. If preferred
nodes for the volumes of the remote system are set so that every combination of primary
node and secondary node is used, remote copy performance is maximized.
Figure 6-29 shows an example of remote copy resources that are not optimized. Volumes
from the local system are replicated to the remote system. All volumes with a preferred node
of Node 1 are replicated to the remote system, where the target volumes also have a
preferred node of Node 1.
With the configuration that is shown in Figure 6-29, the resources for remote system Node 1
that are reserved for local system Node 2 are not used. Also, the resources for local system
Node 1 that are reserved for remote system Node 2 are not used.
If the configuration that is shown in Figure 6-29 changes to the configuration that is shown in
Figure 6-30 on page 365, all remote copy resources for each node are used, and remote copy
operates with better performance.
Note: The pause period is short. I/O is paused only while the FlashCopy mapping is being prepared.
Peak I/O Response Time varies based on the number of relationships in a CG: lowering the number of relationships in a CG improves the Peak I/O Response Time. It is a best practice to have fewer volumes per CG wherever possible. Table 6-10 shows the relative Peak I/O Response Time by the number of relationships per CG.
Table 6-10 Relative Peak I/O Response Time with number of relationships per CG
Relationships per CG Peak I/O Response Time (Approximate)
1 1.0x
25 1.2x
50 2.0x
150 3.0x
256 5.0x
Therefore, the placement on the back end is critical to provide adequate performance.
Consider using DRP for the CVs only if it is beneficial in terms of space savings.
Tip: The internal FlashCopy that is used by the GMCV uses a 256 KB grain size. However,
it is possible to force a 64 KB grain size by creating a FlashCopy with a 64 KB grain size
from the GMCV volume and a dummy target volume before assigning the CV to the
relationship. You can do this procedure for both the source and target volumes. After the
CV assignment is done, the dummy FlashCopy can be deleted.
Important: An increase in the peak foreground workload can have a detrimental effect
on foreground I/O by pushing more mirrored foreground write traffic along the
inter-system network, which might not have the bandwidth to sustain it. It can also
overload the primary storage.
To set the background copy bandwidth optimally, consider all aspects of your environments,
starting with the following biggest contributing resources:
Primary storage
Inter-system network bandwidth
Auxiliary storage
Provision the most restrictive of these three resources between the background copy
bandwidth and the peak foreground I/O workload. Perform this provisioning by calculation or
by determining experimentally how much background copy can be allowed before the
foreground I/O latency becomes unacceptable.
Then, reduce the background copy to accommodate peaks in workload. In cases where the
available network bandwidth cannot sustain an acceptable background copy rate, consider
alternatives to the initial copy, as described in “Initial synchronization options and offline
synchronization” on page 367.
Changes in the environment or increasing its workload can affect the foreground I/O.
IBM Storage Virtualize provides a means to monitor and a parameter to control how
foreground I/O is affected by running remote copy processes. IBM Storage Virtualize monitors
the delivery of the mirrored foreground writes.
Finally, with GMCV, the cycling process that transfers the data from the local to the remote
system is a background copy task. For more information, see “Global Mirror and GMCV
coexistence considerations” on page 351. For this reason, the background copy rate and the
relationship_bandwidth_limit setting affect the available bandwidth during the initial
synchronization and the normal cycling process.
Consider, for example, a 4-I/O group cluster that has a partnership bandwidth of
4,000 Mbps and a background copy percentage of 50%. The expected maximum
background copy rate for this partnership is 250 MBps.
Because the available bandwidth is evenly divided among the I/O groups, every I/O group
in this cluster can theoretically synchronize data at a maximum rate of approximately
62 MBps (50% of 1,000 Mbps). Now, in an edge case where only volumes from one I/O
group are replicated, the partnership bandwidth should be adjusted to 16,000 Mbps to
reach the full background copy rate (250 MBps).
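The same example can be expressed as a short bash sketch, using the values from the text:

partnership_mbps=4000    # partnership bandwidth
iogrps=4                 # I/O groups in the cluster
bg_copy_pct=50           # background copy percentage

per_iogrp_mbps=$(( partnership_mbps / iogrps ))            # 1,000 Mbps per I/O group
bg_copy_mbps=$(( per_iogrp_mbps * bg_copy_pct / 100 ))     # 500 Mbps
bg_copy_mbytes=$(( bg_copy_mbps / 8 ))                     # ~62 MBps per I/O group

echo "Background copy per I/O group: ${bg_copy_mbytes} MBps"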
Attention: If you do not perform these steps correctly, the remote copy reports the relationship as being consistent when it is not. This situation is likely to make the auxiliary volumes useless.
By understanding the methods to start an MM and GM relationship, you can use one of them as a means to establish the remote copy relationship while saving bandwidth.
Consider a situation where you have a large source volume (or many source volumes)
containing already active data that you want to replicate to a remote site. Your planning shows
that the mirror initial-sync time takes too long (or is too costly if you pay for the traffic that you
use). In this case, you can set up the sync by using another medium that is less expensive.
This synchronization method is called offline synchronization.
This example uses tape media as the source for the initial sync for the MM relationship or the
GM relationship target before it uses remote copy services to maintain the MM or GM. This
example does not require downtime for the hosts that use the source volumes.
Before you set up GM relationships and save bandwidth, complete the following steps:
1. Ensure that the hosts are running and using their volumes normally. The MM relationship
or GM relationship is not yet defined.
Identify all volumes that become the source volumes in an MM relationship or in a GM
relationship.
2. Establish the remote copy partnership with the target IBM Storage Virtualize system.
Attention: If you do not use the -sync option, all these steps are redundant because
the IBM Storage Virtualize system performs a full initial synchronization.
2. Stop each mirror relationship by using the -access option, which enables write access to
the target volumes. You need write access later.
3. Copy the source volume to the alternative media by using the dd command to copy the contents of the volume to tape. Another option is to use your backup tool (for example, IBM Spectrum Protect) to make an image backup of the volume. A command sketch follows these steps.
Change tracking: Although the source is modified while you copy the image, the
IBM Storage Virtualize software is tracking those changes. The image that you create
might have some of the changes and is likely to miss some of the changes.
When the relationship is restarted, IBM Storage Virtualize applies all the changes that
occurred since the relationship stopped in step 2. After all the changes are applied, you
have a consistent target image.
4. Ship your media to the remote site and apply the contents to the targets of the MM or GM
relationship. You can mount the MM and GM target volumes to a UNIX server and use the
dd command to copy the contents of the tape to the target volume.
If you used your backup tool to make an image of the volume, follow the instructions for
your tool to restore the image to the target volume. Remember to remove the mount if the
host is temporary.
Tip: It does not matter how long it takes to get your media to the remote site to perform
this step. However, the faster that you can get the media to the remote site and load it,
the quicker that the IBM Storage Virtualize system starts running and maintaining the
MM and GM.
5. Unmount the target volumes from your host. When you start the MM and GM relationships
later, IBM Storage Virtualize stops write-access to the volume while the mirror relationship
is running.
6. Start your MM and GM relationships. The relationships must be started with the -clean
parameter. This way, changes that are made on the secondary volume are ignored. Only
changes that are made on the clean primary volume are considered when synchronizing
the primary and secondary volumes.
7. While the mirror relationship catches up, the target volume is not usable at all. When it reaches the ConsistentSynchronized status, your remote volume is ready for use in a disaster.
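The following is a minimal, hedged sketch of steps 3, 4, and 6; the device names, the tape device, and the relationship name rc_rel0 are assumptions, and the exact startrcrelationship options should be verified against your code level:

# Step 3 (production site): copy the source volume image to tape
dd if=/dev/mapper/source_vol of=/dev/st0 bs=1M

# Step 4 (remote site): restore the tape image onto the MM or GM target volume
dd if=/dev/st0 of=/dev/mapper/target_vol bs=1M

# Step 6 (IBM Storage Virtualize CLI): restart the relationship with the -clean
# parameter so that changes made on the secondary volume are ignored
startrcrelationship -primary master -clean rc_rel0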
The best practice is to perform an accurate back-end resource sizing for the remote system to
fulfill the following capabilities:
The peak application workload to the GM or MM volumes
The defined level of background copy
Any other I/O that is performed at the remote site
relationshipbandwidthlimit
The relationshipbandwidthlimit parameter is an optional parameter that specifies the background copy bandwidth limit for a single relationship, in the range 1 - 1000 MBps. The default is 25 MBps. This parameter operates system-wide and defines the maximum background copy bandwidth that any relationship can adopt. The existing background copy bandwidth settings that are defined on a partnership continue to operate, with the lower of the partnership and volume rates attempted.
Important: Do not set this value higher than the default without establishing that the higher
bandwidth can be sustained.
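As a hedged sketch, the limit might be raised with a command of the following form; the 50 MBps value is an illustrative assumption only:

chsystem -relationshipbandwidthlimit 50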
The gmlinktolerance parameter can be thought of as how long you allow the host delay to go
on being significant before you decide to terminate a GM volume relationship. This parameter
accepts values of 20 - 86,400 seconds in increments of 10 seconds. The default is
300 seconds. You can disable the link tolerance by entering a value of zero for this parameter.
The gmmaxhostdelay parameter can be thought of as the maximum host I/O impact that is due to GM, that is, the difference between how long a local write I/O takes with GM turned off and how long it takes with GM turned on. That difference is the host delay that is due to the GM tag and forward processing.
Although the default settings are adequate for most situations, increasing one parameter
while reducing another one might deliver a tuned performance environment for a particular
circumstance.
Example 6-1 shows how to change the gmlinktolerance and gmmaxhostdelay parameters by
using the chsystem command.
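A minimal sketch of such a change follows; the values 400 and 50 are illustrative assumptions only, not recommendations:

chsystem -gmlinktolerance 400     # seconds; a value of 0 disables the link tolerance
chsystem -gmmaxhostdelay 50       # milliseconds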
Test and monitor: To reiterate, thoroughly test and carefully monitor the host impact of
any changes before putting them into a live production environment.
For more information about settings considerations for the gmlinktolerance and
gmmaxhostdelay parameters, see 6.5.6, “1920 error” on page 375.
rcbuffersize
The rcbuffersize parameter was introduced to manage workloads with intense and bursty
write I/O that do not fill the internal buffer while GM writes are undergoing sequence tagging.
Important: Do not change the rcbuffersize parameter except under the direction of
IBM Support.
Example 6-2 shows how to change rcbuffersize to 64 MB by using the chsystem command.
The default value for rcbuffersize is 48 MB, and the maximum value is 512 MB.
Any extra buffers that you allocate are taken away from the general cache.
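A minimal sketch of the command form, using the 64 MB value from the text (change this value only under the direction of IBM Support):

chsystem -rcbuffersize 64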
The partnershipexclusionthreshold parameter was introduced so that users can set the timeout for an I/O that triggers a temporary dropping of the link to the remote cluster. The value must be in the range 30 - 315 seconds.
For more information about settings considerations for the maxreplicationdelay parameter,
see 6.5.6, “1920 error” on page 375.
Another typical remote copy use case is data movement among distant locations as required,
for example, for data center relocation and consolidation projects. In these scenarios, the IBM
Storage Virtualize remote copy technology is particularly effective when combined with the
image copy feature that allows data movement among storage systems of different
technologies or vendors.
Mirroring scenarios that involve multiple sites can be implemented by using a combination of
IBM Storage Virtualize capabilities.
DRP limitation: Currently, the image mode VDisk is not supported by DRP.
In Figure 6-32, the primary site replicates to the secondary site by using IBM Storage Virtualize remote copy functions (GM or MM). Therefore, if a disaster occurs at the primary site, the storage administrator enables access to the target volume (at the secondary site) and the business application continues processing.
While the business continues processing at the secondary site, the storage controller copy
services replicate to the third site. This configuration is allowed under the following conditions:
The back-end controller native copy services must be supported by IBM Spectrum
Virtualize. For more information, see “Native back-end controller copy functions
considerations” on page 362.
The source and target volumes that are used by the back-end controller native copy
services must be imported to the IBM Storage Virtualize system as image-mode volumes
with the cache disabled.
In the configuration that is described in Figure 6-33, a GM (MM also can be used) solution is
implemented between the Local System at Site A, which is the production site, and the
Remote System 1 at Site B, which is the primary DR site. A third system, Remote System 2,
is at Site C, which is the secondary DR site. Connectivity is provided between Site A and Site
B, between Site B and Site C, and optionally between Site A and Site C.
The first time that these operations are performed, a full copy between Remote System 1
and Remote System 2 occurs. Later runs of these operations perform incremental
resynchronization instead. After the GM between Remote System 1 and Remote System
2 is in the Consistent Synchronized state, the consistency point in Site C is created. The
GM between Remote System 1 and Remote System 2 can now be stopped to be ready for
the next consistency point creation.
A 1920 error can occur for many reasons. The condition might be the result of a temporary
interruption, such as maintenance on the inter-system connectivity, an unexpectedly higher
foreground host I/O workload, or a permanent error because of a hardware failure. It is also
possible that not all relationships are affected and that multiple 1920 errors can be posted.
To mitigate the effects of the GM to the foreground I/Os, the IBM Storage Virtualize code
implements different control mechanisms for Slow I/O and Hung I/O conditions. The Slow I/O
condition is a persistent performance degradation on write operations that are introduced by
the remote copy logic. The Hung I/O condition is a long delay (seconds) on write operations.
In terms of node and back-end characteristics, the system configuration must be provisioned so that, combined, they can support the maximum throughput that is delivered by the applications at the primary site that use GM.
If the capabilities of the system configuration are exceeded, the system becomes backlogged,
and the hosts receive higher latencies on their write I/O. Remote copy in GM implements a
protection mechanism to detect this condition and halts mirrored foreground write and
background copy I/O. Suspension of this type of I/O traffic ensures that misconfiguration or
hardware problems (or both) do not affect host application availability.
GM attempts to detect and differentiate between backlogs that occur because of the
operation of the GM protocol. It does not examine the general delays in the system when it is
heavily loaded, where a host might see high latency even if GM were disabled.
In this case, the 1920 error is identified with the specific event ID 985003, which is associated with the GM relationship that, in the last 10-second period, had the greatest accumulated time spent on delays. This event ID is generated with the text Remote Copy retry timeout.
A higher gmlinktolerance value, gmmaxhostdelay setting, or I/O load might reduce the risk of encountering this edge case.
This parameter is mainly intended to protect from secondary system issues. It does not help with ongoing performance issues, but it can be used to limit the exposure of hosts to long write response times that can cause application errors. For example, setting maxreplicationdelay to 30 means that if a write operation for a volume in a remote copy relationship does not complete within 30 seconds, the relationship is stopped, which triggers a 1920 error. This error happens even if the cause of the write delay is not related to the remote copy. For this reason, the maxreplicationdelay setting can lead to false positive 1920 error triggering.
In addition to the 1920 error, the specific event ID 985004 is generated with the text Maximum
replication delay exceeded.
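A minimal sketch of setting this parameter, using the 30-second value from the example above:

chsystem -maxreplicationdelay 30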
One of these actions is temporarily dropping (for 15 minutes) the link between systems if any
I/O takes longer than 5 minutes and 15 seconds (315 seconds). This action often removes
hang conditions that are caused by replication problems. The
partnershipexclusionthreshold parameter introduced the ability to set this value to a time
lower than 315 seconds to respond to hung I/O more swiftly. The
partnershipexclusionthreshold value must be in the range 30 - 315 seconds.
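A hedged sketch of lowering this threshold; the 120-second value is an assumption, not a recommendation:

chsystem -partnershipexclusionthreshold 120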
The diagnosis of a 1920 error is assisted by SAN performance statistics. To gather this
information, you can use IBM Spectrum Control with a statistics monitoring interval of 1 or
5 minutes. Also, turn on the internal statistics gathering function IOstats in IBM Spectrum
Virtualize. Although not as powerful as IBM Spectrum Control, IOstats can provide valuable
debug information if the snap command gathers system configuration data close to the time of
failure.
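As a hedged sketch, the internal statistics collection interval can be set from the CLI; the 5-minute interval is an assumption:

# Collect I/O statistics files every 5 minutes
startstats -interval 5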
The following main performance statistics must be investigated for the 1920 error:
Write I/O Rate and Write Data Rate
For volumes that are primary volumes in relationships, these statistics are the total amount
of write operations that are submitted per second by hosts on average over the sample
period, and the bandwidth of those writes. For secondary volumes in relationships, these
statistics are the average number of replicated writes that are received per second, and
the bandwidth that these writes consume. Summing the rate over the volumes that you
intend to replicate gives a coarse estimate of the replication link bandwidth that is
required.
Write Response Time and Peak Write Response Time
On primary volumes, these items are the average time (in milliseconds) and the peak time
between a write request being received from a host and the completion message being
returned. The Write Response Time is the best way to show what kind of write
performance that the host is seeing.
If a user complains that an application is slow, and the stats show that the Write Response
Time leaps from 1 ms to 20 ms, the two are most likely linked. However, some applications
with high queue depths and low to moderate workloads are not affected by increased
response times. This high queue depth is an effect of some other problem. The Peak Write
Response Time is less useful because it is sensitive to individual glitches in performance,
but it can show more details about the distribution of write response times.
On secondary volumes, these statistics describe the time for the write to be submitted
from the replication feature into the system cache, and should normally be of a similar
magnitude to the ones on the primary volume. Generally, the Write Response Time should
be below 1 ms for a fast-performing system.
Global Mirror Write I/O Rate
This statistic shows the number of writes per second that the (regular) replication feature
is processing for this volume. It applies to both types of GM and to MM, but only for the
secondary volume. Because writes are always separated into 32 KB or smaller tracks before replication, this statistic might be different from the Write I/O Rate on the primary volume (magnified further because the samples on the two systems are not aligned, so they capture a different set of writes).
Global Mirror Overlapping Write I/O Rate
This statistic monitors the amount of overlapping I/O that the GM feature is handling for
regular GM relationships, which is where an LBA is written again after the primary volume
is updated, but before the secondary volume is updated for an earlier write to that LBA. To
mitigate the effects of the overlapping I/Os, a journaling feature was implemented, as
described in “Colliding writes” on page 340.
Global Mirror secondary write lag
This statistic is valid for regular GM primary and secondary volumes. For primary volumes,
it tracks the length of time in milliseconds that replication writes are outstanding from the
primary system. This amount includes the time to send the data to the remote system,
consistently apply it to the secondary nonvolatile cache, and send an acknowledgment
back to the primary system.
For secondary volumes, this statistic records only the time that is taken to consistently
apply it to the system cache, which is normally up to 20 ms. Most of that time is spent
coordinating consistency across many nodes and volumes. Primary and secondary
volumes for a relationship tend to record times that differ by the RTT between systems. If
this statistic is high on the secondary system, look for congestion on the secondary
system’s fabrics, saturated auxiliary storage, or high CPU utilization on the secondary
system.
Inter-system network
For diagnostic purposes, ask the following questions about the inter-system network:
Was network maintenance being performed?
Consider the hardware or software maintenance that is associated with the inter-system
network, such as updating firmware or adding more capacity.
Is the inter-system network overloaded?
You can find indications of this situation by using statistical analysis with the help of I/O stats, IBM Spectrum Control, or both. Examine the internode communications, the storage controller performance, or both. By using IBM Spectrum Control, check the storage metrics for the period before the GM relationships were stopped, which can be tens of minutes depending on the gmlinktolerance and maxreplicationdelay parameters.
Diagnose the overloaded link by using the following methods:
– Look at the statistics that are generated by the routers or switches near your most
bandwidth-constrained link between the systems.
Exactly what is provided and how to analyze it varies depending on the equipment that
is used.
– Look at the port statistics for high response time in internode communication.
An overloaded long-distance link causes high response times in the internode
messages (the Port to Remote Node Send Response Time statistic) that are sent by
IBM Spectrum Virtualize. If delays persist, the messaging protocols exhaust their
tolerance elasticity, and the GM protocol is forced to delay handling new foreground
writes while waiting for resources to free up.
– Look at the port statistics for buffer credit starvation.
The Zero Buffer Credit Percentage and Port Send Delay I/O Percentage statistics can be useful here because they normally show high values as the link saturates. Look only at ports that are replicating to the remote system.
– Look at the volume statistics (before the 1920 error is posted):
• Target volume write throughput approaches the link bandwidth.
If the write throughput on the target volume is equal to your link bandwidth, your link
is likely overloaded. Check what is driving this situation. For example, does the
peak foreground write activity exceed the bandwidth, or does a combination of this
peak I/O and the background copy exceed the link capacity?
• Source volume write throughput approaches the link bandwidth.
This write throughput represents only the I/O that is performed by the application
hosts. If this number approaches the link bandwidth, you might need to upgrade the
link’s bandwidth. Alternatively, reduce the foreground write I/O that the application is
attempting to perform, or reduce the number of remote copy relationships.
• Target volume write throughput is greater than the source volume write throughput.
If this condition exists, the situation suggests a high level of background copy and
mirrored foreground write I/O. In these circumstances, decrease the background
copy rate parameter of the GM partnership to bring back the combined mirrored
foreground I/O and background copy I/O rates within the remote links bandwidth.
Storage controllers
Investigate the primary and remote storage controllers, starting at the remote site. If the
back-end storage at the secondary system is overloaded, or another problem is affecting the
cache there, the GM protocol fails to keep up. The problem then exhausts the gmlinktolerance elasticity and has a similar effect at the primary system.
Node
For IBM Storage Virtualize node hardware, the possible cause of the 1920 errors might be
from a heavily loaded secondary or primary system. If this condition persists, a 1920 error
might be posted.
GM must synchronize its I/O processing across all nodes in the system to ensure data
consistency. If any node is running out of memory, it can affect all relationships. So, check the
CPU cores usage statistic. If CPU usage looks higher when there is a performance problem,
then running out of CPU bandwidth might be causing the problem. Of course, CPU usage
goes up when the IOPS going through a node goes up, so if the workload increases, you
would expect to see CPU usage increase.
If there is an increase in CPU usage on the secondary system but no increase in IOPS and
volume write latency increases too, it is likely that the increase in CPU usage caused the
increased volume write latency. In that case, try to work out what might have caused the
increase in CPU usage (for example, starting many FlashCopy mappings). Consider moving
that activity to a time with less workload. If there is an increase in both CPU usage and IOPS,
and the CPU usage is close to 100%, then that node might be overloaded. A Port-to-local
node send queue time value higher than 0.2 ms often denotes overloaded CPU cores.
If a primary system is sufficiently busy, the write ordering detection in GM can delay
writes enough to exceed the gmmaxhostdelay latency and cause a 1920 error. Stopping
replication potentially lowers CPU usage, and also lowers the opportunities for each I/O to be
delayed by slow scheduling on a busy system.
If you checked CPU core utilization on all the nodes and it did not approach 100%, a high
Port to local node send response time indicates fabric congestion or a slow-draining
FC device.
A good indicator of SAN congestion is the Zero Buffer Credit Percentage and Port Send Delay
I/O Percentage on the port statistics. For more information about buffer credit, see “Buffer
credits” on page 354.
If a port has more than 10% zero buffer credits, that situation definitely causes a problem for
all I/O, not just GM writes. Values 1 - 10% are moderately high and might contribute to
performance issues.
For both primary and secondary systems, congestion on the fabric from other slow-draining
devices becomes much less of an issue when only dedicated ports are used for node-to-node
traffic within the system. However, dedicating ports is an option only on systems with more
than four ports per node. Use port masking to segment your ports.
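For example, a hedged sketch of segmenting ports with the chsystem port mask parameters follows. The binary masks are illustrative only (the rightmost bit represents FC I/O port 1) and must be adapted to the actual port count and cabling of your system:
IBM_IBM FlashSystem:ITSO:superuser>chsystem -localfcportmask 0000000000001100
IBM_IBM FlashSystem:ITSO:superuser>chsystem -partnerfcportmask 0000000000110000
In this hypothetical layout, ports 3 and 4 are dedicated to node-to-node traffic and ports 5 and 6 to replication traffic to the partner system.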
FlashCopy considerations
Check whether any FlashCopy mappings are in the prepared state. In particular, check
whether the GM target volumes are the sources of a FlashCopy mapping and whether that
mapping was in the prepared state for an extended time.
FlashCopy can add significant workload to the back-end storage, especially when the
background copy is active (see “Background copy considerations” on page 319). In cases
where the remote system is used to create golden or practice copies for DR testing, the
workload that is added by the FlashCopy background processes can overload the system.
This overload can lead to poor remote copy performance and then to a 1920 error, although
with IBM FlashSystem this issue is less likely because of the high-performing flash
back end.
Careful planning of the back-end resources is important with these kinds of scenarios.
Reducing the FlashCopy background copy rate can also help to mitigate this situation.
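As a hedged illustration, the background copy rate of an existing FlashCopy mapping can be lowered with the chfcmap command. The mapping name fcmap0 and the value 30 are placeholders only:
IBM_IBM FlashSystem:ITSO:superuser>chfcmap -copyrate 30 fcmap0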
Furthermore, the FlashCopy CoW process adds some latency by delaying the write
operations on the primary volumes until the data is written to the FlashCopy target.
This process does not directly affect the remote copy operations because it is logically placed
below the remote copy processing in the I/O stack, as shown in Figure 6-7 on page 308.
Nevertheless, in some circumstances, especially with write-intensive environments, the CoW
process tends to stress some of the internal resources of the system, such as CPU and
memory. This condition also can affect the remote copy that competes for the same
resources, eventually leading to 1920 errors.
FCIP considerations
When you get a 1920 error, always check the latency first. The FCIP routing layer can
introduce latency if it is not properly configured. If your network provider reports a much lower
latency, you might have a problem at your FCIP routing layer. Most FCIP routing devices have
built-in tools to enable you to check the RTT. When you are checking latency, remember that
TCP/IP routing devices (including FCIP routers) report RTT by using standard 64-byte ping
packets.
In Figure 6-34 on page 385, you can see why the effective transit time must be measured only
by using packets that are large enough to hold an FC frame, or 2148 bytes (2112 bytes of
payload and 36 bytes of header). Set estimated resource requirements to be a safe amount
because various switch vendors have optional features that might increase this size. After you
verify your latency by using the proper packet size, proceed with normal hardware
troubleshooting.
Look at the second largest component of your RTT, which is serialization delay. Serialization
delay is the amount of time that is required to move a packet of data of a specific size across
a network link of a certain bandwidth. The required time to move a specific amount of data
decreases as the data transmission rate increases.
Figure 6-34 on page 385 shows the orders of magnitude of difference between the link
bandwidths. It is easy to see how 1920 errors can arise when your bandwidth is insufficient.
Never use a TCP/IP ping to measure RTT for FCIP traffic.
Figure 6-34 Effect of packet size (in bytes) versus the link size
In Figure 6-34, the amount of time in microseconds that is required to transmit a packet
across network links of varying bandwidth capacity is compared. The following packet sizes
are used:
64 bytes: The size of the common ping packet
1500 bytes: The size of the standard TCP/IP packet
2148 bytes: The size of an FC frame
Finally, your path maximum transmission unit (MTU) affects the delay that is incurred to get a
packet from one location to another location. An MTU that is too small might cause
fragmentation, and an MTU that is too large might cause too many retransmits when a packet is lost.
Hung I/O
A hung I/O condition is reached when a write operation is delayed in the IBM Storage
Virtualize stack for a significant time (typically seconds). This condition is monitored by
the system, and a 1920 error is eventually posted if the delay exceeds the
maxreplicationdelay setting.
Hung I/Os can be caused by many factors, such as back-end performance, cache fullness,
internal resource starvation, and remote copy issues. When the maxreplicationdelay setting
triggers a 1920 error, the following areas must be investigated:
Inter-site network disconnections: This kind of event generates partnership instability,
which leads to delayed mirrored write operations until the condition is resolved.
Secondary system poor performance: In the case of bad performance, the secondary
system can become unresponsive, which delays the replica of the write operations.
Primary or secondary system node warmstarts: During a node warmstart, the system
freezes all the I/Os for a few seconds to get a consistent state of the cluster resources.
These events often are not directly related to the remote copy operations.
Note: The maxreplicationdelay trigger can occur even if the cause of the write delay is
not related to the remote copy. In this case, the replication suspension does not resolve the
hung I/O condition.
To exclude the remote copy as the cause of the hung I/O, the duration of the delay (peak
write response time) can be checked by using tools such as IBM Spectrum Control. If the
measured delay is greater than the maxreplicationdelay setting, it is unlikely that the
remote copy is responsible.
When the relationship is restarted, you must resynchronize it. During this period, the data on
the MM or GM auxiliary volumes on the secondary system is inconsistent, and your
applications cannot use the volumes as backup disks. To address this data consistency
exposure on the secondary system, a FlashCopy of the auxiliary volumes can be created to
maintain a consistent image until the GM (or the MM) relationships are synchronized again
and back in a consistent state.
IBM Storage Virtualize provides the Remote Copy Consistency Protection feature that
automates this process. When Consistency Protection is configured, the relationship between
the primary and secondary volumes does not go in to the Inconsistent copying status after
it is restarted. Instead, the system uses a secondary CV to automatically copy the previous
consistent state of the secondary volume.
The relationship automatically moves to the Consistent copying status as the system
resynchronizes and protects the consistency of the data. The relationship status changes to
Consistent synchronized when the resynchronization process completes.
For more information about the Consistency Protection feature, see Implementation Guide for
IBM Spectrum Virtualize Version 8.5, SG24-8520.
To ensure that the system can handle the background copy load, delay restarting the MM or
GM relationship until a quiet period occurs. If the required link capacity is unavailable, you
might experience another 1920 error, and the MM or GM relationship might stop in an
inconsistent state.
Copy services tools, such as IBM CSM, or manual scripts can be used to automate restarting
the relationships after a 1920 error. CSM implements logic to avoid recurring restart
operations in the case of a persistent problem. CSM attempts an automatic restart for each
occurrence of a 1720 or 1920 error up to a certain number of times (determined by the
gmlinktolerance value) within a 30-minute period.
If the number of allowable automatic restarts is exceeded within the period, CSM does not
automatically restart GM on the next 1720 or 1920 error. Furthermore, with CSM it is possible
to specify the amount of time, in seconds, that the tool waits after a 1720 or 1920 error
before automatically restarting GM. For more information, see this IBM Documentation
web page.
Tip: When implementing automatic restart functions, it is a best practice to preserve data
consistency on GM target volumes during the resynchronization by using features such as
FlashCopy or Consistency Protection.
For example, GM looks at average delays. However, some hosts, such as VMware ESX,
might not tolerate a single I/O becoming too old (for example, 45 seconds) before the host
decides to restart. Because it is better to terminate a GM relationship than it is to restart a
host, you might want to set gmlinktolerance to something like 30 seconds and then
compensate, so that you do not get too many relationship terminations, by setting
gmmaxhostdelay to something larger, such as 100 ms.
If you compare the two approaches, the default (gmlinktolerance 300, gmmaxhostdelay 5) is
a rule that means “If more than one third of the I/Os are slow and that happens repeatedly for
5 minutes, then terminate the busiest relationship in that stream.” In contrast, the example of
gmlinktolerance 30, gmmaxhostdelay 100 is a rule that means “If more than one third of the
I/Os are slow and that happens repeatedly for 30 seconds, then terminate the busiest
relationship in the stream.”
So the first approach picks up general slowness, and the other approach picks up shorter
bursts of extreme slowness that might disrupt your server environment. The general
recommendation is to change the gmlinktolerance and gmmaxhostdelay values progressively
and evaluate the overall impact to find an acceptable compromise between performance and
GM stability.
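Both values are system-wide settings that can be adjusted with the chsystem command. The following lines are a hedged sketch that matches the example values above; they are not a recommendation for every environment, and the lssystem attribute names that the grep filter matches are assumed:
IBM_IBM FlashSystem:ITSO:superuser>chsystem -gmmaxhostdelay 100
IBM_IBM FlashSystem:ITSO:superuser>chsystem -gmlinktolerance 30
IBM_IBM FlashSystem:ITSO:superuser>lssystem |grep gm_
The lssystem command can be used afterward to confirm the resulting gm_link_tolerance and gm_max_host_delay values.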
You can even disable the gmlinktolerance feature by setting the gmlinktolerance value to 0.
However, the gmlinktolerance parameter cannot protect applications from extended
response times if it is disabled. You might consider disabling the gmlinktolerance feature in
the following circumstances:
During SAN maintenance windows, where degraded performance is expected from SAN
components and application hosts can withstand extended response times from GM
volumes.
During periods when application hosts can tolerate extended response times and it is
expected that the gmlinktolerance feature might stop the GM relationships. For example,
you are testing the usage of an I/O generator that is configured to stress the back-end
storage. Then, the gmlinktolerance feature might detect high latency and stop the GM
relationships. Disabling the gmlinktolerance parameter prevents the GM relationships from
being stopped, at the risk of exposing the test host to extended response times.
Another tunable parameter that interacts with GM is maxreplicationdelay. The
maxreplicationdelay setting does not mitigate 1920 error occurrences because the
parameter adds another trigger for the 1920 error. However, maxreplicationdelay provides a
fine-granularity mechanism to manage the hung I/O condition, and it can be used in
combination with the gmlinktolerance and gmmaxhostdelay settings to better address particular
environment conditions.
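As a hedged sketch, the maxreplicationdelay trigger can be changed system-wide with the chsystem command. The 30-second value below is only an example, and a value of 0 disables the trigger:
IBM_IBM FlashSystem:ITSO:superuser>chsystem -maxreplicationdelay 30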
Native IP replication uses SANSlide technology, which was developed by Bridgeworks Limited
of Christchurch, UK. Bridgeworks specializes in products that bridge storage protocols and
accelerate data transfer over long distances. Adding this technology at each end of a wide
area network (WAN) TCP/IP link improves the utilization of the link.
This technology improves the link utilization by applying patented artificial intelligence (AI) to
hide latency that is normally associated with WANs. Doing so can greatly improve the
performance of mirroring services, in particular GMCV over long distances.
Bridgeworks SANSlide technology, which is integrated into IBM Spectrum Virtualize, uses AI
to help optimize network bandwidth usage and adapt to changing workload and network
conditions. This technology can improve remote mirroring network bandwidth usage up to
three times. It can enable clients to deploy a less costly network infrastructure, or speed up
remote replication cycles to enhance DR effectiveness.
With an Ethernet network data flow, the data transfer can slow down over time. This condition
occurs because of the latency that is caused by waiting for the acknowledgment of each set of
packets that are sent. The next packet set cannot be sent until the previous packet is
acknowledged, as shown in Figure 6-35.
However, by using the embedded IP replication, this behavior can be eliminated with the
enhanced parallelism of the data flow. This parallelism uses multiple virtual connections
(VCs) that share IP links and addresses.
The AI engine can dynamically adjust the number of VCs, receive window size, and packet
size to maintain optimum performance. While the engine is waiting for one VC’s ACK, it sends
more packets across other VCs. If packets are lost from any VC, data is automatically
retransmitted, as shown in Figure 6-36.
Figure 6-36 Optimized network data flow by using Bridgeworks SANSlide technology
For more information about this technology, see IBM SAN Volume Controller and Storwize
Family Native IP Replication, REDP-5103.
Note: With code versions earlier than 8.4.2, only a single partnership over IP is
supported.
A system can have simultaneous partnerships over FC and IP, but with separate systems.
The FC zones between two systems must be removed before an IP partnership is
configured.
The use of WAN-optimization devices, such as Riverbed, is not supported in IP
partnership configurations containing SVC.
IP partnerships are supported by 25-, 10-, and 1-Gbps links. However, the intermix on a
single link is not supported.
The maximum supported RTT is 80 ms for 1-Gbps links.
The maximum supported RTT is 10 ms for 25- and 10-Gbps links.
The minimum supported link bandwidth is 10 Mbps.
The inter-cluster heartbeat traffic uses 1 Mbps per link.
Migrations of remote copy relationships directly from FC-based partnerships to
IP partnerships are not supported.
IP partnerships between the two systems can be over either IPv4 or IPv6, but not both.
Virtual local area network (VLAN) tagging of the IP addresses that are configured for
remote copy is supported.
Management IP addresses and internet Small Computer Systems Interface (iSCSI) IP
addresses on the same port can be in a different network.
An added layer of security is provided by using Challenge Handshake Authentication
Protocol (CHAP) authentication.
Direct-attached systems configurations are supported with the following restrictions:
– Only two direct-attach links are allowed.
– The direct-attach links must be on the same I/O group.
– Use two portsets, where a portset contains only the two ports that are directly linked.
TCP ports 3260 and 3265 are used for IP partnership communications. Therefore, these
ports must be open in firewalls between the systems.
Network address translation (NAT) between systems that are being configured in an IP
partnership group is not supported.
Only one remote copy data session per portset can be established. It is intended that only
one connection (for sending or receiving remote copy data) is made for each independent
physical link between the systems.
Note: A physical link is the physical IP link between the two sites: A (local) and
B (remote). Multiple IP addresses on local system A can be connected (by Ethernet
switches) to this physical link. Similarly, multiple IP addresses on remote system B can
be connected (by Ethernet switches) to the same physical link. At any point, only a
single IP address on cluster A can form a remote copy data session with an IP address
on cluster B.
The maximum throughput is restricted based on the usage of 1-Gbps or 10-Gbps Ethernet
ports. The output varies based on distance (for example, round-trip latency) and quality of
the communication link (for example, packet loss). The following maximum throughputs
are achievable:
– One 1-Gbps port can transfer up to 120 MBps.
– One 10-Gbps port can transfer up to 600 MBps.
Inter-site links per IP partnership: Two links (all models). A maximum of two inter-site links can be used between two IP partnership sites.
Ports per node: One port (all models). A maximum of one port per node can be used for an IP partnership.
IP partnership software compression limit: 140 MBps (all models). This is not the total limit of IP replication, but only of compression.
VLAN tagging creates two separate connections on the same IP network for different types of
traffic. The system supports VLAN configuration on both IPv4 and IPv6 connections.
When the VLAN ID is configured for the IP addresses that are used for iSCSI host attachment
or IP replication, the suitable VLAN settings on the Ethernet network and servers must be
configured correctly to avoid connectivity issues. After the VLANs are configured, changes to
the VLAN settings disrupt iSCSI and IP replication traffic to and from the partnerships.
During the VLAN configuration for each IP address, the VLAN settings for the local and
failover ports on two nodes of an I/O group can differ. To avoid any service disruption,
switches must be configured so that the failover VLANs are configured on the local switch
ports, and the failover of IP addresses from a failing node to a surviving node succeeds.
If failover VLANs are not configured on the local switch ports, no paths are available to
IBM Storage Virtualize during a node failure, and the replication fails.
Consider the following requirements and procedures when implementing VLAN tagging:
VLAN tagging is supported for IP partnership traffic between two systems.
VLAN provides network traffic separation at layer 2 for Ethernet transport.
VLAN tagging by default is disabled for any IP address of a node port. You can use the CLI
or GUI to set the VLAN ID for port IP addresses on both systems in the IP partnership.
When a VLAN ID is configured for the port IP addresses that are used in remote copy port
groups, the VLAN settings on the Ethernet network must also be properly configured to
prevent connectivity issues.
Setting VLAN tags for a port is disruptive. Therefore, VLAN tagging requires that you stop the
partnership before you configure the VLAN tags, and then restart it when the configuration is
complete.
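The following lines are a minimal, hypothetical sketch of that sequence. The partnership name RemoteSystem and the mkip values (node, port, portset, address, prefix, and VLAN ID) are placeholders, and the exact mkip syntax should be confirmed in the CLI reference for your code level:
IBM_IBM FlashSystem:ITSO:superuser>chpartnership -stop RemoteSystem
IBM_IBM FlashSystem:ITSO:superuser>mkip -node node1 -port 5 -portset portset3 -ip 192.168.10.21 -prefix 24 -vlan 100
IBM_IBM FlashSystem:ITSO:superuser>chpartnership -start RemoteSystem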
By using the IBM Storage Virtualize management GUI, you can create the partnership by
using the DNS name of the partner system with the following procedure:
1. Select Copy Services → Partnerships and select Create Partnership.
2. Select 2-Site Partnership and click Continue.
3. On the Create Partnership page, select IP.
4. To configure the partnership, enter either the partner system IP address or domain name,
and then select the IP address or domain name of the partner system.
6.6.5 IP compression
IBM Storage Virtualize can use the IP compression capability to speed up replication cycles
or to reduce bandwidth utilization.
This feature reduces the volume of data that must be transmitted during remote copy
operations by using compression capabilities similar to those of existing IBM Real-time
Compression (RtC) implementations.
No license: The IP compression feature does not require an RtC software license.
The data compression is made within the IP replication component of the IBM Storage
Virtualize code. This feature can be used with MM, GM, and GMCV. The IP compression
feature provides two kinds of compression mechanisms: hardware compression and software
compression.
To evaluate the benefits of the IP compression, use the Comprestimator tool to estimate the
compression ratio (CR) of the data to be replicated. The IP compression can be enabled and
disabled without stopping the remote copy relationship by using the mkippartnership and
chpartnership commands with the -compress parameter. Furthermore, in systems with
replication that is enabled in both directions, the IP compression can be enabled in only one
direction. IP compression is supported for IPv4 and IPv6 partnerships.
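As a hedged illustration of enabling compression on an existing IP partnership without stopping it, the following line assumes the -compressed parameter name (the text above refers to it as the compress parameter) and a hypothetical partnership named RemoteSystem; verify the exact syntax in the CLI reference for your release:
IBM_IBM FlashSystem:ITSO:superuser>chpartnership -compressed yes RemoteSystem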
Portsets are groupings of logical addresses that are associated with the specific traffic types.
IBM Storage Virtualize supports IP (iSCSI or iSCSI Extensions for Remote Direct Memory
Access (RDMA) (iSER)) or FC portsets for host attachment, IP portsets for back-end storage
connectivity (iSCSI only), and IP replication traffic. Each physical Ethernet port can have a
maximum of 64 IP addresses with each IP address on a unique portset.
A portset object is a system-wide object that might contain IP addresses from every I/O
group. Figure 6-37 shows a sample portset definition across the canister ports in a 2-I/O
group IBM Storage Virtualize storage system cluster.
Multiple IBM Storage Virtualize canisters or nodes can be connected to the same physical
long-distance link by setting IP addresses in the same portset. Samples of supported
configurations are described in 6.6.7, “Supported configurations examples” on page 395.
In scenarios with two physical links between the local and remote clusters, two separate
replication portsets must be used to designate which IP addresses are connected to which
physical link. The relationship between the physical links and the replication portsets is not
monitored by the IBM Storage Virtualize code. Therefore, two different replication portsets
can be used with a single physical link and vice versa.
All IP addresses in a replication portset must be IPv4 or IPv6 addresses (IP types cannot be
mixed). IP addresses can be shared among replication and host type portsets, although it is
not recommended.
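The following lines sketch how a replication portset and one of its IP addresses might be defined from the CLI. The portset name, node, port, and address are hypothetical, and the -type value for a replication portset should be confirmed in the CLI reference for your code level:
IBM_IBM FlashSystem:ITSO:superuser>mkportset -name portset3 -type replication
IBM_IBM FlashSystem:ITSO:superuser>mkip -node node1 -port 5 -portset portset3 -ip 192.168.10.21 -prefix 24
The portset is then referenced when the IP partnership is created so that the system knows which addresses to use for replication traffic.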
Note: The concept of a portset was introduced in IBM Storage Virtualize 8.4.2 and the IP
Multi-tenancy feature. Versions before 8.4.2 use the remote copy port groups concept to
tag the IP addresses to associate with an IP partnership. For more information about the
remote copy port group configuration, see this IBM Documentation web page.
When upgrading to version 8.4.2, an automatic process occurs to convert the remote copy
port groups configuration to an equivalent replication portset configuration.
If the IP partnership cannot continue over an IP address, the system fails over to another IP
address within that portset. Some reasons this issue might occur include the switch to which
it is connected fails, the node goes offline, or the cable that is connected to the port is
unplugged.
For the IP partnership to continue during a failover, multiple ports must be configured within
the portset. If only one link is configured between the two systems, configure at least two IP
addresses (one per node) within the portset. You can configure these two IP addresses on
two nodes within the same I/O group or within separate I/O groups.
While failover is in progress, no connections in that portset exist between the two systems in
the IP partnership for a short time. Typically, failover completes within 30 seconds to 1 minute.
If the systems are configured with two portsets, the failover process within each portset
continues independently of each other.
The disadvantage of configuring only one link between two systems is that during a failover, a
discovery is initiated. When the discovery succeeds, the IP partnership is reestablished.
However, during the failover the relationships might stop, in which case a manual restart is
required. To configure two inter-system links, you must configure two replication type portsets.
When a node fails in this scenario, the IP partnership can continue over the other link until the
node failure is rectified. Then, failback occurs when both links are again active and available
to the IP partnership. The discovery is triggered so that the active IP partnership data path is
made available from the new IP address.
In a two-node system or when more than one I/O group exists and the node in the other I/O
group has IP addresses within the replication portset, the discovery is triggered. The
discovery makes the active IP partnership data path available from the new IP address.
Figure 6-38 Only one link on each system and canister with failover ports configured
Figure 6-38 shows two systems: System A and System B. A single portset is used with IP
addresses on two Ethernet ports, one each on Canister A1 and Canister A2 on System A.
Similarly, a single portset is configured on two Ethernet ports on Canister B1 and Canister B2
on System B.
Although two ports on each system are configured in the portset, only one Ethernet port in
each system actively participates in the IP partnership process. This selection is determined
by a path configuration algorithm that is designed to choose data paths between the two
systems to optimize performance.
The other port on the partner canister or node in the control enclosure behaves as a standby
port that is used during a canister or node failure. If Canister or Node A1 fails in System A, IP
partnership continues servicing replication I/O from Ethernet Port 2 because a failover port is
configured on Canister or Node A2 on Ethernet Port 2.
However, it might take some time for discovery and path configuration logic to reestablish
paths post-failover. This delay can cause partnerships to change to Not_Present for that time.
The details of the particular IP port that is actively participating in the IP partnership are
provided in the lspartnership output (reported as link1_ip_id and link2_ip_id).
Figure 6-39 Clustered or multinode systems with a single inter-site link with only one link
Figure 6-39 shows a 4-control enclosure system or an 8-node system (System A in Site A)
and a 2-control enclosure system or a 4-node system (System B in Site B). A single
replication portset is used on canisters or nodes A1, A2, A5, and A6 on System A at Site A.
Similarly, a single portset is used on canisters or nodes B1, B2, B3, and B4 on System B.
Although four control enclosures or four I/O groups (eight nodes) are in System A, only two
control enclosures or I/O groups are configured for IP partnerships. Port selection is
determined by a path configuration algorithm. The other ports play the role of standby ports.
If Canister or Node A1 fails in System A, IP partnership continues to use one of the ports that
is configured in the portset from any of the canisters or nodes from either of the two control
enclosures in System A.
However, it might take some time for discovery and path configuration logic to reestablish
paths post-failover. This delay might cause partnerships to change to the Not_Present state.
This process can lead to remote copy relationships stopping. The administrator must
manually start them if the relationships do not auto-recover.
Figure 6-40 Dual links with two replication portsets on each system configured
As shown in Figure 6-40, two replication portsets are configured on System A and System B
because two inter-site links are available. In this configuration, the failover ports are not
configured on partner canisters or nodes in the control enclosure or I/O group. Rather, the
ports are maintained in different portsets on both of the canisters or nodes. They can remain
active and participate in an IP partnership by using both of the links. Failover ports cannot be
used with this configuration because only one active path per canister per partnership is
allowed.
However, if either of the canisters or nodes in the control enclosure or I/O group fail (that is, if
Canister or Node A1 on System A fails), the IP partnership continues from only the available
IP that is configured in portset that is associated to link 2. Therefore, the effective bandwidth
of the two links is reduced to 50% because only the bandwidth of a single link is available until
the failure is resolved.
During a canister or node failure or link failure, the IP partnership traffic continues from the
other available link. Therefore, if two links of 10 Mbps each are available and you have
20 Mbps of effective link bandwidth, bandwidth is reduced to 10 Mbps only during a failure.
After the canister or node failure or link failure is resolved and failback happens, the entire
bandwidth of both of the links is available as before.
Figure 6-41 Clustered/multinode systems with dual inter-site links between the two systems
Figure 6-41 on page 400 shows a 4-control enclosure or an 8-node System A in Site A and a
2-control enclosure or a 4-node System B in Site B. Canisters or nodes from only two control
enclosures or two I/O groups are configured with replication portsets in System A.
In this configuration, two links and two control enclosures or two I/O groups are configured
with replication portsets. However, path selection logic is managed by an internal algorithm.
Therefore, this configuration depends on the pathing algorithm to decide which of the
canisters or nodes actively participate in IP partnership. Even if Canister or Node A5 and
Canister or Node A6 have IP addresses that are configured within replication portsets
properly, active IP partnership traffic on both of the links can be driven from Canister or Node
A1 and Canister or Node A2 only.
Figure 6-42 Multiple IP partnerships with two links and only one I/O group
In this configuration, two links and only one control enclosure or one I/O group are configured
with replication portsets in System A. Both replication portsets use the same Ethernet ports in
Canister or Node A1 and A2. System B uses a replication portset that is associated to link 1,
and System C uses a replication portset that is associated to link 2. System B and System C
have configured portsets across both control enclosures.
However, it might take some time for discovery and path configuration logic to reestablish
paths post-failover. This delay can cause partnerships to change to Not_Present for that time,
which can lead to replication stopping. The details of the specific IP port that is actively
participating in the IP partnership are provided in the lspartnership output (reported as
link1_ip_id and link2_ip_id).
In this configuration, two links and two control enclosures or two I/O groups are configured
with replication portsets in System A. System A control enclosure 0 or I/O Group 0 (Canister
or Node A1 and Canister or Node A2) use IP addresses on the replication portset that is
associated to link 1, and control enclosure 1 or I/O Group 1 (Canister or Node A3 and
Canister or Node A4) use IP addresses on the replication portset that is associated to link 2.
System B uses a replication portset that is associated to link 1, and System C uses a
replication portset that is associated to link 2. System B and System C have configured
portsets across both control enclosures or I/O groups.
However, it might take some time for the discovery and path configuration logic to reestablish
paths post-failover. This delay can cause partnerships to change to Not_Present for that time,
which can lead to replication stopping. The partnership for System A to System C remains
unaffected. The details of the specific IP port that is actively participating in the IP partnership
are provided in the lspartnership output (reported as link1_ip_id and link2_ip_id).
Replication portsets: Configuring two replication portsets provides more bandwidth and a
more resilient configuration in a link failure. Two replication portsets also can be configured
with a single physical link. This configuration makes sense only if the total link bandwidth
exceeds the aggregate bandwidth of the two replication portsets together. Using two
portsets when the link bandwidth does not provide the aggregate throughput can lead to
network resource contention and poor link performance.
Nevertheless, with poor quality networks that have significant packet loss and high latency,
the actual usable bandwidth might decrease considerably.
Figure 6-44 shows the throughput trend for a 1-Gbps port as a function of the packet loss
ratio and the latency. The figure shows how the combined effect of packet loss and latency
can lead to a
throughput reduction of more than 85%. For these reasons, the IP replication option should
be considered only for replication configurations that are not affected by poor quality and poor
performing networks. Due to its characteristic of low-bandwidth requirement, GMCV is the
preferred solution with IP replication.
To improve performance when using compression and an IP partnership in the same system,
use a different port for iSCSI host I/O and IP partnership traffic. Also, use a different VLAN ID
for iSCSI host I/O and IP partnership traffic.
The first storage pool contains the original (primary) volume copy. If one storage controller or
storage pool fails, a volume copy is not affected if it was placed on a different storage
controller or in a different storage pool.
If a volume is created with two copies in a single operation, both copies use the same
virtualization policy. However, a volume can have two copies with different virtualization
policies, for example, when a copy is added later. In combination with thin-provisioning, each
mirror of a volume can be thin-provisioned, compressed, or fully allocated, and in striped,
sequential, or image mode.
A mirrored (secondary) volume has all the capabilities of the primary volume copy. It also has
the same restrictions (for example, a mirrored volume is owned by an I/O group, as with any
other volume). This feature also provides a point-in-time copy function that is achieved by
“splitting” a copy from the volume. However, the mirrored volume does not address other
forms of mirroring that are based on remote copy (GM or MM functions), which mirrors
volumes across I/O groups or clustered systems.
One copy is the primary copy, and the other copy is the secondary copy. Initially, the first
volume copy is the primary copy. You can change the primary copy to the secondary copy
if required.
When both copies are synchronized, the write operations are again directed to both copies.
The read operations usually are directed to the primary copy unless the system is configured
in an ESC topology, which applies to an SVC system only. With this system topology and the
site awareness capability enabled, the concept of the primary copy still exists but is no longer
relevant: the read operation follows the site affinity.
For example, consider an ESC configuration with mirrored volumes with one copy in Site A
and the other in Site B. If a host I/O read is attempted to a mirrored disk through an
IBM Storage Virtualize node in Site A, then the I/O read is directed to the copy in Site A, if it is
available. Similarly, a host I/O read that is attempted through a node in Site B goes to the Site
B copy.
Important: With an SVC ESC, keep consistency between the hosts, nodes, and storage
controller site affinity as long as possible to ensure the best performance.
Access: Servers can access the volume during the synchronization processes that are
described.
You can use mirrored volumes to provide extra protection for your environment or perform a
migration. This solution offers several options:
Stretched cluster configurations (only applicable to SVC)
Standard and ESC SVC configurations use the volume mirroring feature to implement data
availability across the sites.
Export to Image mode
With this option, you can move storage from managed mode to image mode. This option is
useful if you are using IBM Storage Virtualize as a migration device. For example, suppose
vendor A’s product cannot communicate with vendor B’s product, but you need to migrate
existing data from vendor A to vendor B.
By using Export to image mode, you can migrate data by using copy services functions
and then return control to the native array while maintaining access to the hosts.
Import to Image mode
With this option, you can import an existing storage MDisk or LUN with its existing data
from an external storage system without putting metadata on it. The existing data remains
intact. After you import it, you can use the volume mirroring function to migrate the storage
to the other locations while the data remains accessible to your hosts.
Volume cloning by using volume mirroring and then by using the Split into New Volume
option
With this option, any volume can be cloned without any interruption to host access. You
must create two mirrored copies of the data and then break the mirroring with the split
option to make two independent copies of data. This option does not apply to already
mirrored volumes.
Use case: Volume mirroring can be used to migrate volumes from and to DRPs, which
do not support extent-based migrations. For more information, see 4.3.6, “Data
migration with data reduction pools” on page 200.
When you use volume mirroring, consider how quorum candidate disks are allocated. Volume
mirroring maintains some state data on the quorum disks. If a quorum disk is not accessible
and volume mirroring cannot update the state information, a mirrored volume might need to
be taken offline to maintain data integrity. To ensure the HA of the system, ensure that
multiple quorum candidate disks, which are allocated on different storage systems, are
configured.
Quorum disk consideration: Mirrored volumes can be taken offline if there is no quorum
disk that is available. This behavior occurs because the synchronization status for mirrored
volumes is recorded on the quorum disk. To protect against mirrored volumes being taken
offline, follow the guidelines that are described in the previous paragraph.
Split a volume copy from a mirrored volume and create a volume with the split copy (see
the CLI sketch after this list):
– This function is allowed only when the volume copies are synchronized. Otherwise, use
the -force parameter.
– It is not possible to recombine the two volumes after they are split.
– Adding and splitting in one workflow enables migrations that are not currently allowed.
– The split volume copy can be used as a means for creating a point-in-time copy
(clone).
Repair or validate volume copies by comparing them and performing one of the following
three functions (also shown in the sketch after this list):
– Report the first difference found. The function can iterate by starting at a specific LBA
by using the -startlba parameter.
– Create virtual medium errors where there are differences. This function is useful if
there is back-end data corruption.
– Correct the differences that are found (reads from primary copy and writes to
secondary copy).
View volumes that are affected by a back-end disk subsystem being offline:
– Assume that a standard usage is for a mirror between disk subsystems.
– Verify that mirrored volumes remain accessible if a disk system is being shut down.
– Report an error in case a quorum disk is on the back-end disk subsystem.
Expand or shrink a volume:
– This function works on both of the volume copies at once.
– All volume copies always have the same size.
– All copies must be synchronized before expanding or shrinking them.
Delete a volume. When a volume is deleted, all copies are deleted for that volume.
Migration commands apply to a specific volume copy.
Out-of-sync bitmaps share the bitmap space with FlashCopy, MM, and GM. Creating,
expanding, and changing I/O groups might fail if there is insufficient memory.
GUI views contain volume copy IDs.
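The following lines are a hedged CLI sketch of the split and the repair and validate functions that are mentioned in this list. The volume name VOLUME1 and the clone name VOLUME1_clone are placeholders:
IBM_IBM FlashSystem:ITSO:superuser>splitvdiskcopy -copy 1 -name VOLUME1_clone VOLUME1
IBM_IBM FlashSystem:ITSO:superuser>repairvdiskcopy -validate VOLUME1
IBM_IBM FlashSystem:ITSO:superuser>lsrepairvdiskcopyprogress VOLUME1
The repairvdiskcopy command also accepts the -medium and -resync parameters for the other two repair functions, and the -startlba parameter to start from a specific LBA.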
A mirrored volume looks the same to upper-layer clients as a nonmirrored volume. Upper
layers within the cluster software, such as FlashCopy and MM and GM, and storage clients,
do not know whether a volume is mirrored. They all continue to handle the volume as they did
before without being aware of whether the volume is mirrored.
Figure 6-46 shows a mirrored volume (Volume ABC) and its attributes: the volume name, the volume size, the number of copies of the volume, and the volume synchronization rate. The volume is accessed by functions such as FlashCopy and Metro Mirror / Global Mirror.
In Figure 6-46, IBM XIV and IBM DS8700 show that a mirrored volume can use different
storage devices.
This process runs at the default synchronization rate of 50 (as shown in Table 6-12), or at the
defined rate while creating or modifying the volume. For more information about the effect of
the copy rate setting, see 6.7.5, “Volume mirroring performance considerations” on page 412.
When the synchronization process completes, the volume mirroring copies are in the in-sync
state.
Table 6-12 Relationship between the rate value and the data copied per second
User-specified rate attribute value per volume Data copied per second
0 Synchronization is disabled.
1 - 10 128 KB
11 - 20 256 KB
21 - 30 512 KB
31 - 40 1 MB
41 - 50 2 MB
51 - 60 4 MB
61 - 70 8 MB
71 - 80 16 MB
81 - 90 32 MB
91 - 100 64 MB
By default, when a mirrored volume is created, a format process also is initiated. This process
ensures that the volume data is zeroed to prevent access to data that is still present on the
reused extents.
This format process runs in the background at the defined synchronization rate, as shown in
Table 6-12 on page 411. Before IBM Storage Virtualize 8.4, the format process overwrote
only Copy 0 with zeros and then synchronized Copy 1. With version 8.4 or later, the format
process is initiated concurrently on both volume mirroring copies, which eliminates the second
synchronization step.
You can specify that a volume is synchronized (the -createsync parameter), even if it is not.
Using this parameter can cause data corruption if the primary copy fails and leaves an
unsynchronized secondary copy to provide data. It can also cause loss of read stability in
unwritten areas if the primary copy fails, data is read from the primary copy, and
then different data is read from the secondary copy. To avoid data loss or read stability loss,
use this parameter only for a primary copy that was formatted and not written to. When using
the -createsync setting, the initial formatting is skipped.
Another example use case for -createsync is a newly created mirrored volume where both
copies are thin-provisioned or compressed because no data is written to disk and unwritten
areas return zeros (0). If the synchronization between the volume copies is lost, the
resynchronization process is incremental, which means that only the grains that were written
to are copied to bring the volume copies back in sync.
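The following line is a hedged sketch of creating a mirrored volume with two thin-provisioned copies that are marked as already synchronized. The pool names, the size, and the volume name are placeholders:
IBM_IBM FlashSystem:ITSO:superuser>mkvdisk -iogrp 0 -mdiskgrp Pool0:Pool1 -size 100 -unit gb -copies 2 -rsize 2% -autoexpand -createsync -name VOLUME1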
The progress of volume mirror synchronization can be obtained from the GUI or by using the
lsvdisksyncprogress command.
Tip: Place volume copies of one volume on storage pools with the same or similar
characteristics. Because data is always read from only one volume copy (normally the
primary), reads are no faster than without volume mirroring. If only good read performance
is required, you can place the primary copy of a volume in a storage pool with better
performance.
However, this situation is only true when both copies are synchronized. If the primary copy
is out of sync, reads are submitted to the other copy.
Synchronization between volume copies has a similar impact on the cluster and the back-end
disk subsystems as FlashCopy or data migration. The synchronization rate is a property of a
volume that is expressed as a value of 0 - 150. A value of 0 disables synchronization.
Table 6-12 on page 411 shows the relationship between the rate value and the data that is
copied per second.
Rate attribute value: The rate attribute is configured on each volume that you want to
mirror. The default value for a new volume mirror is 50.
In large IBM Storage Virtualize configurations, the settings of the copy rate can considerably
affect the performance in scenarios where a back-end storage failure occurs. For example,
consider a scenario where a failure of a back-end storage controller is affecting one copy of
300 mirrored volumes. The host continues the operations by using the remaining copy.
When the failed controller comes back online, the resynchronization process for all 300
mirrored volumes starts concurrently. With a copy rate of 100 for each volume, this process
can add a theoretical workload of 18.75 GBps, which overloads the system.
Then, the general suggestion for the copy rate settings is to evaluate the impact of massive
resynchronization and set the parameters. Consider setting the copy rate to high values for
initial synchronization only, and with a few volumes at a time. Alternatively, consider defining a
volume provisioning process that allows the safe creation of already synchronized mirrored
volumes, as described in 6.7.4, “Volume mirroring synchronization options” on page 411.
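For example, the synchronization rate of an individual mirrored volume can be changed with the chvdisk command. The volume name and the value in this sketch are placeholders:
IBM_IBM FlashSystem:ITSO:superuser>chvdisk -syncrate 80 VOLUME1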
A synchronized mirrored volume copy is taken offline and goes out of sync if the following
conditions occur. The volume remains online and continues to service I/O requests from the
remaining copy:
A write I/O to the copy failed or a long timeout expired.
The system completed all available controller-level error recovery procedures (ERPs).
The fast failover feature isolates hosts from temporarily and poorly performing back-end
storage of one copy at the expense of a short interruption to redundancy. The fast failover
feature behavior is that during normal processing of host write I/O, the system submits writes
to both copies with a timeout of 10 seconds (20 seconds for stretched volumes). If one write
succeeds and the other write takes longer than 5 seconds, then the slow write is stopped. The
FC abort sequence can take around 25 seconds.
When the stop completes, one copy is marked as out of sync, and the host write I/O
completed. The overall fast failover ERP aims to complete the host I/O in approximately
30 seconds (or 40 seconds for stretched volumes).
The fast failover can be set for each mirrored volume by using the chvdisk command and the
mirror_write_priority attribute settings:
Latency (default value): A short timeout prioritizing low host latency. This option enables
the fast failover feature.
Redundancy: A long timeout prioritizing redundancy. This option indicates that a copy that
is slow to respond to a write I/O can use the full ERP time. The response to the I/O is
delayed until it completes to keep the copy in sync if possible. This option disables the fast
failover feature.
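A hedged example of setting the behavior on an existing mirrored volume with the chvdisk command follows; the volume name is a placeholder:
IBM_IBM FlashSystem:ITSO:superuser>chvdisk -mirrorwritepriority latency VOLUME1
Specify redundancy instead of latency to disable the fast failover feature for that volume.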
Volume mirroring ceases to use the slow copy for 4 - 6 minutes, and subsequent I/O data is not
affected by a slow copy. Synchronization is suspended during this period. After the copy
suspension completes, volume mirroring resumes, which allows I/O data and synchronization
operations to the slow copy, which often quickly completes the synchronization.
If another I/O times out during the synchronization, then the system stops using that copy
again for 4 - 6 minutes. If one copy is always slow, then the system tries it every 4 - 6 minutes
and the copy gets progressively more out of sync as more grains are written. If fast failovers
are occurring regularly, there is probably an underlying performance problem with the copy’s
back-end storage.
Shared bitmap space: This bitmap space on one I/O group is shared between MM, GM,
FlashCopy, and volume mirroring.
The command to create mirrored volumes can fail if there is not enough space to allocate
bitmaps in the target I/O Group. To verify and change the space that is allocated and available
on each I/O group by using the CLI, see Example 6-4.
2 io_grp2 0 0 0
3 io_grp3 0 0 0
4 recovery_io_grp 0 0 0
IBM_IBM FlashSystem:ITSO:superuser>lsiogrp io_grp0|grep _memory
flash_copy_total_memory 20.0MB
flash_copy_free_memory 20.0MB
remote_copy_total_memory 20.0MB
remote_copy_free_memory 20.0MB
mirroring_total_memory 20.0MB
mirroring_free_memory 20.0MB
raid_total_memory 40.0MB
raid_free_memory 40.0MB
flash_copy_maximum_memory 2048.0MB
compression_total_memory 0.0MB
.
IBM_IBM FlashSystem:ITSO:superuser>chiogrp -feature mirror -size 64 io_grp0
IBM_IBM FlashSystem:ITSO:superuser>lsiogrp io_grp0|grep _memory
flash_copy_total_memory 20.0MB
flash_copy_free_memory 20.0MB
remote_copy_total_memory 20.0MB
remote_copy_free_memory 20.0MB
mirroring_total_memory 64.0MB
mirroring_free_memory 64.0MB
raid_total_memory 40.0MB
raid_free_memory 40.0MB
flash_copy_maximum_memory 2048.0MB
compression_total_memory 0.0MB
To verify and change the space that is allocated and available on each I/O group by using the
GUI, see Figure 6-47.
Note: The new method uses volume groups, which are similar to CGs. However, in
addition to ensuring that a group of volumes is preserved at the same point in time, the
volume group also enables the simplification of restoration or recovery to that point in time.
It achieves this goal through the association of a group of volumes with a snapshot policy
that determines frequency and retention duration.
Note: Adding a volume to a group automatically configures replication for it. It also permits
configuration changes to be performed while the partnership is disconnected. The system
automatically reconfigures the DR system once the partnership is reconnected. The same
volume group might be used for multiple features (for example, with or without replication,
FlashCopy snapshots, and Safeguarded Copy), which allows for a single definition of the
volumes that are required by an application.
A new requirement for partnerships that are used for policy-based replication is the
installation of Transport Layer Security (TLS) certificates between partnered systems. The
certificates are required to provide secure communication between systems for managing
replication between the systems. Management traffic is authenticated by a certificate and is
encrypted by using SSL. However, the system takes care of exchanging the certificates, and
no manual certificate exchange is required.
It is recommended that you create partnerships by using the GUI because this automates the
certificate exchange and setup. Both self-signed and signed certificates are supported. During
this process, both systems exchange certificates to give each system configuration access
by using the REST API.
The partnership bandwidth defines the amount of bandwidth that is dedicated for replication
between each of the I/O groups in megabits per second.
The background copy rate defines the amount of the bandwidth, expressed as a percentage,
that can be used for synchronization. For example, a link bandwidth of 1000 Mbps and a
background copy rate of 50 allows for 500 Mbps of synchronization traffic (62.5 MBps). This
control is a separate parameter because when Remote Copy is used, this control must be
managed by the user to balance foreground host writes and background synchronization
activity. Policy-based replication does not distinguish between the two cases and
automatically manages the bandwidth for each type.
Best practices:
If you are using only policy-based replication, then the background copy rate should be
set to 100% to allow all available bandwidth to be used.
If you are using both Remote Copy and Policy-based replication on the same
partnership, then the background copy rate can be used to balance the bandwidth
between policy-based replication and Remote Copy.
All traffic for policy-based replication is treated as background copy, so the rate can be
managed to guarantee bandwidth to Remote Copy, which is more sensitive to changes in the
available bandwidth.
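As a hedged illustration, both values can be changed on an existing partnership with the chpartnership command; the partnership name and the values are placeholders:
IBM_IBM FlashSystem:ITSO:superuser>chpartnership -linkbandwidthmbits 1000 -backgroundcopyrate 100 RemoteSystem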
Remote Copy has additional system settings for the following items:
To control the synchronization rate of an individual relationship.
This setting is not used for policy-based replication; the system automatically manages the
synchronization bandwidth.
To control the amount of memory used for replication.
License requirement: The feature uses the existing remote license and the existing rules
apply. PBR does not require or consume a FlashCopy license.
the volume placement. Volumes that are part of the replication can reside in either a parent
pool or a child pool.
A set of default provisioning policies is created when the first parent pool is created. These
are:
capacity_optimized (thin-provisioned and compressed)
performance_optimized (creates fully allocated volumes)
These policies can be renamed or deleted, or you can create similar custom policies. If
IBM FlashCore Modules are used in the storage pool, then fully allocated volumes still benefit
from the compression that is performed by the FlashCore Module.
A provisioning policy can be associated with multiple pools. A pool can be associated with, at
most, one provisioning policy. Provisioning policies are supported on parent pools and child
pools. A provisioning policy exists within a single system (unlike replication policies, which are
replicated between systems). Intentionally, few configurable options exist in the provisioning
policy. The defaults for the other attributes are chosen to match the most common usage and
best practices.
The association of a provisioning policy with a pool automatically causes the policy to be used
for new volumes that are created in the pool. This means that the provisioning policy is used
simply by specifying a pool that has an associated policy when creating a volume by using the
GUI or the mkvolume command. Provisioning policies can also be used for non-replicated
volumes to simplify volume creation.
Different pools can use different policies and they do not need to be symmetric.
Note: Provisioning policies are not used when change volumes are created for
policy-based replication as these are always created by the system by using best
practices.
Pool link
Storage pool linking is a mechanism that defines the target pool on the recovery system for
replication. Unlike a partnership, which defines the remote system to be used during disaster
recovery, storage pool linking selects a specific pool on the recovery system for the volume
copies, which are also referred to as secondary volumes. A pool can be linked to only one
pool per partnership. A pool link must be configured for a partnership before replicated
volumes can be created in that pool. The pools can be parent pools or child pools.
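As a hedged sketch, pool linking is performed by assigning the same link value to a pool on each system in the partnership. The pool names and the <link_uid> value are placeholders, and the parameter name should be verified in the chmdiskgrp command reference:
chmdiskgrp -replicationpoollinkuid <link_uid> Pool0 (run on the production system)
chmdiskgrp -replicationpoollinkuid <link_uid> PoolDR (run on the recovery system)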
For a standard topology system, if a volume is created locally as a mirrored volume between
different pools, the system uses the pool link of the primary volume copy to identify the
remote pool to use.
If the system topology is stretched, then the pool in site 1 is used to identify the linked pool;
pools in site 2 are not required to be linked to replicate stretched volumes. The system does
not provide a way to automatically create mirrored volumes on a remote system, but mirror
copies can be added by the user.
Figure 6-48 shows the various options for pool linking for replication.
Best practices: Not all combinations make sense for pool linking. The best practice is to
maintain symmetry when assigning provisioning policies across both locations while
linking the pools.
Scenario 1
All volumes are to be fully allocated (thick), with no capacity savings.
This is the simplest configuration: all volumes are created as fully allocated, without any
capacity savings.
Scenario 2
All volumes are to be created with a particular capacity saving, for example thin-provisioned,
compressed, or deduplicated.
Create and assign a provisioning policy on the local and remote pools. The policy on each
system specifies the same capacity savings so that the production and recovery volumes are
created uniformly.
Scenario 3
Some volumes are to be created thick and others thin (or with another capacity saving, such
as compression or deduplication).
To achieve this, we will use child pools with different provisioning policies for each pool. Using
a data reduction pool as the parent pool will allow the child pools to be quotaless, simplifying
capacity management and ensuring efficient storage utilization.
Scenario 4
A fixed amount of storage is to be allocated to different groups of users or applications by
using child pools.
By setting a standard pool as the parent pool, we can enforce quotas on the child pools,
ensuring that each group gets its allocated storage. Optionally, if capacity savings are
desired, provisioning policies can be assigned to the child pools. This approach allows for
efficient storage allocation and management across the various user and application groups.
Replication policies
Replication policies are a key concept for policy-based replication as they define how
replication should be configured between partnered systems. A replication policy specifies
how replication should be applied to a volume group and therefore to all volumes within that
group. A replication policy can be associated with any number of volume groups. A volume
group can have at most one replication policy associated with it.
A replication policy cannot be renamed or modified after it is in use and associated with a
volume group. This is important because it guarantees that both systems always have the
same definition of how replication should be configured. If changes are required to a
replication policy, a new policy can be created with the desired changes, and the associated
volume groups can be reassociated with the new policy.
Note: Each system supports up to 32 replication policies, which allows for various locations,
topologies, and RPOs to be defined.
The creation of a replication policy results in the system creating the policy on all systems
that are defined in the policy. A replication policy can be created only when the systems are
connected. A replication policy can be deleted only if no volume groups are associated with
it; deletion is possible even while the systems are disconnected.
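As a hedged sketch, a two-site asynchronous replication policy might be created with a command similar to the following one. The policy and system names are placeholders, and the parameter names, topology value, and RPO-alert units are assumptions that should be confirmed in the mkreplicationpolicy command reference:
mkreplicationpolicy -name async_5min -topology 2-site-async-dr -location1system ProdSystem -location1iogrp 0 -location2system DRSystem -location2iogrp 0 -rpoalert 300
In this sketch, the intent is a policy named async_5min that raises an alert if the recovery point exceeds 300 seconds.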
Replication policies do not define the direction in which the replication is performed. Instead,
the direction is determined when a replication policy is associated with a volume group, or a
volume group is created by specifying a replication policy. The system where this action was
performed is configured as the production copy of the volume group.
– If the volume name is not available on the recovery system, it is appended with _<number>
until a unique value is found.
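For example, running the following commands on the system that should host the production copy creates a volume group and associates a replication policy with it (the names are placeholders that continue the earlier sketch):
mkvolumegroup -name app_vg
chvolumegroup -replicationpolicy async_5min app_vg
Because the association is performed on this system, it becomes the production copy, and the partner system that is defined in the policy hosts the recovery copy.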
Volume-related considerations:
– Recovery copies of a volume are never created as mirrored volumes.
– A replicated volume group cannot use Transparent Cloud Tiering.
– Remote Copy and policy-based replication can be configured on the same volume only
if the volume is operating as the production volume for both types of replication, for the
purpose of migrating from Remote Copy to policy-based replication.
– Restore-in-place using FlashCopy or using replicated volumes as the target of a legacy
FlashCopy map is not supported.
– Replicated volumes cannot be expanded or shrunk.
• The workaround is to remove the volume from the volume group, expand the volume, and
then add it back to the volume group (see the sketch after this list).
– Replicated volumes must have cache enabled.
– VVol replication and Transparent Cloud Tiering (TCT) volumes are not supported.
– Image-mode volumes are not supported.
General:
– 3-site capabilities are not supported by PBR.
– Ownership groups are not supported by PBR.
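The following minimal sketch shows the expand workaround that is mentioned in the list above. The volume and volume group names are placeholders, and the volume is not protected by replication while it is outside the volume group:
chvdisk -novolumegroup vol01
expandvdisksize -size 50 -unit gb vol01
chvdisk -volumegroup app_vg vol01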
To change the direction of replication, the volume group must first be made independent.
After the volume group is independent, the direction can be selected by choosing which
location should be the production copy of the volume group and then restarting replication
from that system. If you wish to perform a planned failover, it is important to suspend
application I/O and ensure that the recovery point is more recent than when the application
stopped performing I/O. Failure to do this can result in data missing after the change of
direction.
Note: Creating volumes in a replicated volume group skips the need for a full
synchronization.
An important concept is that the configuration and the data on the volumes are coupled and
the recovery point is formed from both the configuration and volume data. Adding, creating,
removing, or deleting volumes from a volume group occurs on the recovery system at the
equivalent point-in-time as they did on the production system. This means that if independent
access is enabled on a volume group during a disaster, then the data and volumes that are
presented are as they were from a previous point in time on the production system. This
might result in partially synchronized volumes being removed from the recovery system when
independent access is enabled, if those volumes are not in the recovery point.
When a volume is deleted from the production volume group, the copy of the volume from the
recovery volume group will not be immediately deleted. This volume is deleted when the
recovery volume group has a recovery point that does not include that volume.
This coupling requires that all configuration changes to the volume group are made from the
system that has the production copy of the volume group and the changes are reflected to the
recovery system. The exception to this requirement is the action of enabling independent
access on a recovery copy of a volume group as this action can be performed only on the
system that includes the recovery copy of the volume group.
When a replication policy is removed from a volume group, the recovery volume group and its
volumes are deleted. If you need to keep the recovery copy, independent access should be
enabled on the recovery copy first. Then, the replication policy can be removed from the
volume group.
Note: The replication policy that is associated with a volume group might be changed to a
compatible policy without requiring a full resynchronization. A compatible policy is defined
as having the same locations and allows for a replication policy to be replaced by another
policy with a different recovery-point objective without interrupting replication. If you
attempt to change the policy to an incompatible policy, it is rejected by the system. The
replication policy needs to be removed and the new policy must be associated with the
volume group.
Change recording mode is the default mode and is selected whenever the volume group
cannot replicate; this could be because of offline volumes, errors, pending configuration
changes, or if the partnership is disconnected. In this mode, the system records which
regions of the volume have been written to and need to be synchronized when replication
resumes.
When replication becomes possible, the production copy automatically selects either
journaling or cycling mode to perform replication based on which is best for the current
conditions. This auto-tuning allows for the system to dynamically switch between the two
modes, attempting to achieve the lowest possible recovery point while also avoiding latency
problems for the production volumes.
A bitmap of the regions that have been written is maintained for each volume, and updates to
the bitmap are mirrored between the nodes. Every volume is split into regions that are called
grains; each grain is a contiguous 128 KiB range, and each bit in the bitmap represents one
grain.
Figure 6-49 on page 425 shows a sample bitmap and region mapping.
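For example, with a 128 KiB grain size, a 1 TiB volume contains 1 TiB / 128 KiB = 8,388,608 grains, so its bitmap requires 8,388,608 bits, or approximately 1 MiB, to track the whole volume.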
If one or more writes to the same region of the volume are active at the same time, it is known
as an overlapping write. It might sometimes be referred to as a colliding write. The storage
system needs to guarantee that the writes are applied in the same order on both systems to
maintain data consistency. In theory there should never be two active writes to the same
logical block; this behavior is described as invalid by storage specifications, but in practice it
can happen and must be handled by the storage array. To guarantee the ordering of
overlapping writes, the system selects one of the writes to process first. This write must be
completed before the processing of the second write is started. For the best possible
performance, application workloads should be aligned to an 8 KiB boundary or multiples
thereof (16 KiB, 32 KiB, and so on).
Sequencing
In journaling mode, every write is tagged with a sequence number that defines the order in
which the writes must be replayed at the recovery system to maintain consistency.
Sequencing is across all volumes in a volume group to maintain mutual consistency. To
achieve this, one node in the production I/O group is selected to generate sequence numbers
for the volume group. As a write is written into the write cache, a parallel process requests a
sequence number for the write. Once the sequence number is obtained and the local write is
complete, the host write is eligible for replication.
For performance reasons, sequencing is performed only within an I/O group and all volumes
in a volume group that uses policy-based replication must be in the same I/O group. A
common cause of performance problems with Remote Copy Global Mirror was the requesting
of sequence numbers, because the sequence-number generator could be any node in the
system. In this new model, it is either the node that received the write or its I/O group partner node that
generates the sequence numbers. The system has significantly better performance
characteristics when it sends messages to its I/O group partner node than any other node in
the system, which results in better performance and a more consistent solution. Each volume
group operates independently and there is no coordination of sequence numbers between
volume groups.
Synchronization operations are also tagged with a sequence number, so they are correctly
interleaved into the stream of writes.
Journal
Each node maintains a large, volatile memory buffer for journaling host writes and
synchronization reads. The exact amount of journal capacity varies between models but (at
the time of writing) it can be between 1 GiB and 32 GiB per node. The journal size is related
to the number of CPU cores and memory in the node. The purpose of the journal is twofold:
– Providing temporary in-memory storage for host writes or synchronization reads until they
can be replicated to the recovery copy and written to the local cache.
– Buffering host writes in journaling mode so that replication and the local host write are
decoupled, which means that replication problems cause the recovery point to extend
instead of delaying I/O for the production application.
Writes are replicated from the journal to the recovery system in sequence number order. On a
per-recovery system I/O group basis, writes are replicated in a first-in-first-out basis to ensure
that the recovery system can always make progress. Writes are replicated to the preferred
node of the volume in the remote system. If a direct connection is not present between the
two nodes, it is routed through local or remote nodes. If the preferred node is not available in
the remote system, the write is replicated to the online node in the caching I/O group for the
recovery volume.
To ensure that replication does not cause application performance problems, the journal
usage is constantly monitored by the system so it can proactively ensure that there is free
capacity in the journal for new host writes. The journal is divided by CPU core and maintains
lists of writes per volume group. However, resources are not divided between volume groups
as this could lead to unfairly penalizing volume groups with higher write workloads.
If the throughput of replication is less than the throughput of the local writes, then the journal
starts to fill up. The journal size is determined so that it can accommodate temporary bursts in
write throughput. However, if it is a sustained increase in the write throughput versus what
can be replicated, a proactive journal purge is triggered to prevent exhausting the resources.
All volume groups that are replicating in the I/O group have their journals evaluated for peak
and average usage, within the context of the RPO that is defined on the replication policy.
Volume groups that are associated with higher-RPO replication policies, and those
consuming the most journal resources, are candidates to be purged first.
Purging a journal involves discarding the journal resources recording host writes that have
completed locally, but not yet replicated for the volume group. The volume group then uses
the bitmap to synchronize any grains that were affected by writes that were in the journal.
Volume groups have their journals purged if the volume group encounters an error that
prevents replication, such as offline volumes or changes in connectivity between the two
systems.
As writes are received from the production location, the recovery system orders them based
on their sequence number. The writes are then mirrored between the two nodes in the
recovery system I/O group and written to the volume once all earlier writes are mirrored and
processed. Non-overlapping writes can be submitted in parallel to optimize performance but
overlapping writes are replayed in sequence number order to ensure consistency. Only one
node submits a write, usually the preferred node, but both nodes track its progress through
the process.
Writes are recorded in non-volatile memory on both nodes in the I/O group so that they can
be recovered if the nodes restart. After the write is stored in non-volatile memory on both
nodes, it is marked as committed. Committing a write guarantees that it will be written in the
future to the volume and that it forms part of the recovery point.
This is a simplified overview of the process that provides a high throughput method of
replaying the writes, which is always able to establish a mutually consistent recovery point for
all volumes in a volume group.
Note: Volumes in the recovery system are always provisioned by the system; existing
volumes will not be used.
Change volumes
Every replicated volume has an associated change volume. Change volumes are managed
automatically by the system and are hidden from the CLI and GUI views because only limited
user interactions are permitted on them.
Two FlashCopy maps are created between the volume and the change volume so that these
maps can be used to either take or restore a snapshot of the volume. Change volumes do not
require or contribute towards the FlashCopy license.
It is crucial to consider the capacity of change volumes when provisioning a system that uses
replication. The used capacity of change volumes changes significantly depending on the
amount of data that is required to preserve the snapshot. In most cases the data consumes a
small percentage of the virtual capacity of the volume. However, in extreme cases it is
possible for the data to consume an equal capacity to the volume. A general rule is to size the
system with an additional 10-20% capacity of the replicated volumes for change volumes.
However, this can vary by system. If there is low bandwidth between sites, but a high
write-throughput at the production location, then anticipate the need for more capacity for
change volumes. Similarly, if you are planning for the systems to be disconnected for an
extended time, the synchronization when they reconnect might need to copy a significant
amount of data. The change-volume capacity grows according to the amount of
synchronization that is needed.
Note: For example, if a system has 25 volumes that are each 40 GiB, the virtual capacity of
that system is 1000 GiB. The amount of usable capacity that is used depends on the data
that is written and the data reduction features that are used. Twenty-five change volumes
are created in each system that, by default, consume negligible usable capacity. If data
reduction features are not used, it would be a best practice to have between 1.1 and 1.2
TiB of usable capacity in this example.
The FlashCopy maps that are used by the change volumes are automatically started and
stopped, as required, to achieve the desired behavior for replication.
In general, the change volume is automatically maintained by the system in line with the
volume it protects. However, special cases exist for mirroring and migration when you use the
CLI.
If a volume is migrated to a different pool, the change volume must also be migrated.
When you use the GUI, this migration happens automatically.
However, a CLI user is required to migrate the change volume in the same way as the user
volume. The change-volume vdisk ID can be seen in the lsvdisk view of the user volume in
the changevolume1_id field.
Note: A CLI user can also specify the -showhidden parameter on the CLI views to display
the hidden change volume vdisks and FlashCopy maps.
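The following minimal sketch shows the CLI procedure; the volume name, pool name, and the change-volume ID are placeholders:
lsvdisk vol01 (note the value of the changevolume1_id field, for example 25)
migratevdisk -mdiskgrp Pool1 -vdisk vol01
migratevdisk -mdiskgrp Pool1 -vdisk 25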
For mirroring operations that add or remove a volume copy, the GUI automatically applies
the same operation to the change volume, which keeps it aligned with the user volume.
A CLI user must use the addvdiskcopy and rmvdiskcopy commands to manage mirrored
copies of the change volume when the user volume is modified. When the addvdiskcopy
command is used on a change volume, the only optional parameters that can be specified are
those relating to the mirroring process itself (for example, autodelete and
mirrorwritepriority). This ensures that the change volume is created according to best
practice. All other parameters for the new copy of the change volume are defined by the
system.
6.8.8 Synchronization
The system automatically performs a synchronization whenever the volume group transitions
from change recording mode to either journaling or cycling mode. Before any synchronization
starts, the FlashCopy maps for the change volumes at the recovery system are started to
preserve the consistency of the volume group. During the synchronization, the recovery point
is frozen at the time of the last write before the synchronization started. This is required because
synchronization does not replay writes in order. Therefore, it does not maintain consistency
while it is in progress.
Synchronization uses the bitmap to identify the grains that need to be read from the
production volumes and written to the recovery volumes. Once the change volumes are
maintaining a snapshot of the recovery volumes, the synchronization process starts with the
recovery copy controlling the requests for grains to read from the production copy.
Synchronization reads are interleaved with host writes (if in journaling mode) and added to
the journal. The production system automatically manages the amount of synchronization
based on the usage of resources available in the journal. This ensures that synchronization
activity does not consume too many journal resources, thus avoiding journal purges that are
caused by synchronization.
If synchronization encounters a read error, such as a medium error, the volume group stops
replicating and an error is logged against the volume group. After the problem that caused the
read error is resolved, replication restarts when the error is marked as fixed. If
synchronization is interrupted, such as by the partnership disconnecting, it restarts
automatically when it is able and the change volume continues to protect the original
snapshot. The snapshot for the change volume is discarded only after all volumes in the
volume group complete synchronization or the recovery copy of the volume group is deleted.
If the recovery copy of a volume group is made independent during a synchronization, the
change volume is used to restore the snapshot onto the host-accessible volumes. Once the
FlashCopy maps are reversed, the volumes are made accessible to the host. This triggers a
background process to undo any writes to the volumes that were done as part of the partial
synchronization.
Journaling mode
Journaling mode provides a high-throughput, low recovery point asynchronous replication
mode. Typically, the recovery point is expected to be below one second, but it can vary based
on the RTT (round-trip time), the available bandwidth between systems, and the performance
of the recovery system.
In this mode, every host write is tagged with a sequence number and added to the journal for
replicating. This is similar in characteristics to Remote Copy Global Mirror, but journaling
mode has a significantly greater throughput and the ability to extend the RPO to avoid
performance problems.
Cycling mode
Cycling mode uses synchronization and change volumes to periodically replicate to the
recovery system in a way that requires less bandwidth and consumes fewer journal
resources. Periodically, a snapshot is captured (using FlashCopy) of all volumes in the
volume group. The snapshot is stored on the change volume at the production system and is
maintained by using either Copy-on-Write or Redirect-on-Write, depending on the
storage-pool type and volume-capacity savings. The background synchronization process
copies only the changes to the remote system. After the synchronization is complete, the
snapshot at the production system is discarded and the process might repeat.
Cycling mode results in a higher recovery point than journaling mode, but has the distinct
advantage that it can coalesce writes to the same region of the volume, thus reducing the
bandwidth required between systems.
Mode switching
Asynchronous replication automatically adapts to the conditions by switching volume groups
between journaling and cycling mode. If a volume group experiences too many
journal-purges in a short period of time, it starts replicating by using cycling mode. This
usually happens if the replication throughput is not high enough to sustain the write
throughput for this volume group. Unlike Remote Copy Global Mirror with Change Volumes, a
defined cycle period does not exist. Rather, replication will cycle frequently enough to ensure
that the RPO that is defined by the replication policy is achieved (in the absence of errors).
The system aims to ensure that all volume groups meet their defined RPO.
The transitions between the modes are transparent and are managed by the system. The
mode that a volume group is currently using cannot be easily identified; the important aspect
is the current recovery point, which is visible against the volume group. It is this metric that
should be monitored. If the volume group exceeds the RPO that is defined on the replication
policy, an alert is raised.
The recovery point can be tracked by using the statistics that are produced by the production
system. It is available on the GUI, CLI, or REST API on both systems. This value is updated
periodically and is rounded up to the nearest second. More granular reporting is only
available within the XML statistics that can be retrieved by external monitoring tools.
Note: A running recovery point of zero does not guarantee that the copies are identical as
the value is updated periodically.
Information technology solutions now can manage planned and unplanned outages, and
provide the flexibility and cost efficiencies that are available from cloud-computing models.
This chapter briefly describes the Stretched Cluster, Enhanced Stretched Cluster (ESC), and
HyperSwap solutions for IBM Storage Virtualize systems. Technical details or implementation
guidelines are not presented in this chapter because they are described in separate
publications.
Both HA and DR can be used simultaneously. For example, a solution can implement
HyperSwap for HA with a replica at a third site for DR.
Note: This book does not cover 3-site replication solutions. For more information,
see IBM Spectrum Virtualize 3-Site Replication, SG24-8504.
IBM Storage Virtualize systems support three different cluster topologies: standard, Stretched
cluster, and HyperSwap. The standard topology is designed to provide 99.9999% availability
for a single FlashSystem, but it does not protect against a complete site failure. The other two
topologies are designed for two production sites, and they provide HA by maintaining data
access when one of the sites fails. Stretched cluster and HyperSwap are described in this chapter.
A Stretched cluster uses the volume mirroring feature to maintain synchronized, independent
copies of user data on each site.
When implemented, you can use this configuration to maintain access to data on the system,
even if failures occur at different levels, such as the storage area network (SAN), back-end
storage, IBM Storage Virtualize node, or data center power provider.
Stretched cluster is considered an HA solution because both sites work as instances of the
production environment (no standby location is used). Combined with application and
infrastructure layers of redundancy, Stretched clusters can provide enough protection for
data that requires availability and resiliency.
When IBM Storage Virtualize was first introduced, the maximum supported distance between
nodes within an I/O group was 100 meters (328 feet). With the evolution of code and the
introduction of new features, Stretched cluster configurations were enhanced to support
distances up to 300 km (186.4 miles). These geographically dispersed solutions use specific
configurations that use Fibre Channel (FC) or Fibre Channel over IP (FC/IP) switches, or
Multiprotocol Router (MPR) Inter-Switch Links (ISLs) between different locations. Although
clustered configurations were enhanced to support greater distances, it is important to
consider that each 100 km of distance introduces approximately 1 ms of delay; at the
maximum supported distance, the communication delay can therefore be approximately
three times higher (about 3 ms).
A Stretched cluster solution can still be configured and used in small setups where both
production sites are in the same data center; in this case, it is recommended to place the
sites in different fire-protection zones. However, for most use cases, ESC is the preferred
option.
It is also important to note that in this case the Stretched cluster is organized by the way the
SAN fabric is configured.
With IBM Storage Virtualize V7.5, site awareness was extended to hosts. This extension
enables more efficient distribution of host I/O traffic through the SAN, and easier host path
management.
Stretched cluster and ESC solutions can be combined with DR features, such as Metro Mirror
(MM) or Global Mirror (GM), which make it possible to keep three independent data copies,
and enable you to effectively manage rolling disaster scenarios.
In an ESC configuration, each site is defined as an independent failure domain. If one site
experiences a failure, the other site can continue to operate without disruption. Sites can be in
the same room, across rooms in the data center, in different buildings at the same campus, or
in different cities. Different types of sites protect against different types of failures.
In addition to two sites storing copies of data, in SC and ESC environments you must
configure a third site, which must be independent of data sites. The third site hosts a quorum
device that provides an automatic tie-breaker in case of communication failure between the
two main sites (for more information, see 7.4, “Comparing business continuity solutions” on
page 510) or any failure that prevents the nodes on both sites from communicating with each
other (for example, one site is completely down). The condition in which the nodes on both
sites are still active but inter-site communication is not possible is also known as a split-brain
scenario. In such a situation, it is necessary to decide which site should continue to service
I/O and which site should go to standby to keep the data consistent. Helping with this
decision is one of the roles of the quorum site.
If configured correctly, the system continues to operate after the loss of one site. The key
prerequisite is that each site contains only one node from each I/O group. However, placing
one node from each I/O group in different sites for a stretched system configuration does not
provide HA. You must also configure the suitable mirroring technology and ensure that all
configuration requirements for those technologies are correctly configured.
Best practices:
As a best practice, configure an ESC system to include at least two I/O groups (four
nodes).
Inter-site communication between nodes should be separated from other I/O; therefore,
configure a dedicated, private, redundant fabric for it.
The inter-site communication fabrics must be fully independent of each other and must
not share any components that could affect their communication capabilities. The number
of physical inter-site links (ISLs) does not matter, because inter-node communication
depends on end-to-end fabric communication. If one fabric is compromised, even by a
single physical link going down (for example, if frames are lost), the cluster considers that
fabric unreliable and switches to the other fabric. If both fabrics are compromised (which
can happen if, for example, two ISL providers are shared between the fabrics rather than
each provider being dedicated to a specific fabric, and one provider has problems), the
cluster detects a loss of communication between the two sites, because frames are lost
on both fabrics even for a short period, and one of the sites is set to standby.
HyperSwap provides HA volumes that are accessible through two sites that are up to 300 kilometers
(km) apart. A fully independent copy of the data is maintained at each site. When data is
written by hosts at either site, both copies are synchronously updated before the write
operation completes. HyperSwap automatically optimizes itself to minimize data that is
transmitted between sites, and to minimize host read/write latency. For more information
about the optimization algorithm, see 7.6, “HyperSwap internals” on page 515.
Note: For more technical information about HyperSwap, see IBM HyperSwap: An
automated disaster recovery solution.
Stretched cluster and ESC stretch the I/O groups between two sites, which is possible
because SVC uses standalone nodes that can be organized into I/O groups and physically
placed in different locations. IBM FlashSystem storage systems, however, consist of
enclosures with internal storage, and each control enclosure contains two node canisters that
are already organized into an I/O group and cannot be physically separated. Therefore, the
nodes cannot be moved apart and the I/O groups cannot be stretched. To provide an HA
solution for these systems, the IBM Storage Virtualize HyperSwap configuration was
introduced; this solution requires that at least one control enclosure is implemented in each
location.
In addition to the active-active MM feature that is used to mirror (replicate) data between sites,
the HyperSwap feature also introduced the site awareness concept for node canisters,
internal and external storage, and hosts.
Figure 7-1 Typical concept scheme HyperSwap configuration with IBM Storage Virtualize
With a copy of the data that is stored at each location, HyperSwap configurations can handle
different failure scenarios.
The Small Computer System Interface (SCSI) protocol allows storage devices to indicate the
preferred ports for hosts to use when they submit I/O requests. By using the Asymmetric
Logical Unit Access (ALUA) state for a volume, a storage controller can inform the host about
what paths are active and which ones are preferred. In a HyperSwap system topology, the
system advertises the host paths to “local” nodes (nodes on the same site as the host) as
Active Optimized. The path to remote nodes (nodes on a different site) is advertised as Active
Unoptimized. If no Active Optimized paths are available for a host after a failure, the host
starts using the Active Unoptimized paths.
Figure 7-2 shows how HyperSwap operates in an I/O group failure at one of the sites. The
host on Site 1 detects that there are no paths to local nodes, and starts using paths to LUN
through Site 2.
Even if the access to an entire location (site) is lost, as shown in Figure 7-3 on page 508,
access to the disks remains available at the alternative location. This behavior requires
clustering software at the application and server layer to fail over to a server at the alternative
location and resume access to the disks.
The active-active synchronous mirroring feature keeps both copies of the storage in
synchronization. Therefore, the loss of one location causes no disruption at the alternative
location.
The HyperSwap system depends on the quality and stability of the inter-site link. A best practice
requires isolation of the traffic between IBM Storage Virtualize cluster nodes on a dedicated
private SAN from other types of traffic traversing through the inter-site link, such as host and
back-end controller traffic. This task can be performed by using dedicated hardware, with a
Cisco virtual storage area network (VSAN), or Brocade Virtual Fabric when using the same
SAN switches.
Note: The SAN24B-6 switch can now also be used with Virtual Fabrics when running Fabric OS (FOS) 9.1.1 or later.
The isolation is necessary because all host writes to a HyperSwap volume are mirrored to a
remote site by using an internode link. If these writes are delayed, then host performance
deteriorates because its writes are acknowledged only after they complete at both sites.
Note: For more information about traffic isolation and a recommended fabric configuration,
see IBM Spectrum Virtualize HyperSwap SAN Implementation and Design Best Practices,
REDP-5597.
c. After the initialization wizard completes and a cluster on a first control enclosure is set
up, initialize the remaining control enclosures by selecting As an additional node in
an existing system.
d. Complete the HyperSwap configuration by following the system setup wizard.
e. Change the system topology to HyperSwap by using the GUI topology wizard.
2. If control enclosures already are configured and running (for example, you want to join two
separate IBM FlashSystem clusters into one HyperSwap system):
a. Select one of the control enclosures that will retain its configuration and keep the data.
b. Migrate all the data away from the other control enclosures (systems) that need to
become a part of HyperSwap configuration.
c. After data is migrated, delete the system configuration and the data from those
systems to reset them to the “factory” state. For more information, see Removing a
control enclosure from a system.
d. Adjust the zoning to the HyperSwap configuration that you want so that all control
enclosures can see each other over the FC SAN.
e. Use the GUI of the first control enclosure (the one that was identified in step a on
page 509) to add control enclosures to the cluster.
f. Use the GUI topology wizard to change the topology to HyperSwap.
Note: For more information about system initialization, adding control enclosures, and
changing the topology, see Implementation Guide for IBM Storage FlashSystem and IBM
SAN Volume Controller: Updated for IBM Storage Virtualize Version 8.6, SG24-8542.
The function is available on these products:
– Stretched cluster: SVC only.
– ESC: SVC only.
– HyperSwap: All IBM Storage Virtualize based products that support two or more I/O groups.
Complexity of the configuration:
– Stretched cluster: Command-line interface (CLI) or GUI on a single system, and simple object creation.
– ESC: CLI or GUI on a single system, and simple object creation.
– HyperSwap: CLI or GUI on a single system, and simple object creation.
Distance between sites:
– Stretched cluster: Up to 300 km (186.4 miles).
– ESC: Up to 300 km (186.4 miles).
– HyperSwap: Up to 300 km (186.4 miles).
Technology for hosts to access multiple copies and automatically fail over:
– Stretched cluster: Standard host multipathing driver.
– ESC: Standard host multipathing driver.
– HyperSwap: Standard host multipathing driver.
Cache is retained if only one site is online?
– Stretched cluster: Yes, if a spare node is used. Otherwise, no.
– ESC: Yes, if a spare node is used. Otherwise, no.
– HyperSwap: Yes.
Host-to-storage-system path optimization:
– Stretched cluster: Manual configuration of the preferred node.
– ESC: Automatic configuration that is based on the host site settings. Uses Asymmetric Logical Unit Access (ALUA) or ANA (NVMe) and Target Port Group Support (TPGS).
– HyperSwap: Automatic configuration that is based on the host site settings. Uses ALUA or ANA (NVMe) and TPGS.
Scope of failure and resynchronization:
– Stretched cluster: Single volume.
– ESC: Single volume.
– HyperSwap: One or more volumes. The scope is user-configurable.
Ability to use FlashCopy with an HA solution:
– Stretched cluster: Yes (there is no awareness of the site locality of the data).
– ESC: Yes (there is no awareness of the site locality of the data).
– HyperSwap: Limited. You can use FlashCopy maps with a HyperSwap volume as a source to avoid sending data across the link between sites.
Ability to use MM, GM, or Global Mirror with Change Volumes (GMCV) with an HA solution:
– Stretched cluster: One remote copy. You can maintain current copies on up to four sites.
– ESC: One remote copy. You can maintain current copies on up to four sites.
– HyperSwap: Support for 3-site solutions is available with IBM Storage Virtualize 8.4 or later.
Initially, IP quorum was used only as a tie-breaker solution. However, with release 8.2.1, it
was expanded to store cluster configuration metadata, fully serving as an alternative to
quorum disk devices.
Note: IP quorum with metadata demands higher link bandwidth between the system nodes
and the IP quorum host than IP quorum without metadata support. Enabling metadata
storage also requires lower link latency.
Consider deploying IP quorum with metadata storage only if there is no other way to
store the metadata (for example, all the system's back-end storage is Internet Small Computer
Systems Interface (iSCSI)), or if you have an ensured high-quality network link between
the nodes and the IP quorum host.
FC connectivity is not needed to use an IP quorum application as the quorum device for the
third site. Up to five IP quorum applications can be deployed, but only one quorum can be
active at a time.
An IP quorum application can be run on any host at the third site, as shown in Figure 7-5.
However, the following strict requirements must be met on the IP network when an IP quorum
application is used:
Connectivity from the servers that are running an IP quorum application to the service IP
addresses of all nodes or node canisters. The network also must handle the possible
security implications of exposing the service IP addresses because this connectivity also
can be used to access the service assistant interface if the IP network security is
configured incorrectly.
On each server that runs an IP quorum application, ensure that only authorized users can
access the directory that contains the IP quorum application. Metadata is stored in the
directory in a readable format, so ensure access to the IP quorum application and the
metadata is restricted to only authorized users.
The gateway should not be susceptible to failure if one site goes down.
Port 1260 is used by the IP quorum application to communicate from the hosts to all nodes
or enclosures.
The maximum round-trip delay must not exceed 80 milliseconds (ms), which means 40 ms
each direction.
If you are configuring the IP quorum application without a quorum disk for metadata, ensure
a minimum bandwidth of 2 megabytes per second (MBps) for traffic between the system and
the quorum application. If your system uses an IP quorum application with a quorum disk for
metadata, ensure a minimum bandwidth of 64 MBps for traffic between the system and the
quorum application.
Ensure that the directory that stores an IP quorum application with metadata contains at
least 250 MB of available capacity.
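As a hedged sketch of the CLI workflow, the IP quorum application is generated on the system and then copied to and run on the third-site host. The file name and Java invocation are typical values and should be verified in IBM Documentation:
mkquorumapp (generates the ip_quorum.jar application on the system)
java -jar ip_quorum.jar (run on the third-site host after the file is copied there)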
Quorum devices are also required at Site 1 and Site 2, and can be either disk-based quorum
devices or IP quorum applications. A maximum number of five IP quorum applications can be
deployed.
Important: Do not host the quorum disk devices or IP quorum applications on storage that
is provided by the system it is protecting, because during a tie-break situation, this storage
pauses I/O.
For more information about IP quorum requirements and installation, including supported
operating systems and Java runtime environments (JREs), see this IBM Documentation web
page.
For more information about quorum disk devices, see 3.5, “Quorum disks” on page 238.
Note: The IP quorum configuration process is integrated into the IBM Storage Virtualize
GUI and can be found by selecting Settings → Systems → IP Quorum.
With this configuration, you can specify which site resumes I/O after a disruption, based on
the applications that run on each site or other factors. For example, you can specify whether
a selected site is preferred for resuming I/O, or whether the site automatically "wins" in
tie-breaker scenarios.
Preferred mode
If only one site runs critical applications, you can configure this site as preferred. During a
split-brain situation, the system delays processing tie-breaker operations on other sites that
are not specified as “preferred”. The designated preferred site has a timed advantage when a
split-brain situation is detected, and starts racing for the quorum device a few seconds before
the nonpreferred sites.
Therefore, the likelihood of reaching the quorum device first is higher. If the preferred site is
damaged or cannot reach the quorum device, the other sites have the chance to win the
tie-breaker and continue I/O.
Winner mode
This configuration is recommended for use when a third site is not available for a quorum
device to be installed. In this case, when a split-brain situation is detected, the site that is
configured as the winner always is the one to continue processing I/O regardless of the failure
and its condition. The nodes at the nonwinner site always lose the tie-breaker and stop
processing I/O requests until the fault is fixed.
The relationship uses the CVs as journaling volumes during any resynchronization process.
The master CV must be in the same I/O group as the master volume. The same applies to the
auxiliary CV and the auxiliary volume. Note that if volume groups are used, space is needed
in them as well.
The HyperSwap volume always uses the unique identifier (UID) of the master volume. The
HyperSwap volume is assigned to the host by mapping only the master volume, even though
access to the auxiliary volume is ensured by the HyperSwap function.
Figure 7-6 shows how a HyperSwap volume handles the UID relationship. Note that both
volumes in this active-active relationship are presented under one UID on the system.
In HyperSwap, host write operations can be submitted by hosts at both sites, but they are
always routed to the volume, which is the primary copy of the HyperSwap relationship. Then,
data is mirrored to the secondary copy over the inter-site link.
HyperSwap can automatically switch replication direction between sites. If a sustained write
workload (that is, more than 75% of write I/O operations for at least 20 minutes) is submitted
to a site with the secondary volume, the HyperSwap function switches the direction of the
active-active relationships, swapping the secondary volume to primary, and vice versa.
Replication direction can be switched on any HyperSwap volume independently, unless the
volumes are added to a single consistency group (CG). You can have the primary on Site 1
for one HyperSwap volume, and the primary on Site 2 for another HyperSwap volume at the
same time.
Before IBM Storage Virtualize 8.3.1, host read operations that were submitted at any site
were always directed to the primary copy. Starting with version 8.3.1, reads are always
processed by the local copy of the volume.
The HyperSwap and Stretched cluster or ESC features require implementing the storage
network to ensure that the inter-node communication on the FC ports on the control
enclosures (nodes) between the sites is on dedicated fabrics. No other traffic (hosts or
back-end controllers) or traffic that is unrelated to the distributed cluster can be allowed on
this fabric. Two fabrics are used: one private for the inter-node communication, and one public
for all other data.
A few SAN designs are available that can achieve this separation, and incorrect SAN design
and implementation can result in potential problems.
One other important consideration is to review the site attribute of all the components to make
sure that they are accurate. With the site awareness algorithm that is present in the
IBM Storage Virtualize code, optimizations are done to reduce the cross-site workload. If this
attribute is missing or not accurate, there might be unnecessary increased cross-site traffic,
which might lead to higher response time to the applications.
For more information about design options and some common problems, see the following
resources:
SAN and Fabric Resiliency Best Practices for IBM b-type Products, REDP-4722
Designing a Resilient SAN for IBM HyperSwap SVC and IBM Storage Virtualize
Chapter 8. Hosts
This chapter provides general guidelines and best practices for configuring host systems for
IBM Storage Virtualize based storage systems.
Before attaching a new host, confirm that the host is supported by IBM Storage Virtualize. For
more information about a detailed compatibility matrix, see IBM System Storage
Interoperation Center (SSIC).
The host configuration guidelines apply equally to all IBM Storage Virtualize systems.
Therefore, the product name often is referred to as an IBM Storage FlashSystem / SAN
Volume Controller.
For more information about host attachment, see the Host Attachment chapter in IBM Docs.
For more information about hosts that are connected by using Fibre Channel (FC), see
Chapter 2, “Storage area network guidelines” on page 121. Host connectivity is a key
consideration in overall storage area network (SAN) design.
A volume in IBM Storage Virtualize can only be accessed using a single protocol, either SCSI
or NVMe. This is because the storage controller in the system can only present a volume to a
host in one way.
However, a volume can be accessed using different transport protocols (iSCSI, FC, and so
on) from different hosts. For example, a volume could be accessed using iSCSI from one host
and FC from another host. This is because the transport protocol is the way that the host
communicates with the storage system, and the storage system can present a volume to
different hosts using different transport protocols.
IBM Storage Virtualize does not support the same host accessing a volume using different
transport protocols via the same host bus adapter. This is because the host bus adapter can
only communicate with the storage system using one transport protocol at a time.
All hosts and cluster connections need to use the same protocol type for a volume. Different
hosts can use different protocols with the same storage port, so iSCSI, iWARP, iSER,
NVMe/RDMA, and NVMe/TCP can coexist on the same port with different host mapping
types. This includes the ports that are used for replication.
Note: At the time of writing this book, systems with RoCE adapters cannot have an MTU
greater than 1500 on 8.6.0.0 or later. The workaround is to reduce the MTU to 1500.
Best practice: Keep FC tape (including virtual tape libraries) and FC disks on separate
HBAs. These devices have two different data patterns when operating in their optimum
mode. Switching between them can cause unwanted processor usage and performance
slowdown for the applications.
For more information about the IBM Storage Virtualize 8.6 NPIV configuration and details,
see 7.5, “N_Port ID Virtualization support”, in Implementation Guide for IBM Storage
FlashSystem and IBM SAN Volume Controller: Updated for IBM Storage Virtualize Version
8.6, SG24-8542.
For more information about configuring NPIV, see Chapter 2, “Storage area network
guidelines” on page 121.
Each LUN has its own I/O queues. Especially in SCSI environments, where there is only
one queue per LUN, you can get more throughput with multiple LUNs working in parallel.
As a best practice, the minimum number of LUNs in the I/O group should be equal to or
higher than the number of cores in the I/O group.
If you use a high number of volumes and build volume groups or consistency groups for
Snapshots or FlashCopy, do not place too many volumes in a single group; keep it well
below 128. All volumes in a volume group or consistency group are stopped at the same
time.
The user can control this effect by using fewer larger LUNs rather than many small LUNs.
However, you might need to tune queue depths and I/O buffers to support controlling the
memory and processing time efficiently.
For more information about queue depth, see, 8.10.1, “Queue depths” on page 534.
Note: Larger volume sizes also can help to reduce the number of remote copy
relationships in an RC consistency group (CG), which can lead to a performance benefit for
Global Mirror with Change Volumes (GMCV) in large environments. For more information
about the remote copy configuration limits, see Table 6-8 on page 430.
You can allocate the operating system volume of the SAN boot as the lowest SCSI ID (zero
for most hosts), and then allocate the various data disks. If you share a volume among
multiple hosts, consider controlling the SCSI ID so that the IDs are identical across the
hosts. This consistency ensures ease of management at the host level and prevents potential
issues during IBM Storage Virtualize updates and even node restarts, mostly for VMware
ESX operating systems.
If you are using image mode to migrate a host to IBM Storage Virtualize, allocate the volumes
in the same order that they were originally assigned on the host from the back-end storage.
The lshostvdiskmap command displays a list of virtual disks (VDisks) (volumes) that are
mapped to a host. These volumes are recognized by the specified host.
Example 8-1 shows the syntax of the lshostvdiskmap command that is used to determine the
SCSI ID and the UID of volumes.
3:HG-ESX6:7:50:vol_HG-ESX6_1:60050768108104A2F0000000000000A5:0:io_grp0:private:::scsi
3:HG-ESX6:8:51:vol_HG-ESX6_10:60050768108104A2F0000000000000A8:0:io_grp0:private:::scsi
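The output that is shown above was produced by a command similar to the following one, where the host name is taken from the example and the -delim parameter produces the colon-delimited format:
lshostvdiskmap -delim : HG-ESX6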
Note: Example 8-3 shows the same volume that is mapped to five different hosts, but host
110 features a different SCSI ID than the other four hosts. This is not a recommended
practice because the SCSI ID mismatch can lead to loss of access in some situations.
Previously, a host was marked as degraded if one of the host ports logged off the fabric.
However, examples exist in which this marking might be normal and may cause confusion.
At the host level, a new status_policy setting is available that includes the following settings:
The complete setting uses the original host status definitions.
By using the redundant setting, a host is not reported as degraded unless not enough
ports are available for redundancy.
Check the supported server adapter and operating systems at IBM System Storage
Interoperation Center (SSIC).
Internet Small Computer Systems Interface (iSCSI) is a common protocol that is usable on
any Ethernet network. With IBM Storage Virtualize 8.6, the performance was improved,
especially on the midrange and high-end systems. IBM Storage Virtualize 8.6 now supports
up to 1,024 iSCSI hosts per I/O group, depending on the system.
Priority Flow Control for iSCSI and iSER is supported on Emulex and Chelsio adapters (SVC
supported) with all DCBX-enabled switches.
A maximum of four ports per port group is supported per host, and all ports must run at the same speed.
The primary signature of this issue is that read performance is significantly lower than write
performance. Transmission Control Protocol (TCP) delayed acknowledgment is a technique
that is used by some implementations of the TCP to improve network performance. However,
in this scenario where the number of outstanding I/O is 1, the technique can significantly
reduce I/O performance.
In essence, several ACK responses can be combined into a single response, reducing
protocol overhead. As described in RFC 1122, a host can delay sending an ACK response by
up to 500 ms. Additionally, with a stream of full-sized incoming segments, ACK responses
must be sent for every second segment.
Note: Currently, systems with RoCE adapters cannot use an MTU greater than 1500 on
8.6.0.0 or later. The workaround is to reduce the MTU to 1500.
The network must support jumbo frames end-to-end to be effective. To verify that the network
supports jumbo frames, send a ping packet that must be delivered without fragmentation.
For example:
Windows:
ping -t <iscsi target ip> -S <iscsi initiator ip> -f -l <new mtu size - packet
overhead (usually 36, might differ)>
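On Linux, a similar check can be run. This sketch assumes a 9000-byte MTU, where the 8972-byte payload accounts for the IP and ICMP headers:
ping -M do -s 8972 -I <iscsi initiator ip> <iscsi target ip>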
Verify the switch's port statistics where initiator/target ports are connected to make sure that
packet drops are not high. Review network architecture to avoid any bottlenecks and
oversubscription. The network needs to be balanced to avoid any packet drop; packet drop
significantly reduces storage performance. Involve networking support to fix any such issues.
Node 2:
Port 1: 192.168.1.12
Port 2: 192.168.2.22
Port 3: 192.168.3.33
Avoid situations where 50 hosts are logged in to port 1 and only five hosts are logged in to
port 2.
Use proper subnetting to achieve a balance between the number of sessions and
redundancy.
IBM FlashSystem systemwide CHAP is a two-way authentication method that uses CHAP to
validate both the initiator and target. Changing or removing this systemwide CHAP can be
disruptive, as it requires changes to the host configuration. If CHAP is changed without
updating the host configuration, it can result in volume or LUN outages.
For any change to the system-wide CHAP, the host iSCSI configuration requires an update to
ensure that it uses only the valid target (FlashSystem) CHAP secret that is being updated.
It is possible to have a unique target authentication CHAP for individual host initiators. The
chhost command allows individual host objects to be given a unique user name and CHAP
secret. However, the target system (FlashSystem) offers a common CHAP secret and user
name for initiator authentication (the lssystem and chsystem CLI commands help to manage
the CHAP secret, and the user name is the name of the cluster).
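As an illustration, the CHAP secrets might be managed with commands similar to the following ones; the secrets and the host name are hypothetical:
chsystem -chapsecret targetSecret01
chhost -chapsecret hostSecret01 Host_ESX01
The first command sets the system-wide CHAP secret that the system offers for target authentication, and the second one sets the CHAP secret that is used to authenticate the individual host initiator.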
When using the SendTarget discovery method with FlashSystem, the IP addresses that are
part of the same node and belong to the same portsets are returned in response. This means
that if you have 8 nodes, you will need to perform iSCSI SendTarget discovery 8 times to get
all of the IP addresses for the target.
For more information, see iSCSI performance analyses and tuning chapter in IBM Docs.
Note: Although iSER is still supported on the IBM Storage FlashSystem 5200, you should
consider migrating to RoCE.
Priority Flow Control for iSCSI and iSER is supported on Emulex and Chelsio adapters (SVC
supported) with all DCBX-enabled switches.
You need to create two host definitions for the host, one for SCSI and one for NVMe, if you
want to use both types of volumes from one host, for example for migration:
Host1scsi-LUN1-FC(SCSI)
Host1nvme-LUN2-FC(NVMe)
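A minimal CLI sketch of these two host definitions follows. The WWPN and NQN values are hypothetical, and the parameter names should be checked against the mkhost command reference for your code level:
mkhost -name Host1scsi -fcwwpn 10000090FA123456 -protocol scsi
mkhost -name Host1nvme -nqn nqn.2014-08.com.example:nvme:host1 -protocol nvme
Volumes are then mapped to Host1scsi or Host1nvme separately, never to both.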
Asymmetric Namespace Access was added to the FC-NVMe protocol standard, which gives
it functions that are similar to Asymmetric Logical Unit Access (ALUA). As a result, FC-NVMe
can now be used in stretched clusters. Consider NVMe over Fibre Channel target limits when
you plan and configure the hosts.
An NVMe host can connect to four NVMe controllers on each system node. The maximum
per node is four with an extra four in failover.
1. Zone up to four ports in a single host to detect up to four ports on a node. To allow failover
and avoid outages, zone the same or additional host ports to detect an extra four ports on
the second node in the I/O group.
2. A single I/O group can contain up to 256 FC-NVMe I/O controllers. The maximum number
of I/O controllers per node is 128 plus an extra 128 in failover. Following the
recommendation in step 1, zone a total maximum of 16 hosts to detect a single I/O group.
Also, consider that a single system target port allows up to 16 NVMe I/O controllers.
IBM Storage Virtualize 8.6 allows a maximum of 32/64 NVMe hosts per system and 16 hosts
per I/O group, if no other types of hosts are attached. IBM Storage Virtualize code does not
monitor or enforce these limits.
For more information about using NVMe hosts with IBM FlashSystem, see NVMe over Fibre
Channel Host Properties.
Note: Do not map the same volumes to SCSI and NVMe hosts concurrently, even one
after the other. Also, take care not to add NVMe hosts and SCSI hosts to the same host
cluster.
Asymmetric Namespace Access was added to the NVMe over RDMA and NVMe over TCP
protocol standards, which gives them functions that are similar to Asymmetric Logical Unit Access
(ALUA). As a result, these protocols can now be used in stretched clusters. iSCSI, iWARP, iSER,
NVMe/RDMA, and NVMe/TCP can coexist on the same port with different host mapping
types. This includes the ports used for replication. A single host working with multiple
protocols is not supported. Different hosts can have different protocols working with the same
storage port.
Note: Do not map the same volumes to SCSI and NVMe hosts concurrently, even one
after the other. Also, take care not to add NVMe hosts and SCSI hosts to the same host
cluster.
Tip: IBM Storage Virtualize uses discovery port 4420 for all RDMA protocols instead of
8009.
RDMA can reduce the CPU load of the server, compared with other Ethernet protocols. One
limitation of existing NVMe/RoCE is that it requires special Ethernet infrastructure (lossless
Ethernet / DCB). How to configure it depends on the switches you use, but you will need
network skills. RoCE-capable host adapters are also needed. Check the SSIC to understand
what is supported. RoCE v2 is routable, but layer 2 networks are preferred.
Note: Currently, systems with RoCE adapters cannot use an MTU greater than 1500
on 8.6.0.0 or later. The workaround is to reduce the MTU to 1500.
Check the release notes of future releases to see whether the restriction is lifted.
NVMe/TCP needs more CPU resources than protocols using RDMA. NVMe/TCP is a
ubiquitous transport allowing NVMe performance without any constraint to the data center
infrastructure. Each NVMe/TCP port on FlashSystem supports multiple IPs and multiple
VLANs. NVMe/TCP will be supported on FlashSystem platforms that are installed with
Mellanox CX-4 or CX-6 adapters.
NVMe/TCP is generally switch-agnostic and routable. For operating system and multipathing
support, check the SSIC.
For more information about using NVMe hosts with IBM FlashSystem, see
https://www.ibm.com/support/pages/node/6966914.
Note: Currently, systems with RoCE adapters cannot use an MTU greater than 1500 on
8.6.0.0 or later. The workaround is to reduce the MTU to 1500.
8.8 Portsets
Portsets are groupings of logical addresses that are associated with specific traffic types. IBM
Storage Virtualize 8.6.0 systems support both IP and FC portsets for host attachment,
back-end storage connectivity, and replication traffic.
A system can have maximum of 72 portsets, which is a collective maximum limit for FC and
Ethernet portsets. A portset can be of the host attach, remote copy, or storage type. The
default portset is the host attach type. A portset of a specific type can be used only for that
function, for example, a host attach type portset cannot be used for a remote copy
partnership.
svcinfo lsportset
id name type port_count host_count lossless owner_id owner_name port_type is_default
0 portset0 host 0 0 ethernet yes
1 portset1 replication 0 0 ethernet no
2 portset2 replication 0 0 ethernet no
3 portset3 storage 0 0 ethernet no
4 PortSet16 host 1 1 yes 0 Bank_Gr_Owngrp fc no
5 portset32 host 1 1 yes 1 Health_Gr_Owngrp fc no
64 portset64 host 6 4 yes fc yes
8.8.1 IP multitenancy
IP support for all IBM Storage Virtualize products previously allowed only a single IPv4 and
IPv6 address per port for use with Ethernet connectivity protocols (internet Small Computer
Systems Interface (iSCSI) and iSER).
As of version 8.4.2, IBM Storage Virtualize removed that limitation and supports an increased
per port limit to 64 IP addresses (IPv4, IPv6, or both). The scaling of the IP definition also
scaled the virtual local area network (VLAN) limitation, which can be done per IP address or
as needed.
The object-based access control (OBAC) model (that is, OBAC-based per tenant
administration and partitioned for multitenant cloud environments) also was added to the
Ethernet configuration management.
The IBM Storage Virtualize new IP object model introduced a new feature that is named the
portset. The portset object is a group of logical addresses that represents a typical IP function
and traffic type. Portsets can be used for multiple traffic types, such as host attachment,
back-end storage connectivity (iSCSI only), or IP replication.
A host can access storage through the IP addresses that are included in the portset that is
mapped to the host. The process to bind a host to a portset includes the following steps (a CLI sketch follows the list):
1. Create the portset.
2. Configure the IP addresses with the portset.
3. Create a host object.
4. Bind the host to the portset.
5. Discover and log in from the host.
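The following CLI sketch illustrates these steps. The portset name, IP addresses, VLAN, and IQN are hypothetical, and the parameters should be verified against the command reference for your code level:
mkportset -name iscsi_prod -type host
mkip -node node1 -port 5 -portset iscsi_prod -ip 192.168.10.11 -prefix 24 -vlan 100
mkip -node node2 -port 5 -portset iscsi_prod -ip 192.168.10.12 -prefix 24 -vlan 100
mkhost -name Host_ESX01 -iscsiname iqn.1998-01.com.vmware:esx01 -portset iscsi_prod
After these commands, the host discovers and logs in to the IP addresses of the iscsi_prod portset only.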
IP portsets can be added by using the management GUI or the command-line interface (CLI).
You can configure portsets by using the GUI and selecting Settings → Network → Portsets.
After the portsets are created, IP addresses can be assigned by using the management GUI
or the CLI. You can assign IP addresses by using the GUI and selecting Settings →
Network → Ethernet Ports.
Example 8-5 shows the output of the lsip command.
While using the FC portset feature, each FC I/O port can be added to multiple FC portsets;
however, a host can be added to only one FC portset. Every portset can support up to four FC
I/O ports.
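As an illustration, an FC portset for a tenant might be created and used as follows. The portset name, FC I/O port IDs, WWPN, and host name are hypothetical, and the commands should be verified against your code level:
mkportset -name FC_TenantA -type host -porttype fc
addfcportsetmember -portset FC_TenantA -fcioportid 1
addfcportsetmember -portset FC_TenantA -fcioportid 2
mkhost -name Host_AIX01 -fcwwpn 10000090FA0000A1 -portset FC_TenantA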
Each portset is identified by a unique name. Portset 0 is an Ethernet default portset, and
portset 64 is a default FC portset that is configured when the system is created or updated.
Portsets 1 and 2 are replication portsets. Portset 3 is a storage portset.
To see the host login for each FC port, run the lstargetportfc command.
The output shows details about each FC port, associated portset count, host count, and
active login counts. If some FC ports have a higher number of active logins, they can cause
an unbalanced performance.
Note: Event ID 088007 “Fibre Channel I/O port has more than recommended active login”
is logged when there are more than 256 active logins on all or any of the worldwide port
names (WWPNs) on the FC I/O port on any node. This event informs the customer that an
FC I/O port is serving more logins than the recommended limit, and the system is being
under-utilized because the load is not distributed uniformly across the FC I/O ports and
nodes. The event is cleared when an administrator fixes the zoning such that the total
active login count of the FC I/O port becomes less than or equal to 256.
Generally for FC, the host-to-storage connection is controlled by SAN zoning. A portset helps
to set a rule on the storage layer to avoid many FC logins to the same port while other ports remain
idle. Misconfigured portsets and hosts, or wrong ports that are used in zones, result in
Event ID 064002 being logged in the IBM Storage Virtualize 8.6 event log.
Note: Event ID 064002 “Host and Fibre Channel port must be in same portset.” is logged
when the host tries to log in to a port that is associated with a different portset. A login is
detected from the host to a storage port that is not part of the same portset of which the
host is a part. For an FC host, check “Fibre Channel Connectivity” for the host with a state
of Blocked, and for a NVMe host, check “NVMe Connectivity” for a host with a state of
Invalid. Add storage port WWPNs that are assigned to the correct portset on the host to
the storage zone and remove the wrong ones. This action automatically clears the event.
For multi-tenant configurations, use portsets to separate the FC ports between tenants. Example 8-4 on
page 530 shows two portsets that are assigned to different ownership groups.
When a mapping is created, multiple paths normally exist across the SAN fabric from the
hosts to the IBM Storage Virtualize system. Most operating systems present each path as a
separate storage device. Therefore, multipathing software is required on the host. The
multipathing software manages the paths that are available to the volume, presents a single
storage device to the operating system, and provides failover if a path is lost.
If your IBM Storage Virtualize system uses NPIV, path failures that occur because of an offline
node are masked from host multipathing.
Note: When using NPIV and an SVC cluster with site awareness, the remaining node in
the same I/O group takes over the addresses. Be aware of the potential impact on inter-data
center traffic here. This does not apply if a hot spare node is used per site. We
recommend zoning the host to both sites.
When a volume is created, an I/O group and preferred node are defined, and optionally can
be set by the administrator. The owner node for a volume is the preferred node when both
nodes are available.
With HyperSwap, configuration nodes in multiple I/O groups can potentially service I/O for the
same volume, and site IDs are used to optimize the I/O routing.
IBM Storage Virtualize uses Asymmetric Logical Unit Access (ALUA), as do most
multipathing drivers. Therefore, the multipathing driver gives preference to paths to the
preferred node. Most modern storage systems use ALUA.
Note: Some competitors claim that ALUA means that IBM Storage Virtualize is effectively
an active-passive cluster. This claim is not true. Both nodes in IBM Storage Virtualize can
and do service I/O concurrently.
In the small chance that an I/O goes to the non-preferred node, that node services the I/O
without issue.
The I/O queue can be controlled by using one of the following methods:
Host adapter-based
Memory and thread resources-based
Based on the number of commands that are outstanding for a device
For example, each IBM Storage Virtualize node has a queue depth of 10,000. A typical disk
drive operates efficiently at a queue depth of 8. Most host volume queue depth defaults are
approximately 32.
Guidance for limiting queue depths in large SANs that was described in previous
documentation was replaced with calculations for overall I/O group-based queue depth
considerations.
No set rule is available for setting a queue-depth value per host HBA or per volume. The
requirements for your environment are driven by the intensity of each workload.
Ensure that one application or host cannot use the entire controller queue. However, if you
have a specific host application that requires the lowest latency and highest throughput,
consider giving it a proportionally larger share than others.
The total workload capability can be calculated by multiplying the number of volumes by their
respective queue depths and summing the results. For example (hypothetical numbers), 250 volumes
with a queue depth of 4 each provide a concurrency of 1,000 outstanding I/Os. With low-latency
storage, a workload of over 1 million input/output operations per second (IOPS) can be achieved
on a single I/O group with a concurrency of 1,000.
For more information about queue depths, see the following IBM Documentation web pages:
FC hosts
iSCSI hosts
iSER hosts
Volumes that are mapped to that host cluster are assigned to all members of the host cluster
with the same SCSI ID. A typical use case is to define a host cluster that contains all the
WWPNs that belong to the hosts that are participating in a host operating system-based
cluster, such as IBM PowerHA®, Microsoft Cluster Server (MSCS), or VMware ESXi clusters.
rmhostcluster
rmvolumehostclustermap
Host clusters can be added by using the GUI. When you use the GUI, the system assigns the SCSI
IDs for the volumes (you also can assign them manually). For ease of management,
use separate ranges of SCSI IDs for hosts and host clusters.
For example, you can use SCSI IDs 0 - 99 for non-cluster host volumes, and greater than 100
for the cluster host volumes. When you choose the System Assign option, the system
automatically assigns the SCSI IDs starting from the first available in the sequence.
If you choose Self Assign, the system enables you to select the SCSI IDs manually for each
volume. On the right side of the window, the SCSI IDs that are used by the selected host or
host cluster are shown (see Figure 8-1 on page 536).
Note: Although extra care is always recommended when dealing with hosts,
IBM Storage Virtualize does not allow you to join a host into a host cluster if it includes a
volume mapping with a SCSI ID that also exists in the host cluster:
IBM_2145:ITSO-SVCLab:superuser>addhostclustermember -host ITSO_HOST3
ITSO_CLUSTER1
CMMVC9068E Hosts in the host cluster have conflicting SCSI IDs for their
private mappings.
IBM_2145:ITSO-SVCLab:superuser>
The host software uses several methods to implement host clusters. These methods require
sharing the volumes on IBM Storage Virtualize between hosts. To share storage between
hosts, the cluster must maintain control over accessing the volumes. Some clustering
software uses software locking methods.
You can choose other methods of control by directing the clustering software or the device
drivers to use the SCSI architecture reserve or release mechanisms. The multipathing
software can change the type of reserve that is used from an earlier reserve to persistent
reserve, or remove the reserve.
Persistent reserve refers to a set of SCSI-3 standard commands and command options that
provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation
policy with a specified target device. The functions that are provided by the persistent reserve
commands are a superset of the original reserve or release commands.
The persistent reserve commands are incompatible with the earlier reserve or release
mechanism. Also, target devices can support only reservations from the earlier mechanism or
the new mechanism. Attempting to mix persistent reserve commands with earlier reserve or
release commands results in the target device returning a reservation conflict error.
Earlier reserve and release mechanisms (SCSI-2) reserved the entire LUN (volume) for
exclusive use down a single path. This approach prevents access from any other host or even
access from the same host that uses a different host adapter. The persistent reserve design
establishes a method and interface through a reserve policy attribute for SCSI disks. This
design specifies the type of reservation (if any) that the operating system device driver
establishes before it accesses data on the disk.
The following possible values are supported for the reserve policy:
No_reserve: No reservations are used on the disk.
Single_path: Earlier reserve or release commands are used on the disk.
PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.
PR_shared: Persistent reservation is used to establish shared host access to the disk.
When a device is opened (for example, when the AIX varyonvg command opens the
underlying hdisks), the device driver checks the object data manager (ODM) for a
reserve_policy and a PR_key_value. Then, the driver opens the device. For persistent
reserve, each host that is attached to the shared disk must use a unique registration key
value.
Instances exist in which a host image mode migration appears to succeed; however,
problems occur when the volume is opened for read/write I/O. The problems can result from
not removing the reserve on the MDisk before image mode migration is used in
IBM Storage Virtualize.
You cannot clear a leftover reserve on an IBM Storage Virtualize MDisk from IBM Storage
Virtualize. You must clear the reserve by mapping the MDisk back to the owning host and
clearing it through host commands, or through back-end storage commands as advised by
IBM technical support.
For more information about configuring AIX hosts, see IBM Power Systems AIX hosts on IBM
Docs.
The default reserve policy is single_path (SCSI-2 reserve). Unless a specific need exists for
reservations, use no_reserve.
algorithm=shortest_queue
If coming from SDD PCM, AIX defaults to fail_over. You cannot set the algorithm to
shortest_queue unless the reservation policy is no_reserve:
queue_depth=32
The default for SDD PCM is 60. For AIX PCM, the default is 30. IBM recommends 30.
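As a minimal sketch, the hdisk attributes can be checked and changed with standard AIX commands. The hdisk name is hypothetical, and the -P flag defers the change until the next restart if the disk is in use:
lsattr -El hdisk2 -a reserve_policy -a algorithm -a queue_depth
chdev -l hdisk2 -a reserve_policy=no_reserve -a algorithm=shortest_queue -P
The queue_depth attribute can be adjusted with the same chdev command if your sizing requires it.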
For more information about configuration best practices, see AIX Multi Path Best Practices.
For more information about configuring VIOS hosts, see IBM Power Systems with Virtual I/O
Server on IBM Docs.
For more information, see The Recommended Multi-path Driver to use on IBM AIX and VIOS
When Attached to SVC and Storwize storage. Where VIOS SAN Boot or dual VIOS
configurations are required, see SSIC.
For more information about VIOS, see this IBM Virtual I/O Server overview.
Physical storage with attached disks (in this case, volumes on IBM Storage Virtualize) on the
VIOS partition can be shared by one or more client logical partitions (LPARs). These client
LPARs contain a VSCSI client adapter (SCSI initiator) that detects these virtual devices
(VSCSI targets) as standard SCSI-compliant devices and LUNs.
PV VSCSI hdisks are entire LUNs from the VIOS perspective. If you are concerned about the
failure of a VIOS and configured redundant VIOSs for that reason, you must use PV VSCSI
hdisks. An LV VSCSI hdisk cannot be served up from multiple VIOSs.
LV VSCSI hdisks are in Logical Volume Mirroring (LVM) volume groups on the VIOS and must
not span PVs in that volume group or be striped LVs. Because of these restrictions, use PV
VSCSI hdisks.
Each of these methods can result in different data formats on the disk. The preferred disk
identification method for volumes is to use UDIDs. For more information about how to
determine your disk IDs, see Identifying exportable disks in IBM Docs.
For more information about configuring Windows hosts, see Hosts that run the Microsoft
Windows Server operating system in IBM Docs.
Regarding disk timeout for Windows servers, change the disk I/O timeout value to 60 in the
Windows registry.
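The value in question is typically the TimeOutValue of the Disk service. The following sketch changes it from an elevated command prompt; verify the key for your Windows level and reboot or rescan as required:
reg add "HKLM\SYSTEM\CurrentControlSet\Services\Disk" /v TimeOutValue /t REG_DWORD /d 60 /f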
For more information about configuring Linux hosts, see Hosts that run the Linux operating
system in IBM Docs.
Best practice: The scsi_mod.inq_timeout should be set to 70. If this timeout is set
incorrectly, it can cause paths to not be rediscovered after a node is restarted.
For more information about this setting and other attachment requirements, see
Attachment requirements for hosts that are running the Linux operating system in IBM
Docs.
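A minimal sketch for checking and setting this parameter follows. Because scsi_mod is usually built into the kernel, the persistent setting is made on the kernel command line, and whether the sysfs value can be changed at run time depends on the distribution:
cat /sys/module/scsi_mod/parameters/inq_timeout
Add scsi_mod.inq_timeout=70 to the kernel boot parameters (for example, to GRUB_CMDLINE_LINUX in /etc/default/grub), regenerate the GRUB configuration, and restart the host.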
For more information about configuring Solaris hosts, see Oracle hosts in IBM Docs.
Note: The NMP does not support the Solaris operating system in a clustered-system
environment. For more information about your supported configuration, see SSIC.
In this command, -type specifies the type of host. Valid entries are hpux, tpgs, generic,
openvms, adminlun, and hide_secondary. The tpgs host type enables extra target port unit
attentions that are required by Solaris hosts.
If a suitable ASL is not installed, the volume manager does not claim the LUNs. Using ASL is
required to enable the special failover or failback multipathing that IBM Storage Virtualize
requires for error recovery.
To determine the basic configuration of a Symantec Veritas server, run the commands that
are shown in Example 8-7.
The commands that are shown in Example 8-8 and Example 8-9 determine whether
IBM Storage Virtualize is correctly connected. They also show which ASL is used: native
Dynamic Multi-Pathing (DMP), ASL, or SDD ASL.
Example 8-8 shows what you see when Symantec Volume Manager correctly accesses IBM
Storage Virtualize by using the SDD pass-through mode ASL.
Example 8-8 Symantec Volume Manager that uses SDD pass-through mode ASL
# vxdmpadm list enclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS
============================================================
OTHER_DISKS OTHER_DISKS OTHER_DISKS CONNECTED
VPATH_SANVC0 VPATH_SANVC 0200628002faXX00 CONNECTED
Example 8-9 shows what you see when IBM Storage Virtualize is configured by using native
DMP ASL.
Example 8-9 IBM Storage Virtualize that is configured by using native ASL
# vxdmpadm list enclosure all
ENCLR_NAME ENCLR_TYPE ENCLR_SNO STATUS
============================================================
OTHER_DISKS OTHER_DSKSI OTHER_DISKS CONNECTED
SAN_VC0 SAN_VC 0200628002faXX00 CONNECTED
For more information about the latest ASL levels to use native DMP, see the array-specific
module table that is available at this Veritas web page.
To check the installed Symantec Veritas version, run the following command:
showrev -p |grep vxvm
To check which IBM ASLs are configured in the volume manager, run the following command:
vxddladm listsupport |grep -i ibm
After you install a new ASL by using the pkgadd command, restart your system or run the
vxdctl enable command. To list the ASLs that are active, run the following command:
vxddladm listsupport
For more information about configuring HP-UX hosts, see HP 9000 and HP Integrity.
To use PVLinks while NMP is installed, ensure that NMP did not configure a vpath for the
specified volume.
For more information about a list of configuration maximums, see Multipathing configuration
maximums for HP 9000 and HP Integrity servers in IBM Docs.
For more information about configuring VMware hosts, see Hosts that run the VMware ESXi
operating system in IBM Docs.
To determine the various VMware ESXi levels that are supported, see the SSIC.
We recommend using VMware Tools on the virtual machines. Using a VMware plug-in, either the
new native plug-in or the plug-in that is provided through IBM Spectrum Connect, can make
management easier, for example by using the same volume name on VMware as on IBM FlashSystem.
For information about limits and a complete list of maximums, see VMware Configuration
Maximums.
For more information about active optimized and active non-optimized paths, see Active -
active capability in IBM Docs.
Note: Check whether your volumes are seen as flash on the ESXi server. In some
cases, VMware marks IBM FlashSystem volumes as hard disk drives (HDDs). As a best
practice, mark the volumes as flash before creating a datastore on them.
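Depending on the ESXi version, a volume can be marked as flash in the vSphere Client (Mark as Flash Disk) or with a SATP claim rule. The following esxcli sketch is an illustration only; the device ID is hypothetical, and the exact procedure should be verified for your ESXi level:
esxcli storage nmp satp rule add --satp=VMW_SATP_ALUA --device=naa.600507680000000000000000000000a5 --option="enable_ssd"
esxcli storage core claiming reclaim --device=naa.600507680000000000000000000000a5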
To use the IBM Block storage CSI driver, complete the following steps:
1. Create an array secret.
2. Create a storage class.
3. Create a Persistent Volume Claim (PVC) that is 1 Gb.
4. Display the PVC and the created PV.
5. Create a StatefulSet.
For more information about installing, configuring, and using CSI Block Driver, see IBM block
storage CSI driver in IBM Docs.
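As an illustration, the steps above might be driven with kubectl as follows. The namespace, names, credentials, and YAML file names are hypothetical, and the exact secret keys and storage class parameters are defined in the CSI driver documentation:
kubectl create secret generic fs5045-credentials -n kube-system --from-literal=management_address=192.168.70.10 --from-literal=username=csiadmin --from-literal=password=Passw0rd
kubectl apply -f storageclass.yaml
kubectl apply -f pvc-1gb.yaml
kubectl get pvc,pv
kubectl apply -f statefulset.yaml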
With Storage Virtualize 8.6, the Ethernet restrictions have been removed on the FS9500 and
IBM SAN Volume Controller SV3. All cages can be fully populated with two-port 25 Gb or
100 Gb Ethernet cards. For 100 Gb adapters, fully populating a cage means that the bandwidth
is oversubscribed: each adapter gets 128 Gb of its theoretical 200 Gb capability (as in the FS7300).
Note: In some situations you need maximum bandwidth, and other situations need maximum
ports. By easing the restrictions on the Ethernet adapters, we are allowing the customer to
make that choice. If maximum ports are selected, the bandwidth is oversubscribed.
Clustering / HyperSwap
Replication connectivity
Table 8-2 Maximum 100-GbE adapters per node and PCIe slot placement
System | Maximum Dual Port 100 GbE adapter count | Adapter slot placement per node for maximum port performance
Note: The 100 Gbps Ethernet adapter is limited to Peripheral Component Interconnect
Express (PCIe) adapters on this hardware.
As a best practice, attempt to balance I/O over all the ports as evenly as possible,
especially for NVMe over RDMA host attachment on IBM FlashSystem, because the PCIe
slots are oversubscribed. Calculate the expected performance for both the primary and the
failover model to avoid PCIe slot oversubscription.
Figure 8-2 on page 548 depicts the slots on an IBM Storage FlashSystem 7300 that can
contain Dual Port 100 GbE adapters.
Figure 8-2 Dual Port 100 GbE adapter placement on IBM FlashSystem Storage 7300
Note: On IBM FlashSystem 7300, when both ports of an adapter are used for NVMe over
RDMA, NVMe over TCP, or both, I/O operation types are limited to 100 Gbps per
adapter (not per port).
Figure 8-3 depicts the slots on an IBM Storage FlashSystem 9500 that can contain Dual Port
100 GbE adapters.
Figure 8-3 Dual Port 100 GbE adapter placement on IBM Storage FlashSystem 9500
Figure 8-4 shows the slots on an SVC SV3 node that can contain Dual Port 100 GbE
adapters.
Figure 8-4 Dual Port 100 GbE adapter placement on SAN Volume Controller node SV3
Note: When one or more Dual Port 100 GbE adapters are installed in IBM Storage
FlashSystem 9500 or SVC SV3 nodes, they should always occupy the lower numbered slot
in the adapter cage, and the other slot must not contain an adapter. For adapter cage 2,
slots 3 and 4 are for internal use only.
With a robust and reliable storage monitoring system, you can realize significant financial
savings and minimize pain in your operation by monitoring and predicting usage bottlenecks
in your virtualized storage environment.
It is also possible to use the data that is collected from monitoring to create strategies and
apply configurations to improve performance, tuning connections, and tools usability.
This chapter provides suggestions and the basic concepts about how to implement a storage
monitoring system for IBM Storage Virtualize by using its specific functions or external IBM
tools.
Use the views that are available in the management GUI to verify the status of the system, the
hardware devices, the physical storage, and the available volumes. Selecting Monitoring →
Events provides access to all problems that exist on the system. Select the Recommended
Actions filter to display the most important events that must be resolved.
If a service error code exists for the alert, you can run a fix procedure that helps you resolve
the problem. These fix procedures analyze the system and provide more information about
the problem. These actions also ensure that the required changes do not cause volumes to
be inaccessible to the hosts and automatically perform configuration changes that are
required to return the system to its optimum state.
If any interaction is required, fix procedures suggest actions to take and guide you through
those actions that automatically manage the system where necessary. If the problem is fixed,
the alert is excluded.
The system sends notifications to IBM Support either through an SMTP email server or, if
Call Home with cloud services is enabled, through a RESTful application programming interface
(API), as depicted in Figure 9-2 on page 553. Multiple email recipients can be added to
receive notifications from the storage system. You also can customize the type of information
that is sent to each recipient, as shown in Figure 9-1.
Note: With version 8.6, a cloud-based SMTP email server is supported. For security
reasons, an app-specific password might need to be set on the provider side for authorization.
Call Home with cloud services uses industry standards for transmitting data through web
services. With the introduction of this cloud-based delivery mechanism, IBM implemented a
method of sending messages to the IBM Call Home servers that is not affected by spam filters
or other technologies that prevent IBM from receiving the Call Home messages.
Cloud Call Home is a key building block that IBM will continue to enhance by providing more
predictive support. These features will not be available to clients that use email Call Home. Call
Home with cloud services is the preferred Call Home method for the IBM Storage Virtualize
products to ensure the most reliable end-to-end delivery mechanism. Call Home with
an active instance of email and cloud services is shown in Figure 9-2 on page 553.
Note: Web proxy servers can be configured either in the GUI or by using the CLI.
More details about Call Home can be found on the IBM Storage Virtualize Products Call
Home and Remote Support Overview website.
Starting with V8.6, the IBM Storage Virtualize Call Home functionality offers the option to
add a cloud-based email provider.
The management information base (MIB) file describes the format of the SNMP messages
that are sent by the system. Use this MIB file to configure a network management program to
receive SNMP event notifications that are sent from an IBM Storage Virtualize system. This
MIB file is suitable for use with SNMP messages from all versions of IBM Storage Virtualize.
For more information about the MIB file for the IBM Storage Virtualize system, see
Management Information Base file for SNMP.
Note: Starting with version 8.6, the MIB file can also be downloaded from the GUI.
With a valid configuration, relevant SNMP traps are sent to the SNMP management server.
Example 9-1 shows the log output for a Linux based SNMP management tool that is called
snmptrapd.
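As an illustration, an SNMP server can be added from the CLI with a command similar to the following one; the IP address and community string are hypothetical:
mksnmpserver -ip 192.168.50.20 -community public -error on -warning on -info off
Traps for error, warning, and informational events are then sent according to the flags that are selected.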
Note: If the -cadf on option is used, only create, update, removal, and cloud backup
activity syslog notifications from the Cloud account that are sent to the server are formatted
to the CADF standard. Formatting of other types of notifications does not change.
Figure 9-5 shows the new tabular layout from the IBM Storage Virtualize GUI. It is possible to
configure multiple syslog servers and display the configuration between IBM Storage
Virtualize and the syslog server from the syslog window.
Note: Starting with version 8.4, FQDN can be used for services such as Syslog,
Lightweight Directory Access Protocol (LDAP), and Network Time Protocol (NTP).
Example 9-2 shows that sample output of concise syslog events that are logged remotely to
rsyslogd on a Linux host.
Example 9-2 rsyslogd concise output showing audit, login, and authentication events
Apr 12 09:19:46 ITSO-cluster IBM4662[12643]: # timestamp = Tue Apr 12 09:19:46
2022 # cluster_user = superuser # source_panel = # target_panel = #
ssh_ip_address = a.b.c.d # result = success # res_obj_id = 0 # command = svctask #
action = mksyslogserver # action_cmd = mksyslogserver -audit on -error on
-facility 0 -gui -info on -ip 1.2.3.4 -login on -port 514 -protocol udp -warning
on
Apr 12 09:19:57 ITSO-cluster sshd[5106]: pam_ec_auth(sshd:auth): Username
superuser has linux user group: 1002
Apr 12 09:20:01 ITSO-cluster sshd[5106]: pam_ec_auth(sshd:auth): Accepted password
for superuser from a.b.c.d service sshd
Apr 12 09:20:01 ITSO-cluster sshd[5098]: Accepted keyboard-interactive/pam for
superuser from a.b.c.d port 39842 ssh2
Apr 12 09:20:02 ITSO-cluster sshd[5098]: pam_unix(sshd:session): session opened
for user superuser by (uid=0)
Storage pool
On a storage pool level, an integer defines a threshold at which a warning is generated. The
warning is generated the first time that the threshold is exceeded by the used-disk capacity in
the storage pool. The threshold can be specified with a percentage (see Figure 9-6) or size
(see Example 9-3) value.
VDisk
At the VDisk level, a warning is generated when the used disk capacity on the
thin-provisioned or compressed copy first exceeds the specified threshold. The threshold can
be specified with a percentage (see Figure 9-7) or size (see Example 9-4 on page 556) value.
Note: You can specify a disk_size integer, which defaults to megabytes (MB) unless the
-unit parameter is specified, or you can specify disk_size%, which is a percentage of the
volume size. If both copies are thin-provisioned and the -copy parameter is not specified,
the specified -warning parameter is set on both copies. To disable warnings, specify 0 or
0%. The default value is 0. This option is not valid for thin or compressed volumes in a data
reduction pool (DRP).
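As a minimal sketch, the warning thresholds that are described above might be set from the CLI as follows; the pool and volume names and the percentages are hypothetical:
chmdiskgrp -warning 80% Pool0
chvdisk -warning 85% volume01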
The next sections describe the performance analysis tools that are integrated with IBM
Storage Virtualize systems. Also described are the IBM external tools that are available to
collect performance statistics to allow historical retention.
Performance statistics are useful to debug or prevent some potential bottlenecks, and to
make capacity planning for future growth easier.
You can use system statistics to monitor the aggregate workload of all the volumes,
interfaces, and managed disks (MDisks) that are used on your system. The workload can be
displayed in megabytes per second (MBps) or input/output operations per second (IOPS).
Additionally, read/write latency metrics can be displayed for volumes and MDisks.
You can also monitor the overall CPU usage for the system. These statistics also summarize
the recent performance health of the system in almost real time.
You can monitor changes to stable values or differences between related statistics, such as
the latency between volumes and MDisks. Then, these differences can be further evaluated
by performance diagnostic tools.
With system-level statistics, you also can view quickly the aggregate bandwidth of volumes,
interfaces, and MDisks. Each of these graphs displays the current bandwidth in megabytes
per second and a view of bandwidth over time.
Each data point can be accessed to determine its individual bandwidth usage and evaluate
whether a specific data point might represent performance impacts. For example, you can
monitor the interfaces such as for Fibre Channel (FC), internet Small Computer System
Interface (iSCSI), Serial Attached SCSI (SAS) or IP Replication to determine whether the
interface rate is different from the expected rate.
You can also select node-level statistics, which can help you determine the performance
impact of a specific node. As with system statistics, node statistics help you to evaluate
whether the node is operating within a normal range of performance metrics.
The CPU utilization graph shows the current percentage of CPU usage and specific data
points on the graph that show peaks in utilization. If compression is being used, you can
monitor the amount of CPU resources that are being used for compression and the amount
that is available to the rest of the system. The Compression CPU utilization chart is not
relevant for DRP compression.
The Volumes and MDisks graphs on the Performance window show four metrics: Read,
Write, Read latency, and Write latency. You can use these metrics to help determine the
overall performance health of the volumes and MDisks on your system. Consistent
unexpected results can indicate errors in configuration, system faults, connectivity issues, or
workload specific behavior.
Each graph represents 5 minutes of collected statistics, which are updated every 5 seconds.
They also provide a means of assessing the overall performance of your system, as shown in
Figure 9-8.
Note: Starting with code level 8.5, the latency metrics in the Monitoring view and the
Dashboard view switch dynamically between milliseconds (ms) and microseconds (µs) and
the graph scales as needed. This function aids in monitoring submillisecond response
times of highly performant systems.
You can select the workload metric that you want to be displayed, as shown in
Figure 9-9.
You can also obtain a quick overview of the performance in the GUI dashboard: System →
Dashboard, as shown in Figure 9-10.
Command-line interface
The lsnodestats, lsnodecanisterstats, and lssystemstats commands continue to report
latency in milliseconds only. The latency was reported as an integer value before code level
8.5, but now it is reported with three decimal places of granularity. This format makes it
possible to monitor variations in response time that are less than 1 ms, as shown in
Example 9-5.
Example 9-5 Latency reported in milliseconds (ms) with microsecond (µs) granularity
IBM_FlashSystem:FS9xxx:superuser>lssystemstats -history drive_w_ms|grep -E 2306220704
230622070402 drive_w_ms 0.000
230622070407 drive_w_ms 0.000
230622070412 drive_w_ms 0.000
230622070417 drive_w_ms 0.000
230622070422 drive_w_ms 0.066
230622070427 drive_w_ms 0.000
230622070432 drive_w_ms 0.000
230622070437 drive_w_ms 0.000
230622070442 drive_w_ms 0.000
230622070447 drive_w_ms 0.000
230622070452 drive_w_ms 0.000
230622070457 drive_w_ms 0.000
A REST API client can be used to query the REST API. The results are returned
to the console environment of the REST API client, as shown in Example 9-6.
An OpenAPI explorer guide is embedded within the IBM Storage Virtualize product and can
be reached by using the following address: https://<system-ip>:7443/rest/explorer/.
The REST API explorer can retrieve the X-Auth-Token for the provided credentials (see
Figure 9-11) and run a REST API query and display the results in a browser (see Figure 9-12
on page 561).
Figure 9-11 Authentication in REST API Explorer: Token displayed in the Response body
Figure 9-12 The lsnodestats command for node ID 1 (fc_mb) with JSON results in response body
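Outside the explorer, the same flow can be scripted. The following curl sketch is an illustration only; the system IP address and credentials are hypothetical, and the endpoint paths and token handling should be verified against the REST API documentation for your code level:
curl -k -X POST https://<system-ip>:7443/rest/auth -H "X-Auth-Username: superuser" -H "X-Auth-Password: passw0rd"
curl -k -X POST https://<system-ip>:7443/rest/lssystemstats -H "X-Auth-Token: <token from the previous response>"
The first call returns a JSON body that contains the token, which is then passed in the X-Auth-Token header of subsequent command calls.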
Unlike previous versions, where you were required to download the necessary log files from
the system and upload them to the STAT tool, from version 8.3.1 onwards, the system
continually reports the Easy Tier information, so the GUI always displays the most up-to-date
information.
Note: If the system or Easy Tier was running for less than 24 hours, no data might be
available to display.
The Reports window features the following views that can be accessed by using the tabs at
the top of the window, which are described next:
Data Movement
Tier Composition
Workload Skew Comparison
The report breaks down the type of movement, which is described in terms of the internal
Easy Tier extent movement types (see 4.7.2, “Easy Tier definitions” on page 297).
To aid your understanding and remind you of the definitions, click Movement Description to
view the information window (see Figure 9-14 on page 563).
Important: If you regularly see “warm demote” in the movement data, consider
increasing the amount of hot tier that is available. A warm demote suggests that an extent
is hot, but the hot tier did not have enough capacity or Overload Protection was triggered in the hot tier.
The Tier Composition window (see Figure 9-15) shows how much data in each tier is active
versus inactive. In an ideal case, most of your active data is in the hot tier alone. In most
cases, the active data set cannot fit in only the hot tier; therefore, expect to also see active
data in the middle tier. Here we can see that most of the data in the middle tier is inactive or
the workload does not meet the criteria for Easy Tier optimization.
If all active data can fit in the hot tier, you see the best possible performance from the system.
Active large is data that is active but is being accessed at block sizes larger than the 64 KiB
for which Easy Tier is optimized. This data is still monitored and can contribute to “expanded
cold demote” operations.
The presence of any active data in the cold tier (regularly) suggests that you must increase
the capacity or performance in the hot or middle tiers.
In the same way as with the Data Movement window, you can click Composition
Description to view the information for each composition type (see Figure 9-16).
Tip: The skew can be viewed when the system is in measuring mode with a single tier pool
to help guide the recommended capacity to purchase that can be added to the pool in a hot
tier.
A highly skewed workload (the line on the graph rises sharply within the first percentage of
capacity) means that a smaller proportional capacity of hot tier is required. A low-skewed
workload (the line on the graph rises slowly and covers a large percentage of the capacity)
requires more hot tier capacity; consider a well-performing middle tier when you cannot
configure enough hot tier capacity (see Figure 9-17).
In the first example that is shown in Figure 9-17, you can clearly see that this workload is
highly skewed. This single-tier pool uses less than 5% of the capacity, but is performing 99%
of the workload in terms of IOPS and MBps.
This result is a prime example for adding a small amount of faster storage to create a “hot” tier
and improve overall pool performance (see Figure 9-18).
In this second example that is shown in Figure 9-18, the system is configured as a multitier
pool, and Easy Tier optimized the data placement for some time. This workload is less
skewed than in the first example, with almost 20% of the capacity performing up to 99% of the
workload.
Here again, it might be worth considering increasing the amount of capacity in the top tier
because approximately 10% of the IOPS workload is coming from the middle tier and can be
optimized to reduce latency.
The graph that is shown in Figure 9-18 also shows the split between IOPS and MBps.
Although the middle tier is not handling much of the IOPS workload, it is providing a
reasonably large proportion of the MBps workload.
In these cases, ensure that the middle tier can manage good large block throughput. A case
might be made for further improving performance by adding some higher throughput devices
as a new middle tier, and demoting the current middle tier to the cold tier; however, this
change depends on the types of storage that is used to provide the existing tiers.
Any new configuration with three tiers must comply with the configuration rules regarding the
different types of storage that is supported in three-tier configurations (see “Easy Tier mapping
to MDisk tier types” on page 305).
If you implemented a new system and see that most of the workload is coming from a middle
or cold tier, it might take only a day or two for Easy Tier to complete the migrations after it
initially analyzes the system.
If after a few days a distinct bias still exists to the lower tiers, you might want to consider
enabling “Accelerated Mode” for a week or so; however, disable this mode after the system
reaches a steady state. For more information, see “Easy Tier acceleration” on page 315.
IBM Spectrum Control also provides more granular collection of performance data with
1-minute intervals rather than the 5-minute intervals in IBM Storage Insights or IBM Storage
Insights Pro. For more information about IBM Storage Insights, see 9.2.3, “Performance
monitoring with IBM Storage Insights” on page 571.
Because IBM Spectrum Control is an on-premises tool, it does not send the metadata about
monitored devices off-site, which is ideal for dark shops and sites that do not want to open
ports to the cloud.
For more information about the capabilities of IBM Spectrum Control, see this Product
overview.
For more information about pricing and other purchasing information, see IBM Spectrum
Control.
Note: If you use IBM Spectrum Control or manage IBM block storage systems, you can
access the no-charge version of IBM Storage Insights. For more information, see Getting
Started with IBM Storage Insights.
IBM Spectrum Control offers several reports that you can use to monitor IBM Storage
Virtualize 8.6 systems to identify performance problems. IBM Spectrum Control provides
improvements to the web-based user interface that is designed to offer easy access to your
storage environment.
IBM Spectrum Control provides a large amount of detailed information about IBM Storage
Virtualize 8.6 systems. This section provides some basic suggestions about which metrics
must be monitored and analyzed to debug potential bottleneck problems. It also covers
alerting profiles and their thresholds that are considered important for detecting and resolving
performance issues.
Note: IBM Spectrum Control 5.3.x has reached end of support. Version 5.4.0 or later is
recommended for monitoring IBM Storage Virtualize systems.
The key performance indicators GUI of IBM Spectrum Control (see Figure 9-20 on page 569)
displays by default the last 24 hours from the active viewing time and date. Selecting an
individual element from the chart overlays the corresponding 24 hours for the previous day
and seven days before. This display allows for an immediate historical comparison of the
respective metric. The day of reference also can be changed to allow historical comparison of
previous days.
The yellow lines that are shown in Figure 9-20 represent guidelines that were established at
the levels that allow for a diverse set of workload characteristics while maintaining a stable
performance profile. The other lines on each chart represent the measured values for the
metric for the resources on your storage system: I/O groups, ports, or nodes.
You can use the lines to compare how close your resources are to potentially becoming
overloaded. If your storage system is responding poorly and the charts indicate overloaded
resources, you might need to better balance the workload. You can balance the workload
between the hardware of the cluster by adding hardware to the cluster or moving some
workload to other storage systems.
The charts that are shown in Figure 9-20 show the hourly performance data that is measured
for each resource on the selected day. Use the following charts to compare the workloads on
your storage system with the following key performance indicators:
Node Utilization Percentage by Node
The average of the bandwidth percentages of those ports in the node that are actively
used for host and MDisk send and receive operations. The average is weighted by port
speed and adjusted according to the technology limitations of the node hardware. This
chart is empty for clusters without FC ports (or when no host I/O is going on). Compare
the guideline value for this metric, for example, 60% utilization, with the measured value
from your system.
Overall Port Bandwidth Percentage by Port
The percentage of the port bandwidth that is used for receive and send operations. This
value is an indicator of port bandwidth usage that is based on the speed of the port. The
guideline value is 50%. Compare the guideline value for this metric with the values that are
measured for the switch ports. A cluster can have many ports. The chart shows only the
eight ports with the highest average bandwidth.
Port-to-Local Node Send Response Time by Node
The average number of milliseconds to complete a send operation to another node that is
in the local cluster. This value represents the external response time of the transfers.
Compare the guideline value for this metric, for example, 0.6 ms/op, with the measured
value from your system.
Note: System CPU Utilization by Node was removed from this view and replaced by Max
Cache Fullness by Pool. Additionally, either Zero Buffer Credit Percentage by Node or Port
Send Delay Time (not both) are shown depending on the model of the system.
Figure 9-21 on page 570 shows an example of the Write Response Time by I/O Group, which
exceeded the best practice limit (yellow line). The drop-down menu provides further options.
Note: The guidelines are not strict thresholds. They are derived from real field experience
for many configurations and workloads. When appropriate, these guidelines can be
adopted as alert thresholds within an alert policy.
By using the IBM Cloud infrastructure, IBM Support can monitor your storage environment to
help minimize the time to resolution of problems and collect diagnostic packages without
requiring you to manually upload them. This support experience, from environment to
instance, is unique to IBM Storage Insights and transforms how and when you get help.
IBM Storage Insights is a software as a service (SaaS) offering with its core running over
IBM Cloud. IBM Storage Insights provides an unparalleled level of visibility across your
storage environment to help you manage complex storage infrastructures and make
cost-saving decisions. IBM Storage Insights combines proven IBM data management
leadership with IBM analytics leadership from IBM Research and a rich history of storage
management expertise with a cloud delivery model, enabling you to take control of your
storage environment.
As a cloud-based service, IBM Storage Insights enables you to deploy quickly and save
storage administration time while optimizing your storage. IBM Storage Insights also helps
automate aspects of the support process to enable faster resolution of issues. IBM Storage
Insights optimizes storage infrastructure by using cloud-based storage management and a
support platform with predictive analytics.
With IBM Storage Insights, you can optimize performance and tier your data and storage
systems for the right combination of speed, capacity, and economy. IBM Storage Insights
provides comprehensive storage management and helps to keep costs low, and might
prevent downtime and loss of data or revenue.
Note: As a best practice, use IBM Storage Insights or IBM Spectrum Control for a better
user experience.
The capacity-based, subscription version is called IBM Storage Insights Pro and includes all
the features of IBM Storage Insights plus a more comprehensive view of the performance,
capacity, and health of storage resources. This includes key health, performance, and
diagnostic information for switches and fabrics. It also helps you reduce storage costs and
optimize your data center by providing features like intelligent capacity planning, storage
reclamation, storage tiering, and advanced performance metrics. The storage systems that
you can monitor are expanded to include IBM file, object, software-defined storage (SDS)
systems, and non-IBM block and file storage systems, such as Dell EMC storage systems.
In both versions, when problems occur on your storage, you can get help to identify and
resolve those problems and minimize potential downtime, where and when you need it.
A table that compares all features is documented in IBM Documentation - Storage
Insights vs Storage Insights Pro.
As an on-premises application, IBM Spectrum Control does not send the metadata about
monitored devices off-site, which is ideal for dark shops and sites that do not want to open
ports to the cloud. However, if your organization allows for communication between its
network and the cloud, you can use IBM Storage Insights for IBM Spectrum Control to
transform your support experience for IBM block storage.
IBM Storage Insights for IBM Spectrum Control and IBM Spectrum Control work together to
monitor your storage environment. Here is how IBM Storage Insights for IBM Spectrum
Control can transform your monitoring and support experience:
Open, update, and track IBM Support tickets easily for your IBM block storage devices.
Get hassle-free log collection by allowing IBM Support to collect diagnostic packages for
devices so that you do not have to.
Use Call Home to monitor devices, get best practice recommendations, and filter events to
quickly isolate trouble spots.
Enable IBM Support to view the current and historical performance of your storage systems
and help reduce the time-to-resolution of problems.
You can use IBM Storage Insights for IBM Spectrum Control if you have an active license with
a current subscription and a support agreement for an IBM Spectrum Control license. If your
subscription and support lapses, you are no longer eligible for IBM Storage Insights for
IBM Spectrum Control. To continue using IBM Storage Insights for IBM Spectrum Control,
renew your IBM Spectrum Control license. You also can choose to subscribe to IBM Storage
Insights Pro.
To compare the features of IBM Spectrum Control and IBM Storage Insights for IBM
Spectrum Control, check out the Feature comparison.
You can upgrade IBM Storage Insights to IBM Storage Insights for IBM Spectrum Control if
you have an active license for IBM Spectrum Control. For more information, see IBM Storage
Insights Registration, choose the option for IBM Spectrum Control, and follow the prompts.
IBM Storage Insights for IBM Spectrum Control does not include the service-level agreement
(SLA) for IBM Storage Insights Pro. Terms and conditions for IBM Storage Insights for
IBM Spectrum Control are available at Cloud Services Terms.
IBM Storage Insights, IBM Storage Insights Pro, and IBM Storage Insights for IBM Spectrum
Control show some similarities, but the following differences exist:
IBM Storage Insights is an off-premises IBM Cloud service that is available at no extra
charge if you own IBM block storage systems. It provides a unified dashboard for IBM
block storage systems and switches and fabrics with a diagnostic events feed, a
streamlined support experience, and key capacity and performance information.
IBM Storage Insights Pro is an off-premises IBM Cloud service that is available on
subscription and expands the capabilities of IBM Storage Insights. You can monitor IBM
file, object, and SDS systems, and non-IBM block and file storage systems, such as Dell EMC
storage systems, and IBM and non-IBM switches and fabrics.
IBM Storage Insights Pro also includes configurable alerts and predictive analytics that
help you to reduce costs, plan capacity, and detect and investigate performance issues.
You get recommendations for reclaiming unused storage, recommendations for optimizing
the placement of tiered data, capacity planning analytics, and performance
troubleshooting tools.
IBM Storage Insights for IBM Spectrum Control is similar to IBM Storage Insights Pro in
capability, and it is available for no additional cost if you have an active license with a
current subscription and support agreement for IBM Virtual Storage Center, IBM Spectrum
Storage Suite, or any edition of IBM Spectrum Control.
IBM Spectrum Scale: Advanced storage management of unstructured data for cloud, big
data, analytics, objects, and more.
IBM Cloud Object Storage: Flexible, scalable, and simple object storage with
geo-dispersed enterprise availability and security for hybrid cloud workloads.
IBM Spectrum Discover: Modern artificial intelligence (AI) workflow and metadata
management software for exabyte-scale file and object storage with hybrid multicloud
support.
Because IBM Spectrum Storage Suite contains IBM Spectrum Control, you can deploy
IBM Storage Insights for IBM Spectrum Control.
Note: Alerts are a good way to be notified about conditions and potential problems that are
detected in your storage. If you use IBM Spectrum Control and IBM Storage Insights for
IBM Spectrum Control together to enhance your monitoring capabilities, define alerts in
only one of the offerings as a best practice.
By defining all your alerts in one offering, you can avoid receiving duplicate or conflicting
notifications when alert conditions are detected.
Sign-up process
Consider the following points about the sign-up process:
For the sign-up process, you need an IBMid. If you do not have an IBMid, create your IBM
account and complete the short form.
When you register, specify an owner for IBM Storage Insights. The owner manages
access for other users and acts as the main contact.
You receive a Welcome email when IBM Storage Insights is ready. The email contains a
direct link to your dashboard.
Figure 9-23 IBM Storage Insights or IBM Storage Insights for IBM Spectrum Control registration
options
2. Figure 9-24 shows the Log-in window in the registration process. If you have your
credentials, enter your IBMid and proceed to the next window by clicking Continue. If you
do not have an IBMid, click Create an IBMid.
3. If you want to create an IBMid, see Figure 9-25 for reference. Provide the following
information and click Next:
– Email
– First name
– Last name
– Country or region
– Password
Enter the one-time code that was sent to your email address.
Select the email checkbox if you want to receive information from IBM to keep you
informed about products, services, and offerings. You can withdraw your marketing
consent at any time by sending an email to netsupp@us.ibm.com. Also, you can
unsubscribe from receiving marketing emails by clicking the unsubscribe link in any email.
For more information about our processing, see the IBM Privacy Statement.
Click Create account.
4. In the next window, sign in with your IBM Account and password and review the summary
information about your IBMid account privacy, as shown in Figure 9-26 on page 577.
5. Complete the following information in the IBM Storage Insights registration form (see
Figure 9-27). The following items are mandatory:
– IBM Storage Insights service name (must be unique)
– IBMid
– First name and last name
The Welcome window guides you on the next steps, as shown in Figure 9-28.
Note: The IBM Storage Insights URL is contained in the welcome email. If you have not
received the welcome email, log in to IBM Support and open a ticket.
The Deployment Planning window provides guidance about the list of supported operating
systems for the data collectors. Data collectors are lightweight applications that are deployed
on servers or virtual machines in your data centers. Data collectors collect capacity,
configuration, and performance metadata about your monitored devices and send the
metadata for analysis over HTTPS connections to your IBM Storage Insights service. The
window also describes the network security requirements and the requirements for proxy
configuration. Figure 9-29 on page 578 shows the Deployment Planning window.
Example 9-7 shows the command to add the IBM Storage Insights service to the Cloud Call
Home function.
After the connection of the embedded data collector is established successfully, an
informational event is displayed in IBM Storage Insights, as shown in Figure 9-30.
Figure 9-30 IBM Storage Insights info event after the si_tenant_id was added to Cloud Call Home
To complete the process, the cloud service device that triggered the request to send data to
the IBM Storage Insights instance must be approved, as shown in Figure 9-31.
Figure 9-31 Storage Insights - Add Call Home with cloud service device
Example 9-8 shows the error message that is logged if the IBM Storage Insights instance ID
is added to the Cloud Call Home function on a system with 64 GB of memory.
Restriction: You cannot monitor third-party, IBM FlashSystem A9000, IBM XIV, and
IBM Spectrum Accelerate devices. To install and run the data collector, log in to the server
as root.
collectors are tested again, and the data collectors with the best response times take over
the collection of metadata.
Monitoring storage devices in multiple data centers
To avoid high network latency and avoid interruptions in the collection of metadata when
you monitor storage devices in data centers in different locations, install two or more data
collectors on separate servers in each data center.
Example: You install data collectors in your Washington and Chicago data centers and
both data centers are connected over the network. If the data collectors in your
Washington data center go offline, then the data collectors in your Chicago data center
take over the collection of your metadata for both data centers.
Note: Make your metadata collection more robust by installing more data collectors.
IBM Storage Insights can automatically switch the collection of metadata to other data
collectors if the metadata collection is interrupted.
IBM Support monitors the collection of metadata and makes you aware of any issues.
Recurring issues with collecting metadata might indicate that you must deploy more data
collectors to share the metadata collection workload.
In large environments, ensure that the servers that are used to install and run the data
collectors have 4 GB of available RAM and 4 GB of available drive capacity.
Proxy servers
When you install the data collector, you can connect the data collector through a proxy server.
To connect to the proxy server, the host name and port number are needed. For a connection
through a secure proxy server, user name and password credentials are needed as well.
2. Accept the license agreement for the data collector, as shown in Figure 9-33.
3. Follow the guidance in the Deploy Data Collectors window to download and deploy the
data collector, as shown in Figure 9-34.
Figure 9-34 Downloading the data collector in preparation for its installation
4. Install the data collector according to the provided instructions. Figure 9-35 on page 583
shows the installation of the data collector on a Linux host.
5. After the data collector is installed and communication is established, the IBM Storage
Insights Dashboard view starts with an Add Storage System prompt, as shown in
Figure 9-36.
Note: If you have multiple geographically dispersed data centers and plan to install data
collectors in each data center, the association between storage systems and data collector
can be modified after multiple data collectors are available.
Operations dashboard
You can use the Operations dashboard to identify which block storage systems or fabrics in
your inventory need attention, such as the ones with error or warning conditions. You can
manage your storage and fabrics in the Operations dashboard by using key insights and
analysis about health, capacity, and performance.
To view the Operations dashboard, select Dashboards → Operations. With IBM Storage
Insights, you get the information that you need to monitor the health of your block storage
environment and fabrics on the Operations dashboard.
You can click a storage system in the list to get an overview of the health of the storage
system components or resources, key capacity metrics, including compression savings, and
key performance metrics. You can open the GUI for the storage system from the Component
Health overview.
You can view more details about the storage system and components from the overview
(IBM Storage Insights Pro only):
Notifications details and actions that you can take to manage events.
Tickets details and actions that you can take to manage tickets.
Properties details, including editable name, location, and custom tag fields, and support
information.
Inventory of nodes and enclosures for an SVC storage system, including support
information. (IBM Storage Insights Pro only).
Data collection details, such as the status of the data collection, when the most recent
data collection occurred, and a list of the available data collectors.
To view the NOC dashboard, select Dashboards → NOC. You can display it on a dedicated
monitor in your network operations center so that you can monitor storage system changes at
a glance.
The block storage systems that are being monitored are displayed in tiles or rows on the
dashboard. Call Home must be enabled on the storage systems that are monitored.
Use the Tile view to quickly access essential information about your storage systems,
including the overall condition. The overall condition is determined by the most critical status
that was detected for the storage system's internal resources. Storage systems with error
conditions are displayed at the top of the dashboard, followed by storage systems with
warning conditions.
On each tile, a snapshot of performance and capacity is displayed. Click the tile to view the
following information:
Overview of the health of the storage system components or resources, key capacity
metrics including compression savings, and key performance metrics. You can open the
GUI for the storage system from the Component Health overview.
(IBM Storage Insights Pro only) You can view more details about the storage system and
components from the overview.
Notifications details and actions that you can take to manage events.
Tickets details and actions that you can take to manage tickets.
Properties details, including editable name, location, and custom tag fields, and support
information.
(IBM Storage Insights Pro only) Inventory of nodes and enclosures for IBM Storage
Virtualize systems, including support information, if available.
This view is useful because it can be sorted and filtered based on the selected column. For
example, Figure 9-39 shows Block Storage Systems that are sorted in ascending order by
IBM Storage Virtualize code level.
Stay informed so that you can act quickly to resolve incidents before they affect critical
storage operations.
An extra benefit is that the Call Home data is screened against a set of rules to identify
misconfiguration, deviations from best practices, and IBM Support Flashes that are
applicable. The results are displayed in IBM Storage Insights.
To see these best practices, select Insights → Advisor, as shown in Figure 9-40.
More benefits of the Advisor can be found in IBM Documentation: Monitoring recommended
actions in the Advisor.
This section is divided into three parts to describe capacity monitoring by using any of the
following interfaces:
The management GUI
IBM Spectrum Control
IBM Storage Insights
This section describes the key capacity metrics of the IBM Storage Virtualize management
GUI, IBM Spectrum Control (based on version 5.4.6), and IBM Storage Insights.
Figure 9-41 shows how to interpret the capacity and savings in a storage environment.
Dashboard
The Capacity section on the Dashboard provides an overall view of system capacity. This
section displays usable capacity, used/stored capacity, and capacity savings.
Usable capacity (see Figure 9-43) visualizes the amount of physical capacity that is
available for storing data on a system, pool, array, or MDisk after formatting and RAID
techniques are applied. The pie graph is divided into three categories: Used/Stored
Capacity, Available Capacity, and Total Capacity.
Note: If data reduction is used at two layers (data reduction pools with compressed
volumes and FlashCore Modules), it is recommended to allocate the physical storage 1:1
(physical capacity instead of effective capacity) to prevent an out-of-space scenario.
Used/Stored Capacity is the amount of usable capacity that is taken up by data or overhead
capacity in a system, pool, array, or MDisk after data reduction techniques have been applied.
The Available Capacity is the amount of usable capacity that is not yet used in a system,
pool, array, or MDisk.
Total capacity is the amount of total physical capacity of all standard-provisioned and
thin-provisioned storage that is managed by the storage system. The value is rounded to two
decimal places.
To determine the usable capacity on your system through the command-line interface, several
parameter values are used from the lssystem command to calculate Used/Stored Capacity,
Available Capacity, and Total Capacity.
To get the Used/Stored Capacity value through the command-line interface, subtract
physical_free_capacity and total_reclaimable_capacity from physical_capacity. Example 9-9
shows how to filter these values from the lssystem output.
To get the Available Capacity value through the command-line interface, subtract the
Used/Stored Capacity from physical_capacity. Example 9-10 shows how to filter these values
from the lssystem output.
To get the Total Capacity value through the command-line interface, filter the
physical_capacity value from the lssystem output, as shown in Example 9-11.
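As a minimal illustration of this arithmetic, the following Python sketch calculates the three
values from hypothetical lssystem field values (the numbers are placeholders, not output from
a real system):

# Hypothetical values in GiB, taken from the corresponding lssystem fields
physical_capacity = 1000.0            # lssystem: physical_capacity
physical_free_capacity = 400.0        # lssystem: physical_free_capacity
total_reclaimable_capacity = 50.0     # lssystem: total_reclaimable_capacity

# Used/Stored Capacity: physical capacity minus free and reclaimable capacity
used_stored_capacity = (physical_capacity - physical_free_capacity
                        - total_reclaimable_capacity)

# Available Capacity: usable capacity that is not yet used
available_capacity = physical_capacity - used_stored_capacity

# Total Capacity: the physical capacity of the system
total_capacity = physical_capacity

print(f"Used/Stored Capacity: {used_stored_capacity} GiB")
print(f"Available Capacity:   {available_capacity} GiB")
print(f"Total Capacity:       {total_capacity} GiB")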
The Capacity Savings view (Figure 9-44 on page 591) shows the amount of capacity that is
saved on the system by using compression, deduplication, and thin-provisioning.
Total Provisioned shows the total capacity of all volumes provisioned in the system.
Data Reduction shows the ratio of written capacity to stored capacity, where the written data
has been compressed, deduplicated, or both.
Total Savings shows the ratio of provisioned capacity to stored capacity, which includes the
capacity savings achieved through compression, deduplication and thin-provisioning.
Deduplication indicates the total capacity savings that the system achieves from all
deduplicated volumes. Thin-Provisioning displays the total capacity savings for all
thin-provisioned volumes on the system. You can view all the volumes that use each of these
technologies. Different system models can have more requirements to use compression or
deduplication. Verify all system requirements before these functions are used.
Example 9-12 shows the deduplication and compression savings and the used capacity
before and after reduction in the CLI.
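The ratios that are shown in this view can be reproduced with simple arithmetic. The following
Python sketch uses hypothetical capacity values (not taken from a real system) to show how
the Data Reduction and Total Savings ratios are derived:

# Hypothetical capacities in GiB
provisioned_capacity = 500.0   # Total Provisioned: capacity of all provisioned volumes
written_capacity = 200.0       # data written by hosts before data reduction
stored_capacity = 100.0        # data stored after compression and deduplication

data_reduction_ratio = written_capacity / stored_capacity      # ratio of written to stored
total_savings_ratio = provisioned_capacity / stored_capacity   # ratio of provisioned to stored

print(f"Data Reduction: {data_reduction_ratio:.1f}:1")   # 2.0:1
print(f"Total Savings:  {total_savings_ratio:.1f}:1")    # 5.0:1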
Similar to the system view, this view shows the relevant capacity metrics for a specific pool.
For example, this pool is using multiple data reduction features and shows the capacity
savings for each of them.
Figure 9-45 Sidebar > Pools > Properties > Properties for Pool
For example, Figure 9-46 shows that with a usable capacity of 679.10 TiB, this compressed
array could store up to 1.99 PiB of addressable logical data (Written Capacity Limit).
Figure 9-46 Sidebar > Pools > MDisks by Pools > Properties > More details
The default value is 100%, which helps to reduce Easy Tier swapping of less compressible
extents from non-compressing arrays with highly compressible extents on compressing
arrays. This approach reduces the likelihood of Easy Tier causing the compressing array to
run out of physical space.
Note: As of code level 8.5.0, a user is prevented from creating two compressing arrays in
the same pool. lsarrayrecommendation does not recommend a second compressing array
in a pool that already contains one. Supported systems with two or more compressing
arrays in the same pool that were created on a pre-8.5.0 code level are allowed to upgrade
to code level 8.5.0.
The Capacity chart (see Figure 9-48) of IBM Spectrum Control at the top of the Overview
page (select IBM Spectrum Control GUI → Storage → Block Storage Systems, and then
double-click the device) shows how much capacity is used and how much capacity is
available for storing data.
The Provisioned Capacity chart shows the written capacity values in relation to the total
provisioned capacity values before data reduction techniques are applied. The following
values are shown:
The capacity of the data that is written to the volumes as a percentage of the total
provisioned capacity of the volumes.
The amount of capacity that is still available for writing data to the thin-provisioned
volumes in relation to the total provisioned capacity of the volumes. Available capacity is
the difference between the provisioned capacity and the written capacity, which is the
thin-provisioning savings.
A breakdown of the total capacity savings that are achieved when the written capacity is
stored on the thin-provisioned volumes is also provided.
In the Capacity Overview chart, a horizontal bar is shown when a capacity limit is set for the
storage system. Hover your cursor over the chart to see what the capacity limit is and how
much capacity is left before the capacity limit is reached.
For a breakdown of the capacity usage by pool or volume, click the links (see Figure 9-48 on
page 595 and Figure 9-49 on page 595).
2. You can also click View Capacity on the Actions menu (see Figure 9-51) of each device.
3. To open the Capacity view in IBM Storage Insights, use the Resources menu. The rest of
the sequence is similar to IBM Spectrum Control with some minor cosmetic differences
(see Figure 9-50 on page 596 and Figure 9-51).
The following formula is used to calculate Used Capacity (%), as shown in Figure 9-52:
[(Used Capacity ÷ Capacity) × 100]
Used Capacity (GiB) shows the amount of space that is used by the standard and
thin-provisioned volumes in the pools. If the pool is a parent pool, the amount of space that is
used by the volumes in the child pools also is calculated.
The capacity that is used by thin-provisioned volumes is less than their provisioned
capacity, which is shown in the Provisioned Capacity (GiB) column. If a pool does not have
thin-provisioned volumes, the value for used capacity is the same as the value for provisioned
capacity.
Adjusted Used Capacity (%) shows the amount of capacity that can be used without
exceeding the capacity limit.
For example, if the capacity is 100 GiB, the used capacity is 40 GiB, and the capacity limit is
80% or 80 GiB, the value for Adjusted Used Capacity (%) is (40 GiB ÷ 80 GiB) × 100 or 50%.
Therefore, in this example, you can use another 40 GiB (40% of the usable capacity) of the
resource before you reach the capacity limit (see Figure 9-53).
If the used capacity exceeds the capacity limit, the value for Adjusted Used Capacity (%) is
over 100%.
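The following Python sketch reproduces the example above and shows how Used Capacity
(%) and Adjusted Used Capacity (%) relate to each other (the values are the ones from the
example, not from a real system):

# Values from the example above
capacity = 100.0           # GiB of usable capacity
used_capacity = 40.0       # GiB
capacity_limit_pct = 80.0  # capacity limit that is set by the administrator (%)

used_capacity_pct = (used_capacity / capacity) * 100                     # 40%
capacity_limit_gib = capacity * capacity_limit_pct / 100                 # 80 GiB
adjusted_used_capacity_pct = (used_capacity / capacity_limit_gib) * 100  # 50%

print(f"Used Capacity (%):          {used_capacity_pct:.0f}")
print(f"Adjusted Used Capacity (%): {adjusted_used_capacity_pct:.0f}")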
To add the Adjusted Used Capacity (%) column, right-click any column heading on the Block
Storage Systems window.
Available Capacity (GiB) shows the total amount of the space in the pools that is not used by
the volumes in the pools. To calculate available capacity, the following formula is used:
[pool capacity - used capacity]
Available Volume Capacity (GiB) shows the total amount of remaining space that can be
used by the volumes in the pools. The following formula is used to calculate this value:
[Provisioned Capacity - Used Capacity]
The capacity that is used by thin-provisioned volumes is typically less than their provisioned
capacity. Therefore, the available capacity represents the difference between the provisioned
capacity and the used capacity for all the volumes in the pools. For Hitachi VSP
non-thin-provisioned pool capacity, the available capacity is always zero.
Note: Available Volume Capacity (GiB) was previously known as Effective Unallocated
Volume Space.
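As a small illustration of the two formulas, the following Python sketch uses hypothetical pool
values (placeholders only) to derive Available Capacity and Available Volume Capacity:

# Hypothetical pool-level values in GiB
pool_capacity = 1000.0          # physical capacity of the pool
used_capacity = 600.0           # capacity that is used by the volumes
provisioned_capacity = 1500.0   # provisioned (virtual) capacity of the volumes

available_capacity = pool_capacity - used_capacity                 # 400 GiB
available_volume_capacity = provisioned_capacity - used_capacity   # 900 GiB

print(f"Available Capacity (GiB):        {available_capacity}")
print(f"Available Volume Capacity (GiB): {available_volume_capacity}")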
Capacity (GiB) shows the total amount of storage space in the pools. For XIV systems and
IBM Spectrum Accelerate, capacity represents the physical (“hard”) capacity of the pool, not
the provisioned (“soft”) capacity. Pools that are allocated from other pools are not included in
the total pool space.
Capacity Limit (%) and Capacity Limit (GiB) can be set for the capacity that is used by your
storage systems. For example, the policy of your company is to keep 20% of the usable
capacity of your storage systems in reserve. Therefore, you log in to the GUI as Administrator
and set the capacity limit to 80% (see Figure 9-54).
Capacity-to-Limit (GiB) shows the amount of capacity that is available before the capacity
limit is reached.
For example, if the capacity is 100 GiB, the capacity limit is 80% or 80 GiB, and the used
capacity is 40 GiB, the value for Capacity-to-Limit (GiB) is 80 GiB - 40 GiB, which is 40 GiB or
40% of the usable capacity (see Figure 9-55).
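The following Python sketch reproduces this Capacity-to-Limit example (values from the
example above):

# Values from the example above
capacity = 100.0           # GiB of usable capacity
used_capacity = 40.0       # GiB
capacity_limit_pct = 80.0  # %

capacity_limit_gib = capacity * capacity_limit_pct / 100    # 80 GiB
capacity_to_limit_gib = capacity_limit_gib - used_capacity  # 40 GiB

print(f"Capacity-to-Limit (GiB): {capacity_to_limit_gib}")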
Compression Savings (%) are the estimated amount and percentage of capacity that is saved
by using data compression across all pools on the storage system. The percentage is
calculated across all compressed volumes in the pools and does not include the capacity of
non-compressed volumes.
For storage systems with drives that use inline data compression technology, the
Compression Savings does not include the capacity savings that are achieved at the drive
level.
The following formula is used to calculate the amount of storage space that is saved:
[written capacity - compressed size]
The following formula is used to calculate the percentage of capacity that is saved:
((written capacity - compressed size) ÷ written capacity) × 100
For example, the written capacity, which is the amount of data that is written to the volumes
before compression, is 40 GiB. The compressed size, which reflects the size of compressed
data that is written to disk, is 10 GiB. Therefore, the compression savings percentage across
all compressed volumes is 75%.
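The same savings-percentage pattern is used for Compression Savings, Deduplication
Savings, and Drive Compression Savings. The following Python sketch implements the
formula and reproduces the 75% result from the example above:

def savings_percent(written_capacity_gib, reduced_size_gib):
    # Percentage of capacity that is saved by a data reduction technique
    saved = written_capacity_gib - reduced_size_gib
    return (saved / written_capacity_gib) * 100

# Example from the text: 40 GiB written, 10 GiB stored after compression
print(savings_percent(40.0, 10.0))   # 75.0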
Note: The Compression Savings (%) metric is available for resources that run IBM Storage
Virtualize.
Exception: For compressed volumes that are also deduplicated, this column is blank on
storage systems that run IBM Storage Virtualize.
Deduplication Savings (%) shows the estimated amount and percentage of capacity that is
saved by using data deduplication across all DRPs on the storage system. The percentage is
calculated across all deduplicated volumes in the pools, and it does not include the capacity
of volumes that are not deduplicated.
The following formula is used to calculate the amount of storage space that is saved:
written capacity - deduplicated size
The following formula is used to calculate the percentage of capacity that is saved:
((written capacity - deduplicated size) ÷ written capacity) × 100
For example, the written capacity, which is the amount of data that is written to the volumes
before deduplication, is 40 GiB. The deduplicated size, which reflects the size of deduplicated
data that is written to disk, is only 10 GiB. Therefore, data deduplication reduced the size of the
data that is written by 75%.
Note: The Deduplication Savings (%) metric is available for IBM FlashSystem A9000,
IBM FlashSystem A9000R, and resources that run IBM Storage Virtualize 8.1.3 or later.
Drive Compression Savings (%) shows the amount and percentage of capacity that is saved
with drives that use inline data compression technology. The percentage is calculated across
all compressed drives in the pools.
The amount of storage space that is saved is the sum of drive compression savings.
The following formula is used to calculate the percentage of capacity that is saved:
((used written capacity - compressed size) ÷ used written capacity) × 100
Note: The Drive Compression Savings (%) metric is available for storage systems that
contain FCMs with hardware compression.
Mapped Capacity (GiB) shows the total volume space in the storage system that is mapped
or assigned to host systems, including child pool capacity.
Note: Mapped Capacity (GiB) was previously known as Assigned Volume Space.
Overprovisioned Capacity (GiB) shows the capacity that cannot be used by volumes
because the physical capacity of the pools cannot meet the demands for provisioned
capacity. The following formula is used to calculate this value:
[Provisioned Capacity - Capacity]
Shortfall (%) shows the difference between the remaining unused volume capacity and the
available capacity of the associated pool, which is expressed as a percentage of the
remaining unused volume capacity. The shortfall represents the relative risk of running out of
space for overallocated thin-provisioned volumes. If the pool has sufficient available capacity
to satisfy the remaining unused volume capacity, no shortfall exists. As the remaining unused
volume capacity grows or as the available pool capacity decreases, the shortfall increases,
and the risk of running out of space becomes higher. If the available capacity of the pool is
exhausted, the shortfall is 100%, and any volumes that are not yet fully allocated have run out
of space.
If the pool is not thin-provisioned, the shortfall percentage equals zero. If the shortfall
percentage is not calculated for the storage system, the field is left blank.
You can use this percentage to determine when the amount of over-committed space in a
pool is at a critically high level. Specifically, if the physical space in a pool is less than the
committed provisioned capacity, then the pool does not have enough space to fulfill the
commitment to provisioned capacity. This value represents the percentage of the committed
provisioned capacity that is not available in a pool. As more space is used over time by
volumes while the pool capacity remains the same, this percentage increases.
For example, the remaining physical capacity of a pool is 70 GiB, but 150 GiB of provisioned
capacity was committed to thin-provisioned volumes. If the volumes are using 50 GiB, then
100 GiB is still committed to the volumes (150 GiB - 50 GiB) with a shortfall of 30 GiB (100 GiB
remaining commitment of volume space to the volumes - 70 GiB remaining pool space).
Because the volumes are overcommitted by 30 GiB based on the available capacity in the
pool, the shortfall is 30% when the following calculation is used:
[(100 GiB unused volume capacity - 70 GiB remaining pool capacity) ÷ 100 GiB
unused volume capacity] × 100
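The following Python sketch reproduces the shortfall example above step by step:

# Values from the example above (GiB)
remaining_pool_capacity = 70.0           # physical capacity that is left in the pool
committed_provisioned_capacity = 150.0   # provisioned capacity committed to volumes
used_volume_capacity = 50.0              # capacity that the volumes use

unused_volume_capacity = committed_provisioned_capacity - used_volume_capacity  # 100 GiB
shortfall_gib = unused_volume_capacity - remaining_pool_capacity                # 30 GiB
shortfall_pct = (shortfall_gib / unused_volume_capacity) * 100                  # 30%

print(f"Shortfall: {shortfall_gib} GiB ({shortfall_pct:.0f}%)")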
Note: Shortfall (%) is available for DS8000, Hitachi Virtual Storage Platform, and storage
systems that run IBM Storage Virtualize.
For IBM FlashSystem A9000 and IBM FlashSystem A9000R, this value is not available.
Provisioned Capacity (%) shows the percentage of the physical capacity that is committed to
the provisioned capacity of the volumes in the pools. If the value exceeds 100%, the physical
capacity does not meet the demands for provisioned capacity. To calculate the provisioned
capacity percentage, the following formula is used:
[(provisioned capacity ÷ pool capacity) × 100]
For example, if the provisioned capacity percentage is 200% for a storage pool with a physical
capacity of 15 GiB, then the provisioned capacity that is committed to the volumes in the
pools is 30 GiB. Twice as much space is committed to the pools than is physically available to
the pools. If the provisioned capacity percentage is 100% and the physical capacity is 15 GiB,
then the provisioned capacity that is committed to the pools is 15 GiB. The total physical
capacity that is available to the pools is used by the volumes in the pools.
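The following Python sketch reproduces the 200% overprovisioning example above:

# Values from the example above
pool_capacity = 15.0          # GiB of physical capacity
provisioned_capacity = 30.0   # GiB that is committed to the volumes

provisioned_capacity_pct = (provisioned_capacity / pool_capacity) * 100
print(f"Provisioned Capacity (%): {provisioned_capacity_pct:.0f}")   # 200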
Provisioned Capacity (GiB) shows the total amount of provisioned capacity of volumes within
the pool. If the pool is a parent pool, it also includes the storage space that can be made
available to the volumes in the child pools.
Note: Provisioned Capacity (GiB) was previously known as Total Volume Capacity.
Safeguarded Capacity (GiB) shows the total amount of capacity that is used to store volume
backups that are created by the Safeguarded Copy feature in DS8000.
Total Capacity Savings (%) shows the estimated amount and percentage of capacity that is
saved by using data deduplication, pool compression, thin provisioning, and drive
compression, across all volumes in the pool.
The following formula is used to calculate the amount of storage space that is saved:
Provisioned Capacity - Used Capacity
The following formula is used to calculate the percentage of capacity that is saved:
((Provisioned Capacity - Used Capacity) ÷ Provisioned Capacity) × 100
Note: Total Capacity Savings (%) was previously known as Total Data Reduction Savings,
and it is available for IBM FlashSystem A9000 and IBM FlashSystem A9000R,
IBM Spectrum Accelerate, XIV storage systems with firmware version 11.6 or later, and
resources that run IBM Storage Virtualize.
Unmapped Capacity (GiB) shows the total amount of space in the volumes that are not
assigned to hosts.
Note: Unmapped Capacity (GiB) was previously known as Unassigned Volume Space.
In the Zero Capacity column (see Figure 9-56) on the Pools page, you can see the date,
based on the storage usage trends for the pool, when the pool will run out of available
capacity.
Zero Capacity: The capacity information that is collected over 180 days is analyzed to
determine, based on historical storage consumption, when the pools will run out of
capacity. The pools that ran out of capacity are marked as depleted. For the other pools, a
date is provided so that you know when the pools are projected to run out of capacity.
If sufficient information is not collected to analyze the storage usage of the pool, None is
shown as the value for zero capacity. If a capacity limit is set for the pool, the date that is
shown in the Zero Capacity column is the date when the available capacity based on the
capacity limit will be depleted.
For example, if the capacity limit for a 100 GiB pool is 80%, it is the date when the available
capacity of the pool is less than 20 GiB. Depleted is shown in the column when the
capacity limit is reached.
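To illustrate the idea of a zero-capacity projection, the following Python sketch extrapolates a
simple linear trend from hypothetical daily available-capacity samples. It is only a conceptual
illustration, not the analytics algorithm that IBM Storage Insights uses:

from datetime import date, timedelta

# Hypothetical daily available-capacity samples in GiB, oldest first
samples = [500.0, 492.0, 485.0, 471.0, 460.0]
daily_change = (samples[-1] - samples[0]) / (len(samples) - 1)   # average change per day

if daily_change >= 0:
    print("Zero Capacity: None (no downward trend detected)")
else:
    days_to_zero = samples[-1] / -daily_change
    zero_date = date.today() + timedelta(days=round(days_to_zero))
    print(f"Projected zero-capacity date: {zero_date}")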
The following metrics can be added to capacity charts for storage systems within capacity
planning. Use the charts to detect capacity shortages and space usage trends.
Available Repository Capacity (GiB) shows the available, unallocated storage space in
the repository for Track Space-Efficient (TSE) thin-provisioning.
Soft Capacity (GiB) shows the amount of virtual storage space that is configured for the
pool.
Note: Soft Capacity (GiB) is available for XIV systems and IBM Spectrum Accelerate
storage systems.
Available Soft Capacity (GiB) shows the amount of virtual storage space that is available
to allocate to volumes in a storage pool.
Note: Available for XIV systems and IBM Spectrum Accelerate storage systems.
Written Capacity (GiB) shows the amount of data that is written from the assigned hosts
to the volume before compression or data deduplication are used to reduce the size of the
data. For example, the written capacity for a volume is 40 GiB. After compression, the
volume used space, which reflects the size of compressed data that is written to disk, is
only 10 GiB.
Available Written Capacity (GiB) shows the amount of capacity that can be written to the
pools before inline compression is applied. If the pools are not compressed, this value is
the same as Available Capacity.
Note: Available Written Capacity (GiB) was previously known as Effective Used
Capacity.
Because data compression is efficient, a pool can run out of Available Written Capacity
while physical capacity is still available. To stay aware of your capacity needs, monitor
this value and Available Capacity.
Enterprise hard disk drive (HDD) Available Capacity (GiB) shows the amount of storage
space that is available on the Enterprise HDDs that can be used by Easy Tier for retiering
the volume extents in the pool.
Note: Enterprise HDD Available Capacity (GiB) is available for DS8000 and storage
systems that run IBM Storage Virtualize.
Enterprise HDD Capacity (GiB) shows the total amount of storage space on the
Enterprise HDDs that can be used by Easy Tier for retiering the volume extents in the
pool.
Note: Enterprise HDD Capacity (GiB) is available for DS8000 and storage systems that
run IBM Storage Virtualize.
Nearline HDD Available Capacity (GiB) shows the amount of storage space that is
available on the Nearline HDDs that can be used by Easy Tier for retiering the volume
extents in the pool.
Note: Nearline HDD Available Capacity (GiB) is available for DS8000 and storage
systems that run IBM Storage Virtualize.
Nearline HDD Capacity (GiB) shows the total amount of storage space on the Nearline
HDDs that can be used by Easy Tier for retiering the volume extents in the pool.
Note: Nearline HDD Capacity (GiB) is available for DS8000 and storage
systems that run IBM Storage Virtualize.
Repository Capacity (GiB) shows the total storage capacity of the repository for Track
Space-Efficient (TSE) thin-provisioning.
Reserved Volume Capacity shows the amount of pool capacity that is reserved but is not
yet used to store data on the thin-provisioned volume.
Note: Reserved Volume Capacity was known as Unused Space, and it is available for
resources that run IBM Storage Virtualize.
SCM Available Capacity (GiB) shows the available capacity on storage-class memory
(SCM) drives in the pool. Easy Tier can use these drives to retier the volume extents in the
pool.
Note: SCM Available Capacity (GiB) is available for IBM Storage Virtualize systems,
such as IBM FlashSystem 9100, IBM FlashSystem 7200, and IBM Storwize family
storage systems that are configured with block storage.
SCM Capacity (GiB) shows the total capacity on SCM drives in the pool. Easy Tier can
use these drives to retier the volume extents in the pool.
Note: SCM Capacity (GiB) is available for IBM Storage Virtualize systems, such as IBM
FlashSystem 9100, IBM FlashSystem 7200, and IBM Storwize family storage systems
that are configured with block storage.
Tier 0 Flash Available Capacity (GiB) shows the amount of storage space that is available
on the Tier 0 flash solid-state drives (SSDs) that can be used by Easy Tier for retiering the
volume extents in the pool.
Note: Tier 0 Flash Available Capacity (GiB) is available for DS8000 and storage
systems that run IBM Storage Virtualize.
Tier 0 Flash Capacity (GiB) shows the total amount of storage space on the Tier 0 flash
SSDs that can be used by Easy Tier for retiering the volume extents in the pool.
Note: Tier 0 Flash Capacity (GiB) is available for DS8000 and storage systems that run
IBM Storage Virtualize.
Tier 1 Flash Available Capacity (GiB) shows the amount of storage space that is available
on the Tier 1 flash, which is read-intensive (RI) SSDs that can be used by Easy Tier for
retiering the volume extents in the pool.
Note: Tier 1 Flash Available Capacity (GiB) is available for DS8000 and storage
systems that run IBM Storage Virtualize.
Tier 1 Flash Capacity (GiB) shows the total amount of storage space on the Tier 1 flash,
which is RI SSDs that can be used by Easy Tier for retiering the volume extents in the
pool.
Note: Tier 1 Flash Capacity (GiB) is available for DS8000 and storage systems that run
IBM Storage Virtualize.
Tier 2 Flash Available Capacity (GiB) shows the available capacity on Tier 2 flash, which
are high-capacity drives in the pool. Easy Tier can use these drives to retier the volume
extents in the pool.
Note: Tier 2 Flash Available Capacity (GiB) is available for DS8000 storage systems.
Tier 2 Flash Capacity (GiB) shows the total capacity on Tier 2 flash, which is
high-capacity drives in the pool. Easy Tier can use these drives to retier the volume
extents in the pool.
Note: Tier 2 Flash Capacity (GiB) is available for DS8000 storage systems.
9.4 Creating alerts for IBM Spectrum Control and IBM Storage Insights
In this section, we provide information about alerts with IBM Spectrum Control and
IBM Storage Insights. The no-charge version of IBM Storage Insights does not support alerts.
New data reduction technologies add intelligence and capacity savings to your environment.
If you use data reduction on different layers, such as hardware compression in the
IBM FlashSystem 9500 FCMs (if an IBM FlashSystem 9500 is virtualized by the SVC) and in
the DRPs, ensure that you do not run out of space in the back-end storage device.
Over-provisioning means that, in total, more space is assigned and promised to the hosts
than is physically available. The hosts can try to store more data on the storage subsystem
than the physical capacity allows, which results in an out-of-space condition.
Keep at least 15% free space for garbage collection in the background. For more
information, see 4.1.2, “Data reduction pools” on page 246.
Data reduction technologies conserve some physical space. If the space that is used for the
data can be reduced, the conserved space can be used for other data. Depending on the type
of data, deletion might not free up much space.
Imagine that you have three identical or almost identical files on a file system that were
deduplicated. This situation results in a good compression ratio (CR) (three files, but
stored only once). If you now delete one file, you do not gain more space because the
deduplicated data must stay on the storage (two other versions still refer to the data).
Similar results can be seen when several FlashCopies of one source are used.
Other alerts are possible as well, but generally percentage alerts are best suited because the
alert definition applies to all pools in a storage system.
Assign a severity to an alert. Assigning a severity can help you more quickly identify and
address the critical conditions that are detected on resources. The severity that you assign
depends on the guidelines and procedures within your organization. Default assignments are
provided for each alert.
Critical The alert is critical and must be resolved. For example, alerts that notify you
when the amount of available space on a file system falls below a specified
threshold.
Warning Alerts that are not critical, but represent potential problems. For example, alerts
that notify you when the status of a data collection job is not normal.
Informational Alerts that might not require any action to resolve and are primarily for
informational purposes. For example, alerts that are generated when a new pool
is added to a storage system.
Adjust the percentage levels to the required levels as needed. The process to extend storage
might take some time (ordering, installation, provisioning, and so on).
The advantage of this way of setting up an Alert Policy is that you can add various
IBM Storage Virtualize systems to this customized alert.
Figure 9-57 shows how to start creating an Alert Policy in IBM Spectrum Control.
For IBM Storage Insights, Figure 9-58 shows how to start creating an Alert Policy.
The following example shows how to create an Alert Policy by copying the existing policy. You
also might need to change an existing Alert Policy (in our example, the Default Policy).
Consider that a storage subsystem can be active in only one Alert Policy.
Figure 9-59 shows the Default IBM FlashSystem Family policy in IBM Spectrum Control 5.4.6.
Note: Unless otherwise noted, IBM Storage Insights and IBM Spectrum Control do not
differ for the steps that are described next.
Figure 9-60 shows how to create a policy by copying an existing one. Hover your cursor over
the policy that you want to copy, click the left mouse button, and select Copy Policy.
Figure 9-61 shows how to rename the previously copied policy. The new policy is stored as
another policy. One IBM Storage Virtualize system can be added to a single policy only. You
can add the system later if you are unsure now (optionally, select Resource, and then select
the option).
Figure 9-62 on page 610 shows the newly created Alert Policy Default IBM FlashSystem
Family policy - ITSO with all alerts that were inherited from the default policy.
Figure 9-63 shows how to choose the required alert definitions by selecting Pool → Capacity.
Figure 9-64 shows the tasks for setting up the Critical definition by monitoring Used
Capacity (%) and issuing policy notifications at 95%.
If your environment does not include predefined notification methods, these methods must
be defined before you can choose them.
Figure 9-64 shows how to set the operator, value, and severity for the alert. It also shows how
to modify the notification frequency and select notification methods.
Figure 9-65 shows how to set up the Warning level at 90% for Used Capacity (%). To
proceed, click the plus sign next to the previously defined Critical definition and complete the
information, as shown in Figure 9-65 (Operator: ">=", Value: "90%", and Severity: "Warning").
Figure 9-67 shows how to open the Notification Settings in IBM Spectrum Control.
Note: With IBM Storage Insights, you can send notifications with the email method only.
Call Home information is also leveraged to provide more accurate predictions of when a
DRP or a compressing array will run out of space. These notifications are pushed to the
Insights → Advisor view in IBM Storage Insights, as described in Figure 9-40 on
page 587.
System health
By using the management GUI dashboard, you can detect errors in the System Health page.
For more information, see System Health Tiles for SAN Volume Controller and System Health
Tiles for FlashSystem 9x00.
There are tiles for each subset of components within each category that show the health state
of the category.
Tiles with errors and warnings are displayed first so that components that require attention
have higher visibility. Healthy pages are sorted in order of importance in day-to-day use.
The System Health page in Figure 9-68 shows the three categories of system components.
By expanding the Hardware Components page, you can see the type of hardware
components and the respective health states, as shown in Figure 9-69.
Figure 9-69 Expanded Hardware Components view for a SAN Volume Controller Cluster
Note: The More Details view shows a tabular view of individual components with more
detail. The number of tiles also might vary between systems. For example, Figure 9-70
shows that an enclosure-based system typically has more types of hardware components
compared to SVC systems.
Figure 9-70 Expanded Hardware Components view for IBM FlashSystem 9100
The pages that have errors or warnings sort the tiles in an order that draws the most attention
to the tiles that are not optimal. For example, in Figure 9-71, Call Home and Support
Assistance are in the error status and appear at the left.
Event log
A user might become aware of a problem on a system through active monitoring of the
System Health dashboard or by receiving an alert through one of the configured notification
methods.
The dashboard is intended to get the user’s attention, and it is an entry point that directs the
user to the relevant event in Monitoring → Events and the associated fix procedure.
For example, Figure 9-71 on page 614 displays a status of Call Home - Unable to send data.
Clicking More Details leads the customer to the specific event log entry, as shown in
Figure 9-72. The Run Fix Procedure option provides instructions that the user can follow to
resolve the issue.
Figure 9-72 Dashboard entry point drills down to the event log
The Events by Priority icon in the upper right of the GUI navigation area also provides a
similar entry into events that need attention, as shown in Figure 9-73 on page 615.
Port monitoring
With the introduction of IBM Storage Virtualize version 8.4.0, a new command was added to
the CLI to monitor the node port error counters.
Use the lsportstats command to view the port transfer and failure counts and Small
Form-factor Pluggable (SFP) diagnostics data that is recorded in the statistics file for a node.
Example 9-13 shows the TX and RX values, as well as the zero buffer-to-buffer credit timer
and the number of CRC errors for port 3 of node 1.
Example 9-13 CLI output example for lsportstats command to show the TX & RX power
IBM_FlashSystem:FS9xx0:superuser>lsportstats -node node1 3|grep fc_wwpn&&lsportstats -node node1 3|grep power&&lsportstats -node node1 3|grep buffer-buffer&&lsportstats -node node1 3|grep CRC
fc_wwpn ="0x500507681013xxxx"
TX power (uW) ="588"
TX power low alarm threshold ="126"
RX power (uW) ="544"
RX power low alarm threshold ="31"
zero buffer-buffer credit timer (uS) ="0"
invalid CRC error count ="0"
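Because the lsportstats output uses a simple key ="value" format, it can be post-processed
easily. The following Python sketch is one possible way to flag nonzero invalid CRC error
counts in output that was saved to a file; the file name is a hypothetical placeholder:

import re

# Parse saved lsportstats output (key ="value" format, as shown in Example 9-13)
pattern = re.compile(r'^(?P<key>.+?)\s*=\s*"(?P<value>.*)"')

with open("lsportstats_node1.txt") as f:   # hypothetical file with saved CLI output
    for line in f:
        match = pattern.match(line.strip())
        if match and "invalid CRC error count" in match.group("key"):
            count = int(match.group("value"))
            if count > 0:
                print(f"Warning: {match.group('key')} = {count}")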
Clicking Block Storage Systems takes you to the list of monitored systems to identify the
specific system that is in the Error state. For example, in Figure 9-75, you can see that the
first system in the list is in the Error state.
For more information about the error condition, right-click the system and select View
Details, which displays the list of internal components and their status. Figure 9-76 shows
that the error status is due to a problem with one or more volumes.
Clicking Volumes reveals the offending volumes, as shown in Figure 9-77 on page 618.
Note: Use the column sorting function or filter to display only the objects of interest.
In this specific case, the offline state is expected because the volumes are auxiliary volumes
of inconsistent stopped Global Mirror (GM) relationships. Therefore, the status can be marked
as acknowledged, as shown in Figure 9-78.
The system and volume status no longer reports the Error state, as shown in Figure 9-79.
Note: If IBM Spectrum Control and IBM Storage Insights are monitoring the environment,
the acknowledgment must be set in both instances.
Other use cases might exist in which you must replace hardware after you open a ticket in
your internal system with the vendor. In these instances, you still acknowledge the status so
that any other errors change the storage system from green to red again and you see that a
second event occurred.
One significant difference is that IBM Storage Insights has an Operations dashboard, a NOC
dashboard, and custom dashboards that can show the health status in a grid or list of tiles.
When the system of interest is selected, the detailed view shows a Component Health page,
which is similar in appearance to the corresponding view in the IBM Storage Virtualize
management GUI.
For example, Figure 9-80 shows that the system is in the Error state because of a volume
error.
The Volumes tile in the Error state allows two possible actions, as shown in Figure 9-81:
View List: Similar to IBM Spectrum Control, as shown in Figure 9-77 on page 618.
Launch Storage System GUI: Launches the system’s native GUI to identify the source of
the error, as shown in 9.5.1, “Health monitoring in the IBM Storage Virtualize GUI” on
page 613.
Table 9-4 lists some useful metrics for assessing performance problems at the volume level. A
few typical use cases are as follows:
Diagnose the elevated volume write response time that is caused by GM.
Diagnose the elevated volume write response time that is caused by slowness in
IBM Real-time Compression (RtC).
Read I/O rate (IOPS): The average number of read operations per second. This value includes both sequential and nonsequential read operations.
Write I/O rate (IOPS): The average number of write operations per second. This value includes both sequential and nonsequential write operations. Also includes write I/Os on remote copy secondary volumes (except HyperSwap).
Total I/O rate (IOPS): Read I/O rate, write I/O rate, and unmap I/O rate combined.
Unmap I/O rate (IOPS): The average number of unmap operations per second. This metric corresponds to the collected uo statistic.
Read data rate (MBps): The average number of MiBs per second that are transferred for read operations. Does not include data that is moved by using XCOPY.
Write data rate (MBps): The average number of MiBs per second that are transferred for write operations. Also includes write I/Os on remote copy secondary volumes (except HyperSwap). Does not include data that is written by using XCOPY.
Total data rate (MBps): Read data rate, write data rate, and unmap data rate combined.
Unmap data rate (MBps): The average number of MiBs per second that were unmapped. This metric corresponds to the collected ub statistic.
Read transfer size (KB): The average number of KiB that are transferred per read operation.
Write transfer size (KB): The average number of KiB that are transferred per write operation.
Overall transfer size (KB): The average number of KiB that are transferred per I/O operation. This value includes both read and write operations, but excludes unmap operations.
Unmap transfer size (KB): Average size of unmap requests that are received.
Read cache hits (%): The percentage of all read operations that find data in the cache. This value includes both sequential and random read operations, and read operations in UCA and LCA where applicable.
Write cache hits (%): The percentage of all write operations that are handled in the cache. This value includes both sequential and random write operations, and write operations in UCA and LCA where applicable.
Upper cache (UCA) destage latency average (µs): The average time that it took to complete each destage operation in the volume cache, that is, the time that it took to do write operations from the volume cache to the disk.
Lower cache (LCA) destage latency average (µs): The average time that it took to complete each destage operation in the volume copy cache, that is, the time that it took to do write operations from the volume copy cache to the disk.
Volume GM metrics
Table 9-5 lists some useful metrics for assessing performance problems at the MDisk and
drive level.
This stage of performance analysis is appropriate when a preliminary analysis at the volume level showed that the delays originate below the lower cache (LCA) in the I/O stack. A few typical use cases are as follows:
Diagnose a back-end overload that is causing elevated volume read response times.
Diagnose potential SAN communication problems for external MDisks.
MDisk metrics
Back-end read I/O rate (IOPS): The number of read I/O commands that are submitted per second to back-end storage.
Back-end write I/O rate (IOPS): The number of write I/O commands that are submitted per second to back-end storage.
Overall back-end I/O rate (IOPS): Back-end read I/O rate and back-end write I/O rate combined.
Back-end read data rate (MBps): The amount of data that is read per second from back-end storage.
Back-end write data rate (MBps): The amount of data that is written per second to back-end storage.
Overall back-end data rate (MBps): Back-end read data rate and back-end write data rate combined.
Back-end read transfer size (KB): The average I/O size of all back-end reads that are submitted within a stats interval.
Back-end write transfer size (KB): The average I/O size of all back-end writes that are submitted within a stats interval.
Overall back-end transfer size (KB): Average of back-end read transfer size and back-end write transfer size.
Drive metrics
Back-end read I/O rate (IOPS): The number of read I/O commands that are submitted per second per drive.
Back-end write I/O rate (IOPS): The number of write I/O commands that are submitted per second per drive.
Overall back-end I/O rate (IOPS): Back-end read I/O rate and back-end write I/O rate combined.
Back-end read data rate (MBps): The amount of data that is read per second per drive.
Back-end write data rate (MBps): The amount of data that is written per second per drive.
Overall back-end data rate (MBps): Back-end read data rate and back-end write data rate combined.
Back-end read transfer size (KB): The average I/O size per drive of reads that are submitted within a stats interval.
Back-end write transfer size (KB): The average I/O size per drive of writes that are submitted within a stats interval.
Overall back-end transfer size (KB): Average of back-end read transfer size and back-end write transfer size.
Drive read response time (ms): The average number of milliseconds for the drive resources to respond to a read operation.
Drive write response time (ms): The average number of milliseconds for the drive resources to respond to a write operation.
Overall drive response time (ms): Average of drive read response time and drive write response time.
Drive read queue time (ms): The average number of milliseconds that a read operation spends in the queue before the operation is sent to the drive. (b)
Drive write queue time (ms): The average number of milliseconds that a write operation spends in the queue before the operation is sent to the back-end storage resources. (b)
Overall drive queue time (ms): Average of drive read queue time and drive write queue time. (b)
Peak drive read response time (ms): The response time of the slowest read per drive in a specific interval. (c)
Peak drive write response time (ms): The response time of the slowest write per drive in a specific interval. (c)
a. Includes the latency in a redundant array of independent disks (RAID) for array MDisks.
b. High values here are indicative of an overloaded drive.
c. Peak response times are calculated as an average of the peaks in IBM Storage Insights when the system is also monitored by IBM Spectrum Control with a lower interval, for example, 1 minute.
Note: The concept of abstraction in the IBM Storage Virtualize I/O stack requires careful
consideration when evaluating performance problems. For example, the back-end could be
overloaded even though the host workload is moderate. Other components within the I/O
stack could be generating back-end workload, for example, FlashCopy background copy,
Easy Tier extent migration, or DRP garbage collection. It might be necessary to review
other metrics that record these workloads at their respective points in the I/O stack. For
example, in IBM Storage Insights, Fast-Write Writes Data Rate (Vc) records the workload
entering upper cache, Fast-Write Writes Data Rate (Vcc) records the write workload
entering lower cache, and Data Movement Rate records the read/write workload of
garbage collection. By evaluating the workload at various points, you can determine the
cause of back-end overloading.
Table 9-6 lists some useful metrics for assessing performance problems at a node level. A few
typical use cases are as follows:
Diagnose local internode delays that are causing elevated volume write response time.
Diagnose remote internode delays that are causing GM interruptions.
Diagnose high CPU core usage that is affecting multiple components in the I/O stack
adversely.
Port to Local Node Send Data Rate (MBps): The actual amount of data that is sent from the node to the other nodes in the local cluster.
Port to Local Node Receive Data Rate (MBps): The actual amount of data that is received by the node from the other nodes in the local cluster.
Total Port to Local Node Data Rate (MBps): Port to local node send data rate and port to local node receive data rate combined.
Port to Remote Node Send Data Rate (MBps): The actual amount of data that is sent from the node to the other nodes in the partner cluster.
Port to Remote Node Receive Data Rate (MBps): The actual amount of data that is received by the node from the other nodes in the partner cluster.
Total Port to Remote Node Data Rate (MBps): Port to remote node send data rate and port to remote node receive data rate combined.
Table 9-7 lists some useful metrics for assessing performance problems at an FC port level. A
few typical use cases are as follows:
Diagnose ports at their practical bandwidth limit, which causes delayed transfers.
Diagnose ports being excluded because of nonzero cyclic redundancy check (CRC) rates.
Diagnose MDisk path exclusions due to nonzero CRC rates on ports that are used for
back-end connectivity.
Diagnose an impending small form-factor pluggable (SFP) failure due to excessive
temperature.
Diagnose low SFP Rx power, which indicates a potentially defective FC cable.
Receive data rate (MBps): The average rate at which data is transferred to the port (ingress).
Send data rate (MBps): The average rate at which data is transferred from the port (egress).
Total data rate (MBps): The sum of receive data rate and send data rate. (a)
Zero buffer credit percentage (%): The amount of time, as a percentage, that the port was not able to send frames between ports because of insufficient buffer-to-buffer credit. In FC technology, buffer-to-buffer credit is used to control the flow of frames between ports.
Port send delay time (ms): The average number of milliseconds of delay that occur on the port for each send operation. The reason for these delays might be a lack of buffer credits.
Port send delay I/O percentage (%): The percentage of send operations where a delay occurred, relative to the total number of send operations that were measured for the port. Use this metric with the Port Send Delay Time metric to distinguish a few long delays from many short delays.
CRC error rate (count per second): The average number of frames per second that are received in which a cyclic redundancy check (CRC) error is detected. A CRC error is detected when the CRC in the transmitted frame does not match the CRC computed by the receiver.
Invalid transmission word rate (count per second): The average number of bit errors per second that are detected.
a. 8 Gbps and higher FC adapters are full-duplex, so they can send and receive simultaneously at their practical limit. It is more appropriate to evaluate the send or receive metrics separately.
b. Based on the assumption of full-duplex behavior, this metric is approximated by using the maximum of the send and receive bandwidth percentages.
Table 9-8 lists some useful miscellaneous metrics. A few typical use cases are as follows:
Diagnose an elevated volume write response time due to a full cache partition.
Diagnose elevated CPU core utilization due to aggressive garbage collection.
Max read cache fullness (%) (a): The maximum amount of fullness for the amount of node memory that is designated for the read cache.
Max write cache fullness (%) (a): The maximum amount of fullness for the amount of node memory that is designated for the write cache.
Max write cache fullness (%) (b): The maximum amount of the lower cache that the write cache partitions on the nodes that manage the pool are using for write operations. If the value is 100%, one or more cache partitions on one or more pools is full. The operations that pass through the pools with full cache partitions are queued, and I/O response times increase for the volumes in the affected pools.
Garbage-collection metrics
Data movement rate (MBps) (a): The capacity, in MiBs per second, of the valid data in a reclaimed volume extent that garbage collection moved to a new extent in the DRP on the node. The valid data must be moved so that the whole extent can be freed up or reused to write new data. This metric corresponds to the collected mm statistic.
Recovered capacity rate (MBps) (c): The capacity, in MiBs per second, that was recovered by garbage collection for reuse in the DRPs on the node. This metric corresponds to the collected rm statistic.
a. Measured at a node level.
b. Measured at a pool or partition level.
c. Measures the rate at which reclaimable capacity is recovered.
The complete list of raw metrics that are collected by IBM Storage Virtualize systems can be
found at Starting statistics collection.
The complete list of metrics that are derived by IBM Spectrum Control and IBM Storage
Insights can be found at IBM Spectrum Control Statistics and IBM Storage Insights Statistics.
For more information about which type of support package to collect, see What Data Should
You Collect for a Problem on IBM Storage Virtualize Systems?
A maximum of 16 files are stored on each node at any one time for each statistics file type.
The total statistics coverage depends on the statistics interval. For example, the default
setting of 15 minutes has a coverage of 4 hours; however, a 15-minute sample time is too
coarse to perform detailed performance analysis. If the system is not monitored by
IBM Spectrum Control or IBM Storage Insights, then setting the statistics interval to 5 minutes
strikes a good balance between statistics coverage and statistics granularity.
Use the startstats command to modify the interval at which statistics are collected.
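As a brief illustration (shown with the same example system prompt that is used in the other CLI examples in this book), the following invocation sets a 5-minute collection interval; the interval value is specified in minutes and should be chosen to suit your monitoring setup:
IBM_FlashSystem:FS9500:superuser>startstats -interval 5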
Note: If the system is monitored by IBM Spectrum Control and you change the statistics
interval on the IBM Storage Virtualize system, IBM Spectrum Control reverts the change
automatically.
The performance data files might be large, especially if the data is for storage systems that
include many volumes, or the performance monitors are running with a 1-minute sampling
frequency. If the time range for the data is greater than 12 hours, volume data and 1-minute
sample data are automatically excluded from the performance data, even if it is available.
To include volume data and 1-minute sample data, select the Advanced export option (see
Figure 9-83 on page 630) when you export performance data.
When you export performance data, you can specify a time range. The time range cannot
exceed the history retention limit for sample performance data. By default, this history
retention limit is two weeks.
To export hourly or daily performance data, use the exportPerformanceData script. However,
the time range still cannot exceed the history retention limits for the type of performance data.
3. Select the time range of the performance data that you want to export. You can use the
quick select options for the previous 4, 8, or 12 hours, or specify a custom time range by
clicking the time and date. Click Create (see Figure 9-83).
Note: To include volume data if the time range that you selected is greater than 12
hours, click Advanced export.
Figure 9-83 IBM Spectrum Control: Export Performance Data - Advanced Export
After the package is created, the compressed file can be downloaded by using the browser.
The package includes different reports in csv format, as shown in Figure 9-84.
For more information about how to create a performance support package, see Exporting
performance data for storage systems and fabrics.
For more information about older versions of IBM Spectrum Control, see Performance data
collection with TPC, IBM VSC and IBM Spectrum Control.
2. Right-click the storage system and select Export Performance Data (see Figure 9-86 on
page 632).
3. Select the time range of the performance data that you want to export.
You can select a time range of the previous 4, 8, or 12 hours through the quick select
options, or specify a custom time range by clicking the time and date. Click Create. A task
is started and shown in the running tasks icon in the menu bar.
Note: To include volume data if the time range that you selected is greater than 12
hours, click Advanced export.
4. When the task is complete, click the Download icon for the task in the running tasks list to save the file locally.
For more information about how to create a performance support package, see Exporting
performance data for storage systems.
Note: This option is available to customers only in IBM Storage Insights Pro and IBM Storage Insights for IBM Spectrum Control. IBM Support can also perform this task for systems that are registered for the no-charge edition of IBM Storage Insights.
You can use IBM CSM to complete the following data replication tasks and help reduce the
downtime of critical applications:
Plan for replication when you are provisioning storage.
Keep data on multiple related volumes consistent across storage systems if there is a
planned or unplanned outage.
Monitor and track replication operations.
Automate the mapping of source volumes to target volumes.
Figure 9-87 is an example of the initial sync progress after CSM MM and GM sessions are
created and started.
Figure 9-88 shows the CSM sessions after they complete their initial sync.
Figure 9-88 CSM sessions that are prepared and 100% synced
Note: Recoverable is now Yes, which indicates that there is a consistent recovery point.
One of the most important events that must be monitored when IBM Storage Virtualize
systems are implemented in a disaster recovery (DR) solution with GM functions is checking
whether GM was suspended because of a 1920 or 1720 error.
IBM Storage Virtualize can suspend the GM relationship to protect the performance on the
primary site when GM starts to affect write response time. That suspension can be caused by
several factors.
IBM Storage Virtualize systems do not restart the GM automatically. They must be restarted
manually.
Alert monitoring for IBM Storage Virtualize systems is explained in 9.1.1, “Monitoring by using the management GUI” on page 552. When MM or GM is managed by CSM and a 1920 error
occurs, CSM can automatically restart GM sessions. The delay time on the automatic restart
option is configurable. This delay allows some time for the underlying cause to dissipate.
Automatic restart is disabled in CSM by default.
Figure 9-89 shows the path to enable automatic restart of GM sessions. You select
Sessions → Select Session → Session Actions → View/Modify → Properties → H1-H2
options.
If you have several sessions, you can stagger the delay time so that they do not all restart at
the same time, which can affect system performance. Choose the set delay time feature to
define a time in seconds for the delay between when IBM CSM processes the 1720/1920
event and when the automatic restart is issued.
An automatic restart is attempted for every suspend with reason code 1720 or 1920 up to a
predefined number of times within a 30-minute period.
The number of times that a restart is attempted is determined by the storage system’s
gmlinktolerance value. If the number of allowable automatic restarts is exceeded within the
period, the session does not restart automatically on the next unexpected suspend. Issue a
Start command to restart the session, clear the automatic restart counters, and enable
automatic restarts.
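As a minimal CLI sketch (the value of 300 seconds and the grep filter are shown only as an illustration; the exact field name in the lssystem output and the appropriate tolerance value depend on your code level and environment), the link tolerance can be reviewed and adjusted as follows:
IBM_FlashSystem:FS9500:superuser>lssystem | grep gm_link_tolerance
gm_link_tolerance 300
IBM_FlashSystem:FS9500:superuser>chsystem -gmlinktolerance 300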
Warning: When you enable this option, the session is automatically restarted by the CSM
server. When this situation occurs, the secondary site is not consistent until the
relationships are fully resynched.
You can specify the amount of time (in seconds) that the CSM server waits after an
unexpected suspend before automatically restarting the session. The range of possible
values is 0 - 43,200. The default is 0, which specifies that the session is restarted immediately
after an unexpected suspend.
Figure 9-90 displays the secondary consistency warning when automatic GM restart is
enabled.
An example of script usage is a script that checks at a specific interval whether MM or GM is still active or whether any 1920 errors occurred, or that reacts to an SNMP or email alert that is received. The script can then start a specific recovery action that is based on your recovery plan and environment.
Customers who do not use IBM Copy Services Manager often create their own scripts. These scripts are sometimes supported by IBM as part of ITS professional services or IBM Systems Lab Services. Tell your IBM representative what kind of monitoring you want to implement with scripts, and together you can determine whether a reusable solution exists in the IBM Intellectual Capital Management repository.
An example of such a script can be found at Example 2: Restarting any stopped Remote
Copy relationships every 10 minutes.
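The following shell sketch illustrates the general idea only; it is not the script from the referenced example. The management IP address is reused from the other examples in this chapter, the filter value is an assumption, and the handling shown applies only to stand-alone relationships (relationships in a consistency group should instead be restarted with startrcconsistgrp):
#!/bin/sh
# Hedged sketch: restart stand-alone remote copy relationships that stopped in a consistent state.
SVC=superuser@9.42.162.153

# List relationship IDs that are in the consistent_stopped state (assumed filter value).
for rel in $(ssh $SVC "lsrcrelationship -nohdr -delim : -filtervalue state=consistent_stopped" | cut -d: -f1)
do
    echo "Restarting relationship $rel"
    ssh $SVC "startrcrelationship $rel"
done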
The lsdrive command output includes the following fields that are related to flash drive write endurance:
write_endurance_used: Indicates the percentage of write endurance (based on the rated drive writes per day, DWPD) that is used by the drive. The value is 0 - 255, where 0 indicates that full life remains and 100 indicates that the drive is at or past its end of life. The drive must be replaced when the value exceeds 100. This value is blank for drives that are not SSDs, for SSDs that predate support of the endurance indicator, and for drives that have not yet been polled, which can take up to 24 hours.
write_endurance_usage_rate: Indicates the DWPD usage rate. The possible values are as follows:
Measuring: No rate information is available.
High: The drive will not last as long as expected (~4.5 years).
Marginal: The drive will last approximately as expected (~4.5 - 5.5 years).
Low: The drive will last as expected (~5.5 years or more).
This value is blank for non-SSD drives. This field displays a value only when the write_endurance_used value changes.
replacement_date: Indicates the date of a potential drive failure in YYMMDD format. This value is blank for non-SSD drives.
When write_endurance_usage_rate is high, an event is reported with error code 2560 and
event code 010126. The description of this event is as follows:
The usage rate for a flash drive is high, which can affect the expected lifespan
of the drive.
When write_endurance_used is greater than 95%, an event is reported with error code 2560
and event ID 010125. The description of this event is as follows:
A flash drive is expected to fail due to the write endurance value exceeding 95.
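A minimal sketch of checking these fields for a single drive follows; the drive ID and the output values are illustrative only and are not taken from a real system:
IBM_FlashSystem:FS9500:superuser>lsdrive 5 | grep -e write_endurance -e replacement_date
write_endurance_used 12
write_endurance_usage_rate low
replacement_date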
10
This chapter provides guidance about the maintenance activities of IBM Storage Virtualize
software. This guidance can help you maintain an infrastructure with the levels of availability,
reliability, and resiliency that are required by complex applications, and to keep up with
environmental growth needs.
You can also find tips and guidance to simplify the storage area network (SAN) administration tasks that are performed daily, such as adding users, allocating and removing storage, adding or removing a host from the SAN, and creating procedures to manage your environment.
The discussion in this chapter focuses on the IBM FlashSystem 9500, and uses screen
captures and command outputs from this model. The recommendations and practices that
are described in this chapter are applicable to the following device models:
IBM FlashSystem 5015
IBM FlashSystem 5035
IBM FlashSystem 5045
IBM FlashSystem 5100
IBM FlashSystem 5200
IBM FlashSystem 7200
IBM FlashSystem 7300
IBM FlashSystem 9100
IBM FlashSystem 9200
IBM FlashSystem 9500
IBM SAN Volume Controller (SVC) SV2 - SV3 - SA2
Note: The practices that are described in this chapter were effective in many deployments
of different models of the IBM Storage Virtualize family. These deployments were
performed in various business sectors for various international organizations. They all had
one common need: to manage their storage environment easily, effectively, and reliably.
A best practice is to use the interface most appropriate to the task that you are attempting to
complete. For example, a manual software update is best performed by using the service
assistant GUI rather than the CLI. Running fix procedures to resolve problems or configuring
expansion enclosures can be performed only by using the management GUI. Creating many
volumes with customized names is best performed by using a script on the CLI. To ensure
efficient storage administration, it is a best practice to become familiar with all available user
interfaces.
Note: By taking advantage of your system GUI, you can easily verify performance information such as input/output operations per second (IOPS), latency, port utilization, host status, and other sensitive information about your system. Graphs can also be used to compare past states of your system.
For more information about the task menus and functions of the management GUI, see
Chapter 4, “IBM Spectrum Virtualize GUI”, of Implementation Guide for IBM Storage
FlashSystem and IBM SAN Volume Controller: Updated for IBM Storage Virtualize Version
8.6, SG24-8542.
Important: If used incorrectly, the service actions that are available through the service
assistant can cause loss of access to data or even data loss.
You can connect to the service assistant on one node canister by entering the service IP
address. If there is a working communications path between the node canisters, you can view
status information and perform service tasks on the other node canister by making the other
node canister the current node. You do not have to reconnect to the other node. On the
system itself, you also can access the service assistant interface by using the technician port.
The service assistant provides facilities to help you service only control enclosures. Always
service the expansion enclosures by using the management GUI.
You can also complete the following actions by using the service assistant:
Collect logs to create and download a package of files to send to support personnel.
Provide detailed status and error summaries.
Remove the data for the system from a node.
Recover a system if it fails.
Install a code package from the support site or rescue the code from another node.
Update code on node canisters manually.
Configure a control enclosure chassis after replacement.
Change the service IP address that is assigned to Ethernet port 1 for the current node
canister.
Install a temporary Secure Shell (SSH) key if a key is not installed and CLI access is
required.
Restart the services that are used by the system.
Halt the system for maintenance (parts replacement).
To access the Service Assistant Tool GUI, start a supported web browser and go to
https://<system_ip_address>/service, where <system_ip_address> is the service IP
address for the node canister or the management IP address for the system on which you
want to work.
Nearly all the functions that are offered by the CLI also are available through the management
GUI. However, the CLI does not provide the fix procedures or the performance graphics that
are available in the management GUI. Alternatively, use the CLI when you require a
configuration setting that is unavailable in the management GUI.
Running help in the CLI displays a list of all available commands. You also have access to a few other UNIX commands in the restricted shell, such as grep, more, and tr (translate characters), which are useful for formatting the output of CLI commands. Reverse-i-search (Ctrl+R) is also available.
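For example, combining a CLI command with grep in the restricted shell makes it easy to spot objects in a particular state (a minimal sketch; any output depends on your configuration):
IBM_FlashSystem:FS9500:superuser>lsvdisk -nohdr | grep offline
Similar filtering works with most ls commands, and more can be used to page through long output.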
For more information about command reference and syntax, see the following resources:
Command-line interface IBM FlashSystem 9500, 9200 and 9100
IBM Storage Virtualize for SAN Volume Controller and FlashSystem Family -
Command-Line Interface User’s Guide
For more information about the use of the service CLI, see Service command-line interface.
USB port
Use the USB port in the following situations:
When you do not know, or cannot use, the service IP address for the node canister in the control enclosure and must set the address.
When you have forgotten the superuser password and must reset the password.
For more information about the usage of the USB port, see Procedure: Getting node canister and system information by using a USB flash drive.
Technician port
The technician port is an Ethernet port on the back of the node canister. You can use it to configure the node. The technician port can be used to do most of the system
configuration operations, which include the following tasks:
Defining a management IP address
Initializing a new system
Servicing the system
For more information about the usage of the technician port, see The technician port.
SAN storage equipment management consoles often do not provide direct access to stored
data, but you can easily shut down (accidentally or deliberately) a shared storage controller
and any number of critical applications along with it. Moreover, having individual user IDs set
for your storage administrators allows much better auditing of changes if you must analyze
your logs.
The IBM Storage Virtualize 8.6 family supports the following authentication methods:
Local authentication by using a password
Local authentication by using SSH keys
Remote authentication by using Lightweight Directory Access Protocol (LDAP) (Microsoft
Active Directory or IBM Security Directory Server)
Multifactor authentication support
Single sign-on (SSO) support
Administrator: Users with this role can manage all functions of the system except those that deal with managing users, user groups, and authentication. Use this role to assign users who administer the system and perform tasks such as provisioning storage.
Copy Operator: Users with this role have monitor role privileges and can create, change,
and manage all Copy Services functions but cannot create consistency groups or modify
host mappings.
Service: Users can delete dump files, add and delete nodes, apply service, and shut down
the system. Users can also perform the same tasks as users in the monitor role.
Monitor: Users with this role can view objects, but cannot manage the system or its
resources. Support personnel can be assigned this role to monitor the system and to
determine the cause of problems. This role is suitable for use by automation tools such as
the IBM Storage Insights data collector for collecting status about the system. For more
information about IBM Storage Insights, see Chapter 9, “Implementing a storage
monitoring system” on page 551.
Restricted Administrator: Users with this role can perform the same tasks as the Security
Administrator role, but are restricted from deleting certain objects. Support personnel can
be assigned this role to solve problems.
3-Site Administrator: Users with this role can configure, manage, and monitor 3-site replication configurations through certain command operations that are available only in the 3-Site Orchestrator. This is the only role that is intended to be used with the 3-Site Orchestrator.
VASA Provider: Users with this role can manage virtual volumes or vVols that are used by
VMware vSphere and managed through a VASA Provider.
FlashCopy Administrator: Users can create, change, and delete all the existing FlashCopy
mappings and consistency groups as well as create and delete host mappings. For more
information, see FlashCopy commands.
In addition to standard groups, you also can configure ownership groups to manage access to
resources on the system. An ownership group defines a subset of users and objects within
the system. You can create ownership groups to further restrict access to specific resources
that are defined in the ownership group. For more details, see Ownership groups.
Users within an ownership group can view or change only resources within the ownership
group in which they belong. For example, you can create an ownership group for database
administrators to provide monitor-role access to a single pool that is used by their databases.
Their views and privileges in the management GUI are automatically restricted. As shown in Figure 10-1, only the child pool with the associated volume is listed.
Regardless of the authentication method that you choose, complete the following tasks (a brief CLI sketch follows this list):
Create individual user IDs for your Storage Administration staff. Choose user IDs that
easily identify the user and meet your organization’s security standards.
Include each individual user ID into the UserGroup with only enough privileges to perform
the required tasks. For example, your first-level support staff probably requires only
Monitor group access to perform their daily tasks, but second-level support might require
Restricted Administrator access. Consider using Ownership groups to further restrict
privileges.
If required, create generic user IDs for your batch tasks, such as Copy Services or
Monitoring. Include them in a Copy Operator or Monitor UserGroup. Never use generic
user IDs with the SecurityAdmin privilege in batch tasks.
Create unique SSH public and private keys for each administrator requiring local access.
Store your superuser password in a safe location in accordance with your organization’s security guidelines, and use it only in emergencies.
For users with local authentication, it is a best practice to enable a password policy
(length/expiry) that respects security standards.
Enable multifactor authentication (MFA).
Use single sign-on (SSO) access if it is supported by your organization.
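A brief CLI sketch of the user-creation guidelines follows. The user names, password, and group assignments are examples only, and the group names assume the default user groups on your system:
IBM_FlashSystem:FS9500:superuser>mkuser -name jsmith -usergrp Monitor -password Examp1ePassw0rd
IBM_FlashSystem:FS9500:superuser>mkuser -name svc_batch_copy -usergrp CopyOperator -password Examp1ePassw0rd
For local authentication with SSH keys, the -keyfile parameter can be used instead of -password.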
10.3 Volumes
A volume is a logical disk that is presented to a host by an I/O group (pair of nodes), and
within that group a preferred node serves I/O requests to the volume.
When you allocate and deallocate volumes to hosts, consider the following guidelines:
Before you allocate new volumes to a server with redundant disk paths, verify that these
paths are working well, and that the multipath software is free of errors. Fix disk path
errors that you find in your server before you proceed.
When you plan for future growth of space-efficient volumes, determine whether your
server’s operating system supports the particular volume to be extended online. AIX 6.1
TL2 and earlier, for example, do not support online expansion of rootvg logical unit
numbers (LUNs). Test the procedure in a non-production server first.
Always cross-check the host LUN ID information with the vdisk_UID of IBM Storage
Virtualize. Do not assume that the operating system recognizes, creates, and numbers the
disk devices in the same sequence or with the same numbers as you created them in
IBM Storage Virtualize.
Ensure that you delete any volume or LUN definition in the server before you unmap it in
IBM Storage Virtualize. For example, in AIX, remove the hdisk from the volume group
(reducevg) and delete the associated hdisk device (rmdev).
Consider keeping volume protection enabled. If this option is not enabled on your system, use the command chsystem -vdiskprotectionenabled yes -vdiskprotectiontime <value_in_minutes> (see the CLI sketch after this list). Volume protection ensures that some CLI actions (most of the ones that either explicitly or implicitly remove host-volume mappings or delete volumes) are policed to prevent the removal of mappings to volumes or deletion of volumes that are considered active, that is, the system detected I/O activity to the volume from any host within a specified period (15 - 1440 minutes).
Note: Volume protection cannot be overridden by using the -force flag in the affected CLI commands. Volume protection must be disabled to perform an activity that is blocked.
Ensure that you explicitly remove a volume from any volume-to-host mappings and any
copy services relationship to which it belongs before you delete it.
If you issue the svctask rmvdisk command and IBM Storage Virtualize still has pending
mappings, the system prompts you to confirm the action, which is a hint that you might
have done something incorrectly.
When you are deallocating volumes, plan for an interval between unmapping them from hosts (rmvdiskhostmap) and deleting them (rmvdisk). One conservative recommendation is a minimum 48-hour period, with at least one business day between unmapping and deleting, so that you can perform a quick backout if you later realize that you still need some data on that volume.
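The following CLI sketch summarizes the volume protection and deallocation guidelines above. The host and volume names are placeholders, and the protection time of 60 minutes is an assumption that you should adapt to your environment:
IBM_FlashSystem:FS9500:superuser>chsystem -vdiskprotectionenabled yes -vdiskprotectiontime 60
IBM_FlashSystem:FS9500:superuser>rmvdiskhostmap -host AIX_HOST01 APP_VOL01
IBM_FlashSystem:FS9500:superuser>rmvdisk APP_VOL01
Run the rmvdisk command only after the agreed waiting period passed and you confirmed that no data on the volume is still needed.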
For more information about volumes, see Chapter 5, “Volumes” on page 317.
10.4 Hosts
A host is a physical or virtual computer that is defined in your system and that connects either directly or through your SAN switches by using Fibre Channel (FC), internet Small Computer Systems Interface (iSCSI), or other protocols.
When you add and remove hosts in IBM Storage Virtualize, consider the following guidelines:
Before you map new servers to IBM Storage Virtualize, verify that they are all error-free.
Fix errors that you find in your server and IBM Storage Virtualize before you proceed. In
IBM Storage Virtualize, pay special attention to anything inactive in the lsfabric
command.
Plan for an interval between updating the zoning in each of your redundant SAN fabrics,
such as at least 30 minutes. This interval allows for failover to occur and stabilize, and for
you to be notified if unexpected errors occur.
After you perform the SAN zoning from one server’s Host Bus Adapter (HBA) to
IBM Storage Virtualize, you should list the host’s worldwide port name (WWPN) by using
the lshbaportcandidate command. Use the lsfabric command to certify that it was
detected by the IBM Storage Virtualize nodes and ports that you expected. When you
create the host definition in the IBM Storage Virtualize (mkhost), try to avoid the -force
parameter. If you do not see the host’s WWPNs, it might be necessary to scan the fabric
from the host. For example, use the cfgmgr command in AIX.
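A minimal sketch of the host-definition flow follows. The WWPNs and the host name are hypothetical placeholders, and the output lines are illustrative:
IBM_FlashSystem:FS9500:superuser>lshbaportcandidate
id
100000109B1234AB
100000109B1234AC
IBM_FlashSystem:FS9500:superuser>mkhost -name AIX_HOST01 -fcwwpn 100000109B1234AB:100000109B1234AC
Host, id [0], successfully created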
For more information about hosts, see Chapter 8, “Hosts” on page 519.
Most of the following sections explain how to prepare for the software update. These sections
also present version-independent guidelines about how to update the IBM Storage Virtualize
family systems and flash drives.
Before you update the system, ensure that the following requirements are met:
Download the latest system update package and update test utility. The latest package can be obtained directly by using the download function on the IBM Storage Virtualize system or from IBM Fix Central. For more details, see Obtaining the software packages.
All node canisters are online.
All errors in the system event log are addressed and marked as fixed.
There are no volumes, managed disks (MDisks), or storage systems with Degraded or
Offline status.
The service assistant IP address is configured on every node in the system.
The system superuser password is known.
The system configuration is backed up and saved (preferably off-site), as shown in
Example 10-17 on page 694.
You can physically access the hardware.
The following actions are not required, but are recommended to reduce unnecessary load on
the system during the update:
Stop all Metro Mirror (MM), Global Mirror (GM), or HyperSwap operations.
Avoid running FlashCopy operations.
Avoid migrating or formatting volumes.
Stop collecting IBM Spectrum Control performance data for the system.
Stop automated jobs that access the system.
Ensure that no other processes are running on the system.
If you want to update without host I/O, then shut down all hosts.
For additional information see: IBM Storage Virtualize Family of Products Upgrade Planning.
Note: Customers who purchase an IBM Storage Virtualize system can select from IBM Storage Expert Care Basic, IBM Storage Expert Care Advanced, or IBM Storage Expert Care Premium. Storage Expert Care is designed to simplify and standardize the support approach for the IBM Storage FlashSystem portfolio. Customers can select their preferred level of service and support at the time of the system purchase. For more details, see IBM Storage Expert Care.
If a customer purchased IBM Storage Expert Care Premium, two code upgrades per year, which are performed by IBM, are included. These upgrades are done by the dedicated IBM Remote Code Load (RCL) team or, where remote support is not allowed or enabled, by an onsite IBM Systems Service Representative (IBM SSR).
For more information about Remote Code Load, see 10.8, “Remote Code Load” on page 663.
Using an IBM FlashSystem 9500 as an example, log in to the web-based GUI and find the current version by doing either of the following actions:
At the upper right, click the question mark symbol (?) and select About IBM Storage
Virtualize 9500 to display the current version.
Select Settings → System → Update System to display both the current and target
versions.
Figure 10-2 shows the Update System output window and displays the current and latest
code levels. In this example, the code level is 8.5.0.7.
Figure 10-3 shows the Update System output where code level is up to date. In this example,
the code level is 8.6.0.0.
Alternatively, if you use the CLI, run the lssystem command. Example 10-1 on page 646
shows the output of the lssystem CLI command and where the code level output can be
found.
IBM Storage Virtualize software levels are specified by 4 digits in the following format (in our
example, V.R.M.F = 8.5.0.7):
V is the major version number.
R is the release level.
M is the modification level.
F is the fix level.
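A short CLI sketch of reading the version from the lssystem output follows; the build string is truncated here because it varies by release, and the version shown is the same example value that is used above:
IBM_FlashSystem:FS9500:superuser>lssystem | grep code_level
code_level 8.5.0.7 (build ...)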
To update your system by using the most suitable code version, consider the following examples when you decide which version to use:
The specific version of an application or other component of your SAN Storage
environment has a known problem or limitation.
The latest IBM Storage Virtualize software release is not yet cross-certified as compatible
with another key component of your SAN storage environment.
Your organization has mitigating internal policies, such as the usage of the “latest release
minus 1” or requiring “seasoning” in the field before implementation in a production
environment.
For more information, see IBM Storage Virtualize Family of Products Upgrade Planning.
Note: The option to upload packages manually applies only to IBM Storage Virtualize Software 8.5.4 and later. On IBM Storage Virtualize systems that run versions below 8.5.4, the step “Provide the package manually” can be skipped.
On the Upload page, drag and drop the files or click Select files. Select the test utility,
update package, or patch files that you want to update.
Click Upload.
After you upload a valid upgrade package, test utility, or patch, the management interface reveals the corresponding Test only, Test & Upgrade, or Install patches buttons, as shown in Figure 10-6.
Note: The option to obtain packages directly applies only to IBM Storage Virtualize Software 8.5.4 and later. For IBM Storage Virtualize systems that run versions below 8.5.4, the option “Obtain the package directly” is not available.
If part or all your current hardware is not supported at the target code level that you want to
update to, replace the unsupported hardware with newer models before you update to the
target code level.
Conversely, if you plan to add or replace hardware with new models to an existing cluster, you
might have to update your IBM Storage Virtualize code level first.
For more information about Code and Hardware Interoperability for IBM Storage Virtualize
see: Concurrent Compatibility and Code Cross Reference for IBM Storage Virtualize.
Applications often certify only the operating system that they run under and leave the task of
certifying its compatibility with attached components (such as SAN storage) to the operating
system provider. However, various applications might use special hardware features or raw
devices and certify the attached SAN storage. If you have this situation, consult the
compatibility matrix for your application to certify that your IBM Storage Virtualize target code
level is compatible.
By cross-checking that the version of IBM Storage Virtualize is compatible with the versions of
your SAN environment components, you can determine which one to update first. By
checking a component’s update path, you can determine whether that component requires a
multistep update.
For IBM Storage Virtualize systems, if you are not making major version or multi-step updates
in any components, the following update order is recommended to avoid problems:
1. Back-end storage controllers (if present)
2. Host HBA microcode, driver, and multipath software
3. IBM Storage Virtualize system
4. IBM Storage Virtualize internal SAS and NVMe drives (if present)
Attention: Do not update two components of your IBM Storage Virtualize system
simultaneously, such as an IBM SAN Volume Controller model SV3 and one backend
storage controller. This caution is true even if you intend to perform this update with your
system offline. An update of this type can lead to unpredictable results, and an unexpected
problem is much more difficult to debug.
Because you are updating IBM Storage Virtualize, also update your SAN switches code to the
latest supported level. Start with your principal core switch or director, continue by updating
the other core switches, and update the edge switches last. Update one entire fabric (all
switches) before you move to the next one so that a problem you might encounter affects only
the first fabric. Begin your other fabric update only after you verify that the first fabric update
has no problems.
If you are not running symmetrical, redundant, and independent SAN fabrics, you should be, because the lack of them represents a single point of failure.
However, to benefit from this redundancy during maintenance, the failover capability of your multipath software must be working correctly. The dependency on host multipathing can be reduced by enabling N_Port ID Virtualization (NPIV) if your current code level supports this function. For more information about NPIV, see Chapter 2, “Storage area network guidelines” on page 121.
Before you start the IBM Storage Virtualize update preparation, check the following items for
every host that is attached to IBM Storage Virtualize that you update:
The operating system type, version, and maintenance or fix level
The make, model, and microcode level of the HBAs
The multipath software type, version, and error log
Fix every problem or “suspect” that you find with the disk path failover capability. Because a
typical IBM Storage Virtualize environment can have hundreds of hosts that are attached to it,
a spreadsheet might help you with the Attached Hosts Preparation tracking process. If you
have some host virtualization, such as VMware ESX, AIX logical partitions (LPARs),
IBM Virtual I/O Server (VIOS), or Solaris containers in your environment, verify the
redundancy and failover capability in these virtualization layers.
You must successfully finish the update in one cluster before you start the next one. Try to
update the next cluster as soon as possible to the same code level as the first one. Avoid
running them with different code levels for extended periods. The recommendation is to start
with the AUX cluster.
You can use the management GUI or the CLI to install and run the Upgrade Test Utility.
Note: If the IBM Storage Virtualize system is running 8.5.4 or later, you must upload the package first to reveal the Test only button. For details, see “Provide the package manually” on page 648.
3. Select the test utility that you downloaded from the Fix Central support site. Upload the
Test utility file and enter the code level to which you are planning to update. Figure 10-9
shows the IBM Storage Virtualize management GUI window that is used to install and run
the Upgrade Test Utility.
Figure 10-9 IBM Storage Virtualize Upgrade Test Utility by using the GUI
4. Click Test. The test utility verifies that the system is ready to be updated. After the Update
Test Utility completes, you are presented with the results. The results state that no
warnings or problems were found, or directs you to more information about known issues
that were discovered on the system.
Figure 10-10 on page 654 shows a successful completion of the Upgrade Test Utility.
Figure 10-10 Example result of the IBM Storage Virtualize Upgrade Test Utility
Example 10-2 Copying the upgrade test utility to IBM Storage Virtualize
C:\Program Files\PuTTY>pscp -unsafe
C:\Users\120688724\Downloads\IBM_INSTALL_FROM_8.5_AND_LATER_upgradetest_40.0
superuser@9.42.162.153:/home/admin/upgrade
Keyboard-interactive authentication prompts from server:
| Password:
End of keyboard-interactive prompts from server
IBM_INSTALL_FROM_8.5_AND_ | 411 kB | 411.7 kB/s | ETA: 00:00:00 | 100%
2. Ensure that the file was successfully copied by checking the output of the lsdumps -prefix /home/admin/upgrade command at the IBM Storage Virtualize CLI. An example is shown in Example 10-3.
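A minimal sketch of this check follows; the file listed matches the one copied in Example 10-2, and the ID in the output is illustrative:
IBM_FlashSystem:FS9500:superuser>lsdumps -prefix /home/admin/upgrade
id filename
0 IBM_INSTALL_FROM_8.5_AND_LATER_upgradetest_40.0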
3. Install and run Upgrade Test Utility in the CLI, as shown in Example 10-4. In this case, the
Upgrade Test Utility found no errors but one warning and completed successfully.
IBM_FlashSystem:FS9500:superuser>svcupgradetest -v 8.6.0.0
svcupgradetest version 40.0
This version of svcupgradetest believes that this system is already running the
latest available level of code.
No upgrade is required at this time.
If you believe this is not correct, please check the support
website to see if a newer version of this tool is available.
If you are installing an ifix, and no further issues are reported below,
please continue with the upgrade.
The upgrade utility has detected that the current security settings on the
system are below the recommended levels.
More information on the recommended SSL protocol and SSH protocol levels can be
found by searching
"Changing security protocols" in the IBM Documentation.
Review the output to check whether there were any problems that were found by the utility.
The output from the command either states that no problems were found, or it directs you to
details about known issues that were discovered on the system.
Note: Always use the latest available Upgrade Test Utility version. This tool runs tests to
certify that your hardware can receive the code level that you are planning to install.
Whichever method (GUI, CLI, or manual) that you choose to perform the update, make sure
that you adhere to the following guidelines for your IBM Storage Virtualize software update:
Schedule the IBM Storage Virtualize software update for a low I/O activity time. The
update process puts one node at a time offline. It also disables the write cache in the I/O
group that node belongs to until both nodes are updated. Therefore, with lower I/O, you
are less likely to notice performance degradation during the update.
Never power off, restart, or reset an IBM Storage Virtualize node during a software update
unless you are instructed to do so by IBM Support. Typically, if the update process
encounters a problem and fails, it backs out. The update process can take 1 hour per node
with a further, optional, 10 minute mid-point delay.
If you are planning for a major IBM Storage Virtualize version update, see the Code Cross
Reference.
Check whether you are running a web browser type and version that is supported by the
IBM Storage Virtualize target software level on every computer that you intend to use to
manage your IBM Storage Virtualize.
This section describes the steps that are required to update the software.
5. Select whether you want to create intermittent pauses in the update to verify the process.
Select one of the following options.
– Full automatic update without pauses (recommended).
– Pausing the update after half of the nodes are updated.
– Pausing the update before each node updates.
6. Click Finish. As the canisters on the system are updated, the management GUI displays
the progress for each canister.
7. Monitor the update information in the management GUI to determine when the process is
complete.
– Because the update process takes some time, the installation command completes
when the software level is verified by the system. To determine when the update is
completed, you must either display the software level in the system VPD or look for the
Software update complete event in the error/event log. If any node fails to restart with
the new software level or fails at any other time during the process, the software level is
backed out.
– During an update, the version number of each node is updated when the software is
installed and the node is restarted. The system software version number is updated
when the new software level is committed.
– When the update starts, an entry is made in the error or event log and another entry is
made when the update completes or fails.
4. Issue the following CLI command to start the update process:
applysoftware -file <software_update_file>
where <software_update_file> is the file name of the software update file. If the system
identifies any volumes that would go offline as a result of restarting the nodes as part of
the system update, the software update does not start. An optional -force parameter can
be used to indicate that the update continues regardless of the problem identified. If you
use the -force parameter, you are prompted to confirm that you want to continue.
5. Issue the following CLI command to check the status of the update process:
lsupdate
This command displays success when the update is complete.
6. To verify the node software version, issue the lsnodecanistervpd command on each node. To confirm the cluster version, issue the lssystem command. See Example 10-5.
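A condensed sketch of the CLI update sequence follows; the package file name is illustrative, and the lsupdate and lssystem output lines are abbreviated:
IBM_FlashSystem:FS9500:superuser>applysoftware -file IBM_FlashSystem9x00_INSTALL_8.6.0.0
IBM_FlashSystem:FS9500:superuser>lsupdate
status success
IBM_FlashSystem:FS9500:superuser>lssystem | grep code_level
code_level 8.6.0.0 (build ...)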
For more details about the software update, see Updating the system software.
Initially, patching is used only to update software outside of the I/O path, and the installation of a patch has no impact on I/O.
A patch modifies, creates, or deletes one or more files on the boot disk. Several patches can be installed, but each patch is applied on top of the existing patches, and patches can be reverted only in order.
To check which patches are installed, see 10.6.2, “Verifying a software patch” on page 659.
Note: Patches (that are not obsolete) are retained across code upgrades.
Issue the CLI command satask removepatch -patch <patch_file> <panel_name> to remove patches on one or more nodes. The lsservicenodes command lists error 843 (patch mismatch) if patches are not consistent within the cluster. See Example 10-8 on page 661 for more details.
IBM_FlashSystem:FS9500:superuser>sainfo lsservicenodes
panel_name cluster_id cluster_name node_id node_name relation node_status error_data
01-2 0000020421600428 FS9110 2 node2 local Active 843 1 IBM_PATCH_SVT00001_1.0
01-1 0000020421600428 FS9110 1 node1 partner Active
To remove all patches from a node in the system, enter the following command as shown in
Example 10-9:
When used on an array member drive, the update checks for volumes that depend on the
drive and refuses to run if any are found. Drive-dependent volumes are usually caused by
non-redundant or degraded redundant array of independent disks (RAID) arrays. Where
possible, you should restore redundancy to the system by replacing any failed drives before
upgrading the drive firmware. When this task is not possible, you can either add redundancy
to the volume by adding a second copy in another pool, or use the -force parameter to
bypass the dependent volume check. Use -force only if you are willing to accept the risk of
data loss on dependent volumes (if the drive fails during the firmware update).
Note: Due to some system constraints, it is not possible to produce a single NVMe firmware package that works on all NVMe drives on all IBM Storage Virtualize code levels. Therefore, three different NVMe firmware files are available for download, depending on the size of the drives that are installed.
3. Click Next to start the upgrade test utility. An example of the result is shown in
Figure 10-14.
4. Click Continue Upgrade to start the upgrade process of the selected drives.
5. To monitor the progress of the upgrade, select Monitoring → Background Tasks.
Note: The maximum number of drive IDs that can be specified on a CLI by using the
-drive option is 128. If you have more than 128 drives, use the -all option or run
multiple invocations of applydrivesoftware to complete the update.
3. Issue the following CLI command to check the status of the update process:
lsdriveupgradeprogress
This command displays success when the update is complete.
4. To verify that the update successfully completed, issue the lsdrive command for each
drive in the system. The firmware_level field displays the new code level for each drive.
Example 10-10 demonstrates how to list the firmware level for four specific drives.
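As a rough sketch of the overall CLI sequence (not the referenced example), with a placeholder firmware file name and example drive IDs; verify the applydrivesoftware parameters against your code level before use. The final lsdrive command shows the firmware_level field for one of the updated drives:

applydrivesoftware -file IBM_drive_firmware.bin -type firmware -drive 4:5:6:7
lsdriveupgradeprogress
lsdrive 4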
IBM Assist On-site (AOS), the remote support center, or Secure Remote Access (SRA), including Call Home enablement, is required to enable RCL. With AOS enabled, a member of the IBM Support team can view your desktop and share control of your mouse and keyboard to help get you to a solution. The tool also can speed up problem determination, the collection of data, and your problem solution.
For more information about configuring support assistance, see IBM Documentation -
Remote Code Load.
To request RCL for your system, go to IBM Support - Remote Code Load and select your
product type. Then, complete the following steps:
1. At the IBM Remote Code Load web page, select Product type → Book Now - IBM
FlashSystem or SVC Remote Code Load.
2. Click Schedule Service to start scheduling the service, as shown in Figure 10-15.
3. Select the Product type for RCL. Choose your device Model and Type, and click Select.
The example in Figure 10-16 is an SVC - 2147.
4. In the RCL timeframe option, select the date (Figure 10-17) and timeframe (Figure 10-18).
5. Enter your booking details into the RCL booking information form (Figure 10-19).
6. The RCL team confirms the booking via email and, closer to the scheduled date, contacts the client to outline the RCL process and what to expect.
If you have the IBM Expert Care coverage feature for your IBM FlashSystem, make sure that
your Technical Account Manager (TAM) is aware of the procedure and engaged with the
service team to proceed with the replacement.
For more information, see IBM FlashSystem documentation - Removing and Replacing a
Drive.
Note: Reseating an FCM can reformat the module in specific instances. All FCM drive
failure alerts must be addressed before any re-seat or replacement procedure is done. On
receiving any error message for the FCM drives, escalate the problem to IBM Support.
Alternatively, if the HBA of a server is removed and installed in a second server and the SAN
zones for the first server and the IBM Storage Virtualize host definitions are not updated, the
second server can access volumes that it probably should not access.
If you are using server virtualization, verify the WWPNs in the server that is attached to the
SAN, such as AIX Virtual Input/Output (VIO) or VMware ESX. Cross-reference with the
output of the IBM Storage Virtualize lshost <hostname> command, as shown in
Example 10-12.
For storage allocation requests that are submitted by the server support team or application
support team to the storage administration team, always include the server’s HBA WWPNs to
which the new LUNs or volumes are supposed to be mapped. For example, a server might
use separate HBAs for disk and tape access or distribute its mapped LUNs across different
HBAs for performance. You cannot assume that any new volume is supposed to be mapped
to every WWPN that the server logged in the SAN.
If your organization uses a change management tracking tool, perform all your SAN storage
allocations under approved change requests with the servers’ WWPNs that are listed in the
Description and Implementation sections.
If your organization uses a change management tracking tool, include the vdisk_UID and
LUN ID information in every change request that performs SAN storage allocation or
reclamation.
Note: Because a host can have many volumes with the same scsi_id, always
cross-reference the IBM Storage Virtualize volume unique identifier (UID) with the host
volume UID and record the scsi_id and LUN ID of that volume.
To replace a failed HBA and retain the working HBA, complete the following steps:
1. In your server, identify the failed HBA and record its WWPNs. (For more information, see
10.10.1, “Cross-referencing WWPNs” on page 667.) Then, place this HBA and its
associated paths offline (gracefully if possible). This approach is important so that the
multipath software stops attempting to recover the HBA. Your server might even show a
degraded performance while you perform this task.
2. Some HBAs have an external label that shows the WWPNs. If you have this type of label,
record the WWPNs before you install the new HBA in the server.
3. If your server does not support HBA hot-swap, power off your system, replace the HBA,
connect the used FC cable to the new HBA, and power on the system.
If your server does support hot-swap, follow the appropriate procedures to perform a “hot”
replace of the HBA. Do not disable or disrupt the working HBA in the process.
4. Verify that the new HBA successfully logged in to the SAN switch. If it logged in
successfully, you can see its WWPNs logged in to the SAN switch port. Otherwise, fix this
issue before you continue to the next step.
Cross-check the WWPNs that you see in the SAN switch with the one that you noted in
step 1, and make sure that you did not record the wrong worldwide node name (WWNN).
5. In your SAN zoning configuration tool, replace the old HBA WWPNs for the new ones in
every alias and zone to which they belong. Do not touch the other SAN fabric (the one with
the working HBA) while you perform this task.
Only one alias should use each WWPN, and zones must reference this alias.
If you are using SAN port zoning (though you should not be) and you did not move the new
HBA FC cable to another SAN switch port, you do not need to reconfigure zoning.
6. Verify that the new HBA’s WWPNs appear in IBM Storage Virtualize by using the
lsfcportcandidate command.
If the WWPNs of the new HBA do not appear, troubleshoot your SAN connections and
zoning.
7. Add the WWPNs of this new HBA to the IBM Storage Virtualize host definition by using the addhostport command (a CLI sketch of this and the related steps follows this procedure). Do not remove the old ones yet. Run the lshost <servername> command, and then verify that the working HBA shows as active and that the failed HBA shows as inactive or offline.
8. Use the server's multipath or HBA management software to recognize the new HBA and its associated SAN disk paths. Verify that all SAN LUNs have redundant disk paths through the working HBA and the new HBA.
9. Return to IBM Storage Virtualize and verify again (by using the lshost <servername>
command) that both the working and the new HBA’s WWPNs are active. In this case, you
can remove the old HBA WWPNs from the host definition by using the rmhostport
command.
10.Do not remove any HBA WWPNs from the host definition until you ensure that you have at
least two active ones that are working correctly.
By following these steps, you avoid removing your only working HBA by mistake.
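A minimal CLI sketch of the host-definition part of this procedure, assuming a hypothetical host named ESX01 and placeholder WWPNs. On recent code levels the WWPN parameter is -fcwwpn, while older releases used -hbawwpn, so confirm the syntax with the CLI help on your system:

lsfcportcandidate
addhostport -fcwwpn 10000090FA000001 ESX01
lshost ESX01
rmhostport -fcwwpn 10000090FA00000F ESX01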
To install these control enclosures, determine whether you need to upgrade your
IBM Storage Virtualize first.
Note: If exactly two control enclosures are in a system, you must set up a quorum disk or
application outside of the system. If the two control enclosures lose communication with
each other, the quorum disk prevents both I/O groups from going offline.
After you install the new nodes, you might need to redistribute your servers across the I/O
groups. Consider the following points:
Moving a server’s volume to different I/O groups can be done online because of a feature
called Non-Disruptive Volume Movement (NDVM). Although this process can be done
without stopping the host, careful planning and preparation are advised.
Note: You cannot move a volume that is in a type of remote copy relationship.
If each of your servers is zoned to only one I/O group, modify your SAN zoning
configuration as you move its volumes to another I/O group. As best you can, balance the
distribution of your servers across I/O groups according to I/O workload.
Use the -iogrp parameter with the mkhost command to define which I/O groups of IBM Storage Virtualize the new servers use. Otherwise, IBM Storage Virtualize by default maps the host to all I/O groups, even if they do not contain any nodes, and regardless of your zoning configuration.
Example 10-15 shows this scenario and how to resolve it by using the rmhostiogrp and
addhostiogrp commands.
IBM_FlashSystem:FS9500:superuser>lshost Win2012srv1
id 0
name Win2012srv1
port_count 2
type generic
mask 1111111111111111111111111111111111111111111111111111111111111111
iogrp_count 4
status online
site_id
site_name
host_cluster_id
host_cluster_name
protocol scsi
WWPN 10000090FAB386A3
node_logged_in_count 2
state inactive
WWPN 10000090FAB386A2
node_logged_in_count 2
state inactive
IBM_FlashSystem:FS9500:superuser>lsiogrp
id name node_count vdisk_count host_count site_id site_name
0 io_grp0 2 11 2
1 io_grp1 0 0 2
2 io_grp2 0 0 2
3 io_grp3 0 0 2
4 recovery_io_grp 0 0 0
IBM_FlashSystem:FS9500:superuser>lshostiogrp Win2012srv1
id name
0 io_grp0
1 io_grp1
2 io_grp2
3 io_grp3
IBM_FlashSystem:FS9500:superuser>lshostiogrp Win2012srv1
id name
0 io_grp0
1 io_grp1
2 io_grp2
IBM_FlashSystem:FS9500:superuser>lshostiogrp Win2012srv1
id name
0 io_grp0
1 io_grp1
2 io_grp2
3 io_grp3
IBM_FlashSystem:FS9500:superuser>lsiogrp
id name node_count vdisk_count host_count site_id site_name
0 io_grp0 2 11 2
1 io_grp1 0 0 2
2 io_grp2 0 0 2
3 io_grp3 0 0 2
4 recovery_io_grp 0 0 0
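A minimal sketch of the commands that produce the transition shown in the preceding output, assuming that access to I/O group 3 is first removed from and then re-added to host Win2012srv1:

rmhostiogrp -iogrp 3 Win2012srv1
lshostiogrp Win2012srv1
addhostiogrp -iogrp 3 Win2012srv1
lshostiogrp Win2012srv1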
If possible, avoid setting a server to use volumes from different I/O groups that have
different node types for extended periods. Otherwise, as this server’s storage capacity
grows, you might experience a performance difference between volumes from different I/O
groups. This mismatch makes it difficult to identify and resolve any performance problems that arise.
Alternatively, you can add nodes to expand your system. If your SVC cluster is below the
maximum I/O groups limit for your specific product and you intend to upgrade it, you can
install another I/O group.
Note: If I/O groups are present in a system, you must set up a quorum disk or application
outside of the system. If the I/O groups lose communication with each other, the quorum
disk prevents split brain scenarios.
For more information about adding a node to an SVC cluster, see Chapter 3, “Initial
configuration”, of Implementation Guide for IBM Storage FlashSystem and IBM SAN Volume
Controller: Updated for IBM Storage Virtualize Version 8.6, SG24-8542.
Note: Use a consistent method (only the management GUI or only the CLI) when you add,
remove, and re-add nodes. If a node is added by using the CLI and later re-added by using
the GUI, it might get a different node name than it originally had.
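As a rough illustration of adding a SAN Volume Controller node from the CLI, assuming that a candidate node is already cabled and zoned; the panel name and I/O group are placeholders, and enclosure-based FlashSystem models add capacity by adding canisters or enclosures instead, so verify the procedure for your platform:

lsnodecandidate
addnode -panelname KD8P1BP -iogrp io_grp1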
After you install the newer nodes, you might need to redistribute your servers across the I/O
groups. Consider the following points:
Moving a server’s volume to different I/O groups can be done online because of a feature
called Non-Disruptive Volume Movement (NDVM). Although this process can be done
without stopping the host, careful planning and preparation are advised.
Note: You cannot move a volume that is in any type of remote copy relationship.
If each of your servers is zoned to only one I/O group, modify your SAN zoning
configuration as you move its volumes to another I/O group. As best you can, balance the
distribution of your servers across I/O groups according to I/O workload.
Use the -iogrp parameter with the mkhost command to define which I/O groups of the SVC the new servers will use. Otherwise, the SVC by default maps the host to all I/O groups, even if they do not contain any nodes, and regardless of your zoning configuration.
Example 10-15 on page 672 shows this scenario and how to resolve it by using the
rmhostiogrp and addhostiogrp commands.
If possible, avoid setting a server to use volumes from different I/O groups that have
different node types for extended periods. Otherwise, as this server’s storage capacity
grows, you might experience a performance difference between volumes from different I/O
groups. This mismatch makes it difficult to identify and resolve any performance problems that arise.
The hot-spare node uses the same NPIV WWPNs for its FC ports as the failed node, so host
operations are not disrupted. After the failed node returns to the system, the hot-spare node
returns to the Spare state, which indicates it can be automatically swapped for other failed
nodes on the system.
The following restrictions apply to the usage of hot-spare nodes on the system:
Hot-spare nodes can be used with FC-attached external storage only.
Hot-spare nodes cannot be used in the following situations:
– In systems that use Remote Direct Memory Access (RDMA)-capable Ethernet ports for
node-to-node communications.
– On enclosure-based FlashSystem storage systems.
– With SAS-attached storage.
– With iSCSI-attached storage.
– With storage that is directly attached to the system.
A maximum of four hot-spare nodes can be added to the system.
If your system uses stretched or HyperSwap system topology, hot-spare nodes must be
designated per site.
Where <panel_name> is the name of the node that is displayed in the Service Assistant or in the output of the lsnodecandidate command.
For more information, see IBM Spectrum Virtualize: Hot-Spare Node and NPIV Target Ports,
REDP-5477.
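A minimal sketch of designating a hot-spare node from the CLI, assuming a candidate SAN Volume Controller node with a placeholder panel name; confirm the -spare option and syntax against your code level:

lsnodecandidate
addnode -spare -panelname KD8P1BP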
Important: Do not power on a node that is shown as offline in the management GUI if you powered off the node to add memory to increase total memory. Before you increase memory, you must remove the node from the system so that it is not showing in the management GUI or in the output from the svcinfo lsnode command.
Do not power on a node that is still in the system and showing as offline with more memory than the node had when it powered off. Such a node can cause an immediate outage or an outage when you update the system software.
Important: The memory in both node canisters must be configured identically to create the
total enclosure memory size.
Table 10-2 shows the available memory configuration for each IBM FlashSystem and SVC
control enclosure. Each column gives the valid configuration for each total enclosure memory
size. DIMM slots are listed in the same order that they appear in the node canister.
To ensure proper cooling and a steady flow of air from the fan modules in each node canister,
blank DIMMs must be inserted in any slot that does not contain a memory module.
Table 10-2 Available memory configurations for one node in a control enclosure
DIMM slot   Total enclosure memory 1024 GB   Total enclosure memory 2048 GB   Total enclosure memory 3072 GB
1 (CPU1) Blank 64 GB 64 GB
2 (CPU1) 64 GB 64 GB 64 GB
4 (CPU1) Blank 64 GB 64 GB
5 (CPU1) 64 GB 64 GB 64 GB
8 (CPU0) 64 GB 64 GB 64 GB
10 (CPU0) Blank 64 GB 64 GB
11 (CPU0) 64 GB 64 GB 64 GB
13 (CPU0) Blank 64 GB 64 GB
14 (CPU0) 64 GB 64 GB 64 GB
16 (CPU0) Blank 64 GB 64 GB
17 (CPU1) 64 GB 64 GB 64 GB
20 (CPU1) 64 GB 64 GB 64 GB
21 (CPU1) Blank 64 GB 64 GB
23 (CPU1) 64 GB 64 GB 64 GB
24 (CPU1) Blank 64 GB 64 GB
The control enclosure can be configured with three I/O adapter features to provide up to
forty-eight 32-Gb FC ports or up to ten 25-GbE (iSCSI or iSCSI Extensions for Remote Direct
Memory Access (RDMA) (iSER) capable) ports. The control enclosure also includes eight
10-GbE ports as standard for iSCSI connectivity and two 1-GbE ports for system
management. A feature code also is available to include the SAS Expansion card if the user
wants to use optional expansion enclosures (only valid for FS9500). The options for the
features that are available are shown in Table 10-3.
1 12 6 10
2 24 12 20
For more information about the feature codes, memory options, and functions of each
adapter, see IBM FlashSystem 9500 Product Guide, REDP-5669.
The control enclosure can be configured with three I/O adapter features to provide up to
twenty-four 32-Gb FC ports or up to twelve 25-GbE (iSCSI or iSER capable) ports. The
control enclosure also includes eight 10-GbE ports as standard for iSCSI connectivity and two
1-GbE ports for system management. A feature code also is available to include the SAS
Expansion card if the user wants to use optional expansion enclosures. The options for the
features available are shown in Table 10-4.
1 12 6 6
2 24 12 12
3 36 18 18
4 48 24 24
For more information about the feature codes, memory options, and functions of each
adapter, see IBM FlashSystem 7300 Product Guide, REDP-5668.
iSCSI One 1-GbE tech port + One 1-GbE dedicated One 1-GbE dedicated tech port
iSCSI tech port
One 1-GbE iSCSI only
iSCSI N/A Two 10-GbE (iSCSI only) Four 10-GbE (iSCSI only)
Table 10-6 lists the possible adapter installation for IBM FlashSystem 5015, 5035, 5045 and
5200. Only one interface card can be installed per canister, and the interface card must be the
same in both canisters.
iSCSI 4-port 10-GbE iSCSI or 4-port 10-GbE iSCSI or 2-port 25-GbE ROCE
ISER or iSCSI
iSCSI 2-port 25-GbE iSCSI or 2-port 25-GbE iSCSI or 2-port 25-GbE internet
Wide-area RDMA
Protocol (iWARP) ISER
or iSCSI
SAS 4-port 12-Gb SAS host 4-port 12-Gb SAS host 2-port 12-Gb SAS to
attach attach allow SAS expansions
IBM FlashSystem 5015 and 5035/5045 control enclosures include 1-GbE or 10-GbE ports as
standard for iSCSI connectivity. The standard connectivity can be extended by using more
ports or enhanced with more connectivity through an optional I/O adapter feature. For more
information, see Family 2072+06 IBM FlashSystem 5015 and 5035.
IBM FlashSystem products support SCM drives over NVMe to improve overall storage
performance, or offer a higher performance storage pool. SCM drives can be used for small
workloads that need exceptional levels of performance at the lowest latencies, or they can be
combined with other NVMe drives by using Easy Tier to accelerate much larger workloads.
Like FCM, SCM drives are also available as upgrades for the previous generation of all flash
arrays.
IBM Storage Virtualize 8.6 supports up to 12 SCM drives in a control enclosure for the
IBM FlashSystem 9000, 7000, and 5200 families.
For more information about SCM, see Chapter 3, “Storage back-end” on page 189.
By splitting the clustered system, you no longer have one IBM Storage Virtualize cluster that
handles all I/O operations, hosts, and subsystem storage attachments. The goal is to create a
second IBM Storage Virtualize cluster so that you can equally distribute the workload over the
two systems.
After safely removing enclosures from the existing cluster and creating a second
IBM Storage Virtualize cluster, choose from the following approaches to balance the two
systems:
Attach new storage subsystems and hosts to the new system and start adding only new
workloads on the new system.
Migrate the workload onto the new system by using the approach that is described in
Chapter 3, “Storage back-end” on page 189.
For more information about the IBM FlashWatch offering, see IBM Flashwatch.
IBM Storage Virtualize 8.3 and later lets you set throttling at the volume, host, host cluster, or storage pool level, and also set offload throttling, by using the GUI. This section describes some details of I/O throttling and shows how to configure the feature in your system.
Offload commands, such as UNMAP and XCOPY, free hosts and speed the copy process by
offloading the operations of certain types of hosts to a storage system. These commands
are used by hosts to format new file systems or copy volumes without the host needing to
read and then write data. Throttles can be used to delay processing for offloads to free
bandwidth for other more critical operations, which can improve performance but limits the
rate at which host features, such as VMware VMotion, can copy data.
Only parent pools support throttles because only parent pools contain MDisks from
internal or external back-end storage. For volumes in child pools, the throttle of the parent
pool is applied.
If more than one throttle applies to an I/O operation, the lowest and most stringent throttle
is used. For example, if a throttle of 100 MBps is defined on a pool and a throttle of
200 MBps is defined on a volume of that pool, the I/O operations are limited to 100 MBps.
Without throttling, a scenario can occur where Host1 dominates the bandwidth. After the throttle is enabled, the bandwidth is distributed much more evenly among the hosts, as shown in Figure 10-21.
Figure 10-21 Distribution of controller resources before and after I/O throttling
Note: Throttling is applicable only for the I/Os that IBM Storage Virtualize receives from
hosts and host clusters. The I/Os that are generated internally, such as mirrored volume
I/Os, cannot be throttled.
Example 10-16 Creating a throttle by using the mkthrottle command in the CLI
Syntax:
Usage examples:
IBM_FlashSystem:FS9500:superuser>mkthrottle -type host -bandwidth 100 -host
ITSO_HOST3
IBM_FlashSystem:FS9500:superuser>mkthrottle -type hostcluster -iops 30000 -hostcluster ITSO_HOSTCLUSTER1
IBM_FlashSystem:FS9500:superuser>mkthrottle -type mdiskgrp -iops 40000 -mdiskgrp 0
IBM_FlashSystem:FS9500:superuser>mkthrottle -type offload -bandwidth 50
IBM_FlashSystem:FS9500:superuser>mkthrottle -type vdisk -bandwidth 25 -vdisk
volume1
IBM_FlashSystem:FS9500:superuser>lsthrottle
throttle_id throttle_name object_id object_name       throttle_type IOPs_limit bandwidth_limit_MB
0           throttle0     2         ITSO_HOST3        host                     100
1           throttle1     0         ITSO_HOSTCLUSTER1 host cluster  30000
2           throttle2     0         Pool0             mdiskgrp      40000
3           throttle3                                 offload                  50
4           throttle4     10        volume1           vdisk                    25
Note: You can change a throttle parameter by using the chthrottle command.
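For instance, to raise the bandwidth limit of an existing throttle and confirm the change, a minimal sketch (the throttle name and limit are examples):

chthrottle -bandwidth 200 throttle0
lsthrottle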
If a throttle already exists, the dialog box that is shown in Figure 10-22 also shows a Remove
button that you can use to delete the throttle.
Storing documentation: Avoid storing IBM Storage Virtualize and SAN environment
documentation only in the SAN. If your organization has a disaster recovery (DR) plan,
include this storage documentation in it. Follow its guidelines about how to update and
store this data. If no DR plan exists and you have the proper security authorization, it might
be helpful to store an updated copy offsite.
In theory, this IBM Storage Virtualize and SAN environment documentation should be written
at a level sufficient for any system administrator who has average skills in the products to
understand. Make a copy that includes all your configuration information.
Use the copy to create a functionally equivalent copy of the environment by using similar
hardware without any configuration, off-the-shelf media, and configuration backup files. You
might need the copy if you ever face a DR scenario, which is also why it is so important to run
periodic DR tests.
Create the first version of this documentation (“as-built documentation”) as you install your solution. If you completed forms to help plan the installation of your IBM Storage Virtualize solution, use these forms to help you document how your IBM Storage Virtualize solution was first configured. The following sections describe the minimum documentation that is needed for an IBM Storage Virtualize solution; because you might have more business requirements that require other data to be tracked, they do not address every situation.
Because error messages often point to the device that generated an error, a good naming convention quickly highlights where to start investigating when an error occurs. Typical IBM Storage Virtualize and SAN component names limit the number and type of characters that you can use. For example, IBM Storage Virtualize names are limited to 63 characters, which makes creating a naming convention easier.
Many names in an IBM Storage Virtualize and SAN environment can be modified online.
Therefore, you do not need to worry about planning outages to implement your new naming
convention. The naming examples that are used in the following sections are effective in most
cases, but might not be fully adequate for your environment or needs. The naming convention
to use is your choice, but you must implement it in the whole environment.
If multiple external controllers are attached to your IBM Storage Virtualize solution, these
controllers are detected as controllerX, so you might need to change the name so that it
includes, for example, the vendor name, the model, or its serial number. Therefore, if you
receive an error message that points to controllerX, you do not need to log in to
IBM Storage Virtualize to know which storage controller to check.
Note: IBM Storage Virtualize detects external controllers based on their worldwide node
name (WWNN). If you have an external storage controller that has one WWNN for each
WWPN, this configuration might lead to many controllerX names pointing to the same
physical box. In this case, prepare a naming convention to cover this situation.
and what works in your environment. The main “convention” to follow is to avoid the use of special characters in names, apart from the underscore, the hyphen, and the period, which are permitted. Also avoid spaces, which can make scripting difficult.
For example, you can change a name to include the following information:
For internal MDisks, refer to the IBM Storage Virtualize system or cluster name.
A reference to the external storage controller it belongs to (such as its serial number or
last digits).
The extpool, array, or RAID group that it belongs to in the storage controller.
The LUN number or name that it has in the storage controller.
Storage pools have several different possibilities. One possibility is to include the storage
controller, the type of back-end disks if they are external, the RAID type, and sequential digits.
If you have dedicated pools for specific applications or servers, another possibility is to use
them instead.
Volumes
Volume names should include the following information:
The host or cluster to which the volume is mapped.
A single letter that indicates its usage by the host, as shown in the following examples:
– B: For a boot disk, or R for a rootvg disk (if the server boots from SAN)
– D: For a regular data disk
– Q: For a cluster quorum disk (do not confuse with IBM Storage Virtualize quorum
disks)
– L: For a database log disk
– T: For a database table disk
A few sequential digits, for uniqueness.
A sessions standard for VMware datastores, for example:
– esx01-sessions-001: For a datastore that is composed of a single volume
For example, ERPNY01-T03 indicates a volume that is mapped to server ERPNY01 and database
table disk 03.
Hosts
In today’s environment, administrators deal with large networks, the internet, and cloud
computing. Use good server naming conventions so that they can quickly identify a server
and determine the following information:
Where it is (to know how to access it).
What kind it is (to determine the vendor and support group in charge).
What it does (to engage the proper application support and notify its owner).
Its importance (to determine the severity if problems occur).
Changing a server’s name in IBM Storage Virtualize is as simple as changing any other
IBM Storage Virtualize object name. However, changing the name on the operating system of
a server might have implications for application configuration and DNS, and such a change
might require a server restart. Therefore, you might want to prepare a detailed plan if you
decide to rename several servers in your network. The following example is for a server
naming convention of LLAATRFFNN, where:
LL is the location, which might designate a city, data center, building floor, or room.
AA is a major application, for example, billing, enterprise resource planning (ERP), and Data Warehouse.
T is the type, for example, UNIX, Windows, and VMware.
R is the role, for example, Production, Test, QA, and Development.
FF is the function, for example, DB server, application server, web server, and file server.
NN is numeric.
If your SAN does not support aliases (for example, in heterogeneous fabrics with switches in
some interoperation modes), use WWPNs in your zones. However, update every zone that
uses a WWPN if you change it.
Your SAN zone name should reflect the devices in the SAN that it includes (normally in a
one-to-one relationship), as shown in the following examples:
SERVERALIAS_T1_FS9500CLUSTERNAME (from a server to the IBM FlashSystem 9500, where you use T1 as an ID for zones that use, for example, node ports P1 on Fabric A and P2 on Fabric B)
SERVERALIAS_T2_FS9500CLUSTERNAME (from a server to the IBM FlashSystem 9500, where you use T2 as an ID for zones that use, for example, node ports P3 on Fabric A and P4 on Fabric B)
IBM_DS8870_75XY131_FS9500CLUSTERNAME (zone between an external back-end storage controller and the IBM FlashSystem 9500)
NYC_FS9500_POK_FS9500_REPLICATION (for remote copy services)
After some time (typically a few hours), you receive an email with instructions about how to
download the report. The report includes a Visio diagram of your SAN and an organized
Microsoft Excel spreadsheet that contains all your SAN information. For more information and
to download the tool, see Brocade SAN Health.
The first time that you use the SAN Health Diagnostics Capture tool, explore the options that
are provided to learn how to create a well-organized and useful diagram.
Figure 10-28 shows a tab of the SAN Health Options window in which you can choose the
format of SAN diagram that best suits your needs. Depending on the topology and size of
your SAN fabrics, you might want to manipulate the options in the Diagram Format or
Report Format tabs.
SAN Health supports switches from manufacturers other than Brocade, such as Cisco. Both
the data collection tool download and the processing of files are available at no cost. You can
download Microsoft Visio and Excel viewers at no cost from the Microsoft website.
Another tool, which is known as SAN Health Professional, is also available for download at no
cost. With this tool, you can audit the reports in detail by using advanced search functions and
inventory tracking. You can configure the SAN Health Diagnostics Capture tool as a Windows
scheduled task. To download the SAN Health Diagnostics Capture tool, see this Broadcom web page.
Tip: Regardless of the method that is used, generate a fresh report at least once a month
or after any major changes. Keep previous versions so that you can track the evolution of
your SAN.
For more information about how to configure and set up IBM Spectrum Control, see
Chapter 9, “Implementing a storage monitoring system” on page 551.
Ensure that the reports that you generate include all the information that you need. Schedule
the reports with a period that you can use to backtrack any changes that you make.
Before you back up your configuration data, the following prerequisites must be met:
Independent operations that change the configuration for the system cannot be running
while the backup command is running.
Object names cannot begin with an underscore character (_).
Note: The system automatically creates a backup of the configuration data each day at
1 AM. This backup is known as a cron backup, and on the configuration node it is copied to
/dumps/svc.config.cron.xml_<serial#>.
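In addition to the daily cron backup, a backup can be triggered manually from the CLI at any time, for example before a planned change; a minimal sketch:

svcconfig backup
lsdumps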
The svcconfig backup command creates three files that provide information about the
backup process and the configuration. These files are created in the /tmp directory and
copied to the /dumps directory of the configuration node. You can use the lsdumps
command to list them. Table 10-7 describes the three files that are created by the backup
process.
svc.config.backup.xml_<serial#> Contains the current configuration data of the system.
svc.config.backup.sh_<serial#> Contains the names of the commands that were issued to create the backup of the system.
svc.config.backup.log_<serial#> Contains details about the backup, including any reported errors or warnings.
2. Check that the svcconfig backup command completes successfully, and examine the
command output for any warnings or errors. The following output is an example of the
message that is displayed when the backup process is successful:
CMMVC6155I SVCCONFIG processing completed successfully
3. If the process fails, resolve the errors and run the command again.
4. Keep backup copies of the files outside the system to protect them against a system
hardware failure. With Microsoft Windows, use the PuTTY pscp utility. With UNIX or Linux,
you can use the standard scp utility. By using the -unsafe option, you can use a wildcard
to download all the svc.config.backup files with a single command. Example 10-18
shows the output of the pscp command.
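As a rough sketch of this copy step from a Windows workstation, assuming that PuTTY's pscp is on the PATH and that 192.0.2.10 is a placeholder management IP address:

pscp -unsafe superuser@192.0.2.10:/dumps/svc.config.backup.* C:\svc_config_backups\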
The configuration backup file is in Extensible Markup Language (XML) format and can be
inserted as an object into your IBM Storage Virtualize documentation spreadsheet. The
configuration backup file might be large. For example, it contains information about each
internal storage drive that is installed in the system.
Note: Directly importing the file into your IBM Storage Virtualize documentation
spreadsheet might make the file unreadable.
Also, consider collecting the output of specific commands. At a minimum, you should collect
the output of the following commands:
svcinfo lsfabric
svcinfo lssystem
svcinfo lsmdisk
svcinfo lsmdiskgrp
svcinfo lsvdisk
svcinfo lshost
svcinfo lshostvdiskmap
Note: Most CLI commands that are shown here work without the svcinfo prefix; however,
some commands might not work with only the short name, and therefore require the
svcinfo prefix to be added.
Import the commands into the master spreadsheet, preferably with the output from each
command on a separate sheet.
One way to automate either task is to first create a batch file (Windows), shell script (UNIX or
Linux), or playbook (Ansible) that collects and stores this information. Then, use spreadsheet
macros to import the collected data into your IBM Storage Virtualize documentation
spreadsheet.
When you are gathering IBM Storage Virtualize information, consider the following best
practices:
If you are collecting the output of specific commands, use the -delim option of these
commands to make their output delimited by a character other than tab, such as comma,
colon, or exclamation mark. You can import the temporary files into your spreadsheet in
comma-separated value (CSV) format, specifying the same delimiter.
Note: Use a delimiter that is not already part of the output of the command. Commas can appear in the output if a field contains a list, and colons can appear in special fields, such as IPv6 addresses, WWPNs, or iSCSI names.
If you are collecting the output of specific commands, save the output to temporary files.
To make your spreadsheet macros simpler, you might want to preprocess the temporary
files and remove any “garbage” or unwanted lines or columns. With UNIX or Linux, you
can use commands such as grep, sed, and awk. Freeware software is available for
Windows with the same commands, or you can use any batch text editor tool.
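A minimal shell-script sketch of such an automated collection, assuming password-less SSH access for an administrative user and example host and path names; adjust the command list and the delimiter to your needs:

#!/bin/sh
# Collect IBM Storage Virtualize configuration listings as comma-delimited files.
SVC="superuser@flashsystem.example.com"   # placeholder user and management address
OUT="/var/svcdoc/$(date +%Y%m%d)"         # one directory per collection date
mkdir -p "$OUT"
for cmd in lssystem lsfabric lsmdisk lsmdiskgrp lsvdisk lshost lshostvdiskmap; do
    ssh "$SVC" "svcinfo $cmd -delim ," > "$OUT/$cmd.csv"
done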
The objective is to fully automate this procedure so that you can schedule it to run regularly. Make the resulting spreadsheet easy to consult and have it contain only the
information that you use frequently. The automated collection and storage of configuration
and support data (which is typically more extensive and difficult to use) are described in
10.14.7, “Automated support data collection” on page 698.
Any portion of your external storage controllers that is used outside the IBM Storage
Virtualize solution might have its configuration changed frequently. In this case, for more
information about how to gather and store the information that you need, see your back-end
storage controller documentation.
Fully allocate all the available space in any of the optional external storage controllers that you might use as additional back-ends to the IBM Storage Virtualize solution. This way, you can
perform all your disk storage management tasks by using the IBM Storage Virtualize user
interface.
By keeping this data on a spreadsheet, storage administrators have all the information that
they need to complete a web support request form or to provide to a vendor’s call support
representative. Typically, you are asked first for a brief description of the problem and then
asked later for a detailed description and support data collection.
Disk storage allocation and deallocation and SAN zoning configuration modifications
should be handled under properly submitted and approved change requests.
If you are handling a problem yourself or calling your vendor’s technical support desk, you
might need to produce a list of the changes that you recently implemented in your SAN or
that occurred since the documentation reports were last produced or updated.
When you use incident and change management tracking tools, adhere to the following
guidelines for IBM Storage Virtualize and SAN Storage Administration:
Whenever possible, configure your storage and SAN equipment to send SNMP traps to
the incident monitoring tool so that an incident ticket is automatically opened and the
proper alert notifications are sent. If you do not use a monitoring tool in your environment,
you might want to configure email alerts that are automatically sent to the mobile phones
or pagers of the storage administrators on duty or on call.
Discuss within your organization the risk classification that a storage allocation or
deallocation change request should have. These activities are typically safe and
nondisruptive to other services and applications when properly handled.
However, activities might cause collateral damage if human error or an unexpected failure
occurs during implementation. Your organization might decide to assume more costs with
overtime and limit such activities to off-business hours, weekends, or maintenance
windows if they assess that the risks to other critical applications are too high.
Use templates for your most common change requests, such as storage allocation or SAN
zoning modification to facilitate and speed up their submission.
Do not open change requests in advance to replace failed, redundant, or hot-pluggable
parts, such as disk drive modules (DDMs) in storage controllers with hot spares, or SFPs
in SAN switches or servers with path redundancy.
Typically, these fixes do not change anything in your SAN storage topology or
configuration, and they do not cause any more service disruption or degradation than you
already had when the part failed. Handle these fixes within the associated incident ticket
because it might take longer to replace the part if you must submit, schedule, and approve
a non-emergency change request.
An exception is if you must interrupt more servers or applications to replace the part. In
this case, you must schedule the activity and coordinate support groups. Use good
judgment and avoid unnecessary exposure and delays.
Keep handy the procedures to generate reports of the latest incidents and implemented
changes in your SAN storage environment. Typically, you do not need to periodically
generate these reports because your organization probably already has a Problem and
Change Management group that runs such reports for trend analysis purposes.
For IBM Storage Virtualize, this information includes snap data. For other equipment, see the related documentation for more information about how to gather and store the support data that you might need.
You can create procedures that automatically create and store this data on scheduled dates,
delete old data, or transfer the data to tape.
You can use IBM Storage Insights to create support tickets and then attach the snap data to
this record from within the IBM Storage Insights GUI. For more information, see Chapter 11,
“Storage Virtualize Troubleshooting and diagnostics” on page 701.
To receive support alerts and notifications for your products, subscribe at Sign up for Notifications.
You can subscribe to receive information from each vendor of your storage and SAN equipment; for IBM products, use the IBM website. You can often quickly determine whether an alert or notification is applicable to your SAN storage, so open notifications when you receive them and keep them in a folder of your mailbox.
Sign up and tailor the requests and alerts that you want to receive. For example, type IBM
FlashSystem in the Product lookup text box and then click Subscribe to subscribe to IBM
FlashSystem 9x00 notifications, as shown in Figure 10-29.
11
Storage Virtualize Troubleshooting and diagnostics
11.1 Troubleshooting
Troubleshooting should follow a systematic approach to solve a problem. The goal of troubleshooting, or problem determination, is to understand why something does not work as expected and to create a resolution for it. An important first step therefore is to create a proper problem description, which should be as accurate as possible. Then, collect the support data from all involved components of the environment for analysis. This data might include a snap from the IBM Storage Virtualize system, logs from SAN or network switches, and host operating system logs.
The following questions help define the problem for effective troubleshooting:
What are the symptoms of the problem?
– What is reporting the problem?
– Which error codes and messages were observed?
– What is the business impact of the problem?
– Where does the problem occur?
– Which exact component is affected: the whole system or, for instance, certain hosts or IBM Storage Virtualize nodes?
– Is the environment and configuration supported?
When does the problem occur?
– How often does the problem happen?
– Does the problem happen only at a certain time of day or night?
– What kind of activities were ongoing at the time the problem was reported?
– Did the problem happen after a change in the environment, such as a code upgrade or
installing software or hardware?
Under which conditions does the problem occur?
– Does the problem always occur when the same task is being performed?
– Does a certain sequence of events need to occur for the problem to surface?
– Do any other applications fail at the same time?
Can the problem be reproduced?
– Can the problem be recreated, for example by running a single command, a set of
commands, or a particular application?
– Are multiple users or applications encountering the same type of problem?
– Can the problem be reproduced on any other system?
Note: Collecting log files as close as possible to the time of the incident, and providing an
accurate problem description and timeline are essential for effective troubleshooting!
As shown in Figure 11-1, the first icon shows IBM Storage Virtualize events, such as an error
or a warning, and the second icon shows suggested, running or recently completed
background tasks.
The GUI dashboard provides an at-a-glance view of the system’s condition and notifies you of
any circumstances that require immediate action. It contains sections for performance,
capacity, and system health that provide an overall understanding of what is going on in the
system.
Figure 11-2 GUI Dashboard displaying system health events and hardware components
The System Health section in the bottom part of the dashboard provides information about
the health status of hardware, logical, and connectivity components. If you click Expand in
each of these categories, the status of the individual components is shown (see Figure 11-3).
Clicking on More Details takes you to the GUI panel related to that specific component, or
shows more information about it.
For more information about the components in each category and for troubleshooting, see
Troubleshooting.
Click Run Fix to launch the fix procedure for this particular event. Fix procedures help resolve a problem. In the background, a fix procedure analyzes the status of the system and its components and provides further information about the nature of the problem. This analysis ensures that the actions that are taken do not lead to undesirable results, such as volumes becoming inaccessible to the hosts. The fix procedure then automatically performs the actions that are required to return the system to its optimal state, which can include checking for dependencies, resetting internal error counters, and applying updates to the system configuration. Whenever user interaction is required, you are shown suggested actions to take and guided through them. If the problem can be fixed, the related error in the event log is eventually marked as fixed, and an associated alert in the GUI is cleared.
Error codes, along with their detailed properties in the event log, provide reference information when a service action is required. The four-digit error code is visible in the event log and is accompanied by a six-digit event ID, which provides additional details about the event. Three-digit node error codes are visible in the node status in the Service Assistant GUI. For more information about messages and codes, see Messages and Codes.
An IBM Storage Virtualize system might encounter various kinds of failure recovery in certain
conditions. These are known as Tier 1 (T1) through Tier 4 (T4) recovery.
A T1 or Tier 1 recovery (node warmstart, also called a node assert) is logged with error code 2030 in the event log.
A single node assert is a recovery condition that is deployed by the IBM Storage Virtualize software when a single node attempts to run an invalid code path or detects a transient hardware problem.
A T1 recovery, also known as a single-node warmstart, is performed without suspending I/O. This task
can be accomplished because the cluster is configured into redundant pairs of nodes or
node canisters, and the clustering software ensures the deployment of a “replicated
hardened state” across nodes. A single node can encounter an assert condition, perform
a software restart recovery action (capturing first-time debug data), and return to the
clustered system without the suspension of I/O.
On warm restart, the assert condition is cleared and the node rejoins the cluster
automatically. Typically, a single node assert restart takes 1 - 5 minutes. Host data I/O
continues as the host OS multipath software redirects the I/O to the partner node of the
same I/O group.
The event with error code 2030 is logged when the cluster node returns to the system. Right-click the 2030 event and mark it as fixed to prevent repeated alerts and notifications for the same event.
A T2 or Tier 2 recovery is reported in the event log with error code 1001.
The cluster has asserted: all cluster nodes of the system have undergone a warmstart at the same time to recover from a condition that could not be resolved otherwise. Error code 1001 means that the system recovery was successful and the cluster resumes I/O; no data is lost. However, there was a temporary loss of access until the recovery completed, so host applications probably must be restarted, and it is advisable to conduct a sanity check of the hosts' file systems afterward. FlashCopy mappings and remote copy (Metro Mirror (MM) and Global Mirror (GM)) relationships are restored, along with the other essential cluster state information.
After a T2 recovery, all configuration commands are blocked until you re-enable them by marking the unfixed event log entry with error code 1001 as fixed. It is recommended that you do not re-enable configuration commands until the recovery dumps and trace files from all nodes have been collected and reviewed by IBM Support to confirm that it is safe to do so.
The Service GUI is the preferred method for collecting logs of each node. Open a browser
session to the Service GUI at https://<cluster_ip>/service. Select the Collect Logs
pane from the left navigation bar, and then select the option to create a support package
with the latest statesave.
A Tier 3 or T3 recovery is required when there is no longer an active cluster node and all nodes of the clustered system report node error 550 or 578. The Recover System Procedure recovers the system if the system state is lost from all cluster nodes.
The T3 recovery procedure re-creates the system configuration to the state from before the incident that led to this situation. Depending on the type of IBM Storage Virtualize system and its configuration, this is achieved by ingesting the configuration and hardened system data, which is stored on a quorum MDisk, a quorum drive, or an IP quorum application that is set up to store metadata. In combination with the information that is stored in the configuration backup file svc.config.backup.xml, the system's configuration and state are restored.
Note: Attempt to run the Recover System Procedure only after a complete and thorough
investigation of the cause of the system failure. Attempt to resolve those issues by
using other service procedures.
Selecting Monitoring → Events shows information messages, warnings, and issues about
the IBM Storage Virtualize system. Therefore, this area is a good place to check for problems
in the system.
To display the most important events that must be fixed, use the Recommended Actions
filter.
If an important issue must be fixed, look for the Run Fix button in the upper left with an error
message that indicates which event must be fixed as soon as possible. This fix procedure
helps resolve problems. It analyzes the system, provides more information about the problem,
suggests actions to take with the steps to follow, and finally checks to see whether the
problem is resolved.
Always use the fix procedures to resolve errors that are reported by the system, such as
system configuration problems or hardware failures.
Note: IBM Storage Virtualize systems detect and report error messages; however, events
may have been triggered by factors external to the system, for example back-end storage
devices or the storage area network (SAN).
It is safe to mark events as fixed; if the error condition still exists or the error recurs, a new event is logged. You can select multiple events in the table by pressing and holding the Ctrl key and clicking the events to be fixed.
Figure 11-5 on page 708 shows the Monitoring → Events window with the recommended Run Fix action.
To obtain more information about any event, double-click or select an event in the table, and
select Actions → Properties. You can also select Run Fix Procedure and properties by
right-clicking an event.
The properties and details are displayed in a pane, as shown in Figure 11-6. Sense Data is
available in an embedded tab. You can review and click Run Fix to run the fix procedure.
Important: Run these commands whenever any change occurs that is related to the communication between the IBM Storage Virtualize system and a back-end storage subsystem (for example, back-end storage is configured or a SAN zoning change occurred). This process ensures that IBM Storage Virtualize recognizes the changes.
Common error recovery involves the following IBM Storage Virtualize CLI commands:
detectmdisk
Discovers changes in the SAN and back-end storage.
lscontroller and lsmdisk
Provides the status of all controllers and MDisks. Pay attention to status values other than
online, for instance offline or degraded.
lscontroller <controller_id_or_name>
Checks the controller that was causing the issue and verifies that all the worldwide port
names (WWPNs) are listed as you expect. Also check if the path_counts are distributed
evenly across the WWPNs.
lsmdisk
Determines whether all MDisks are online.
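A minimal sketch of this recovery sequence, assuming a hypothetical controller ID of 3; the available filter values can vary by code level:

detectmdisk
lscontroller
lscontroller 3
lsmdisk -filtervalue status=degraded
lsmdisk -filtervalue status=offline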
Note: When an issue is resolved by using the CLI, verify that the error disappears by
selecting Monitoring → Events. If not, make sure to mark the error as fixed.
Four different types of Snap can be collected, Snap Type 1 through Snap Type 4, colloquially
often referred to as Snap/1, Snap/2, Snap/3, or Snap/4. The Snap types vary in the amount of
diagnostic information that is contained in the package:
Snap/1: Standard logs, including performance statistics
– fastest and smallest; contains no node dumps
Snap/2: Same as Snap/1 plus one existing statesave, the most recently created dump or
livedump from the current configuration node
– slightly slower than Snap/1 and larger
Snap/3: Same as Snap/1 plus the most recent dump or livedump from each active
member node in the clustered system
Snap/4: Same as Snap/1 plus a fresh livedump from each active member node in the
clustered system, which is created when the data collection is triggered
A dump file is created when the node software restarts unexpectedly. It can be used by
IBM Remote Technical Support and development teams to understand why the software restarted.
Tip: For urgent cases, start with collecting and uploading a Snap/1, followed by a Snap/4.
This enables IBM Remote Support to begin the analysis sooner, while the more detailed
Snap/4 is still being collected and uploaded.
For more information about which support package is most suitable to diagnose different
types of issues and what each package contains, see What data should you collect for a
problem on IBM Storage Virtualize systems?
Note: After an issue is solved, it is a best practice to do some housekeeping and delete old
dumps on each node by running the following command:
cleardumps -prefix /dumps node_id | node_name
Support packages can be collected and uploaded to the IBM Support center automatically by
IBM Storage Virtualize by using the GUI or CLI, or the package can be downloaded from the
device and manually uploaded to IBM.
You can monitor the progress of the individual sub-tasks by clicking on View more details as
shown in Figure 11-8.
To collect a Snap type 4 using the CLI, a livedump of each active node must be generated by
using the svc_livedump command. Then, the log files and newly generated dumps are
uploaded by using the svc_snap gui3 command, as shown in Example 11-1. To verify
whether the support package was successfully uploaded, use the sainfo lscmdstatus
command (TSXXXXXXX is the case number).
Note: The use of Service Assistant commands, such as sainfo or satask, requires superuser
privileges.
IBM_FlashSystem:FS9110:superuser>sainfo lscmdstatus
last_command satask supportupload -pmr TSxxxxxxxxx -filename
/dumps/snap.serial.YYMMDD.HHMMSS.tgz
last_command_status CMMVC8044E Command completed successfully.
T3_status
T3_status_data
cpfiles_status Complete
cpfiles_status_data Copied 160 of 160
snap_status Complete
snap_filename /dumps/snap.serial.YYMMDD.HHMMSS.tgz
installcanistersoftware_status
supportupload_status Active
supportupload_status_data Uploaded 267.5 MiB of 550.2 MiB
supportupload_progress_percent 48
supportupload_throughput_KBps 639
supportupload_filename /dumps/snap.serial.YYMMDD.HHMMSS.tgz
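The overall CLI sequence that produces a status output like the one shown above can be
sketched as follows (a minimal sketch only; TSxxxxxxxxx is a placeholder case number, and
the exact command syntax can vary by code level):
svc_livedump -nodes all -yes
svc_snap upload pmr=TSxxxxxxxxx gui3
sainfo lscmdstatus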
If you do not want to automatically upload the snap to IBM, omit the upload pmr=TSxxxxxxxxx
command option. When the snap creation completes, all collected files are packaged into a
gzip-compressed tarball that uses the following format:
/dumps/snap.<panel_id>.YYMMDD.hhmmss.tgz
The creation of the Snap archive takes a few minutes to complete. Depending on the size of
the system and the configuration, it can take considerably longer, particularly if fresh
livedumps are being created.
The generated file can be retrieved from the GUI by selecting Settings → Support →
Manual Upload Instructions → Download Support Package, and then clicking Download
Existing Package. Find the exact name of the snap that was generated by running the
svc_snap command that was run earlier. Select that file, and click Download.
The livedump status of a node can be checked with the lslivedump command. After the
status has changed from dumping to inactive, the livedump file is ready to be copied off the
system by using either the GUI or the scp command.
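For example, a minimal check and copy-off for node 1 might look like the following sketch
(the node ID, cluster IP address, and file name pattern are placeholders):
lslivedump 1
scp superuser@<cluster_ip>:/dumps/livedump.* .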
Example 11-3 shows the output for the command multipath -ll, including the following
information:
Name of the mpath device (mpatha / mpathb).
UUID of the mpath device.
Discovered paths for each mpath device, including the name of the sd-device, the priority,
and state information.
You can also use the multipathd interactive console for troubleshooting. The multipathd -k
command opens an interactive interface to the multipathd daemon. Within this console, enter
help to get a list of available commands. To exit the console, press Ctrl-d.
To display the current configuration, including the defaults, issue show config within the
interactive console.
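For example, a short interactive session might look like the following sketch (the available
subcommands depend on the installed device-mapper-multipath version):
# multipathd -k
multipathd> show config
multipathd> show paths
multipathd> show maps
(press Ctrl-d to exit)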
lspath
Lists all paths for all hdisks with their status and parent fscsi device information.
lspath -H -l hdisk1
Lists all paths for the specified hdisk with its status and corresponding fscsi device
information. The output includes a column header.
lspath -l hdisk1 -HF "name path_id parent connection path_status status"
Lists more detailed information about the specified hdisk, the parent fscsi device, and its
path status.
lspath -AHE -l hdisk0 -p vscsi0 -w "810000000000"
Displays attributes for a path and connection (-w). The -A flag is similar to lsattr for
devices. If only one path exists to the parent device, the connection can be omitted by
running lspath -AHE -l hdisk0 -p vscsi0.
lsmpio
Shows all disks and corresponding paths with state, parent, and connection information.
lsmpio -q
Shows all disks with vendor ID, product ID, size, and volume name.
lsmpio -ar
Lists the parent adapter and remote port information (-a: adapter (local), -r: remote port).
lsmpio -are
Lists the parent adapter and remote port error statistics (-e: error).
Besides managing the multipathing configuration by using the Windows GUI, it is possible to
use the CLI tool mpclaim.exe, which is installed by default.
mpclaim.exe -e
Views the storage devices that are discovered by the system.
mpclaim.exe -s -d
Checks the policy that your volumes are currently using.
Generic MPIO settings can be listed and modified by using Windows PowerShell cmdlets.
Table 11-4 shows the PowerShell cmdlets, which may be used to list or modify generic
Windows MPIO settings.
Get-MSDSMSupportedHW
Lists the hardware IDs in the Microsoft Device Specific Module (MSDSM) supported
hardware list.
Get-MPIOSetting
Gets the Microsoft MPIO settings: PathVerificationState, PathVerificationPeriod,
PDORemovePeriod, RetryCount, RetryInterval, UseCustomPathRecoveryTime,
CustomPathRecoveryTime, and DiskTimeoutValue.
Set-MPIOSetting
Changes the same Microsoft MPIO settings.
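As a hedged example, the following PowerShell sketch lists the MSDSM hardware IDs and
the current MPIO settings, and then raises the disk timeout (the value 60 is for illustration
only; confirm the recommended values for your environment before changing them):
Get-MSDSMSupportedHW
Get-MPIOSetting
Set-MPIOSetting -NewDiskTimeout 60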
Command-line interface
To obtain logical unit number (LUN) multipathing information from the ESXi host CLI,
complete the following steps:
1. Log in to the ESXi host console.
2. To get detailed information about the paths, run esxcli storage core path list.
Example 11-4 shows an example for the output of the esxcli storage core path list
command.
Device: naa.600507680185801aa000000000000a68
Device Display Name: IBM Fibre Channel Disk
(naa.600507680185801aa000000000000a68)
Adapter: vmhba2
Channel: 0
Target: 1
LUN: 54
Plugin: NMP
State: active
Transport: fc
Adapter Identifier: fc.5001438028d02923:5001438028d02922
Target Identifier: fc.500507680100000a:500507680120000a
Adapter Transport Details: WWNN: 50:01:43:80:28:d0:29:23 WWPN:
50:01:43:80:28:d0:29:22
Target Transport Details: WWNN: 50:05:07:68:01:00:00:0a WWPN:
50:05:07:68:01:20:00:0a
Maximum I/O Size: 33553920
3. To list detailed information for all the corresponding paths for a specific device, run esxcli
storage core path list -d <naaID>.
Example 11-5 shows the output for the specified device with the ID
naa.600507680185801aa000000000000972, which is attached with eight paths to the ESXi
server. Most of the output was omitted for brevity.
fc.5001438028d02923:5001438028d02922-fc.500507680100037e:500507680130037e-naa.600507680185801aa000000000000972
Runtime Name: vmhba2:C0:T2:L9
Device: naa.600507680185801aa000000000000972
Device Display Name: IBM Fibre Channel Disk (naa.600507680185801aa000000000000972)
Adapter: vmhba2
Channel: 0
Target: 2
LUN: 9
Plugin: NMP
State: active
Transport: fc
Adapter Identifier: fc.5001438028d02923:5001438028d02922
Target Identifier: fc.500507680100037e:500507680130037e
Adapter Transport Details: WWNN: 50:01:43:80:28:d0:29:23 WWPN: 50:01:43:80:28:d0:29:22
Target Transport Details: WWNN: 50:05:07:68:01:00:03:7e WWPN: 50:05:07:68:01:30:03:7e
Maximum I/O Size: 33553920
fc.5001438028d02921:5001438028d02920-fc.500507680100037e:500507680110037e-naa.600507680185801aa000000000000972
UID:
4. The command esxcli storage nmp device list lists the LUN multipathing information for
all attached disks.
Example 11-6 shows the output for one of the attached disks. All other output was omitted
for brevity.
4. Click Properties.
5. In the Properties dialog, select the extent, if necessary.
6. Select Extent Device → Manage Paths and obtain the paths from the Manage Path
dialog.
For deeper analysis in cases where drives or FlashCore Modules (FCMs) are involved,
drivedumps are often useful. Their data can help you understand problems with the drive, and
they do not contain any data that applications write to the drive. In some situations,
drivedumps are automatically triggered
by the system. To collect support data from a disk drive, run the triggerdrivedump drive_id
command. The output is stored in a file in the /dumps/drive directory. This directory is on one
of the nodes that are connected to the drive.
Any snap that is taken after the trigger command contains the stored drivedumps. It is
sufficient to provide Snap Type 1: Standard logs for drivedumps.
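For example, a drivedump for a suspect drive might be collected as follows (drive ID 5 is a
placeholder; lsdumps is used here only to confirm that the dump file was created):
triggerdrivedump 5
lsdumps -prefix /dumps/drive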
For an issue in a SAN environment when it is not clear where the problem is occurring, you
might need to collect data from several devices in the SAN.
The following basic information must be collected for each type of device:
Hosts:
– Operating system: Version and level
– Host Bus Adapter (HBA): Driver and firmware level
– Multipathing driver level
SAN switches:
– Hardware model
– Software version
Storage subsystems:
– Hardware model
– Software version
IBM Storage Virtualize storage systems feature useful error logging and notification
mechanisms. The system tracks its internal events and informs the user about issues in the
SAN or storage subsystem. It also helps to isolate problems with the attached host systems.
By using these functions, administrators can quickly locate problem areas and take
appropriate action to resolve them.
In many cases, the IBM Storage Virtualize system and its service and maintenance features
guide administrators directly, provide help, and suggest remedial actions. Furthermore,
IBM Storage Virtualize determines whether a problem still persists.
Another feature that helps administrators to isolate and identify issues that might be related to
IBM Storage Virtualize systems is the ability of their nodes to maintain a database of other
devices that communicate with the IBM Storage Virtualize system’s devices. Devices, such as
hosts and optional back-end storage systems, are added to or removed from the database as
they start or stop communicating with IBM Storage Virtualize systems.
Although an IBM Storage Virtualize system’s node hardware and software events can be
verified in the GUI or CLI, external events, such as failures in the SAN zoning configuration,
hosts, and back-end storage systems, are common. You must troubleshoot these failures
outside of the IBM Storage Virtualize systems.
For example, a misconfiguration in the SAN zoning might lead to the IBM Storage Virtualize
cluster not working correctly. This problem occurs because the IBM Storage Virtualize cluster
nodes communicate with each other by using the FC SAN fabrics.
In this case, check the following areas from an IBM Storage Virtualize system’s perspective:
The attached hosts. For more information, see 11.3.2, “Host problems” on page 723.
The SAN. For more information, see 11.3.3, “Fibre Channel SAN and IP SAN problems”
on page 728.
The attached storage subsystem. For more information, see 11.3.5, “Storage subsystem
problems” on page 731.
The local FC port masking and portsets. For more information, see 8.8, “Portsets” on
page 530.
11.3.1 Interoperability
When you experience events in an IBM Storage Virtualize environment, as an initial step,
ensure that all components that comprise the storage infrastructure are interoperable, which
applies to hosts, host OS, Host Bus Adapter (HBA), driver, firmware, SAN devices, and
back-end devices. In an IBM Storage Virtualize environment, the product support matrix is the
main source for this information. For the latest IBM Storage Virtualize systems support matrix,
see IBM System Storage Interoperation Center (SSIC).
It is crucial to maintain up-to-date HBA firmware and device driver levels. This equally applies
to multipathing software and host OS patch levels. Failing to do so may lead to connectivity or
interoperability issues, for instance host logins that fail to reestablish after SAN maintenance
activities. In the worst case, this may lead to loss of access.
After interoperability is verified, check the configuration of the host on the IBM Storage
Virtualize system’s side. The Hosts window in the GUI or the following CLI commands can be
used to start a verification of potential host-related issues:
lshost
Note: Depending on the connection type of the host (FC, FC direct attach, iSCSI, NVMe,
or SAS), the output differs slightly in detail.
lshost <host_id_or_name>
This command shows more information about a specific host. It often is used when you
must identify which host port is not online in an IBM Storage Virtualize system node.
Example 11-9 shows the lshost <host_id_or_name> command output.
port_count 2
type generic
mask 1111111111111111111111111111111111111111111111111111111111111111
iogrp_count 4
status degraded
site_id
site_name
host_cluster_id
host_cluster_name
WWPN 100000051E0F81CD
node_logged_in_count 2
state active
WWPN 100000051E0F81CC
node_logged_in_count 0
state offline
lshostvdiskmap
This command checks that all volumes are mapped to the correct hosts. If a volume is not
mapped correctly, create the necessary host mapping.
lsfabric -host <host_id_or_name>
Use this command with parameter -host <host_id_or_name> to display FC connectivity
between nodes and hosts. Example 11-10 shows the lsfabric -host <host_id_or_name>
command output.
Based on this list, the host administrator must check and correct any issues found.
Hosts with a higher queue depth can potentially overload shared storage ports. Therefore, it
is a best practice to verify that the sum of the queue depths of all hosts that share a single
target FC port does not exceed 2048. For example, 16 hosts that each log in with a queue
depth of 128 already reach this limit. If any host has a queue depth of more than 128, that
depth must be reviewed because queue-full conditions can lead to I/O errors and extended
error recoveries.
For more information about managing hosts on IBM Storage Virtualize systems, see 8.10,
“I/O queues” on page 534.
Apart from hardware-related situations, problems can exist in such areas as the operating
system or the software that is used on the host. These problems normally are handled by the
host administrator or the service provider of the host system. However, the multipathing driver
that is installed on the host and its features can help to determine possible issues.
For example, a volume path issue is reported, which means that a specific HBA on the server
side cannot reach all the nodes in the I/O group to which the volumes are associated.
Note: Subsystem Device Driver Device Specific Module (SDDDSM) and Subsystem
Device Driver Path Control Module (SDDPCM) reached end of service (EOS). Therefore,
migrate SDDDSM to MSDSM on Windows platform and SDDPCM to AIX Path Control
Module (AIXPCM) on AIX and Virtual I/O Server (VIOS) platforms.
For more information, see IBM Spectrum Virtualize Multipathing Support for AIX and
Windows Hosts.
Faulty paths can be caused by hardware and software problems, such as the following
examples:
Hardware:
– A faulty small form-factor pluggable transceiver (SFP) on the host or SAN switch.
– Faulty fiber optic cables, for example, cables damaged by exceeding the minimum
permissible bend radius.
– A faulty HBA.
– Faulty SAN switch.
– Contaminated SFP or cable connectors.
– Patch panels
Software or configuration:
– Incorrect zoning, portset, or portmask.
– Incorrect host-to-VDisk mapping.
– Outdated HBA firmware or driver.
– A back-level multipathing configuration or driver.
Based on field experience, it is a best practice that you complete the following hardware
checks first:
Whether connection error indicators are lit on the host, SAN switch or the IBM Storage
Virtualize system.
Whether all the parts are seated correctly. For example, cables are securely plugged in to
the SFPs and the SFPs are plugged all the way into the switch port sockets.
Ensure that fiber optic cables are not damaged. If possible, swap a suspicious cable with a
known good cable.
Note: When replacing or relocating Fibre Channel cables, always clean their pluggable
connectors by using proper cleaning tools. This applies even to brand-new cables taken from
sealed bags.
After the hardware check, continue to check the following aspects of the software setup:
Whether the HBA driver level and firmware level are at the preferred and supported levels.
Verify your SAN zoning configuration.
The general SAN switch status and health for all switches in the fabric.
The multipathing driver, and make sure that it is at the preferred configuration and
supported level.
Link layer errors that are reported by the host or the SAN switch may indicate previously
undiscovered cable or SFP issues.
Link issues
If the Ethernet port link does not come online, check whether the SFP, the cables, and the
port support auto-negotiation with the switch. This is especially true for SFPs that support
25 Gbps and higher port speeds because a mismatch in Forward Error Correction (FEC)
might prevent a port from auto-negotiating.
Another potential source of problems is 4X splitter cables and Direct Attach Copper (DAC)
cables. Longer cables are exposed to more noise and interference (a higher Bit Error Ratio
(BER)); therefore, they require more powerful error correction codes.
Two IEEE 802.3 FEC specifications are important. For an auto-negotiation issue, verify
whether a compatibility issue exists with SFPs at both end points:
Clause 74: Fire Code (FC-FEC) or BASE-R (BR-FEC) (16.4 dB loss specification)
Clause 91: Reed-Solomon (RS-FEC) (22.4 dB loss specification)
Use the lshostiplogin command to list the login session type, such as the associated host
object, login counts, login protocol, and other details, for hosts that are identified by their
iSCSI Qualified Name (IQN). The output is provided for ports that logged in to Ethernet ports
that are configured with IP addresses and shows, among other things, the protocol that is
used.
The output in the protocol field indicates the connection protocol that is used by the
configured IP host IQN to establish a login session that is referred by the login field. This
value can be one of the following values:
iSCSI
iSER
A DCBX-enabled switch and a storage adapter exchange parameters that describe traffic
classes and PFC capabilities.
In IBM Storage Virtualize systems, Ethernet traffic is divided into the following classes of
service based on the feature use case:
Host attachment (iSCSI or iSER)
Back-end storage (iSCSI)
Node-to-node communication (Remote Direct Memory Access (RDMA) clustering)
If problems occur when PFC is configured, verify the following items to determine the issue:
Configure the IP address or VLAN by using mkip.
Configure the class of service (COS) by using chsystemethernet.
Ensure that the priority tag is enabled on the switch.
Check the dcbx_state and pfc_enabled_tags fields in the lsportip output.
The Enhanced Transmission Selection (ETS) setting is recommended if a port is shared.
For more information about problem solving, see Resolving a problem with PFC settings.
Verify that the IP addresses are reachable and the TCP ports are open.
In specific situations, the TCP/IP layer might attempt to combine several ACK responses into
a single response to improve performance. However, that combination can negatively affect
iSCSI read performance as the storage target waits for the response to arrive. This issue is
observed when the application is single-threaded and has a low queue depth.
It is a best practice to disable the TCPDelayedAck parameter on the host platforms to improve
overall storage I/O performance. If the host platform does not provide a mechanism to disable
TCPDelayedAck, verify whether a smaller maximum I/O transfer size with more concurrency
(queue depth > 16) improves overall latency and bandwidth usage for the specific host
workload. In most Linux distributions, this maximum I/O transfer size is controlled by the
max_sectors_kb parameter, with a suggested transfer size of 32 KiB.
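For example, on a Linux host, the current limit can be checked and reduced per block device
as shown in the following sketch (sdX is a placeholder for the device that represents the
IBM Storage Virtualize volume; the change is not persistent across reboots unless it is also
applied through a udev rule):
cat /sys/block/sdX/queue/max_sectors_kb
echo 32 > /sys/block/sdX/queue/max_sectors_kb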
In addition, review network switch diagnostic data to evaluate potential issues such as packet
drops or packet retransmissions. It is advisable to enable flow control or PFC to enhance the
reliability of the network delivery system and avoid packet loss, which improves overall
performance.
For more information about iSCSI performance analysis and tuning, see iSCSI performance
analysis and tuning.
Various types of SAN zones are needed to run IBM Storage Virtualize systems in your
environment: A host zone, and a storage zone for virtualized back-end storage systems. In
addition, you must have an IBM Storage Virtualize systems zone that contains all the
IBM Storage Virtualize node ports used for communication between the clustered system’s
nodes or node canisters. Dedicated SAN zoning is also needed if the IBM Storage Virtualize
system uses Remote Copy Services for volume replication with another Storage Virtualize
system.
For more information and important points about setting up IBM Storage Virtualize systems in
a SAN fabric environment, see Chapter 2, “Storage area network guidelines” on page 121.
Because IBM Storage Virtualize systems are a major component of the SAN and connect the
host to the storage subsystem, check and monitor the SAN fabrics.
Some situations can cause issues in the SAN fabric and SAN switches. Problems can be
related to a hardware fault or to a software problem on the switch. The following hardware
defects are normally the easiest problems to find:
Switch power, fan, or cooling units
Installed SFP modules
Fiber optic cables
Software failures are more difficult to analyze. In most cases, you must collect data and
involve IBM Support. However, before you take further action, check the installed code level
for any known issues in the switch vendor’s release notes for the switch firmware and
software. Also, check whether a new code level is available that resolves the problem that you
are experiencing.
SAN connectivity issues commonly are related to zoning. For example, a wrong WWPN may
have been chosen for a host zone, such as when two IBM Storage Virtualize system ports
must be zoned to one HBA, with one port from each IBM Storage Virtualize system node.
Another example is a host port WWPN that was omitted from the host zoning, causing the
host object to be displayed as degraded.
Therefore, SAN zoning should be done after thorough planning, using a unified naming
schema for aliases, zones, and so on. It is equally beneficial for both the user and any
supporting function if the SAN switches follow a clear naming structure and a coordinated
domain ID assignment. Although it is not a problem from a technical point of view to reuse
switch domain IDs across SAN fabrics, it unnecessarily complicates troubleshooting because
the switch N_Port IDs (NPIDs) show up multiple times in diagnostic data and in the output of
CLI commands such as lsportfc, lstargetportfc, and lsfabric.
Existing SAN environments often have developed organically over time with little or no
documentation. It pays to create documentation and graphical SAN layouts to enable faster
problem analysis and resolution.
On IBM Storage Virtualize systems, a part of the worldwide port names (WWPNs) is derived
from the worldwide node name (WWNN) of the node canister in which the adapter is installed.
The WWNN is part of each node’s Vital Product Data (VPD) and determines the WWPN’s last
four digits. The ports’ WWPNs are also derived from the PCIe slot in which the adapter is
installed and its port ID. For more information, see Worldwide node and port names.
So, the WWPNs for the different ports of the same node differ in the sixth-last and fifth-last
digits. For example:
50:05:07:68:10:13:37:dc
50:05:07:68:10:14:37:dc
50:05:07:68:10:24:37:dc
The WWPNs for ports on different nodes differ in the last four digits. For example, here are the
WWPNs for ports 3 and 4 on each node of an IBM FlashSystem:
50:05:07:68:10:13:37:dc
50:05:07:68:10:14:37:dc
50:05:07:68:10:13:37:e5
50:05:07:68:10:14:37:e5
As shown in Example 11-11, two ports that belong to the same Storage Virtualize node are
zoned to a host FC port. As a result, the host port does not log in to both nodes of that I/O
group, and the multipathing driver does not see redundant paths.
The correct zoning must look like the zoning that is shown in Example 11-12.
The following IBM FlashSystem error codes are related to the SAN environment:
Error 1060: Fibre Channel ports are not operational.
Error 1220: A remote port is excluded.
A bottleneck is another common issue that is related to SAN switches. The bottleneck can be
present in a port where a host, storage subsystem, or IBM Storage Virtualize device is
connected, or in Inter-Switch Link (ISL) ports. The bottleneck can occur in some cases, such
as when a device that is connected to the fabric is slow to process received frames, or if a
SAN switch port cannot transmit frames at a rate that is required by a device that is connected
to the fabric.
These cases can slow down communication between devices in your SAN. To resolve this
type of issue, see the SAN switch documentation to investigate and identify what is causing
the bottleneck and how to fix it.
If you cannot fix the issue with these actions, use the method that is described in 11.2,
“Collecting diagnostic data” on page 710, collect the SAN switch debugging data, and then
contact the vendor for assistance or open a case with the vendor.
sas_wwn=""
iqn=""
hbt="80487" hbr="0" het="0" her="1465"
cbt="0" cbr="612" cet="260" cer="0"
lnbt="0" lnbr="0" lnet="955316" lner="955332"
rmbt="0" rmbr="0" rmet="0" rmer="0"
dtdt="242" dtdc="6" dtdm="956797"
dtdt2="242" dtdc2="6"
lf="14" lsy="21" lsi="0" pspe="0"
itw="54" icrc="0" bbcz="0"
tmp="46" tmpht="85"
txpwr="596" txpwrlt="126"
rxpwr="570" rxpwrlt="31"
hsr="0" hsw="0" har="0" haw="0"
/>
<port id="2"
type="FC"
type_id="2"
[…]
Table 11-5 shows some of the most interesting attributes and their meanings.
It is currently not possible to reset or clear the shown counters with a command. To examine
the trend of the values, or to determine whether they are increasing, a best practice is to
compare two outputs of the command for differences. Allow some run time between the two
iterations of the command.
For more information about the lsportstats command, see the IBM Documentation for the
lsportstats command.
IBM Storage Virtualize has several CLI commands that you can use to check the status of the
system and attached storage subsystems. Before you start a complete data collection or
problem isolation on the SAN or subsystem level, first use the following commands and check
the status from the IBM Storage Virtualize perspective:
lscontroller <controller_id_or_name>
Checks that multiple WWPNs that match the back-end storage subsystem controller ports
are available.
Checks that the path_counts are evenly distributed across each storage subsystem
controller, or that they are distributed correctly based on the preferred controller. The total
of all path_counts must add up to the number of MDisks multiplied by the number of
IBM Storage Virtualize nodes.
lsmdisk
Checks that all MDisks are online (not degraded or offline).
lsmdisk <MDisk_id_or_name>
Checks several of the MDisks from each storage subsystem controller. Are they online?
Do they all have path_count = number of back-end ports in the zone to IBM Storage
Virtualize x number of nodes? An example of the output from this command is shown in
Example 11-14. MDisk 0 is a local MDisk in an IBM FlashSystem, and MDisk 1 is provided
by an external, virtualized storage subsystem.
ctrl_type
ctrl_WWNN
controller_id
path_count
max_path_count
ctrl_LUN_#
UID
preferred_WWPN
active_WWPN
fast_write_state empty
raid_status online
raid_level raid6
redundancy 2
strip_size 256
spare_goal
spare_protection_min
balanced exact
tier tier0_flash
slow_write_priority latency
fabric_type
site_id
site_name
easy_tier_load
encrypt no
distributed yes
drive_class_id 0
drive_count 8
stripe_width 7
rebuild_areas_total 1
rebuild_areas_available 1
rebuild_areas_goal 1
dedupe no
preferred_iscsi_port_id
active_iscsi_port_id
replacement_date
over_provisioned yes
supports_unmap yes
provisioning_group_id 0
physical_capacity 85.87TB
physical_free_capacity 78.72TB
write_protected no
allocated_capacity 155.06TB
effective_used_capacity 16.58TB.
IBM_IBM FlashSystem:FLASHPFE95:superuser>lsmdisk 1
id 1
name flash9h01_itsosvccl1_0
status online
mode managed
MDisk_grp_id 1
MDisk_grp_name Pool1
capacity 51.6TB
quorum_index
block_size 512
controller_name itsoflash9h01
ctrl_type 6
ctrl_WWNN 500507605E852080
controller_id 1
path_count 16
max_path_count 16
ctrl_LUN_# 0000000000000000
UID 6005076441b53004400000000000000100000000000000000000000000000000
preferred_WWPN
active_WWPN many
Example 11-14 on page 732 shows that for MDisk 1, the external storage controller has
eight ports that are zoned to the IBM Storage Virtualize system, which has two nodes
(8 x 2 = 16).
lsvdisk
Checks that all volumes are online (not degraded or offline). If the volumes are degraded,
are there stopped FlashCopy jobs present? Restart stopped FlashCopy jobs or seek
IBM Storage Virtualize systems support guidance.
lsfabric
Use this command with the various options, such as -controller controllerid. Also,
check different parts of the IBM Storage Virtualize systems configuration to ensure that
multiple paths are available from each IBM Storage Virtualize node port to an attached
host or controller. Confirm that IBM Storage Virtualize systems node port WWPNs are
also consistently connected to an external back-end storage.
Example 11-15 shows how to obtain this information by using the lscontroller
<controllerid> and svcinfo lsnode commands.
WWPN 500507605E8520B1
path_count 32
max_path_count 32
WWPN 500507605E8520A1
path_count 32
max_path_count 64
WWPN 500507605E852081
path_count 32
max_path_count 64
WWPN 500507605E852091
path_count 32
max_path_count 64
WWPN 500507605E8520B2
path_count 32
max_path_count 64
WWPN 500507605E8520A2
path_count 32
max_path_count 64
WWPN 500507605E852082
path_count 32
max_path_count 64
WWPN 500507605E852092
path_count 32
max_path_count 64
Example 11-15 on page 734 shows that 16 MDisks are present for the external storage
subsystem controller with ID 1, and two IBM Storage Virtualize nodes are in the cluster. In this
example, the path_count is 16 x 2 = 32.
IBM Storage Virtualize has useful tools for finding and analyzing back-end storage subsystem
issues because it includes a monitoring and logging mechanism.
Typical events for storage subsystem controllers include incorrect configuration, which results
in a 1625 - A controller configuration is not supported error code. Other issues that are
related to the storage subsystem include failures that point to the MDisk I/O (error code
1310), disk media (error code 1320), and error recovery procedure (error code 1370).
However, not all messages have only one specific reason for being issued. Therefore, you
must check several areas for issues, not only the storage subsystem.
3. Check the FC SAN or IP SAN environment for switch problems or zoning failures.
Make sure that the zones are correctly configured, and that the zone set is activated. The
zones that allow communication between the storage subsystem and the IBM Storage
Virtualize systems device must contain the WWPNs of the storage subsystem and
WWPNs of the IBM Storage Virtualize system.
4. Collect all support data and contact IBM Support.
Collect the support data for the involved SAN, IBM Storage Virtualize system, and external
storage systems, as described in 11.2, “Collecting diagnostic data” on page 710.
Note: IP replication that is configured over 25 Gbps ports does not use RDMA capabilities,
and it does not provide a performance improvement compared to 10 Gbps ports. 100 Gbps
ports do not support IP replication.
A system can be part of only two IP partnerships. IBM Storage Virtualize systems with
pre-8.4.2.0 firmware are still limited to one IP partnership. Partnerships on low memory
platform nodes share memory resources, which can lead to degraded performance.
Portsets replace the requirement for creating remote-copy groups for IP partnerships.
Dedicated portsets can be created for remote copy traffic. The dedicated portsets provide a
group of IP addresses for IP partnerships.
During updates of the software, any IP addresses that are assigned to remote-copy groups
with an IP partnership are automatically moved to a corresponding portset. For example, if
remote-copy group 1 is defined on the system before the update, IP addresses from that
remote-copy group are mapped to portset 1 after the update. Similarly, IP addresses in
remote-copy group 2 are mapped to portset 2.
The native IP replication feature uses the following TCP/IP ports for remote cluster path
discovery and data transfer; therefore, these ports need to be open:
IP partnership management IP communication: TCP port 3260
IP partnership data path connections: TCP port 3265
If a connectivity issue exists between the cluster in the management communication path, the
cluster reports error code 2021: Partner cluster IP address unreachable. However, when
a connectivity issue exists in the data path, the cluster reports error code 2020: IP Remote
Copy link unavailable.
If the IP addresses are reachable and TCP ports are open, verify whether the end-to-end
network supports a maximum transmission unit (MTU) of 1500 bytes without packet
fragmentation. When an external host-based ping utility is used to validate end-to-end MTU
support, use the “do not fragment” qualifier.
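For example, from a Linux host in the replication network path, end-to-end support for a
1500-byte MTU can be validated as follows (the partner address is a placeholder; the
1472-byte payload plus 28 bytes of IP and ICMP headers equals 1500 bytes):
ping -M do -s 1472 <partner_cluster_ip>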
Fix the network path so that traffic can flow correctly. After the connection is made, the error
auto-corrects.
The network quality of service largely influences the effective bandwidth usage of the
dedicated link between the cluster. Bandwidth usage is inversely proportional to round-trip
time (RTT) and the rate of packet drop or retransmission in the network.
Note: For standard block traffic, a packet drop or retransmission of 0.5% or more can lead
to unacceptable usage of the available bandwidth.
Work with the network team to investigate over-subscription or other quality of service (QoS)
issues of the link, with an objective of having the lowest possible (less than 0.1%) packet-drop
percentage.
For more information about the configuration, see 6.6, “Native IP replication” on page 458.
For more information about performance contributors, see 6.6.8, “Native IP replication
performance considerations” on page 475.
An IBM Storage Virtualize cluster can be formed by using RDMA-capable NICs that use
RoCE or internet Wide-area RDMA Protocol (iWARP) technology. Consider the following
points:
Inter-node Ethernet connectivity can be done over identical ports only, and such ports
must be connected within the same switching fabric.
To ensure best performance and reliability, a minimum of two dedicated RDMA-capable
Ethernet ports are required for node-to-node communications. These ports must be
configured for inter-node traffic only and must not be used for host attachment,
virtualization of Ethernet-attached external storage, or IP replication traffic.
If the cluster will be created without an ISL (up to 300 meters (984 feet)), deploy
independent (isolated) switches.
If the cluster will be created on a short-distance ISL (up to 10 km (6.2 miles)), provision as
many ISL between switches as there are RDMA-capable cluster ports.
For a long-distance ISL (up to 100 km (62 miles)), the Dense Wavelength Division
Multiplexing (DWDM) and Coarse Wavelength Division Multiplexing (CWDM) methods are
applicable for L2 networks. Packet-switched or VXLAN methods are deployed for an L3
network because this equipment comes with deeper buffer “pockets”.
The following ports must be opened in the firewall for IP-based RDMA clustering:
TCP 4791, 21451, 21452, and 21455
UDP 4791, 21451, 21452, and 21455
For more information, see Configuration details for using RDMA-capable Ethernet port for
node-to-node communications.
Before completing managing tasks that are related to RDMA-capable Ethernet ports on a
node, use the following best practices to manage these ports:
If you already have a system that is configured to use RDMA-capable Ethernet ports, you
must ensure that one redundant path is available before adding, removing, or updating
settings for RDMA-capable Ethernet ports.
Add, remove, or update settings on only one RDMA-capable Ethernet port at a time. Wait
15 seconds between these changes before updating other RDMA-capable Ethernet ports.
If you are using a VLAN to create physical separation of networks, ensure that you follow
these extra guidelines when completing management-related tasks:
– VLAN IDs cannot be updated or added independently of other settings on a
RDMA-capable Ethernet port, such as an IP address.
– Before adding or updating VLAN ID information to RDMA-capable Ethernet ports, you
must configure VLAN support on all the Ethernet switches in your network. For
example, on each switch, set VLAN to “Trunk” mode, and specify the VLAN ID for the
RDMA-capable Ethernet ports that will be in the same VLAN.
Problem determination
The first step is to review whether the node IP address is reachable and verify that the
required TCP/UDP ports are accessible in both directions.
The following CLI command lists the port-level connectivity information for node-to-node
(clustering) connectivity and can be helpful in finding the reason for a connectivity error:
sainfo lsnodeipconnectivity
The IBM Documentation for the lsnodeipconnectivity command lists the different error_data
values with a description, and provides possible corrective actions.
The first thing to check is whether any unfixed events exist that require attention. After the fix
procedure is followed to correct the alerts, the next step is to check the audit log to determine
whether any activity exists that can trigger the performance issue. If that information
correlates, more analysis can be done to check whether that specific feature is used.
The most common root causes for performance issues are SAN congestion, configuration
changes, incorrect sizing or estimation of advanced copy services (replication, FlashCopy,
and volume mirroring), or I/O load change.
Volume mirroring
The write-performance of the mirrored volumes is dictated by the slowest copy. Reads are
served from the primary copy of the volume (in a stretched cluster topology, both copies can
serve reads, which are dictated by the host site attribute). Therefore, size the solution as
needed.
The mirroring layer maintains a bitmap copy on the quorum device. If a quorum disk is not
accessible and volume mirroring cannot update the state information, a mirrored volume
might need to be taken offline to maintain data integrity. Similarly, slow access to the quorum
can affect the performance of mirroring volumes.
Problems sometimes occur during the creation of a mirrored volume or in relation to the
duration of the synchronization. Helpful details and best practices are described in 6.7.6,
“Bitmap space for out-of-sync volume copies” on page 484 and 6.7.5, “Volume mirroring
performance considerations” on page 482.
FlashCopy
FlashCopy is a function that you can use to create a point-in-time copy of one of your
volumes. Section 6.2.4, “FlashCopy planning considerations” on page 381 provides technical
background and details for FlashCopy configurations. Review the provided recommendations
Policy-based replication
Policy-based replication helps you to replicate data between systems with minimal
management, significantly higher throughput, and reduced latency compared to the
asynchronous remote copy function.
If a volume group exceeds the RPO of the associated replication policy, an alert is generated
in the event log. Depending on the notification settings, a notification is sent by email, syslog,
or SNMP. The management GUI displays the RPO statuses that are shown in Figure 11-10.
For more details, see Checking the status and RPO for policy-based replication.
Safeguarded Copy
Safeguarded Copy on IBM Storage Virtualize supports the ability to create cyber-resilient
point-in-time copies of volumes that cannot be changed or deleted through user errors,
malicious actions, or ransomware attacks. The system integrates with IBM Copy Services
Manager (IBM CSM) to provide automated backup copies and data recovery.
The online documentation of IBM CSM provides a dedicated chapter for troubleshooting and
support.
For more information, see IBM FlashSystem Safeguarded Copy Implementation Guide,
REDP-5654 and IBM Copy Services Manager -> Troubleshooting and support.
HyperSwap
With HyperSwap, a fully independent copy of the data is maintained at each site. When data
is written by hosts at either site, both copies are synchronously updated before the write
operation is completed. The HyperSwap function automatically optimizes itself to minimize
data that is transmitted between two sites, and to minimize host read/write latency.
Verify that the link between the sites is stable and has enough bandwidth to replicate the peak
workload. Also, check whether a volume must frequently change the replication direction from
one site to the other. This issue occurs when a specific volume is written by hosts from both
sites. Evaluate whether this situation can be avoided to reduce frequent direction changes.
Ignore this issue if the solution is designed for active/active access.
If a single volume resynchronization between the sites takes a long time, review the
partnership link_bandwidth_mbits parameter and the per-relationship
relationship_bandwidth_limit parameter.
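As a hedged sketch, the following commands can be used to review and adjust these
parameters (the partnership name, bandwidth, and limit values are examples only; verify the
exact syntax and suitable values for your environment):
lspartnership
chpartnership -linkbandwidthmbits 2000 -backgroundcopyrate 50 <remote_system>
lssystem
chsystem -relationshipbandwidthlimit 50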
The garbage-collection process is designed to defer the work as much as possible because
the more it is deferred, the higher the chance of having to move only a small amount of valid
data from a block to make that block available to the free pool. However, when the pool
reaches more than 85% of its allocated capacity, garbage collection must speed up and move
valid data more aggressively to make space available sooner. This might lead to increased
latency because of increased CPU usage and load on the back-end. Therefore, it is a best
practice to manage storage provisioning to avoid such scenarios.
Note: If the usable capacity of a DRP exceeds more than 85%, I/O performance can be
affected. The system needs 15% of usable capacity that is available in DRPs to ensure that
capacity reclamation can be performed efficiently.
Users are encouraged to pay close attention to any GUI notifications and use best practices
for managing physical space. Use data reduction only at one layer (at the virtualization layer
or the back-end storage or drives) because no benefit is realized by compressing and
deduplicating the same data twice.
Because encrypted data cannot be compressed, data reduction must be done before the data
is encrypted. Correct sizing is important to get the best performance from data reduction;
therefore, use data reduction tools to evaluate system performance and space saving.
IBM Storage Virtualize systems use the following types of data reduction techniques:
IBM FlashSystem that use FCM NVMe drives have built-in hardware compression.
IBM FlashSystem that use industry-standard NVMe drives and SVC rely on the
IBM Storage Virtualize software and DRP pools to deliver data reduction.
For more information about DRPs, see Introduction and Implementation of Data Reduction
Pools and Deduplication, SG24-8430.
Compression
Starting with IBM Storage Virtualize 8.4, the integrated Comprestimator is always enabled
and running continuously, thus providing up-to-date compression estimation over the entire
cluster, both in the GUI and IBM Storage Insights. To display information for the
thin-provisioning and compression estimation analysis report for all volumes, run the
lsvdiskanalysis command.
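For example, the estimate for a single volume can be displayed as follows (volume ID 0 is a
placeholder; the reported fields depend on the code level):
lsvdiskanalysis 0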
Normal behavior alerts for hardware, logical, and connectivity components during an upgrade
of IBM Storage Virtualize storage systems are as follows:
Degraded Host Connectivity
Degraded Array MDisks
Degraded Volumes
Degraded connectivity to the internal disks
Degraded Control Enclosures
Degraded Expansion Enclosures
Degraded Drives
Node offline
FC Ports offline
Serial-attached SCSI (SAS) Ports offline
Enclosure Batteries offline
Node added
Node restarted
Number of device logins reduced on an IBM Storage Virtualize system (for example, when an
IBM Storage Virtualize system that is used as back-end storage for the SVC is updated)
When you attempt to upgrade an IBM Storage Virtualize system, you also might receive a
message, such as an error occurred in verifying the signature of the update package. This
message does not mean that an issue exists in your system. Sometimes, this issue occurs
because not enough space is available on the system to copy the file, or the package is
incomplete or contains errors. In this case, open a Support Ticket with IBM Support and
follow their instructions.
To avoid running out of space on the system, the usable capacity must be monitored carefully
by using the GUI of the IBM Storage Virtualize system. The IBM Storage Virtualize GUI is the
only capacity dashboard that shows the physical capacity.
IBM encourages users to configure Call Home on the IBM Storage Virtualize system. Call
Home monitors the physical free space on the system and automatically opens a service call
for systems that reach 99% of their usable capacity. IBM Storage Insights also can monitor
and report on any potential out-of-space conditions, and the new Advisor function warns
when the IBM Storage Virtualize system is almost at full capacity. For more information, see
11.6.5, “IBM Storage Insights Advisor” on page 767.
When the IBM Storage Virtualize system pool reaches an out-of-space condition, the device
drops into a read-only state. An assessment of the data compression ratio (CR) and the
re-planned capacity estimation should be done to determine how much outstanding storage
demand might exist. This extra capacity must be prepared and presented to the host so that
recovery can begin.
The approaches that can be taken to reclaim space on the IBM Storage Virtualize system in
this scenario vary by the capabilities of the system, optional external back-end controllers, the
system configuration, and planned capacity overhead needs.
Freeing up space
You can reduce the amount of used space by using several methods, which are described in
the following sections.
3. Migrate volumes with extents on the write-protected array to another pool. If possible,
moving volumes to another pool can free up space in the affected pool to allow for space
reclamation.
4. As this volume moves into the new pool, its previously occupied flash extents are freed (by
using SCSI unmap), which provides more free space to the IBM FlashSystem enclosure so
that it can be configured with proper provisioning to support the CR.
5. Delete dispensable volumes to free up space. If possible, within the pool (MDisk group) on
the IBM Storage Virtualize system, delete unnecessary volumes. IBM Storage Virtualize
systems support SCSI unmap, so deleting volumes results in space reclamation benefits by
using this method.
6. Bring the volumes in the pool back online by using a Directed Maintenance Procedure.
Note: Power off all hosts accessing the pool to avoid host writes from impacting the
success of the recovery plan.
For more information about the types of recovery for out of space situations, including
standard pools and DRPs, see Handling out of physical space conditions.
The Fix Procedure helps you to identify the enclosure and slot where the bad drive is located,
and guides you to the correct steps to follow to replace it.
When a flash drive fails, it is removed from the array, and the rebuild process to the available
rebuild areas starts. After the failed flash drive is replaced and the system detects the
replacement, it reconfigures the new drive, a copy-back starts, and the new drive is used to
fulfill the array membership goals of the system.
To obtain support for any IBM product, see the IBM Support home page.
If the problem is caused by IBM Storage Virtualize and you cannot fix it by using the
Recommended Action feature or by examining the event log, collect the IBM Storage
Virtualize support package, as described in 11.2.1, “IBM Storage Virtualize systems data
collection” on page 710.
To set up the remote support options by using the GUI, select Settings → Support →
Support Assistance → Reconfigure Settings, as shown in Figure 11-11.
You can use local support assistance if you have security restrictions that do not allow
support to connect remotely to your systems. With RSA, support personnel can work onsite
and remotely by using a secure connection from the support center.
They can perform troubleshooting, upload support packages, and download software to the
system with your permission. When you configure RSA in the GUI, local support assistance
also is enabled.
Note: Systems that are purchased with a 3-year warranty include enterprise-class support
(ECS), and they are entitled to IBM Support by using RSA to quickly connect and diagnose
problems. However, IBM Support might choose to use this feature on non-ECS systems at
their discretion; therefore, we recommend configuring and testing the connection on all
systems.
Call Home is available in several IBM systems, including IBM Storage Virtualize systems,
which allows them to automatically report problems and statuses to IBM.
In addition, Call Home Connect Cloud provides an app that is called Call Home Connect
Anywhere, a mobile version to monitor your systems from anywhere.
The IBM Call Home Connect Anywhere mobile app is available on iOS and Android, and it
provides a live view of your IBM assets, including cases, alerts, and support statuses. The
mobile app, which is available within the Apple App Store and the Google Play Store, is a
companion application to IBM Call Home Connect Cloud. If you do not already have assets
that are registered, you are directed to IBM Call Home Connect Cloud to register assets to be
viewed in the mobile app.
Call Home Connect Cloud provides the following information about IBM systems:
Automated tickets
Combined ticket view
Warranty and contract status
Health check alerts and recommendations
System connectivity heartbeat
Recommended software levels
Inventory
Security bulletins
Live updates for your assets, ensuring that you always see the latest data.
Case summaries for cases with IBM Support.
Proactive alerts when important conditions are detected for your assets.
IBM Call Home status and the last contact for your assets.
Detailed information on warranties, maintenance contracts, service levels, and end of
service information for each of your assets.
Recommended software levels for each of your assets.
For more information about Call Home Connect Cloud (Call Home Web), see the IBM
Support website “Let’s troubleshoot”.
At the IBM Support website, select Monitoring → Hardware: Call Home Connect Cloud to
see Call Home Connect Cloud, as shown in Figure 11-12 on page 748.
Call Home Connect Cloud provides an enhanced live view of your assets, including the status
of cases, warranties, maintenance contracts, service levels, and end of service information.
Additionally, Call Home Connect Cloud offers links to other online tools (for example,
IBM Storage Insights) and security documents.
Call Home Connect Cloud provides software and firmware level recommendations for
IBM Storage and IBM Power products.
For Call Home Connect Cloud to analyze the data of IBM Storage Virtualize systems and
provide useful information about them, devices must be added to the tool. The machine type,
model, and serial number are required to register the product in Call Home Connect Cloud.
Also, it is required that the IBM Storage Virtualize system have Call Home and inventory
notification enabled and operational. Figure 11-13 shows the summary dashboard for all
assets that are configured in Call Home Connect Cloud.
Figure 11-14 shows a list of configured assets (some of the details, including the email ID, are
hidden).
Figure 11-15 shows the Call Home Connect Cloud details windows of an IBM Storage
Virtualize system.
For more information about how to set up and use Call Home Connect Cloud, see Introducing
Call Home Connect Cloud.
The Health Checker feature analyzes the Call Home and inventory data of systems that are
registered in Call Home Connect Cloud and validates their configuration. Then, it displays
alerts and provides recommendations in the Call Home Connect Cloud tool.
Note: Use Call Home Connect Cloud because it provides useful information about your
systems. The Health Checker feature helps you to monitor the systems by proactively
providing alerts and creating recommendations that are related to them.
Some of the functions of the IBM Call Home Connect Cloud and Health Checker were ported
to IBM Storage Insights, as described in 11.6, “IBM Storage Insights” on page 752.
In addition, when IBM Support is needed, IBM Storage Insights simplifies uploading logs,
speeds resolution with online configuration data, and provides an overview of open tickets all
in one place.
– Understand how much of the capacity of the storage systems and pools is being
consumed by Safeguarded copies.
– Verify that the correct volumes are being protected.
– Generate reports to see how much of your capacity is protected.
In addition to IBM Storage Insights, which is available at no additional charge, you also can
use IBM Storage Insights Pro. This subscription service provides longer historical views of
data, offers more reporting and optimization options, and supports IBM file and block storage
in addition to EMC VNX and VMAX.
Figure 11-16 shows a comparison of IBM Storage Insights and IBM Storage Insights Pro.
Figure 11-16 IBM Storage Insights versus IBM Storage Insights Pro
For more information regarding the features that are included in the available editions of
IBM Storage Insights, see IBM Storage Insights documentation.
For more information about the supported operating systems for a data collector, see
the IBM Storage Insights online documentation at Managing data collectors.
The data collector streams performance, capacity, asset, and configuration metadata to your
IBM Cloud instance.
The metadata flows in one direction: from your data center to IBM Cloud over HTTPS. In the
IBM Cloud, your metadata is protected by physical, organizational, access, and security
controls. IBM Storage Insights is ISO/IEC 27001 Information Security Management certified.
To make your data collection services more robust, install two or more data collectors on
separate servers or VMs in each of your data centers.
When you add storage devices, the data collectors that you deploy are tested to see whether
they can communicate with those devices. If multiple data collectors can communicate with a
device, then the data collector with the best response time collects the metadata. If the
collection of metadata is interrupted, the data collectors are tested again, and the data
collectors with the best response times take over.
Collected metadata
The following metadata about the configuration and operations of storage resources is
collected:
Name, model, firmware, and type of storage system.
Inventory and configuration metadata for the storage system’s resources, such as
volumes, pools, disks, and ports.
Capacity values, such as capacity, unassigned space, used space, and the CR.
Performance metrics, such as read/write data rates, I/O rates, and response times.
The application data that is stored on the storage systems cannot be accessed by the data
collector.
Access to the collected metadata is restricted to the following teams:
The IBM Cloud team that is responsible for the day-to-day operation and maintenance of
IBM Cloud instances.
IBM Support, for investigating and closing service tickets.
For more information about setting up the customized dashboard, see Creating customized
dashboards to monitor your storage.
2. Select Create Ticket (see Figure 11-20). Several windows open in which you enter
information about the machine and a problem description, and you are given the option to
upload logs.
Note: The “Permission given” information box (see Figure 11-21 on page 758) is an
option that the customer must enable in the IBM Storage Virtualize systems GUI. For
more information, see 11.4, “Remote Support Assistance” on page 746.
Figure 11-21 shows the ticket data collection that is done by the IBM Storage Insights
application.
As shown in Figure 11-22, you can add a problem description and attach other files to
support the ticket, such as error logs or window captures of error messages.
3. You are prompted to set a severity level for the ticket, as shown in Figure 11-23. Severity
levels range from Severity 1 (for a system that is down or extreme business impact) to
Severity 4 (noncritical issue).
4. The final summary window (see Figure 11-24) includes the option to add logs to the ticket.
When completed, click Create Ticket to create the support ticket and send it to IBM. The
ticket number is created by the IBM Support system and returned to your IBM Storage
Insights instance.
5. Figure 11-25 shows how to view the summary of the open and closed ticket numbers for
the system that is selected by using the Action menu option.
2. Enter the IBM Support case number, and then click Next (see Figure 11-27). The IBM
Support case number uses the following format:
TS000XXXXX
These details were supplied when you created the ticket, or they are provided by IBM Support
if the Problem Management Record (PMR) was created by a Call Home problem event
(assuming that Call Home is enabled).
A window opens in which you can choose the log type to upload. The window and the
available options are shown in Figure 11-28 on page 764.
The following options are available:
– Type 1 - Standard logs.
For general problems, including simple hardware and simple performance problems.
– Type 2 - Standard logs and the most recent statesave log.
– Type 3 - Standard logs and the most recent statesave log from each node.
For 1195 and 1196 node errors and 2030 software restart errors.
– Type 4 - Standard logs and new statesave logs.
For complex performance problems, and problems with interoperability of hosts or
storage systems, compressed volumes, and remote copy operations, including 1920
errors.
You can allow IBM Support to collect and upload packages from your storage systems
without requiring permission from your organization. If you grant this permission, it can
help IBM Support resolve your support tickets faster.
If IBM Support does not have that permission, a notification appears within the window.
If you are unsure about which log type to upload, contact IBM Support for guidance. The
most common type to use is type 1, which is the default. The other types collect progressively
more detailed logs and are intended for increasingly complex issues.
3. After the type of log is selected, click Next. The log collection starts. When completed, the
log completion window is displayed, as shown in Figure 11-29.
4. After clicking Next, you can provide more information in a text field or upload more files, as
shown in Figure 11-30.
5. When you click Next, the update of the support ticket is completed, as shown in Figure 11-31.
IBM Storage Insights analyzes your device data to identify violations of best practice
guidelines and other risks, and to provide recommendations about how to address these
potential problems.
To view these recommendations, select Insights → Advisor. To see more information about
a recommendation or to acknowledge it, double-click the recommendation.
All advice that is categorized as Error, Warning, Informational, or Acknowledged is shown for
all attached storage within a table. Using a filter, it is possible, for example, to display only
advisories for a specific IBM Storage Virtualize system.
Figure 11-32 on page 768 shows the initial IBM Storage Insights Advisor menu.
Figure 11-33 shows an example of the detailed IBM Storage Insights Advisor
recommendations.
As shown in Figure 11-33, the details of a Running out of space recommendation are shown
on the Advisor page. In this scenario, the user clicked the Warning tag to focus only on the
recommendations with a severity of Warning.
For more information about setting and configuring the Advisor options, see Monitoring
recommended actions in the Advisor.
Appendix A. IBM i considerations
These solutions can meet the demands of IBM i customers for entry-level to high-end storage
infrastructure solutions.
All family members that are based on IBM Storage Virtualize software use a common
management interface. They also provide a comprehensive set of advanced functions and
technologies, such as advanced Copy Services functions, encryption, compression, storage
tiering, Non-Volatile Memory Express (NVMe) flash, storage-class memory (SCM) devices,
and external storage virtualization. Many of these advanced functions and technologies also
are of interest to IBM i customers who are looking for a flexible, high-performing, and highly
available (HA) SAN storage solution.
Unless otherwise stated, the considerations also apply to previous generations of products,
such as the IBM Storwize family, the IBM FlashSystem 9100 series, and IBM FlashSystem
V9000.
Note: For the most recent IBM i functional enhancements, see IBM i Functional
Enhancements Summary.
IBM i Storage management
Because of the unique IBM i storage architecture, special considerations for planning and
implementing a SAN storage solution are required (also with IBM Storage Virtualize-based
storage). This section describes how IBM i storage management manages its available disk
storage.
Many host systems require the user to take responsibility for how information is stored on and
retrieved from the disk units. An administrator also must manage the environment to balance
disk usage, enable disk protection, and keep data spread evenly for optimum performance.
The IBM i architecture is different in that the system takes over many of the storage
management functions that are the responsibility of a system administrator on other
platforms.
IBM i, with its Technology Independent Machine Interface (TIMI), largely abstracts the
underlying hardware layer from the IBM i operating system and its users and manages its
system and user data in IBM i disk pools, which are also called auxiliary storage pools
(ASPs).
When you create a file, you do not assign it to a storage location. Instead, the IBM i system
places the file in the location that ensures the best performance from an IBM i perspective
(see Figure A-1).
Figure A-1 IBM i storage management spreads objects across logical unit numbers
Note: When a program presents instructions to the machine interface for execution, the
interface appears to the program as the system hardware, but it is not. The instructions that
are presented to TIMI pass through a layer of microcode before they are understood by the
hardware. Therefore, TIMI and System Licensed Internal Code (SLIC) allow IBM Power
with IBM i to take technology in stride.
Single-level storage
IBM i uses a single-level storage, object-oriented architecture. It sees all disk space and the
main memory or main storage as one address space. It also uses the same set of virtual
addresses to cover main memory and disk space. Paging the objects in this virtual address
space is performed in 4 KB pages, as shown in Figure A-2. After a page is written to disk, it is
stored with metadata, including its unique virtual address. For this purpose, IBM i originally
used a proprietary 520 bytes per sector disk format.
Note: The system storage, which is composed of main storage (main memory) and
auxiliary storage, is addressed in the same way. This single, device-independent
addressing mechanism means that objects are referred to by name or by name and library,
and never by disk location. The virtual addressing of IBM i is independent of the physical
location of the object, its type and capacity, and the number of disk units or LUNs on the system.
The IBM i disk storage space is managed by using ASPs. Each IBM i system has a system
ASP (ASP 1), which includes the load source (also known as boot volume on other systems)
as disk unit 1, and optional user ASPs (ASPs 2 - 33). The system ASP and the user ASPs are
designated as SYSBAS, and they constitute the system database.
The single-level storage with its unique virtual addresses also implies that the disk storage
that is configured in SYSBAS of an IBM i system must be available in its entirety for the
system to remain operational. It cannot be shared for simultaneous access by other IBM i
systems.
To allow for sharing of IBM i disk storage space between multiple IBM i systems in a cluster,
switchable independent auxiliary storage pools (IASPs) can be configured. The IBM i ASPs'
architecture is shown in Figure A-3.
Figure A-3 IBM i ASP architecture: the system ASP and user ASPs form SYSBAS (the system database), and independent ASPs can be primary, secondary, or user-defined file system (UDFS) IASPs
Single-level storage makes main memory work as a large cache. Reads are done from pages
in main memory, and requests to disk are made only when the needed page is not already in
memory. Writes are done to main memory (main storage), and write operations to disk are
performed as a result of a swap, a file close, or a forced write. Application response time
depends on disk response time and many other factors.
Note: In Figure A-4, the ASP is composed of the LUNs that are assigned from IBM Storage
Virtualize to the IBM i system. The figure shows an application request and update to a
database record. Throughout the time that the TIMI task is in progress, an interaction above
TIMI can occur. This interaction does not continue until the TIMI task concludes.
Disk performance considerations
Disk subsystem performance affects overall IBM i system performance, especially in a
commercial data processing environment where a large volume of data often must be
processed. Disk drives or the LUNs’ response times contribute to a major portion of the
overall response time (online transaction processing (OLTP)) or run time (batch).
Also, disk subsystem performance is affected by the type of protection (redundant array of
independent disks (RAID), distributed RAID (DRAID), or mirroring).
The amount of free space (GB) on the drives and the extent of fragmentation also have an
effect because of the need to find suitable contiguous space on the disks to create or extend
objects. Disk space often is allocated in extents of 32 KB. If a 32 KB contiguous extent is not
available, two extents of 16 KB are used.
The following disk performance considerations are described in the following sections:
Disk I/O requests
Disk subsystems
Disk operation
Asynchronous I/O wait
Disk protection
Logical database I/O versus physical disk I/O
Note: The Set Object Access (SETOBJACC) command on IBM i temporarily changes the
speed of access to an object by bringing the object into a main storage pool or purging it
from all main storage pools. An object can be kept in main storage by selecting a pool for
the object that has available space and does not have jobs that are associated with it.
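As a brief, hedged illustration (the library, file, and pool names here are hypothetical), the following CL commands bring a database file into shared pool 2 and later purge it from main storage; check the SETOBJACC command documentation for the parameters that apply to your release:
SETOBJACC OBJ(APPLIB/ORDERS) OBJTYPE(*FILE) POOL(*SHRPOOL2)
SETOBJACC OBJ(APPLIB/ORDERS) OBJTYPE(*FILE) POOL(*PURGE)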
An information request (data or instructions) from the CPU that is based on user interactions
is submitted to the disk subsystem if it cannot be satisfied from the contents of main memory.
If the request can be satisfied from the disk subsystem cache, it responds or forwards the
request to the disk drives or LUNs.
Similarly, a write request is retained in memory unless the operating system determines that it
must be written to the disk subsystem. Then, the operating system attempts to satisfy the
request by writing to the controller cache.
Note: The QAPMDISKRB file from the Collection Services data files in IBM i includes disk
response time bucket entries. It contains one record for each device resource name, and it
is intended to be used with the QAPMDISK file.
Disk operation
On IBM i, physical disk I/O requests are categorized as database (physical or logical files) or
non-database I/Os, as shown in Figure A-6.
The time that is taken to respond to synchronous disk I/Os contributes to the OLTP response
time or batch run time. With asynchronous I/O, the progress of a request does not wait for the
completion of I/O.
Often, write requests are asynchronous, including journal deposits with commitment control.
However, the writes become synchronous if journaling is active without commitment control.
JBWIO is the number of times that the process waited for outstanding asynchronous I/O
operations to complete. For more information, see this IBM Documentation web page.
This issue might be caused by faster processors that are running with relatively poor disk
subsystems performance. Disk subsystem performance can be affected by busy or slow disks
or small I/O cache.
Disk protection
For more information about external storage considerations for setting up your RAID
protection, see Chapter 4, “Storage pools” on page 241.
Note: If you need high I/O performance on your IBM i workload, an option is to create a
DRAID 1 on your supported storage system, such as IBM FlashSystem 7200 or 9200 with
IBM Storage Virtualize 8.4 or later. In this configuration, the rebuild area is distributed over
all member drives. The minimum extent size for this type of DRAID is 1024 MB.
When an application program requests data, storage management checks whether the data
is available in memory. If so, the data is moved to the open data path in the job buffer. If the
data is not in memory, the request is submitted to the disk subsystem as a read command.
In that context, logical database I/O information is moved between the open data path of the
user program and the partition buffer. This information is a count of the number of buffer
movements, and not a reflection of the records that are processed.
Physical disk I/O occurs when information is read or written as a block of data to or from the
disk. It involves the movement of data between the disk and the partition buffer in memory.
For more information, see IBM i 7.5: Performance.
IBM Storage Virtualize storage for hosts is formatted with a block size of 512 bytes; therefore,
a translation or mapping is required to attach it to IBM i. IBM i changes the data layout to
support 512-byte blocks (sectors) in external storage by using an extra ninth sector to store
the headers for every page.
The eight 8-byte headers from the eight 520-byte sectors of a page are stored in the ninth
sector, which differs from 520-byte sector storage, where the 8 bytes are stored contiguously
with the 512 bytes of data to form the 520-byte sector.
The data that was stored in eight sectors is now stored by using nine sectors, so the required
disk capacity on IBM Storage Virtualize based systems is 9/8ths of the IBM i usable capacity.
Similarly, the usable capacity in IBM i is 8/9ths of the allocated capacity in these storage
systems.
When attaching IBM Storage Virtualize family storage to IBM i, plan for extra capacity on the
storage system so that the 8/9ths of the effective storage capacity that is available to IBM i
covers the capacity requirements for the IBM i workload.
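For example, as a simple sizing illustration: if an IBM i partition requires 8 TB of usable capacity, provision approximately 8 TB x 9/8 = 9 TB on the IBM Storage Virtualize system. Conversely, a 9 TB allocation on the storage system yields approximately 9 TB x 8/9 = 8 TB of usable IBM i capacity.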
Storage connection to IBM i
IBM Storage Virtualize storage can be attached to IBM i in the following ways:
Native connection without using the IBM PowerVM® Virtual I/O Server (VIOS)
Connection with VIOS in N_Port ID Virtualization (NPIV) mode
Connection with VIOS in virtual Small Computer System Interface (SCSI) mode
The decision for IBM i native storage attachment or VIOS attachment is based on the
customer’s requirements. Native attachment has its strength in terms of simplicity, and it can
be a preferred option for static and smaller IBM i environments with only a few partitions. It
does not require extra administration and configuration of a VIOS environment. However, it
also provides the least flexibility and cannot be used with PowerVM advanced functions, such
as Live Partition Mobility (LPM) or remote restart.
Table A-1 lists the key criteria to help you with the decision of selecting an IBM i storage
attachment method.
Table A-1 Comparing IBM i native and Virtual I/O Server attachment (criteria compared for native attachment versus VIOS attachment, including performance with NPIV)
The next sections describe the guidelines and best practices for each type of connection.
Note: For more information about the current requirements, see the following web pages:
IBM System Storage Interoperation Center (SSIC)
IBM i POWER External Storage Support Matrix Summary
Native attachment
Refer to SSIC for the native connection support for IBM i with IBM Storage Virtualize storage.
This website provides the most current information for the supported versions.
Native connection with SAN switches can be done by using the following adapters:
32 Gb PCIe3 2-port Fibre Channel (FC) adapters (Feature Code #EN1A or #EN1B
(IBM POWER9™ processor-based servers only))
16 Gb PCIe3 4-port FC adapters (Feature Code #EN1C or #EN1D (POWER9
processor-based servers only))
16 Gb PCIe3 2-port FC adapters (Feature Code #EN0A or #EN0B)
Direct native connection without SAN switches can be done by using the following adapters:
16-Gb adapters in IBM i connected to 16-Gb adapters in IBM Storage Virtualize 7.5 or
later based storage with non-NPIV target ports
4-Gb FC adapters in IBM i connected to 8-Gb adapters in IBM Storage Virtualize based
storage with non-NPIV target ports
For resiliency and performance reasons, connect IBM Storage Virtualize storage to IBM i with
multipathing that uses two or more FC adapters. Consider the following points:
You can define a maximum of 127 LUNs (up to 127 active + 127 passive paths) to a 16- or
32-Gb port in IBM i with IBM i 7.2 TR7 or later, and with IBM i 7.3 TR3 or later.
You can define a maximum of 64 LUNs (up to 64 active + 64 passive paths) to a 16- or
32-Gb port with IBM i release and TR lower than IBM i 7.2 TR7 and IBM i 7.3 TR3.
You can define a maximum of 64 LUNs (up to 64 active + 64 passive paths) to a 4- or 8-Gb
port, regardless of the IBM i level.
IBM i enables SCSI command tag queuing in the LUNs from natively connected
IBM Storage Virtualize storage. The IBM i queue depth per LUN and path with this type of
connection is 16.
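As a simple illustration of the resulting concurrency: with 100 LUNs that are presented over two active paths each, up to 100 x 2 x 16 = 3200 I/O operations can be outstanding to the storage system at the same time, which is one reason why configuring more, smaller LUNs generally improves IBM i I/O concurrency.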
VIOS attachment
The following FC adapters are supported for VIOS attachment of IBM i to IBM Storage
Virtualize storage:
32 Gb PCIe3 2-port FC adapter (Feature Code #EN1A or #EN1B (POWER9
processor-based servers only))
16 Gb PCIe3 4-port FC adapter (Feature Code #EN1C or #EN1D (POWER9
processor-based servers only))
16 Gb PCIe3 2-port FC adapter (Feature Code #EN0A or #EN0B)
8-Gb PCIe 2-port FC adapter (Feature Code #5735 or #5273)
8 Gb PCIe2 2-port FC adapter (Feature Code #EN0G or #EN0F)
8 Gb PCIe2 4-port FC adapter (Feature Code #5729)
8 Gb PCIe2 4-port FC adapter (Feature Code #EN12 or #EN0Y)
Important: For more information about the current requirements, see the following web
pages:
IBM System Storage Interoperation Center (SSIC)
IBM i POWER External Storage Support Matrix Summary
With NPIV attachment, the LUNs from the IBM Storage Virtualize storage system are directly
mapped to the IBM i server. VIOS does not see NPIV-connected LUNs; instead, it acts as an
FC pass-through.
The storage LUNs are presented to IBM i with their native device type of 2145 for
IBM Storage Virtualize based storage. NPIV attachment requires 8 Gb or later generation FC
adapter technology, and SAN switches that must be NPIV-enabled (see Figure A-8).
Observe the following rules for mapping IBM i server virtual FC (vFC) client adapters to the
physical FC ports in VIOS when implementing an NPIV connection:
Up to 64 vFC adapters can be mapped to the same physical FC adapter port in VIOS.
With VIOS 3.1 and later, this limit was increased to support mapping of up to 255 vFC
adapters to a 32-Gb physical FC adapter port.
Mapping of more than one NPIV client vFC adapter from the same IBM i system to a VIOS
physical FC adapter port is supported since IBM i 7.2 TR7 and i 7.3 TR3. However, when
PowerVM partition mobility is used, only a single vFC adapter can be mapped from the
same IBM i system to a VIOS physical FC adapter port.
The same port can be used in VIOS for NPIV mapping and connecting with VIOS virtual
Small Computer System Interface (VSCSI).
If PowerHA solutions with IBM i IASPs are implemented, different vFC adapters must be
used for attaching the IASP LUNs, and an adapter is not shared between SYSBAS and
IASP LUNs.
A maximum of 127 LUNs (up to 127 active + 127 passive paths) can be configured to a vFC
adapter with IBM i 7.2 TR7 or later, and with IBM i 7.3 TR3 or later.
IBM i enables SCSI command tag queuing for LUNs from a VIOS NPIV that is connected to
IBM Storage Virtualize storage. The IBM i queue depth per LUN and path with this type of
connection is 16.
NPIV acceleration
VIOS 3.1.2 or later strengthens FC NPIV with multiqueue support, which spreads the I/O
workload across multiple work queues. This enhancement improves performance by providing
more throughput, reduced latency, and higher input/output operations per second (IOPS).
Note: NPIV acceleration is supported by IBM i 7.2 or later, and by IBM POWER9 firmware
940 or later.
When deciding on a PowerVM VIOS storage attachment for IBM i, NPIV attachment is often
preferred over VSCSI attachment for the following reasons:
With VSCSI, an emulation of generic SCSI devices is performed by VIOS for its client
partitions, such as IBM i, which requires extra processing and adds a small delay to I/O
response times.
VSCSI provides much lower scalability in terms of the maximum supported LUNs per
virtual adapter than NPIV. It also requires more storage management, such as multipath
configuration and customization at the VIOS layer, which adds complexity.
Because of the VSCSI emulation, unique device characteristics of the storage device, such
as the device type (or in the case of tape devices, the media type) and other device attributes,
are no longer presented to the IBM i client.
VSCSI attachment is not supported for PowerHA LUN-level switching technology, which is
required for IASP HyperSwap solutions with IBM Storage Virtualize.
Similar considerations as for NPIV apply regarding the use of IBM i multipathing across two or
more VIOSs to improve resiliency and performance. However, because with VSCSI
multipathing is also implemented at the VIOS layer, the following considerations apply:
IBM i multipathing is performed with two or more VSCSI client adapters, each of them
assigned to a VSCSI server adapter in a different VIOS. With VSCSI, volumes (LUNs)
from the IBM Storage Virtualize system are not mapped directly to an IBM i host, but to the
two or more VIOS servers. These LUNs, which are detected as hard disk drives (HDDs)
on each VIOS, must be mapped as a virtual target device to the relevant VSCSI server
adapters to be used by the IBM i client.
In addition to IBM i multipathing across multiple VIOS servers, with VSCSI, multipathing is
implemented at the VIOS server layer to provide further I/O parallelism and resiliency by
using multiple physical FC adapters and SAN fabric paths from each VIOS server to its
storage.
The IBM recommended multipath driver for IBM Storage Virtualize based storage running
microcode 7.6.1 or later is the VIOS built-in AIX Path Control Module (AIXPCM) multipath
driver, which replaces the previously recommended Subsystem Device Driver Path
Control Module (SDDPCM) multipath driver.
For more information, see this IBM Support web page.
Up to 4095 LUNs can be connected per target, and up to 510 targets per port in a physical
adapter in VIOS. With IBM i 7.2 and later, a maximum of 32 disk LUNs can be attached to a
VSCSI adapter in IBM i.
With IBM i releases before 7.2, a maximum of 16 disk LUNs can be attached to a VSCSI
adapter in IBM i. The LUNs are reported in IBM i as generic SCSI disk units of type 6B22.
IBM i enables SCSI command tag queuing in the LUNs from a VIOS VSCSI adapter that is
connected to IBM Storage Virtualize storage. A LUN with this type of connection features a
queue depth of 32.
FC adapter attributes
With a VIOS VSCSI connection or NPIV connection, use the VIOS chdev command to specify
the following attributes for each SCSI I/O Controller Protocol Device (fscsi) device that
connects an IBM Storage Virtualize storage LUN to IBM i:
The attribute fc_err_recov should be set to fast_fail.
The attribute dyntrk should be set to yes.
The specified values for the two attributes specify how the VIOS FC adapter driver or VIOS
disk driver handle specific types of fabric-related failures and dynamic configuration changes.
Without setting these values for the two attributes, the way these events are handled is
different, which causes unnecessary retries or manual actions.
Note: These attributes also are set to the recommended values when applying the default
rules set that is available with VIOS 2.2.4.x or later.
Important: While working with VSCSI and NPIV, you cannot use both for the paths to the
same LUN. However, VIOS supports NPIV and VSCSI concurrently, that is, some LUNs
can be attached to the virtual worldwide port names (WWPNs) of the NPIV FC adapter. At
the same time, the VIOS also can provide access to LUNs that are mapped to virtual target
devices and exported as VSCSI devices.
One or more VIOSs can provide the pass-through function for NPIV. Also, one or more
VIOSs can host VSCSI storage. Therefore, the physical Host Bus Adapter (HBA) in the
VIOS supports NPIV and VSCSI traffic.
A best practice is to use the IBM Workload Estimator tool to estimate the needed VIOS
resources. However, as a starting point for VIOS CPU and memory sizing, see this IBM
Support web page.
Disk drives for IBM i
This section describes how to implement internal disk drives in IBM Storage Virtualize
storage or externally virtualized back-end storage for an IBM i host. These suggestions are
based on the characteristics of a typical IBM i workload, such as a relatively high write ratio, a
relatively high-access density, and a small degree of I/O skew because of the spreading of
data by IBM i storage management.
Considering these characteristics and typical IBM i customer expectations for low I/O
response times, we expect that many SAN storage configurations for IBM i will be based on
an all-flash storage configuration.
If, for less demanding workloads or for commercial reasons, a multitier storage configuration
that uses enterprise-class (tier0_flash) and high-capacity (tier1_flash) flash drives or even
enterprise HDDs (tier2_HDD) is preferred, ensure that a sufficiently large part of the disk
capacity is on flash drives. As a rule, for a multitier configuration with the typically low IBM i
I/O skew, at least 20% of the IBM i capacity should be based on the higher-tier flash storage
technology.
Even if specific parts of IBM i capacity are on flash drives, it is important that you provide
enough HDDs with high rotation speed for a hybrid configuration with flash drives and HDDs.
Preferably, use 15 K RPM HDDs of 300 GB or 600 GB capacity, along with flash technology.
IBM i transaction workload often achieves the best performance when disk capacity is used
entirely from enterprise class flash (tier0_flash) storage.
The usage of a multitier storage configuration by IBM Storage Virtualize storage is achieved
by using Easy Tier. For more information, see Implementing the IBM FlashSystem with IBM
Spectrum Virtualize Version 8.4.2, SG24-8506.
Even if you do not plan to use a multitier storage configuration or currently have no multitier
storage configuration that is installed, you can still use Easy Tier for intra-tier rebalancing. You
also can evaluate your workload with its I/O skew, which provides information about the
benefit that you might gain by adding flash technology in the future.
Compression considerations
If compression is wanted for a performance-critical IBM i workload, the preferred choice at the
IBM Storage Virtualize storage system layer is IBM FlashCore Module (FCM) hardware
compression at the disk drive level within IBM Storage Virtualize standard pools or data
reduction pools (DRPs) with fully allocated volumes. Unlike other compression technologies,
such as DRP compressed volumes or IBM Real-time Compression (RtC) at the storage
subsystem level, these configuration options do not affect performance.
If you plan to use deduplication for archival or test purposes, deduplication might be a
viable solution for saving huge amounts of storage. If the deduplication solution is planned
for a production or development environment, we recommend that you test it thoroughly
before committing.
When modeling Easy Tier, specify the lowest skew level for IBM i workload or import an I/O
skew curve from available Easy Tier reports. The steps that are taken for sizing and modeling
IBM i are shown in Figure A-9.
Figure A-9 Sizing and modeling for IBM i by using Disk Magic
The modeling helps to ensure an adequate solution sizing by providing predictions for the
modeled IBM Storage Virtualize storage resource usage, the predicted disk response time for
IBM i, and the usage and response times of workload growth.
Note: Contact your IBM representative or IBM Business Partner to discuss a performance
modeling and sizing for a planned IBM Storage Virtualize storage solution for IBM i.
Initially, IBM i unmap support, which is implemented by using the SCSI Write Same command,
was introduced with IBM i 7.2 TR8 and IBM i 7.3 TR4 for LUN initialization only, that is, for the
Add Disk Units to ASP function.
With IBM i 7.2 TR9 and IBM i 7.3 TR5, runtime support was added, which also supports
synchronous unmap for scenarios such as object deletion and journal clearance. The
runtime unmap algorithm was further enhanced with IBM i 7.3 TR7 and IBM i 7.4 TR1,
which implement asynchronous periodic free-space cleaning.
IBM Storage Virtualize 8.1.1 and later storage systems can use the unmap function to
efficiently deallocate space, such as for volume deletion, on their back-end storage by
sending SCSI unmap commands to specific supported internal solid-state drives (SSDs) and
FCMs, and selected virtualized external flash storage.
Space reclamation that is triggered by host unmap commands is supported by IBM Storage
Virtualize 8.1.2 and later for DRP thin-provisioned volumes, which can increase the free
capacity in the storage pool so that it becomes available for use by other volumes in the pool.
For more information about IBM Storage Virtualize storage SCSI unmap support, see 4.4,
“Data reduction pools best practices” on page 268, and this IBM Support web page.
Although IBM i supports a usable large-size LUN of up to 2 TB minus 1 byte with IBM Storage
Virtualize storage, the usage of only a few large-size LUNs for IBM i is not recommended for
performance reasons.
In general, the more LUNs that are available to IBM i, the better the performance for the
following reasons:
If more LUNs are attached to IBM i, storage management uses more threads and enables
better performance.
More LUNs provide a higher I/O concurrency, which reduces the likelihood of I/O queuing
and the wait time component of the disk response time, which results in lower latency of
disk I/O operations.
The sizing process helps to determine a reasonable number of LUNs that are required to
access the needed capacity while meeting performance objectives. Regarding both these
aspects and best practices, we suggest the following guidelines:
For any IBM i disk pool (ASP), define all the LUNs as the same size.
40 GB is the preferred minimum LUN size.
You should not define LUNs larger than about 200 GB.
Note: This rule is not fixed; its purpose is to ensure that enough LUNs are configured.
Selecting a larger LUN size, for example during a storage migration, should not lead to
configurations with fewer LUNs, which can have detrimental effects on performance.
A minimum of eight LUNs for each ASP is preferred for small IBM i partitions, a couple of
dozen LUNs for medium partitions, and up to a few hundreds for large partitions.
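As a worked illustration of the preceding guidelines: a 6 TB ASP that is built from equally sized 100 GB LUNs results in approximately 60 LUNs, which satisfies both the 40 - 200 GB size guideline and the LUN-count guideline for a medium-sized partition. Building the same ASP from four 1.5 TB LUNs would satisfy neither.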
When defining LUNs for IBM i, consider the following required minimum capacities for the
load source (boot disk) LUN:
With IBM i 7.1, the minimum capacity is 20 GB.
With IBM i 7.2 before TR1, the minimum capacity is 80 GB in IBM i.
With IBM i 7.2 TR1 and later, the minimum capacity is 40 GB in IBM i.
IBM Storage Virtualize dynamic volume expansion is supported for IBM i with IBM i 7.3 TR4
and later. An IBM i initial program load (IPL) is required to use the extra volume capacity.
Tip: For more information about cross-referencing IBM i disks units with IBM Storage
Virtualize LUNs by using NPIV, see this IBM Support web page.
Table A-2 Limits increased for maximum disk arms and LUN sizes
System limits IBM i 7.2 IBM i 7.3 IBM i 7.4 IBM i 7.5
Note: For more information about these limits and other limits, see this IBM
Documentation web page.
Data layout
Spreading workloads across all IBM Storage Virtualize storage components maximizes the
usage of the hardware resources in the storage subsystem. I/O activity must be balanced
between the two nodes or controllers of the IBM Storage Virtualize storage system I/O group,
which often is addressed by the alternating preferred node volume assignments at LUN
creation.
However, performance problems might arise when sharing resources because of resource
contention, especially with incorrect sizing or unanticipated workload increases.
Apart from the usage of Easy Tier on IBM Storage Virtualize for managing a multitier storage
pool, an option is available to create a separate storage pool for different storage tiers on IBM
Storage Virtualize storage and create different IBM i ASPs for each tier. IBM i applications
that have their data in an ASP of a higher storage tier experience a performance boost
compared to the ones that use an ASP with a lower storage tier.
IBM i internal data relocation methods, such as the ASP balancer hierarchical storage
management function and IBM Db2 media preference, are not available to use with
IBM Storage Virtualize flash storage.
If multiple IBM i partitions connect through the same FC port in VIOS, consider the maximum
rate of the port at 70% utilization and the sum of I/O rates and data rates of all connected
LPARs.
For sizing, you might consider the throughput that is listed in Table A-3, which shows the
throughput of a port in a specific adapter at 70% utilization.
Make sure to plan for the usage of separate FC adapters for IBM i disk and tape attachment.
This separation is recommended because of the required IBM i virtual input/output processor
(IOP) reset for tape configuration changes and for workload performance isolation.
Figure A-10 SAN switch zoning for IBM i with IBM Storage Virtualize storage
For VIOS VSCSI attachment, zone one physical port in VIOS with one or more available FC
ports from each of the two node canisters of the IBM Storage Virtualize storage I/O group. The
SVC or Storwize ports that are zoned with one VIOS port should be spread evenly between
both node canisters. A maximum of eight host paths is supported from VIOS to IBM Storage
Virtualize storage.
IBM i multipath
Multipath provides greater resiliency for SAN-attached storage, and it can improve
performance as well. IBM i supports up to eight active paths and up to eight passive paths to
each LUN. In addition to availability considerations, lab performance testing shows that two or
more paths provide performance improvements when compared to a single path.
Typically, two active paths to a LUN are a good balance of price and performance. The
scenario that is shown in Figure A-10 on page 792 results in two active and two passive paths
to each LUN for IBM i. However, you can implement more than two active paths for workloads
where high I/O rates are expected to the LUNs (a high I/O access density is expected).
It is important to understand that IBM i multipathing for a LUN is achieved by connecting the
LUN to two or more FC ports that belong to different adapters in an IBM i partition. Adding
more than one FC port from the same IBM Storage Virtualize storage node canister to a SAN
switch zone with an IBM i FC initiator port does not provide more active paths because an
IBM i FC initiator port, by design, logs in to only one target port of a node.
With IBM i native attachment, the ports for multipath must be from different physical FC
adapters in IBM i. With VIOS NPIV, the vFC adapters for multipath must be assigned to
different VIOSs for redundancy. However, if more than two active paths are used, you can use
two VIOSs and split the paths among them. With VIOS VSCSI attachment, the VSCSI
adapters for IBM i multipath must be assigned to different VIOSs.
IBM Storage Virtualize storage uses a redundant dual active controller design that
implements SCSI Asymmetric Logical Unit Access (ALUA). Some of the paths to a LUN are
presented to the host as optimized and others as non-optimized.
With an ALUA-aware host such as IBM i, the I/O traffic to and from a specific LUN normally
goes through only the optimized paths, which often are associated with a specific LUN of a
preferred node. The non-optimized paths, which often are associated with the non-preferred
node, are not actively used.
In an IBM Storage Virtualize storage topology, such as HyperSwap or IBM SAN Volume
Controller Enhanced Stretched Cluster (ESC) that implements host site awareness, the
optimized paths are not necessarily associated with a preferred node of a LUN but with the
node of the I/O group that includes the same site attributes as the host.
If the node with the optimized paths fails, the other node of the I/O group takes over the I/O
processing. With IBM i multipath, all the optimized paths to a LUN are reported as active on
IBM i, while the non-optimized paths are reported as passive. IBM i multipath employs its
load-balancing among the active paths to a LUN and starts to use the passive paths if all the
active paths failed.
Apart from a required minimum size, the load source LUN does not include any special
requirements. The FC or SCSI I/O adapter for the load source must be tagged (that is,
specified) by the user in the IBM i partition profile on the IBM Power Hardware Management
Console (HMC). When installing the IBM SLIC with disk capacity on IBM Storage Virtualize
storage, the installation prompts you to select one of the available LUNs for the load source.
IBM i mirroring
Some customers prefer to use IBM i mirroring functions for resiliency. For example, they use
IBM i mirroring between two IBM Storage Virtualize storage systems, each connected with
one VIOS.
When setting up IBM i mirroring with VIOS-connected IBM Storage Virtualize storage,
complete the following steps to add the LUNs to the mirrored ASP:
1. Add the LUNs from two virtual adapters, with each adapter connecting one to-be mirrored
half of the LUNs.
2. After mirroring is started for those LUNs, add the LUNs from another two new virtual
adapters, each adapter connecting one to-be mirrored half, and so on. This way, you
ensure that IBM i mirroring is started between the two IBM Storage Virtualize storage
systems and not among the LUNs from the same storage system.
Remote replication
The IBM Storage Virtualize family supports Metro Mirror (MM) synchronous remote
replication and Global Mirror (GM) asynchronous remote replication.
Two options are available for GM: standard GM, and Global Mirror with Change Volumes
(GMCV). GMCV allows for a flexible and configurable recovery point objective (RPO), which
allows data replication to be maintained during peak periods of bandwidth constraints and
data consistency at the remote site to be maintained, including during resynchronization.
Regarding the usage of IBM Storage Virtualize Copy Services functions, the IBM i single-level
storage architecture requires that the disk storage of an IBM i system is treated as a single
entity, that is, the scope of copying or replicating an IBM i disk space must include SYSBAS
(referred to as full system replication) or an IASP (referred to as IASP replication).
Full system replication is used for disaster recovery (DR) purposes where an IBM i standby
server is used at the DR site, as shown in Figure A-11.
Figure A-11 IBM i full system replication with IBM Storage Virtualize: the SYSBAS master volumes (primary role) on the production storage are replicated by Metro Mirror or Global Mirror to the SYSBAS auxiliary volumes (secondary role) on the DR storage
When a planned or unplanned outage occurs for the IBM i production server, the replicated
SYSBAS volumes take on the primary role on IBM Storage Virtualize and become accessible
to the IBM i standby host, and the IBM i standby server can then be started (undergo an IPL)
from them.
IASP-based replication for IBM i is used for a high availability (HA) solution where an IBM i
production and an IBM i backup node are configured in an IBM i cluster and the IASP that is
replicated by IBM Storage Virtualize remote replication is switchable between the two cluster
nodes, as shown in Figure A-12.
Figure A-12 IBM i IASP replication with IBM Storage Virtualize: the production and backup nodes form an IBM i cluster (device domain and recovery domain), each node has its own SYSBAS, and the production IASP is replicated by Metro Mirror or Global Mirror to the IASP on the DR/HA storage
In this scenario, the IBM i production system and the IBM i backup system each have their
own non-replicated SYSBAS volumes and only the IASP volumes are replicated. This
solution requires IBM PowerHA SystemMirror® for i Enterprise Edition (5770-HAS *BASE
and option 1) to manage the IBM i cluster node switch and failovers and the IBM Storage
Virtualize storage remote replication switching.
For more information about IBM i HA solutions with IBM Storage Virtualize Copy Services,
see PowerHA SystemMirror for IBM i Cookbook, SG24-7994.
The sizing of the required replication link bandwidth for MM or GM must be based on the peak
write data rate of the IBM i workload to avoid affecting production performance. For more
information, see 6.5.3, “Remote copy network planning” on page 415.
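As a simple sizing illustration: if the performance data for the IBM i workload shows a sustained peak write data rate of 400 MBps, the replication link must sustain at least 400 MBps (about 3.2 Gbps) plus protocol overhead. Sizing the link for the average write rate instead of the peak typically elongates IBM i write response times during peak periods with MM, or increases the RPO with GMCV.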
For environments that use remote replication, a minimum of two FC ports is suggested on
each IBM Storage Virtualize storage node that is used for remote mirroring. The remaining
ports on the node should not have any visibility to any other IBM Storage Virtualize cluster.
Following these zoning guidelines helps you avoid configuration-related performance issues.
FlashCopy
When planning for FlashCopy with IBM i, make sure that enough disk drives are available to
the FlashCopy target LUNs to maintain a good performance of the IBM i production workload
while FlashCopy relationships are active. This guideline is valid for FlashCopy with
background copy and without background copy.
When FlashCopy is used with thin-provisioned target LUNs, make sure that sufficient capacity
is available in the storage pool to be dynamically allocated when needed for the copy-on-write
(CoW) operations. The required thin target LUN capacity depends on the amount of write
operations to the source and target LUNs, the locality of the writes, and the duration of the
FlashCopy relationship.
FlashCopy cold
The following considerations apply to FlashCopy cold:
All memory is flushed to disk.
The source IASP must be varied off before performing a FlashCopy.
This method is the only method to ensure that all writes are sent out to disk and included.
FlashCopy warm
The following considerations apply to FlashCopy warm:
No memory is flushed to disk.
Writes in memory are excluded from the FlashCopy target.
Zero disruption to IBM i source system.
FlashCopy quiesced
IBM i provides a quiesce function that can suspend database transactions and database and
Integrated File System (IFS) file change operations for the system and configured basic ASPs
or IASPs.
IBM Lab Services PowerHA Tools for IBM i
PowerHA Tools for IBM i are a set of tools that are developed by IBM Lab Services that focus
on high availability and disaster recovery (HADR) and offline backup solutions by using
external storage and PowerHA for IBM i.
These tools extend the functions that are included in the base PowerHA code, which
contributes to the automation and usability of this kind of solution.
They are an IBM Lab Services asset, and they are delivered with the corresponding
implementation, support, and training services.
Some of these tools are enabled to work with IBM Storage Virtualize products.
These tools are useful both for the implementation and administration of the solution and for
the automation of switchover tasks.
The solution has a control LPAR (IBM i) that communicates with the storage units, HMCs, and
IBM i LPARs (primary and secondary nodes).
The control LPAR is used to monitor, manage, and switch replication by using the commands
and menus that are provided by this toolkit.
For redundancy reasons, a best practice is to have one dedicated control LPAR per site. The
synchronization between these LPARs is done through the PowerHA clustering functions.
To ensure that unnecessary switchover is never performed, for example, in situations when
other corrective actions are more appropriate, automatic switchovers are not allowed, and the
switchover task must be initiated manually by running a toolkit command.
Figure A-13 IBM Lab Services PowerHA Tools for IBM i: Full System Replication Manager
Note: For more information about the Full System Replication toolkit, see PowerHA Tools
for IBM i - Full System Replication.
The Full System FlashCopy (FSFC) Manager tool provides access to the HMCs to manage
IBM i partitions, to the IBM i function that pauses database activity on a commit boundary, and
to the IBM Storage Virtualize FlashCopy feature to create a copy of the production LPAR from
which backups, such as full system saves, can be taken in a restricted state, all while
minimizing the impact on users. The users experience a pause in database activity that is
generally less than 30 seconds.
FSFC Manager for IBM i copies the whole system ASP so that you can implement a
FlashCopy-based offline backup solution for an IBM i system, which avoids the need to
migrate the customer environment to an IASP environment.
The PowerHA FSFC for IBM i Manager operation has the following steps:
1. On the production LPAR, the database operations are paused on a transaction boundary,
and the information in main memory is flushed to disks.
2. The FlashCopy relationships are started in the external storage unit.
3. Database activity is resumed on the production LPAR.
4. An IPL is performed in the backup LPAR, and then the configured backups are started.
5. If necessary, the Backup Recovery and Media Services (BRMS) information is updated in
the production partition by transferring it from the backup partition.
Figure A-14 shows the Full System FlashCopy Manager.
Figure A-14 IBM Lab Services PowerHA Tools for IBM i: Full System FlashCopy
Note: For more information about the FSFC toolkit, see PowerHA Tools for IBM i - Full
System FlashCopy.
IASP Manager
PowerHA Tools IASP Manager is a product that is designed for IBM i customers that use
PowerHA, IASP, and external storage solutions.
Its main objective is to enhance the automation, monitoring, management, testing, and
customization capabilities of these kinds of environments, which complements the PowerHA
functions with command-line interface (CLI) commands and automated scripts.
With IBM Storage Virtualize, the PowerHA Tools IASP Manager-FlashCopy toolkit is available.
It provides functions to assist with the automation and management of FlashCopy through a
set of commands that you can use to create a point-in-time copy of an IASP. The toolkit
completely automates the FlashCopy process by running the following tasks:
Vary off the production IASP or quiesce its database activity.
Start the FlashCopy relationships on the external storage box.
Vary on the production IASP or resume its database activity.
Connect or disconnect the host connections for the FlashCopy target LUNs.
Vary on the IASP on FlashCopy node.
Integrate a customized backup program on the FlashCopy target.
Note: For more information about the IASP Manager-FlashCopy toolkit, see PowerHA
Tools for IBM i - IASP Manager.
Many of these tools were created in response to customers' automation requirements. The
IBM Lab Services team continues enhancing existing tools and adding new ones regularly.
These tools are useful for many implementations of solutions that are based on PowerHA.
When specifically referring to implementations of solutions for IBM i with IBM Storage
Virtualize, we can highlight the following:
IBM i Independent ASP (IASP) migration and management
PowerHA-managed Geographic Mirroring
PowerHA-managed IBM Storage Virtualize based FlashCopy
PowerHA Tools for IBM i IASP Manager managed implementations
Some of the benefits that Smart Assist can provide through its various commands are as
follows:
Extra functions to make the setup and installation phases of the environment easier.
Programming interfaces to help monitor the environment.
Command utilities to help with the daily administration of the environment:
– IASP Management
– PowerHA Cluster Management
– Admin Domain Management
– PowerHA Environments
– SVC Based Management
– IASP Manager Environments
Note: For more information about Smart Assist for PowerHA on IBM i, see Smart Assist for
PowerHA on IBM i.
HyperSwap
IBM Storage Virtualize HyperSwap as an active-active remote replication solution is
supported for IBM i full system replication with IBM i 7.2 TR3 or later. It is supported for native
and VIOS NPIV attachment.
HyperSwap for IBM i IASP replication is supported by IBM i 7.2 TR5 or later and IBM i 7.3
TR1 or later. With this solution, you must install IBM PowerHA SystemMirror for i Standard
Edition (5770-HAS *BASE and option 2), which enables LUN level switching to site 2. It is
supported for native and VIOS NPIV attachment.
HyperSwap relies on the SCSI ALUA-aware IBM i host multipath driver to manage the paths
to the local and remote IBM Storage Virtualize storage systems, which are logically
configured as a single clustered system.
From a SAN switch zoning perspective, HyperSwap requires that the IBM i host is zoned with
both IBM Storage Virtualize nodes of the I/O group on each site. For a balanced
configuration, the SAN switches from a dual-fabric configuration must be used evenly.
Figure A-15 shows an example of the SAN fabric connections for IBM i HyperSwap with VIOS
NPIV attachment. This configuration example results in four active paths and 12 passive
paths that are presented on IBM i for each HyperSwap LUN.
Figure A-15 SAN fabric connections for IBM i HyperSwap with VIOS NPIV attachment
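On the storage side, this configuration assumes that the clustered system topology is set to HyperSwap, that the host objects carry a site attribute, and that each volume has a copy in a pool at each site. The following sketch illustrates those pieces with hypothetical pool, host, volume, and WWPN values; it is not a complete setup procedure.

   # Assumes the system topology was already changed, for example with:
   # chsystem -topology hyperswap
   #
   # Assign the IBM i host object to site 1 so that the ALUA path states
   # are presented correctly. Host name and WWPN are hypothetical.
   mkhost -name IBMI_PROD -fcwwpn C0507609A8D50010 -site site1

   # Create a HyperSwap volume with one copy in a pool at each site.
   mkvolume -pool Pool_Site1:Pool_Site2 -size 200 -unit gb -name IBMI_HS_VOL01

   # Map the volume to the IBM i host.
   mkvdiskhostmap -host IBMI_PROD IBMI_HS_VOL01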
Next, we briefly describe some HA scenarios that use HyperSwap for IBM i.
After the outage of an I/O group at site 1 occurs, the I/O rate automatically transfers to the
IBM Storage Virtualize nodes at site 2. The IBM i workload keeps running, and no relevant
messages exist in the IBM i message queues.
When the outage at site 1 is resolved, the IBM i I/O rate automatically transfers back to the nodes at site 1. The IBM i workload keeps running without interruption.
If the entire site 1 fails and IBM i must be restarted at site 2, the failback is manual: after the site 1 outage is resolved, we power down IBM i at site 2, unmap the IBM i LUNs from the host at site 2, and map the LUNs to the host at site 1. We then perform an IPL of IBM i at site 1 and resume the workload, and the I/O rate transfers to the IBM Storage Virtualize storage nodes at site 1.
For more information about the PowerHA for IBM i setup, see IBM PowerHA SystemMirror for
i: Preparation (Volume 1 of 4), SG24-8400.
In this scenario, ensure that all IBM i LUNs (not only the IASP LUNs) are HyperSwap
volumes.
If a disaster occurs at site 1, PowerHA automatically switches the IASP to the system at site
2, and the workload can be resumed at site 2.
After the failure at site 1 is fixed, use PowerHA to switch the IASP back to site 1 and resume
the workload at this site.
In this scenario, we combine LPM with HyperSwap to transfer the workload onto site 2 during
a planned outage of site 1. This combination requires VIOS NPIV attachment and all IBM i
LUNs configured as HyperSwap LUNs.
For more information about LPM and its requirements, see IBM PowerVM Virtualization
Introduction and Configuration, SG24-7940.
To use LPM, you must define the IBM i host in IBM Storage Virtualize with the WWPNs of the
second port of the vFC adapters. As a best practice, create a separate host object definition
for the secondary ports to specify site 2 for this host object. Then, enable the I/O rate to be
transferred to the nodes at site 2 after migrating the IBM i partition with LPM.
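As a sketch of that best practice, the following command creates a separate host object that holds the WWPNs of the secondary vFC ports and assigns it to site 2; the host name and WWPNs are hypothetical.

   # Host object for the second (inactive) port of each client vFC
   # adapter, placed at site 2 so that I/O follows the partition after
   # the LPM migration.
   mkhost -name IBMI_PROD_LPM -fcwwpn C0507609A8D50011:C0507609A8D50013 -site site2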
After the outage is complete, you can use LPM again to transfer the IBM i partition back to
site 1. After the migration, the I/O rate automatically moves to the nodes at site 1.
Important: LPM now supports multiple client vFC adapter ports that are mapped to a single physical FC port. Previously, each client vFC had to be mapped to a separate physical port when LPM with FC NPIV was used. That restriction was removed with VIOS 3.1.2.10 or later and IBM i 7.2 or later. Therefore, the same physical port can be double-mapped to the same IBM i client partition, which allows for better adapter usage.
SAN Volume Controller stretched cluster
SVC is a hardware and software storage solution that implements IBM Storage Virtualize.
SVC appliances map physical volumes in the storage device to virtualized volumes, which
makes them visible to host systems (for example, IBM i). SVC also provides Copy Services
functions that can be used to improve availability and support DR, including MM, GM, and
FlashCopy.
Therefore, the IBM PowerHA SystemMirror for IBM i interface is compatible with SVC. After
the basic SVC environment is configured, PowerHA can create a copy session with the
volumes.
The usage of PowerHA with SVC management creates an automated HADR solution with
minimal extra configurations. PowerHA and SVC interfaces are compatible with hardware that
is running IBM Storage Virtualize and IBM Storwize series.
A scenario that uses IBM i full system replication with SVC volume mirroring is shown in Figure A-16.
Figure A-16 Full system replication that uses SAN Volume Controller volume mirroring
The scenario that is shown in Figure A-16 shows an IBM i production system at site 1, a
prepared IBM i backup system at site 2 that is powered off, and a third site that is the active
quorum.
Two nodes of SVC are in a stretched cluster topology that is called a split-cluster.
Simultaneous IBM i access to both copies must be prevented, that is, the IBM i backup
system must be powered off when the IBM i production system is active, and vice versa.
In our example, after a failure of site 1 (including a failure of the IBM i production system and
the storage at site 1), the IBM i LUNs are still available because of the two data copies (the
second at site 2).
An abnormal IPL is performed on the IBM i backup system. After the IPL completes, we can resume the workload at site 2.
After the outage of site 1 is resolved, we power down the IBM i backup system at site 2, and the resynchronization between both copies is incremental and is started by the SVC automatically (volume mirroring operates below the cache and below Copy Services). Then, we restart the IBM i production workload at site 1.
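The two data copies in this scenario are standard volume mirroring copies. As a hedged example, the following sketch shows how a second copy might be added to an existing IBM i volume and how the synchronization progress can be checked; the volume and pool names are hypothetical.

   # Add a second, mirrored copy of an existing IBM i volume in a pool
   # that is backed by the site 2 storage.
   addvdiskcopy -mdiskgrp Pool_Site2 IBMI_SYSBAS_VOL01

   # Monitor the initial synchronization and any later incremental
   # resynchronization of the mirrored copies.
   lsvdisksyncprogress IBMI_SYSBAS_VOL01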
Note: For this example, HA testing and configuration changes are more challenging than
with remote copy. For example, manual assignment is needed for the preferred node to
enable local reads. Therefore, ESC, which was introduced with SVC 7.2, adds site awareness (reads are always local) and DR capability if simultaneous site and active quorum failures occur.
LUN-level switching
This solution uses a single copy of an IASP group that can be switched between two IBM i
systems. Likewise, LUN-level switching is supported for NPIV attachment and native
attachment for storage that is based on IBM Storage Virtualize or IBM Storwize series.
LUN-level switching also is supported for SVC. This solution is suited to heterogeneous environments where an SVC stretched cluster is used as the basis for a cross-platform, two-site HA solution.
LUN-level switching can be used alone or together with MM or GM in a 3-node cluster. In a
3-node cluster, LUN-level switching provides local HA. If the whole local site goes down, MM or GM provides a DR option to a remote site.
Note: LUN-level switching plus MM or GM (IASP) in the 3-site solutions is not available
for IBM Storage Virtualize at the time of writing.
If you want to add LUN-level switching to MM or GM, you do not need to create the cluster
or IASP, or change the cluster administrative domain. From the LUN-level switching perspective, you must create only the IASP device description on the backup system.
The LUN-level switching IBM PowerHA SystemMirror for IBM i editions are listed in Table A-4.
Table A-4   IBM PowerHA SystemMirror for i editions

Function    Express Edition    Standard Edition    Enterprise Edition
MM          No                 No                  Yes
GM          No                 No                  Yes
Figure A-17 IBM PowerHA System Mirror for i LUN-level switching with SAN Volume Controller stretched cluster
The scenario that uses IBM PowerHA System Mirror for i LUN-level switching with an SVC
stretched cluster provides the following benefits over IBM i full system replication with an SVC
stretched cluster:
High degree of automation for planned and unplanned site switches and failovers
Shorter recovery times by using IASP
Reduced mirroring bandwidth requirements by using IASP (Temporary writes in SYSBAS,
such as for index builds, are not mirrored.)
As shown in Figure A-17, availability is achieved through the inherently active-active architecture of SVC with volume mirroring.
During a failure, the SVC nodes and associated mirror copy of the data remain online and
available to service all host I/O. The two data copies are placed in different managed disk
(MDisk) groups or IBM Storage Virtualize storage systems. The resynchronization between
both copies of IASP is incremental. Mirrored volumes feature the same functions and
behavior as a standard volume.
Note: Because HyperSwap was introduced with IBM Storage Virtualize 7.5 on Storwize
and SVC, the scenario that uses the topology of HyperSwap with SVC also is valid for
IBM i.
Even with HyperSwap, we can use consistency groups (CGs) that are enabled by using the
IBM i multipath driver, but not in stretched cluster scenarios. A remote mirroring license is
required for the usage of HyperSwap with IBM i.
For more information about limits and restrictions for SVC, see this IBM Support web page.
When one of the systems in an IBM Db2® Mirror for i configuration is not available, Db2 Mirror
tracks all update, change, and delete operations to the database table and all other
mirror-eligible objects. When the pair is reconnected, all changes are synchronized between
the systems. This process includes databases that are in an IASP or as part of the base
system storage.
Db2 Mirror is compatible with IASPs and uses IASPs for IFS support within the Db2 Mirror
configuration. For non-IFS objects, IASPs can be used, but are not required.
Also, Db2 Mirror supports applications that use traditional record-level access or SQL-based
database access. Support for IFS and IFS journals is accomplished through deployment into
an IASP, which can be configured as a switchable LUN, or in a mirrored pair of IASPs through
storage replication.
This solution requires a POWER8 processor-based server or later and IBM i 7.4 or later. For
more information about software requirements for Db2 Mirror, see this IBM Documentation
web page.
DR can be achieved by using various options, such as the IBM PowerHA SystemMirror for i
Enterprise Edition, full system replication, or logical replication.
Important: Db2 Mirror local continuous availability can be combined with HADR replication
technologies. Consider the following points:
Remote replication for DR can be implemented by storage-based replication, that is, by
using IBM Storage Virtualize Copy Services software.
Any IFS IASP must remain switchable between both local Db2 Mirror nodes by
choosing a DR topology that is supported by IBM PowerHA SystemMirror for IBM i.
Any DB IASP is available on both local nodes (no switch between local nodes).
A DB IASP is not required for local Db2 Mirror database replication, but might be
preferred for implementing a remote replication solution with shorter recovery times
compared to SYSBAS replication.
For a complete business continuity solution at the DR site, a remote Db2 Mirror node
pair can be configured for a 4-node Db2 Mirror PowerHA cluster configuration. IFS
IASPs and DB IASPs must be registered with the remote Db2 Mirror pair (by using the
SHADOW option for the DB IASP to maintain its Db2 Mirror configuration data, such as
default inclusion state and Remote Code Load (RCL)).
For more information, see IBM Db2 Mirror for i Getting Started, REDP-5575.
Setup copy mode
Db2 Mirror is initially configured on a single partition that is called the setup source node. During the setup and configuration process, the setup source node is cloned to create the second node of the Db2 Mirror pair, which is called the setup copy node. The setup copy node is configured and initialized automatically by Db2 Mirror during its first IPL. For more information about these nodes, the setup process, and configuration, see this IBM Documentation web page.
The Db2 Mirror configuration and setup process supports external and internal storage.
External storage systems are used during the cloning process. IBM storage systems are recommended over non-IBM external storage because Db2 Mirror automates the cloning process for the IBM Storage Virtualize family.
The cloning technologies that are used for IBM storage systems are FlashCopy (cold and
warm) and remote copy. FlashCopy is used when both Db2 Mirror nodes connect to the same
IBM Storage Virtualize storage system. Cold cloning requires that the setup source node is
shut down during the cloning portion of the setup process. A warm clone allows the setup
source node to remain active during the entire Db2 Mirror setup and configuration process.
Remote copy is used when the Db2 Mirror nodes are connected to different IBM Storage
Virtualize storage systems. However, a manual copy also is available. For more information,
see this IBM Documentation web page.
Note: Volume mirroring that is supported in IBM FlashSystem 9200 and SVC is a valid
cloning method for Db2 Mirror in the manual copy category. It is not automated as it is when you use FlashCopy, MM, or GM.
Note: For more information about creating an SSH key pair, see this IBM Documentation
web page. After an SSH key pair is created, attach the SSH public key to the
IBM Storage Virtualize storage system. The corresponding private key file must be
uploaded to the managing node so that it can be used during the Db2 Mirror setup
process.
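As a hedged sketch of that procedure, the key pair can be generated on the managing node and the public key attached to a dedicated Storage Virtualize user. The user name, user group, and file paths are hypothetical, and the public key file must first be copied to the storage system (for example, with scp).

   # On the managing node: generate an SSH key pair for Db2 Mirror.
   ssh-keygen -t rsa -b 2048 -N "" -f ~/.ssh/db2mirror_id_rsa

   # On the IBM Storage Virtualize CLI: create a user for Db2 Mirror and
   # attach the previously copied public key file to it.
   mkuser -name db2mirror -usergrp Administrator -keyfile /tmp/db2mirror_id_rsa.pub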
If you plan to perform the remote copy during a planned outage window, you must ensure that the bandwidth between the storage systems is sufficient to complete the remote copy during that period. The Db2 Mirror cloning process cannot pause the cloning and resume it later. Therefore, you must plan for enough time for the remote copy to complete.
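As a rough, hypothetical sizing example (the capacity, link speed, and efficiency are assumptions, not measurements): cloning 10 TB of SYSBAS and IASP volumes over a replication link that sustains an effective 500 MBps takes approximately 10,000,000 MB / 500 MBps = 20,000 seconds, or about 5.5 hours, and that entire time must fit within the planned outage window.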
Important: For IBM Storage Virtualize, the Copy Services partnership between storage
systems must be manually created before Db2 Mirror is configured.
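A minimal sketch of creating such a partnership over Fibre Channel follows; the bandwidth, background copy rate, and remote system name are hypothetical, and an equivalent command must be run on the remote system so that the partnership becomes fully configured.

   # On the local system: create the FC partnership with the remote
   # system (values are examples only).
   mkfcpartnership -linkbandwidthmbits 4000 -backgroundcopyrate 50 REMOTE_FS5045

   # Run the matching command on the remote system, then verify the
   # partnership state with lspartnership.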
Several options are described in this section as examples with IBM Storage Virtualize storage
systems. A specific implementation depends on your business resilience requirement.
Note: The Db2 Mirror configuration and setup process supports SVC topologies, such as
ESC and HyperSwap.
By using one storage system, you can take advantage of FlashCopy to set up your
configuration rapidly. This solution might be considered for a DR strategy to provide storage
resiliency.
As shown in Figure A-18, two IBM Power servers are used (at least one RoCE adapter per server). However, you can reduce the cost of this scenario, at the expense of resiliency, by implementing Db2 Mirror across two IBM i LPARs on the same IBM Power server. In this example, SYSBAS is cloned, and an IASP also can be added by using another set of volumes.
Figure A-18 Db2 Mirror environment with one IBM Storage Virtualize storage system
Db2 Mirror environment with two IBM Storage Virtualize storage systems
The usage of two IBM Storage Virtualize storage systems provides further redundancy by
helping to ensure that the active node remains running and available during a storage outage.
In this example, two IBM Power servers and IBM Storage Virtualize storage systems are
used. Also, remote copy is used to set up Db2 Mirror.
As shown in Figure A-19, the set of volumes for SYSBAS and the set of volumes for IASP are
replicated. GM also can be used.
Figure A-19 Db2 Mirror environment with two IBM Storage Virtualize storage systems
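As a hedged sketch of the replication behind Figure A-19, the following commands create a consistency group and MM relationships for the SYSBAS and IASP volumes; the volume, group, and remote system names are hypothetical, and Db2 Mirror normally drives this when it automates the cloning.

   # Create a consistency group on the partnership with the remote
   # system (hypothetical names).
   mkrcconsistgrp -cluster REMOTE_FS5045 -name DB2M_CG

   # One relationship per volume; add -global to use GM instead of MM.
   mkrcrelationship -master SYSBAS_VOL01 -aux SYSBAS_VOL01_AUX -cluster REMOTE_FS5045 -consistgrp DB2M_CG
   mkrcrelationship -master IASP_VOL01 -aux IASP_VOL01_AUX -cluster REMOTE_FS5045 -consistgrp DB2M_CG

   # Start the initial copy for the whole group.
   startrcconsistgrp DB2M_CG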
The communication between the continuous availability at site 1 and the DR at site 2 can be
achieved by using technologies such as IBM PowerHA SystemMirror for use with MM or GM
with IASPs, full system replication, and logical replication from a third-party vendor.
A topology with multiple IBM Storage Virtualize storage systems and multiple IBM Power
servers is shown in Figure A-20.
Full system replication is fully supported. If you are not using IASP, this type of replication can
be done for IBM i at the IBM Storage Virtualize storage level.
At site 1, the IBM i systems are active because full system replication is used. At site 2, the IBM i systems are powered off, and the replication remains active across the sites. Two copies are kept at the DR location because if one side fails, the other side must continue replicating. With only three replicating nodes, you cannot predict which side will fail, and you might be left without a valid copy of the storage data to switch to.
Related publications
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this book.
IBM Redbooks
The following IBM Redbooks publications provide additional information about the topics in this
document. Note that some publications referenced in this list might be available in softcopy
only.
Implementation Guide for IBM Storage FlashSystem and IBM SAN Volume Controller:
Updated for IBM Storage Virtualize Version 8.6, SG24-8542
IBM Storage Virtualize and VMware: Integrations, Implementation and Best Practices,
SG24-8549
Policy-Based Replication with IBM Storage FlashSystem, IBM SAN Volume Controller and
IBM Storage Virtualize, REDP-5704
IBM Spectrum Virtualize 3-Site Replication, SG24-8504
Introduction and Implementation of Data Reduction Pools and Deduplication, SG24-8430
IBM Storage as a Service (STaaS) Offering Guide, REDP-5644
IBM FlashSystem Safeguarded Copy Implementation Guide, REDP-5654
Automate and Orchestrate Your IBM FlashSystem Hybrid Cloud with Red Hat Ansible,
REDP-5598
IBM Spectrum Virtualize and SAN Volume Controller Enhanced Stretched Cluster with
VMware, SG24-8211
IBM Storwize V7000, Spectrum Virtualize, HyperSwap, and VMware Implementation,
SG24-8317
IBM Storage Virtualize, IBM Storage FlashSystem, and IBM SAN Volume Controller
Security Feature Checklist, REDP-5717
IBM Spectrum Virtualize HyperSwap SAN Implementation and Design Best Practices,
REDP-5597
IBM SAN Volume Controller Stretched Cluster with PowerVM and PowerHA, SG24-8142
Implementing IBM FlashSystem 900, SG24-8271
You can search for, view, download or order these documents and other Redbooks,
Redpapers, Web Docs, draft and additional materials, at the following website:
ibm.com/redbooks
Online resources
These websites are also relevant as further information sources:
IBM Redbooks Storage videos
https://www.redbooks.ibm.com/feature/storagevideos