
Data Infrastructure Management
Insights and Strategies

Greg Schulz

AN AUERBACH BOOK
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2019 by Greg Schulz


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper

International Standard Book Number-13: 978-1-138-48642-3 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to
publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials
or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material repro-
duced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming,
and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have
been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identifica-
tion and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
Contents

Preface ix
Who Should Read This Book x
How This Book Is Organized x
Acknowledgments xi
About the Author xiii

Part One
Applications, Data, and IT Data Infrastructures 1

Chapter 1 IT Data Infrastructure Fundamentals 3


1.1 Getting Started 4
1.2 What’s the Buzz in and around IT Data Infrastructures? 7
1.2.1 Data Infrastructures—How Server Storage I/O Resources
Are Used 11
1.2.2 Why Data Infrastructures Are Important (Demand Drivers) 16
1.2.3 Data Value 17
1.3 Data Infrastructures Past, Present, and Future 18
1.3.1 Where Are We Today? (Balancing Legacy with Emerging) 18
1.3.2 Where Are We Going? (Planning, Lessons Learned) 20
1.4 Data Infrastructure Management Tradecraft 22
1.5 Data Infrastructure Terminology (Context Matters) 23
1.6 Common Questions and Tips 25
1.7 Strategies 26
1.8 Chapter Summary 27

Chapter 2 Application and IT Environments 29


2.1 Getting Started 29
2.1.1 Context for the Chapter 30


2.2 Everything Is Not the Same with Data Infrastructures 30


2.2.1 Various Types of Environments (Big and Small) 31
2.2.2 Gaining Data and Application Insight 32
2.2.3 Various Types of Applications 33
2.2.4 Various Types of Data 38
2.3 Common Applications Characteristics 49
2.3.1 Performance and Activity (How Resources Get Used) 51
2.3.2 Availability (Accessibility, Durability, Consistency) 53
2.3.3 Capacity and Space (What Gets Consumed and Occupied) 54
2.3.4 Economics (People, Budgets, Energy, and Other Constraints) 55
2.4 Where Applications and Data Get Processed and Reside 56
2.5 Application Data and Data Infrastructure Strategies 57
2.6 Common Questions and Tips 57
2.7 Chapter Summary 61

Part Two
Data Infrastructure Management 63

Chapter 3 Data Infrastructure Management 65


What You Will Learn in This Chapter 65
3.1 Getting Started 65
3.2 Data Infrastructure Management and Tools 66
3.3 Data Infrastructure Habitats and Facilities 69
3.4 Data Infrastructure Management 70
3.5 Troubleshooting, Problem Solving, Remediation, and Repairs 75
3.6 Common Questions and Tips 85
3.7 Chapter Summary 86

Chapter 4 Data Infrastructure Availability, Data Protection, Security,


and Strategy 87
What You Will Learn in This Chapter 87
4.1 Getting Started 87
4.2 Data Protection Fundamentals 88
4.3 Availability, Data Protection, and Security 89
4.4 Availability (Resiliency and Data Protection) Services 90
4.5 Revisiting 4 3 2 1—The Golden Rule of Data Protection 91
4.6 Availability, FTT, and FTM Fundamentals 94
4.7 Common Availability Characteristics and Functionalities 95
4.8 Enabling Availability, Resiliency, Accessibility, and RTO 98
4.9 Reliability, Availability, and Serviceability (RAS) 100
4.10 Enabling RPO (Archive, Backup, CDP, Snapshots, Versions) 103

4.11 Point-in-Time Protection for Different Points of Interest 107


4.12 Snapshots, Consistency, and Checkpoints 109
4.13 Data Infrastructure Security (Logical and Physical) 111
4.13.1 Data Infrastructure Security Implementation 112
4.13.2 Physical Security and Facilities 113
4.13.3 Logical and Software-Defined Security 113
4.13.4 Encryption Codes and Key Management 115
4.13.5 Identity Access Management and Control 116
4.13.6 General Data Infrastructure Security–Related Topics 117
4.14 Common Questions and Tips 118
4.15 Chapter Summary 122

Chapter 5 Data Infrastructure Metrics and Management 123


What You Will Learn in This Chapter 123
5.1 Getting Started 123
5.2 Avoid Flying Blind—Having Situational Awareness 124
5.2.1 Metrics That Matter (and Where to Get Them) 126
5.3 Data Infrastructure Decision Making 132
5.3.1 Comparing Data Infrastructure Components and Services 135
5.3.2 Analysis, Benchmark, Comparison, Simulation, and Tests 137
5.4 Data Infrastructure, Strategy, and Design Considerations 142
5.5 Common Questions and Tips 144
5.6 Chapter Summary 145

Part Three
Enabling Data Infrastructures 147

Chapter 6 Data Infrastructure Deployment Considerations: Part I 149


What You Will Learn in This Chapter 149
6.1 Getting Started 149
6.2 Applications, Tips, and Learning Experiences 150
6.3 Software-Defined, Virtual, Containers, and Clouds 151
6.3.1 Clouds: Public, Private, and Hybrid 152
6.3.2 Public Cloud Services 156
6.3.3 Private and Hybrid Cloud Solutions 159
6.4 Docker, Containers, and Microservices 162
6.5 Workspace and Virtual Desktop Infrastructure 167
6.6 Data Infrastructure Migration 170
6.7 Common Questions and Tips 171
6.8 Chapter Summary 172

Chapter 7 Data Infrastructure Deployment Considerations: Part II 173


What You Will Learn in This Chapter 173
7.1 Getting Started 173
7.2 Microsoft Azure, Hyper-V, Windows, and Other Tools 173
7.3 VMware vSphere, vSAN, NSX, and Cloud Foundation 177
7.4 Data Databases: Little Data SQL and NoSQL 181
7.5 Big Data, Data Ponds, Pools, and Bulk-Content Data Stores 190
7.6 Legacy vs. Converged vs. Hyper-Converged vs. Cloud
and Containers 195
7.7 Common Questions and Tips 198
7.8 Chapter Summary 199

Chapter 8 Data Infrastructure Futures, Wrap-up, and Summary 201


What You Will Learn in This Chapter 201
8.1 Getting Started on the Wrap-up 201
8.2 People, Process, and Best Practices 203
8.2.1 Skills Development 204
8.3 Emerging Topics, Trends, and Predictions 205
8.4 Chapter and Book Summary 209

Appendix A: Companion Websites and Where to Learn More 211

Glossary 213

Index 235
Preface

This book follows from my previous books, Software-Defined Data Infrastructure Essentials:
Cloud, Converged, and Virtual Fundamental Server Storage I/O Tradecraft * (aka “The Blue
Book”), Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures† (aka “The
Red Book”), The Green and Virtual Data Center ‡ (aka “The Green Book”), and Cloud and
Virtual Data Storage Networking § (aka “The Yellow, or Gold, Book”).
Data Infrastructure Management: Insights and Strategies provides fundamental coverage of
physical, cloud, converged, and virtual server storage I/O networking technologies, trends,
tools, techniques, and tradecraft skills. Software-defined data centers (SDDC), software data
infrastructures (SDI), software-defined data infrastructures (SDDI), and traditional data infra-
structures support business applications, including components such as servers, storage, I/O
networking, hardware, software, services, and best practices, among other management tools.
Spanning cloud, virtual, container, converged (and hyper-converged), as well as legacy and
hybrid systems, data infrastructures exist to protect, preserve, and serve data and information.
Although there are plenty of new things, sometimes those new things get used in old ways,
and sometimes old things can get used in new ways. As you have probably heard before, the
one thing that is constant is change; yet even as things or technologies change, the ways they
get used often remain the same. A not-so-bold prophecy is that next year will see even more
new things, not to mention old things being used in new ways.
For example, many technology changes or enhancements have occurred from the time I
started writing this book until its completion. There will be more from the time this goes to the
publisher for production, then until its release and you read it in print or electronically. That
is where my companion website, www.storageio.com, along with my blog, www.storageioblog.com,
and Twitter @StorageIO come into play. There you can further expand your tradecraft, seeing
what’s current, new, and emerging, along with related companion content to this book.

* Schulz, G. (2017). Software-Defined Data Infrastructure Essentials: Cloud, Converged, and Virtual Fundamental Server Storage I/O Tradecraft. 1st Edition. Taylor & Francis Group/CRC Press.
† Schulz, G. (2004). Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures. 1st Edition. Elsevier/Digital Press.
‡ Schulz, G. (2009). The Green and Virtual Data Center. 1st Edition. Taylor & Francis Group/CRC Press.
§ Schulz, G. (2011). Cloud and Virtual Data Storage Networking. 1st Edition. Taylor & Francis Group/CRC Press.

Who Should Read This Book


Data Infrastructure Management is for people who are currently involved with or looking to
expand their knowledge and tradecraft skills (experience) of data infrastructures, along with
associated topics. Software-defined data centers (SDDC), software data infrastructures (SDI),
software-defined data infrastructures (SDDI), and traditional data infrastructures are made
up of software, hardware, services, and best practices and tools spanning servers, I/O network-
ing, and storage from physical to software-defined virtual, container, and clouds. The role of
data infrastructures is to enable and support information technology (IT) and organizational
information applications.
If you are looking to expand your knowledge into an adjacent area or to understand what’s
“under the hood,” from converged, hyper-converged, to traditional data infrastructures topics,
this book is for you. For experienced storage, server, and networking professionals, this book
connects the dots and provides coverage of virtualization, cloud, and other convergence themes
and topics.
This book is also for those who are new or need to learn more about data infrastructure,
server, storage, I/O networking, hardware, software, and services. Another audience for this
book is experienced IT professionals who are now responsible for or working with data infra-
structure components, technologies, tools, and techniques.

How This Book Is Organized


There are three parts in addition to the front and back matter (including Appendix A and a
robust Glossary). The front matter consists of Acknowledgments and About the Author sec-
tions; a Preface, including Who Should Read This Book and How This Book Is Organized;
and a Table of Contents. The back matter indicates where to learn more, along with my com-
panion sites (www.storageio.com, www.storageioblog.com, and @StorageIO). The back matter
also includes the Index.
In each chapter, you will learn—as part of developing and expanding (or refreshing) your
data infrastructures tradecraft—hardware, software, services, and technique skills. There are
various tables, figures, screenshots, and command examples, along with who’s doing what. You
will also find tradecraft tips, context matters, and tools for your toolbox, along with common
questions as well as learning experiences.
Feel free to jump around as you need to. While the book is laid out in a sequential hierarchy
“stack and layer” fashion, it is also designed for random jumping around. This enables you to
adapt the book’s content to your needs and preferences, which may be lots of small, quick reads
or longer, sustained, deep reading.
Acknowledgments

Thanks and appreciation to all of the vendors, VARs, service providers, press and media,
freelance writers as well as reporters, investors and venture capitalists, bloggers, and consultants,
as well as fellow Microsoft MVPs and VMware vExperts. Also, thanks to all Twitter tweeps
and IT professionals around the world whom I have been fortunate enough to talk with while
putting this book together.
I would also like to thank all of my support network as well as others who were directly or
indirectly involved with this project.
Special thanks to Tom Becchetti and Greg Brunton. Thanks to John Wyzalek, my pub-
lisher, along with everyone else at CRC/Taylor & Francis/Auerbach, as well as a big thank you
to Theron Shreve at DerryField Publishing Services and his associates, Marje Pollack, Susan
Culligan, and Lynne Lackenbach, for working their magic.
Finally, thanks to my wife Karen (www.karenofarcola.com) for having the patience to sup-
port me and take care of “Little Buddy Bjorn” while I worked on this project.
To all of the above, and to you the reader, thank you very much.

About the Author

Greg Schulz is Founder and Senior Analyst of the independent IT advisory and consultancy
firm Server StorageIO (www.storageio.com). He has worked in IT at an electrical utility and at
financial services and transportation firms in roles ranging from business applications develop-
ment to systems management and architecture planning.
Greg is the author of the SNIA Emerald Endorsed reading book Software-Defined Data
Infrastructure Essentials (CRC Press, 2017), as well as the Intel Recommended Reading List
books Cloud and Virtual Data Storage Networking (CRC Press, 2011) and The Green and Virtual
Data Center (CRC Press, 2009) as well as Resilient Storage Networks (Elsevier, 2004), among
other works. He is a multi-year VMware vExpert as well as a Microsoft MVP (Cloud Data
Center Management) and has been an advisor to various organizations including CompTIA
Storage+ among others.
In addition to holding frequent webinars, on-line, and live in-person speaking events and
publishing articles and other content, Greg is regularly quoted and interviewed as one of the
most sought-after independent IT advisors providing perspectives, commentary, and opinion on
industry activity. Greg is also a licensed FAA commercial drone pilot who generates GBytes of
big data every few minutes of flight time. You can view some of his video and other digital works
at www.picturesoverstillwater.com.
Greg has a B.A. in computer science and an M.Sc. in software engineering from the University
of St. Thomas. You can find him on Twitter @StorageIO; his blog is at www.storageioblog.com,
and his main website is www.storageio.com.

Part One
Applications, Data, and
IT Data Infrastructures

Part One comprises Chapters 1 and 2, and provides an overview of the book as well as key
concepts including industry trends, different environments, and applications that rely on IT
data infrastructures. Data infrastructures are what is inside data centers, including software-
defined data centers (SDDC), legacy, software-defined virtual, cloud, serverless, and hybrid
environments, among others.

Buzzword terms, trends, technologies, and techniques include application landscapes,
AI, analytics, automation, big data, blockchain, bitcoin, backup, cloud, converged, container,
IoT, storage class memory (SCM), persistent memory (PMEM), data protection, performance,
availability, capacity, and economics (PACE), serverless, server storage I/O networking,
software-defined, structured and unstructured data, among others.
Chapter 1
IT Data Infrastructure
Fundamentals

What You Will Learn in This Chapter


• How/why everything is not the same in most IT environments
• What are data infrastructures and their fundamental components
• IT data infrastructure terminology
• IT industry trends and server storage I/O demand drivers
• How to articulate the role and importance of data infrastructures
• Data infrastructure management, strategy, and insight

This opening chapter kicks off our discussion of IT data infrastructure management, strategy,
and insights. Key themes, buzzwords, and trends addressed in this chapter include server stor-
age I/O resources, application and data demand drivers, fundamental needs and uses of data
infrastructures, and associated technologies.
Our conversation is going to span hardware, software, services, tools, techniques, cloud,
serverless, container, SDDI, SDDC, SDC, and industry trends along with associated applica-
tions across different environments. Depending on when you read this, some of the things
discussed will be more mature and mainstream, while others will be nearing or have reached
the end of their usefulness.

Key themes in this book include:

• Data is finding new value, meaning that it is living longer.


• Everything is not the same across different data centers, though there are similarities.
• Hardware needs software; software needs hardware.
• Role and importance of data infrastructures.
• Servers and storage get defined and consumed in various ways.


• Clouds, public, private, virtual private and hybrid servers, services and storage.
• Context matters regarding server, storage I/O, and associated topics.
• The best server, storage, I/O technology depends on what your needs are.

1.1. Getting Started


Data infrastructure fundamentals include hardware systems and components as well as soft-
ware that, along with management tools, are core building blocks for converged and noncon-
verged environments.
When I am asked to sum up, or describe IT data infrastructures in one paragraph, it is
this: Data infrastructures are defined by combining hardware, software, as well as services into
a platform to protect, preserve, secure, and serve information via applications and their data.
Likewise, servers, including so-called serverless, need memory and storage to store data. Data
storage is accessed via I/O networks by servers whose applications manage and process data.
The fundamental role of a computer server is to process data into information; it does this by
running algorithms (programs or applications) that must have data to process. The
sum of those parts is the data infrastructure (legacy or software defined) enabled by hardware,
software, and other services.
Although some people may tell you that they are dead, data centers are very much alive,
particularly in clouds, where they are called availability zones (AZs) among other terms, as
shown in Figure 1.1. Some context is that data center is used as a generic term to refer to the
physical facility as well as what’s inside of it and, in some cases, even the applications that are
deployed there.

Figure 1.1 Users, Application Landscapes, Data Infrastructures, and Data Centers

When you hear the term software-defined data center (SDDC), software-defined infrastruc-
ture (SDI), along with software-defined data infrastructure (SDDI), among other variations,
these are referring to what’s inside the data center or AZ. Data infrastructure resources, includ-
ing hardware, software, and service resources, are defined to support applications, along with
their workload needs, to provide information services to users, clients, and customers of those
capabilities (across the top of Figure 1.1).
Figure 1.1 shows how data infrastructures are what is inside physical data centers support-
ing various application landscapes to provide information services to users, customers, con-
sumers, or devices. Data centers—also known as habitats for technology—are where the servers,
storage, I/O network hardware, along with power, cooling, and other equipment exist, all of
which are defined by software to provide platforms for various applications.
Application landscapes are the various workloads that are deployed upon and use data
infrastructure resources. Different application landscapes range from traditional operational
and transaction-oriented database systems to email and messaging, business intelligence (BI),
and analytics, along with enterprise resource planning (ERP), such as those from SAP, among
others.
Other application landscapes include development environments, workspace, and virtual
desktop infrastructures (VDI), as well as big data, batch, and real-time analytics; augmented
reality (AR), virtual reality (VR), gaming, and other immersive workloads. Some different
landscapes include artificial intelligence (AI), machine learning (ML), deep learning (DL),
and cognitive, along with the Internet of Things (IoT) and Internet of Devices, among others.
Keep in mind that all applications have some software code along with data and data
structures that define how information gets created, transformed, and stored. Some data infra-
structure decisions include where applications code and data get deployed, as well as what
granularity of resources is needed.
For example, does a given application landscape, such as SAP, need a cluster of servers
deployed as bare metal (BM) or Metal as a Service (MaaS), or as software-defined virtual
machines (VM), or Docker (or Windows) containers managed by Kubernetes (K8s), Mesos,
OpenShift, or OpenStack, among others? Perhaps the application can be deployed into a
serverless or Function as a Service (FaaS) deployment model, among others.
Data infrastructure management insight and strategy topics include decision making,
along with making sure that applications, along with their data, configuration settings, and
resources, are protected, secured, preserved, and served as needed.
Data infrastructure decision making includes whether to do it yourself (DiY), buy, rent, or
subscribe to a license from the software vendor, as well as from a services and hardware per-
spective. Decision making also includes what resources are needed, as well as where to place
different workloads to meet their service-level objective (SLOs) and other service criteria.
Additional data infrastructure management tasks include maintaining insight, situational
awareness, proactive monitoring, problem resolution and remediation, ongoing maintenance,
policy management, data protection including backup, business continuance (BC), business
resiliency (BR), high availability (HA), disaster recovery (DR), and security, among others
discussed later.
Likewise, the fundamental role of data storage (“storage”) is to provide persistent memory
for servers to place data to be protected, preserved, and served. Connectivity for moving data
between servers and storage, from servers to servers, or from storage to storage is handled via
I/O networks (internal and external). There are different types of servers, storage, and I/O
networks for various environments, functionalities, as well as application or budget needs.

Figure 1.2 IT data infrastructure fundamentals: the “big picture.”

Figure 1.2 shows a very simplistic, scaled-down view of data infrastructure resources,
including servers, storage, and I/O components supporting applications being accessed via
tablets, mobile devices, laptops, virtual desktop infrastructure (VDI), workstations, IoT/IoD
Supervisory Control and Data Acquisition (SCADA), sensors, computer numerical control
(CNC), cameras, autonomous vehicles, robotics, and other servers. Also shown in Figure 1.2
is storage (internal or external, dedicated and shared) being protected by some other storage
system (or service).
Keeping in mind that Figure 1.2 is the data infrastructure resources “big picture” (e.g.,
what’s inside data centers) and a simple one at that means we could scale it down even further
to a laptop or tablet, or, in the opposite direction, to a large web-scale or cloud environment of
tens of thousands of servers, storage, and I/O components, hardware, and software.
A fundamental theme is that servers process data using various applications programs to
create information; I/O networks provide connectivity to access servers and storage; storage is
where data gets stored, protected, preserved, and served from; and all of this needs to be man-
aged. There are also many technologies involved, including hardware, software, and services as
well as various techniques that make up a server, storage, and I/O enabled data infrastructure.

Data infrastructure fundamental focus areas include:

• Organizations—Markets and industry focus, organizational size


• Applications—What’s using, creating, and resulting in server storage I/O demands
• Technologies—Tools and hard products (hardware, software, services, packaging)
• Tradecraft—Techniques, skills, best practices, how managed, decision making
• Management—Configuration, monitoring, reporting, troubleshooting, performance,
availability, data protection and security, access, and analytics, capacity planning, auto-
mation, resource lifecycle, policy definition along with orchestration

Applications are what transform data into information. Figure 1.3 shows how applications,
which are software defined by people and software, consist of algorithms, policies, procedures,
and rules that are put into some code to tell the server processor (central processing unit
[CPU]) what to do. Note that CPUs can include one or more cores per socket, with one or
more sockets per server. Also note that there can be one or more threads of execution per core.
There are also physical CPUs as well as virtual CPUs (vCPUs) provided via software-defined
virtual and cloud environments, as well as logical processors (LPs), which refer to core threads.

Figure 1.3 How data infrastructure resources transform data into information.

Application programs include data structures (not to be confused with infrastructures)
that define what data looks like and how to organize and access it using the “rules of the road”
(the algorithms). The program algorithms along with data structures are stored in memory,
together with some of the data being worked on (i.e., the active working set).
Persistent data is stored in some form of extended memory—storage devices such as non-
volatile memory (NVM), also known as storage class memory (SCM), as well as persistent
memory (PMEM). Other storage includes solid-state devices (SSD), hard disk drives (HDD),
or tape, among others, either locally (also known as on-prem, on-premises, or on-site) or
remotely. Also shown in Figure 1.3 are various devices that perform input/output (I/O) with
the applications and server, including mobile devices as well as other application servers. In
Chapter 2 we take a closer look at various applications, programs, and related topics.

1.2. What’s the Buzz in and around IT Data Infrastructures?


There is a lot going on, in and around data infrastructure server, storage, and I/O network-
ing connectivity from a hardware, software, and services perspective. From consumer to small/
medium business (SMB), enterprise to web-scale and cloud-managed service providers, physi-
cal to virtual, spanning structured database (aka “little data”) to unstructured big data and
very big fast data, a lot is happening today. Some industry as well as customer data infrastruc-
ture buzz includes newer, faster physical servers with more cores, memory, and I/O bandwidth,
along with expansion capabilities. These new servers also have new storage and I/O network-
ing capabilities that are available in various packaging, from hyper-converged infrastructure
(HCI) aggregated to converged infrastructure (CI), cluster and cloud in a box (CiB), as well as
individual systems and components for legacy and software-defined deployments.

Serverless is another favorite buzz topic that some marketers use to describe software that is
independent of hardware or that uses fewer servers. In practice, the discussion centers on data
infrastructures (cloud-based or otherwise) that remove the concept of using and managing
a server—you merely push your code and it runs. Serverless is also known as Function as a
Service (FaaS) and is commonly implemented with container technologies such as Docker and
Windows containers orchestrated by Kubernetes, among others.
Figure 1.3 takes a slightly closer look at the server storage I/O data infrastructure, revealing
different components and focus areas that will be expanded on throughout this book.

Some buzz and popular trending topics, themes, and technologies include, among others:

• NVM, NVM Express (NVMe), SCM, PMEM, including NAND flash SSD
• Software defined data centers (SDDC), networks (SDN), and storage (SDS)
• Converged infrastructure (CI), hyper-converged infrastructure (HCI)
• Serverless, Function as a Service (FaaS), blockchain, distributed ledgers
• Scale-out, scale-up, and scale-down resources, functional and solutions
• Virtualization, containers, cloud, and operating systems (OS)
• NVM express (NVMe), NVMe over Fabric (NVMeoF), GPU, PCIe, IP, and Gen-Z
accessible storage
• Block, file, object, and application program interface (API) accessed storage
• Data protection, business resiliency (BR), archiving, and disaster recovery (DR)

In Figure 1.4 there are several different focus topics that enable access to server and storage
resources and services from various client or portable devices. These include access via server
I/O networks, including:

• Direct attached dedicated and shared
• Local area networks (LAN)
• Storage area networks (SAN)
• Metropolitan area networks (MAN) and wide area networks (WAN)

Figure 1.4 Data infrastructure server, storage, and I/O network resources.

Besides technologies, tools, and trends, another topic is where to place resources: on-prem,
off-site in a cloud, co-location (co-lo), or managed service provider, and at what granularity.
Granularities can include serverless, FaaS, SaaS, AaaS, PaaS, container, VM or cloud instance,
dedicated instance, physical machine, BM or MaaS, among others.

Also, keep in mind that hardware needs software and software needs hardware, including:

• Container management (Docker, Mesos, K8s, OpenShift, Windows)


• Operating systems (OS), Docker, Kubernetes, Linux, and Windows containers
• Hypervisors (Xen and KVM, Microsoft Hyper-V, VMware vSphere/ESXi)
• Data protection and management tools
• Monitoring, reporting, and analytics
• File systems, databases, and key-value repositories

All of these are critical to consider, along with other applications that all rely on some
underlying hardware (bare metal, virtual, or cloud abstracted). There are various types of
servers, storage, and I/O networking hardware as well as software that have various focus areas
or specialties, which we will go deeper into in later chapters.
There is a lot of diversity across the different types, sizes, focus, and scope of organizations.
However, it’s not just about size or scale; it’s also about the business or organizational focus,
applications, needs (and wants), as well as other requirements and constraints, including a
budget. Even in a specific industry sector such as financial services, healthcare or life science,
media and entertainment, or energy, among others, there are similarities but also differences.
While everything related to server storage I/O is not the same, there are some commonali-
ties and similarities that appear widely.
Figure 1.5 shows examples of various types of environments where servers, storage, and I/O
have an impact on the consumer, from small office/home office (SOHO) to SMB (large and
small), workgroup, remote office/branch office (ROBO), or departmental, to small/medium
enterprise (SME) to large web-scale cloud and service providers across private, public, and
government sectors.
In Figure 1.5 across the top, going from left to right, are various categories of environments
also known as market segments, price bands, and focus areas. These environments span dif-
ferent industries or organizational focus from academic and education to healthcare and life
sciences, engineering and manufacturing to aerospace and security, from media and entertain-
ment to financial services, among many others.
Also shown in Figure 1.5 are some common functions and roles that servers, storage, and
I/O resources are used for, including network routing and access and content distribution
networks (CDN). Other examples include supercomputing, high-performance computing
(HPC), and high-productivity computing. Some other applications and workloads shown in
Figure 1.5 include general file sharing and home directories, little data databases along with
big data analytics, email and messaging, among many others discussed further in Chapter 2.
Figure 1.5 also shows that the role of servers (processors [sockets, cores, threads, ASIC,
FPGA, GPU], memory, and I/O connectivity) combined with some storage can be deployed in
various ways from preconfigured engineered packaged solutions to build your own leveraging
open source and available (aka commodity or white box) hardware.

Figure 1.5 Different environments with applications using data infrastructure resources.

Besides different applications, industry, and market sectors concerned with server, storage,
and I/O topics, there are also various technology and industry trends. These include, among
others, analytics, application-aware, automation, cloud (public, private, community, virtual
private, and hybrid) CiB, HCI, CI, containers, and serverless.
Other data infrastructure and applications include data center infrastructure management
(DCIM), data lakes, data ponds, data pools and data streams, data protection and security,
HCI, insight and reporting, little data, big data, big fast data, very big data, management,
orchestration, policies, software defined, structured data and unstructured data, templates,
virtual server infrastructures (VSI), and VDI, among others.

Additional fundamental buzz and focus from a technology perspective include:

• Server-side memory and storage (storage in, adjacent to, or very close to the server)
• NVM including 3D NAND flash, 3D XPoint, and others such as phase change memory
(PCM), DRAM, and various emerging persistent and nonpersistent memory technologies
• SCM, which have the persistence of NVM storage and the performance as well as dura-
bility of traditional server DRAM
• I/O connectivity including IoT (Hubs, gateways, edge, and management interfaces),
PCIe, Gen-Z (emerging compute I/O interface), NVMe including NVMeoF, along with
its variations, SAS/SATA, InfiniBand, Converged Ethernet, RDMA over Converged
Ethernet (RoCE), block, file, object, and API-accessed storage
• Data analytics including batch and real-time analytics, Lambda architectures, Hadoop,
Splunk, SAS, Snowflake, AI/ML/DL along with other cognitive workloads, Horton-
works, Cloudera, and Pivotal, among others
• Databases and key-value repositories including blockchain distributed ledgers, SQL
(AWS RDS including Aurora, IBM DB2, Microsoft SQL Server and Azure Cosmos DB,
MariaDB, MemSQL, MySQL, Oracle, PostgreSQL, ClearDB, TokuDB) and NoSQL


(Aerospike, Cassandra, CouchDB, Kafka, HBASE, MongoDB, Neo4j, Riak, Redis,
TokuDB) as well as big data or data warehouse (HDFS based, Pivotal Greenplum, SAP
HANA, and Teradata), among others

1.2.1. Data Infrastructures—How Server Storage I/O Resources Are Used

Depending on your role or focus, you may have a different view than somebody else of what is
infrastructure, or what an infrastructure is. Generally speaking, people tend to refer to infra-
structure as those things that support what they are doing at work, at home, or in other aspects
of their lives. For example, the roads and bridges that carry you over rivers or valleys when
traveling in a vehicle are referred to as infrastructure.
Similarly, the system of pipes, valves, meters, lifts, and pumps that bring fresh water to
you, and the sewer system that takes away waste water, are called infrastructure. The telecom-
munications network—both wired and wireless, such as cell phone networks—along with
electrical generating and transmission networks are considered infrastructure. Even the planes,
trains, boats, and buses that transport us locally or globally are considered part of the trans-
portation infrastructure. Anything that is below what you do, or that supports what you do,
is considered infrastructure.
This is also the situation with IT systems and services where, depending on where you sit or
use various services, anything below what you do may be considered infrastructure. However,
that also causes a context issue in that infrastructure can mean different things. For example in
Figure 1.6 the user, customer, client, or consumer who is accessing some service or application
may view IT in general as infrastructure, or perhaps as business infrastructure.
Those who develop, service, and support the business infrastructure and its users or clients
may view anything below them as infrastructure, from desktop to database, servers to storage,
network to security, data protection to physical facilities. Moving down a layer in Figure 1.6
is the information infrastructure which, depending on your view, may also include servers,
storage, and I/O hardware and software.

Figure 1.6 IT Information and data infrastructure.

Figure 1.7 Data infrastructures: server storage I/O hardware and software.

For our discussion, to help make a point, let’s think of the information infrastructure as
the collection of databases, key-value stores, repositories, and applications along with develop-
ment tools that support the business infrastructure. This is where you may find developers who
maintain and create actual business applications for the business infrastructure. Those in the
information infrastructure usually refer to what’s below them as infrastructure. Meanwhile,
those lower in the stack shown in Figure 1.6 may refer to what’s above them as the customer,
user, or application, even if the actual user is up another layer or two.
Context matters in the discussion of infrastructure. Data infrastructures support the data-
bases and applications developers as well as things above, while existing above the physical
facilities infrastructure, leveraging power, cooling, and communication network infrastruc-
tures below.
Figure 1.7 shows a deeper look into the data infrastructure shown at a high level in
Figure 1.6. The lower left of Figure 1.7 shows the common-to-all-environments hardware,
software, people, processes, and practices that comprise tradecraft (experiences, skills, tech-
niques) and “valueware.” Valueware is how you define the hardware and software along with
any customization to create a resulting service that adds value to what you are doing or sup-
porting. Also shown in Figure 1.7 are common application and services attributes including
performance, availability, capacity, and economics (PACE), which vary with different applica-
tions or usage scenarios.
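
As a rough way to reason about PACE, the following hypothetical Python sketch records the four
attributes per application so that different workloads can be compared side by side; the workload
names, numbers, and field names are invented for illustration.

# Hypothetical sketch: capturing PACE (performance, availability, capacity, economics)
# attributes per application so different workloads can be compared side by side.
from dataclasses import dataclass

@dataclass
class Pace:
    iops: int            # performance: I/O operations per second needed
    availability: float  # availability: e.g., 0.9999 = "four nines"
    capacity_tb: float   # capacity: usable space required, in terabytes
    budget_usd_mo: int   # economics: monthly budget constraint

workloads = {
    "transactional-db": Pace(iops=50_000, availability=0.9999, capacity_tb=10, budget_usd_mo=8_000),
    "backup-archive":   Pace(iops=500, availability=0.999, capacity_tb=500, budget_usd_mo=3_000),
}

for name, pace in workloads.items():
    print(name, pace)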

Common data infrastructure I/O fundamentals across organizations and environments include:

• While everything is not the same, there are similarities.


• One size, technology, or approach does not apply to all scenarios.

• Some things scale up, others scale down; some can’t scale up or scale down.
• Data protection includes security, protection copies, and availability.
• The amount (velocity), as well as size (volume) of data continues to grow.

Figure 1.8 shows the fundamental pillars or building blocks for a data infrastructure,
including servers for computer processing, I/O networks for connectivity, and storage for stor-
ing data. These resources include both hardware and software, as well as services and tools.
The size of the environment, organization, or application needs will determine how large or
small the data infrastructure is or can be.
For example, at one extreme you can have a single high-performance laptop with a hyper-
visor running OpenStack and various operating systems, along with their applications, leverag-
ing flash SSD and high-performance wired or wireless networks powering a home lab or test
environment.
Another example can be SDDC software such as VMware running on Amazon Web
Services (AWS) dedicated aka bare metal MaaS systems, or Microsoft Azure Stack with Azure
software running on an on-prem appliance. On the other hand, you can have a scenario with
tens of thousands (or more) servers, networking devices, and hundreds of petabytes (PBs), exa-
bytes (EB), or zettabytes (ZB) of storage (or more).
A reminder that a megabyte is a million bytes, a gigabyte is 1,000 megabytes, a terabyte is
1,000 gigabytes, and a PB is 1,000 terabytes. Also, keep in mind that bytes are referred to in
binary base 2 (e.g., 1024) or decimal base 10 (e.g., 1000). Keep context as to whether 1024 or
1000 is being used to mean a thousand, as well as whether you are dealing with bits, written
with a little b (e.g., Kb [thousand bits]), or bytes, written with a big B (e.g., KB [thousand bytes]).
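
Because vendors, operating systems, and tools mix decimal (base-10) and binary (base-2) units,
a small sketch such as the following can help keep that context straight; the capacities used are
arbitrary examples.

# Decimal (base-10) vs. binary (base-2) capacity units, and bits vs. bytes.
# 1 KB = 1,000 bytes (decimal); 1 KiB = 1,024 bytes (binary); Kb = kilobits, KB = kilobytes.
KB, MB, GB, TB, PB = (1000**i for i in range(1, 6))
KiB, MiB, GiB, TiB, PiB = (1024**i for i in range(1, 6))

raw_bytes = 4 * TB  # e.g., a "4 TB" drive as marketed (decimal)

print(f"{raw_bytes} bytes = {raw_bytes / TB:.2f} TB (decimal)")
print(f"{raw_bytes} bytes = {raw_bytes / TiB:.2f} TiB (binary)")   # ~3.64 TiB as an OS may report it
print(f"10 Gb/s link = {10 * 1000**3 / 8 / MB:.0f} MB/s of bytes moved (before protocol overhead)")
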
In Figure 1.8 the primary data infrastructure components or pillar (server, storage, and
I/O) hardware and software resources are packaged and defined to meet various needs. Data
infrastructure storage management includes configuring the server, storage, and I/O hardware
and software as well as services for use, implementing data protection and security, provision-
ing, diagnostics, troubleshooting, performance analysis, and other activities. Server storage
and I/O hardware and software can be individual components, prepackaged as bundles or
application suites and converged, among other options. Note that data infrastructure resources
can be deployed at the edge for traditional as well as emerging “fog” computing scenarios, as
well as at data centers and clouds as core sites.

Figure 1.8 Data infrastructure building blocks (hardware, software, services).

1.2.1.1. Compute Server and Memory

Fast applications need faster software (databases, file systems, repositories, operating systems,
hypervisors), servers (physical, virtual, cloud, container, serverless, converged, and hyper-
converged), and storage and I/O networks. Servers provide the compute or computational capa-
bilities to run application and other data infrastructure software. Data infrastructure software
includes lower-level drivers, operating systems, hypervisors, containers, storage, networking,
file systems, and databases, along with other management tools. Servers and their applica-
tions software manage and process data into information by leveraging local as well as remote
storage accessed via I/O networks. Server compute processing consists of one or more sockets
(which compute processor chips plug into). Processor chips include one or more cores that can
run one or more threads (workload code). In addition to the compute capabilities, there are
also memory management and access, as well as I/O interconnects. Server compute sockets and
support chips are arranged on a mother or main board that may also have optional daughter
or mezzanine boards for extra resources. Server compute resources also include offload proces-
sors such as graphical processing units (GPUs) that handle compute-intensive operations for
graphics, video, image processing, and AI/ML/DL analytics, among other workloads. Other
processing resources include custom application-specific integrated circuit (ASIC) and field
programmable gate arrays (FPGA).
Application and computer server software is installed, configured, and stored on storage.
That storage may be local or external dedicated, or shared. Servers leverage different tiers
of memory from local processor cache to primary main dynamic random access memory
(DRAM). Memory is storage, and storage is persistent memory. Note that there are also
SCMs (storage class memories) that are also referred to as persistent memory (PMEM), meaning
they are not volatile as is DRAM. SCM and PMEM are packaged as DIMM (e.g., NVDIMM)
as well as PCIe add-in card (AiC) and drive form factors, among others.
Memory is used for holding both applications and data, along with operating systems,
hypervisors, device drivers, as well as cache buffers. Memory is staged or read and written to
data storage that is a form of persistent memory ranging from NVM NAND flash SSD to
HDD and magnetic tape, among others.
Server compute and memory, along with associated I/O network and storage, are packaged
as well as deployed in various ways, along with granularity. Some packaging and granularity
examples include standalone systems (rack or tower), scale out clusters, appliances, physical
and virtual machine (software defined), cloud instance (form of VM), containers, serverless,
composable, CI, HCI, and CiB, among others.
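
To see the socket, core, and thread (logical processor) hierarchy described above on a real machine,
here is a hedged sketch that parses /proc/cpuinfo (a Linux-only path; other platforms expose the
topology differently) and falls back to the standard library's logical count.

# Sketch: inspect sockets, physical cores, and logical processors (threads) on a Linux host.
# Assumes /proc/cpuinfo exists (Linux); os.cpu_count() works anywhere but only gives logical CPUs.
import os
from collections import defaultdict

def cpu_topology(path="/proc/cpuinfo"):
    sockets = defaultdict(set)   # physical id -> set of core ids
    logical = 0
    phys, core = None, None
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("processor"):
                    logical += 1
                elif line.startswith("physical id"):
                    phys = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    core = line.split(":")[1].strip()
                elif line.strip() == "" and phys is not None:
                    sockets[phys].add(core)
                    phys, core = None, None
        if phys is not None:               # flush the last block if the file lacks a trailing blank line
            sockets[phys].add(core)
    except FileNotFoundError:
        return {"sockets": None, "cores": None, "logical": os.cpu_count()}
    cores = sum(len(c) for c in sockets.values())
    return {"sockets": len(sockets), "cores": cores, "logical": logical}

print(cpu_topology())   # e.g., {'sockets': 2, 'cores': 16, 'logical': 32} with hyper-threading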

1.2.1.2. Data

Servers or other computers need storage, storage needs servers, and I/O networks tie the two
together. The I/O network may be an internal PCIe or memory bus, or an external Wi-Fi
network for IP connection, or use some other interface and protocol. Data and storage may be
coupled directly to servers or accessed via a networked connection to meet different application
needs. Also, data may be dedicated (affinity) to a server or shared across servers, depending on
deployment or application requirements.
While data storage media are usually persistent or non-volatile, they can be configured
and used for ephemeral (temporary) data or for longer-term retention. For example, a policy
might specify that data stored in a certain class or type of storage does not get backed up, is
not replicated or have high-availability (HA) or other basic protection, and will be deleted or
purged periodically. Storage is used and consumed in various ways. Persistent storage includes
NVM such as flash, PMEM, and SCM-based SSDs, as well as magnetic disk, tape, and opti-
cal, among other forms. Data storage can be accessed as a service, including tables (databases),
message queues, objects and blobs, files, and blocks using various protocols and interfaces.
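
To illustrate the difference between file and object (blob) access mentioned above, here is a hedged
sketch: the file write uses only the Python standard library, while the object put assumes the AWS
S3 API via the boto3 SDK with a purely hypothetical bucket name; other object stores expose
similar put/get semantics.

# Sketch: the same data written via a file interface vs. an object/blob interface.
# The S3 portion assumes boto3 is installed and credentials are configured; the bucket name is hypothetical.
import boto3

payload = b"flight telemetry sample"

# Block/file style: data lands at a path within a file system namespace.
with open("telemetry-0001.log", "wb") as f:
    f.write(payload)

# Object/blob style: data is put to a flat namespace (bucket + key) over an HTTP API.
s3 = boto3.client("s3")
s3.put_object(Bucket="example-data-infrastructure-bucket",   # hypothetical bucket
              Key="telemetry/0001.log",
              Body=payload)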

1.2.1.3. Networking Connectivity

Servers network and access storage devices and systems via various I/O connectivity options
or data highways. Some of these are local or internal to the server, while others can be exter-
nal over a short distance (such as in the cabinet or rack), across the data center, or campus, or
metropolitan and wide area, spanning countries and continents. Once networks are set up,
they typically are used for moving or accessing devices and data with their configurations
stored in some form of storage, usually non-volatile memory or flash-based. Networks can
be wired using copper electrical cabling or fiber optic, as well as wireless using various radio
frequency (RF) and other technologies locally, or over long distances.

1.2.1.4. Data Infrastructure Resource Packaging

There are also various levels of abstraction, management, and access, such as via block, file,
object, or API. Shared data vs. data sharing can be internal dedicated, external dedicated,
external shared and networked. In addition to various ways of consuming, storage can also be
packaged in different ways such as legacy storage systems or appliances, or software combined
with hardware (“tin wrapped”).
Other packaging variations include virtual storage appliance (VSA), as a cloud instance or
service, as well as via “shrink wrap” or open-source software deployed on your servers. Servers
and storage hardware and software can also be bundled into containers (Docker, Windows,
Kubernetes, OpenShift), CI, HCI, and CiB, similar to an all-in-one printer, fax, copier, and
scanner that provide converged functionality.

1.2.1.5. Data Infrastructure Management Tools

Various software tools (along with some physical hardware tools) are used for managing data
infrastructures along with their resources. Some of these tools are used for defining, coordi-
nating, orchestrating (or choreographing) various resources to the requirements of services and
applications that need them. Other tools are used for monitoring, reporting, gaining insight,
situational awareness of how services are being delivered, customer satisfaction, responsiveness,
availability, cost, and security. Other management tools are used for defining how, when,
and where data protection—including BC, BR, DR, HA, and backups—is done, along with
implementing security.

1.2.2. Why Data Infrastructures Are Important (Demand Drivers)

There is no information recession; more and more data is being generated and processed that
needs to be protected, preserved, as well as served. With increasing volumes of data of various
sizes (big and little), if you simply keep doing what you have been doing in the past, you had
better have a big budget or go on a rapid data diet.
On the other hand, if you start using those new and old tools in your toolbox, from disk
to flash and even tape along with cloud, leveraging data footprint reduction (DFR) from the
application source to targets including archiving as well as deduplication, you can get ahead
of the challenge.
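
As one example of how data footprint reduction works, deduplication comes down to storing a
given chunk of data only once and referencing it thereafter; the toy Python sketch below shows
the idea using content hashing (an in-memory illustration, not any particular product's
implementation).

# Toy content-addressable store illustrating deduplication: identical chunks are stored once.
import hashlib

CHUNK = 4096          # fixed-size chunking for simplicity; real products also use variable-size chunks
store = {}            # fingerprint -> chunk bytes

def write(data: bytes):
    """Split data into chunks, store unique chunks, return the list of fingerprints (the 'recipe')."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)      # only the first copy of any given chunk is kept
        recipe.append(fp)
    return recipe

write(b"A" * 16384)                      # four identical 4 KiB chunks -> one stored chunk
write(b"A" * 8192 + b"B" * 8192)         # reuses the "A" chunk, adds one new "B" chunk
logical = 16384 + 16384
physical = sum(len(c) for c in store.values())
print(f"logical {logical} bytes, physical {physical} bytes, ratio {logical / physical:.1f}:1")
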
Figure 1.9 shows data being generated, moved, processed, stored, and accessed as informa-
tion from a variety of sources, in different locations. For example, video, audio, and still image
data are captured from various devices or sensors including IoT based, copied to local tablets
or workstations, and uploaded to servers for additional processing.
Besides primary data, additional telemetry, log, and metadata are also collected for
processing.
As an example, when I fly one of my drones that records video at 4K (4096 × 2160 or Ultra
HD 3840 × 2160 resolution), 60 frames per second (60fps), primary video data is about 1GB
every two minutes. This means a 22-minute flight produces about 11GB of video, plus any
lower-resolution thumbnails for preview, along with telemetry.
Figure 1.9 Examples of data infrastructure demand drivers and influences.

The telemetry data includes time, altitude, attitude, location coordinates, battery status,
camera and other sensor status, which for a 22-minute flight can be a couple of MBs. Other
telemetry and metadata include event as well as error logs, among additional information.
While the primary data (e.g., video) is a few large files that can be stored as objects or blobs,
the metadata can be many small files or stream logs.
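
The back-of-the-envelope numbers above add up quickly across many flights; the short sketch
below extends them (using the roughly 1 GB per two minutes of 4K/60fps video and the couple
of MB of telemetry cited above, with the flights-per-week figure being an arbitrary assumption).

# Rough capacity estimate for drone flights, using ~1 GB of 4K/60fps video per 2 minutes of flight
# (figure cited in the text); flights per week and telemetry size are illustrative assumptions.
GB_PER_MINUTE = 1 / 2            # ~0.5 GB of primary video per minute
TELEMETRY_MB_PER_FLIGHT = 3      # assumed: a couple of MB of telemetry and logs per flight

def flight_gb(minutes: float) -> float:
    return minutes * GB_PER_MINUTE + TELEMETRY_MB_PER_FLIGHT / 1000

per_flight = flight_gb(22)                       # the 22-minute flight from the example
per_year = per_flight * 5 * 52                   # assume 5 flights per week, 52 weeks
print(f"one 22-minute flight : ~{per_flight:.1f} GB")
print(f"five flights a week  : ~{per_year / 1000:.2f} TB per year before copies or protection")
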
Data is also captured from various devices in medical facilities such as doctors’ offices,
clinics, labs, and hospitals, and information on patient electronic medical records (EMR) is
accessed. Digital evidence management (DEM) systems provide similar functionalities, sup-
porting devices such as police body cameras, among other assets. Uploaded videos, images,
and photos are processed using AI, ML, DL, and other cognitive services in real time or batch.
The upload is indexed, classified, checked for copyright violations using waveform analysis or
other software, among other tasks, with metadata stored in a database or key-value repository.
The resulting content can then be accessed via other applications and various devices. These
are very simple examples that will be explored further in later chapters, along with associated
tools, technologies, and techniques to protect, preserve, and serve information.
Figure 1.9 shows many different applications and uses of data. Just as everything is not the
same across different environments or even applications, data is also not the same. There is
“little data” such as traditional databases, files, or objects in home directories or shares along
with fast “big data.” Some data is structured in databases, while other data is unstructured in
file directories.

1.2.3. Data Value

Data also may have different values at different times. Very often, context is needed: Data can
represent a value or a piece of information, but one must also consider the value or worth of
that data item to the organization. The basic three types of data value are no value, has value,
or has an unknown value.
Data that has unknown value may eventually have some value or no value. Likewise, data
that currently has some form of value may eventually have no value and be discarded, or may
be put into a pool or parking place for data with known value until its new value is determined
or the time has come to delete the data. In addition to the three basic types of data value, there
can be different varieties of data with a value, such as high-value or low-value.
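
One way to act on these value categories is to let them drive placement and protection policy;
the following sketch is purely hypothetical (the tier names and rules are invented for
illustration, not taken from the book).

# Hypothetical sketch: mapping data value (no value / has value / unknown value) to placement policy.
from enum import Enum

class Value(Enum):
    NONE = "no value"
    KNOWN = "has value"
    UNKNOWN = "unknown value"

def placement(value: Value, high_value: bool = False) -> str:
    if value is Value.NONE:
        return "delete or do not store"
    if value is Value.UNKNOWN:
        return "park on low-cost capacity tier until value is determined"
    return "fast tier with full protection" if high_value else "standard tier with protection copies"

for v in (Value.KNOWN, Value.UNKNOWN, Value.NONE):
    print(v.value, "->", placement(v, high_value=(v is Value.KNOWN)))
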
Besides the value of data, access is also part of its life cycle, which means the data may be
initially stored and updated, but later becomes static, to be read but not modified. We will
spend more time in Chapter 2 looking at different types of applications in terms of data values,
life cycles, and access patterns.

General demand drivers of data value include:

• Creation—The Internet of Things (IoT) and the Internet of Devices (IoD), such as sen-
sors and other devices including cameras, phones, tablets, drones, satellites, imagery,
telemetry, and log detail data, legacy creation, and machine-to-machine (M2M), along
with AI, ML, DL, and other cognitive, batch, and real-time analytics in support of digi-
tal transformation.
• Curation, Transformation—Transformation, analytics, and general processing, across
different sectors, industries, organizations, and focus areas. Also tagging and adding
metadata to data where applicable to add value and context.

• Consumption—Content distribution, cache, collaboration, downloads, various consumers.


• Preservation—With more data being stored for longer periods of time, combined with
other factors mentioned above, simply preserving data is in itself a large demand driver.

Additional demand drivers include:

• Increased number of devices creating and consuming data


• More volumes of data being created, processed, and stored, faster
• Telemetry and log data (call detail, transactions, cookies, and tracking)
• Larger data items (files, images, videos, audio)
• More copies of items being kept for longer periods of time
• Data finding new value now, or in the future
• Mobile and consumerized access of information services
• IoT-generated data including telemetry as well as rich content
• Virtual machines and virtual disks creating virtual data sprawl

In addition to the above drivers, there are also various industry- or organization-focused
applications including government, police and military, healthcare and life sciences, security
and surveillance video, media and entertainment, education, research, manufacturing, energy
exploration, transportation, finance, and many others.

1.3. Data Infrastructures Past, Present, and Future


There is always something new and emerging, including the next shiny new thing (SNT) or
shiny new technology. Likewise, there are new applications and ways of using the SNT as well
as legacy technologies and tools.
If you are like most people, your environment consists of a mix of legacy applications (and
technology) as well as new ones (e.g., “brownfield”); or perhaps everything is new, and you are
starting from scratch or “greenfield” (i.e., all new). On the other hand, you might also be in
a situation in which everything is old or legacy and is simply in a sustaining mode. Likewise,
your environment may span from onsite, also known as on-prem (on premises or at the premise
customer site), as well as cloud, managed service, or co-lo based.
In addition to new technology, some existing technology may support both new and old
or legacy applications. Some of these applications may have been re-hosted or moved from
one operating hardware and operating system environment to another, or at least upgraded.
Others may have been migrated from physical to software-defined virtual, cloud, container, or
serverless environments, while still others may have been rewritten to take advantage of new or
emerging technologies and techniques.

1.3.1. Where Are We Today? (Balancing Legacy with Emerging)

A trend has been a move from centralized computer server and storage to distributed and edge,
then to centralized via consolidation (or aggregation), then back to distributed over different
generations from mainframe to time-sharing to minicomputers to PCs and client servers, to the
web and virtualized to the cloud. This also includes going from dedicated direct attached stor-
age to clustered and shared storage area networks (SAN) or network attached storage (NAS) to
direct attached storage (DAS), to blob and object storage, virtual and cloud storage, and back
to direct attached storage, among other options.
What this all means is that from a server storage I/O hardware, software, and services
perspective, consideration is needed for what is currently legacy, how to keep it running or
improve it, as well as what might be a shiny new thing today but legacy tomorrow, while also
keeping an eye on the future.
This ties into the theme of keeping your head in the clouds (looking out toward the future)
but your feet on the ground (rooted in the reality of today), and finding a balance between what
needs to be done and what you want to do. Figure 1.10 shows, for example, how enterprise and
other storage options are evolving, taking on new or different roles while also being enhanced.
Some of the changes that are occurring include traditional enterprise-class storage systems
being used for secondary roles or combined with software-defined storage management tools,
virtualization, cloud, and other technologies. Also shown in Figure 1.10 from a high level are
how traditional lower-end or open technologies are handling more enterprise workloads.
Common to what is shown in Figure 1.10 as well as in other scenarios is that the trend of
using commercial off-the-shelf (COTS) servers continues to evolve. COTS are also known as
commodity, industry standard, and white box products, with motherboards, processors, and
memory. This means that servers are defined to be data and storage systems or appliances, as
well as being application servers with different hardware configurations to support those needs.
Besides physical servers, many cloud providers also offer cloud machines (instances) with vari-
ous standard configurations ranging from high-performance computing to memory- or I/O-
intensive, among other options.
Another trend centers around convergence (all-in-ones and others), where server, storage,
and I/O hardware is designed with software to support applications as well as storage and
management functions as turnkey solutions. We shall have more to say about these and other
trends later.

Figure 1.10 Server and storage yesterday and today.


Figure 1.11 Evolving server and storage I/O leveraging new technologies.

1.3.2. Where Are We Going? (Planning, Lessons Learned)

To see and understand where we are going, it helps to know where we have been and where
we are currently. At first glance, Figure 1.11 may look similar to what you have seen in the
past, but what’s different and will be covered in more depth in subsequent chapters is the role
of servers being defined as storage. In addition to being defined as media content servers for
video, audio, or other streaming applications, as a database, email, the web, or other servers,
servers are also increasingly being leveraged as the hardware platform for data and storage
applications.
A closer look at Figure 1.11 will also reveal new and emerging non-volatile memory that can
be used in hybrid ways, for example, as an alternative to DRAM or NVRAM for main mem-
ory and cache in storage systems or networking devices, as well as a data storage medium.
Also shown in Figure 1.11 are new access methods such as the emerging Gen-Z, along with NVMe
as well as 100 GbE (and faster) to reduce the latency for performing reads and writes to fast
NVM memory and storage. NVMe, while initially focused for inside the server or storage
system, will also be available for external access to storage systems over low-latency networks.
Building off of what is shown in Figure 1.11, Figure 1.12 provides another variation of how
servers will be defined to be used for deployments as well as how servers provide the processing
capability for running storage application software. In Figure 1.12 the top center is a shrunk-
down version of Figure 1.11, along with servers defined to be storage systems or appliances for
block SAN, NAS file, and Object, Blob, or table (database), or other endpoint APIs, including
converged, nonconverged, and virtual and cloud-based applications.
Figure 1.12 shows how the primary data infrastructure building block resources, including
server, storage, I/O networking hardware, software, and services, combine to support various
needs.

Figure 1.12 Servers as storage and storage as servers, converged and nonconverged.

Not shown in detail is the trend toward protecting data at ingest vs. the past trend of
protecting after the fact. This also means collecting telemetry, metadata, and other informa-
tion about data when it is being stored, to reduce the overhead complexity of searching or
scanning data after the fact for management and protection purposes.
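
As a simple illustration of capturing protection and classification metadata at ingest rather
than scanning after the fact, here is a brief Python sketch. The file layout, catalog format, and
metadata fields are assumptions for illustration, not a specific product's approach.

    import hashlib
    import json
    import os
    from datetime import datetime, timezone

    def ingest(path: str, catalog: str = "catalog.jsonl") -> dict:
        """Record a checksum and basic metadata while data is being stored, so later
        protection, search, and management tasks do not have to rescan the data."""
        sha256 = hashlib.sha256()
        size = 0
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                sha256.update(chunk)
                size += len(chunk)
        record = {
            "name": os.path.basename(path),
            "bytes": size,
            "sha256": sha256.hexdigest(),
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }
        with open(catalog, "a") as out:   # append-only metadata catalog
            out.write(json.dumps(record) + "\n")
        return record
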
Figure 1.12 also illustrates that fundamentals for server, storage, and I/O networking need
to have a focus spanning hardware, software, services, tools, processes, procedures, techniques,
and tradecraft, even if the resulting solutions are cloud, converged, virtual, or nonconverged.
Remember, the primary role of data infrastructures and servers, storage, and I/O resources
is to protect, preserve, and serve existing as well as new applications and their data in a cost-
effective, agile, flexible, and resilient way.
Some additional considerations about data infrastructure management insight and strategy
on a go-forward basis include where applications will best be deployed. This also means where
the data is generated or primarily accessed from, along with how it is used and shared by other
applications across different locations.
With the increased deployment of mobile users and consumers of information services,
data infrastructures will also need to support traditional browser or virtual terminal–based
access, as well as smart device apps using iOS-, Android-, and Windows-based platforms,
among others.
Likewise, continued deployment of IoT devices will place additional demands on remote
and edge-based infrastructures, as well as central and core, including on-prem as well as cloud-
based, data infrastructures. Then there are the new technology trends to factor in, from AI to
blockchain distributed ledgers; compute offload including GPU, FPGA, and ASIC; to data
ponds, data lakes, and oceans of information to be considered in subsequent chapters.

1.4. Data Infrastructure Management Tradecraft


There are many different tasks, activities, and functions concerned with servers, storage, and
I/O hardware, software, and services, depending on your environment. Likewise, depending
on your focus area or job responsibility, you might be focused on lower-level detailed tasks,
or higher-level vision, strategy, architecture, design, and planning, or some combination of
these tasks.

In general, common management tasks and activities include, among others:

• Vision, strategy, architecture design, project planning, and management
• Configuration of hardware and software; change management
• Troubleshooting, diagnostics, and resolution
• Remediation; applying hardware or software updates
• Maintaining inventory of hardware and software resources
• Disposition of old items, resources, life-cycle management
• Migrations, conversions, movement, media maintenance
• What to buy, lease, rent, subscribe, or download for acquisition
• Data protection (archiving, backup, HA, BR, BC, DR, and security)
• Reporting, analysis, forecasting, chargeback or show-back
• Service-level agreements (SLA), service-level objective (SLO) management

Trends involving data infrastructure management tradecraft (skills) include capturing
existing experiences and skills from those who are about to retire or simply move on to some-
thing else, as well as learning for those new to IT or servers, storage, I/O, and data infrastruc-
ture hardware, software, and services. This means being able to find a balance of old and new
tools, techniques, and technologies, including using things in new ways for different situations.
Part of expanding your tradecraft skillset is knowing when to use different tools, tech-
niques, and technologies from proprietary and closed to open solutions, from tightly integrated
to loosely integrated, to bundled and converged, or to a la carte or unbundled components,
with do-it-yourself (DIY) integration. Tradecraft also means being able to balance when to
make a change of technology, tool, or technique for the sake of change vs. clinging to some-
thing comfortable or known, vs. leveraging old and new in new ways while enabling change
without disrupting the data infrastructure environment or users of its services.
Additional trends include the convergence of people along with their roles within organiza-
tions and data infrastructures. For some environments, there is a focus on specialization, such
as hardware or software focus, server, storage, I/O networking, operating system or hypervi-
sor-centric (e.g., VMware vs. Hyper-V, KVM, Xen), and Docker, Kubernetes, and Windows
containers.
Other focus areas include data protection, such as backup, BC, BR, DR, as well as security
and audit, in addition to performance, capacity planning, forecast, and modeling. In other
environments, there is a trend of having cross-functional teams that have a specialist who is
also a generalist in other areas, as well as a generalist who has one or more specialties. The idea
is to remove barriers to productivity that often are tied to products, technology, platforms, or
organizational political turf wars, among others.

Another trend is around the Development Operations (DevOps) movement, with variations
such as No Operations (NoOps), for both legacy and new startup environments. Some context
is needed around DevOps, in that it means different things for various
situations. For example, in some settings—notably, smaller, new, startup-type organiza-
tions—as its name implies, the people who develop, test, and maintain application code also
deploy, manage, and support their code. This approach of DevOps is different from some
traditional environments in which application code development, testing, and maintaining
is done in an organization separate from technical infrastructure (e.g., data infrastructure
and operations).
There are also hybrid variations—for example, some will see DevOps as new and different,
while others will have déjà vu from how they used to develop application code, push it into
production, and provide some level of support. Another variation of DevOps is development
for operations, or development for data infrastructure, or development areas (take care of the tools
and development platforms) or what some may remember as system programmers (e.g., sys
progs). For those not familiar, sys progs were a variation or companion to system admins who
deployed, patched, and maintained operating systems and wrote specialized drivers and other
special tools—not so different from those who support hypervisors, containers, private cloud,
and other software-defined data infrastructure topics.
Put another way, in addition to learning server storage I/O hardware and software trade-
craft, also learn the basic tradecraft of the business your information systems are supporting.
After all, the fundamental role of IT is to protect, preserve, and serve information that enables
the business or organization; no business exists just to support IT.

1.5. Data Infrastructure Terminology (Context Matters)


For some of you this will be a refresher; for others, an introduction of server and storage I/O talk
(terms and expressions). There are many different terms, buzzwords, and acronyms about IT
and particularly server storage I/O–related themes, some of which have more than one meaning.
This is where context matters; look for context clarification in the various chapters. For
example, SAS can mean serial attached SCSI in the context of physical cabling, or the protocol
command set, as a disk or SSD device, as a server to storage, or storage system to storage device
connectivity interface. SAS can also mean statistical analysis software for number crunching,
statistics, analysis, big data, and related processing. Yet another use of SAS is for shared access
signature as part of securing resources in Microsoft Azure cloud environments.
Context also matters in terms that are used generically, similar to how some people refer to
making a copy as making a Xerox, or getting a Kleenex vs. a tissue. Thus, a “mainframe” could
be legacy IBM “big iron” hardware and software or a minicomputer or large server running
Windows, Unix, Linux, or something other than a small system; others might simply call such a
system a server, or call a server the mainframe or the cloud, and vice versa (Hollywood and some
news venues tend to do this).
The following is not an exhaustive list nor a dictionary, rather some common terms and
themes that are covered in this book and thus provide some examples (among others). As
examples, the following also include terms for which context is important: a bucket may mean
one thing in the context of cloud and object storage and something else in other situations.
Likewise, a container can mean one thing for Docker and micro-services and another with
reference to cloud and object storage or archives.

ARM Can mean Microsoft public cloud Azure Resource Manager as well as a type
of compute processor.
BaaS Back-end as a Service, Backup as a Service, among others.
Buckets Data structures or ways of organizing memory and storage; for example, object
storage buckets (also known as containers) contain objects (or blobs) or items
being stored. There are also non-object or cloud storage buckets, including
database and file system buckets for storing data.
Containers Can refer to database or application containers, data structures such as
archiving and backup “logical” containers, folders in file systems, as well
as shipping containers in addition to Windows, Linux, Docker, or compute
micro-services.
Convergence Can mean packaging such as CI and HCI, as well as hardware and software,
server, storage, and networks bundled with management tools. Convergence
can occur at different layers for various benefits.
CSV Comma Separated Variables is a format for exporting, importing, and
interchange of data such as with spreadsheets. Cluster Shared Volumes for
Microsoft Windows Server and Hyper-V environments are examples.
DevOps Can refer to development for data infrastructure or technical operations,
as well as environments in which developers create, maintain, deploy, and
support code, once in production.
Flash Can refer to NAND flash solid-state persistent memory packaged in various
formats using different interfaces. Flash can also refer to Adobe Flash for
video and other multimedia viewing and playback, including websites.
Objects There are many different types of server and storage I/O objects, so context
is important. When it comes to server, storage, and I/O hardware and
software, objects do not always refer to object (or blob) storage, as objects
can refer to many different things.
Orchestration Can occur at different levels or layers and mean various things.
Partitions Hard disk drive (HDD), solid-state device (SSD), or other storage device and
aggregate subdividing (i.e., storage sharing); database instances, files, or tables
being partitioned; file system partitions; or logical partitions of server compute
and memory (tenancy).
PCI Payment Card Industry and PCIe (Peripheral Component Interconnect
Express).
PM Physical machine, physical memory, paged memory, persistent memory (aka
PMEM), program manager or product manager, among others.
RAID Can be hardware or software, system, array, appliance, software defined,
mirror and replication, parity and erasure code based.
Serverless Can refer to cloud-based micro-services such as AWS Lambda and other
Function as a Service (FaaS) offerings. The term is also used to describe software that is
available, sold, or not dependent on a particular brand of server. Keep in mind
that software still requires a physical server to exist somewhere—granted, it
can be masked and software defined.
SMB Small/Medium Business, refers to the size or type of an organization or target
market for a solution (or services). SMB is also the name for Server Message
Block protocols for file and data sharing, also known as CIFS (Common
Internet File System) aka Windows file sharing.
SME Small/Medium Enterprise, a type or size of organization; also refers to a
Subject Matter Expert skilled in a given subject area.
Virtual disk Abstracted disks such as from a RAID controller, software-defined storage
such as Microsoft Storage Spaces, as well as virtual machine including
Hyper-V VHD/VHDX, VMware VMDK, and qcow, among others.
VM Virtual Memory (Operating System Virtual Memory, Hypervisor Virtual
Memory, Virtual Machine Memory), Virtual Machine, Video Monitor.

You can find a more comprehensive glossary at the end of this book.

Tip: Knowing about a tool is important; however, so too is knowing how to use it, and when,
where, and for what. This means knowing the tools in your toolbox, but also
knowing when, where, why, and how to use a given tool (or technology), along with techniques
to use that tool by itself or in conjunction with multiple other tools.

Part of server storage I/O data infrastructure tradecraft is understanding what tools to use
when, where, and why, not to mention knowing how to improvise with those tools, find new
ones, or create your own.
Remember, if all you have is a hammer, everything starts to look like a nail. On the other
hand, if you have more tools than you know what to do with, or how to use them, perhaps fewer
tools are needed along with learning how to use them by enhancing your skillset and tradecraft.

1.6. Common Questions and Tips


A common question I get asked is “Why can’t we just have one type of data storage such as NAND
flash SSD, or put everything in the cloud?” My usual answer is that if you can afford to buy, rent,
or subscribe to a cloud or managed service provider (MSP) all-flash array (AFA) solution, then
do so.
While some organizations are taking the all-flash storage approach, most cannot afford to
do so yet, instead adopting a hybrid approach. The cloud is similar: Some have gone or will be
going all-cloud, others are leveraging a hybrid approach at least near-term. What about virtual
storage and storage virtualization for the software-defined data center? Sure, some environments
are doing more with software-defined storage (depending on what your definition is) vs. others.
What is the best storage? That depends on how you will be using the storage and with what
applications. Will the storage be shared across multiple servers or hypervisors? Do you need to
share files and data? What are the Performance, Availability, Capacity, and Economic (PACE)
requirements? With what does the storage need to coexist within your environment?
Do cloud and virtual servers need the cloud and virtual storage? If your cloud or virtual server is
running at or on a public cloud, such as Amazon or Rackspace, you may want to access their
cloud and virtual storage. If your virtual server is running on one of your own servers, you may
be fine using traditional shared storage.
Are clouds safe? They can be if you use and configure them in safe ways, and do your
due diligence to assess the security, availability, and privacy of a particular cloud
provider. Keep in mind that cloud data protection and security are a shared responsibility
between you and your service provider. There are things that both parties need to do to pre-
vent cloud data loss.
Why not move everything to the cloud? The short answer is data locality, meaning you will need
and want some amount of storage close to where you are using the data or applications—that
might mean a small amount of flash or RAM cache, or HDD, Hybrid, or similar. Another
consideration is available and effective network bandwidth or access.
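
As a quick sanity check on that bandwidth point, a back-of-the-envelope calculation shows why
bulk data movement to or from a cloud often takes far longer than expected. The link speeds,
data sizes, and efficiency factor below are hypothetical assumptions used only to illustrate
the math.

    def transfer_days(data_tb: float, link_mbps: float, efficiency: float = 0.7) -> float:
        """Estimate days to move data_tb terabytes over a link_mbps link.
        The efficiency factor approximates protocol overhead and contention (assumed)."""
        bits = data_tb * 8 * 10**12                    # decimal terabytes to bits
        seconds = bits / (link_mbps * 10**6 * efficiency)
        return seconds / 86_400

    print(round(transfer_days(10, 100), 1))    # 10 TB over 100 Mbps: roughly 13 days
    print(round(transfer_days(10, 1_000), 1))  # 10 TB over 1 Gbps: a bit over a day
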
With serverless so popular, why discuss legacy, on-prem, bare metal, virtual, or even cloud IaaS?
Good question. Serverless can make perfect sense for some environments, specific application
landscapes, or pieces of their workloads.
On the other hand, even though there is strong industry adoption (e.g., what the indus-
try likes to talk about) along with some good initial customer adoption (what customers are
doing), serverless on a bigger, broader basis is still new and not for everybody, at least not yet.
Given the diversity of various applications and environment needs, data infrastructures that
provide flexibility of different deployment models enable inclusiveness to align appropriate
resources to multiple needs.
What is the best data infrastructure? The best data infrastructure is the one that adapts to your
organization’s needs and is flexible, scalable, resilient, and efficient, as well as being effective to
enable productivity. That means the best data infrastructure may be a mix of legacy, existing,
and emerging software-defined technologies and tools, where new and old things are used in
new ways. This also means leveraging public cloud, containers, and serverless, among other
approaches where needed. On the other hand, depending on your needs, wants, or preferences,
the best data infrastructure may be highly distributed from edge to core, data center to the
cloud, or all existing only within a public cloud.

1.7. Strategies
Throughout this book, there are various tips, frequently asked questions (along with answers),
examples, and strategy discussions. All too often the simple question of what is the best is
asked, looking for a simple answer to what are usually complex topics. Thus, when some-
body asks me what is the best or what to use, I ask some questions to try and provide a more
informed recommendation. On the other hand, if somebody demands a simple answer to a
complicated subject, they may not get the answer, product, technology, tool, or solution they
wanted to hear about. In other words, the response to many data infrastructure and related
tools, technology, trends, and techniques questions is often, “It depends.”
Should your environment be software defined, public cloud, cloud native, or lift and
shift (e.g., migrate what you have to cloud or another environment)? What about converged
infrastructure vs. hyper-converged infrastructure (aggregated) vs. cluster or cloud in a box vs.
serverless or legacy? Take a step back: What services and resources are needed, and what modes
and models of service delivery? Perhaps yours is an environment where a one-size-fits-all
solution approach works; however, look at how different technologies and trends fit and adapt to
your environment's needs, rather than you having to work for the technology.
What data infrastructure resources do your applications and associated workloads need
regarding performance, availability (including data protection and security), capacity, and eco-
nomic considerations? This includes server compute (primary general purpose, GPU, ASIC,
FPGA, along with other offloads and specialized processors), memory (DRAM or SCM and
PMEM), I/O (local and wide area), storage (performance and capacity), along with manage-
ment tools.
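
One way to keep these considerations explicit is to record PACE criteria per application workload
so they can be compared against candidate resources and services. The following short Python
sketch is one possible structure; the field names and example values are illustrative assumptions,
not recommendations.

    from dataclasses import dataclass

    @dataclass
    class PACE:
        # Performance: required IOPS (or transactions) and latency ceiling
        iops: int
        latency_ms: float
        # Availability: uptime target and recovery point objective
        availability_pct: float
        rpo_minutes: int
        # Capacity: usable space needed plus growth headroom
        capacity_tb: float
        annual_growth_pct: float
        # Economics: budget ceiling for the service
        monthly_budget_usd: float

    # Hypothetical examples: a transactional database vs. an archive share
    oltp_db = PACE(iops=50_000, latency_ms=1.0, availability_pct=99.99,
                   rpo_minutes=5, capacity_tb=10, annual_growth_pct=20,
                   monthly_budget_usd=8_000)
    archive = PACE(iops=200, latency_ms=50.0, availability_pct=99.9,
                   rpo_minutes=1_440, capacity_tb=500, annual_growth_pct=35,
                   monthly_budget_usd=3_000)
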
Another consideration is what your service-level objectives (SLOs) and service-level agree-
ments (SLAs) are for different services provided as part of your data infrastructure. What met-
rics, reports, and management insight are available for chargeback billing, invoice reconciliation
and audits, and show back for informational purposes, along with "scare back." Note that "scare back" is a form
of show back in which information is provided on local on-prem or public cloud resource usage
along with costs that can cause alarm—as well as awareness—about actual spending.
For example, some of your data infrastructure customers decide to deploy their applications
in a public cloud, then bring the invoice to you to pay, as well as support. In other words, some-
body brings you the bill to pay for their cloud resources, and they want you to take care of it.
Some additional considerations include: do they know how much more or less the public cloud
costs are compared to the on-prem services costs? Likewise, how can you make informed deci-
sions about costs regarding public vs. private vs. legacy if you do not know what those fees are?
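
Even a rough show-back comparison can answer that question. The short Python sketch below
estimates the monthly cost of the same workload on-prem and in a public cloud from simple unit
rates; all rates and usage figures are hypothetical assumptions, not published pricing.

    def monthly_cost(vcpu_hours, gb_ram_hours, storage_gb, egress_gb, rates):
        """Very rough show-back estimate built from unit rates (all values hypothetical)."""
        return (vcpu_hours * rates["vcpu_hour"]
                + gb_ram_hours * rates["gb_ram_hour"]
                + storage_gb * rates["storage_gb_month"]
                + egress_gb * rates["egress_gb"])

    # Hypothetical unit rates for illustration only
    on_prem_rates = {"vcpu_hour": 0.02, "gb_ram_hour": 0.003,
                     "storage_gb_month": 0.05, "egress_gb": 0.00}
    cloud_rates = {"vcpu_hour": 0.04, "gb_ram_hour": 0.005,
                   "storage_gb_month": 0.10, "egress_gb": 0.09}

    usage = dict(vcpu_hours=4 * 730, gb_ram_hours=16 * 730,
                 storage_gb=2_000, egress_gb=500)

    print("on-prem: $%.2f" % monthly_cost(rates=on_prem_rates, **usage))
    print("cloud:   $%.2f" % monthly_cost(rates=cloud_rates, **usage))
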
In other words, part of a strategy is to avoid flying blind and have insight and awareness
into your application landscape, workload characteristics along with resource usage, and cost
to deliver or provide various services.

1.8. Chapter Summary


There is no such thing as an information recession—quite the opposite, in fact, with new as
well as existing applications creating, processing, moving, and storing data being kept long-
term. The result is that data infrastructures are evolving to meet the various application land-
scape and workload demands, deployment locations (edge, on-prem, cloud), and models (BM,
VM, cloud instance, container, serverless, CI, HCI, CiB, among others).
Keep in mind that data infrastructures—including legacy, SDDC, SDI, and SDDI—are
what exist inside physical data centers. Not only are data infrastructures composed of hard-
ware (servers, storage, I/O networks, and related equipment) along with software, manage-
ment tools, and services, but they are also defined by policies, procedures, templates, and best
practices established by people. The fundamental role of data infrastructures is to provide an
infrastructure environment platform on which applications transform data into information
services for users of those services.
Just as everything is not the same across different environments of various size, scope, or
type of business and industry, there are also multiple types of data infrastructures spanning
legacy to software defined, on-site and on-prem to off-site, co-lo, MSP, and across one or more
public cloud services. Likewise, there are many different application landscapes, from legacy to
emerging, with diverse workload characteristics. The different application workloads require
various data infrastructure resources (server compute, memory, I/O, storage space, software
licenses) along with performance, availability, capacity, and economic (PACE) considerations.
Also, keep in mind that data centers are still alive. However, their role and function are
changing from traditional on-prem to cloud, where they are referred to as availability zones (AZs).
Likewise, data infrastructures—in addition to existing on-prem in legacy data centers and in
clouds that define and support AZs—also live at the edge for traditional remote office/branch
office (ROBO) use as well as for emerging fog computing (e.g., think small cloud-like
functionality at or near the edge).
What this all means is that there is not a one-size-fits-all environment; application work-
load scenarios and data infrastructures need to be resilient, flexible, and scalable. This also
means that there are different strategies to leverage and insight to gain for effective data infra-
structure decision making.

General action items include:

• Avoid treating all data and applications the same.
• Know your applications PACE criteria and requirements as well as needs vs. wants.
• Don’t be afraid of cloud and virtualization.

Bottom line: The fundamental role of IT data infrastructures including server, storage I/O
connectivity hardware, and software and management tools is to support protecting, preserv-
ing, and serving data and information across various types of organizations.
