Data Infrastructure Management: Insights and Strategies
Greg Schulz
AN AUERBACH BOOK
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to
publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials
or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material repro-
duced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any
form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming,
and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copy-
right.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have
been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identifica-
tion and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface ix
Who Should Read This Book x
How This Book Is Organized x
Acknowledgments xi
About the Author xiii
Part One
Applications, Data, and IT Data Infrastructures 1
Part Two
Data Infrastructure Management 63
Part Three
Enabling Data Infrastructures 147
Glossary 213
Index 235
Preface
This book follows from my previous books, Software-Defined Data Infrastructure Essentials:
Cloud, Converged, and Virtual Fundamental Server Storage I/O Tradecraft * (aka “The Blue
Book”), Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures† (aka “The
Red Book”), The Green and Virtual Data Center ‡ (aka “The Green Book”), and Cloud and
Virtual Data Storage Networking § (aka “The Yellow, or Gold, Book”).
Data Infrastructure Management: Insights and Strategies provides fundamental coverage of
physical, cloud, converged, and virtual server storage I/O networking technologies, trends,
tools, techniques, and tradecraft skills. Software-defined data centers (SDDC), software data
infrastructures (SDI), software-defined data infrastructures (SDDI), and traditional data
infrastructures support business applications, including components such as server, storage, I/O
networking, hardware, software, services, and best practices, among other management tools.
Spanning cloud, virtual, container, converged (and hyper-converged), as well as legacy and
hybrid systems, data infrastructures exist to protect, preserve, and serve data and information.
Although there are plenty of new things, sometimes those new things get used in old ways,
and sometimes old things get used in new ways. As you have probably heard before, the one
thing that is constant is change; yet as things or technologies change, the ways they get used
may change or remain the same. A not-so-bold prophecy: next year will see even more new
things, not to mention old things being used in new ways.
For example, many technology changes or enhancements have occurred from the time I
started writing this book until its completion. There will be more from the time this goes to the
publisher for production, then until its release and you read it in print or electronically. That
* Schulz, G. (2017). Software-Defined Data Infrastructure Essentials: Cloud, Converged, and Virtual
Fundamental Server Storage I/O Tradecraft. 1st Edition. Taylor & Francis Group/CRC Press.
† Schulz, G. (2004). Resilient Storage Networks: Designing Flexible Scalable Data Infrastructures. 1st Edition.
Elsevier/Digital Press.
‡ Schulz, G. (2009). The Green and Virtual Data Center. 1st Edition. Taylor & Francis Group/CRC
Press.
§ Schulz, G. (2011). Cloud and Virtual Data Storage Networking. 1st Edition. Taylor & Francis Group/
CRC Press.
Acknowledgments
Thanks and appreciation to all of the vendors, VARs, service providers, press and media,
freelance writers and reporters, investors and venture capitalists, bloggers, and consultants,
as well as fellow Microsoft MVPs and VMware vExperts. Also, thanks to all the Twitter tweeps
and IT professionals around the world whom I have been fortunate enough to talk with while
putting this book together.
I would also like to thank all of my support network as well as others who were directly or
indirectly involved with this project.
Special thanks to Tom Becchetti and Greg Brunton. Thanks to John Wyzalek, my pub-
lisher, along with everyone else at CRC/Taylor & Francis/Auerbach, as well as a big thank you
to Theron Shreve at DerryField Publishing Services and his associates, Marje Pollack, Susan
Culligan, and Lynne Lackenbach, for working their magic.
Finally, thanks to my wife Karen (www.karenofarcola.com) for having the patience to sup-
port me and take care of “Little Buddy Bjorn” while I worked on this project.
To all of the above, and to you the reader, thank you very much.
About the Author
Greg Schulz is Founder and Senior Analyst of the independent IT advisory and consultancy
firm Server StorageIO (www.storageio.com). He has worked in IT at an electrical utility and at
financial services and transportation firms in roles ranging from business applications develop-
ment to systems management and architecture planning.
Greg is the author of the SNIA Emerald Endorsed reading book Software-Defined Data
Infrastructure Essentials (CRC Press, 2017), as well as the Intel Recommended Reading List
books Cloud and Virtual Data Storage Networking (CRC Press, 2011) and The Green and Virtual
Data Center (CRC Press, 2009) as well as Resilient Storage Networks (Elsevier, 2004), among
other works. He is a multi-year VMware vExpert as well as a Microsoft MVP (Cloud Data
Center Management) and has been an advisor to various organizations including CompTIA
Storage+ among others.
In addition to holding frequent webinars and online as well as live in-person speaking events,
and publishing articles and other content, Greg is regularly quoted and interviewed as one of the
most sought-after independent IT advisors, providing perspectives, commentary, and opinion on
industry activity. Greg is also a licensed FAA commercial drone pilot who generates GBytes of
big data every few minutes of flight time. You can view some of his video and other digital works
at www.picturesoverstillwater.com.
Greg has a B.A. in computer science and an M.Sc. in software engineering from the University
of St. Thomas. You can find him on Twitter @StorageIO; his blog is at www.storageioblog.com,
and his main website is www.storageio.com.
Part One
Applications, Data, and
IT Data Infrastructures
Part One comprises Chapters 1 and 2 and provides an overview of the book as well as key
concepts, including industry trends, different environments, and applications that rely on IT
data infrastructures. Data infrastructures are what reside inside data centers, including
software-defined data centers (SDDC), as well as legacy, software-defined virtual, cloud,
serverless, and hybrid environments, among others.
Buzzword terms, trends, technologies, and techniques include application landscapes,
AI, analytics, automation, big data, blockchain, bitcoin, backup, cloud, converged, container,
IoT, storage class memory (SCM), persistent memory (PMEM), data protection, performance,
availability, capacity, and economics (PACE), serverless, server storage I/O networking,
software-defined, structured and unstructured, among others.
Chapter 1
IT Data Infrastructure
Fundamentals
This opening chapter kicks off our discussion of IT data infrastructure management, strategy,
and insights. Key themes, buzzwords, and trends addressed in this chapter include server stor-
age I/O resources, application and data demand drivers, fundamental needs and uses of data
infrastructures, and associated technologies.
Our conversation is going to span hardware, software, services, tools, techniques, cloud,
serverless, container, SDDI, SDDC, SDC, and industry trends along with associated applica-
tions across different environments. Depending on when you read this, some of the things
discussed will be more mature and mainstream, while others will be nearing or have reached
the end of their usefulness.
• Clouds: public, private, virtual private, and hybrid servers, services, and storage.
• Context matters regarding server, storage I/O, and associated topics.
• The best server, storage, and I/O technology depends on what your needs are.
Figure 1.1 Users, Application Landscapes, Data Infrastructures, and Data Centers
When you hear the terms software-defined data center (SDDC), software-defined infrastructure
(SDI), and software-defined data infrastructure (SDDI), among other variations, they refer to
what's inside the data center or availability zone (AZ). Data infrastructure resources, includ-
ing hardware, software, and service resources, are defined to support applications, along with
their workload needs, to provide information services to users, clients, and customers of those
capabilities (across the top of Figure 1.1).
Figure 1.1 shows how data infrastructures are what is inside physical data centers support-
ing various application landscapes to provide information services to users, customers, con-
sumers, or devices. Data centers—also known as habitats for technology—are where the servers,
storage, I/O network hardware, along with power, cooling, and other equipment exist, all of
which are defined by software to provide platforms for various applications.
Application landscapes are the various workloads that are deployed upon and use data
infrastructure resources. Different application landscapes range from traditional operational
and transaction-oriented database systems to email and messaging, business intelligence (BI),
and analytics, along with enterprise resource planning (ERP), such as those from SAP, among
others.
Other application landscapes include development environments, workspace, and virtual
desktop infrastructures (VDI), as well as big data, batch, and real-time analytics; augmented
reality (AR), virtual reality (VR), gaming, and other immersive workloads. Some different
landscapes include artificial intelligence (AI), machine learning (ML), deep learning (DL),
and cognitive, along with the Internet of Things (IoT) and Internet of Devices, among others.
Keep in mind that all applications have some software code along with data and data
structures that define how information gets created, transformed, and stored. Some data infra-
structure decisions include where applications code and data get deployed, as well as what
granularity of resources is needed.
For example, does a given application landscape, such as SAP, need a cluster of servers
deployed as bare metal (BM) or Metal as a Service (MaaS), or as software-defined virtual
machines (VM) or Docker (or Windows) containers managed by Kubernetes (K8s), Mesos,
OpenShift, or OpenStack, among others? Perhaps the application can be deployed into a
serverless or Function as a Service (FaaS) deployment model, among others.
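As a concrete illustration of the FaaS model, the sketch below shows a minimal Python function of the kind you would push to a serverless platform. The `handler(event, context)` signature follows the AWS Lambda Python convention; the event fields used here are hypothetical.

```python
import json

def handler(event, context=None):
    """Entry point invoked by the FaaS platform; there is no server for
    you to provision or manage. `event` carries the request payload, and
    `context` (unused here) would hold platform-supplied runtime metadata."""
    name = event.get("name", "world")  # hypothetical payload field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

You push only the function; the platform provisions capacity, scales instances, and typically bills per invocation.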
Data infrastructure management insight and strategy topics include decision making,
along with making sure that applications, along with their data, configuration settings, and
resources, are protected, secured, preserved, and served as needed.
Data infrastructure decision making includes whether to do it yourself (DiY), buy, rent, or
subscribe to a license from the software vendor, as well as from a services and hardware per-
spective. Decision making also includes what resources are needed, as well as where to place
different workloads to meet their service-level objectives (SLOs) and other service criteria.
Additional data infrastructure management tasks include maintaining insight, situational
awareness, proactive monitoring, problem resolution and remediation, ongoing maintenance,
policy management, data protection including backup, business continuance (BC), business
resiliency (BR), high availability (HA), disaster recovery (DR), and security, among others
discussed later.
Likewise, the fundamental role of data storage (“storage”) is to provide persistent memory
for servers to place data to be protected, preserved, and served. Connectivity for moving data
between servers and storage, from servers to servers, or from storage to storage is handled via
I/O networks (internal and external). There are different types of servers, storage, and I/O
networks for various environments, functionalities, as well as application or budget needs.
Figure 1.2 shows a very simplistic, scaled-down view of data infrastructure resources,
including servers, storage, and I/O components supporting applications being accessed via
tablets, mobile devices, laptops, virtual desktop infrastructure (VDI), workstations, IoT/IoD
Supervisory Control and Data Acquisition (SCADA), sensors, computer numerical control
(CNC), cameras, autonomous vehicles, robotics, and other servers. Also shown in Figure 1.2
is storage (internal or external, dedicated and shared) being protected by some other storage
system (or service).
Keep in mind that Figure 1.2 is a simple view of the data infrastructure resources "big picture"
(e.g., what's inside data centers). We could scale it down even further to a laptop or tablet or,
in the opposite direction, up to a large web-scale or cloud environment with tens of thousands
of servers, storage, and I/O components, hardware, and software.
A fundamental theme is that servers process data using various applications programs to
create information; I/O networks provide connectivity to access servers and storage; storage is
where data gets stored, protected, preserved, and served from; and all of this needs to be man-
aged. There are also many technologies involved, including hardware, software, and services as
well as various techniques that make up a server, storage, and I/O enabled data infrastructure.
Applications are what transform data into information. Figure 1.3 shows how applications,
which are software defined by people and software, consist of algorithms, policies, procedures,
Figure 1.3 How data infrastructure resources transform data into information.
and rules that are put into some code to tell the server processor (central processing unit
[CPU]) what to do. Note that CPUs can include one or more cores per socket, with one or
more sockets per server. Also note that there can be one or more threads of execution per core.
There are also physical CPUs as well as virtual CPUs (vCPUs) provided via software-defined
virtual and cloud environments, as well as logical processors (LPs), which refer to core threads.
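To make the socket, core, and thread arithmetic concrete, here is a small sketch; the counts are hypothetical, and only `os.cpu_count()` reflects the machine actually running it:

```python
import os

# Logical processors (LPs) visible to the OS are the product of sockets,
# cores per socket, and hardware threads per core.
sockets_per_server = 2    # hypothetical two-socket server
cores_per_socket = 16
threads_per_core = 2      # e.g., simultaneous multithreading enabled

logical_processors = sockets_per_server * cores_per_socket * threads_per_core
print(logical_processors)  # 64 LPs (presented as vCPUs in virtualized views)

# The OS reports its own LP count for the machine running this code:
print(os.cpu_count())
```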
Application programs include data structures (not to be confused with infrastructures)
that define what data looks like and how to organize and access it using the “rules of the road”
(the algorithms). The program algorithms along with data structures are stored in memory,
together with some of the data being worked on (i.e., the active working set).
Persistent data is stored in some form of extended memory—storage devices such as non-
volatile memory (NVM), also known as storage class memory (SCM), as well as persistent
memory (PMEM). Other storage includes solid-state devices (SSD), hard disk drives (HDD),
or tape, among others, either locally (also known as on-prem, on-premises, or on-site) or
remotely. Also shown in Figure 1.3 are various devices that perform input/output (I/O) with
the applications and server, including mobile devices as well as other application servers. In
Chapter 2 we take a closer look at various applications, programs, and related topics.
Serverless is another favorite buzz topic that some marketers use to describe software that is
independent of hardware or that uses fewer servers. More precisely, discussions center around
data infrastructures (cloud-based or otherwise) that remove the concept of using and managing
a server: merely push your code and it runs. Serverless, also known as Function as a Service
(FaaS), is often implemented on top of container technologies such as Docker, Windows
containers, and Kubernetes, among others.
Figure 1.4 takes a slightly closer look at the server storage I/O data infrastructure, revealing
different components and focus areas that will be expanded on throughout this book.
Some buzz and popular trending topics, themes, and technologies include, among others:
• NVM, NVM Express (NVMe), SCM, PMEM, including NAND flash SSD
• Software defined data centers (SDDC), networks (SDN), and storage (SDS)
• Converged infrastructure (CI), hyper-converged infrastructure (HCI)
• Serverless, Function as a Service (FaaS), blockchain, distributed ledgers
• Scale-out, scale-up, and scale-down resources, functional and solutions
• Virtualization, containers, cloud, and operating systems (OS)
• NVM express (NVMe), NVMe over Fabric (NVMeoF), GPU, PCIe, IP, and Gen-Z
accessible storage
• Block, file, object, and application program interface (API) accessed storage
• Data protection, business resiliency (BR), archiving, and disaster recovery (DR)
In Figure 1.4 there are several different focus topics that enable access to server and storage
resources and services from various client or portable devices, including access via server
I/O networks.
Figure 1.4 Data infrastructure server, storage, and I/O network resources.
Besides technologies, tools, and trends, another topic is where to place resources: on-prem,
off-site in a cloud, co-location (co-lo), or managed service provider, and at what granularity.
Granularities can include serverless, FaaS, SaaS, AaaS, PaaS, container, VM or cloud instance,
dedicated instance, physical machine, BM or MaaS, among others.
Also, keep in mind that hardware needs software and software needs hardware.
All of these are critical to consider, along with other applications that all rely on some
underlying hardware (bare metal, virtual, or cloud abstracted). There are various types of
servers, storage, and I/O networking hardware as well as software that have various focus areas
or specialties, which we will go deeper into in later chapters.
There is a lot of diversity across the different types, sizes, focus, and scope of organizations.
However, it’s not just about size or scale; it’s also about the business or organizational focus,
applications, needs (and wants), as well as other requirements and constraints, including a
budget. Even in a specific industry sector such as financial services, healthcare or life science,
media and entertainment, or energy, among others, there are similarities but also differences.
While everything related to server storage I/O is not the same, there are some commonalities
and similarities that appear widely.
Figure 1.5 shows examples of various types of environments where servers, storage, and I/O
have an impact on the consumer, from small office/home office (SOHO) to SMB (large and
small), workgroup, remote office/branch office (ROBO), or departmental, to small/medium
enterprise (SME) to large web-scale cloud and service providers across private, public, and
government sectors.
In Figure 1.5 across the top, going from left to right, are various categories of environments
also known as market segments, price bands, and focus areas. These environments span dif-
ferent industries or organizational focus from academic and education to healthcare and life
sciences, engineering and manufacturing to aerospace and security, from media and entertain-
ment to financial services, among many others.
Also shown in Figure 1.5 are some common functions and roles that servers, storage, and
I/O resources are used for, including network routing and access and content distribution
networks (CDN). Other examples include supercomputing, high-performance computing
(HPC), and high-productivity computing. Some other applications and workloads shown in
Figure 1.5 include general file sharing and home directories, little data databases along with
big data analytics, email and messaging, among many others discussed further in Chapter 2.
Figure 1.5 also shows that the role of servers (processors [sockets, cores, threads, ASIC,
FPGA, GPU], memory, and I/O connectivity) combined with some storage can be deployed in
Figure 1.5 Different environments with applications using data infrastructure resources.
various ways, from preconfigured engineered packaged solutions to build-your-own systems
leveraging open-source software and readily available (aka commodity or white-box) hardware.
Besides different applications, industry, and market sectors concerned with server, storage,
and I/O topics, there are also various technology and industry trends. These include, among
others, analytics, application-aware, automation, cloud (public, private, community, virtual
private, and hybrid), cluster-in-box (CiB), HCI, CI, containers, and serverless.
Other data infrastructure and applications include data center infrastructure management
(DCIM), data lakes, data ponds, data pools and data streams, data protection and security,
HCI, insight and reporting, little data, big data, big fast data, very big data, management,
orchestration, policies, software defined, structured data and unstructured data, templates,
virtual server infrastructures (VSI), and VDI, among others.
• Server-side memory and storage (storage in, adjacent to, or very close to the server)
• NVM including 3D NAND flash, 3D XPoint, and others such as phase change memory
(PCM), DRAM, and various emerging persistent and nonpersistent memory technologies
• SCM, which have the persistence of NVM storage and the performance as well as dura-
bility of traditional server DRAM
• I/O connectivity including IoT (Hubs, gateways, edge, and management interfaces),
PCIe, Gen-Z (emerging compute I/O interface), NVMe including NVMeoF, along with
its variations, SAS/SATA, InfiniBand, Converged Ethernet, RDMA over Converged
Ethernet (RoCE), block, file, object, and API-accessed storage
• Data analytics including batch and real-time analytics, Lambda architectures, Hadoop,
Splunk, SAS, Snowflake, AI/ML/DL along with other cognitive workloads, Horton-
works, Cloudera, and Pivotal, among others
• Databases and key-value repositories including blockchain distributed ledgers, SQL
(AWS RDS including Aurora, IBM DB2, Microsoft SQL Server and Azure Cosmos,
Depending on your role or focus, you may have a different view than somebody else of what is
infrastructure, or what an infrastructure is. Generally speaking, people tend to refer to infra-
structure as those things that support what they are doing at work, at home, or in other aspects
of their lives. For example, the roads and bridges that carry you over rivers or valleys when
traveling in a vehicle are referred to as infrastructure.
Similarly, the system of pipes, valves, meters, lifts, and pumps that bring fresh water to
you, and the sewer system that takes away waste water, are called infrastructure. The telecom-
munications network—both wired and wireless, such as cell phone networks—along with
electrical generating and transmission networks are considered infrastructure. Even the planes,
trains, boats, and buses that transport us locally or globally are considered part of the trans-
portation infrastructure. Anything that is below what you do, or that supports what you do,
is considered infrastructure.
This is also the situation with IT systems and services where, depending on where you sit or
use various services, anything below what you do may be considered infrastructure. However,
that also causes a context issue in that infrastructure can mean different things. For example, in
Figure 1.6, the user, customer, client, or consumer who is accessing some service or application
may view IT in general as infrastructure, or perhaps as business infrastructure.
Those who develop, service, and support the business infrastructure and its users or clients
may view anything below them as infrastructure, from desktop to database, servers to storage,
network to security, data protection to physical facilities. Moving down a layer in Figure 1.6 is
the information infrastructure which, depending on your view, may also include servers, storage,
and I/O hardware and software.
Figure 1.7 Data infrastructures: server storage I/O hardware and software.
For our discussion, to help make a point, let’s think of the information infrastructure as
the collection of databases, key-value stores, repositories, and applications along with develop-
ment tools that support the business infrastructure. This is where you may find developers who
maintain and create actual business applications for the business infrastructure. Those in the
information infrastructure usually refer to what’s below them as infrastructure. Meanwhile,
those lower in the stack shown in Figure 1.6 may refer to what's above them as the customer,
user, or application, even if the actual user is up another layer or two.
Context matters in the discussion of infrastructure. Data infrastructures support the data-
bases and applications developers as well as things above, while existing above the physical
facilities infrastructure, leveraging power, cooling, and communication network infrastruc-
tures below.
Figure 1.7 shows a deeper look into the data infrastructure shown at a high level in
Figure 1.6. The lower left of Figure 1.7 shows the common-to-all-environments hardware,
software, people, processes, and practices that comprise tradecraft (experiences, skills, tech-
niques) and “valueware.” Valueware is how you define the hardware and software along with
any customization to create a resulting service that adds value to what you are doing or sup-
porting. Also shown in Figure 1.7 are common application and services attributes including
performance, availability, capacity, and economics (PACE), which vary with different applica-
tions or usage scenarios.
• Some things scale up, others scale down; some can't scale up or down.
• Data protection includes security, protection copies, and availability.
• The amount (velocity), as well as size (volume) of data continues to grow.
Figure 1.8 shows the fundamental pillars or building blocks for a data infrastructure,
including servers for computer processing, I/O networks for connectivity, and storage for stor-
ing data. These resources include both hardware and software, as well as services and tools.
The size of the environment, organization, or application needs will determine how large or
small the data infrastructure is or can be.
For example, at one extreme you can have a single high-performance laptop with a hyper-
visor running OpenStack and various operating systems, along with their applications, leverag-
ing flash SSD and high-performance wired or wireless networks powering a home lab or test
environment.
Another example can be SDDC software such as VMware running on Amazon Web
Services (AWS) dedicated aka bare metal MaaS systems, or Microsoft Azure Stack with Azure
software running on an on-prem appliance. On the other hand, you can have a scenario with
tens of thousands (or more) servers, networking devices, and hundreds of petabytes (PBs), exa-
bytes (EB), or zettabytes (ZB) of storage (or more).
A reminder that a gigabyte is 1,000 megabytes (a megabyte being a million bytes), a terabyte is
1,000 gigabytes, and a PB is 1,000 terabytes. Also, keep in mind that bytes are counted in binary
base 2 (e.g., 1,024) or decimal base 10 (e.g., 1,000). Keep context as to whether 1,024 or 1,000 is
meant by a thousand, and whether the units are bits, little b (e.g., Kb [thousand bits]), or bytes,
big B (e.g., KB [thousand bytes]).
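The decimal versus binary distinction, and little-b bits versus big-B bytes, can be checked with a few lines of Python:

```python
# Decimal (base 10) units, as typically used by drive vendors:
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
# Binary (base 2) units, as often reported by operating systems:
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

# A "1 TB" drive marketed in decimal terms holds fewer binary units:
print(TB / TiB)        # ~0.909, i.e., roughly a 9% difference

# Little b is bits, big B is bytes (8 bits per byte):
Kb = 10**3             # a kilobit
print(KB / (Kb / 8))   # 8.0: a kilobyte holds 8 times the bits of a kilobit
```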
In Figure 1.8 the primary data infrastructure components or pillar (server, storage, and
I/O) hardware and software resources are packaged and defined to meet various needs. Data
infrastructure storage management includes configuring the server, storage, and I/O hardware
and software as well as services for use, implementing data protection and security, provision-
ing, diagnostics, troubleshooting, performance analysis, and other activities. Server storage
and I/O hardware and software can be individual components, prepackaged as bundles or
application suites and converged, among other options. Note that data infrastructure resources
can be deployed at the edge for traditional as well as emerging “fog” computing scenarios, as
well as at data centers and clouds as core sites.
Fast applications need faster software (databases, file systems, repositories, operating systems,
hypervisors), servers (physical, virtual, cloud, container, serverless, converged, and hyper-
converged), and storage and I/O networks. Servers provide the compute or computational
capabilities to run application and other data infrastructure software. Data infrastructure
software includes lower-level drivers, operating systems, hypervisors, containers, storage,
networking, file systems, and databases, along with other management tools. Servers and their
applications software manage and process data into information by leveraging local as well as
remote storage accessed via I/O networks.
Server compute processing consists of one or more sockets (which compute processor chips plug
into). Processor chips include one or more cores that can run one or more threads (workload
code). In addition to the compute capabilities, there are also memory management and access,
as well as I/O interconnects. Server compute sockets and support chips are arranged on a mother
or main board that may also have optional daughter or mezzanine boards for extra resources.
Server compute resources also include offload processors such as graphics processing units
(GPUs) that handle compute-intensive operations for graphics, video, image processing, and
AI/ML/DL analytics, among other workloads. Other processing resources include custom
application-specific integrated circuits (ASIC) and field-programmable gate arrays (FPGA).
Application and computer server software is installed, configured, and stored on storage.
That storage may be local or external dedicated, or shared. Servers leverage different tiers
of memory from local processor cache to primary main dynamic random access memory
(DRAM). Memory is storage, and storage is a persistent memory. Note that there are also
SCMs (storage class memories) that are also referred to as persistent memory (PMEM), meaning
they are not volatile as is DRAM. SCM and PMEM are packaged as DIMM (e.g., NVDIMM)
as well as PCIe add-in card (AiC) and drive form factors, among others.
Memory is used for holding both applications and data, along with operating systems,
hypervisors, device drivers, as well as cache buffers. Memory is staged or read and written to
data storage that is a form of persistent memory ranging from NVM NAND flash SSD to
HDD and magnetic tape, among others.
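The staging of data between memory and persistent storage described above can be sketched as a minimal read-through, write-through cache. This is an illustrative Python sketch, not any particular product's implementation; the class and attribute names are hypothetical:

```python
class ReadThroughCache:
    """Minimal sketch: a DRAM-like cache in front of persistent storage.

    The dict 'backing' stands in for persistent storage (SSD/HDD);
    'cache' stands in for main memory. Real systems add eviction,
    dirty tracking, and tiering policies omitted here.
    """

    def __init__(self, backing: dict):
        self.backing = backing
        self.cache: dict = {}

    def read(self, key):
        if key not in self.cache:
            # Cache miss: stage the data from persistent storage into memory.
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def write(self, key, value):
        # Write-through: update memory and persist to backing storage.
        self.cache[key] = value
        self.backing[key] = value
```

A write-through design keeps the backing store consistent at all times, at the cost of paying storage latency on every write; write-back designs defer that cost but need dirty-data tracking.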
Server compute and memory, along with associated I/O network and storage, are packaged and deployed in various ways and granularities. Some packaging and granularity examples include standalone systems (rack or tower), scale-out clusters, appliances, physical
and virtual machine (software defined), cloud instance (form of VM), containers, serverless,
composable, CI, HCI, and CiB, among others.
1.2.1.2. Data
Servers or other computers need storage, storage needs servers, and I/O networks tie the two
together. The I/O network may be an internal PCIe or memory bus, or an external Wi-Fi
IT Data Infrastructure Fundamentals 15
network for IP connection, or use some other interface and protocol. Data and storage may be
coupled directly to servers or accessed via a networked connection to meet different application
needs. Also, data may be dedicated (affinity) to a server or shared across servers, depending on
deployment or application requirements.
While data storage media are usually persistent or non-volatile, they can be configured
and used for ephemeral (temporary) data or for longer-term retention. For example, a policy
might specify that data stored in a certain class or type of storage does not get backed up, is not replicated, does not have high-availability (HA) or other basic protection, and will be deleted or purged periodically. Storage is used and consumed in various ways. Persistent storage includes NVM such as flash-, PMEM-, and SCM-based SSDs, along with magnetic disk, tape, and optical, among other forms. Data storage can be accessed as a service including tables (databases),
message queues, objects and blobs, files, and blocks using various protocols and interfaces.
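The ephemeral-data policy described above might be expressed as a simple per-class policy record. This is a hedged sketch; the field names and the 30-day value are hypothetical, not from the text:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageClassPolicy:
    """Per-storage-class protection and retention policy (illustrative)."""
    name: str
    backed_up: bool
    replicated: bool
    high_availability: bool
    purge_after_days: Optional[int]  # None means retain indefinitely


# An ephemeral "scratch" class: no backup, replication, or HA,
# with data purged periodically, as in the example policy above.
scratch = StorageClassPolicy(
    name="scratch",
    backed_up=False,
    replicated=False,
    high_availability=False,
    purge_after_days=30,
)
```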
Servers network and access storage devices and systems via various I/O connectivity options
or data highways. Some of these are local or internal to the server, while others can be exter-
nal over a short distance (such as in the cabinet or rack), across the data center, or campus, or
metropolitan and wide area, spanning countries and continents. Once networks are set up,
they typically are used for moving or accessing devices and data with their configurations
stored in some form of storage, usually non-volatile memory or flash-based. Networks can
be wired using copper electrical cabling or fiber optic, as well as wireless using various radio
frequency (RF) and other technologies locally, or over long distances.
There are also various levels of abstraction, management, and access, such as via block, file,
object, or API. Note the distinction between shared data and data sharing: access may be internal dedicated, external dedicated, or external shared and networked. In addition to various ways of consuming, storage can also be
packaged in different ways such as legacy storage systems or appliances, or software combined
with hardware (“tin wrapped”).
Other packaging variations include virtual storage appliance (VSA), as a cloud instance or
service, as well as via “shrink wrap” or open-source software deployed on your servers. Servers
and storage hardware and software can also be bundled into containers (Docker, Windows,
Kubernetes, Openshift), CI, HCI, and CiB, similar to an all-in-one printer, fax, copier, and
scanner that provide converged functionality.
Various software tools (along with some physical hardware tools) are used for managing data
infrastructures along with their resources. Some of these tools are used for defining, coordi-
nating, orchestrating (or choreographing) various resources to the requirements of services and
applications that need them. Other tools are used for monitoring, reporting, gaining insight,
situational awareness of how services are being delivered, customer satisfaction, responsiveness,
16 Data Infrastructure Management: Insights and Strategies
availability, cost, and security. Other management tools are used for defining how, when,
and where data protection—including BC, BR, DR, HA, and backups—is done, along with
implementing security.
There is no information recession; more and more data is being generated and processed that needs to be protected, preserved, and served. With increasing volumes of data of various
sizes (big and little), if you simply do what you have been doing in the past, you better have a
big budget or go on a rapid data diet.
On the other hand, if you start using those new and old tools in your toolbox, from disk
to flash and even tape along with cloud, leveraging data footprint reduction (DFR) from the
application source to targets including archiving as well as deduplication, you can get ahead
of the challenge.
Figure 1.9 shows data being generated, moved, processed, stored, and accessed as informa-
tion from a variety of sources, in different locations. For example, video, audio, and still image
data are captured from various devices or sensors including IoT based, copied to local tablets
or workstations, and uploaded to servers for additional processing.
Besides primary data, additional telemetry, log, and metadata are also collected for
processing.
As an example, when I fly one of my drones that records video at 4K (4096 × 2160, or Ultra HD 3840 × 2160 resolution) at 60 frames per second (60 fps), the primary video data is about 1 GB every two minutes. This means a 22-minute flight produces about 11 GB of video, plus any lower-resolution thumbnails for preview, along with telemetry.
The telemetry data includes time, altitude, attitude, location coordinates, battery status, and camera and other sensor status, which for a 22-minute flight can be a couple of MBs. Other
telemetry and metadata include event as well as error logs among additional information.
While the primary data (e.g., video) is a few large files that can be stored as objects or blobs,
the metadata can be many small files or stream logs.
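The sizing arithmetic above can be sketched as a quick estimate, assuming the roughly 1 GB per two minutes of 4K/60fps video cited in the text (actual rates vary with codec and bitrate; the function name is invented for illustration):

```python
def flight_video_gb(minutes: float, gb_per_two_minutes: float = 1.0) -> float:
    """Estimate primary video storage for a drone flight.

    Assumes ~1 GB of 4K 60fps video every two minutes, per the
    example in the text; thumbnails and telemetry are extra.
    """
    return (minutes / 2.0) * gb_per_two_minutes


print(flight_video_gb(22))  # 11.0 -- a 22-minute flight: about 11 GB of video
```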
Data is also captured from various devices in medical facilities such as doctors’ offices,
clinics, labs, and hospitals, and information on patient electronic medical records (EMR) is
accessed. Digital evidence management (DEM) systems provide similar functionalities, sup-
porting devices such as police body cameras, among other assets. Uploaded videos, images,
and photos are processed using AI, ML, DL, and other cognitive services in real time or batch.
The upload is indexed, classified, checked for copyright violations using waveform analysis or
other software, among other tasks, with metadata stored in a database or key-value repository.
The resulting content can then be accessed via other applications and various devices. These
are very simple examples that will be explored further in later chapters, along with associated
tools, technologies, and techniques to protect, preserve, and serve information.
Figure 1.9 shows many different applications and uses of data. Just as everything is not the
same across different environments or even applications, data is also not the same. There is
“little data” such as traditional databases, files, or objects in home directories or shares along
with fast “big data.” Some data is structured in databases, while other data is unstructured in
file directories.
Data also may have different values at different times. Very often, context is needed: Data can
represent a value or a piece of information, but one must also consider the value or worth of
that data item to the organization. The three basic types of data value are no value, has value, and unknown value.
Data that has unknown value may eventually have some value or no value. Likewise, data
that currently has some form of value may eventually have no value and be discarded, or may
be put into a pool or parking place for data with known value until its new value is determined
or the time has come to delete the data. In addition to the three basic types of data value, there
can be different varieties of data with a value, such as high-value or low-value.
Besides the value of data, access is also part of its life cycle, which means the data may be
initially stored and updated, but later becomes static, to be read but not modified. We will
spend more time in Chapter 2 looking at different types of applications in terms of data values,
life cycles, and access patterns.
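The three basic data-value types and their lifecycle handling might be sketched as a simple mapping. The action strings here are illustrative assumptions, not the book's prescriptions:

```python
from enum import Enum


class DataValue(Enum):
    """The three basic types of data value described above."""
    NO_VALUE = "no value"
    HAS_VALUE = "has value"
    UNKNOWN = "unknown value"


def lifecycle_action(value: DataValue) -> str:
    """Map a data-value type to an illustrative lifecycle action."""
    if value is DataValue.NO_VALUE:
        return "delete or purge"
    if value is DataValue.HAS_VALUE:
        return "protect, preserve, and serve"
    # Unknown value: park the data in a pool until its worth is determined.
    return "park pending classification"
```

In practice, the HAS_VALUE case would be further subdivided (e.g., high-value vs. low-value) with corresponding tiers of protection and placement.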
• Creation—The Internet of Things (IoT) and the Internet of Devices (IoD), such as sensors and other devices including cameras, phones, tablets, drones, satellites, imagery,
telemetry, and log detail data, legacy creation, and machine-to-machine (M2M), along
with AI, ML, DL, and other cognitive, batch, and real-time analytics in support of digi-
tal transformation.
• Curation, Transformation—Transformation, analytics, and general processing, across
different sectors, industries, organizations, and focus areas. Also tagging and adding
metadata to data where applicable to add value and context.
In addition to the above drivers, there are also various industry- or organization-focused
applications including government, police and military, healthcare and life sciences, security
and surveillance video, media and entertainment, education, research, manufacturing, energy
exploration, transportation, finance, and many others.
A trend has been a move from centralized computer server and storage to distributed and edge,
then to centralized via consolidation (or aggregation), then back to distributed over different
generations from mainframe to time-sharing to minicomputers to PCs and client servers, to the
web and virtualized to the cloud. This also includes going from dedicated direct attached stor-
age to clustered and shared storage area networks (SAN) or network attached storage (NAS) to
direct attached storage (DAS), to blob and object storage, virtual and cloud storage, and back
to direct attached storage, among other options.
What this all means is that from a server storage I/O hardware, software, and services
perspective, consideration is needed for what is currently legacy, how to keep it running or
improve it, as well as what might be a shiny new thing today but legacy tomorrow, while also
keeping an eye on the future.
This ties into the theme of keeping your head in the clouds (looking out toward the future)
but your feet on the ground (rooted in the reality of today), and finding a balance between what
needs to be done and what you want to do. Figure 1.10 shows, for example, how enterprise and
other storage options are evolving, taking on new or different roles while also being enhanced.
Some of the changes that are occurring include traditional enterprise-class storage systems
being used for secondary roles or combined with software-defined storage management tools,
virtualization, cloud, and other technologies. Also shown in Figure 1.10, at a high level, is how traditional lower-end or open technologies are handling more enterprise workloads.
Common to what is shown in Figure 1.10 as well as in other scenarios is that the trend of
using commercial off-the-shelf (COTS) servers continues to evolve. COTS are also known as
commodity, industry standard, and white box products, with motherboards, processors, and
memory. This means that servers are defined to be data and storage systems or appliances, as
well as being application servers with different hardware configurations to support those needs.
Besides physical servers, many cloud providers also offer cloud machines (instances) with vari-
ous standard configurations ranging from high-performance computing to memory- or I/O-
intensive, among other options.
Another trend centers around convergence (all-in-ones and others), where server, storage,
and I/O hardware is designed with software to support applications as well as storage and
management functions as turnkey solutions. We shall have more to say about these and other
trends later.
Figure 1.11 Evolving server and storage I/O leveraging new technologies.
Understanding where we have been and where we are currently helps us see where we are going. At first glance, Figure 1.11 may look similar to what you have seen in the
past, but what’s different and will be covered in more depth in subsequent chapters is the role
of servers being defined as storage. In addition to being defined as media content servers for
video, audio, or other streaming applications, as a database, email, the web, or other servers,
servers are also increasingly being leveraged as the hardware platform for data and storage
applications.
A closer look at Figure 1.11 will also reveal new and emerging non-volatile memory that can
be used in hybrid ways, for example, as an alternative to DRAM or NVRAM for main memory and cache in storage systems or networking devices, as well as a data storage medium. Also shown in Figure 1.11 are new access methods such as the emerging Gen-Z, along with NVMe
as well as 100 GbE (and faster) to reduce the latency for performing reads and writes to fast
NVM memory and storage. NVMe, while initially focused for inside the server or storage
system, will also be available for external access to storage systems over low-latency networks.
Building off of what is shown in Figure 1.11, Figure 1.12 provides another variation of how
servers will be defined to be used for deployments as well as how servers provide the processing
capability for running storage application software. In Figure 1.12 the top center is a shrunk-
down version of Figure 1.11, along with servers defined to be storage systems or appliances for
block SAN, NAS file, and Object, Blob, or table (database), or other endpoint API including
converged, nonconverged, and virtual and cloud-based applications.
Figure 1.12 shows how the primary data infrastructure building block resources, including
server, storage, I/O networking hardware, software, and services, combine to support various
Figure 1.12 Servers as storage and storage as servers, converged and nonconverged.
needs. Not shown in detail is the trend toward protecting data at ingest vs. the past trend of
protecting after the fact. This also means collecting telemetry, metadata, and other informa-
tion about data when it is being stored, to reduce the overhead complexity of searching or
scanning data after the fact for management and protection purposes.
Figure 1.12 also illustrates that fundamentals for server, storage, and I/O networking need
to have a focus spanning hardware, software, services, tools, processes, procedures, techniques,
and tradecraft, even if the resulting solutions are cloud, converged, virtual, or nonconverged.
Remember, the primary role of data infrastructures and servers, storage, and I/O resources
is to protect, preserve, and serve existing as well as new applications and their data in a cost-
effective, agile, flexible, and resilient way.
Some additional considerations about data infrastructure management insight and strategy
on a go-forward basis include where applications will best be deployed. This also means where
the data is generated or primarily accessed from, along with how it is used and shared by other
applications across different locations.
With the increased deployment of mobile users and consumers of information services,
data infrastructures will also need to support traditional browser- or virtual terminal–based
access, as well as smart device apps using iOS-, Android-, and Windows-based platforms,
among others.
Likewise, continued deployment of IoT devices will place additional demands on remote
and edge-based infrastructures, as well as central and core, including on-prem as well as cloud-
based, data infrastructures. Then there are the new technology trends to factor in, from AI to
blockchain distributed ledgers; compute offload including GPU, FPGA, and ASIC; to data
ponds, data lakes, and oceans of information to be considered in subsequent chapters.
Another trend is around the Development Operations (DevOps) movement, with variations such as No Operations (NoOps) for both legacy and new startup environments. Some context is needed around DevOps, in that it means different things in various
situations. For example, in some settings—notably, smaller, new, startup-type organiza-
tions—as its name implies, the people who develop, test, and maintain application code also
deploy, manage, and support their code. This approach of DevOps is different from some
traditional environments in which application code development, testing, and maintaining
is done in an organization separate from technical infrastructure (e.g., data infrastructure
and operations).
There are also hybrid variations—for example, some will see DevOps as new and different,
while others will have déjà vu from how they used to develop application code, push it into
production, and provide some level of support. Another variation of DevOps is development
for operations, or development for data infrastructure, or development areas (take care of the tools
and development platforms) or what some may remember as system programmers (e.g., sys
progs). For those not familiar, sys progs were a variation or companion to system admins who
deployed, patched, and maintained operating systems and wrote specialized drivers and other
special tools—not so different from those who support hypervisors, containers, private cloud,
and other software-defined data infrastructure topics.
Put another way, in addition to learning server storage I/O hardware and software trade-
craft, also learn the basic tradecraft of the business your information systems are supporting.
After all, the fundamental role of IT is to protect, preserve, and serve information that enables
the business or organization; no business exists just to support IT.
Likewise, a container can mean one thing for Docker and micro-services and another with
reference to cloud and object storage or archives.
ARM Can mean Microsoft public cloud Azure Resource Manager as well as a type
of compute processor.
BaaS Back-end as a Service, Backup as a Service, among others.
Buckets Data structures or ways of organizing memory and storage. For example, object storage buckets (also known as containers) contain objects (or blobs) or items
being stored. There are also non-object or cloud storage buckets, including
database and file system buckets for storing data.
Containers Can refer to database or application containers, data structures such as
archiving and backup “logical” containers, folders in file systems, as well
as shipping containers in addition to Windows, Linux, Docker, or compute
micro-services.
Convergence Can mean packaging such as CI and HCI, as well as hardware and software,
server, storage, and networks bundled with management tools. Convergence
can occur at different layers for various benefits.
CSV Comma-Separated Values is a format for exporting, importing, and interchanging data, such as with spreadsheets. CSV can also mean Cluster Shared Volumes in Microsoft Windows Server and Hyper-V environments.
DevOps Can refer to development for data infrastructure or technical operations, as well as environments in which developers create, maintain, deploy, and support code once it is in production.
Flash Can refer to NAND flash solid-state persistent memory packaged in various
formats using different interfaces. Flash can also refer to Adobe Flash for
video and other multimedia viewing and playback, including websites.
Objects There are many different types of server and storage I/O objects, so context
is important. When it comes to server, storage, and I/O hardware and
software, objects do not always refer to object (or blob) storage, as objects
can refer to many different things.
Orchestration Can occur at different levels or layers and mean various things.
Partitions Hard disk drive (HDD), solid-state device (SSD), or other storage device and aggregate subdividing (i.e., storage sharing); database instance, file, or table partitioning; file system partitions; or logical partitions for server compute and memory tenancy.
PCI Payment Card Industry; also PCIe (Peripheral Component Interconnect Express).
PM Physical machine, physical memory, paged memory, persistent memory (aka
PMEM), program manager or product manager, among others.
RAID Can be hardware or software, system, array, appliance, software defined,
mirror and replication, parity and erasure code based.
Serverless Can refer to micro-services including cloud based, such as AWS Lambda,
Function as a Service (FaaS). In another context, the term describes software that is available and sold without dependence on a particular brand of server. Keep in mind
that software still requires a physical server to exist somewhere—granted, it
can be masked and software defined.
You can find a more comprehensive glossary at the end of this book.
Tip: Knowing about a tool is important; so too is knowing how, when, and where to use it. This means knowing the tools in your toolbox, and also knowing when, where, why, and how to use a given tool (or technology), whether by itself or in conjunction with other tools.
Part of server storage I/O data infrastructure tradecraft is understanding what tools to use
when, where, and why, not to mention knowing how to improvise with those tools, find new
ones, or create your own.
Remember, if all you have is a hammer, everything starts to look like a nail. On the other hand, if you have more tools than you know what to do with, perhaps you need fewer tools, along with learning how to use them well by enhancing your skillset and tradecraft.
cloud and virtual storage. If your virtual server is running on one of your own servers, you may
be fine using traditional shared storage.
Are clouds safe? They can be if you use and configure them in safe ways, as well as do your
due diligence to assess the security, availability, as well as the privacy of a particular cloud
provider. Keep in mind that cloud data protection and security are a shared responsibility
between you and your service provider. There are things that both parties need to do to pre-
vent cloud data loss.
Why not move everything to the cloud? The short answer is data locality, meaning you will need
and want some amount of storage close to where you are using the data or applications—that
might mean a small amount of flash or RAM cache, or HDD, hybrid, or similar. Another
consideration is available and effective network bandwidth or access.
With serverless so popular, why discuss legacy, on-prem, bare metal, virtual, or even cloud IaaS?
Good question: for some environments, specific application landscapes, or pieces of their workloads, those deployment models can still make perfect sense.
On the other hand, even though there is strong industry adoption (e.g., what the indus-
try likes to talk about) along with some good initial customer adoption (what customers are
doing), serverless on a bigger, broader basis is still new and not for everybody, at least not yet.
Given the diversity of various applications and environment needs, data infrastructures that
provide flexibility of different deployment models enable inclusiveness to align appropriate
resources to multiple needs.
What is the best data infrastructure? The best data infrastructure is the one that adapts to your
organization’s needs and is flexible, scalable, resilient, and efficient, as well as being effective to
enable productivity. That means the best data infrastructure may be a mix of legacy, existing,
and emerging software-defined technologies and tools, where new and old things are used in
new ways. This also means leveraging public cloud, containers, and serverless, among other
approaches where needed. On the other hand, depending on your needs, wants, or preferences,
the best data infrastructure may be highly distributed from edge to core, data center to the
cloud, or all existing only within a public cloud.
1.7. Strategies
Throughout this book, there are various tips, frequently asked questions (along with answers),
examples, and strategy discussions. All too often the simple question of what is the best is
asked, looking for a simple answer to what are usually complex topics. Thus, when some-
body asks me what is the best or what to use, I ask some questions to try and provide a more
informed recommendation. On the other hand, if somebody demands a simple answer to a
complicated subject, they may not get the answer, product, technology, tool, or solution they
wanted to hear about. In other words, the response to many data infrastructure and related
tools, technology, trends, and techniques questions is often, “It depends.”
Should your environment be software defined, public cloud, cloud native, or lift and
shift (e.g., migrate what you have to cloud or another environment)? What about converged
infrastructure vs. hyper-converged infrastructure (aggregated) vs. cluster or cloud in a box vs.
serverless or legacy? Take a step back: What services and resources are needed, and what modes
and models of service delivery? Perhaps one solution approach fits all needs in your environment; however, look at how different technologies and trends fit and adapt to your environment's needs, vs. you having to work for the technology.
What data infrastructure resources do your applications and associated workloads need
regarding performance, availability (including data protection and security), capacity, and eco-
nomic considerations? This includes server compute (primary general purpose, GPU, ASIC,
FPGA, along with other offloads and specialized processors), memory (DRAM or SCM and
PMEM), I/O (local and wide area), storage (performance and capacity), along with manage-
ment tools.
Another consideration is what your service-level objectives (SLOs) and service-level agreements (SLAs) are for different services provided as part of your data infrastructure. What metrics, reports, and management insight are available for chargeback billing, invoice reconciliation and audits, and showback, along with "scare back"? Note that "scare back" is a form of showback in which information is provided on local on-prem or public cloud resource usage along with costs that can cause alarm—as well as awareness—about actual spending.
For example, some of your data infrastructure customers decide to deploy their applications
in a public cloud, then bring the invoice to you to pay, as well as support. In other words, some-
body brings you the bill to pay for their cloud resources, and they want you to take care of it.
Some additional considerations include: do they know how much more or less the public cloud
costs are compared to the on-prem services costs? Likewise, how can you make informed deci-
sions about costs regarding public vs. private vs. legacy if you do not know what those fees are?
In other words, part of a strategy is to avoid flying blind and have insight and awareness
into your application landscape, workload characteristics along with resource usage, and cost
to deliver or provide various services.
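A "scare back" style showback report like the one described might be sketched as follows. The resource names and unit rates are invented for illustration and do not reflect any provider's actual pricing:

```python
def scare_back(usage: dict, rates: dict) -> dict:
    """Attach illustrative costs to resource usage for a showback report.

    usage: resource name -> units consumed (e.g., instance-hours, GB-months)
    rates: resource name -> cost per unit (hypothetical rates)
    Returns resource name -> cost; resources with no known rate cost 0.0.
    """
    return {name: units * rates.get(name, 0.0) for name, units in usage.items()}


# A month of usage for one internal customer, priced at made-up rates:
report = scare_back(
    {"vm_hours": 720.0, "storage_gb_month": 500.0},
    {"vm_hours": 0.10, "storage_gb_month": 0.02},
)
```

Presenting such a per-customer cost breakdown alongside the equivalent on-prem service cost is what turns plain showback into awareness (or alarm) about actual spending.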
emerging, with diverse workload characteristics. The different application workloads require
various data infrastructure resources (server compute, memory, I/O, storage space, software
licenses) along with performance, availability, capacity, and economic (PACE) considerations.
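One way to reason about PACE alignment is a simple requirements check against a candidate resource. The attribute names, units, and the greedy all-or-nothing comparison are illustrative assumptions, not a method from the text:

```python
from dataclasses import dataclass


@dataclass
class Pace:
    """PACE attributes: Performance, Availability, Capacity, Economics."""
    performance_iops: int    # e.g., I/O operations per second needed/offered
    availability_pct: float  # e.g., 99.99
    capacity_gb: int
    monthly_cost: float      # budget (for a workload) or price (for a resource)


def resource_fits(need: Pace, offer: Pace) -> bool:
    """True if an offered resource covers a workload's PACE needs
    at or under its cost budget (a deliberately simplistic check)."""
    return (offer.performance_iops >= need.performance_iops
            and offer.availability_pct >= need.availability_pct
            and offer.capacity_gb >= need.capacity_gb
            and offer.monthly_cost <= need.monthly_cost)
```

Real placement decisions weigh trade-offs between the four PACE dimensions rather than requiring all to be satisfied simultaneously, but even this crude check makes resource-to-workload alignment explicit.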
Also, keep in mind that data centers are still alive. However, their role and function are
changing from traditional on-prem to cloud, where they are referred to as AZs. Likewise, data
infrastructures—in addition to existing on-prem in legacy data centers to in clouds that define
and support AZs—also live at the edge for traditional ROBO and for emerging fog computing
(e.g., think small cloud-like functionality at or near the edge).
What this all means is that there is not a one-size-fits-all environment; application work-
load scenarios and data infrastructures need to be resilient, flexible, and scalable. This also
means that there are different strategies to leverage and insight to gain for effective data infra-
structure decision making.
Bottom line: The fundamental role of IT data infrastructures including server, storage I/O
connectivity hardware, and software and management tools is to support protecting, preserv-
ing, and serving data and information across various types of organizations.