
[Back-cover advertisement for TiDB]

Open — MySQL Compatibility: TiDB offers unparalleled ease of migration while protecting your investment in existing applications.

Distributed — Unified Insight: Real-time analytics, and the ability to perform analysis on data as soon as it is written, provide the agility to adapt to changing needs and accommodate growth.

Unified — Scalability: TiDB maintains your environment at the optimum scale for current needs, maximizing performance and minimizing expense.
Accelerating Enterprise
Digital Transformation
with a Next Generation
Database

Steve Suehring

Beijing Boston Farnham Sebastopol Tokyo


Accelerating Enterprise Digital Transformation with a Next Generation Database
by Steve Suehring
Copyright © 2022 O’Reilly Media, Inc. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Andy Kwan
Development Editor: Gary O’Brien
Production Editor: Kate Galloway
Copyeditor: Sharon Wilkey
Proofreader: Justin Billing
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Kate Dullea

January 2022: First Edition

Revision History for the First Edition

2022-01-11: First Release

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Accelerating Enterprise Digital Transformation with a Next Generation Database, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the author and do not represent the publisher’s views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

This work is part of a collaboration between O’Reilly and PingCAP. See our statement of editorial independence.

978-1-098-11860-0
[LSI]
Table of Contents

1. Enterprise Digital Transformation and Business Growth
   Driving Business Growth
   Adding Business-Intelligent Services
   Extending Real-Time Insights
   Achieving Business Agility
   Summary

2. Attributes of Next Generation Databases
   Why Are New Attributes Necessary?
   Understanding Success Metrics for Modern Database Platforms
   Summary

3. Finding the Digital Transformation Path
   Solving Business Imperatives
   Finding a Balanced Approach
   Achieving Sustainable Success
   Summary

4. Key Takeaways
   Key Decision Points for the Next Generation Database Platform
   Crafting a Long-Term Road Map
   Summary
CHAPTER 1
Enterprise Digital Transformation and Business Growth

Enterprise digital transformation describes a new way of thinking about business. In many ways, this transformation turns things around, from product-centric to customer-centric, with a focus on value throughout the organization. Big data and the ability to capture a steady stream of insights from many disparate sources simultaneously are important drivers of enterprise digital transformation. This transformation is not merely a new way of increasing revenue or cutting costs. Rather, enterprise digital transformation is a fundamental and foundational change to the way business is conducted. Central to the transformation is how data is ingested, consumed, processed, and analyzed.

No discussion of change in business would be complete without looking at the fundamental worldwide changes that began in 2020 and continue today. Organizations that positioned themselves with cloud native technologies were well placed to thrive in the new environment because these technologies made transitioning to remote work seamless. These organizations didn’t need to migrate complex on-premises infrastructure to the cloud, and adjusting capacity to respond to new demands was a primary architectural component of their existing cloud native deployments.

Many of these changes will permanently alter how business is conducted, and the pace of change also enables organizations to add capabilities faster than ever. This chapter examines the modern digital enterprise and how business is moving forward thanks to the digital transformation.

Driving Business Growth

A primary driver of growth in business is a change in the business environment, whether driven from inside the business or through external forces. Whether that change presents an opportunity or poses a threat depends largely on how well the organization is positioned to react to the change. In many cases, changes in the business environment present an opportunity for increased profits through competitive advantage.

The mere existence of an IT function within an organization was itself a competitive advantage in the early days of computing. Those organizations became more efficient and in many ways became faster than their competitors at identifying new opportunities. However, computing resources were quickly deployed by other organizations, thus closing the advantages available by simply incorporating computing into business processes.

Fast-forward to today, and organizations are still seeking profits from growth opportunities found through competitive advantage. As before, competitive advantage is short-lived. Finding competitive advantage frequently means sifting through huge volumes of data, but that data is often contained in separate systems that have evolved over many years. Even if the data from these disparate siloed systems can be analyzed in a coherent manner, the data and events that generated the data may be too far in the past to be actionable. Therefore, if a growth opportunity is found, it may be too late to act on that opportunity or bring the opportunity to market.

Unfortunately, bringing together data silos that have grown over many years is a difficult task, sometimes prevented merely by internal politics within the organization. We need a different approach entirely: finding growth opportunities through digital transformation from within the organization, in much the same way that the early IT functions changed the nature of business itself.



Improving ROI

Because of the pace of change, building IT infrastructure to support strategic business initiatives can be costly, sometimes with little return. Cloud-based, on-demand resources help alleviate significant up-front costs that can push profitability into the future. But infrastructure is only one part of the return-on-investment (ROI) story.

Improving ROI means choosing cloud native software that is simple to deploy in a repeatable and reliable way. Repeatable deployment facilitates Agile development methodologies and enables developers and testers to create and use environments as needed without involving infrastructure specialists. This DevOps-centric approach is scalable and further helps reduce the overall investment in technology, thus providing an opportunity for greater returns.

Driving New Revenue Opportunities

Enterprise digital transformation creates opportunities for new revenue streams. Vast amounts of data contain deep and detailed information. Previously, even if this data was captured, the cost of mining for useful and actionable information was prohibitive. Today, data can be captured and analyzed in real time as a stream to reveal insights that would previously have been hidden. For example, connecting purchases and purchasing interest from multiple sources can generate leads for complementary products and services. Collecting the purchasing habits of consumers with similar profiles enables foresight into potential future needs.

Adding Business-Intelligent Services

Enterprise digital transformation enables businesses to deploy intelligent services that take advantage of incoming and already-existing data to make smarter decisions. Rather than building single-use business-intelligence services from monolithic platforms, a business-intelligent service is aware of the available data and aware of the needs of the business. Data is analyzed as it is being recorded and is used not only to make better decisions but also to build services that are more intelligent.

The digital transformation for business-intelligent services moves the organization from reactive to proactive. For example, predicting future purchases based on a current basket of goods is a problem that has been solved. However, integrating a customer’s basket of goods with inventory and stock levels across vendors, while also incorporating sales targets to incentivize the customer to purchase in real time, requires a new business-level awareness.

Extending Real-Time Insights

Many AI-related predictive analytics are well-known, and sometimes well-solved, problems. Transforming the predictions into real-time actionable tasks and then doing so at scale is the digital transformation in action. Scaled analytical platforms work well for looking at historical data and then using that history to help predict the future. But being able to create predictions based on incoming streaming data is necessary for a true digital transformation.

Scaling to meet demand is necessary in order to fully leverage insights derived from incoming streaming data that is processed in real time. The amount of incoming data is sometimes predictable, such as when an advertising campaign is going to begin, or the data may follow predictable cycles of increase and decrease. At other times, the amount of incoming data is unpredictable, such as when a social media interaction is unexpectedly picked up by an influencer.

Opportunities that can be found in data, such as a social media post gaining popularity unexpectedly, are missed without the ability to scale up seamlessly to meet demand. Scaling resources down when not needed is itself a vital capability of a modern digital organization. Unused compute and storage resources are costly while providing no tangible benefit.

Scaling simple compute and storage resources is a common capability in many cloud native organizations. However, scaling data-related capabilities is more difficult and requires a hybrid approach that incorporates components of analytical and transactional database architectures. Hybrid transactional and analytical processing (HTAP) also needs to integrate seamlessly in cloud-based architectures rather than having cloud capabilities added as an afterthought. By bringing together elements from both analytical and transactional databases, HTAP helps the business become more agile.



Achieving Business Agility

As the use of computing resources within the enterprise has evolved, projects involving technology have also evolved. In the past, projects frequently used waterfall multistage methodologies that involved significant effort attempting to capture all requirements early in the process. Any changes realized in later stages also required significant effort to incorporate back into the project. The end results were late delivery, underwhelming products, and missed opportunities.

Organizations counteracted the long waterfall life cycle with Agile processes for IT projects. The time to market with an Agile methodology is significantly lower, and the resulting product matches the business need more accurately. Rather than attempting to capture all requirements up front, an Agile organization embraces change at any point in the process.

Today, organizations are using those same Agile processes across business functions. Doing so enables the entire organization to react quickly, expect change rather than avoid it, and work with a customer-centric mentality. By embracing change, business itself is transformed. The ability to react quickly to changes in the external business environment helps reduce any advantage gained by a competitor. Thinking in terms of customer-centricity enables opportunities to be found that might have been overlooked in the past.

Summary

Enterprise digital transformation is an opportunity for organizations to fundamentally change the way business is conducted. Breaking down data silos, moving toward cloud native technologies, and incorporating Agile processes with industry-leading technologies enable the organization to become more proactive rather than reactive. The digital-first mindset brought by these technologies and processes moves the organization toward the end goal of helping decision makers gain insight based on real-time data rather than historical snapshots.



CHAPTER 2
Attributes of Next Generation Databases

Chapter 1 outlined the needs of the modern enterprise and the digital transformation required to drive business growth and obtain and maintain a competitive advantage. Some key elements of that digital transformation include achieving agility at the business level, gaining and capitalizing on real-time insights from data, and deploying intelligent services throughout the enterprise.

This chapter puts those factors in context with a close-up examination of new database technology to help achieve the digital transformation. New database platforms need to have specific attributes to help the organization grow and move forward. The chapter begins with additional information on why it’s important to consider new factors when evaluating database platforms.

Why Are New Attributes Necessary?

Business needs for data have evolved in the last several years. The data itself has evolved and grown; in many cases, the data has grown exponentially. Measures of performance, scalability, reliability, and interoperability have changed significantly in the last 10 years. Many times, those measures are simply inadequate when working with multiterabyte databases. Virtualization and cloud computing have also created opportunities for organizations to increase and enhance their usage of data. For example, the ability to capture and then process terabytes of data no longer requires extensive, internally deployed computing resources; instead, capture and processing can be done entirely in the cloud while paying only for the use of resources.
Database platforms and their identified use cases are sometimes divided into processing transactional data and processing analytical data. Transactional data needs to be fully available and have fast, frequently instant, response times. At times, transactional data exists only for the duration of a session or only for a short period, the length of the transaction itself. By definition, then, transactional data is not always completely captured for analytical processing later. This is a missed opportunity.

New opportunities are found by analyzing data, but if analysis doesn’t occur until after an event, the potential gain is naturally reduced. Data analysis often requires migration to another system through an extract, transform, and load (ETL) process that is also often costly. Being able to extract knowledge and value from data in real time by combining elements of transactional and analytical processing into a single platform helps bridge the gap. The challenge is that the transactional-to-analytical bridge needs to exist even for multiterabyte datasets.
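To make the gap concrete, consider a deliberately simplified, in-memory sketch in Python (illustrative only, not the internals of any particular HTAP engine): the same store that accepts transactional writes serves an analytical aggregate immediately, with no ETL step in between.

```python
from collections import defaultdict

class ToyHtapStore:
    """A deliberately simplified stand-in for an HTAP platform: one
    store serves both transactional writes and analytical reads."""

    def __init__(self):
        self.rows = []                    # "transactional" row store
        self.totals = defaultdict(float)  # "analytical" running aggregate

    def insert_order(self, customer, amount):
        # Transactional path: record the row...
        self.rows.append({"customer": customer, "amount": amount})
        # ...and keep the analytical view current in the same operation,
        # so insight is available as soon as the data is written.
        self.totals[customer] += amount

    def revenue_by_customer(self):
        # Analytical path: no extract, transform, and load into a
        # second system before the question can be answered.
        return dict(self.totals)

store = ToyHtapStore()
store.insert_order("acme", 120.0)
store.insert_order("acme", 30.0)
store.insert_order("globex", 75.0)
print(store.revenue_by_customer())  # {'acme': 150.0, 'globex': 75.0}
```

The point of the sketch is the absence of a second system: the aggregate is current the moment the write completes, which is exactly what a periodic ETL pipeline cannot offer.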
The need for HTAP databases has existed for years. However, the nature of business has changed so significantly and so rapidly that enterprises can no longer afford to operate with yesterday’s processes of moving data around. Along with the paradigm shift in databases are new measures of success that help define and delineate goals for database platforms in the enterprise. These measures include the following:

• Ease of scaling
• Ability to maintain continuity of business operations
• Reducing the impact of data silos
• Ease of use
• Open, community-based innovation
• Cloud capabilities

The remainder of this chapter examines each of these factors and how next generation databases compare to current technologies.



Understanding Success Metrics for Modern Database Platforms

A modern database needs to provide high performance, even when working with multiterabyte databases. However, numerous considerations around database platforms exist beyond raw speed and the ability to handle large amounts of data. These considerations can be the key to success and enable an enterprise to gain and keep a competitive advantage.

This section focuses on key metrics of success by which a database platform can be measured. The primary focus is on specific considerations when operating at enterprise levels, though many of these metrics apply to any database platform.

Providing Ease of Scaling

Scaling refers to the ability to meet performance needs, both in terms of response time and storage, by increasing compute or storage resources. Meeting the performance needs demanded by modern applications can sometimes mean overbuilding or over-allocating resources that are not needed. For example, storage may be allocated based on a 12- or 24-month estimate. This means that for the first several months or more, storage is over-allocated.

An accuracy problem is also inherent in the estimation of future resources. Estimating compute resources can be a guessing game and, if incorrect, can mean unacceptable response times. In cases involving multiple database resources, one system may have idle capacity while another is severely overloaded.

Predicting the location of the bottleneck is quite difficult because the application may have been architected in such a way as to render the prediction useless. An application that performs a complex series of queries to retrieve data may have significant room for optimization, but by the time the issue is identified, it is too late in the project life cycle to address it. The frequently used solution is to allocate more compute resources to the individual database server affected by the performance issue rather than optimize the application code.

Response-time performance can be further delineated by performance related to reading or writing data—that is, querying and selecting data or inserting and updating data. Because patterns of usage are unpredictable, certain tables or even records within a database table may be accessed more often than others. These areas, frequently called hot spots, can have a negative impact on performance.

A common solution to avoiding hot spots is to divide, or shard, the data into logical segments. Each shard of data is then deployed to a different server to balance the request load among those servers. However, shards can also become hot spots as data usage patterns change, which then requires resharding yet again. Sharding and resharding the data is a time-consuming process that can lead to underutilized resources.
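Why resharding is so disruptive can be seen in a minimal sketch of hash-based sharding (illustrative only; production systems use more sophisticated placement schemes than hash-mod-N): changing the shard count reassigns most keys, and every reassigned key represents data that must physically move between servers.

```python
def shard_for(key: str, num_shards: int) -> int:
    # Route a key to a shard by hashing; a byte sum keeps the sketch
    # deterministic across runs (Python's built-in hash() is salted).
    return sum(key.encode()) % num_shards

keys = [f"user:{i}" for i in range(1000)]

before = {k: shard_for(k, 4) for k in keys}  # original four shards
after = {k: shard_for(k, 5) for k in keys}   # after adding a fifth shard

# Count keys whose shard assignment changed; each one is data that
# would have to be copied to a different server during resharding.
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys change shards when going from 4 to 5")
```

With this naive scheme, well over half of the keys land on a different shard after the change, which is why resharding tends to dominate the operational cost of a manually sharded system.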
An alternative to sharding and resharding is to distribute data sparsely into regions when the data is written. Regions consist of segments of data and shouldn’t be confused with geographical regions. Rather than having to manually shard and reshard data, modern database platforms manage the complexity by automatically segmenting data into regions. Regions are then replicated to ensure high availability.
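A rough sketch of range-based regions (an illustration of the general idea, not the internals of any specific platform) shows how a region that grows too large can be split automatically, with clients still routing by key range and no manual resharding:

```python
import bisect

class RegionMap:
    """Sketch of range-based regions: each region covers a contiguous
    key range, and oversized regions split automatically."""

    def __init__(self, max_keys_per_region=2):
        self.starts = [""]   # sorted start key of each region
        self.data = [{}]     # region i holds keys in [starts[i], starts[i+1])
        self.max_keys = max_keys_per_region

    def _index(self, key):
        # Route by range: the last region whose start key <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key, value):
        i = self._index(key)
        self.data[i][key] = value
        if len(self.data[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        # Split an oversized region at its median key. Clients keep
        # routing by range, so no application-level resharding is needed.
        keys = sorted(self.data[i])
        mid = keys[len(keys) // 2]
        self.starts.insert(i + 1, mid)
        self.data.insert(i + 1, {k: v for k, v in self.data[i].items() if k >= mid})
        self.data[i] = {k: v for k, v in self.data[i].items() if k < mid}

    def get(self, key):
        return self.data[self._index(key)].get(key)

rm = RegionMap(max_keys_per_region=2)
for k in ["a", "b", "c", "d", "e"]:
    rm.put(k, k.upper())

print(len(rm.data))  # the single initial region has split into 4
print(rm.get("d"))   # D
```

A tiny region size is used here so the splits are visible; real platforms split at much larger thresholds and replicate each resulting region for availability.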
A distributed database provides a layer of abstraction between components. At the highest level is a stateless Structured Query Language (SQL) layer, which would traditionally be thought of as the server. The server is the endpoint to which clients connect in order to interact with the database. The server can also generate a distributed execution plan for a given SQL statement, optimizing the statement along the way.

Because the server in a distributed database acts as an endpoint and coordinator, compute resources are the primary resources used by each node. The server in this distributed architecture can be deployed behind industry-standard load balancers such as Linux Virtual Server (LVS), HAProxy, or a load balancer from F5. Each node participating as a server can utilize additional compute resources as needed, thus enabling vertical scaling. This level of the architecture can also be scaled horizontally by adding more servers behind the load balancer.
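Because the SQL layer is stateless, a balancing policy can be as simple as round-robin. The host names and port below are hypothetical, and a real deployment would place the nodes behind HAProxy, LVS, or a hardware load balancer rather than doing this in application code; the sketch only makes the policy concrete:

```python
import itertools

# Hypothetical SQL-layer node addresses; any node can serve any client
# because the SQL layer holds no state between requests.
sql_nodes = ["sql-node-1:4000", "sql-node-2:4000", "sql-node-3:4000"]
_rotation = itertools.cycle(sql_nodes)

def route_connection():
    # Round-robin: hand each new connection to the next node in turn.
    return next(_rotation)

assigned = [route_connection() for _ in range(6)]
print(assigned)  # each of the three nodes receives two connections
```

Adding a fourth entry to the list is all horizontal scaling requires at this layer, which is what makes the stateless design attractive.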
Metadata management (including things like the distribution of data across all of the storage nodes, the current topology of the cluster, and transaction identifiers) is accomplished on a different layer within a distributed database architecture. This management layer also provides administrative visibility into the database with tools such as a dashboard.



The third and final layer in a distributed database architecture is the data storage layer. Many popular architectures utilize key/value storage in this layer. Just as scaling of compute resources is important for the application-facing endpoint servers, independent scaling of storage is important for the data storage layer. Storage can also be scaled up and down as needed, depending on the use case or business need. For example, a large advertising campaign might result in a significant increase in data from customers interacting with an app or website. The size of the data storage layer can be increased temporarily to accommodate the increased demand. When the data is migrated to an analytics platform database, the size of the data storage layer can be decreased.

All three of the major layers in a distributed database architecture can be scaled independently of one another. The combination of the endpoint server located behind a load balancer along with a metadata management layer coordinating data storage makes it easy to scale a cluster up and down as needed. Attempting to estimate bottlenecks is no longer a requirement, and the time-consuming task of resharding data becomes much easier.

Maintaining Continuity of Business Operations

Ensuring continuity of business operations involves reducing the number of single points of failure that can affect the availability of IT operations. When considering data availability, the need to provide multiple copies of data is vitally important. Databases typically do this by replicating data to multiple servers. In an enterprise scenario, data is replicated in real time or near real time across servers and across geographical regions. In this way, an issue that affects one global region does not adversely affect overall business operations, because processing can be shifted to other locations that are unaffected.
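The idea can be sketched with a toy replication model (illustrative only; real databases replicate through consensus protocols rather than synchronous client-side writes): every write lands on all replicas, so a read can be served from any location that remains healthy.

```python
# Three hypothetical storage locations; a real system would replicate
# via a consensus protocol rather than looping over replicas in the client.
replicas = {"us-east": {}, "eu-west": {}, "ap-south": {}}

def replicated_write(key, value):
    # Every replica applies the write, so multiple copies always exist.
    for store in replicas.values():
        store[key] = value

def read(key, failed_region=None):
    # Shift processing to any unaffected location.
    for region, store in replicas.items():
        if region != failed_region and key in store:
            return store[key]
    raise KeyError(key)

replicated_write("order:42", {"status": "shipped"})
print(read("order:42", failed_region="us-east"))  # {'status': 'shipped'}
```

Even with an entire region marked as failed, the read succeeds from another copy, which is the continuity property the replication exists to provide.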
Business operations are also affected by noncatastrophic events. For instance, with a current-generation database, if sharding or resharding is necessary, the entire software stack is typically affected, from the application through the database. The application needs to contain logic to determine which database server endpoint contains the relevant data, or the application needs to utilize a metadata service and make decisions about data retrieval.



When an application needs to make decisions about data retrieval, writing of new records, and updates to existing records, a chance for inconsistency exists across the database. If a new shard has just been deployed, an edge application may not be updated and will still attempt to write to or retrieve from the previous endpoint. Once inconsistency creeps into the data, unraveling the result can be difficult and certainly reduces confidence in the veracity of the data within the organization.

Reducing the Impact of Data Silos

As an organization grows and matures, its usage of data changes. Applications developed more than 10 years ago may have data stored in legacy systems that can sometimes be difficult to access for modern, cloud-deployed applications. Getting access to the data in a form that can be used by those modern applications usually means a complex ETL process that is both time-consuming to create and time-consuming to maintain.

Sometimes data silos are not the result of technology factors but rather historical organizational issues. For example, mergers and acquisitions frequently result in incompatible database systems coming together. Merging those systems can be difficult, and often a distinct business reason for a costly merge is lacking.

When working with a next generation database, the impact of data silos is reduced. An HTAP database enables abstraction of data silos, facilitating access to siloed data in such a way that developers do not need to worry about how to reach it in a legacy system. The ability to provide a consistent experience for developers reduces time to market for applications and increases performance once deployed.
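The kind of abstraction described above can be sketched as a thin facade (hypothetical data sources; an HTAP platform performs this unification inside the database rather than in application code):

```python
# Hypothetical silos: a legacy on-premises system and a modern cloud app.
legacy_orders = {101: {"total": 40.0, "source": "legacy"}}
cloud_orders = {202: {"total": 55.5, "source": "cloud"}}

def get_order(order_id):
    """A single access path for developers: which silo actually
    holds the record is hidden behind the facade."""
    for silo in (legacy_orders, cloud_orders):
        if order_id in silo:
            return silo[order_id]
    return None

print(get_order(101)["source"])  # legacy
print(get_order(202)["source"])  # cloud
```

Developers call one function regardless of where the data lives, which is the consistent experience the text describes; pushing that unification into the database removes even this thin layer from application code.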

Providing Ease of Use

Data stored in nonstandard formats, or within a storage engine that is not widely supported or is unnecessarily complex, adds time and risk to the application development life cycle. While most enterprise-level databases are supported, some are complex to operate, requiring constant care and maintenance. Both factors add cost by making data less accessible and thus increasing time to market for new innovation.



SQL is the standard for accessing data stored in a relational database. A long-time leader in relational databases is MySQL, and developers are familiar with the widely deployed database engine. Supporting MySQL and enabling developers to access huge amounts of data through a familiar and well-supported interface further reduces the time it takes to develop queries and create an application.

Database administrators (DBAs) increasingly need to support more and more database engines, more versions, and more platforms. From native, local relational database deployments to cloud-based NoSQL stores, DBAs are expected to provide an always-available experience regardless of the platform. The ubiquity of MySQL enables developers to have a familiar experience immediately and use their expertise for optimization of the database platform.

Promoting Open, Community-Based Innovation

Open source has proven itself time and again over the last 25 years. The Linux operating system and open source software for web servers like Apache are the backbone on which much of the internet operates. Open source software benefits from a community of dedicated individuals who are responsible for innovation and industry-leading software.

Innovation is a key component of competitive advantage. Organizations at the forefront of technology have embraced open source and have done so for many years. MySQL and closely related database platforms have thrived with open source innovation. Community support and engagement sustain the innovative mentality found in the open source community. Organizations that contribute to open source or provide open source versions of their code are frequently leaders in their area.

Robust and Native Cloud Capabilities

Numerous organizations redesigned their websites to be mobile-first after smartphones became the de facto standard for many web-based interactions. In this way, websites are built for and tested on mobile platforms first so that the visitor will have a rich experience even when using a phone or tablet rather than a desktop.

Just as mobile-first has changed the way organizations approach web-related projects, a push has arisen to bring cloud capabilities to software that was previously desktop- or server-based. While retrofitting cloud support is successful in some cases, building an application for the cloud from the start naturally makes that application work better in the cloud. Not only is the architecture of such an application built with the cloud in mind, but the capabilities and robustness of the application increase.

When considering a database platform for the enterprise, cloud capabilities are a must-have. These capabilities help ensure that the choice is future-proof, as the platform remains able to evolve with the modern digital enterprise as well as the pace of change required to gain and then sustain competitive advantage.

Summary

Databases are sometimes discussed in terms of their use within an enterprise, either transactional or analytical. However, the modern enterprise needs to combine these two functions in order to better serve its customers. An HTAP database platform combines elements of each type of database so that an enterprise can develop new capabilities and find new insights without needing to offload analytics to a different system.

Several metrics can be used to measure the success of running a given database platform within an enterprise. The ease with which the database can be scaled up and down to meet changing demands on processing and capacity is one such metric. Maintaining business continuity, or being able to continue operations at full capacity even in the event of a failure of another system, is another important aspect of database platforms. Helping to eliminate or at least mitigate the effect of data silos is a characteristic needed by enterprise-level customers today.

Having a familiar set of tools and being compliant with open standards makes migration to a different database platform much easier, and this key metric is often overlooked when considering a change in an enterprise system. Compliance with open standards is important, as is having a rich community of support from which innovative ideas can be leveraged. Finally, having cloud native capabilities built in, rather than cloud deployments added later, is a key success metric for a database platform.



Together, these metrics help form the basis for evaluating database platforms. Many solutions are successful for a subset of metrics, and many database platforms meet one or two of the metrics very well. But it is rare to find a database platform solution that will truly achieve all of the metrics.

CHAPTER 3
Finding the Digital Transformation Path

Chapter 2 described several metrics that can be found in a next generation database solution. For many organizations, reaching that solution is not easily accomplished, given the limitations on the organization, both internal and external. Beyond technical challenges that can sometimes occur when operating at scale, the discussion around enterprise digital transformation can require different business units to reach a compromise that enables the organization to move forward. No longer can data silos from legacy applications drive the organization. Instead, value needs to be found in siloed data by integrating it with other data.

Designing successful IT architectures requires striking a balance between the desire for performance and reliability and the cost of providing that desired level of performance and reliability. Ideally, the architecture would perform at a level that provides instant results, and reliability would result in 100% uptime from all locations at all times. For small and trivial workloads, designing an infrastructure to support these needs may be possible.

However, at scale, providing instant access and 100% reliability quickly becomes prohibitively difficult. Cost itself is not the only factor. At scale, the technology itself frequently requires significant redesign to meet the needs of a digitally native organization. This chapter examines some of the factors involved in reaching the digital transformation goal.

Solving Business Imperatives
Balancing performance of a database depends on several factors. Of
course, the primary factor driving the need is the intended use of the
database itself. Phrased another way, the business need and usage of
the data drives the necessary database architecture. Analytical pro‐
cesses that are focused primarily on historical data and that don’t
produce new insights can remain on a platform that is built for such
analytical processing. However, sometimes that historical data, when
combined with current data, can yield answers to questions that the
organization hadn’t even considered asking.
The primary driver is the need to not only solve a specific business
problem but also find a technology to facilitate solving problems
that haven’t yet been recognized. Getting to that type of solution
necessarily involves a hybrid approach with both transactional and
analytical components. To provide those capabilities while also pro‐
viding a unified interface to developers and database administrators
means abstracting some components.
A level of abstraction on the database platform means that the devel‐
opment and administrative experience can be the same and not
require additional training to provide support. Abstraction also
facilitates more efficient scaling by focusing scaling activities on spe‐
cific underlying areas of the data that may be hotspots, again while
being seamless to developers and database administrators. The end
result is a faster, more agile approach to all interactions with data,
keeping the database itself out of the way of the developers and ena‐
bling both developers and business users to focus on gaining value
from that data.
Many of the factors that drive database architecture are the same regardless of the scale or amount of data. However, solving
performance-related issues becomes more difficult as workloads and
database sizes increase. The addition of multiple business needs
for the same data sometimes creates competing priorities around
performance and results in multiple connection points, some for
transactional needs and others for analytical needs.
Certain business needs, such as an increasing amount of data and an
increasing velocity of data capture, are common to all organizations.
Finding a solution then means solving those problems that are
unique to each organization while also solving the common

problems across all organizations on the digital transformation path.
Potential solutions should be mindful of the overall business need to
drive growth and help find new revenue opportunities.

Optimizing Database Workloads


In general, the faster and more responsive the need, the higher the
cost associated with providing that performance in a reliable man‐
ner. Compute resources are at a premium for responsive data needs,
as are memory resources. Providing reliability for a database work‐
load is more difficult than for other services such as web content.
Data integrity affects all decisions around database performance,
regardless of the size.
Table 3-1 lists several factors related to database workload perfor‐
mance and overall scaling to meet business needs.

Table 3-1. Database workload performance

Processing/compute: Sizing the available compute resources to meet peak demand is difficult and costly, leading to wasted resources during off-peak times. Transactional and analytical databases have different needs for compute resources.

Memory: Being able to load an entire dataset into memory results in a significant performance improvement when compared to disk-based storage. Practical limitations exist on the size of the dataset that can be loaded into memory.

Database engine: Database software from different vendors—each with its own set of hardware requirements, differing implementations, and performance limitations—creates integration challenges that become more of an issue as workload demands increase.

Differing database engine versions: Even with a limited number of different database engines, chances are that business applications may support only certain versions of a particular database engine. This creates additional administration and development challenges around integration of the data stored within those applications.

Latency demands: Responsiveness demands on a query result lead to specific design decisions for the database architecture.

Monitoring and visibility: Questions around how much visibility administrators need in order to monitor the performance of the database are important to consider when architecting the design. Too little visibility or cumbersome access decreases the ability for administrators to proactively address potential performance issues.

Additional layers, such as a proxy layer, can be added to the architecture to improve performance for transactional processing. Network-level resources are also a factor. Latency is always important because



delays at the network can lead to data being inconsistent among
geographic locations.

Transactional data
The nature of the data itself drives database architecture and design.
Data that is transactional and most valuable at a certain moment in
time requires instant response. As the demands on both read and
write increase, providing that real-time response is difficult. Add the
frequently dispersed geographic needs for the data, and now net‐
work and replication issues become more relevant.

Analytical data
Analytical data does not necessarily need the same level of instant
access that is expected of transactional data. Many times analytical
data was previously transactional and then loaded into an analytical
database backend. While instant access isn’t needed, query complex‐
ity leads to its own set of performance implications around analyti‐
cal datasets.

Combining transactional and analytical


It is through the combination of transactional and analytical data
that new value can be found. This does not simply mean adding
more resources to the transactional side or adding real-time capabil‐
ities to a traditionally analytical database platform. Rather, a new
approach that is cloud native to enable scaling and resiliency can
help bring best-of-industry technologies into a balanced and coher‐
ent approach to the data.

Eliminating Data Silos


Business systems that are created to solve a problem are sometimes
created in a vacuum, without sufficient coordination within an orga‐
nization. Data then becomes trapped, or siloed, in that business
application. Data also ends up in multiple locations and in multiple
formats, making cohesive analysis impossible. Extracting the data to
get it to interact with other organizational data frequently involves
an ETL process that can never seem to run often enough or fast
enough.

A digitally transformed organization retains only those few data silos that may be required for compliance and regulatory reasons. Beyond those, finding a means to eliminate silos is key to avoiding duplicate and out-of-sync data. Many business systems will have a means to interface with other databases or can be phased out in order to move the organization forward.
The elimination of data silos frees the organization to have a single
system of record for data, with a known-good set of that data avail‐
able for analysis. Doing so reduces administrative overhead in
merely maintaining those systems. But more important, modern
analytical processes can be used to gather insight from that data in
ways that were not feasible in the past. For example, the use of AI
technologies to help with analytical processes is a cost-effective
means to find new patterns that might not have been seen before.
The primary issue is moving away from those ETL processes and
entrenched systems. The existence of silos can also sometimes
indicate a siloed culture within the organization, which can present
another impediment toward digital transformation. But bringing
analytical and transactional data closer to each other is key to
moving forward for the competitive advantage needed in today’s
business climate.

Finding a Balanced Approach


Joining transactional and analytical datasets has been possible for as
long as both types of data have existed. ETL approaches range in
sophistication from simple scripts (which programmatically extract
the data, perform minimal transformation, and then load the data
into the analytical system) to complex software solutions (which
typically require training and their own set of infrastructure special‐
ists just to manage the ETL processes). Obviously, adding ETL pro‐
cesses to an already complex architecture increases the chance for
errors and inconsistencies being introduced into the data itself.
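The simple-script end of that spectrum can be sketched in a few lines. The example below is a minimal, hypothetical illustration of a script-style ETL pass: table names, column names, and the transformation are invented for the sketch, and SQLite stands in for both the transactional source and the analytical target. A production pipeline would add incremental extraction, error handling, and scheduling, which is precisely where the complexity and fragility described above creep in.

```python
# A minimal sketch of script-based ETL: extract rows from a transactional
# store, apply a trivial transformation, and load the result into an
# analytical store. All names here are hypothetical.
import sqlite3

def run_etl(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    # Extract: read raw order rows from the transactional database.
    rows = source.execute(
        "SELECT order_id, amount_cents, region FROM orders"
    ).fetchall()

    # Transform: convert cents to dollars and normalize the region code.
    transformed = [
        (order_id, amount_cents / 100.0, region.upper())
        for order_id, amount_cents, region in rows
    ]

    # Load: write the cleaned rows into the analytical database.
    target.execute(
        "CREATE TABLE IF NOT EXISTS orders_fact "
        "(order_id INTEGER, amount_dollars REAL, region TEXT)"
    )
    target.executemany("INSERT INTO orders_fact VALUES (?, ?, ?)", transformed)
    target.commit()
    return len(transformed)
```

Even in this toy form, the structural problem is visible: the analytical copy is only as fresh as the last run of the script, and every additional source table multiplies the surface area for errors.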
Even when an ETL process runs successfully, operating on large
amounts of data is simply not practical. The insights gained by
transforming large amounts of transactional data into an analytical
backend for further queries are too old by the time those insights are
discovered. Performance of the ETL process itself can be improved,
but this adds cost, which is usually significant when operating at

scale. The potential for gain from the analyzed data may be less than
the cost of the increased ETL performance.
ETL is just one example of the trade-off between cost and perfor‐
mance. Others, such as compute and memory, were discussed
earlier in this chapter. Determining the point where optimal perfor‐
mance and cost intersect is difficult, if not impossible, because of
the changing nature of the data and the business needs around that
data. We need a different path entirely, a path that doesn’t require
trade-off solutions that add both complexity and cost and are meant
to solve only yesterday’s problems.
Hybrid architectures that provide both transactional and analytical
capabilities solve many of the problems around finding a balanced
approach at scale. HTAP solves today’s and tomorrow’s problems
alike by abstracting the problem space itself. When considering how
to provide data to solve a business problem, developers can work
with a single interface and let the processing engine determine the
best way to respond to the query.
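The single-interface idea can be illustrated with a short sketch. SQLite stands in here purely for demonstration; a real HTAP engine would route the point query to a row store and the aggregate to a columnar replica internally, but from the developer's side both are just SQL against one connection, with no ETL step in between. The table and data are hypothetical.

```python
# Sketch of the single-interface idea: the application issues both
# transactional-style and analytical-style SQL through one connection
# and leaves the execution strategy to the engine.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?)",
    [(1, "acme", 120.0), (2, "acme", 80.0), (3, "globex", 50.0)],
)

# Transactional-style access: a point read on the primary key.
row = conn.execute("SELECT amount FROM payments WHERE id = ?", (2,)).fetchone()

# Analytical-style access: an aggregate over the same, current data,
# with no separate analytical copy and no ETL delay in between.
totals = dict(
    conn.execute("SELECT customer, SUM(amount) FROM payments GROUP BY customer")
)
```

The design point is the abstraction: the query optimizer, not the application developer, decides how each statement is executed.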

Achieving Sustainable Success


Looking ahead to determine future needs has always been challenging in any business climate. Today's velocity of data creation was not technically feasible, much less envisioned, when many of the current database products were first developed. The assumptions under
which those platforms were created frequently no longer apply to a
modern, digitally transformed organization.
Even though planning may be difficult, it’s reasonable to assume that
the velocity of data creation will not only continue, but also likely
increase over time. Applications that use that data will also evolve.
Analytical processes will become less labor-intensive as AI becomes
more powerful.
As AI improves, processes previously offloaded to analytical backends will move forward, toward transactional systems, so that the results can be acted on in real time. However, cumbersome and time-consuming ETL processes prevent that real-time
analysis from happening. The hybrid approach, providing both
transactional and analytical capabilities, provides the means to gain
insight in real time and at scale.



Summary
The path toward digital transformation is different for every organi‐
zation. Many face challenges around legacy systems that keep data
siloed, and other organizations have regulatory and compliance
concerns that affect their transformation path. However, all organi‐
zations face common challenges around the need for storing more
and more data and realizing insights from that data as soon as
possible.
Hybrid approaches that provide both transactional and analytical
components help organizations move forward on the digital trans‐
formation path. Finding a balance between transactional and analyt‐
ical processing and the various technical components involved is not
always possible with existing tools or is sometimes cost-prohibitive.
Using native capabilities rather than adapting legacy systems is a key
component of the digital transformation and becomes much more
important as data is scaled up and as the organization moves along
its digital transformation path.

CHAPTER 4
Key Takeaways

This chapter focuses on summarizing the key themes and ideas found within the book. Some of these include the decision points
where an organization can find value. For example, understanding
the trade-offs between real-time analysis and archival data retrieval
is an important part of determining how far an organization can go
toward digital transformation.
Looking at the long-term horizon can be difficult. But this chapter
also includes factors to consider when looking at the way a database
platform can be shaped for both the present and the future.

Key Decision Points for the Next Generation Database Platform
Examining the factors necessary to move forward on the digital
transformation path is a key theme throughout this book. Whether
an organization is beginning on the path or is trying to reach the
path’s next milestone, the use of data within the organization will be
essential to success.
Business growth is a key driver behind decisions across the entire
organization. Digital transformation is a means by which the orga‐
nization can drive growth and be able to sustain that growth.
New opportunities can be identified when an organization moves
analytical processes earlier in the data life cycle. For example, analytical processes frequently help identify new opportunities for revenue. When analysis results are found closer to data capture, the
organization can act quickly on those opportunities. Ideally, analyti‐
cal processes are happening in real time when the data is being cap‐
tured. However, many of the current database platforms simply
don’t have the capability to perform both transactional and analyti‐
cal processing in real time and at scale. Some platforms can perform
both by using a hybrid approach, combining the best elements of
transactional and analytical processing into a coherent and modern
database experience.
A hybrid approach, combining both transactional and analytical
processing, is the goal. Obviously, reaching that goal is not an
instant switch but rather requires planning to ensure minimal
impact on end users. A key first step is to identify the primary sys‐
tems in which analytical processes are being executed. Which pieces
of the analytical processing can be moved or migrated to a new sys‐
tem? For example, if data is siloed in a legacy system, what options
exist to snapshot that data or capture pieces of the data and move or
replicate those pieces toward a more real-time analysis?
Other considerations include the following:

• Which legacy systems and data will be crucial to success, and how well does the new database platform integrate with those systems?
• How easy is it to scale up and down with flexibility in the new
database system?
• What types of resiliency and reliability does the new platform
have, and what are the associated costs of meeting the business
requirements?
• How easy is it for developers to work with the new database
platform?
• What was the original design paradigm for the platform, and
did that paradigm include cloud capabilities natively?

Because each organization is different, other factors may need to be considered when choosing a database platform on which the digital transformation can succeed.

Crafting a Long-Term Road Map
In the near term, it's relatively easy to see that an open source, community-based focus can be leveraged to gain insight and collaborate in new ways. It's also easy to see that data will
continue to grow. From there, assuming that data collection will
continue to increase and that the demands on knowledge from that
data will also increase appears to be reasonable.
With those factors in mind, a long-term road map should be built around sustained success. Deploying solutions that require significant expense merely to provide redundancy addresses yesterday's needs rather than the current or future needs for a database platform. Technologies built on a standard client/server model, which assume low latency and reliable connectivity, are likewise not looking toward the future.
Rather than making decisions based on assumptions, the exercise of
removing assumptions about connectivity, availability, and reliabil‐
ity helps architect better solutions. Removing assumptions about
data size and usage helps further develop the road map. However,
certain assumptions can be made, such as the increasing use of AI,
intelligent services, and cloud-based deployments to help bring
additional fidelity to the road map.
Business agility in the context of long-term planning means finding
inflection points where course changes can be made. In an agile
context, revisiting the assumptions using the current reality is the
primary means to become aware of the need for changes. Turning
agility into ability is possible when following best practices for
digital transformation.

Summary
Capturing data when that data is increasing in size, speed, and com‐
plexity is a difficult but solvable problem. Performing extended
analytical processing on data, even at scale, is also a well-known and
solved problem. Providing a means to both capture and process
data, in real time and at scale, is the next level that will enable
insights to be found and acted on quickly. The hybrid transactional
and analytical processing concept can help organizations achieve the
next step in enterprise digital transformation.

About the Author
Steve Suehring is an assistant professor of computing and new
media technologies at the University of Wisconsin–Stevens Point.
Steve has worked as an editor for LinuxWorld Magazine and has
written several books on a variety of technologies, including
JavaScript, Linux security, MySQL, and others. Specializing in highly
technical, advanced-level subject matter, Steve has worked with cli‐
ents of all sizes to write technical white papers, corporate briefs,
internal and external facing documentation, and executive technical
summaries.
