
A new paradigm for managing data

Open data lakehouse architectures speed insights and deliver self-service analytics capabilities.

MIT Technology Review Insights
Produced in association with Dremio

Key takeaways

1. Business leaders recognize the imperative to build a data-driven culture, but they are challenged by enormous amounts and varying types of data, as well as their legacy data management systems.

2. Functioning as a single environment to capture all types of data while also enabling business intelligence and analytics, a data lakehouse can be a “best-of-both-worlds” data architecture solution.

3. A data lakehouse unites disparate data types and use cases, providing simple, self-service data access across the organization, while also simplifying IT workloads.

Regeneron Pharmaceuticals, a biotechnology company that develops life-transforming medicines, found itself inundated with vast volumes of data during the peak of the covid-19 pandemic. To derive actionable information from these disparate data sets, which ranged from clinical trial data to real-time supply chain information, the company needed new ways to join and relate them, regardless of what format they were in or where they came from.

Shah Nawaz, chief technology officer and vice president of digital technology and engineering at Regeneron, says, “At the time, everybody in the world was reporting on their covid-19 findings from different countries and in different languages.” The challenge was how to make sense of these massive data sets in a timely manner, assisting researchers and clinicians, and ultimately getting the best treatments to patients faster. After all, he says, “when you’re dealing with large-scale data sets in hundreds, if not thousands, of locations, connecting the dots can be a complex problem.”

Regeneron isn’t the only company eager to derive more value from its data. Despite the enormous amounts of data they collect and the amount of capital they invest in data management solutions, business leaders are still not benefiting from their data. According to IDC research, 83% of CEOs want their organizations to be more data driven, but they struggle with the cultural and technological changes needed to execute an effective data strategy.

In response, many organizations, including Regeneron, are turning to a new form of data architecture as a modern approach to data management. In fact, by 2024, more than three-quarters of current data lake users will be investing in this type of hybrid “data lakehouse” architecture to enhance the value generated from their accumulated data, according to Matt Aslett, a research director with Ventana Research.

“Data lakehouse” is the term for a modern, open data architecture that combines the performance and optimization of a data warehouse with the flexibility of a data lake. But achieving the speed, performance, agility, optimization, and governance promised by this technology also requires embracing best practices that prioritize corporate goals and support enterprise-wide collaboration.


A look inside data lakehouse capabilities
At the core of data lakehouse technology are five key features.

Transaction support: Though data lakes work well with unstructured data, they lack data warehouses’ support for ACID transactions. ACID refers to the four properties that define a database transaction (atomicity, consistency, isolation, and durability) and ensure data consistency and reliability across the enterprise. Data lakehouses, however, are designed to support ACID transactions, as well as other key features of traditional data warehouse workloads.
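The atomicity guarantee in ACID is easiest to see in a small example. The sketch below uses Python’s built-in sqlite3 module purely as a stand-in for a transactional table; lakehouse engines and table formats provide these guarantees through their own mechanisms, which this sketch does not attempt to model. The `accounts` table and the `transfer` and `balances` helpers are hypothetical, chosen only for illustration: a transfer either commits both of its updates or, if it fails partway, rolls back to the prior state.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` between two (hypothetical) accounts as one atomic transaction."""
    # Using the connection as a context manager commits on success and
    # rolls back automatically if an exception escapes the block.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Abort halfway through: the debit above must not survive.
            raise ValueError(f"insufficient funds in {src!r}")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    """Return the current balance of every account, keyed by name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])

transfer(conn, "a", "b", 30)  # succeeds: both updates are committed together
try:
    transfer(conn, "a", "b", 500)  # fails midway: the partial debit is rolled back
except ValueError:
    pass

print(balances(conn))  # {'a': 70, 'b': 30}
```

Because the failed transfer is rolled back as a unit, the total across both accounts stays at 100. Without atomicity, the partial debit could persist and leave the table in an inconsistent state, which is the kind of reliability problem that transaction support is meant to prevent.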

Business intelligence and analytics support: It’s not enough to simply gather and store data. Organizations must be able to parse that data for meaningful and impactful insights. For this reason, data lakehouse technology can connect business intelligence (BI) dashboards and interactive analytics directly to data in the data lake, which is difficult to access with traditional data architectures.

Open data architecture: Leveraging open standards, technologies, and formats within the open data lakehouse enables data teams to choose the best tools and execution engines for each analytic workload. This makes data teams more flexible and efficient, and better able to adopt the next wave of innovative technologies.

Decoupled storage and compute: The concept of separating compute and storage has been around for years, but next-generation cloud data storage enables the separation of compute and data in a lakehouse architecture, with data being its own tier. Managing storage and compute using separate clusters provides the ability to better manage costs and scale as necessary. It also enables workload isolation, so data consumers don’t have to compete for resources.

Governance and security: Given the damaging impact of a data breach, organizations can’t afford to risk exposure to bad actors. By serving as a single repository for all of an organization’s data, a data lakehouse can simplify security measures and improve data governance. Flexible security controls ensure that data can be safely accessed from data sources across the enterprise, while seamless integration with an IT environment’s existing security controls allows organizations to employ their existing security elements, such as authentication and authorization.

The evolution of data management
To appreciate the capabilities of the data lakehouse, it’s helpful to understand the history of data management. Data warehouses were developed in the 1980s to effectively store structured data from business systems in the data center. But over the last fifteen years, as the volume and variety of data collected by businesses grew, data lakes emerged as a more flexible, scalable, and cost-effective alternative.

Data lakes eliminate the need to store different data formats in different environments, and they can house large volumes of semi-structured and unstructured data. With the right tools in place, data lakes can be used with a variety of analytics products, enabling organizations to glean insights from all types of data. “If the data warehouse is a dedicated environment, the data lake, in theory at least, is more enterprise-wide, with data coming in from multiple sources and used by multiple parts of the organization,” says Aslett.

But just like data warehouses, data lakes also have their shortcomings. Designed to accommodate all data formats, data lakes can become disorganized, making it challenging to use them with analytics tools and hard to properly secure and govern them. Additionally, data lakes struggle with query performance at the scale and concurrency required for enterprise-grade BI and reporting.

Many organizations have tried to handle these challenges by building out multiple data management systems: one or many data lakes alongside several data warehouses, as well as additional specialized systems such as image databases. But this approach introduces multiple layers of complexity to the data architecture and creates unnecessary work for data teams.

To leverage data in a data lake for BI and reporting in this type of mixed environment, data teams must build custom pipelines to move and transform data into proprietary data warehouses and properly prepare data sets for analysis. These pipelines are often manual and ad hoc, resulting in “a slow process for companies to work through,” according to Tomer Shiran, founder and chief product officer at Dremio, an open data lakehouse provider. Consequently, data engineers become focused on responding to data access requests rather than on longer-term organizational priorities.

Enter the data lakehouse: a big data storage architecture that serves as a single repository for all types of data while also enabling business intelligence and analytics capabilities. Rather than integrate or replace data warehouses and data lakes, data lakehouse technology offers “the best of both worlds in one system,” says Shiran. “It’s about having the flexibility and scale of a data lake but also, in use cases where the data should be structured, the necessary transactional guarantees and performance of a data warehouse.”
The result, says Aslett, is a modern and open architecture that “adds structured data management and data processing capabilities, such as analytics, acceleration, and table formats to support consistency. What’s more, a data lakehouse enables environments to better support data from multiple sources and to be used for multiple use cases and applications.”

Nawaz attests to the value of bringing data together at Regeneron. He explains, “By joining cross-functional domain data, regardless of where it sits, data lakehouse technology enables us to create an entire value chain story from early discovery all the way to commercialization of new products.”

Enormous amounts and disparate types of data
56% of organizations manage 10+ data sources
32% of organizations manage 20+ data sources
58% of organizations use “big data”
Source: The Evolving World of Analytics and Data: Market Insights from Benchmark Research, Ventana Research, 2022

From simplification to greater collaboration
This data management architecture can benefit business leaders and IT teams alike. Chief among the advantages is a data lakehouse’s ability to deliver secure, self-service data access, liberating data with live, interactive queries directly on Amazon S3, Microsoft Azure Data Lake Storage, HDFS, or another S3-compatible storage solution. The result is greater self-sufficiency and faster insights for data consumers at a time when organizations can’t afford to waste time preparing data for analysis.

This architecture provides users with easy access to data for a wide variety of tasks. A marketing manager, for example, may wish to reduce customer churn while a data analytics team leverages data to predict factory maintenance issues. In the case of Regeneron, the company improves the lives of patients by combining data, both structured and unstructured, in a single, centralized repository. “If we’re trying to address our patients’ needs, we strongly feel that there has to be a connected data ecosystem so that we can respond much quicker,” says Nawaz. Whatever the scenario, employees at Regeneron are empowered to discover, curate, analyze, and share datasets with a distinctly self-service mindset.

IT teams also benefit from data lakehouse technology. By simplifying infrastructure, a data lakehouse can significantly ease the burden on time-strapped IT teams. “There are advantages in only having one environment to manage,” says Aslett. In fact, he says, not only does data lakehouse technology “consolidate multiple different data spread across the organization,” but it can also “consolidate the numerous platforms companies have, reduce data silos, improve knowledge sharing, and enhance information flows.”

Another advantage of a data lakehouse is its power to encourage enterprise-wide collaboration. “When all this technology was separate, people and processes were separate,” says Shiran. “There was a separate warehouse team, a separate BI team, a separate data science team, and a separate data lake team. However, once you merge technology in a way that works for all these use cases, it can change a company’s culture. It’s really an opportunity to bring together people and processes.”

In fact, Aslett says that “by bringing data together” while encouraging “knowledge sharing across the organization,” data lakehouse technology can drive innovation, enabling teams “to develop new projects, new initiatives, and new ideas” for a distinct competitive edge.

Top data cultures reap benefits
Organizations with the strongest data culture (top quartile) are 4.6x more likely to use data in major decisions and 6.3x more likely to use data in daily meetings. They are also more likely to use data in their approach to work and to support proposals.
Source: “How Data Culture Fuels Business Value in Data-Driven Organizations,” IDC Thought Leadership White Paper, 2021

Strategies for data management success
To fully realize the benefits of modern data architecture, organizations must establish best practices. One such practice is viewing the modernization of IT infrastructure not only as a technological feat but also as a critical cultural shift.

“It’s a people, process, and technology issue,” says Shiran. As a result, he says, “Organizations have to embrace cultural changes, especially well-established companies that have legacy systems and IT processes and architecture.” Nawaz agrees. Creating a data lakehouse “is a paradigm shift,” he says. “We’re helping to shape how our organization thinks about analytics as a whole.”

For many organizations, this means adopting a data-centric approach to all aspects of the business. “Organizations that are more successful are making cultural changes to make data the focal point of the organization, in terms of driving the development of new products and initiatives,” says Aslett.

But facilitating any type of cultural shift requires C-suite commitment. “Cultural change has to come from the top and it has to be driven by leaders,” says Aslett. “A lack of leadership buy-in can be a real impediment to success with data analytics.”

Another essential is developing a deep understanding of how your data architecture will serve your business needs. “People talk about data and it sounds great, but at the end of the day, what’s in it for the business? Is it really making an impact?” asks Nawaz.

To answer these important questions, companies must consider what they hope to achieve from their data management efforts. Nawaz says Regeneron invested heavily in the “thought process and design thinking” around building its modern data architecture.

“From the laboratory system to the shop floor system, nearly twenty different systems contain supply chain data,” he says. “But how do you streamline supply chain analytics? In order to run the supply chain business, we need to join all these data streams together. That’s one area where data lakehouse technology can be applied.”

Not only are use cases growing, but so too are the potential beneficiaries of this technology. In the future, Aslett says, data lakehouse capabilities will be “more suitable for supporting self-service analytics where business leaders and senior executives will access data themselves rather than have reports and dashboards created for them.”

For now, though, data lakehouse technology is helping companies like Regeneron unite disparate data types and a wide variety of workloads in a single big data storage solution. After all, says Nawaz, “To respond to patient needs effectively, we believe that all these dots need to be connected.”


“A new paradigm for managing data” is an executive briefing paper by MIT Technology Review Insights. We would like to
thank all participants as well as the sponsor, Dremio. MIT Technology Review Insights has collected and reported on all
findings contained in this paper independently, regardless of participation or sponsorship. Laurel Ruma and Teresa Elsey
were the editors of this report, and Nicola Crepaldi was the publisher.

About MIT Technology Review Insights


MIT Technology Review Insights is the custom publishing division of MIT Technology Review, the world’s
longest-running technology magazine, backed by the world’s foremost technology institution—producing
live events and research on the leading technology and business challenges of the day. Insights conducts
qualitative and quantitative research and analysis in the U.S. and abroad and publishes a wide variety of
content, including articles, reports, infographics, videos, and podcasts. And through its growing MIT Technology
Review Global Insights Panel, Insights has unparalleled access to senior-level executives, innovators, and
entrepreneurs worldwide for surveys and in-depth interviews.

From the sponsor


Dremio is the easy and open data lakehouse, providing self-service analytics with data warehouse
functionality and data lake flexibility across all of your data. Founded in 2015, Dremio is headquartered in
Santa Clara. CNBC recognized Dremio as a Top Startup for the Enterprise and Deloitte named Dremio to its
2022 Technology Fast 500. To learn more, follow the company on GitHub, LinkedIn, Twitter, and Facebook,
or visit www.dremio.com.

Illustrations
Cover art by Adobe Stock and spot illustrations created by Chandra Tallman with icons by The Noun Project and Adobe Stock.

While every effort has been taken to verify the accuracy of this information, MIT Technology Review Insights cannot accept any responsibility or liability for reliance on any person
in this report or any of the information, opinions, or conclusions set out in this report.

© Copyright MIT Technology Review Insights, 2023. All rights reserved.


MIT Technology Review Insights
www.technologyreview.com
@techreview @mit_insights
insights@technologyreview.com
