BDE ManagedHadoopDataLakes PAVLIK PDF

This document discusses production use cases for managed Hadoop data lakes. It begins by outlining how data lakes can accelerate time to market, identify new products and markets, and enable advanced analytics. It then provides examples of data lake uses for healthcare, telecommunications, retail, and financial services. The document also discusses challenges in building, managing, and deriving value from data lakes and requirements for a managed data lake solution to address these challenges. Finally, it includes examples of reference architectures for network and healthcare data lakes.

Uploaded by

Raj
Production Use Cases of Managed Hadoop Data Lakes

October 27, 2015


Greg Pavlik | Senior Solutions Architect
gpavlik@zaloni.com

Hadoop: Unleashing big data's potential across industries

Accelerating time to market
Identifying new products and markets
Enabling advanced analytics for improved decision-making

Healthcare: e.g. Patient 360° view, Fraud
Telco: e.g. Network analysis
Retail: e.g. Improved decision-making, Customer 360° view
Financial Services: e.g. Customer 360° view, Fraud

Zaloni Confidential and Proprietary - Provided under NDA

The promise of a Hadoop Data Lake: All data is welcome.


Don't limit the data from which you can derive value.
Store all types of structured and unstructured data.
Don't worry about having all the answers today.
Store complete raw data so you can go back as your understanding crystallizes.
Don't limit how you can query the data.
Use various tools to get insights from the data.
Don't build walls.
Provide democratized access via a single unified view across the enterprise.


Tackling data lake complications

Building a Hadoop data lake:
Rate of Change: Keeping up with the constantly evolving Hadoop ecosystem
Skills Gap: Lack of expertise in both development and architecture
Complexity: Numerous components to integrate: hardware, software, applications

Managing a Hadoop data lake:
Ingestion: Difficulty getting data into the data lake effectively
Lack of Visibility: Lack of data visibility and transparency
Privacy and Compliance: Addressing data privacy and compliance issues

Deriving value from a data lake:
Quality Issues: Need for improved data quality control
Reliance on IT: Business users must rely on IT to prepare data for analysis
Reusability: Lack of automation means constantly re-creating the wheel

Requirements for a Managed Data Lake

Integrated Data Lake Management

Unified Data Management: An integrated solution to manage the entire data pipeline (instead of point products).
Managed Ingestion: Simplified onboarding of new data sets, managed so that IT knows where data comes from and where it lands.
Data Reliability: Confidence that your analytics are always running on the right data, with the right quality.
Data Visibility: Metadata management capabilities let you keep track of what data is in Hadoop, its source, its format and its lineage.
Data Security: Enforces access control and provides data masking (e.g. for PII).
Self Service: End users can access and leverage the data in the data lake.
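
The Data Visibility and Data Security requirements above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `DatasetRecord`, `mask_pii`, the field names, the paths and the salt are hypothetical and do not represent Bedrock or any other product's API. The sketch records where a data set came from and where it landed, then replaces a PII field with a salted hash token.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry: what the data is, its source, format and lineage."""
    name: str
    source: str
    landing_path: str
    data_format: str
    lineage: list = field(default_factory=list)
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def mask_pii(row: dict, pii_fields: set, salt: str = "demo-salt") -> dict:
    """Return a copy of row with PII values replaced by a salted SHA-256 token."""
    masked = {}
    for key, value in row.items():
        if key in pii_fields:
            digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
            masked[key] = "tok_" + digest[:16]  # same input always yields the same token
        else:
            masked[key] = value
    return masked


# Illustrative values only; the data set name and paths are made up.
record = DatasetRecord(
    name="claims_2015",
    source="claims_db",
    landing_path="/data/raw/claims/2015",
    data_format="csv",
)
record.lineage.append("ingested from claims_db into /data/raw/claims/2015")

row = {"member_id": "M123", "ssn": "123-45-6789", "amount": 250.0}
masked_row = mask_pii(row, pii_fields={"ssn"})
```

Because the token is deterministic for a given salt, masked records can still be joined on the masked field. A production system would keep the salt secret and would more likely use a dedicated tokenization or encryption service.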

Data Lake Reference Architecture

Source Systems: File Data, DB Data, ETL Extracts, Logs (or other unstructured data), Streaming, Cloud Services, APIs

Hadoop Data Lake:
Transient Loading Zone
Raw Data: Original unaltered data attributes, Tokenized Data
Trusted Data: Reference Data, Master Data
Refined Data: Integrate to common format, Data Validation, Data Cleansing, Aggregations
Discovery Sandbox: Data Wrangling, Data Discovery, Exploratory Analytics
Consumption Zone: OLTP or ODS, Enterprise Data Warehouse

Across all zones: Metadata, Data Quality, Data Catalog, Security

Users: Business Analysts, Researchers, Data Scientists
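
The flow from the raw zone through validation, cleansing and aggregation can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the architecture's actual implementation: the record fields (`subscriber_id`, `bytes`), the validation rules and all function names are hypothetical, and a real data lake would run these steps as Hadoop jobs over files rather than over Python lists.

```python
from collections import defaultdict


def validate(record: dict) -> bool:
    """Data validation: require a non-empty subscriber id and a non-negative byte count."""
    subscriber_id = str(record.get("subscriber_id", "")).strip()
    size = record.get("bytes")
    return bool(subscriber_id) and isinstance(size, int) and size >= 0


def cleanse(record: dict) -> dict:
    """Data cleansing: normalize the id into one common format."""
    return {
        "subscriber_id": record["subscriber_id"].strip().upper(),
        "bytes": record["bytes"],
    }


def refine(raw_records: list) -> list:
    """Build the refined data from raw data without modifying the raw records."""
    return [cleanse(r) for r in raw_records if validate(r)]


def aggregate(refined: list) -> dict:
    """Aggregation for the consumption zone: total bytes per subscriber."""
    totals = defaultdict(int)
    for r in refined:
        totals[r["subscriber_id"]] += r["bytes"]
    return dict(totals)


# Raw zone keeps the original, unaltered records, including the bad ones.
raw_zone = [
    {"subscriber_id": " a1 ", "bytes": 100},
    {"subscriber_id": "A1", "bytes": 50},
    {"subscriber_id": "", "bytes": 10},    # fails validation: missing id
    {"subscriber_id": "b2", "bytes": -5},  # fails validation: negative count
]
refined_zone = refine(raw_zone)
consumption_zone = aggregate(refined_zone)
print(consumption_zone)  # {'A1': 150}
```

Keeping `raw_zone` untouched mirrors the idea of retaining original unaltered data, so the rules in `validate` and `cleanse` can be changed later and the refined data rebuilt.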

Network Data Lake Reference Architecture

Consumers: Custom Applications, Exploration and Ad-hoc Analytics, BI Tools

Use cases: Subscriber Usage, Customer Churn, Capacity Planning, Customer 360

Network Data Lake:
Data Warehouse
Bedrock Network Data Models
HDFS
Landing Zone + Bedrock Network Data Collectors

Sources: Unstructured Data, CRM, Billing, IPFIX, SNMP, RADIUS, CDR, DPI

Healthcare Data Lake Reference Architecture

Data Sources: Relational, Streaming, File
Source data: Claims, EMR, Lab/Pathology, Pharmacy, Member, Social, Enterprise Data

Edge Node: OGG Adapter, Stream Adapters, BDCA

Data Lake: Hadoop Cluster (Data Sets, Transformations), Export

Bedrock Application Manager:
Functions: Configure Ingestion; Administer Metadata; Manage, Monitor, Schedule
Components: Operations and Metadata Store, Data Quality & Rules Engine, Query Builder, Workflow Executor

Consumers: Analytical Applications, Enterprise Data Warehouse, Apps/Analytics Tools
Outputs: HEDIS Reporting, Bundle Payments, Readmission Risk, Medical Benefits Management, Scorecards, Enterprise Reports

Visit zaloni.com or
Contact us at info@zaloni.com
