Metadata Management On A Hadoop Eco-System: Whitepaper by
Satya Nayak
Project Lead, Mphasis
Jansee Korapati
Module Lead, Mphasis
Introduction

The data lake stores large amounts of structured and unstructured data in various varieties at different transformed layers. While the data is growing to terabytes and petabytes, and your data lake is being used by the enterprise, you are likely to come across questions and challenges such as: what data is available in the data lake, how is it consumed/prepared/transformed, who is using this data, who is contributing to this data, how old is the data, etc.

A well-maintained metadata layer can effectively answer these kinds of queries and thus improve the usability of the data lake. This white paper presents the benefits of an effective metadata layer for a data lake implemented on a Hadoop cluster; information on various metadata management tools is presented, with their features and architecture.

A well-built metadata layer will allow an organization to harness the potential of the data lake and deliver the following mechanisms to the end users to access data and perform analysis:
- Self Service BI (SSBI)
- Data-as-a-Service (DaaS)
- Machine Learning-as-a-Service
- Data Provisioning (DP)

You can optimize your data lake to the fullest with metadata management.
Fig. 1: The metadata layer can access data – quality info, distribution info, data version – from multiple layers.
Various Metadata Management Tools

Here is a list of different metadata management tools that can capture metadata on a Hadoop cluster. There is no order of preference for this list; the tools are listed in a random order.

You can choose from metadata management tools widely available, as per your business requirement.

• Cloudera Navigator: Cloudera Navigator is a data governance solution for Hadoop, offering critical capabilities such as data discovery, continuous optimization, audit, lineage, metadata management, and policy enforcement. As part of Cloudera Enterprise, Cloudera Navigator is critical to enabling high-performance agile analytics, supporting continuous data architecture optimization, and meeting regulatory compliance requirements.

• Apache Atlas: Currently in the Apache incubator, this is a scalable and extensible component which can create, automate and define relationships on data for metadata in the data lake system. You can also export metadata to third-party systems from Atlas. It can be used for data discovery and lineage tracking.

• Apache Falcon: Falcon is aimed at making feed processing and feed management on Hadoop clusters easier for their end consumers.

• HCatalog: HCatalog is a table and storage management layer for Hadoop that enables users with different data processing tools like Pig and MapReduce to read and write data on the grid more easily.

• Loom: Loom provides metadata management and data preparation for Hadoop. The core of Loom is an extensible metadata repository for managing business and technical metadata, including data lineage, for all the data in Hadoop and surrounding systems. Loom's active scan framework automates the generation of metadata for Hadoop data by crawling HDFS to discover and introspect new files.

• Waterline: Waterline Data automates the creation and management of an inventory of data assets at the field level, empowering data architects to provide all the data the business needs through secure self-service. It ensures data governance policies are adhered to, by enabling data stewards to audit data lineage, protect sensitive data, and identify compliance issues.

• Ground Metadata (AMP Lab): Ground is a data context system, under development at the University of California, Berkeley. It is aimed at building a flexible, open source, vendor-neutral system that enables users to reason about what data they have, where that data is flowing to and from, who is using the data, when the data changed, and why and how the data is changing. Among other things, its developers believe a data context system is particularly useful for data inventory, data usage tracking, model-specific interpretation, reproducibility, interoperability, and collective governance.

Features of architecture of commonly used Metadata Tools

1. Cloudera Navigator

Cloudera Navigator is a proprietary tool from Cloudera for data management in the Hadoop Eco-System. It primarily provides two solutions in the area of Data Governance.

Fig. 2: A typical data lake layout – source systems feed data discovery, integration, data provisioning, transient/enrichment zones, a raw zone, the data hub, internal processing and external access, built over an Information Lifecycle Management layer, a Metadata Layer, and a Security & Governance Layer.

Data Management

Data management provides visibility into and control over the data residing in Hadoop data stores and the computations performed on that data. The features included here are:

• Auditing data access and verifying access privileges: The goal of auditing is to capture a complete and immutable record of all the activities within a system. Cloudera Navigator auditing features add secured, real-time audit components to key data and access frameworks. Cloudera Navigator allows compliance groups to configure, collect, and view audit events, and understand who accessed what data and how.

• Searching metadata and visualizing lineage - Cloudera Navigator metadata management features allow DBAs, data stewards, business analysts, and data scientists to define, search, and amend the properties of, and tag, data entities and to view relationships between datasets (a small query sketch follows Fig. 3).

• Policies - Cloudera Navigator policy features enable data stewards to specify automated actions based on data access or on a schedule to add metadata, create alerts, and move or purge data.

• Analytics - Cloudera Navigator analytics features enable Hadoop administrators to examine data usage patterns and create policies based on those patterns.

Data Encryption

Data encryption and key management provide a critical layer of protection against potential threats by malicious actors on the network or in the datacenter. Encryption and key management are also required for meeting key compliance initiatives and ensuring the integrity of your enterprise data. The following Cloudera Navigator components enable compliance groups to manage encryption:

• Cloudera Navigator Encrypt transparently encrypts and secures data at rest without requiring changes to your applications, and ensures there is minimal performance lag in the encryption or decryption process.

• Cloudera Navigator Key Trustee Server is an enterprise-grade virtual safe-deposit box that stores and manages cryptographic keys and other security artifacts.

• Cloudera Navigator Key HSM allows Cloudera Navigator Key Trustee Server to seamlessly integrate with a hardware security module (HSM).
Cloudera Navigator Metadata Architecture

Fig. 3: Cloudera Navigator metadata architecture – sources such as HDFS, Hive, Impala, MapReduce, Oozie, Spark and YARN feed the Navigator Metadata Server (managed alongside Cloudera Manager), which serves the Navigator UI and Navigator API and persists metadata in the Navigator database and storage directory.
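As a hedged illustration of how the metadata search described above can be consumed programmatically, the sketch below queries the Navigator Metadata Server REST API for tagged HDFS entities. The host, port, API version, credentials and the Solr-style query are assumptions and should be verified against your Cloudera Navigator deployment.

# Hypothetical sketch: querying the Cloudera Navigator Metadata Server REST API
# for HDFS entities tagged as 'sensitive'. Host, port, API version and query
# syntax are assumptions -- verify them against your Navigator documentation.
import requests

NAVIGATOR_URL = "http://navigator-host.example.com:7187/api/v13/entities"  # assumed endpoint
AUTH = ("nav_user", "nav_password")  # Navigator uses HTTP basic authentication

def find_entities(query, limit=25):
    """Return metadata entities matching a Navigator (Solr-style) search query."""
    response = requests.get(
        NAVIGATOR_URL,
        params={"query": query, "limit": limit, "offset": 0},
        auth=AUTH,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Example: locate HDFS entities that data stewards tagged as 'sensitive'.
    for entity in find_entities("sourceType:HDFS AND tags:sensitive"):
        print(entity.get("originalName"), entity.get("type"))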
2. Apache Atlas

Atlas is a Data Governance initiative from Hortonworks for the Hadoop cluster. It was initially started by Hortonworks and then handed over to Apache as a top-level project. Atlas is a scalable and extensible set of core foundational governance services – enabling enterprises to effectively and efficiently meet their compliance requirements within Hadoop and allowing integration with the whole enterprise data ecosystem.

Features

Data Classification
• Import or define taxonomy business-oriented annotations for data (a small tagging sketch follows this feature list)
• Define, annotate, and automate capture of relationships between datasets and underlying elements including source, target, and derivation processes
• Export metadata to third-party systems

Centralized Auditing
• Capture security access information for every application, process, and interaction with data
• Capture the operational information for execution, steps, and activities

Search & Lineage (Browse)
• Pre-defined navigation paths to explore the data classification and audit information
• Text-based search features locate relevant data and audit events across the data lake quickly and accurately
• Browse visualization of data set lineage allowing users to drill down into operational, security, and provenance related information

Security & Policy Engine
• Rationalize compliance policy at runtime based on data classification schemes, attributes and roles
• Advanced definition of policies for preventing data derivation based on classification (i.e. re-identification) – Prohibitions
• Column and row level masking based on cell values and attributes
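As a minimal sketch of the tag-based classification feature above, the snippet below attaches a classification (e.g. "PII") to an existing entity through the Atlas v2 REST API, which tools such as Ranger can then act on. The host, credentials and GUID are placeholders; verify the endpoint path against your Atlas version.

# Minimal sketch: attach a classification (tag) to an Atlas entity via the v2
# REST API. Host, credentials and GUID are placeholders, not real values.
import requests

ATLAS = "http://atlas-host.example.com:21000"   # default Atlas server port
AUTH = ("admin", "admin")                        # placeholder credentials

def classify_entity(guid, classification):
    """Attach a classification (tag) to an existing Atlas entity by GUID."""
    url = f"{ATLAS}/api/atlas/v2/entity/guid/{guid}/classifications"
    resp = requests.post(url, json=[{"typeName": classification}], auth=AUTH, timeout=30)
    resp.raise_for_status()

# Tag-based policies (e.g. in Apache Ranger) can then key off the "PII" tag.
classify_entity("0f6e-example-guid", "PII")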
Fig. 4: Apache Atlas high-level view – a REST API in front of core services (knowledge store, taxonomies, data lifecycle management, tag-based policies) built on the type system.

Fig. 5: Apache Atlas components – REST API, bridges/connectors (Falcon, Sqoop and others), messaging framework, search and search DSL, type system, and a graph DB repository.
In terms of implementation, Atlas has the following components to accomplish the design.
• Web service: This exposes RESTful APIs and a web user interface to create, update and query metadata (a small search sketch follows this list).
• Metadata store: Metadata is modeled using a graph, implemented using the graph database Titan. Titan has options
for a variety of backing stores for persisting the graph, including an embedded
Berkeley DB, Apache HBase and Apache Cassandra. The choice of the backing store determines the level of service
availability.
• Index store: For powering full text searches on metadata, Atlas also indexes the metadata, again via Titan. For the full
text search feature, it can use backend systems like Elasticsearch or Apache Solr.
• Bridges/Hooks: To add metadata to Atlas, libraries called ‘hooks’ are enabled in various systems like Apache Hive,
Apache Falcon and Apache Sqoop, which capture metadata events in the respective systems and propagate them to
Atlas. The Atlas server consumes these events and updates its stores.
• Metadata notification events: Any updates to metadata in Atlas, either via the hooks or the API, are propagated from Atlas to downstream systems via events. Systems like Apache Ranger consume these events and allow administrators to act on them, e.g. to configure policies for access control.
• Notification server: Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. Events are written by the hooks and Atlas to different Kafka topics. Kafka enables a loosely coupled integration between these disparate systems.
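A small sketch of the web service described in the list above: a basic-search call against the Atlas v2 REST API returning Hive tables that carry a given classification. Host and credentials are placeholders, and the endpoint and response fields should be checked against your Atlas release.

# Sketch: Atlas v2 basic search for hive_table entities with a given tag.
import requests

ATLAS = "http://atlas-host.example.com:21000"
AUTH = ("admin", "admin")   # placeholder credentials

def search_tables(classification):
    """Run an Atlas v2 basic search for Hive tables carrying the given tag."""
    body = {"typeName": "hive_table", "classification": classification, "limit": 50}
    resp = requests.post(f"{ATLAS}/api/atlas/v2/search/basic", json=body, auth=AUTH, timeout=30)
    resp.raise_for_status()
    for entity in resp.json().get("entities", []):
        print(entity.get("guid"), entity.get("attributes", {}).get("qualifiedName"))

search_tables("PII")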
Bridges/Hooks

External components like Hive, Sqoop, Storm and Falcon should model their taxonomy using the type system and register the types with Atlas. For every entity created in such an external component, the corresponding entity should be registered in Atlas as well. This is typically done in a hook, which runs in the external component and is called for every entity operation. The hook generally processes the entity asynchronously using a thread pool to avoid adding latency to the main operation.
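The following sketch illustrates that asynchronous hook pattern: each entity operation is handed to a small thread pool, so the host component is not blocked while the entity is registered with Atlas over REST. Real hooks (Hive, Sqoop, Falcon) are Java components shipped with Atlas; the host, credentials and attributes here are placeholders, so treat this purely as an illustration of the design.

# Sketch of the asynchronous hook pattern: registration is offloaded to a
# thread pool so the main operation in the host component is not delayed.
from concurrent.futures import ThreadPoolExecutor
import requests

ATLAS = "http://atlas-host.example.com:21000"
AUTH = ("admin", "admin")                      # placeholder credentials
_pool = ThreadPoolExecutor(max_workers=4)      # keeps latency off the main operation

def _register(entity):
    """Synchronously create/update one entity in Atlas (runs on a worker thread)."""
    requests.post(f"{ATLAS}/api/atlas/v2/entity", json={"entity": entity},
                  auth=AUTH, timeout=30).raise_for_status()

def on_entity_operation(entity):
    """Called by the host component for every entity operation; returns immediately."""
    _pool.submit(_register, entity)

# Example: the host component just created a table, so the hook registers it.
# The type ("hive_table") must already be registered with Atlas.
on_entity_operation({
    "typeName": "hive_table",
    "attributes": {"qualifiedName": "sales.orders@cluster1", "name": "orders"},
})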
Atlas also exposes a notification interface, which hooks can use for reliable entity registration. The hook can send a notification message containing the list of entities to be registered. The Atlas service contains a hook consumer that listens to these messages and registers the entities.
Notification Server Design:
Notification is used for reliable entity registration from hooks and for entity/type change notifications. Atlas, by default, provides Kafka integration, but it is possible to provide other implementations as well. The Atlas service starts an embedded Kafka server by default. Atlas also provides a NotificationHookConsumer that runs in the Atlas service, listens to messages from the hooks, and registers the entities in Atlas.
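The sketch below shows a downstream notification listener playing the role Apache Ranger plays in Fig. 6, assuming Atlas' default ATLAS_ENTITIES Kafka topic and the third-party kafka-python client. The broker address, topic name and message layout are assumptions to verify for your Atlas version.

# Sketch of a downstream listener for Atlas entity-change notifications,
# using the kafka-python client. Topic name, broker address and message
# layout are assumptions for illustration only.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ATLAS_ENTITIES",                                # topic Atlas publishes entity changes to
    bootstrap_servers="kafka-host.example.com:9092",
    group_id="metadata-listener",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Entity-change notifications carry an operation type and the affected entity.
    print(event.get("type"), event.get("entity", {}).get("typeName"))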
Fig. 6: Entity/trait/type change notifications flow from hooks (e.g. Hive) through Apache Atlas to notification listeners such as Apache Ranger.
3. Apache Falcon
Apache Falcon addresses enterprise challenges related to Hadoop data replication, business continuity, and lineage tracking by deploying a framework for data management and processing. Falcon centrally manages the data lifecycle, facilitates quick data replication for business continuity and disaster recovery, and provides a foundation for audit and compliance by tracking entity lineage and collecting audit logs.
Apache Falcon is a framework to simplify data pipeline processing and management on Hadoop clusters. It makes onboarding new workflows/pipelines simpler, with support for late data handling and retry policies. It allows users to easily define relationships between various data and processing elements and to integrate with a metastore/catalog such as Apache Hive/HCatalog. Finally, it also captures the lineage information for feeds and processes.
Following is the high level architecture of Apache Falcon.
Fig. 7: Apache Falcon high-level architecture – the Falcon engine coordinates replication, archival and retention of frequent feeds, feed monitoring, late data arrival, exception handling, lineage, and audit.
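To make the feed on-boarding described above concrete, here is a hedged sketch that submits a feed definition to Falcon's entity REST API and then schedules it, after which Falcon manages its retention and replication. The host, port (15000 is Falcon's usual embedded default), user name and feed XML file are placeholders; check the endpoint paths against the Falcon REST documentation for your release.

# Hedged sketch: submit and schedule a Falcon feed entity over REST.
# Host, port, user name and the feed XML file are placeholders.
import requests

FALCON = "http://falcon-host.example.com:15000"
PARAMS = {"user.name": "falcon"}   # Falcon's simple/pseudo authentication

def submit_and_schedule_feed(xml_path, feed_name):
    """Submit a feed entity definition to Falcon, then schedule it."""
    with open(xml_path, "rb") as handle:
        resp = requests.post(f"{FALCON}/api/entities/submit/feed",
                             params=PARAMS, data=handle.read(),
                             headers={"Content-Type": "text/xml"}, timeout=60)
    resp.raise_for_status()
    resp = requests.post(f"{FALCON}/api/entities/schedule/feed/{feed_name}",
                         params=PARAMS, timeout=60)
    resp.raise_for_status()

submit_and_schedule_feed("raw-clicks-feed.xml", "raw-clicks-feed")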
4. Waterline Data Inventory
Waterline Data is a data marketplace platform provider, combining an automated data inventory with a self-service data catalog. The inspiration for "Waterline" came from the metaphor of the data lake where data is hidden below the waterline. 80% of business value is created from Big Data by discovering data, and 80% of data is discovered by finding and understanding trusted data. The mission of Waterline Data is to accelerate time to value for data discovery by helping business analysts and data scientists find, understand, and provision information assets through self-service – without having to "dive" for data – and by helping data stewards provide agile data governance with automated and crowd-sourced business semantics and data lineage.
Fig. 8: Data hidden below the "waterline" of the data lake.
Enterprises are evolving their data lakes to encompass all enterprise information assets, creating a "data marketplace" for the business users – a logical data layer and catalog over physical data assets, to find, understand, and provision data assets at the speed of the business. Waterline Data is pioneering the "data marketplace" platform by providing smart data discovery to discover and crowd-source business metadata, data lineage, and infonomics, underlying a self-service business data catalog across on-premise and cloud data sources.
Features Summary
The table below lists the top features and the tools that support them.
Conclusion
In this digital age, the IT industry is heavily dependent on data and its innovative usage. As we capture enormous amounts of data and run analytics on it, a lot of inherent value is being realized every day in the business. As we keep storing data from various sources, managing it gets more difficult day by day. As data grows in complexity, metadata management becomes critical for companies. Community development is helping to create a variety of tools to address this challenge.
References
1. https://www.cloudera.com/products/cloudera-navigator.html
2. http://atlas.incubator.apache.org/
3. http://hortonworks.com/apache/atlas/
4. https://falcon.apache.org/
5. http://hortonworks.com/apache/falcon/
6. http://hortonworks.com/apache/ranger/
7. http://www.waterlinedata.com/
Satya Nayak
Project Lead, Mphasis
Satya Nayak has 11 years of experience in enterprise applications and delivering software solutions. Satya has provided a variety of solutions to various industry domains in his career. He has been working on Big Data and related technologies for 2.5 years. Currently, he is working as a Project Lead, which involves creating designs and building innovative solutions for client requirements in the Big Data space.
Satya enjoys exploring new domains and taking up challenging roles.
Satya is a Cloudera Certified Spark and Hadoop Developer (CCA-175) and a MapR Certified Spark Developer (MCSD).
Jansee Korapati
Module Lead, Mphasis
Jansee Korapati has 7+ years of experience in IT services, which includes
2 years in implementing Data Lakes using Big Data technologies like
Hadoop, Spark, Sqoop, Hive, Impala and Flume. She is an expert in
assessment and performance tuning of long-running queries in Hive on massive data.
Jansee has been working for Mphasis for 6 months as a Module Lead. She is responsible for the metadata management, ingestion and transformation layers of the current project.
About Mphasis
Mphasis is a global technology services and solutions company specializing in the areas of Digital, Governance and Risk & Compliance. Our solution focus and superior human capital propel our partnerships with large enterprise customers in their digital transformation journeys. We partner with global financial institutions in the execution of their risk and compliance strategies. We focus on next-generation technologies for differentiated solutions delivering optimized operations for clients.
460 Park Avenue South, Suite #1101, New York, NY 10016, USA. Tel.: +1 212 686 6655
226 Airport Parkway, San Jose, California 95110, USA
88 Wood Street, London EC2V 7RS, UK. Tel.: +44 20 8528 1000
Bagmane World Technology Center, Marathahalli Ring Road, Doddanakundhi Village, Mahadevapura, Bangalore 560 048, India. Tel.: +91 80 3352 5000
www.mphasis.com
Copyright © Mphasis Corporation. All rights reserved.