Chapter 09 MRS: Huawei's Big Data Platform

Uploaded by

Njabulo Clement
MRS: Huawei's Big Data Platform

Foreword

⚫ This chapter first provides an overview of Huawei's big data platform, MRS,
before looking at its advantages and application scenarios. Then, it
describes some MRS components, including Hudi, HetuEngine, Ranger, and
LDAP+Kerberos authentication. Finally, it illustrates the MRS cloud-native
data lake baseline solution.

2 Huawei Confidential
Objectives

⚫ Upon completion of this course, you will possess a deeper understanding of:
 Huawei's big data platform, MRS
 Hudi, HetuEngine, Ranger, and LDAP+Kerberos security authentication
 MRS cloud-native data lake baseline solution

Contents

1. Overview of MRS

2. MRS Components

3. MRS Cloud-Native Data Lake Baseline Solution

Trends in the Evolution of Big Data Technology

[Figure: evolution from data marts and data warehouses, through the Hadoop data lake, to the data lakehouse. Labels are listed from left to right.]
 Eras: PC era; 2000s, Internet era; 2010s, Mobile Internet era; 2020s, cloud-native data lake era (cloud, AI, and multi-dimensional analysis)
 Hardware: single-node system + midrange computer; cluster + appliance; distributed + general-purpose server; distributed + cloud
 Platforms: data mart and data warehouse; data warehouse; big data; MRS, DWS, AI, and time sequence/spatiotemporal
 Architecture: centralized (a standalone system sufficed); distributed architecture with high reliability; lakehouse architecture with high scalability; multi-lake collaboration, lake-warehouse collaboration, and cross-domain, cross-cloud collaborative analysis
 Workloads: BI, reports, data science, machine/deep learning, real-time processing, and stream analytics
 Data lakehouse: unified catalog and unified security over raw data, pre-cleansed data, and data marts

Huawei Cloud Services
⚫ Huawei Cloud is Huawei's signature cloud service brand. It is a culmination of Huawei's 30-plus years
of expertise in ICT infrastructure products and solutions. Huawei Cloud is committed to providing
stable, secure, and reliable cloud services that help organizations of all sizes grow in an intelligent
world. To complement an already impressive list of offerings, Huawei Cloud is pursuing a vision of
inclusive AI, a vision of AI that is affordable, effective, and reliable for everyone. As a foundation,
Huawei Cloud provides a powerful computing platform and an easy-to-use development platform for
Huawei's full-stack all-scenario AI strategy.
⚫ Huawei aims to build an open, cooperative, and win-win cloud ecosystem and helps partners quickly
integrate into that local ecosystem. Huawei Cloud adheres to business boundaries, respects data
sovereignty, does not monetize customer data, and works with partners for joint innovation to
continuously create value for customers and partners.

Huawei Cloud MRS
⚫ MapReduce Service (MRS) is used to deploy and manage Hadoop systems on Huawei Cloud.
⚫ MRS provides enterprise-level big data clusters on the cloud. Tenants have full control over clusters and
can easily run big data components such as Hadoop, Spark, HBase, Kafka, and Storm. MRS is fully
compatible with open-source APIs, and incorporates the advantages of Huawei Cloud computing and
storage and big data industry experience to provide customers with a full-stack big data platform
featuring high performance, low cost, flexibility, and ease-of-use. In addition, the platform can be
customized based on service requirements to help enterprises quickly build a massive data processing
system and discover new value points and business opportunities by analyzing and mining massive
amounts of data in real time or at a later time.

MRS Highlights

Decoupled storage and compute
• A unified data lake eliminates data silos; a single copy of data is enough, with no need for data
transfer; multiple computing engines, making flexible allocation and on-demand scaling of storage
and computing resources possible; 30% more cost-effective than the industry average

First-class performance and experience
• Full-stack performance acceleration and millisecond-level response to millions of metadata records
by means of four-level vertical optimization on hardware, data organization, computing engine, and
AI intelligent optimization, providing users with the ultimate performance experience

Cutting-edge open-source technologies
• In-depth reconstruction of mainstream engines, such as Spark, Hive, and Flink, with key technologies
such as indexing, caching, and metadata; Huawei-developed CarbonData with millisecond-level point
query; Superior Scheduler for 20,000+ nodes in a single cluster

High security and availability
• Cross-AZ HA of a single cluster, eliminating single points of failure (SPOFs), rolling patch
installation/upgrade, task reconnection upon disconnection, and zero service interruption;
multi-level security assurance capabilities, such as network resource isolation, account security,
and data security control

MRS Architecture
[Figure: Huawei Cloud FusionInsight MRS cloud-native data lake]
 Analysis scenarios: real-time analytics, offline analysis, interactive query, real-time retrieval, multi-modal analysis
 Convergent processing: ClickHouse, Hive, HBase, HetuEngine, Kafka, Redis, IoTDB, Spark, Flink, Tez, Elasticsearch
 Scheduler: Yarn, Superior scheduling
 Data lake formation: data ingestion (CDL), data security (Ranger)
 Data storage: storage engine Hudi; HDFS | OBS
 Infrastructure (Huawei Cloud): PM, VM, BMS
 Manager: automatic deployment, large cluster management, tenant management, refined monitoring,
centralized alarm management, online log retrieval, backup & DR, rolling upgrade

Cloud-native architecture for agile building of data lakes
• Easy deployment: one-click cluster creation and service provisioning within 30 minutes
• Agile building: unified data import, metadata management, and security management for rapid
construction of big data platforms
• Decoupled storage and compute: independent compute and storage resource expansion for convenient
deployment of gPaaS & AI DaaS services like DataArts Studio and GES

Three cloud-native data lakes with one architecture
• Offline data lake, real-time data lake, and logical data lake
• Specialized data marts: Data warehouse handling within a data lake, shortening the analysis link
and construction period

Continuous evolution of enterprise-class versions
• Multi-engine converged analysis, improving analysis efficiency by 30%
• Up to 21,000 nodes are deployed in a cluster, supporting cluster federation
• Rolling upgrade for continuous evolution without service interruptions
• High availability: geo-redundancy with two-site, three-center DR

MRS Application Scenarios

Offline data lake
• High-performance interactive query engine enables data processing within the lake.
• Unified metadata and global data visualization
• Converged analysis and unified SQL query are supported.

Real-time data lake
• High timeliness: Real-time incremental data import to the lake within seconds, from T+1 to T+0
• High resource utilization: Incremental data is scattered into the lake, for 2x higher resource
utilization
• Stream-batch convergence with unified SQL interfaces

Logical data lake
• Cross-lake, cross-domain, cross-warehouse, and all-domain data collaborative analysis
• Reduced data migration for computing without moving data, 50x higher analysis efficiency
• 10x faster service rollout (weeks -> days)

Abundant specialized data marts
• Lakes and marts in the same cluster for unified data management and seamless interconnection
• Full self-service, millisecond-level real-time OLAP analysis with ClickHouse
• High-throughput, low-latency time series database IoTDB

The "three data lakes + one mart" architecture meets customers' requirements during different
phases of data lake construction.

MRS in Hybrid Cloud: Data Base of the FusionInsight
Intelligent Data Lake
[Figure: Huawei Cloud Stack architecture, with MRS as part of the FusionInsight intelligent data lake]
 Industries: government, enterprises, finance, Internet
 Goal: accumulating and sharing industry digital assets
 Enablement platforms: data enablement (FusionInsight intelligent data lake: MRS + DWS + DataArts
Studio), AI enablement (ModelArts), application enablement (ROMA)
 Basic service infrastructure: compute, storage, network, and security, with unified management
 Deployment forms: hybrid cloud, public cloud, edge cloud
 Ecosystem collaboration: application enablement | open ecosystem
 Data intelligence: data enablement | AI enablement
 Service support: enterprise perspective | unified ecosystem and APIs for consistent experience
 Resource convergence: multi-architecture computing | smooth evolution
 Leading cloud infrastructure upgrades through full-stack innovations

Huawei Cloud Stack enables industry digital transformation through one infrastructure, three
enablement platforms, and an open ecosystem.

Contents

1. Overview of MRS

2. Components
◼ Hudi
 HetuEngine
 Ranger
 LDAP+Kerberos Security Authentication

3. MRS Cloud-Native Data Lake Baseline Solution

Hudi
⚫ Hudi is an open-source project that entered the Apache Incubator in 2019 and became a
top-level Apache project in 2020.
⚫ Huawei joined Hudi community development in 2020 and uses Hudi in FusionInsight.
⚫ Hudi is a data lake table format that provides the ability to update and delete
data as well as consume new data on HDFS. It supports multiple compute engines
and provides insert, update, and delete (IUD) interfaces and streaming primitives,
including upsert and incremental pull, over datasets on HDFS.
⚫ Hudi is the file organization layer of the data lake. It manages Parquet files, provides
data lake capabilities and IUD APIs, and supports compute engines.
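
The upsert and incremental pull primitives can be illustrated with a small in-memory model. This is a minimal sketch in plain Python, not the Hudi API: `ToyTable`, its methods, and the integer commit counter are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a table that supports upsert and incremental pull."""

    rows: dict = field(default_factory=dict)  # record key -> (commit number, value)
    last_commit: int = 0

    def upsert(self, records: dict) -> int:
        """Insert new keys and overwrite existing keys in one commit."""
        self.last_commit += 1
        for key, value in records.items():
            self.rows[key] = (self.last_commit, value)
        return self.last_commit

    def delete(self, keys) -> int:
        """Remove the given keys in one commit."""
        self.last_commit += 1
        for key in keys:
            self.rows.pop(key, None)
        return self.last_commit

    def snapshot(self) -> dict:
        """Return the current value of every record."""
        return {key: value for key, (_, value) in self.rows.items()}

    def incremental_pull(self, since_commit: int) -> dict:
        """Return only the records written after the given commit."""
        return {k: v for k, (c, v) in self.rows.items() if c > since_commit}


table = ToyTable()
first = table.upsert({"u1": "Shenzhen", "u2": "Beijing"})
second = table.upsert({"u2": "Shanghai", "u3": "Chengdu"})  # u2 updated, u3 inserted
changed = table.incremental_pull(since_commit=first)
print(changed)
```

A downstream job that remembers the last commit it processed only needs to read `changed`, not the whole table, which is the point of incremental pull.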
Hudi Features
⚫ Supports fast updates through custom indexes.
⚫ Supports snapshot isolation for data write and query.
⚫ Manages file size and layout based on statistics.
⚫ Supports timeline.
⚫ Supports data rollback.
⚫ Supports savepoints for data restoration.
⚫ Merges data asynchronously.
⚫ Optimizes data lake storage using the clustering mechanism.
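
The timeline, rollback, and savepoint features listed above can be sketched with a toy commit log. This is an illustrative model only, assuming that every commit stores a full snapshot; `ToyTimeline` and its methods are hypothetical and do not reflect how Hudi stores its timeline.

```python
class ToyTimeline:
    """Ordered list of commits; each commit keeps the table state at that instant."""

    def __init__(self):
        self.commits = []        # (instant, snapshot) pairs, oldest first
        self.savepoints = set()  # instants that restore() may return to
        self._next_instant = 1

    def commit(self, changes: dict) -> int:
        """Apply changes on top of the latest snapshot and record a new instant."""
        snapshot = dict(self.commits[-1][1]) if self.commits else {}
        snapshot.update(changes)
        instant = self._next_instant
        self._next_instant += 1
        self.commits.append((instant, snapshot))
        return instant

    def latest(self) -> dict:
        return dict(self.commits[-1][1]) if self.commits else {}

    def rollback(self) -> None:
        """Undo the most recent commit."""
        if self.commits:
            instant, _ = self.commits.pop()
            self.savepoints.discard(instant)

    def savepoint(self, instant: int) -> None:
        """Mark an existing commit as a restore target."""
        if instant not in {i for i, _ in self.commits}:
            raise ValueError(f"no commit at instant {instant}")
        self.savepoints.add(instant)

    def restore(self, instant: int) -> None:
        """Drop every commit made after a savepoint."""
        if instant not in self.savepoints:
            raise ValueError(f"instant {instant} is not a savepoint")
        self.commits = [c for c in self.commits if c[0] <= instant]


timeline = ToyTimeline()
c1 = timeline.commit({"a": 1})
c2 = timeline.commit({"b": 2})
timeline.savepoint(c1)
c3 = timeline.commit({"a": 9})
timeline.rollback()              # undoes c3 only
after_rollback = timeline.latest()
timeline.restore(c1)             # returns to the savepoint
after_restore = timeline.latest()
```

Rollback removes one failed or unwanted commit, while a savepoint lets the table return to a known-good instant further back.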

Hudi Architecture: Batch and Real-Time Data Import, Compatible
with Diverse Components, and Open-Source Storage Formats
⚫ Storage modes
 Copy On Write (COW): high read performance and slower write speed than that of MOR
 Merge On Read (MOR): high write performance and lower read performance
⚫ Storage formats
 The open-source Parquet and HFile formats are supported. The support for ORC is under planning.
⚫ Storage engines
 Open-source HDFS and Huawei Cloud Object Storage Service (OBS)
⚫ Views
 Read-optimized view
 Incremental view
 Real-time view

[Figure: Hudi architecture]
 Compute engines: Spark, Flink, Hive, HetuEngine
 Read views: read-optimized view, incremental view, real-time view
 Hudi datasets: timeline, data files, index, metadata
 Storage modes: COW, MOR
 Data import: Spark and Flink, streaming and batch
 Storage formats: Parquet, HFile, ORC
 Storage engines: HDFS, OBS
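
The trade-off between the two storage modes can be made concrete with a toy model. The sketch below is an assumption-laden simplification (dictionaries stand in for base files and log files, and the class and method names are hypothetical): COW merges at write time, while MOR defers the merge to read time or to compaction.

```python
class CopyOnWriteTable:
    """Every write rewrites the base data, so reads need no merging."""

    def __init__(self):
        self.base = {}
        self.rewrites = 0

    def write(self, updates: dict) -> None:
        new_base = dict(self.base)  # copy the existing data...
        new_base.update(updates)    # ...and merge the updates at write time
        self.base = new_base
        self.rewrites += 1

    def read(self) -> dict:
        return dict(self.base)


class MergeOnReadTable:
    """Writes append to a log; the merge happens at read time or on compaction."""

    def __init__(self):
        self.base = {}
        self.log = []

    def write(self, updates: dict) -> None:
        self.log.append(dict(updates))  # cheap append, no rewrite

    def read_optimized(self) -> dict:
        return dict(self.base)          # base data only, may be stale

    def read_real_time(self) -> dict:
        merged = dict(self.base)
        for delta in self.log:          # merge the log on every read
            merged.update(delta)
        return merged

    def compact(self) -> None:
        self.base = self.read_real_time()
        self.log = []


cow, mor = CopyOnWriteTable(), MergeOnReadTable()
for batch in ({"k1": "a"}, {"k1": "b", "k2": "c"}):
    cow.write(batch)
    mor.write(batch)

stale_view = mor.read_optimized()  # empty until compaction runs
fresh_view = mor.read_real_time()  # same content as cow.read()
mor.compact()
```

In this model the read-optimized view of the MOR table lags behind until `compact()` runs, whereas the real-time view pays the merge cost on each read.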

Contents

1. Overview of MRS

2. Components
 Hudi
◼ HetuEngine
 Ranger
 LDAP+Kerberos Security Authentication

3. MRS Cloud-Native Data Lake Baseline Solution

A Big Data Ecosystem Requires Interactive Query and
Unified SQL Access
➢ Services require interactive analysis within subseconds or seconds, which may result in redundant data replicas.
[Figure: data is injected into both HDFS and Oracle; the client performs interactive analysis through the database.]
Unable to respond to interactive analysis within subseconds or seconds:
 Multiple data replicas: high costs and difficult O&M
 Non-reusable IT infrastructure

➢ Unified SQL access is required due to diversified components in the data lake.
[Figure: applications connect separately to HDFS, Hive, HBase, and Elasticsearch.]
Diversified data components:
 High costs of service system interconnection and management
 Difficulty in converged data analysis

HetuEngine
⚫ HetuEngine is a Huawei-developed high-performance engine for distributed SQL query and data virtualization. Fully
compatible with the big data ecosystem, HetuEngine implements mass data query within seconds. It supports
heterogeneous data sources, enabling one-stop SQL analysis in the data lake.
[Figure: HetuEngine architecture]
 Cloud service layer: unified authentication and access control; data source information management,
O&M and monitoring, resource management, permission management, configuration tuning; business metadata
 Engine layer: multiple compute instances
 Data layer: Hive/HDFS/OBS, ClickHouse, HBase, Elasticsearch, DWS

Cross-domain
 Cross-domain service entry
 Distributed networking
 High-performance cross-domain data transmission at GB/s level
 Zero metadata synchronization
 Restricted data access
 Cross-domain computing pushdown
 Fast rollout with simplified configuration

Cross-source
 Data virtualization
 Materialized view
 UDFs specific for users and data sources
 Subquery pushdown
 Small table roaming
 Compatibility with HQL syntax
 Interconnection with common BI tools

Cloud-native
 Cloud service entry
 Centralized O&M of resources and permissions
 Visualized and instant data source configuration
 Auto scaling
 Multi-instance and multi-tenant deployment
 Rolling restart without interrupting services
 Backup & DR
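
To show why pushdown matters for cross-source queries, the following toy model joins rows from two independent sources and counts how many rows each source has to send to the engine. It is a sketch under stated assumptions, not HetuEngine code: `ToySource`, `hash_join`, `big_orders_with_names`, the source names, and the sample rows are all hypothetical.

```python
class ToySource:
    """A data source that can filter rows itself before sending them to the engine."""

    def __init__(self, name, rows):
        self.name = name
        self.rows = rows
        self.rows_sent = 0  # rows transferred from the source to the engine

    def scan(self, predicate=None):
        selected = [row for row in self.rows if predicate is None or predicate(row)]
        self.rows_sent += len(selected)
        return selected


def hash_join(left_rows, right_rows, key):
    """Join two row lists on a shared key inside the engine."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    return [{**left, **right} for left in left_rows for right in index.get(left[key], [])]


orders = ToySource("hive", [
    {"user_id": 1, "amount": 250},
    {"user_id": 2, "amount": 40},
    {"user_id": 1, "amount": 90},
    {"user_id": 3, "amount": 500},
])
users = ToySource("clickhouse", [
    {"user_id": 1, "name": "alice"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 3, "name": "carol"},
])


def big_orders_with_names(pushdown: bool):
    """Orders above 100 joined with user names, with or without filter pushdown."""
    is_big = lambda row: row["amount"] > 100
    if pushdown:
        order_rows = orders.scan(is_big)  # the source filters before sending
    else:
        order_rows = [row for row in orders.scan() if is_big(row)]  # the engine filters
    return hash_join(order_rows, users.scan(), "user_id")


orders.rows_sent = 0
with_pushdown = big_orders_with_names(pushdown=True)
sent_with_pushdown = orders.rows_sent

orders.rows_sent = 0
without_pushdown = big_orders_with_names(pushdown=False)
sent_without_pushdown = orders.rows_sent
```

Both calls return the same joined rows; only the number of order rows moved from the source to the engine differs, which is the saving that pushdown aims for.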

Open-Source Community Edition vs. HetuEngine

⚫ Performance (X times)
 Open-source Presto community edition: cross-domain xx MB/s
 MRS HetuEngine: cross-domain x GB/s, supporting bidirectional read and write; cross-source DWS
read/write and Elasticsearch processing: 5 times higher
⚫ Scalability
 Open-source Presto community edition: single-cluster working mode
 MRS HetuEngine: multi-cluster working mode; horizontal scale-out of computing power
⚫ Security
 Open-source Presto community edition: to be integrated by the application system provider
 MRS HetuEngine: deeply-integrated big data security + metadata management
⚫ Availability and reliability
 Open-source Presto community edition: imbalance
 MRS HetuEngine: dynamic cloud-native deployment within seconds; elastic scaling to adapt to
service load changes
⚫ Syntax compatibility
 Open-source Presto community edition: Presto syntax
 MRS HetuEngine: Presto syntax, plus syntax enhancement (compatible with 90% of HQL scenarios)

Contents

1. Overview of MRS

2. Components
 Hudi
 HetuEngine
◼ Ranger
 LDAP+Kerberos Security Authentication

3. MRS Cloud-Native Data Lake Baseline Solution

Ranger
⚫ Apache Ranger offers a centralized security management framework and supports
unified authorization and auditing. It manages fine-grained access control over
Hadoop and related components, such as HDFS, Hive, HBase, Kafka, and Storm.
Users can use the front-end web UI provided by Ranger to configure policies to
control users' access to these components.
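
The idea of centrally defined policies that are evaluated and audited on each access can be sketched as follows. This is a hypothetical, simplified model and not Ranger's policy schema or API: `Policy`, `is_allowed`, the group names, and the prefix-matching rule are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """One rule: these groups may perform these accesses on matching resources."""

    service: str            # for example "hdfs" or "hive"
    resource_prefix: str    # resources starting with this prefix are covered
    groups: frozenset
    accesses: frozenset


AUDIT_LOG = []  # every decision is recorded, allowed or not


def is_allowed(policies, user, user_groups, service, resource, access) -> bool:
    """Allow the access if any policy matches; deny by default."""
    allowed = any(
        policy.service == service
        and resource.startswith(policy.resource_prefix)
        and policy.groups & set(user_groups)
        and access in policy.accesses
        for policy in policies
    )
    AUDIT_LOG.append((user, service, resource, access, allowed))
    return allowed


policies = [
    Policy("hdfs", "/warehouse/sales", frozenset({"analysts"}), frozenset({"read"})),
    Policy("hdfs", "/warehouse", frozenset({"etl"}), frozenset({"read", "write"})),
]

can_read = is_allowed(policies, "alice", ["analysts"], "hdfs", "/warehouse/sales/2024", "read")
can_write = is_allowed(policies, "alice", ["analysts"], "hdfs", "/warehouse/sales/2024", "write")
```

Keeping the policies in one list and logging every decision mirrors, in miniature, the combination of unified authorization and auditing described above.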

Ranger Architecture

Relationship Between Ranger and Other Components
⚫ Ranger provides policy-based access control (PBAC) authorization plug-ins for component servers.
Currently, components like HDFS, Yarn, Hive, HBase, Kafka, Storm, and Spark2x support Ranger
authorization. More components will become available in the future.

Contents

1. Overview of MRS

2. Components
 Hudi
 HetuEngine
 Ranger
◼ LDAP+Kerberos Security Authentication

3. MRS Cloud-Native Data Lake Baseline Solution

LDAP
⚫ LDAP stands for Lightweight Directory Access Protocol. It is a protocol for
implementing centralized account management architecture based on X.500
protocols.
⚫ On the Huawei big data platform, an LDAP server functions as a directory service
system to implement centralized account management.
⚫ LDAP has the following characteristics:
 LDAP runs over TCP/IP or other connection-oriented transfer services.
 LDAP is an Internet Engineering Task Force (IETF) standard track protocol and is specified
in RFC 4510 on Lightweight Directory Access Protocol (LDAP): Technical Specification
Road Map.
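
A directory service organizes accounts as entries in a tree, each identified by a distinguished name (DN). The sketch below models that structure with a plain dictionary rather than a real LDAP client; the DNs, attribute names, and the `search` helper are hypothetical, and real LDAP matching rules (for example, case-insensitive DN comparison) are deliberately ignored.

```python
# Entries keyed by distinguished name (DN); the DN encodes the path to the tree root.
DIRECTORY = {
    "dc=example,dc=com": {"objectClass": "domain"},
    "ou=People,dc=example,dc=com": {"objectClass": "organizationalUnit"},
    "ou=Group,dc=example,dc=com": {"objectClass": "organizationalUnit"},
    "uid=alice,ou=People,dc=example,dc=com": {
        "objectClass": "account", "uid": "alice", "group": "hadoop",
    },
    "uid=bob,ou=People,dc=example,dc=com": {
        "objectClass": "account", "uid": "bob", "group": "hive",
    },
}


def search(directory: dict, base_dn: str, **filters) -> list:
    """Return the DNs at or below base_dn whose attributes equal all filters."""
    matches = []
    for dn, attributes in directory.items():
        in_subtree = dn == base_dn or dn.endswith("," + base_dn)
        if in_subtree and all(attributes.get(k) == v for k, v in filters.items()):
            matches.append(dn)
    return sorted(matches)


accounts = search(DIRECTORY, "ou=People,dc=example,dc=com", objectClass="account")
hadoop_users = search(DIRECTORY, "dc=example,dc=com", group="hadoop")
```

Because every component looks accounts up in the same tree, user and group information only has to be maintained in one place.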

Kerberos
⚫ Kerberos is a network authentication protocol named after the ferocious three-headed guard dog of
Hades in Greek mythology. It adopts a client–server model and cryptographic algorithms such as
Data Encryption Standard (DES) and Advanced Encryption Standard (AES). It also provides mutual
authentication, so that the client and server can verify each other's identity.
⚫ MRS uses KrbServers to provide Kerberos authentication for all components, implementing a
reliable authentication mechanism. To manage access control permissions on data and resources in
a cluster, it is recommended that the cluster be installed in security mode. In security mode, a
client application must be authenticated and a secure session must be established before the
application can access resources in the cluster.
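
The ticket-based flow behind Kerberos can be sketched in a few lines. This is a heavily simplified illustration, not the Kerberos protocol: it uses HMAC tags instead of encryption, omits timestamps, session keys, and mutual authentication, and passes the client key directly to the key distribution center. `ToyKdc`, `seal`, `unseal`, `service_accepts`, and all keys are hypothetical.

```python
import hashlib
import hmac
import json


def seal(key: bytes, payload: dict) -> dict:
    """Attach an HMAC tag so that only holders of the key can produce the ticket."""
    body = json.dumps(payload, sort_keys=True)
    tag = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "tag": tag}


def unseal(key: bytes, ticket: dict) -> dict:
    """Verify the tag and return the ticket contents."""
    expected = hmac.new(key, ticket["body"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, ticket["tag"]):
        raise PermissionError("ticket was not issued with this key")
    return json.loads(ticket["body"])


class ToyKdc:
    """Toy key distribution center that issues a TGT first, then service tickets."""

    def __init__(self, principal_keys: dict, tgs_key: bytes):
        self.principal_keys = principal_keys
        self.tgs_key = tgs_key

    def issue_tgt(self, client: str, client_key: bytes) -> dict:
        known_key = self.principal_keys.get(client, b"")
        if not known_key or not hmac.compare_digest(known_key, client_key):
            raise PermissionError("unknown principal or wrong key")
        return seal(self.tgs_key, {"type": "TGT", "client": client})

    def issue_service_ticket(self, tgt: dict, service: str) -> dict:
        claims = unseal(self.tgs_key, tgt)
        if claims["type"] != "TGT":
            raise PermissionError("a ticket-granting ticket is required")
        payload = {"type": "ST", "client": claims["client"], "service": service}
        return seal(self.principal_keys[service], payload)


def service_accepts(service: str, service_key: bytes, ticket: dict) -> str:
    """The service checks the ticket with its own key and returns the client name."""
    claims = unseal(service_key, ticket)
    if claims["type"] != "ST" or claims["service"] != service:
        raise PermissionError("ticket is not valid for this service")
    return claims["client"]


kdc = ToyKdc({"alice": b"alice-key", "hdfs": b"hdfs-key"}, tgs_key=b"tgs-key")
tgt = kdc.issue_tgt("alice", b"alice-key")
service_ticket = kdc.issue_service_ticket(tgt, "hdfs")
authenticated_client = service_accepts("hdfs", b"hdfs-key", service_ticket)
```

The service never sees the client's key: it trusts the ticket because only the key distribution center could have sealed it with the service's own key.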

Architecture of Huawei Big Data Security Authentication
Scenarios
[Figure: authentication flow among the user, Manager web UI, Manager WS, CAS server, component web UI,
component web app, Kerberos1, Kerberos2, LDAP1, and LDAP2. LDAP1 and LDAP2 are synchronized.
The step labels recoverable from the figure are listed below.]
 1. Log in.
 2. Send requests.
 3. Perform authentication.
 3.1 Perform authentication.
 4.1 Obtain a user group.
 4.2 Obtain a user group.
 5. Go to the component web UI.
 6. Send requests.
 9. Component accesses Kerberos.

Enhanced Open-Source LDAP+Kerberos Features
⚫ Service Authentication in the Cluster
 In an MRS cluster in security mode, mutual access between services is implemented based on the Kerberos
security architecture. When a service (such as HDFS) in the cluster starts, it obtains the corresponding
session key (keytab, used for identity authentication of the application) from Kerberos. If another
service (such as Yarn) needs to access HDFS to add, delete, modify, or query data, it must first obtain
the corresponding ticket-granting ticket (TGT) and service ticket (ST) for secure access.
⚫ Application Development Authentication
 MRS components provide application development interfaces for customers and upper-layer service product
clusters. During application development, a cluster in security mode provides specified application
development authentication interfaces to implement application security authentication and access.
⚫ Cross-Manager Mutual Trust
 MRS provides the mutual trust function between two Managers to implement data read and write operations
between them.

Contents

1. Overview of MRS

2. Components

3. MRS Cloud-Native Data Lake Baseline Solution

A Panorama of the FusionInsight MRS Cloud-Native Data
Lake Baseline Solution in Huawei Cloud Stack
⚫ The MRS data lake solution implements the "three lakes + mart" service scenario to meet customers' requirements in different
phases of data lake construction.

[Figure: panorama of the solution, from data sources to applications]
 Applications: real-time applications, data management, data cleansing, specialized analytics, mining &
modeling, fixed reports, AI analytics, self-service analysis, list/details query, large-screen display, BI
 Data sources: IoT, messages, files, service DBs, ...
 Data loading: real-time synchronization, real-time loading, batch loading, scheduled loading, on-demand loading
 Real-time stream processing: Kafka (message queue), SparkStreaming/Flink (stream processing engine)
 Offline data lake: Spark, Hive, HetuEngine (query in the lake)
 Real-time data lake: Flink SQL (batch-stream convergence), CDL (real-time integration engine)
 Logical data lake: HetuEngine (cross-lake query) over data lake A and data lake B
 Data storage: Parquet, ORC, Hudi; source data, detail data, model data; HDFS, OBS; DR
 Specialized data marts: IoTDB (time series database), ClickHouse (real-time OLAP), HBase (simple
retrieval), Elasticsearch (complex retrieval), GES (graph database), Redis (in-memory database)
 Infrastructure: hybrid cloud

Offline Data Lake
Data lake: A big data platform holding a vast amount of data in its native format for an enterprise. Access to data and compute
power is granted to users through strict data permission and resource control. In a data lake, one replica of data supports
multidimensional analysis.
Offline: Data typically reaches the data lake more than 15 minutes after it is generated. Data with
this delay is considered offline.

[Figure: offline data lake]
 Applications: data management, data cleansing, specialized analytics, mining & modeling, fixed reports,
AI analytics, self-service analysis
 Data sources: IoT, messages, files, service DBs, ...
 Real-time stream processing: Kafka (message queue), SparkStreaming/Flink (stream processing engine)
 Data loading: real-time processing, batch loading, scheduled loading, on-demand loading
 Offline data lake: interactive query with Spark, Hive, and HetuEngine
 Data storage: Parquet, ORC, Hudi; source data, detail data, model data; HDFS, OBS
 Specialized data marts: time-series data mart, real-time OLAP mart, simple retrieval mart, complex
retrieval mart, relationship/graph data mart, in-memory database mart, ...
 Infrastructure: hybrid cloud

Real-Time Data Lake
Data lake: A big data platform holding a vast amount of data in its native format for an enterprise. Access to data and compute power
is granted to users through strict data permission and resource control. In a data lake, one replica of data supports multidimensional
analysis.
Real-time: Real-time refers to cases where data can be stored in the data lake within one minute after being generated, while quasi-
real-time is where data is stored in the data lake within 1 to 15 minutes.
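
The latency thresholds used in this chapter (within one minute, 1 to 15 minutes, and over 15 minutes) can be written as a small classifier. This is a minimal sketch: the function name `classify_ingestion` and the treatment of the exact boundary values are assumptions, since the slides do not say which class a delay of exactly 1 or 15 minutes belongs to.

```python
def classify_ingestion(delay_seconds: float) -> str:
    """Classify how quickly data reaches the data lake after it is generated."""
    if delay_seconds < 0:
        raise ValueError("delay cannot be negative")
    if delay_seconds <= 60:        # within one minute
        return "real-time"
    if delay_seconds <= 15 * 60:   # within 1 to 15 minutes
        return "quasi-real-time"
    return "offline"               # more than 15 minutes


samples = {delay: classify_ingestion(delay) for delay in (5, 300, 3600)}
print(samples)
```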

[Figure: real-time data lake]
 Applications: data management, data cleansing, specialized analytics, mining & modeling, fixed reports,
AI analytics, self-service analysis
 Data sources: IoT, messages, files, service DBs, ...
 Real-time stream processing: Kafka (message queue), SparkStreaming/Flink (stream processing engine)
 Data loading: real-time processing, real-time loading, real-time synchronization, batch loading,
on-demand loading
 Real-time data lake: interactive query with CDL, Spark, Hive, HetuEngine, and Flink SQL
 Data storage: Parquet, ORC, Hudi; source data, detail data, model data; HDFS, OBS
 Specialized data marts: time-series data mart, real-time OLAP mart, simple retrieval mart, complex
retrieval mart, relationship/graph data mart, in-memory database mart, ...
 Infrastructure: hybrid cloud

Logical Data Lake
Data lake: a big data platform holding a vast amount of data in various formats in an enterprise. It opens data and compute power to
users with strict data permission and resource control.
Logical data lake: a virtual data lake composed of multiple physically dispersed data platforms.

[Figure: logical data lake]
 Applications: data management, data cleansing, specialized analytics, mining & modeling, fixed reports,
AI analytics, self-service analysis, list query, large-screen display, BI
 Data sources: IoT, messages, files, service DBs, ...
 Data loading: real-time stream processing, real-time processing, batch loading, scheduled loading,
on-demand loading
 Logical data lake: HetuEngine (cross-lake query) over offline data lake A, real-time data lake B, and
data mart C
 Specialized data marts: time-series data mart, real-time OLAP mart, simple retrieval mart, complex
retrieval mart, relationship/graph data mart, in-memory database mart, ...
 Infrastructure: hybrid cloud

X Bank: Rolling Upgrades, Decoupled Storage-Compute, and
HetuEngine-based Real-Time BI
Pain points
[Figure: X Bank big data platform]
 Applications: operation report, feature label, data science, ...
 Data mart (performance, credit, and more)
 Financial data warehouse: DWS (480+ nodes)
 Analysis and mining platform: DWS (240+ nodes)
 HetuEngine
 Unified full data processing platform: MRS (1,500+ nodes), with batch processing, stream processing,
and HDFS storage of massive raw data
 Data sources: structured data (OA, ERP, ...); semi-structured and unstructured data

Pain points
⚫ Clusters on X's big data platform process 100,000+ jobs and store 30+ PB data per day. During
upgrades, traditional solutions require power-off and restart, which affects important services such
as anti-fraud and precision marketing on live networks.
⚫ Traditional big data storage usage exceeds 70%, but the CPU usage is less than 50%. In the
all-in-one solution, compute and storage resources need to be expanded together, which wastes
resources.
⚫ Data in the lakes and warehouses of the traditional big data platform is isolated. Associated
analysis requires complex ETL tasks to process data and then load it to the OLAP mart. This results
in long data links and low analysis efficiency.

Solution
⚫ The MRS rolling upgrade supports sequential upgrades in different batches until all nodes in the
cluster are upgraded to the latest version. It also supports automatic isolation of faulty nodes
during the upgrade. When an upgrade is complete, the faulty nodes are handled.
⚫ The decoupled storage-compute architecture enables on-demand expansion of insufficient resources
only. The traditional three replicas are replaced by the enterprise-level EC 1.2 replicas.
⚫ HetuEngine supports cross-source collaboration and lakehouse collaborative analysis, preventing
unnecessary ETL processes and reducing data migrations.

Benefits
⚫ Rolling upgrades ensure service continuity, which enables continuous evolution based on the same
architecture.
⚫ The decoupled storage-compute architecture improves compute resource usage by 30%+, improves
storage resource usage by 100%+, and reduces TCO by 60%.
⚫ Collaborative analysis across data lakes and warehouses reduces ETL by 80% and improves analysis
efficiency by 10x+, reducing required time from minutes to seconds.
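
The effect of replacing three replicas with erasure coding (EC) can be checked with simple arithmetic. The sketch below assumes that "EC 1.2 replicas" means 1.2 units of raw capacity per unit of data, and it uses a 30 PB data volume purely as an illustrative input; the function names are hypothetical, and the result is not a reconstruction of the TCO figure quoted above.

```python
def raw_capacity_pb(logical_pb: float, overhead_factor: float) -> float:
    """Raw capacity needed to store the data at a given redundancy overhead."""
    return logical_pb * overhead_factor


def capacity_saving(old_factor: float, new_factor: float) -> float:
    """Fraction of raw capacity saved when the overhead factor drops."""
    return 1 - new_factor / old_factor


logical_pb = 30.0  # illustrative data volume
with_replicas = raw_capacity_pb(logical_pb, 3.0)  # three full copies
with_ec = raw_capacity_pb(logical_pb, 1.2)        # assumed EC overhead of 1.2x
saving = capacity_saving(3.0, 1.2)

print(f"3 replicas: {with_replicas:.0f} PB raw")
print(f"EC 1.2:     {with_ec:.0f} PB raw")
print(f"saving:     {saving:.0%}")
```

Under these assumptions the same data needs 36 PB of raw capacity instead of 90 PB, a 60% reduction in raw storage.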
XX Healthcare Security Administration Built a Unified Offline
Data Lake for Decision-Making
Pain points
[Figure: unified provincial health insurance data platform]
 Applications: macro decision-making, service application, MBF risk warning, operation monitoring,
real-time dashboard, audit & supervision, real-time monitoring
 Offline data lake (100+ nodes) on the FusionInsight MRS cloud-native data lake: original data layer
(ODS), common data model (CDM), application data store (ADS); offline computing and real-time
computing; real-time access; unified storage
 Source systems: medical insurance, medicine, healthcare
 Provincial data: basic information data, business service data, public service data, management data
 Data exchange: provincial data exchange platform, national data exchange platform
 External parties: public safety, health commissions, civil affairs, bank, insurance

Pain points
⚫ The medical insurance, medicine, and medical treatment systems are siloed. A unified data platform
covering all subjects, objects, services, processes, and data is needed.
⚫ Scattered management and fixed standards cause low efficiency in business handling and operation.
⚫ It is difficult to detect and prevent violations in medical insurance reimbursement and insurance
fraud.

Solution
⚫ Use MRS to build an offline data lake to store data from different sources, such as medical
insurance, medication, and medical treatment. Build the original data store (ODS), common data model
(CDM), and application data store (ADS) in the offline and real-time computing data areas to
implement full data management and governance and build a unified provincial health insurance data
platform.
⚫ Use unified data standards to implement centralized data governance and association analysis of
data from different sources.
⚫ Build an offline data lake to centrally store video, image, text, and IoT data, providing real-time
computing data areas and real-time data processing capabilities.

Benefits
⚫ A one-stop platform that holds all-domain data is provided for people to handle healthcare
business. The intensive construction reduces data silos and TCO by 30%.
⚫ Medical insurance reimbursement becomes 3 times more efficient, and manual review workloads and
error rates decrease by 80%. People only need to visit the office once to handle business.
⚫ The real-time data computing capability effectively controls vulnerabilities that may breed
medical insurance reimbursement violations and insurance fraud, recovering XX00 million in economic
losses every year and ensuring the sound development of the medical benefits fund (MBF).
Quiz

1. What is MRS?

2. What are the advantages of MRS compared with self-built Hadoop?

Summary

⚫ This chapter first described Huawei's big data platform, MRS, along with its
advantages and application scenarios. It then went through some MRS
components, including Hudi, HetuEngine, Ranger, and LDAP+Kerberos
security authentication. Finally, this chapter introduced the MRS cloud-
native data lake baseline solution.

Acronyms and Abbreviations
⚫ BI: Business Intelligence
⚫ ETL: Extract, Transform, and Load, a process that involves extracting data,
transforming the data, and loading the data to final targets.
⚫ AI: Artificial Intelligence
⚫ DWS: Data Warehouse Service
⚫ ES: Elasticsearch, distributed full-text search service
⚫ OBS: Object Storage Service
⚫ ORC: Optimized Row Columnar. ORC is a top-level Apache project and is a self-describing
columnar storage format.

Acronyms and Abbreviations
⚫ COW: Copy On Write
⚫ MOR: Merge On Read
⚫ UDF: User-Defined Function
⚫ TCO: Total Cost of Ownership
⚫ ODS: Operational Data Store (called the original data store in this chapter)
⚫ CDM: Common Data Model
⚫ ADS: Application Data Store

Recommendations

⚫ Huawei Cloud
 https://www.huaweicloud.com/intl/en-us/
⚫ Huawei Talent
 https://e.huawei.com/en/talent/portal/#/
⚫ Huawei Enterprise Product & Service Support
 https://support.huawei.com/enterprise/en/index.html

Thank you.
Bring digital to every person, home, and organization for a fully connected, intelligent world.

Copyright © 2022 Huawei Technologies Co., Ltd. All Rights Reserved.

The information in this document may contain predictive statements including, without limitation,
statements regarding the future financial and operating results, future product portfolio, new
technology, etc. There are a number of factors that could cause actual results and developments to
differ materially from those expressed or implied in the predictive statements. Therefore, such
information is provided for reference purpose only and constitutes neither an offer nor an
acceptance. Huawei may change the information at any time without notice.
