Unit V
Apache Hive is an open-source data warehouse system built on top of a Hadoop cluster.
It is used for querying and analyzing large datasets stored in Hadoop files, and it processes
structured and semi-structured data in Hadoop.
Initially, users had to write complex Map-Reduce jobs, but with the help of Hive,
you merely need to submit SQL-like queries.
Hive is mainly targeted towards users who are comfortable with SQL.
Hive uses a language called HiveQL (HQL), which is similar to SQL.
Hive automatically translates these SQL-like queries into Map-Reduce jobs.
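As a simplified sketch, the kind of MapReduce job Hive generates for an aggregation such as `SELECT word, COUNT(*) ... GROUP BY word` can be simulated in plain Python. The function names and data here are illustrative, not Hive's actual internals:

```python
from collections import defaultdict

# Map phase: emit a (word, 1) pair for every word in the input.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word, 1)

# Shuffle + reduce phase: group pairs by key and sum the counts.
def reduce_phase(pairs):
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

lines = ["hive hadoop hive", "hadoop hdfs"]
result = reduce_phase(map_phase(lines))
print(result)  # per-word counts, as the GROUP BY query would return
```

The point is that the user writes only the query; Hive plans and runs the map and reduce stages automatically.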
Hive Architecture
Metastore – It stores metadata for each of the tables, such as their schema and location. Hive
metadata helps the driver track the progress of various data sets distributed over the
cluster. The metastore stores this data in a traditional RDBMS format.
Driver – It acts like a controller that receives the HiveQL statements. The driver starts
the execution of a statement by creating sessions, and it monitors the life cycle and progress of
the execution. The driver stores the necessary metadata generated during the execution of a
HiveQL statement. It also acts as the collection point for the data or query results obtained after the
Reduce operation.
Compiler –
It performs the compilation of the HiveQL query, converting the query into an execution
plan. The plan contains the tasks and the steps that MapReduce needs to perform to produce
the output.
The compiler first converts the query to an Abstract Syntax Tree (AST), checks it
for compatibility and compile-time errors, and then converts the AST into a Directed Acyclic
Graph (DAG).
Executor – Once compilation and optimization are complete, the executor executes the tasks.
The executor also takes care of pipelining the tasks.
CLI, UI, and Thrift Server – CLI (command-line interface) provides a user interface for
an external user to interact with Hive. Thrift server in Hive allows external clients to interact
with Hive over a network, similar to the JDBC or ODBC protocols.
Hive Shell:
Hive Shell is similar to the MySQL shell. It is the command-line interface for Hive, and in it
users can run HQL queries. Like SQL, HiveQL is case-insensitive (except for string
comparisons).
We can run the Hive Shell in two modes which are: Non-Interactive mode and
Interactive mode.
Hive Data-Model:
Table
Partition
Bucket
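In this data model, a partition splits a table by the value of a column (for example, by year), while a bucket further divides rows by hashing a column into a fixed number of files. A minimal sketch of how a row might be routed, with illustrative column names and a simple modulo hash (Hive's actual bucketing hash differs):

```python
NUM_BUCKETS = 4

def route(row):
    """Return (partition, bucket) for a row, mimicking Hive's
    partition-by-value and bucket-by-hash layout."""
    partition = row["year"]                          # partition column
    bucket = hash(row["student_id"]) % NUM_BUCKETS   # bucketing column
    return partition, bucket

row = {"student_id": 101, "year": 2023}
print(route(row))
```

Queries that filter on the partition column can then skip every other partition entirely, which is why partitioning speeds up scans of large tables.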
2. What is Sharding?
Sharding is an important concept that helps a system distribute its data across
multiple resources.
The word “Shard” means “a small part of a whole“. Hence Sharding means dividing a larger
part into smaller parts.
In DBMS, sharding is a type of database partitioning in which a large database is divided, or
partitioned, into smaller parts known as shards. These shards are not only smaller but also
faster, and hence more easily manageable.
Need for Sharding:
Consider a very large database that has not been sharded. For example, take the
database of a college in which all student records (present and past) for the whole college
are maintained in a single database. It would contain a very large number of records, say
100,000.
Now, whenever we need to find a student in this database, up to around 100,000 records
may have to be scanned each time, which is very costly.
Now consider the same college student records divided into smaller data shards based on year.
Each data shard then holds only around 1,000-5,000 student records. Not only does the database
become much more manageable, but the cost of each lookup is also reduced by a large
factor. This is what sharding achieves.
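The year-based split described above can be sketched in Python. The record layout and lookup logic are illustrative assumptions, not a real sharding middleware:

```python
from collections import defaultdict

def build_shards(records):
    """Shard student records by admission year: one shard per year."""
    shards = defaultdict(list)
    for record in records:
        shards[record["year"]].append(record)
    return shards

def find_student(shards, year, name):
    # Only the relevant shard is scanned, not the whole database.
    for record in shards.get(year, []):
        if record["name"] == name:
            return record
    return None

records = [
    {"name": "Asha", "year": 2021},
    {"name": "Ravi", "year": 2022},
    {"name": "Meena", "year": 2022},
]
shards = build_shards(records)
print(find_student(shards, 2022, "Ravi"))
```

A lookup now touches only the records for one year, which is the source of the cost reduction described above.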
Features of Sharding:
Sharding makes the Database smaller
Sharding makes the Database faster
Sharding makes the Database much more easily manageable
Sharding can be a complex operation sometimes
Sharding reduces the transaction cost of the Database
3. What is HBase?
HBase is an open-source, column-oriented, distributed database system in a Hadoop environment.
It is modeled after Google's Bigtable, is written primarily in Java, and is used for real-time
Big Data applications.
HBase is a column-oriented database in which data is stored in tables, and the tables are sorted
by RowId. Each row has a RowId, under which the values of the several column
families present in the table are grouped.
The column families that are present in the schema hold key-value pairs. Looking in
detail, each column family can have multiple columns, and the column values are
stored on disk. Each cell of the table has its own metadata, such as a timestamp and
other information.
In HBase, the following key terms describe the table schema:
HBase: The amount of data that can be stored in this model is very huge, on the order of petabytes.
RDBMS: It is designed for a small number of rows and columns.
Set of tables
Each table has column families and rows
Each table must have an element defined as the primary key
The row key acts as the primary key in HBase
Any access to an HBase table uses this primary key
Each column in HBase denotes an attribute of the corresponding object
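The schema above (row key, column family, column qualifier, timestamped cell) can be modeled with a toy in-memory structure. The table, row keys, and family names here are illustrative, not HBase's actual storage format:

```python
import time

# Toy model of an HBase table: row key -> column family
# -> column qualifier -> (value, timestamp).
table = {}

def put(row_key, family, qualifier, value):
    family_map = table.setdefault(row_key, {}).setdefault(family, {})
    family_map[qualifier] = (value, time.time())  # each cell carries a timestamp

def get(row_key, family, qualifier):
    value, _timestamp = table[row_key][family][qualifier]
    return value

put("student:101", "info", "name", "Asha")
put("student:101", "marks", "unit5", 92)
print(get("student:101", "info", "name"))
```

Every read and write goes through the row key first, which is why the row key serves as the table's primary key.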
HMaster:
HMaster assigns regions to the Region Servers and coordinates the cluster. When a Region
Server receives write and read requests from a client, it routes the request to the
specific region where the actual column family resides.
However, a client can contact HRegion servers directly; HMaster's permission is not
required for the client to communicate with HRegion servers.
HBase Regions:
HRegions are the basic building elements of an HBase cluster: tables are distributed across
them, and each region is composed of column families. A region contains multiple stores, one
for each column family, and each store consists of two main components: a MemStore and HFiles.
ZooKeeper:
ZooKeeper acts as a coordination service for HBase. It maintains the cluster state, keeps
track of which servers are alive and available, and notifies HMaster when a Region Server fails.
HDFS:
HDFS is the Hadoop Distributed File System. As the name implies, it provides a distributed
environment for storage, and it is a file system designed to run on commodity
hardware. It stores each file in multiple blocks, and to maintain fault tolerance the blocks are
replicated across the Hadoop cluster.
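The block-splitting and replication idea can be sketched as follows. The block size, replication factor, and round-robin placement here are simplified assumptions (real HDFS defaults are a 128 MB block size, a replication factor of 3, and a rack-aware placement policy):

```python
BLOCK_SIZE = 4          # bytes per block (real HDFS default is 128 MB)
REPLICATION = 3         # copies kept of each block
DATANODES = ["dn1", "dn2", "dn3", "dn4"]

def split_and_replicate(data):
    """Split a file into fixed-size blocks and assign each block
    to REPLICATION datanodes using a simple round-robin placement."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        nodes = [DATANODES[(idx + r) % len(DATANODES)] for r in range(REPLICATION)]
        placement[idx] = (block, nodes)
    return placement

placement = split_and_replicate(b"hello hdfs!")
for idx, (block, nodes) in placement.items():
    print(idx, block, nodes)
```

Because each block lives on several datanodes, the loss of any single node still leaves at least two copies of every block, which is the fault tolerance the text describes.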