
Hive Shell & Services

Hive Introduction
Hive is a data warehouse infrastructure tool for processing structured data in Hadoop. It resides on
top of Hadoop to summarize Big Data, and it makes querying and analyzing easy.
Hive was initially developed by Facebook; later, the Apache Software Foundation took it up and
developed it further as open source under the name Apache Hive. It is used by different
companies. For example, Amazon uses it in Amazon Elastic MapReduce.
Hive is not
o A relational database
o A design for OnLine Transaction Processing (OLTP)
o A language for real-time queries and row-level updates
Features of Hive
o It stores the schema in a database and the processed data in HDFS.
o It is designed for OLAP.
o It provides an SQL-like query language called HiveQL or HQL (illustrated below).
o It is familiar, fast, scalable, and extensible.
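
As a quick illustration of HiveQL's SQL-like flavor, a short session might look like the following sketch (the table and column names here are hypothetical, not from this document):

hive> CREATE TABLE logs (ts STRING, level STRING, msg STRING);
hive> SELECT level, count(*) FROM logs GROUP BY level;   -- compiled by Hive into a MapReduce job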

Architecture of Hive
Hive's architecture is made up of the following units, each described below:

User Interface: Hive is data warehouse infrastructure software that enables interaction between
the user and HDFS. The user interfaces that Hive supports are the Hive Web UI, the Hive
command line, and Hive HDInsight (on Windows Server).

Meta Store: Hive chooses a respective database server to store the schema or metadata of
tables, databases, columns in a table, their data types, and the HDFS mapping.

HiveQL Process Engine: HiveQL is similar to SQL and is used for querying the schema
information in the Metastore. It is one of the replacements for the traditional approach of
writing MapReduce programs: instead of writing a MapReduce program in Java, we can write a
query for the MapReduce job and have Hive process it.

Execution Engine: The conjunction of the HiveQL Process Engine and MapReduce is the Hive
Execution Engine. The execution engine processes the query and generates the same results as
MapReduce, using the MapReduce flavor.

HDFS or HBase: The Hadoop Distributed File System or HBase is the data storage technique
used to store the data in the file system.

Hive Services
Hive provides the following services:

o Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can
execute Hive queries and commands (a usage sketch follows this list).
o Hive Web User Interface - The Hive Web UI is an alternative to the Hive CLI.
It provides a web-based GUI for executing Hive queries and commands.
o Hive MetaStore - A central repository that stores all the structural
information of the various tables and partitions in the warehouse. It also
includes metadata for each column and its type, the serializers and
deserializers used to read and write data, and the corresponding
HDFS files where the data is stored.
o Hive Server - Also referred to as the Apache Thrift Server, it accepts requests
from different clients and forwards them to the Hive Driver.
o Hive Driver - It receives queries from different sources such as the Web UI, CLI,
Thrift, and JDBC/ODBC drivers, and transfers the queries to the compiler.
o Hive Compiler - The compiler parses the query, performs semantic
analysis on the different query blocks and expressions, and
converts HiveQL statements into MapReduce jobs.
o Hive Execution Engine - The optimizer generates the logical plan in the form of
a DAG of MapReduce tasks and HDFS tasks. In the end, the execution engine
executes the incoming tasks in the order of their dependencies.
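
As an illustrative sketch of how clients reach these services, the Hive CLI talks to Hive directly, while Beeline connects to the Hive Server (HiveServer2) over JDBC. The host, port, and database below are assumed defaults, not values from this document:

$ hive                                               # start the Hive CLI shell
hive> SHOW DATABASES;
$ beeline -u jdbc:hive2://localhost:10000/default    # connect to HiveServer2 via JDBC
0: jdbc:hive2://localhost:10000/default> SHOW TABLES;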

Difference between RDBMS and Hive:

RDBMS: It is used to maintain a database.
Hive: It is used to maintain a data warehouse.

RDBMS: It uses SQL (Structured Query Language).
Hive: It uses HQL (Hive Query Language).

RDBMS: The schema is fixed.
Hive: The schema varies.

RDBMS: Normalized data is stored.
Hive: Both normalized and de-normalized data are stored.

RDBMS: Tables are sparse.
Hive: Tables are dense.

RDBMS: It does not support partitioning.
Hive: It supports automatic partitioning (see the sketch after this table).

RDBMS: No partitioning method is used.
Hive: Sharding is used to partition the data.
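
To make the partitioning row concrete, here is a minimal HiveQL sketch of automatic (dynamic) partitioning; the table and column names are hypothetical:

hive> CREATE TABLE sales (id INT, amount DOUBLE) PARTITIONED BY (sale_date STRING);
hive> SET hive.exec.dynamic.partition.mode=nonstrict;   -- allow fully dynamic partitions
hive> INSERT INTO sales PARTITION (sale_date) SELECT id, amount, sale_date FROM staging_sales;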


Pig vs. Hive

Hive: Mainly used by data analysts.
Pig: Generally used by researchers and programmers.

Hive: Used against completely structured data.
Pig: Used against semi-structured data.

Hive: Has a declarative, SQL-like language termed HiveQL.
Pig: Has a procedural, data-flow language termed Pig Latin.

Hive: Basically used for the generation of reports.
Pig: Basically used for programming.

Hive: Operates on the server side of an HDFS cluster.
Pig: Operates on the client side of an HDFS cluster.

Hive: Very helpful in the areas of ETL.
Pig: A wonderful ETL tool for Big Data, thanks to its powerful transformation and processing capabilities.

Hive: Can start an optional Thrift-based server, which can be used to send queries from anywhere directly to the Hive server for execution.
Pig: Provides no such feature.

Hive: Leverages SQL DDL, with table definitions declared upfront and the schema details stored in a local database.
Pig: Has no provision for a dedicated metadata database, so schemas and data types are defined in the scripts themselves.

Hive: Has no provision to support Avro.
Pig: Provides support for Avro.

Hive: Needs no installation, as it is completely shell-based for interaction.
Pig: On the other hand, can be installed very easily.

Hive: Provides partitions on the data to process subsets based on dates or in chronological order.
Pig: Does not directly provide anything like partitions, but the same effect can be achieved using filters.

Hive: Has no provision for illustration.
Pig: Renders sample data for each of its scenarios through the ILLUSTRATE function.

Hive: Provides access to raw data.
Pig: Raw data access is not possible with Pig Latin scripts as fast as with HiveQL.

Hive: A user can join, order, and even sort data dynamically (in an aggregated manner, though).
Pig: Allows OUTER JOINs to be performed using the COGROUP feature.
Introduction to Hive metastore

Hive metastore (HMS) is a service that stores metadata related to Apache Hive and other
services, in a backend RDBMS, such as MySQL or PostgreSQL. Impala, Spark, Hive, and
other services share the metastore. The connections to and from HMS include HiveServer,
Ranger, and the NameNode that represents HDFS.

Beeline, Hue, JDBC, and Impala shell clients make requests through thrift or JDBC to
HiveServer. The HiveServer instance reads/writes data to HMS. By default, redundant HMS
instances operate in active/active mode. The physical metadata resides in a backend RDBMS
dedicated to HMS. You must configure all HMS instances to use the same backend database. A
separate RDBMS supports the security service, for example Ranger. All connections are routed
to a single RDBMS service at any given time. HMS talks to the NameNode over Thrift and
functions as a client to HDFS.

HMS connects directly to Ranger and the NameNode (HDFS), and so does HiveServer. One or
more HMS instances on the backend can talk to other services, such as Ranger.
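
As a hedged illustration of pointing every HMS instance at the same backend database, a hive-site.xml fragment might look like this (the MySQL host, database name, and user are hypothetical placeholders):

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://metastore-db.example.com:3306/hive_metastore</value>  <!-- hypothetical host/db -->
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>  <!-- hypothetical user -->
</property>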
Querying Data

HiveQL - JOIN

The HiveQL JOIN clause is used to combine the data of two or more tables based on a related
column between them. The various types of HiveQL joins are:

o Inner Join
o Left Outer Join
o Right Outer Join
o Full Outer Join

Here, we are going to execute the join clauses on the records of two tables, employee and
employee_department, which are created below.

Inner Join in HiveQL

The HiveQL inner join is used to return the rows of multiple tables where the join condition is
satisfied. In other words, the join criteria find the matching records in every table being joined.

Example of Inner Join in Hive

In this example, we take two tables, employee and employee_department. The primary key
(empid) of the employee table is referenced by the foreign key (depid) of the
employee_department table. Let's perform the inner join operation by using the following steps:

hive> use hiveql;

Now, create a table by using the following command:

hive> create table employee(empid int, empname string, state string);

Now, create another table by using the following command:

hive> create table employee_department(depid int, department_name string);
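
To make the join results observable, the tables can be populated with a few rows; the values below are purely illustrative (Hive supports INSERT ... VALUES from version 0.14 onward):

hive> insert into employee values (1, 'Alice', 'Pune'), (2, 'Bob', 'Mumbai');
hive> insert into employee_department values (1, 'Sales'), (3, 'HR');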

Now, perform the inner join operation by using the following command:

hive> select e1.empname, e2.department_name from employee e1 join employee_department e2 on e1.empid = e2.depid;
Left Outer Join in HiveQL

The HiveQL left outer join returns all the records from the left (first) table and only those records
from the right (second) table where the join criteria find a match.

Example of Left Outer Join in Hive

In this example, we perform the left outer join operation.

o Let us execute the left outer join operation by using the following command:

hive> select e1.empname, e2.department_name from employee e1 left outer join employee_department e2 on e1.empid = e2.depid;

Right Outer Join in HiveQL

The HiveQL right outer join returns all the records from the right (second) table and only those
records from the left (first) table where the join criteria find a match.

Example of Right Outer Join in Hive

In this example, we perform the right outer join operation.

o Let us execute the right outer join operation by using the following command:

hive> select e1.empname, e2.department_name from employee e1 right outer join employee_department e2 on e1.empid = e2.depid;

Full Outer Join

The HiveQL full outer join returns all the records from both tables. It assigns NULL for the
missing records in either table.
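
For completeness, a full outer join on the same two tables can be sketched by mirroring the earlier queries:

hive> select e1.empname, e2.department_name from employee e1 full outer join employee_department e2 on e1.empid = e2.depid;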
UDFs (User Defined Functions)
In Hive, users can define their own functions to meet certain client
requirements. These are known as UDFs in Hive. User Defined
Functions are written in Java for specific modules.

Some UDFs are specifically designed for the reusability of code in
application frameworks. The developer writes these functions in
Java and integrates the UDFs with Hive.

During query execution, the developer can directly use the code,
and the UDFs will return outputs according to the user-defined tasks. This
provides high performance in terms of coding and execution.

For example, there is no predefined function in Hive for string
stemming. For this, we can write a stem UDF in Java. Wherever
we require stemming functionality, we can directly call this stem UDF in
Hive.

Here, stemming means deriving a word from its root word: a
stemming algorithm reduces the words "wishing", "wished", and
"wishes" to the root word "wish". To perform this type of
functionality, we can write a UDF in Java and integrate it with Hive.
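
Once such a UDF has been compiled into a jar, it can be registered and called from HiveQL roughly as follows; the jar path, class name, and function name are hypothetical, not from this document:

hive> ADD JAR /tmp/stem-udf.jar;                                      -- hypothetical jar location
hive> CREATE TEMPORARY FUNCTION stem AS 'com.example.hive.udf.Stem';  -- hypothetical UDF class
hive> SELECT stem(word) FROM words;                                   -- the UDF runs once per row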

Depending on the use case, UDFs can be written to accept
and produce different numbers of input and output values.

The general type of UDF accepts a single input value and produces
a single output value. If such a UDF is used in a query, it is
called once for each row in the result data set.

Alternatively, a UDF can accept a group of values as input and return a
single output value.
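
Both calling patterns can be seen with Hive's built-in functions, which follow the same contract (the table and column names are illustrative):

hive> SELECT upper(empname) FROM employee;   -- ordinary UDF: called once per row
hive> SELECT count(empid) FROM employee;     -- aggregate function (UDAF): one output for a group of rows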
