
Hive

Hive is a data warehouse infrastructure tool used to process large structured data stored in Hadoop. It allows users to query and analyze data using SQL-like queries. Hive provides schema on read functionality, where it stores schema information separately from data. It uses a metastore to store metadata and utilizes MapReduce to process queries.

What is Hive

Hive is a data warehouse infrastructure tool for processing structured data in
Hadoop. It resides on top of Hadoop to summarize Big Data, and makes querying and
analyzing easy.
Hive was initially developed by Facebook; the Apache Software Foundation later
took it up and developed it further as open source under the name Apache Hive.
It is used by different companies. For example, Amazon uses it in Amazon Elastic
MapReduce.

Hive is not
• A relational database
• A design for OnLine Transaction Processing (OLTP)
• A language for real-time queries and row-level updates
Features of Hive
• It stores the schema in a database and the processed data in HDFS.
• It is designed for OLAP.
• It provides an SQL-like query language called HiveQL or HQL.
• It is familiar, fast, scalable, and extensible.
Architecture of Hive
The following component diagram depicts the architecture of Hive:

This component diagram contains different units. The following table describes each
unit:

Unit Name               Operation

User Interface          Hive is data warehouse infrastructure software that
                        lets the user interact with HDFS. The user interfaces
                        that Hive supports are Hive Web UI, Hive command line,
                        and Hive HD Insight (on Windows Server).

Meta Store              Hive uses a database server to store the schema or
                        metadata of tables, databases, columns in a table,
                        their data types, and HDFS mapping.

HiveQL Process Engine   HiveQL is similar to SQL and queries against the schema
                        information in the Metastore. It replaces the
                        traditional approach of writing a MapReduce program:
                        instead of writing the program in Java, we write a
                        query, which is processed as a MapReduce job.

Execution Engine        The Hive Execution Engine connects the HiveQL Process
                        Engine and MapReduce. It processes the query in the
                        MapReduce style and generates the same results a
                        MapReduce program would.

HDFS or HBASE           The Hadoop Distributed File System or HBASE is the
                        storage layer that holds the data.

Working of Hive
The following diagram depicts the workflow between Hive and Hadoop.

The following table defines how Hive interacts with Hadoop framework:

Step No.   Operation

1          Execute Query
           The Hive interface, such as the command line or Web UI, sends the
           query to the driver (any database driver such as JDBC, ODBC, etc.)
           to execute.

2          Get Plan
           The driver calls the query compiler, which parses the query to
           check its syntax and work out the query plan, that is, what the
           query requires.

3          Get Metadata
           The compiler sends a metadata request to the Metastore (any
           database).

4          Send Metadata
           The Metastore sends the metadata back to the compiler.

5          Send Plan
           The compiler checks the requirements and sends the plan back to
           the driver. At this point, parsing and compiling of the query is
           complete.

6          Execute Plan
           The driver sends the execution plan to the execution engine.

7          Execute Job
           Internally, the execution is a MapReduce job. The execution engine
           sends the job to the JobTracker, which runs on the Name node, and
           the JobTracker assigns the job to TaskTrackers, which run on Data
           nodes. Here, the query executes as a MapReduce job.

7.1        Metadata Ops
           During execution, the execution engine can also perform metadata
           operations with the Metastore.

8          Fetch Result
           The execution engine receives the results from the Data nodes.

9          Send Results
           The execution engine sends those results to the driver.

10         Send Results
           The driver sends the results to the Hive interfaces.
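
The hand-offs in the steps above can be traced in a small Python sketch. It is a toy model, not Hive code: the class names (Metastore, Compiler, ExecutionEngine, Driver), the plan dictionary, and the in-memory rows are all assumptions made for illustration. The "query" is passed as structured arguments rather than parsed from text, and the "job" is an ordinary Python filter rather than a MapReduce job.

```python
class Metastore:
    """Holds table schemas only, never the data (steps 3 and 4)."""

    def __init__(self, schemas):
        self.schemas = schemas

    def get_schema(self, table):
        return self.schemas[table]


class Compiler:
    """Checks the query against the metadata and returns a plan (steps 2 to 5)."""

    def __init__(self, metastore):
        self.metastore = metastore

    def compile(self, table, column, minimum):
        schema = self.metastore.get_schema(table)  # metadata request and response
        if column not in schema:
            raise ValueError(f"unknown column: {column}")
        return {"table": table, "column": column, "minimum": minimum}


class ExecutionEngine:
    """Runs the plan over the stored rows (steps 6 to 9)."""

    def __init__(self, storage):
        self.storage = storage

    def execute(self, plan):
        rows = self.storage[plan["table"]]
        return [r for r in rows if r[plan["column"]] >= plan["minimum"]]


class Driver:
    """Takes the query from the interface and hands back results (steps 1 and 10)."""

    def __init__(self, compiler, engine):
        self.compiler = compiler
        self.engine = engine

    def run(self, table, column, minimum):
        plan = self.compiler.compile(table, column, minimum)
        return self.engine.execute(plan)


# Hypothetical data: the schema lives in the metastore, the rows in "storage".
storage = {"employee": [{"id": 1201, "salary": 45000}, {"id": 1205, "salary": 30000}]}
metastore = Metastore({"employee": {"id": "int", "salary": "int"}})
driver = Driver(Compiler(metastore), ExecutionEngine(storage))

result = driver.run("employee", "salary", 40000)
print(result)
```

The point of the separation is that the compiler only ever touches metadata, while the execution engine only ever touches stored rows.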

Create Database Statement


Create Database is a statement used to create a database in Hive. A database in Hive
is a namespace or a collection of tables. The syntax for this statement is as follows:

CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <database name>

Here, IF NOT EXISTS is an optional clause. With it, Hive skips the statement without
raising an error when a database with the same name already exists. We can use SCHEMA
in place of DATABASE in this command. The following query is executed to create a
database named userdb:

hive> CREATE DATABASE IF NOT EXISTS userdb;

or

hive> CREATE SCHEMA userdb;

The following query lists the existing databases:

hive> SHOW DATABASES;
default
userdb

Drop Database Statement


Drop Database is a statement that deletes a database. By default (RESTRICT), the
statement fails if the database still contains tables; with CASCADE, Hive drops the
tables first. Its syntax is as follows:

DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];

The following queries are used to drop a database. Let us assume that the database
name is userdb.

hive> DROP DATABASE IF EXISTS userdb;

The following query drops the database using CASCADE, which drops the tables in
the database before dropping the database itself.

hive> DROP DATABASE IF EXISTS userdb CASCADE;

The following query drops the database using SCHEMA.

hive> DROP SCHEMA userdb;

Create Table Statement


Create Table is a statement used to create a table in Hive. The syntax and example
are as follows:

Syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[ROW FORMAT row_format]
[STORED AS file_format]
Example
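Let us assume you need to create a table named employee using the CREATE
TABLE statement. The following table lists the fields and their data types in the
employee table:

+-------+-------------+-----------+
| Sr.No | Field Name  | Data Type |
+-------+-------------+-----------+
| 1     | Eid         | int       |
| 2     | Name        | String    |
| 3     | Salary      | Float     |
| 4     | Designation | String    |
+-------+-------------+-----------+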

The table also needs a comment, the row format (field terminator and line
terminator), and the stored file type:
COMMENT 'Employee details'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
The following query creates a table named employee using the above data.
hive> CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary Float, designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
If you add the option IF NOT EXISTS, Hive ignores the statement in case the table
already exists.
On successful creation of the table, you see the following response:
OK
Time taken: 5.905 seconds
hive>
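
To make the row format concrete, the following Python sketch reads one line of a text file the way the employee table describes it: one record per line, fields split on a tab, and each field converted to the type listed for that column. The parse_line helper, the SCHEMA list, and the sample line are assumptions for illustration only; Hive's own text handling covers many more cases (escaping, complex types, and so on).

```python
# Column names and types mirror the field list given for the employee table.
SCHEMA = [("eid", int), ("name", str), ("salary", float), ("designation", str)]


def parse_line(line, field_sep="\t"):
    """Split one text line into typed fields, applying the schema at read time."""
    values = line.rstrip("\n").split(field_sep)
    record = {}
    for (column, convert), raw in zip(SCHEMA, values):
        try:
            record[column] = convert(raw)
        except ValueError:
            # Assumption of this sketch: a field that does not fit its declared
            # type is read as NULL (None) rather than rejecting the whole row.
            record[column] = None
    return record


row = parse_line("1201\tGopal\t45000\tTechnical manager\n")
print(row)
```

The file itself stays plain text; the types exist only in the table definition and are applied when a row is read.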

Alter Table Statement


It is used to alter a table in Hive.
Syntax
The statement takes any of the following syntaxes based on what attributes we wish
to modify in a table.
ALTER TABLE name RENAME TO new_name
ALTER TABLE name ADD COLUMNS (col_spec[, col_spec ...])
ALTER TABLE name DROP [COLUMN] column_name
ALTER TABLE name CHANGE column_name new_name new_type
ALTER TABLE name REPLACE COLUMNS (col_spec[, col_spec ...])
Rename To… Statement
The following query renames the table from employee to emp.
hive> ALTER TABLE employee RENAME TO emp;

Relational Operators
These operators are used to compare two operands. The following table describes
the relational operators available in Hive:

Operator        Operand               Description

A = B           all primitive types   TRUE if expression A is equivalent to
                                      expression B, otherwise FALSE.

A != B          all primitive types   TRUE if expression A is not equivalent to
                                      expression B, otherwise FALSE.

A < B           all primitive types   TRUE if expression A is less than
                                      expression B, otherwise FALSE.

A <= B          all primitive types   TRUE if expression A is less than or equal
                                      to expression B, otherwise FALSE.

A > B           all primitive types   TRUE if expression A is greater than
                                      expression B, otherwise FALSE.

A >= B          all primitive types   TRUE if expression A is greater than or
                                      equal to expression B, otherwise FALSE.

A IS NULL       all types             TRUE if expression A evaluates to NULL,
                                      otherwise FALSE.

A IS NOT NULL   all types             FALSE if expression A evaluates to NULL,
                                      otherwise TRUE.

A LIKE B        Strings               TRUE if string A matches the pattern B,
                                      otherwise FALSE.

A RLIKE B       Strings               NULL if A or B is NULL, TRUE if any
                                      substring of A matches the Java regular
                                      expression B, otherwise FALSE.

A REGEXP B      Strings               Same as RLIKE.
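
The difference between LIKE and RLIKE is easy to miss: LIKE compares the whole string against a simple pattern, while RLIKE is satisfied by a match anywhere inside the string. The Python sketch below models both. The like and rlike helpers are hypothetical names, Python's re module stands in for Java regular expressions (the two dialects differ in places), None stands in for NULL, and escape characters in LIKE patterns are not handled.

```python
import re


def like(a, pattern):
    """SQL-style LIKE: '_' matches one character, '%' matches any run of characters."""
    if a is None or pattern is None:
        return None
    regex = "".join(
        "." if ch == "_" else ".*" if ch == "%" else re.escape(ch)
        for ch in pattern
    )
    # fullmatch: the pattern has to cover the entire string.
    return re.fullmatch(regex, a, flags=re.DOTALL) is not None


def rlike(a, pattern):
    """RLIKE: TRUE if any substring of a matches the regular expression."""
    if a is None or pattern is None:
        return None
    # search: a match anywhere in the string is enough.
    return re.search(pattern, a) is not None


print(like("foobar", "foo"))    # False: LIKE must match the whole string
print(like("foobar", "foo%"))   # True
print(rlike("foobar", "foo"))   # True: a substring match is enough
print(rlike(None, "foo"))       # None, standing in for NULL
```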

Example
Let us assume the employee table is composed of fields named Id, Name, Salary,
Designation, and Dept as shown below. Generate a query to retrieve the details of
the employee whose Id is 1205.
+------+-------------+--------+-------------------+-------+
| Id   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query is executed to retrieve the employee details using the above
table:

hive> SELECT * FROM employee WHERE Id=1205;

On successful execution of the query, you see the following response:

+------+---------+--------+-------------+-------+
| ID   | Name    | Salary | Designation | Dept  |
+------+---------+--------+-------------+-------+
| 1205 | Kranthi | 30000  | Op Admin    | Admin |
+------+---------+--------+-------------+-------+
The following query is executed to retrieve the details of employees whose salary
is greater than or equal to Rs 40000.

hive> SELECT * FROM employee WHERE Salary>=40000;

On successful execution of the query, you see the following response:

+------+-------------+--------+-------------------+------+
| ID   | Name        | Salary | Designation       | Dept |
+------+-------------+--------+-------------------+------+
| 1201 | Gopal       | 45000  | Technical manager | TP   |
| 1202 | Manisha     | 45000  | Proofreader       | PR   |
| 1203 | Masthanvali | 40000  | Technical writer  | TP   |
| 1204 | Krian       | 40000  | Hr Admin          | HR   |
+------+-------------+--------+-------------------+------+
Arithmetic Operators
These operators support various common arithmetic operations on the operands. All
of them return number types. The following table describes the arithmetic operators
available in Hive:

Operator   Operand            Description

A + B      all number types   Gives the result of adding A and B.

A - B      all number types   Gives the result of subtracting B from A.

A * B      all number types   Gives the result of multiplying A and B.

A / B      all number types   Gives the result of dividing A by B.

A % B      all number types   Gives the remainder resulting from dividing A by B.

A & B      all number types   Gives the result of bitwise AND of A and B.

A | B      all number types   Gives the result of bitwise OR of A and B.

A ^ B      all number types   Gives the result of bitwise XOR of A and B.

~A         all number types   Gives the result of bitwise NOT of A.
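
The bitwise rows in particular are easier to follow with concrete numbers. The Python sketch below evaluates each operator from the table for A = 12 and B = 10. Python integers stand in for Hive number types here, which is an assumption of this sketch; the operands are kept non-negative because languages differ on the sign of % for negative operands.

```python
a, b = 12, 10  # binary 1100 and 1010

results = {
    "A + B": a + b,
    "A - B": a - b,
    "A * B": a * b,
    "A / B": a / b,   # true division, gives a fractional result
    "A % B": a % b,   # remainder of dividing A by B
    "A & B": a & b,   # 1100 AND 1010 = 1000
    "A | B": a | b,   # 1100 OR  1010 = 1110
    "A ^ B": a ^ b,   # 1100 XOR 1010 = 0110
    "~A": ~a,         # flips every bit; equals -(A + 1) in two's complement
}

for expression, value in results.items():
    print(expression, "=", value)
```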

Example
The following query adds two numbers, 20 and 30.

hive> SELECT 20+30 ADD FROM temp;

On successful execution of the query, you see the following response:
+-----+
| ADD |
+-----+
| 50  |
+-----+
Logical Operators
These operators combine boolean expressions. All of them return either TRUE or FALSE.

Operator   Operand   Description

A AND B    boolean   TRUE if both A and B are TRUE, otherwise FALSE.

A && B     boolean   Same as A AND B.

A OR B     boolean   TRUE if either A or B or both are TRUE, otherwise FALSE.

A || B     boolean   Same as A OR B.

NOT A      boolean   TRUE if A is FALSE, otherwise FALSE.

!A         boolean   Same as NOT A.

Example
The following query is used to retrieve the details of employees whose Department
is TP and whose Salary is more than Rs 40000.

hive> SELECT * FROM employee WHERE Salary>40000 && Dept='TP';

On successful execution of the query, you see the following response:
+------+-------+--------+-------------------+------+
| ID   | Name  | Salary | Designation       | Dept |
+------+-------+--------+-------------------+------+
| 1201 | Gopal | 45000  | Technical manager | TP   |
+------+-------+--------+-------------------+------+
Complex Operators
These operators provide an expression to access the elements of Complex Types.

Operator   Operand                         Description

A[n]       A is an Array and n is an int   It returns the nth element in the array A.
                                           The first element has index 0.

M[key]     M is a Map<K, V> and key has    It returns the value corresponding to the
           type K                          key in the map.

S.x        S is a struct                   It returns the x field of S.
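
The three access forms map closely onto everyday data structures. In the Python sketch below, a list stands in for an Array, a dict for a Map, and a dataclass for a struct. The Address class, the column values, and the variable names are all made up for illustration and do not come from any Hive table in this document.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """Stands in for a struct with two fields, city and postcode."""
    city: str
    postcode: str


skills = ["hive", "pig", "sqoop"]                   # plays the role of an Array
phones = {"home": "555-0100", "work": "555-0199"}   # plays the role of a Map
address = Address(city="Hyderabad", postcode="500001")

first_skill = skills[0]      # A[n]: the first element has index 0
work_phone = phones["work"]  # M[key]: value stored under the key
city = address.city          # S.x: field x of struct S

print(first_skill, work_phone, city)
```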

Creating a View
A view is defined by a SELECT statement and created with CREATE VIEW. The syntax is as
follows:

CREATE VIEW [IF NOT EXISTS] view_name [(column_name [COMMENT column_comment], ...)]
[COMMENT table_comment]
AS SELECT ...

Example
Let us take an example for view. Assume employee table as given below, with the
fields Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the
details of employees who earn a salary of more than Rs 30000. We store the result in a
view named emp_30000.
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query creates the view for the above scenario:
hive> CREATE VIEW emp_30000 AS
SELECT * FROM employee
WHERE salary>30000;
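
A view stores the query, not a copy of the rows. The Python sketch below shows this using the standard sqlite3 module, which is only a stand-in: SQLite is not Hive, but the CREATE VIEW statement used here is plain SQL common to both. The column types and the extra row with Id 1206 are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(id INTEGER, name TEXT, salary INTEGER, designation TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1201, "Gopal", 45000, "Technical manager", "TP"),
        (1202, "Manisha", 45000, "Proofreader", "PR"),
        (1203, "Masthanvali", 40000, "Technical writer", "TP"),
        (1204, "Krian", 40000, "Hr Admin", "HR"),
        (1205, "Kranthi", 30000, "Op Admin", "Admin"),
    ],
)

conn.execute("CREATE VIEW emp_30000 AS SELECT * FROM employee WHERE salary > 30000")
ids_before = [row[0] for row in conn.execute("SELECT id FROM emp_30000 ORDER BY id")]

# Hypothetical new row: it appears in the view without re-creating the view,
# because the view's SELECT runs again each time the view is queried.
conn.execute("INSERT INTO employee VALUES (1206, 'Example', 50000, 'Analyst', 'TP')")
ids_after = [row[0] for row in conn.execute("SELECT id FROM emp_30000 ORDER BY id")]

print(ids_before)
print(ids_after)
```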
Dropping a View
Use the following syntax to drop a view:
DROP VIEW view_name
The following query drops the view named emp_30000:
hive> DROP VIEW emp_30000;
Creating an Index
An index is a pointer on a particular column of a table; creating an index means
creating that pointer. Its syntax is as follows:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
Example
Let us take an example for index. Use the same employee table that we have used
earlier with the fields Id, Name, Salary, Designation, and Dept. Create an index named
index_salary on the salary column of the employee table.
The following query creates an index:
hive> CREATE INDEX index_salary ON TABLE employee(salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
The index is a pointer to the salary column. If the column is modified, the changes
are stored using an index value.
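
The general idea of an index as a pointer can be sketched in a few lines of Python: a lookup structure that maps each column value to the positions of the rows holding it, so that a lookup does not have to scan every row. This is a toy model of the concept only; it makes no claim about how CompactIndexHandler is implemented, and the row positions 0 to 4 are an assumption standing in for storage locations.

```python
from collections import defaultdict

# Salary column of the five employee rows, in row order (positions 0 to 4).
salaries = [45000, 45000, 40000, 40000, 30000]

# Build the index once: value -> list of row positions.
index = defaultdict(list)
for position, salary in enumerate(salaries):
    index[salary].append(position)

# Look up matching rows through the index instead of scanning the column.
rows_with_40000 = index[40000]
print(rows_with_40000)
```

The trade-off is the usual one: the index has to be kept in step with the data, in exchange for faster lookups on that column.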

Dropping an Index
The following syntax is used to drop an index:
DROP INDEX <index_name> ON <table_name>
The following query drops an index named index_salary:
hive> DROP INDEX index_salary ON employee;

HiveQL - Select-Where
The Hive Query Language (HiveQL) is a query language for Hive to process and
analyze structured data, using the table metadata kept in the Metastore. This chapter
explains how to use the SELECT statement with the WHERE clause.
The SELECT statement is used to retrieve data from a table. The WHERE clause
specifies a condition: it filters the data and returns only the rows that satisfy
the condition. The condition is an expression built from the built-in operators and
functions.

Syntax
Given below is the syntax of the SELECT query:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list]]
[LIMIT number];
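
The order of the clauses in the syntax matters: WHERE filters individual rows first, GROUP BY then groups what is left, HAVING filters the groups, and LIMIT caps the output. The Python sketch below runs one such query with the standard sqlite3 module over the five employee rows used in this document's examples. SQLite is only a stand-in for Hive here; the query sticks to clauses that are plain SQL in both, and the column types and the headcount alias are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee "
    "(id INTEGER, name TEXT, salary INTEGER, designation TEXT, dept TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1201, "Gopal", 45000, "Technical manager", "TP"),
        (1202, "Manisha", 45000, "Proofreader", "PR"),
        (1203, "Masthanvali", 40000, "Technical writer", "TP"),
        (1204, "Krian", 40000, "Hr Admin", "HR"),
        (1205, "Kranthi", 30000, "Op Admin", "Admin"),
    ],
)

query = """
SELECT dept, COUNT(*) AS headcount
FROM employee
WHERE salary > 30000      -- row filter: drops 1205
GROUP BY dept             -- groups: TP, PR, HR
HAVING COUNT(*) > 1       -- group filter: keeps only TP
LIMIT 5
"""
rows = conn.execute(query).fetchall()
print(rows)
```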

Example
Let us take an example for SELECT…WHERE clause. Assume we have the employee
table as given below, with fields named Id, Name, Salary, Designation, and Dept.
Generate a query to retrieve the details of employees who earn a salary of more
than Rs 30000.
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query retrieves the employee details using the above scenario:

hive> SELECT * FROM employee WHERE salary>30000;

On successful execution of the query, you see the following response:
+------+-------------+--------+-------------------+------+
| ID   | Name        | Salary | Designation       | Dept |
+------+-------------+--------+-------------------+------+
| 1201 | Gopal       | 45000  | Technical manager | TP   |
| 1202 | Manisha     | 45000  | Proofreader       | PR   |
| 1203 | Masthanvali | 40000  | Technical writer  | TP   |
| 1204 | Krian       | 40000  | Hr Admin          | HR   |
+------+-------------+--------+-------------------+------+
