Hive
Hive is not:
A relational database
Designed for Online Transaction Processing (OLTP)
A language for real-time queries and row-level updates
Features of Hive
It stores the schema in a database (the Metastore) and the processed data in HDFS.
It is designed for OLAP.
It provides an SQL-like query language called HiveQL or HQL.
It is familiar, fast, scalable, and extensible.
Architecture of Hive
The following component diagram depicts the architecture of Hive:
This component diagram contains different units. The following table describes each
unit:
HDFS or HBase: The Hadoop Distributed File System (HDFS) or HBase is the data
storage technique used to store the data in the file system.
Working of Hive
The following diagram depicts the workflow between Hive and Hadoop.
The following table describes how Hive interacts with the Hadoop framework:
1 Execute Query
The Hive interface, such as the command line or the Web UI, sends the query to the
Driver for execution. Client programs can also submit queries through JDBC or ODBC drivers.
2 Get Plan
The driver passes the query to the query compiler, which parses it to check the
syntax and to work out the query plan, that is, what the query requires.
3 Get Metadata
The compiler sends a metadata request to the Metastore, which can be backed by any relational database.
4 Send Metadata
The Metastore sends the metadata back to the compiler as a response.
5 Send Plan
The compiler checks the requirements and sends the plan back to the driver.
At this point, the parsing and compiling of the query is complete.
6 Execute Plan
The driver sends the execution plan to the execution engine.
7 Execute Job
Internally, the execution job is a MapReduce job. The execution engine sends the
job to the JobTracker, which runs on the NameNode, and the JobTracker assigns the
job to TaskTrackers, which run on the DataNodes. (This describes classic MapReduce;
on YARN clusters, the ResourceManager and NodeManagers take over these roles.)
8 Fetch Result
The execution engine receives the results from the DataNodes.
9 Send Results
The execution engine sends those resultant values to the driver.
10 Send Results
The driver sends the results to the Hive interfaces.
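The ten steps above can be traced with a toy model. This is a minimal sketch in Python, not Hive code: the dictionaries `METASTORE` and `HDFS` and the function `run_query` are hypothetical stand-ins for the Metastore, the stored data, and the Driver, compiler, and execution engine, and the MapReduce job is reduced to reading a list.

```python
# Toy model of the ten-step Hive workflow. METASTORE, HDFS and run_query are
# hypothetical names for illustration; they are not part of Hive's API.

METASTORE = {"employee": ["eid", "name", "salary", "designation"]}    # table metadata
HDFS = {"employee": [(1201, "Gopal", 45000.0, "Technical manager")]}  # stored rows


def run_query(table: str, trace: list) -> list:
    """Simulate 'SELECT * FROM <table>' and record each workflow step in trace."""
    trace.append((1, "Execute Query", "Interface -> Driver"))
    trace.append((2, "Get Plan", "Driver -> Compiler"))
    trace.append((3, "Get Metadata", "Compiler -> Metastore"))
    columns = METASTORE[table]  # a KeyError here plays the role of "table not found"
    trace.append((4, "Send Metadata", "Metastore -> Compiler"))
    plan = {"table": table, "columns": columns}
    trace.append((5, "Send Plan", "Compiler -> Driver"))
    trace.append((6, "Execute Plan", "Driver -> Execution engine"))
    trace.append((7, "Execute Job", "Execution engine -> JobTracker"))
    rows = list(HDFS[plan["table"]])  # stands in for the MapReduce job reading HDFS
    trace.append((8, "Fetch Result", "DataNodes -> Execution engine"))
    trace.append((9, "Send Results", "Execution engine -> Driver"))
    trace.append((10, "Send Results", "Driver -> Interface"))
    return rows


if __name__ == "__main__":
    steps = []
    result = run_query("employee", steps)
    for number, name, hop in steps:
        print(f"{number:>2}  {name:<14} {hop}")
    print(result)
```

The point of the model is the ordering: metadata is fetched and the plan is fixed (steps 3 to 5) before any data is read (step 7 onward).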
In the CREATE DATABASE statement, IF NOT EXISTS is an optional clause: if a database
with the same name already exists, Hive skips the statement instead of reporting an
error. SCHEMA can be used in place of DATABASE in this command. The following query
creates a database named userdb:
hive> CREATE DATABASE IF NOT EXISTS userdb;
or
hive> CREATE SCHEMA userdb;
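To make the effect of IF NOT EXISTS concrete, here is a hypothetical in-memory catalog in Python that mimics the behaviour described above. The `Catalog` class and its `create_database` method are illustrative names, not Hive APIs.

```python
# Hypothetical model of CREATE DATABASE [IF NOT EXISTS]; not Hive code.


class Catalog:
    def __init__(self) -> None:
        self.databases = {}  # database name -> set of table names

    def create_database(self, name: str, if_not_exists: bool = False) -> None:
        key = name.lower()  # Hive treats database names case-insensitively
        if key in self.databases:
            if if_not_exists:
                return  # IF NOT EXISTS: skip the statement quietly
            raise ValueError(f"Database {key} already exists")
        self.databases[key] = set()


if __name__ == "__main__":
    catalog = Catalog()
    catalog.create_database("userdb")
    catalog.create_database("userdb", if_not_exists=True)  # no error, no change
    try:
        catalog.create_database("userdb")  # without IF NOT EXISTS
    except ValueError as err:
        print("Error:", err)
```

The second call changes nothing and raises nothing; only the third call, which omits the clause, fails.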
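The following queries drop a database. Let us assume that the database name is
userdb.
hive> DROP DATABASE IF EXISTS userdb;
The following query drops the database using CASCADE, which drops the database's
tables before dropping the database itself. Without CASCADE, Hive uses RESTRICT by
default and refuses to drop a database that still contains tables.
hive> DROP DATABASE IF EXISTS userdb CASCADE;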
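The difference between the default (RESTRICT) behaviour and CASCADE can be modelled with a small in-memory catalog. This is a hypothetical Python sketch; `Catalog` and `drop_database` are illustrative names, not Hive APIs.

```python
# Hypothetical model of DROP DATABASE [IF EXISTS] ... [RESTRICT|CASCADE]; not Hive code.


class Catalog:
    def __init__(self) -> None:
        self.databases = {}  # database name -> set of table names

    def drop_database(self, name: str, if_exists: bool = False,
                      cascade: bool = False) -> None:
        if name not in self.databases:
            if if_exists:
                return  # IF EXISTS: nothing to drop, no error
            raise ValueError(f"Database {name} does not exist")
        tables = self.databases[name]
        if tables and not cascade:
            # RESTRICT (the default) refuses to drop a non-empty database
            raise ValueError(f"Database {name} is not empty")
        tables.clear()            # CASCADE: drop the tables first ...
        del self.databases[name]  # ... then the database itself


if __name__ == "__main__":
    catalog = Catalog()
    catalog.databases["userdb"] = {"employee"}
    try:
        catalog.drop_database("userdb", if_exists=True)
    except ValueError as err:
        print("Error:", err)
    catalog.drop_database("userdb", if_exists=True, cascade=True)
    print(catalog.databases)
```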
Syntax
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
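The employee table created below has the following fields and data types:
Sr.No  Field Name   Data Type
1      Eid          int
2      Name         String
3      Salary       Float
4      Designation  string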
It also uses the following comment, row-format settings (field terminator and
line terminator), and stored file type:
COMMENT 'Employee details'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
The following query creates a table named employee using the above data.
hive> CREATE TABLE IF NOT EXISTS employee (eid int, name String,
salary Float, designation String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
If you add the option IF NOT EXISTS, Hive ignores the statement in case the table
already exists.
On successful creation of table, you get to see the following response:
OK
Time taken: 5.905 seconds
hive>
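With ROW FORMAT DELIMITED, each line of the text file is one row and each tab-separated field fills the next column. The Python sketch below imitates that mapping for the employee columns. It is an approximation: Hive's actual reader for text files is a SerDe (LazySimpleSerDe by default), and `parse_delimited` and `SCHEMA` are hypothetical names. A value that cannot be converted, or a missing trailing field, becomes `None`, the sketch's stand-in for NULL, which is intended to mirror how the default text SerDe treats such values.

```python
# Approximates how a delimited text file maps onto the employee table's columns.
# parse_delimited and SCHEMA are hypothetical; Hive uses a SerDe for this.

SCHEMA = [("eid", int), ("name", str), ("salary", float), ("designation", str)]


def parse_delimited(text: str, field_sep: str = "\t", line_sep: str = "\n") -> list:
    rows = []
    for line in text.split(line_sep):      # LINES TERMINATED BY '\n'
        if not line:
            continue                       # ignore the empty piece after the last '\n'
        fields = line.split(field_sep)     # FIELDS TERMINATED BY '\t'
        row = {}
        for index, (column, cast) in enumerate(SCHEMA):
            raw = fields[index] if index < len(fields) else None
            try:
                row[column] = None if raw is None else cast(raw)
            except ValueError:
                row[column] = None         # unparsable value read as NULL
        rows.append(row)
    return rows


if __name__ == "__main__":
    sample = "1201\tGopal\t45000\tTechnical manager\n1202\tManisha\tn/a\tProofreader\n"
    for row in parse_delimited(sample):
        print(row)
```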
Relational Operators
These operators are used to compare two operands. The following table describes
the relational operators available in Hive:
Example
Let us assume the employee table is composed of fields named Id, Name, Salary,
Designation, and Dept, as shown below. Generate a query to retrieve the details of
the employee whose Id is 1205.
+------+-------------+--------+-------------------+-------+
| Id   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query retrieves the employee details using the above table:
hive> SELECT * FROM employee WHERE Id=1205;
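The Id lookup described above can be tried without a Hadoop cluster. The sketch below uses Python's built-in sqlite3 module with an in-memory copy of the employee table; SQLite is only a stand-in for Hive here, so this shows the WHERE logic, not Hive's execution. `find_by_id` is a hypothetical helper.

```python
# Equality lookup (Id = 1205) on an in-memory SQLite copy of the employee table.
# SQLite stands in for Hive; find_by_id is a hypothetical helper.
import sqlite3

EMPLOYEES = [
    (1201, "Gopal", 45000, "Technical manager", "TP"),
    (1202, "Manisha", 45000, "Proofreader", "PR"),
    (1203, "Masthanvali", 40000, "Technical writer", "TP"),
    (1204, "Krian", 40000, "Hr Admin", "HR"),
    (1205, "Kranthi", 30000, "Op Admin", "Admin"),
]


def find_by_id(emp_id: int) -> list:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE employee "
                    "(id INTEGER, name TEXT, salary INTEGER, designation TEXT, dept TEXT)")
        con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
        # Relational operator '=' in the WHERE clause, with the value bound as a parameter
        return con.execute("SELECT * FROM employee WHERE id = ?", (emp_id,)).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(find_by_id(1205))
```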
Arithmetic Operators
Operator  Operand types     Description
A%B       all number types  Gives the remainder resulting from dividing A by B.
A&B       all number types  Gives the result of bitwise AND of A and B.
A^B       all number types  Gives the result of bitwise XOR of A and B.
~A        all number types  Gives the result of bitwise NOT of A.
Example
The following query adds two numbers, 20 and 30.
On successful execution of the query, you get to see the following response:
+-----+
| ADD |
+-----+
| 50  |
+-----+
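For non-negative integers, the remainder and bitwise operators listed above behave like Python's integer operators, so a quick Python check (not HiveQL) shows what each returns, including the 20 + 30 example. Negative operands are deliberately avoided: Hive's `%` follows Java semantics (the result takes the sign of the dividend), while Python's takes the sign of the divisor. `hive_like_ops` is a hypothetical helper.

```python
# Python equivalents of the arithmetic operators above, for non-negative ints only.
# hive_like_ops is a hypothetical helper, not a Hive function.


def hive_like_ops(a: int, b: int) -> dict:
    return {
        "A+B": a + b,  # sum
        "A%B": a % b,  # remainder of A divided by B
        "A&B": a & b,  # bitwise AND
        "A^B": a ^ b,  # bitwise XOR
        "~A": ~a,      # bitwise NOT; two's complement, so ~A == -A - 1
    }


if __name__ == "__main__":
    for operator, value in hive_like_ops(20, 30).items():
        print(operator, value)
```

For A = 20 and B = 30 this gives 50, 20, 20, 10, and -21 (20 is 10100 and 30 is 11110 in binary).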
Logical Operators
These operators are logical expressions. All of them return either TRUE or FALSE.
Operator  Operand types  Description
A || B    boolean        Same as A OR B.
Example
The following query retrieves the details of employees whose department is TP
and whose salary is more than Rs 40000:
hive> SELECT * FROM employee WHERE Salary>40000 AND Dept='TP';
On successful execution of the query, you get to see the following response:
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
+------+-------------+--------+-------------------+-------+
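The combined condition described above (department TP and salary above Rs 40000) can be checked against the sample rows with Python's sqlite3 module. SQLite is only a stand-in for Hive, and AND is used because SQLite does not accept `&&`. `tp_above` is a hypothetical helper.

```python
# Logical AND in a WHERE clause, run on an in-memory SQLite copy of the table.
# SQLite stands in for Hive; tp_above is a hypothetical helper.
import sqlite3

EMPLOYEES = [
    (1201, "Gopal", 45000, "Technical manager", "TP"),
    (1202, "Manisha", 45000, "Proofreader", "PR"),
    (1203, "Masthanvali", 40000, "Technical writer", "TP"),
    (1204, "Krian", 40000, "Hr Admin", "HR"),
    (1205, "Kranthi", 30000, "Op Admin", "Admin"),
]


def tp_above(min_salary: int) -> list:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE employee "
                    "(id INTEGER, name TEXT, salary INTEGER, designation TEXT, dept TEXT)")
        con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
        # Both conditions must be TRUE for a row to be returned
        return con.execute(
            "SELECT * FROM employee WHERE salary > ? AND dept = 'TP' ORDER BY id",
            (min_salary,),
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    print(tp_above(40000))
```

Masthanvali (1203) is in TP but earns exactly 40000, so the strict `>` excludes that row.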
Complex Operators
These operators provide an expression to access the elements of Complex Types.
Operator  Operand types                        Description
A[n]      A is an Array and n is an int        Returns the nth element in the array A. The first element has index 0.
M[key]    M is a Map<K, V> and key has type K  Returns the value corresponding to the key in the map.
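Python's list and dict give a rough analogy for these two operators: an ARRAY behaves like a zero-indexed list and a MAP like a dictionary keyed by K. The record and its field names below (`skills`, `phones`) are hypothetical; in HiveQL the same accesses would be written `skills[0]` and `phones['office']`.

```python
# Analogy only: Hive ARRAY ~ Python list, Hive MAP<K, V> ~ Python dict.
# The record and its field names are hypothetical examples.

record = {
    "skills": ["HiveQL", "Java", "Python"],          # ARRAY<STRING>
    "phones": {"office": "x1201", "home": "x9001"},  # MAP<STRING, STRING>
}

first_skill = record["skills"][0]          # A[n] with n = 0 returns the first element
office_phone = record["phones"]["office"]  # M[key] returns the value stored under key

if __name__ == "__main__":
    print(first_skill, office_phone)
```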
Creating a View
A view is defined by a SELECT statement, which runs whenever the view is queried.
The syntax is as follows:
Example
Let us take an example of a view. Assume the employee table given below, with the
fields Id, Name, Salary, Designation, and Dept. Generate a query to retrieve the
details of employees who earn a salary of more than Rs 30000, and save this query as
a view named emp_30000.
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query creates the view for the above scenario:
hive> CREATE VIEW emp_30000 AS
SELECT * FROM employee
WHERE salary>30000;
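A view stores the query, not a copy of the rows; the rows are produced each time the view is read. The sketch below reproduces the emp_30000 example with Python's sqlite3 module as a stand-in for Hive (CREATE VIEW and DROP VIEW have the same basic form there) and then drops the view. `demo_view` is a hypothetical helper.

```python
# Creates, queries and drops the emp_30000 view on an in-memory SQLite database.
# SQLite stands in for Hive; demo_view is a hypothetical helper.
import sqlite3

EMPLOYEES = [
    (1201, "Gopal", 45000, "Technical manager", "TP"),
    (1202, "Manisha", 45000, "Proofreader", "PR"),
    (1203, "Masthanvali", 40000, "Technical writer", "TP"),
    (1204, "Krian", 40000, "Hr Admin", "HR"),
    (1205, "Kranthi", 30000, "Op Admin", "Admin"),
]


def demo_view() -> tuple:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE employee "
                    "(id INTEGER, name TEXT, salary INTEGER, designation TEXT, dept TEXT)")
        con.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?)", EMPLOYEES)
        con.execute("CREATE VIEW emp_30000 AS "
                    "SELECT * FROM employee WHERE salary > 30000")
        # Reading the view runs its SELECT against the current table contents
        ids = [row[0] for row in con.execute("SELECT id FROM emp_30000 ORDER BY id")]
        con.execute("DROP VIEW emp_30000")
        views_left = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'view'").fetchall()
        return ids, views_left
    finally:
        con.close()


if __name__ == "__main__":
    print(demo_view())
```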
Dropping a View
Use the following syntax to drop a view:
DROP VIEW view_name
The following query drops the view named emp_30000:
hive> DROP VIEW emp_30000;
Creating an Index
An index is a pointer on a particular column of a table, so creating an index means
creating such a pointer. Note that index support was removed in Hive 3.0; the
statements in this section apply to earlier Hive versions. The syntax is as follows:
CREATE INDEX index_name
ON TABLE base_table_name (col_name, ...)
AS 'index.handler.class.name'
[WITH DEFERRED REBUILD]
[IDXPROPERTIES (property_name=property_value, ...)]
[IN TABLE index_table_name]
[PARTITIONED BY (col_name, ...)]
[
[ ROW FORMAT ...] STORED AS ...
| STORED BY ...
]
[LOCATION hdfs_path]
[TBLPROPERTIES (...)]
Example
Let us take an example of an index. Use the same employee table as before, with
the fields Id, Name, Salary, Designation, and Dept. Create an index named
index_salary on the salary column of the employee table.
The following query creates an index:
hive> CREATE INDEX index_salary ON TABLE employee(salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
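The index acts as a pointer to the salary column. Hive does not update the index
automatically when the table data changes; the index must be rebuilt, for example
with ALTER INDEX index_salary ON employee REBUILD.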
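Conceptually, a compact index maps each distinct value of the indexed column to where the matching rows are stored, so a lookup can skip the rest of the data. The Python sketch below is a simplified model under that assumption: it records list positions, whereas Hive's CompactIndexHandler records file names and offsets in a separate index table. `build_compact_index` and `lookup` are hypothetical names. Because the index is built once, rows added afterwards are not in it until it is rebuilt.

```python
# Simplified model of a compact index on one column; not Hive's implementation.
# build_compact_index and lookup are hypothetical names.
from collections import defaultdict


def build_compact_index(rows: list, column: str) -> dict:
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)  # value -> positions of matching rows
    return dict(index)


def lookup(rows: list, index: dict, value) -> list:
    # Only the indexed positions are visited, instead of scanning every row
    return [rows[position] for position in index.get(value, [])]


if __name__ == "__main__":
    employees = [
        {"id": 1201, "salary": 45000}, {"id": 1202, "salary": 45000},
        {"id": 1203, "salary": 40000}, {"id": 1204, "salary": 40000},
        {"id": 1205, "salary": 30000},
    ]
    index_salary = build_compact_index(employees, "salary")
    print(lookup(employees, index_salary, 40000))
```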
Dropping an Index
The following syntax is used to drop an index:
DROP INDEX <index_name> ON <table_name>
The following query drops an index named index_salary:
hive> DROP INDEX index_salary ON employee;
HiveQL - Select-Where
The Hive Query Language (HiveQL) is a query language for Hive to process and
analyze structured data stored in Hive tables, whose metadata is kept in the
Metastore. This chapter explains how to use the SELECT statement with a WHERE clause.
The SELECT statement is used to retrieve data from a table. The WHERE clause works
as a condition: it filters the data and returns only the rows that satisfy the
condition. The condition is an expression built from Hive's built-in operators and
functions.
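The filtering behaviour of WHERE can be pictured in plain Python: keep a row only when the condition is true for it. This is an analogy under that simplification (it ignores NULL handling and how Hive actually executes the query), and `select_where` is a hypothetical helper.

```python
# Plain-Python analogy for SELECT * FROM employee WHERE <condition>.
# select_where is a hypothetical helper; Hive does not execute queries this way.

EMPLOYEES = [
    {"id": 1201, "name": "Gopal", "salary": 45000, "dept": "TP"},
    {"id": 1202, "name": "Manisha", "salary": 45000, "dept": "PR"},
    {"id": 1203, "name": "Masthanvali", "salary": 40000, "dept": "TP"},
    {"id": 1204, "name": "Krian", "salary": 40000, "dept": "HR"},
    {"id": 1205, "name": "Kranthi", "salary": 30000, "dept": "Admin"},
]


def select_where(rows: list, condition) -> list:
    """Return the rows for which condition(row) is true, in their original order."""
    return [row for row in rows if condition(row)]


if __name__ == "__main__":
    for row in select_where(EMPLOYEES, lambda r: r["salary"] > 30000):
        print(row["id"], row["name"], row["salary"])
```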
Syntax
Given below is the syntax of the SELECT query:
Example
Let us take an example of the SELECT…WHERE clause. Assume we have the employee
table given below, with fields named Id, Name, Salary, Designation, and Dept.
Generate a query to retrieve the details of employees who earn a salary of more
than Rs 30000.
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
| 1205 | Kranthi     | 30000  | Op Admin          | Admin |
+------+-------------+--------+-------------------+-------+
The following query retrieves the employee details using the above scenario:
hive> SELECT * FROM employee WHERE salary>30000;
On successful execution of the query, you get to see the following response:
+------+-------------+--------+-------------------+-------+
| ID   | Name        | Salary | Designation       | Dept  |
+------+-------------+--------+-------------------+-------+
| 1201 | Gopal       | 45000  | Technical manager | TP    |
| 1202 | Manisha     | 45000  | Proofreader       | PR    |
| 1203 | Masthanvali | 40000  | Technical writer  | TP    |
| 1204 | Krian       | 40000  | Hr Admin          | HR    |
+------+--