Hive Updated

Uploaded by

Vipul Khandke

Syllabus

HIVE
HIVE
• Hive is a data warehouse system used to analyse structured data.
• Built on top of Hadoop.
• Originally developed by Facebook.
• Provides the functionality of reading, writing, and managing large datasets residing in distributed storage.
• Runs SQL-like queries written in HQL (Hive Query Language), which are internally converted to MapReduce jobs.
• With Hive, you can skip writing complex MapReduce programs.
• Hive supports Data Definition Language (DDL),
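To make the HQL-to-MapReduce idea concrete, here is a minimal Python sketch of the map, shuffle, and reduce steps that a query such as `SELECT dept, COUNT(*) FROM employees GROUP BY dept;` conceptually breaks down into. The table, its rows, and all function names are hypothetical, and this is not the plan Hive actually generates; it only illustrates the kind of program Hive saves you from writing by hand.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept).
ROWS = [
    ("Asha", "sales"),
    ("Ben", "ops"),
    ("Chen", "sales"),
    ("Dev", "ops"),
    ("Eva", "hr"),
]


def map_phase(rows):
    """Emit one (dept, 1) pair per row, as a mapper would."""
    for _name, dept in rows:
        yield dept, 1


def shuffle(pairs):
    """Group values by key, standing in for Hadoop's shuffle-and-sort step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    """Chain the three steps to mimic GROUP BY dept with COUNT(*)."""
    return reduce_phase(shuffle(map_phase(rows)))


print(count_by_dept(ROWS))
```

In Hive the same result is a one-line query; the three functions above correspond to work that the generated MapReduce job performs across the cluster.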
Features of HIVE
• Hive is fast and scalable.
• Capable of analyzing large datasets stored in HDFS.
• Supports different storage types such as plain text, RCFile, and HBase.
• Uses indexing to accelerate queries.
• Can operate on compressed data stored in HDFS.
• Supports user-defined functions (UDFs), through which users can supply their own functionality.

Limitations of HIVE
• Hive is not capable of handling real-time data.
• It is not designed for online transaction processing (OLTP).
• Hive queries have high latency.
PIG vs HIVE
Hive                                        Pig
Hive is used by Data Analysts.              Pig is used by programmers.
It follows SQL-like queries.                It follows the data-flow language.
It can handle structured data.              It can handle semi-structured data.
It works on server-side of HDFS cluster.    It works on client-side of HDFS cluster.
Hive is slower than Pig.                    Pig is comparatively faster than Hive.
HIVE Architecture
1. Hive Client
• Hive allows writing applications in various languages, including Java, Python, and C++.
• It supports different types of clients, such as:
• Thrift Server - A cross-language service platform that serves requests from any programming language that supports Thrift.
• JDBC Driver - Used to establish a connection between Hive and Java applications.
• ODBC Driver - Allows applications that support the ODBC protocol to connect to Hive.
HIVE Architecture
• Hive CLI - The Hive CLI (Command Line Interface) is a shell where we can execute Hive queries and commands.
• Hive Web User Interface - The Hive Web UI is an alternative to the Hive CLI. It provides a web-based GUI for executing Hive queries and commands.
• Hive Server - Also referred to as the Apache Thrift Server. It accepts requests from different clients and passes them to the Hive Driver.
HIVE Architecture
• Hive MetaStore - A central repository that stores the structure information of the tables and partitions in the warehouse. It also holds the column and type metadata used to read and write data, and the corresponding HDFS files where the data is stored.
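As a rough mental model of what the MetaStore records, the sketch below keeps table metadata in a plain Python dictionary. The real MetaStore is a service with its own schema and storage; the `TableMetadata` class, the `demo.employees` table, its columns, and the HDFS path are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Toy stand-in for one MetaStore entry (not Hive's real schema)."""
    columns: dict                      # column name -> Hive type name
    location: str                      # HDFS directory holding the data files
    partition_keys: list = field(default_factory=list)


# Hypothetical catalogue keyed by (database, table).
METASTORE = {
    ("demo", "employees"): TableMetadata(
        columns={"name": "STRING", "salary": "DOUBLE"},
        location="hdfs:///user/hive/warehouse/demo.db/employees",  # illustrative path
        partition_keys=["dept"],
    ),
}


def describe(database, table):
    """Return 'column type' lines for a table, looked up by name."""
    meta = METASTORE[(database, table)]
    return [f"{col} {col_type}" for col, col_type in meta.columns.items()]


print(describe("demo", "employees"))
print(METASTORE[("demo", "employees")].location)
```

The point of the sketch is the separation of concerns: the data files stay in HDFS, while the structure needed to interpret them is looked up by name.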
Apache Hive Installation
• Verify the Java installation: $ java -version
• Verify the Hadoop installation: $ hadoop version
• Download the Apache Hive tar file.
• http://mirrors.estointernet.in/apache/hive/hive-1.2.2/
• Extract the downloaded tar file.
• tar -xvf apache-hive-1.2.2-bin.tar.gz
• Open the bashrc file: $ nano ~/.bashrc
• Add the following HIVE_HOME and PATH settings.
• export HIVE_HOME=/home/user/local/apache-hive-1.2.2-bin
• export PATH=$PATH:/home/user/local/apache-hive-1.2.2-bin/bin
• Update the environment variables: $ source ~/.bashrc
• Start Hive: $ hive
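If the `hive` command is not found after these steps, the usual cause is a mismatch between HIVE_HOME and PATH. The following Python sketch checks that the `bin` directory under HIVE_HOME appears on PATH. The helper name `hive_env_problems` is made up for this example, and the check is deliberately simple (it does an exact string comparison and does not inspect the filesystem).

```python
import os


def hive_env_problems(env):
    """Return a list of problems with HIVE_HOME and PATH in the given mapping."""
    problems = []
    hive_home = env.get("HIVE_HOME", "")
    if not hive_home:
        problems.append("HIVE_HOME is not set")
        return problems
    # The steps above put HIVE_HOME's bin directory on PATH.
    expected_bin = os.path.join(hive_home, "bin")
    path_entries = env.get("PATH", "").split(os.pathsep)
    if expected_bin not in path_entries:
        problems.append(f"{expected_bin} is not on PATH")
    return problems


if __name__ == "__main__":
    found = hive_env_problems(os.environ)
    for line in found or ["HIVE_HOME and PATH look consistent"]:
        print(line)
```

Run it in a new terminal after `source ~/.bashrc`; an empty problem list means the two variables agree with each other.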
HIVE DATA TYPES
Integer Types
Type      Size                   Range
TINYINT   1-byte signed integer  -128 to 127
SMALLINT  2-byte signed integer  -32,768 to 32,767
INT       4-byte signed integer  -2,147,483,648 to 2,147,483,647
BIGINT    8-byte signed integer  -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

Decimal Types
Type    Size    Description
FLOAT   4-byte  Single precision floating point number
DOUBLE  8-byte  Double precision floating point number
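The integer ranges follow directly from the storage size: an n-byte signed (two's-complement) integer spans -2^(8n-1) to 2^(8n-1) - 1, so every lower bound is negative. The short Python sketch below derives the range for each integer type from its width; the dictionary and function names are only illustrative.

```python
# Storage width in bytes for each Hive integer type.
INTEGER_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def signed_range(num_bytes):
    """Return (min, max) for a two's-complement integer of the given width."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for type_name, width in INTEGER_WIDTHS.items():
    low, high = signed_range(width)
    print(f"{type_name:<8} {low:,} to {high:,}")
```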


Date/Time Types
• TIMESTAMP - Supports UNIX timestamps with optional nanosecond precision, written as "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal places of precision).
• DATE - Specifies a particular year, month, and day in the form YYYY-MM-DD. It does not include the time of day. The range of the DATE type is 0000-01-01 to 9999-12-31.
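To show how the nine fractional digits of a TIMESTAMP string map to nanoseconds, here is a small Python sketch. Python's `datetime` only stores microseconds, so the fractional part is handled separately and returned as an integer. The function name `parse_hive_timestamp` is hypothetical and this is not how Hive itself parses values.

```python
from datetime import datetime


def parse_hive_timestamp(text):
    """Split 'YYYY-MM-DD HH:MM:SS[.fffffffff]' into (datetime, nanoseconds)."""
    base, _, fraction = text.partition(".")
    if fraction and not (fraction.isascii() and fraction.isdigit()):
        raise ValueError(f"fraction must be digits: {fraction!r}")
    if len(fraction) > 9:
        raise ValueError("at most 9 fractional digits are allowed")
    moment = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    # Right-pad so that '.5' means 500,000,000 ns rather than 5 ns.
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanos


print(parse_hive_timestamp("2024-01-15 10:30:45.123456789"))
```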
String Types
• VARCHAR - A variable-length type declared with a length specifier between 1 and 65535, which sets the maximum number of characters allowed in the string.
• CHAR - A fixed-length type whose maximum length is fixed at 255.
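The difference between the two string types is easiest to see by emulating them. The Python sketch below assumes that a VARCHAR value longer than its declared length is truncated, and that a CHAR value is padded with spaces up to its fixed length (and truncated if too long). The helper names `to_varchar` and `to_char` are made up for this illustration and are not Hive functions.

```python
VARCHAR_MAX = 65535  # largest length specifier for VARCHAR
CHAR_MAX = 255       # largest length specifier for CHAR


def to_varchar(value, length):
    """Emulate VARCHAR(length): keep at most `length` characters."""
    if not 1 <= length <= VARCHAR_MAX:
        raise ValueError(f"VARCHAR length must be 1..{VARCHAR_MAX}")
    return value[:length]


def to_char(value, length):
    """Emulate CHAR(length): truncate, then pad with spaces to `length`."""
    if not 1 <= length <= CHAR_MAX:
        raise ValueError(f"CHAR length must be 1..{CHAR_MAX}")
    return value[:length].ljust(length)


print(repr(to_varchar("warehouse", 4)))
print(repr(to_char("ab", 5)))
```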
Complex Types
Type    Description                                          Example
Struct  Similar to a C struct or an object; fields are       struct('James','Roy')
        accessed using the "dot" notation.
Map     Contains key-value tuples; fields are accessed       map('first','James','last','Roy')
        using array notation.
Array   A collection of values of the same type, indexable   array('James','Roy')
        using zero-based integers.
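For readers more familiar with general-purpose languages, the three complex types correspond roughly to a record, a dictionary, and a list. The Python sketch below mirrors the three example values and the access style of each. The field names `col1` and `col2` reflect the default names Hive gives to fields created with `struct(...)`; the variable names are illustrative only.

```python
from collections import namedtuple

# struct('James','Roy'): fields are reached with dot notation.
Name = namedtuple("Name", ["col1", "col2"])
person_struct = Name("James", "Roy")

# map('first','James','last','Roy'): values are reached by key.
person_map = {"first": "James", "last": "Roy"}

# array('James','Roy'): values are reached by zero-based index.
person_array = ["James", "Roy"]

print(person_struct.col1)   # dot notation
print(person_map["last"])   # key lookup
print(person_array[0])      # zero-based index
```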
Hive - Create Database
hive> show databases;
hive> create database demo;
