
Uploaded by

Ňąŕëşh ķümãŕ S


EXPERIMENT 3: HIVE

Aim: To understand the data processing tool Hive and HQL (Hive Query
Language)

Objectives:
1. Create Managed and External tables in HIVE
2. Load data in HIVE table from Local File System
3. Load data in HIVE table from HDFS
4. Query data sets using Hive QL
5. Create partitions and buckets

Key concepts:

• Hive is a data warehousing tool in the Hadoop ecosystem.
• Hive facilitates reading, writing, and managing large datasets
that reside in distributed storage, and lets you query them
using SQL syntax.
• It is used for analyzing structured and semi-structured data.
Hive abstracts the complexity of Hadoop MapReduce: it provides
a mechanism to project structure onto the data and to run
queries written in HQL (Hive Query Language), which are
similar to SQL statements.
• Internally, the Hive compiler converts these HQL queries into
MapReduce jobs, so you do not need to write complex MapReduce
programs to process your data with Hadoop.
• Hive provides tools that enable easy access to data via SQL,
and thus supports data warehousing tasks such as
extract/transform/load (ETL), reporting, and data analysis.
• It provides a mechanism to impose structure on a variety of
data formats.
• There is no single "Hive format" in which data must be
stored. Hive comes with built-in connectors for
comma-separated and tab-separated values (CSV/TSV) text files,
among others.
• Hive is not designed for online transaction processing (OLTP)
workloads. It is best used for traditional data warehousing
tasks.

Q1: How to enter the HIVE Shell?

Open a terminal and type hive. The hive prompt appears, which shows
that you are in the Hive shell.

[cloudera@quickstart Desktop]$ hive

Q2: Create a database

create database emp_details;

use emp_details;

Q3: How to create Managed Table in HIVE?

create table emp(empno int, ename string, job string, sal int,
deptno int)
row format delimited fields terminated by ',';

Q4: How to load the data from LOCAL to HIVE TABLE

Suppose you have created a comma-separated file named empdetails.txt on
the local file system:

1,A,clerk,4000,10
2,A,clerk,4000,30
3,B,mgr,8000,20
4,C,peon,2000,40
5,D,clerk,4000,10
6,E,mgr,8000,50

hive> LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/empdetails.txt'
> OVERWRITE INTO TABLE emp;

Note: If 'LOCAL' is omitted, Hive looks for the file in HDFS. In that
case the file is moved, not copied, into the table's directory.

The keyword 'OVERWRITE' signifies that existing data in the table is
deleted. If the 'OVERWRITE' keyword is omitted, the data files are added
to the existing data.
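The effect of OVERWRITE can be pictured with a small model. The sketch below is written in Python rather than HiveQL so that it runs without a cluster; it is an illustration only, not Hive code. The dictionary that stands in for the table directory, the functions load_data and select_all, and the file name more.txt are all hypothetical.

```python
# Toy model of LOAD DATA [OVERWRITE] INTO TABLE (illustration only, not Hive).
# A dict mapping file name -> rows stands in for the table's directory
# under /user/hive/warehouse.

def load_data(table_dir, file_name, rows, overwrite=False):
    """Return the table directory after loading one file into it."""
    # With OVERWRITE the existing files are discarded first;
    # without it the new file is placed next to the existing ones.
    new_dir = {} if overwrite else dict(table_dir)
    new_dir[file_name] = list(rows)
    return new_dir

def select_all(table_dir):
    """Return every row from every file, like SELECT * FROM table."""
    return [row for name in sorted(table_dir) for row in table_dir[name]]

first = ["1,A,clerk,4000,10", "2,A,clerk,4000,30"]
second = ["3,B,mgr,8000,20"]

emp = load_data({}, "empdetails.txt", first)
appended = load_data(emp, "more.txt", second)                  # no OVERWRITE
replaced = load_data(emp, "more.txt", second, overwrite=True)  # OVERWRITE

print(len(select_all(appended)))  # 3
print(len(select_all(replaced)))  # 1
```

Without OVERWRITE the table ends up with the rows of both files; with OVERWRITE only the rows of the newly loaded file remain.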

Q5: How to check where the managed table is created in hive

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/emp_details.db

Found 2 items
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:40 /user/hive/warehouse/emp_details.db/emp
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:28 /user/hive/warehouse/emp_details.db/emp1

Also check the contents inside emp:

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/emp_details.db/emp

Found 1 items
-rwxrwxrwx 1 cloudera supergroup 104 2018-07-24 02:40 /user/hive/warehouse/emp_details.db/emp/empdetails.txt

Now see the contents inside empdetails.txt

[cloudera@quickstart Desktop]$ hadoop fs -cat /user/hive/warehouse/emp_details.db/emp/empdetails.txt

1,A,clerk,4000,10
2,A,clerk,4000,30
3,B,mgr,8000,20
4,C,peon,2000,40
5,D,clerk,4000,10
6,E,mgr,8000,50

Q6: Check the schema of the created table emp

describe emp;

For a detailed schema use:

describe extended emp;

Q7: How to see all the tables present in the database

show tables;

Q8: Select all the enames from emp table

select ename from emp;

Q9: Get the records where name is 'A'

select * from emp where ename='A';

Q10: Count the total number of records in the created table

The count aggregate function is used to count the total number of
records in a table.

select count(1) from emp;

OR

select count(*) from emp;

Q11: Group the sum of salaries as per the deptno

select deptno, sum(sal) from emp group by deptno;

Q12: Get the records of employees whose salary is between 1000 and 2000 (BETWEEN includes both bounds)

select * from emp where sal between 1000 and 2000;

Q13: Select the names of employees whose job has exactly 5 characters

hive> select ename from emp where job LIKE '_____';

Q14: List the employee names where job has l as the second character

hive> select ename from emp where job LIKE '_l%';
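The two LIKE wildcards used above can be checked outside Hive. The following is a minimal Python sketch, not Hive's implementation: it assumes only the standard semantics that _ matches exactly one character and % matches any run of characters, and it ignores escape characters. The helper names like_to_regex and like are hypothetical.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern: _ is one character, % is any run."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")
        elif ch == "%":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True when the whole value matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None

# Job values taken from the sample file empdetails.txt.
jobs = ["clerk", "mgr", "peon"]
print([j for j in jobs if like(j, "_____")])  # ['clerk']
print([j for j in jobs if like(j, "_l%")])    # ['clerk']
```

For the sample data, both patterns select only the rows whose job is clerk, so Q13 and Q14 return the enames of employees 1, 2 and 5.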

Q15: Retrieve the total salary for each department

select deptno, sum(sal) from emp group by deptno;
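As a cross-check of what the grouping query should return for the sample file empdetails.txt, the same aggregation can be worked out by hand. The Python sketch below only reproduces the arithmetic; it is not how Hive executes the query, and the function name sum_sal_by_deptno is hypothetical.

```python
from collections import defaultdict

# Rows of empdetails.txt as (empno, ename, job, sal, deptno).
rows = [
    (1, "A", "clerk", 4000, 10),
    (2, "A", "clerk", 4000, 30),
    (3, "B", "mgr", 8000, 20),
    (4, "C", "peon", 2000, 40),
    (5, "D", "clerk", 4000, 10),
    (6, "E", "mgr", 8000, 50),
]

def sum_sal_by_deptno(records):
    """Add up sal per deptno, like SELECT deptno, sum(sal) ... GROUP BY deptno."""
    totals = defaultdict(int)
    for _empno, _ename, _job, sal, deptno in records:
        totals[deptno] += sal
    return dict(totals)

totals = sum_sal_by_deptno(rows)
for deptno in sorted(totals):
    print(deptno, totals[deptno])
```

Department 10 has two clerks at 4000 each, so its total is 8000; departments 20 and 50 total 8000, department 30 totals 4000, and department 40 totals 2000.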

Q16: Add a column to the table

alter table emp add COLUMNS(lastname string);

Q17: How to Rename a table

alter table emp rename to emp1;

Q18: How to drop a table

drop table emp;

Q19: How to create External Table:

Syntax:

CREATE EXTERNAL TABLE <table_name> (column1 data_type, column2 data_type)
row format delimited fields terminated by ','
LOCATION '<table_hive_location>';

For example, I created a comma-separated file called extdata.txt on the
local machine:

1,2,3
4,5,6

Then I copied this file to HDFS:

[cloudera@quickstart Desktop]$ hadoop fs -put extdata.txt /user/cloudera/

Then I copied the file into an HDFS directory named hivedata. The
directory must already exist; it can be created with hadoop fs -mkdir
hivedata.

[cloudera@quickstart Desktop]$ hadoop fs -cp extdata.txt hivedata

Then I created an external table ext1 whose LOCATION points to that
directory:

hive> create external table ext1(a int, b int, c int)
> row format delimited fields terminated by ','
> LOCATION '/user/cloudera/hivedata';

Now check if table is populated with data

hive> select * from ext1;

NOTE: You will not see this external table under
/user/hive/warehouse/emp_details.db, as you did for the managed table.
This is because an external table only refers to the data at the
location where the text file resides; the data is not loaded into the
Hive warehouse.

Moreover, if you drop a managed table, all of its data under
/user/hive/warehouse/emp_details.db is lost, whereas if you drop an
external table, the data still remains in HDFS.

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/emp_details.db
Found 2 items
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:40 /user/hive/warehouse/emp_details.db/emp
drwxrwxrwx - cloudera supergroup 0 2018-07-24 02:28 /user/hive/warehouse/emp_details.db/emp1

The output above lists only the two managed tables; ext1, which is an
external table, does not appear.

If you drop the external table, the data in HDFS is not lost, as shown
below.

[cloudera@quickstart Desktop]$ hadoop fs -ls hivedata

Found 1 items
-rw-r--r-- 1 cloudera cloudera 12 2018-09-25 20:51 hivedata/extdata.txt
[cloudera@quickstart Desktop]$ hadoop fs -cat hivedata/extdata.txt
1,2,3
4,5,6

If you drop a managed table, you will no longer find its data under
/user/hive/warehouse/emp_details.db.
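The difference in DROP TABLE behaviour can be summarised with a toy model. The Python sketch below is an illustration only and not Hive code: the two dictionaries standing in for the metastore and for HDFS, the "kind" field, and the function drop_table are all hypothetical.

```python
# Toy model (not Hive): metastore holds table definitions, hdfs holds data.

def drop_table(metastore, hdfs, name):
    """Remove the table definition; delete the data only for managed tables."""
    table = metastore.pop(name)
    if table["kind"] == "managed":
        hdfs.pop(table["location"], None)

hdfs = {
    "/user/hive/warehouse/emp_details.db/emp": ["empdetails.txt"],
    "/user/cloudera/hivedata": ["extdata.txt"],
}
metastore = {
    "emp": {"kind": "managed",
            "location": "/user/hive/warehouse/emp_details.db/emp"},
    "ext1": {"kind": "external",
             "location": "/user/cloudera/hivedata"},
}

drop_table(metastore, hdfs, "emp")   # definition and data are removed
drop_table(metastore, hdfs, "ext1")  # only the definition is removed

print(sorted(metastore))  # []
print(sorted(hdfs))       # ['/user/cloudera/hivedata']
```

After both drops neither table is defined any more, but the directory behind the external table, with extdata.txt in it, is still present.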

WORKING WITH MOVIES DATA SET

Q1: Create a database called movies

create database movies;

Q2: Work with database movies

use movies;

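Q3: Create a table movie_details inside the movies database

hive> create table movie_details(no int,
> name string,
> year int,
> rating decimal,
> views int)
> row format delimited fields terminated by ',';

Note: In Hive 0.13 and later, a bare decimal means decimal(10,0), so
fractional ratings would be rounded to whole numbers. If the data set
contains fractional ratings, declare the column with a scale, for
example decimal(3,1), or as double.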

Q4: Load the data set of movies from local to hive table
hive> LOAD DATA LOCAL INPATH
'/home/cloudera/Desktop/hive_demo/movies_new' INTO table
movie_details;

Q5: Check the table created inside database.

hive> show tables;

Q6: Retrieve all the records in movie_details

hive> select * from movie_details;

Q7: Print all movies between year 1920 and 1990

hive> select * from movie_details where year between 1920 and 1990;

Q8: Select all records where the movie name starts with the letter c or C

hive> select * from movie_details where name LIKE 'C%' or name LIKE 'c%';

Q9: select all records where movie name starts with The

hive> select * from movie_details where name LIKE 'The%';

Q10: What is the maximum rating among the movies?

hive> select max(rating) from movie_details;

Q11: count the number of records

hive> select count(*) from movie_details;

Q12: select rating of the movie School Ties

hive> SELECT name,rating FROM movie_details WHERE name = 'School Ties';

Q13: List all the years with the total number of views in each year
(hint: group by year), and restrict the output to 5 records

hive> select year, sum(views) from movie_details group by year LIMIT 5;
OK

PARTITIONING AND BUCKETING

Q1: Explain the concept of Partitioning and bucketing?

Assume that you are storing information about people across the entire
world, spread over 196+ countries and spanning around 500 crore (5
billion) entries. If you want to query the people of one particular
country (say, Vatican City), then without partitioning you have to scan
all 500 crore entries just to fetch a thousand entries of that country.
If you partition the table by country, the query only needs to read the
data in that one country's partition. Hive creates a separate directory
for each distinct value of the partition column(s).

Bucketing decomposes data into more manageable, roughly equal parts.
Partitioning alone can produce many small partitions, because one is
created for every distinct column value. With bucketing, you fix the
number of buckets that store the data; rows are assigned to buckets by
hashing the clustering column. The number of buckets is defined in the
table creation statement.
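The layout described above can be sketched in a few lines of Python. This is a model for illustration, not Hive's implementation: zlib.crc32 is a stand-in for Hive's own hash function, so the bucket numbers printed here will not match the ones Hive assigns. The sample rows, the constant NUM_BUCKETS and the functions bucket_of and layout are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 3  # fixed when the table is created

def bucket_of(value, num_buckets=NUM_BUCKETS):
    """Pick a bucket by hashing the clustering column.

    zlib.crc32 is only a stand-in; Hive uses its own hash function.
    """
    return zlib.crc32(value.encode("utf-8")) % num_buckets

def layout(rows):
    """Group rows as {partition directory: {bucket number: [rows]}}."""
    tree = defaultdict(lambda: defaultdict(list))
    for code, item, category, place in rows:
        # One directory per distinct value of the partition column;
        # the partition column itself is not stored inside the files.
        directory = "category=" + category
        tree[directory][bucket_of(place)].append((code, item, place))
    return tree

# Sample rows as (code, item_name, category, place).
rows = [
    (3, "bowl", "utensils", "jammu"),
    (10, "plate", "utensils", "mohali"),
    (20, "spoon", "utensils", "mohali"),
    (1, "purse", "bag", "shimla"),
]

tree = layout(rows)
for directory in sorted(tree):
    for bucket in sorted(tree[directory]):
        print(directory, "bucket", bucket, tree[directory][bucket])
```

Rows with the same category land in the same directory, and within a directory rows with the same place always land in the same bucket, whatever the number of distinct places.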

Q2: create a database shopping

hive> create database shopping;
hive> use shopping;

Q3: create table (shopping1) inside the database shopping

hive> create table shopping1(code INT, item_name STRING, category
string, place string)
> row format delimited fields terminated by ',';

Q4: Load the data in HIVE table from local

Suppose you have a file shop.txt on your local machine with the
contents shown below. You can create this CSV file with an editor such
as gedit.

1,purse,bag,shimla
2,lipstick,cosmetic,delhi
3,bowl,utensils,jammu
4,mobile,electronic gadget,hyderabad
5,skirt,apparel,chennai
6,bed cover,furnishing,chandigarh
7,car,toys,karnal
8,hand purse,bag,solan
9,cream,cosmetic,jhodpur
10,plate,utensils,mohali
11,head phones,electronic gadget,calicut
12,top,apparel,mumbai
13,table cover,furnishing,agra
17,truck,toys,jaipur
18,wallet,bag,solan
19,foundation,cosmetic,jhodpur
20,spoon,utensils,mohali
21,speaker,electronic gadget,calicut
22,suit,apparel,mumbai
33,table sheet,furnishing,agra
24,auto,toys,jaipur

hive> LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/shop.txt'
> OVERWRITE INTO TABLE shopping1;

Q5: Create a partitioned table (shopping3) for the data in shopping1,
with 3 buckets inside each partition

hive> create table shopping3(code INT, item_name STRING, place
string)
> partitioned by (category string)
> clustered by (place) into 3 buckets
> row format delimited fields terminated by ','
> stored as textfile;

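Q6: Populate the partitions with data

Depending on the Hive version and configuration, dynamic partitioning
may have to be enabled first. The third setting applies only to Hive
versions before 2.0, where that property exists.

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.enforce.bucketing=true;

hive> from shopping1 txn INSERT OVERWRITE TABLE shopping3
> PARTITION(category)
> select txn.code, txn.item_name, txn.place,
> txn.category DISTRIBUTE by category;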

Q7: Check your partitions

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/shopping.db/

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/shopping.db/shopping3

Found 8 items
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=__HIVE_DEFAULT_PARTITION__
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=apparel
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=bag
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=cosmetic
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=electronic gadget
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=furnishing
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=toys
drwxrwxrwx - cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=utensils

Q8: Check the buckets for the partition "utensils"

[cloudera@quickstart Desktop]$ hadoop fs -ls /user/hive/warehouse/shopping.db/shopping3/category=utensils

Found 3 items
-rwxrwxrwx 1 cloudera supergroup 0 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=utensils/000000_0
-rwxrwxrwx 1 cloudera supergroup 13 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=utensils/000001_0
-rwxrwxrwx 1 cloudera supergroup 32 2018-07-24 10:48 /user/hive/warehouse/shopping.db/shopping3/category=utensils/000002_0

Q9: Check the contents of a particular bucket, for example 000001_0

[cloudera@quickstart Desktop]$ hadoop fs -cat /user/hive/warehouse/shopping.db/shopping3/category=utensils/000001_0
3,bowl,jammu
