
PRACTICAL 5

Aim: To study the basic commands available for the Hadoop Distributed File System (HDFS).

HDFS Commands
HDFS is the primary component of the Hadoop ecosystem. It is responsible for storing large data sets of structured or unstructured data across various nodes, and it maintains the metadata in the form of log files. Before using the HDFS commands, start the Hadoop services with the following commands (start-all.sh starts all the daemons; stop-all.sh stops them again):

start-all.sh
stop-all.sh

hadoop version
The hadoop version command prints the installed Hadoop version.

jps
To check that the Hadoop services are up and running, run jps. On a working single-node setup it typically lists daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager.

ls: This command is used to list all the files and directories.

hadoop fs -ls
It prints all the files and directories present in the user's HDFS home directory.

mkdir:
To create a directory. In HDFS there is no home directory for a user by default, so let's first create one:
hadoop dfs -mkdir bdalab

vi lab.txt
cat lab.txt
These two commands create a file on the local file system and view its content; the file will be copied into HDFS in the next step.
put
To copy files/folders from the local file system to the HDFS store. This is the most important command. Local file system here means the files present on the OS.
Syntax:
hadoop fs -put <localsrc> <dest>
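For example, the lab.txt file created above can be copied into the bdalab directory like this (a minimal sketch reusing the names from this manual):

hadoop fs -put lab.txt bdalab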

Open http://localhost:50070/ in a browser to check in the NameNode's graphical user interface whether the file was copied to the Hadoop file system. (On Hadoop 3.x the NameNode web UI moved to port 9870.)

copyToLocal (or) get: To copy files/folders from the HDFS store to the local file system.
Syntax:

hadoop fs -get <src (on HDFS)> <local dest>


Example:
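A hedged illustration, copying the lab.txt file back out of HDFS (the local destination path is an assumption):

hadoop fs -get bdalab/lab.txt /home/user/Desktop/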

moveFromLocal: This command moves a file from the local file system to HDFS; unlike put, it removes the local copy.


Syntax:

hadoop fs -moveFromLocal <localsrc> <dest (on HDFS)>


Example:
hadoop fs -moveFromLocal /home/user/Desktop/test/t.txt /karthi

cp: This command is used to copy files within HDFS. Let's copy the folder geeks to geeks_copied.
Syntax:

hadoop fs -cp <src (on HDFS)> <dest (on HDFS)>


Example:
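Matching the geeks/geeks_copied names used above (a minimal sketch):

hadoop fs -cp /geeks /geeks_copied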

mv: This command is used to move files within HDFS.
Syntax:

hadoop fs -mv <src (on HDFS)> <dest (on HDFS)>


Example:
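A hedged illustration, again borrowing the hypothetical geeks directories:

hadoop fs -mv /geeks/myfile.txt /geeks_copied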

rm: This command deletes a file from HDFS.

Syntax:

hadoop fs -rm <filename/directoryName>


Example:
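For instance, removing the lab.txt copy made earlier (a minimal sketch):

hadoop fs -rm bdalab/lab.txt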

hadoop fs -rmr /directory
It deletes all the content inside the directory and then the directory itself (a recursive delete; in newer releases the equivalent is hadoop fs -rm -r).

du: It gives the size of each file in a directory.


Syntax:
hadoop fs -du <dirName>
Example:
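Applied to the bdalab directory from earlier (a minimal sketch):

hadoop fs -du bdalab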

dus: This command gives the total size of a directory/file. (It is a deprecated shorthand for hadoop fs -du -s.)

Syntax:
hadoop fs -dus <dirName>
Example:
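Again using the bdalab directory (a minimal sketch):

hadoop fs -dus bdalab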

stat: It gives the last modified time of a directory or path; in short, it gives the stats of the directory or file.
Syntax:

hadoop fs -stat <hdfs file>


Example:
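A minimal sketch against the lab.txt file assumed above:

hadoop fs -stat bdalab/lab.txt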

setrep: This command is used to change the replication factor of a file/directory in HDFS. By default it is 3 for anything stored in HDFS (the dfs.replication property in hdfs-site.xml).
Example: To change the replication factor to 6 for geeks.txt stored in HDFS:
hadoop fs -setrep -R -w 6 geeks.txt

Note: -R means recursively; we use it for directories, as they may contain many files and folders inside them. -w makes the command wait until the replication is completed.

test
The test command is used for file test operations.

Option  Description
-d      Check whether the given path is a directory; return 0 if it is a directory.
-e      Check whether the given path exists; return 0 if it exists.
-f      Check whether the given path is a file; return 0 if it is a file.
-s      Check whether the path is not empty; return 0 if it is not empty.
-r      Return 0 if the path exists and read permission is granted.
-w      Return 0 if the path exists and write permission is granted.
-z      Check whether the file size is 0 bytes; return 0 if it is.

Example
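A minimal sketch; the shell variable $? holds the status returned by the last command:

hadoop fs -test -e bdalab/lab.txt
echo $?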

getmerge
The getmerge command merges a list of files in a directory on the HDFS filesystem into a single file on the local filesystem.
Example
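For instance, merging everything under bdalab into one local file (the local destination path is an assumption):

hadoop fs -getmerge bdalab /home/user/merged.txt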

stat prints the statistics about the file or directory in the specified format.

Formats:

%b – file size in bytes


%g – group name of owner
%n – file name
%o – block size
%r – replication
%u – user name of owner
%y – modification date

Example
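A hedged illustration combining several of the format specifiers listed above:

hadoop fs -stat "%n %b %r %y" bdalab/lab.txt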

PRACTICAL 6
Aim: To study basic commands available for HIVE Query Language.

Description:
Apache Hive is an open-source data warehousing tool for performing distributed processing and data analysis. It was developed by Facebook to reduce the work of writing Java MapReduce programs. Apache Hive uses the Hive Query Language (HiveQL), a declarative language similar to SQL, and translates Hive queries into MapReduce programs. It lets developers process and analyse structured and semi-structured data by replacing complex Java MapReduce programs with Hive queries. Anyone familiar with SQL commands can easily write Hive queries.

Hive supports applications written in languages such as Python, Java, C++ and Ruby through the JDBC, ODBC and Thrift drivers, for performing queries on Hive. Hence, one can easily write a Hive client application in the language of one's choice.
Hive clients are categorized into three types:
1. Thrift Clients
The Hive server is based on Apache Thrift so that it can serve the request from a thrift client.
2. JDBC client
Hive allows Java applications to connect to it using the JDBC driver. The JDBC driver uses Thrift to communicate with the Hive server.
3. ODBC client
Hive ODBC driver allows applications based on the ODBC protocol to connect to Hive.
Similar to the JDBC driver, the ODBC driver uses Thrift to communicate with the Hive Server.
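As a quick sketch of the JDBC route, the beeline shell that ships with Hive connects to HiveServer2 over JDBC (the host, port and user below assume a default local installation):

beeline -u jdbc:hive2://localhost:10000 -n hive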

Hive - Create Database
In Hive, a database is considered a catalog or namespace of tables. So we can maintain multiple tables within a database, where a unique name is assigned to each table. Hive also provides a default database named default.

Initially, we check the default database provided by Hive. To see the list of existing databases, use the following command:
hive> show databases;

hive> create database demo;


hive> show databases;
hive> describe database extended demo;

Hive - Create Table


In Hive, we can create a table using conventions similar to SQL. Hive offers a wide range of flexibility in where the data files for tables are stored. It provides two types of tables:

Internal table
The internal tables are also called managed tables, as the lifecycle of their data is controlled by Hive. By default, these tables are stored in a subdirectory under the directory defined by hive.metastore.warehouse.dir (i.e. /user/hive/warehouse). Internal tables are not flexible enough to share with other tools such as Pig. If we drop an internal table, Hive deletes both the table schema and the data.
hive> create table demo.employee (Id int, Name string , Salary float)
row format delimited
fields terminated by ',' ;
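After creation, the table schema can be verified with (a minimal sketch):

hive> describe demo.employee;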

External Table
The external table allows us to create and access a table and its data externally. The external keyword is used to specify an external table, whereas the location keyword determines the location of the loaded data. As the table is external, the data is not stored under the Hive warehouse directory. Therefore, if we drop the table, only the metadata of the table is deleted; the data still exists.

Let's create a directory on HDFS by using the following command:


hadoop dfs -mkdir /HiveDirectory
Now, store the file in the created directory:
hadoop dfs -put hive/emp_details /HiveDirectory

hive> create external table emplist (Id int, Name string , Salary float)
row format delimited
fields terminated by ','
location '/HiveDirectory';

select * from emplist;

Name:-Aayush Vaghela 24
En. No:-2203051057146
Hive - Load Data
Once the internal table has been created, the next step is to load data into it. In Hive, we can easily load data from a file into a table.

load data local inpath '/home/codegyani/hive/emp_details' into table demo.employee;

select * from demo.employee;

Hive - Drop Table


Hive lets us drop a table using the SQL drop table command. Let's follow the steps below to drop a table from the database (the final alter table command shows how a table is renamed rather than dropped):
show databases;
use demo;
show tables;
drop table new_employee;
alter table emp rename to employee_data;

PRACTICAL 7

Aim: Basic commands of HBASE Shell

Description:
HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open-source project and is horizontally scalable. HBase has a data model similar to Google's Bigtable, designed to provide quick random access to huge amounts of structured data, and it leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS). It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop file system. One can store data in HDFS either directly or through HBase, and data consumers read/access the data in HDFS randomly using HBase. HBase sits on top of the Hadoop file system and provides read and write access.
Data Definition Language:

1. create

create 'emp', 'personal data', 'professional data'

2. list

list

3. disable

disable 'emp'

4. is_disabled

is_disabled 'emp'

5. enable

enable 'emp'

6. is_enabled

is_enabled 'emp'

7. describe

describe 'emp'

8. drop

drop 'emp'

Data Manipulation Language:

9. put:

put 'emp','1','personal data:name','raju'


put 'emp','1','personal data:city','hyderabad'
put 'emp','1','professional data:designation','manager'
put 'emp','1','professional data:salary','50000'
put 'emp','1','professional data:vehicle','50000'
put 'emp','2','personal data:name','sathish'
put 'emp','2','personal data:city','bangalore'
put 'emp','2','professional data:designation','professor'
put 'emp','2','professional data:salary','60000'
put 'emp','3','personal data:name','muthu'
put 'emp','3','personal data:city','chennai'
put 'emp','3','professional data:designation','analyst'
put 'emp','3','professional data:salary','20000'

10. get

get 'emp', '1'
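A narrower read, limited to a single column (a minimal sketch using the shell's COLUMN option):

get 'emp', '1', {COLUMN => 'personal data:name'}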

11. delete

delete 'emp', '1', 'personal data:city',1417521848375

12. deleteall

deleteall 'emp','1'

13. scan

scan 'emp'

14. count

count 'emp'

15. truncate

truncate 'emp'

PRACTICAL 8
Aim: To create HDFS tables, load them into Hive, and learn joins and partitioning of tables in Hive.

Description:
Partitions
Each table can be broken into partitions, and partitions determine the distribution of data within subdirectories. Today, huge amounts of data, in the range of petabytes, are stored in HDFS, so it becomes very difficult for Hadoop users to query such volumes directly.
Hive was introduced to reduce this burden of data querying. Apache Hive converts SQL queries into MapReduce jobs and then submits them to the Hadoop cluster. When we submit a SQL query, Hive reads the entire data set, so it becomes inefficient to run MapReduce jobs over a large table. This is resolved by creating partitions in tables. Apache Hive makes implementing partitions very easy, creating them through its automatic partition scheme at the time of table creation.
In the partitioning method, all the table data is divided into multiple partitions. Each partition corresponds to specific value(s) of the partition column(s) and is kept as a subdirectory inside the table's directory in HDFS. Therefore, when a particular table is queried, only the appropriate partition containing the queried value is read. This decreases the I/O time required by the query and hence increases performance.

Static partitions
Inserting input data files individually into a partitioned table is called static partitioning. Static partitions are usually preferred when loading big files into Hive tables, and static partitioning saves time in loading data compared to dynamic partitioning. You "statically" add a partition to the table and move the file into that partition. We can alter partitions in the static case, and you can take the partition column value from the file name, the day of the date, etc., without reading the whole big file. If you want to use static partitioning in Hive, you should set the property hive.mapred.mode = strict; this property is set by default in hive-site.xml, and static partitioning works in strict mode. You should use a where clause to limit queries on a static partition. You can perform static partitioning on Hive managed tables or external tables.
Dynamic partitions
A single insert into a partitioned table, with the partition values taken from the data, is known as dynamic partitioning. Usually, a dynamic partition loads its data from a non-partitioned table. Dynamic partitioning takes more time to load data than static partitioning. It is suitable when you have large data stored in a table, or when you want to partition by columns whose values you do not know in advance. With dynamic partitions, no where clause is required to limit the load, but we cannot alter a dynamic partition. You can perform dynamic partitioning on Hive external tables and managed tables. To use dynamic partitioning in Hive, the mode must be non-strict; the dynamic-partition properties you should enable are shown in steps 7 and 8 below.

1. create database test;
use test;
drop database test;
show tables;
drop table student;
show databases;

2. create table student(name string, rollno int, percentage float)
partitioned by (state string, city string)
row format delimited fields terminated by ',';

3. load data local inpath '/home/training/Desktop/maharastra'
into table student partition(state='maharastra', city='mumbai');

4. load data local inpath '/home/training/Desktop/karnataka'
into table student partition(state='karnataka', city='bangalore');

5. select * from student;

6. select * from student where state='maharastra';

Dynamic partitioning
Note: By default, dynamic partitioning is disabled. We need to enable it using the following commands:
7. set hive.exec.dynamic.partition=true;
8. set hive.exec.dynamic.partition.mode=nonstrict;
9. create table stu(name string, rollno int, percentage float, state string, city string)
row format delimited fields terminated by ',';

10. load data local inpath '/home/training/Desktop/Result1' into table stu;

11. create table stud_part (name string, rollno int, percentage float)
partitioned by (state string, city string)
row format delimited
fields terminated by ',';

12. insert overwrite table stud_part
partition (state, city)
select name, rollno, percentage, state, city
from stu;

13. select * from stud_part where city='bangalore';
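To verify which partitions Hive actually created, a quick check (a minimal sketch):

show partitions stud_part;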

Karnataka.txt
Rajesh,100,78
Abhishek,95,76
Manish,102,89
siva,203,66
sania,204,77
Maharastra.txt
ravi,100,56
mohan,95,89
mahesh,102,67
janvi,103,66

Hive Join
Let's see two tables, Employee and EmployeeDepartment, that are going to be joined.

[Screenshot in original: Employee and EmployeeDepartment tables populated via Hive DML operations]

Inner joins

select * from employee join employeedepartment
on (employee.empId = employeedepartment.empId);

Left outer joins

select e.empId, empName, department
from employee e left outer join employeedepartment ed
on (e.empId = ed.empId);

Right outer joins

select e.empId, empName, department
from employee e right outer join employeedepartment ed
on (e.empId = ed.empId);
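A full outer join follows the same pattern; this variant is not in the original manual but completes the set (a hedged sketch):

select e.empId, empName, department
from employee e full outer join employeedepartment ed
on (e.empId = ed.empId);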

