B22 BDA Experiment 03
LAB MANUAL
PART A
(PART A: TO BE REFERRED BY STUDENTS)
Experiment No-03
A.1 Aim:
To install Sqoop and execute basic commands of Hadoop ecosystem component Sqoop
A-2 Prerequisite
Knowledge of Java, Python and the VMware software pack.
A.3 Outcome
Students will be able to acquire fundamental enabling techniques and scalable
algorithms such as Hadoop, MapReduce and NoSQL in big data analytics.
A.4 Theory:
Introduction
Generally, applications interact with the relational database using RDBMS, and thus this
makes relational databases one of the most important sources that generate Big Data. Such
data is stored in RDB Servers in the relational structure. Here, Apache Sqoop plays an
important role in Hadoop ecosystem, providing feasible interaction between the relational
database server and HDFS.
So, Apache Sqoop is a tool in the Hadoop ecosystem designed to transfer data
between HDFS (Hadoop storage) and relational database servers such as MySQL, Oracle,
SQLite, Teradata, Netezza and Postgres. Apache Sqoop imports data from relational
databases to HDFS, and exports data from HDFS to relational databases. It efficiently
transfers bulk data between Hadoop and external data stores such as enterprise data
warehouses and relational databases.
This is how Sqoop got its name – “SQL to Hadoop & Hadoop to SQL”.
Additionally, Sqoop is used to import data from external datastores into Hadoop ecosystem’s
tools like Hive & HBase.
1. Full Load: Apache Sqoop can load a whole table with a single command. You can
also load all the tables from a database using a single command.
2. Incremental Load: Apache Sqoop also provides the facility of incremental load,
where you can load parts of a table whenever it is updated.
3. Parallel import/export: Sqoop uses YARN framework to import and export the data,
which provides fault tolerance on top of parallelism.
VIKAS RAMPRAKASH CHAURASIYA TU3F2021091
ROLL NO:B22 BDA_EXP_03
4. Import results of SQL query: You can also import the result returned from an SQL
query in HDFS.
5. Compression: You can compress your data using the deflate (gzip) algorithm with the
--compress argument, or by specifying the --compression-codec argument. You can also
load a compressed table into Apache Hive.
6. Connectors for all major RDBMS databases: Apache Sqoop provides connectors for
multiple RDBMS databases, covering nearly all of the commonly used ones.
7. Kerberos Security Integration: Kerberos is a computer network authentication
protocol which works on the basis of ‘tickets’ to allow nodes communicating over a
non-secure network to prove their identity to one another in a secure manner. Sqoop
supports Kerberos authentication.
8. Load data directly into HIVE/HBase: You can load data directly into Apache
Hive for analysis and also dump your data in HBase, which is a NoSQL database.
9. Support for Accumulo: You can also instruct Sqoop to import the table in Accumulo
rather than a directory in HDFS.
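As a sketch, an incremental, compressed import combining features 2 and 5 above might look like the following (the connection URL, credentials, table and column names are assumptions for a typical local setup):

```shell
# Incremental append import: only rows whose id exceeds the last imported
# value are fetched; output is gzip-compressed. All connection details
# below are placeholders.
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees \
  --incremental append \
  --check-column id \
  --last-value 1000 \
  --compress
```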
It is Sqoop's architecture that makes these benefits possible. Now that we
know the features of Apache Sqoop, let's move ahead and understand Apache Sqoop's
architecture and working.
The import tool imports individual tables from an RDBMS to HDFS. Each row in a table is
treated as a record in HDFS.
When we submit a Sqoop command, the main task gets divided into subtasks, each handled
internally by an individual map task. A map task is a subtask that imports part of the data into
the Hadoop ecosystem. Collectively, all map tasks import the whole data.
Export also works in a similar manner.
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input
to Sqoop contain records, which are called rows in the table.
When we submit our job, it is mapped into map tasks, each of which brings a chunk of data from
HDFS. These chunks are exported to a structured data destination. Combining all these
exported chunks of data, we receive the whole data at the destination, which in most
cases is an RDBMS (MySQL/Oracle/SQL Server).
A reduce phase is required only in the case of aggregations. But Apache Sqoop just imports and
exports the data; it does not perform any aggregations. The map job launches multiple mappers
depending on the number defined by the user. For a Sqoop import, each mapper task is
assigned a part of the data to be imported. Sqoop distributes the input data equally among the
mappers to get high performance. Each mapper then creates a connection with the
database using JDBC, fetches the part of the data assigned by Sqoop, and writes it into HDFS,
Hive or HBase based on the arguments provided on the CLI.
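A minimal export invocation matching the description above might look like this (the connection details, table name and HDFS path are assumptions; the target table must already exist in the database):

```shell
# Export files from an HDFS directory back into a MySQL table using
# four parallel mappers. All names below are placeholders.
sqoop export \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees_copy \
  --export-dir /user/hadoop/employees \
  -m 4
```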
Now that we understand the architecture and working of Apache Sqoop, let's look at
some basic Sqoop commands.
Sqoop Commands
Sqoop – IMPORT Command
The import command is used to import a table from a relational database into HDFS. In our case,
we are going to import tables from a MySQL database into HDFS.
We have an employees table in the employees database, which we will be importing
into HDFS.
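A minimal import command for this scenario might look like the following (the hostname, port and credentials are assumptions; adjust them for your own setup):

```shell
# Import the employees table from a local MySQL server into HDFS.
# -P prompts for the password interactively rather than putting it on
# the command line.
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees
```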
VIKAS RAMPRAKASH CHAURASIYA TU3F2021091
ROLL NO:B22 BDA_EXP_03
After executing this command, map tasks are executed at the back end.
After the command completes, you can check the HDFS Web UI (localhost:50070), where
the imported data can be seen.
You can also import the table in a specific directory in HDFS using the below command:
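A sketch of such a command, using --target-dir to choose the HDFS directory (the directory name and connection details are assumptions):

```shell
# Import into a specific HDFS directory instead of the default
# /user/<username>/<table> path.
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees \
  --target-dir /queryresult
```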
Sqoop imports data in parallel from most database sources. You can specify the number of
map tasks (parallel processes) to use for the import with the -m or --num-mappers
argument. Each of these arguments takes an integer value which corresponds to the
degree of parallelism to employ.
You can control the number of mappers independently from the number of files present in the
directory. Export performance depends on the degree of parallelism. By default, Sqoop will
use four tasks in parallel for the export process. This may not be optimal; you will need to
experiment with your own particular setup. Additional tasks may offer better concurrency,
but if the database is already bottlenecked on updating indices, invoking triggers, and so on,
then additional load may decrease performance.
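For example, the degree of parallelism could be reduced to a single mapper like this (connection details are assumptions; -m 1 and --num-mappers 1 are equivalent):

```shell
# Run the import with exactly one mapper, producing a single output file.
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees \
  -m 1
```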
Here the number of mapper tasks is 1. The number of files created while importing
MySQL tables is equal to the number of mappers used.
You can import a subset of a table using the 'where' clause of the Sqoop import tool. Sqoop
executes the corresponding SQL query on the respective database server and stores the result in a
target directory in HDFS. You can use a command of the following form to import data with a
'where' clause:
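A sketch of a filtered import (the predicate, column name and connection details are assumptions):

```shell
# Import only the rows matching the --where predicate into the given
# HDFS directory. The condition is pushed down to the database server.
sqoop import \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --table employees \
  --where "emp_no > 49000" \
  --target-dir /latest_employees
```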
PART B
(PART B: TO BE COMPLETED BY STUDENTS)
(Students must submit the soft copy as per the following segments within two hours of the
practical. The soft copy must be uploaded on Blackboard, or emailed to the concerned
lab in-charge faculty at the end of the practical in case there is no Blackboard access
available.)
Sqoop currently supports Hadoop version 2.6.0 or later. To install the Sqoop server,
decompress the tarball (in a location of your choosing) and set the newly created folder
as your working directory.
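The installation steps above can be sketched as follows (the tarball name, version and install location are assumptions; use the tarball you actually downloaded):

```shell
# Unpack the Sqoop tarball into /opt and make it the working directory.
tar -xzf sqoop-1.99.7-bin-hadoop200.tar.gz -C /opt
cd /opt/sqoop-1.99.7-bin-hadoop200

# Optionally export SQOOP_HOME so the bin/ scripts are found from any shell.
export SQOOP_HOME=/opt/sqoop-1.99.7-bin-hadoop200
export PATH=$PATH:$SQOOP_HOME/bin
```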
B.4 Conclusion:
We have installed Sqoop and executed basic commands of the Hadoop ecosystem
component Sqoop, and are able to acquire fundamental enabling techniques and scalable
algorithms such as Hadoop, MapReduce and NoSQL in big data analytics.
Q1: What is the default file format to import data using Apache Sqoop?
Delimited Text File Format
Q2. How will you list all the columns of a table using Apache Sqoop?
Sqoop has no dedicated command to list the columns of a table. A common approach is to
use the sqoop eval tool to run a metadata query (such as a query against
information_schema.columns) directly on the database server.
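A sketch of such a metadata query via sqoop eval (connection details and table name are assumptions):

```shell
# Run a column-listing query on the MySQL server through Sqoop's eval tool
# and print the result to the console. All names below are placeholders.
sqoop eval \
  --connect jdbc:mysql://localhost:3306/employees \
  --username root -P \
  --query "SELECT column_name FROM information_schema.columns \
           WHERE table_name = 'employees'"
```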
Q3. Name a few import control commands. How can Sqoop handle large
objects?
import and import-all-tables are the import control commands.
In Sqoop, large objects are managed by importing them into a file known
as "LobFile" which is short for a Large Object File. These LobFiles have
the capability to store large sized data records.
Q4. How can we import data from particular row or column? What is the