Lab Report
of
BIG DATA ANALYTICS
Submitted by:
Md. Khalid
192211009
M.Tech CSE (Analytics), 1st Semester
INDEX
1. Hadoop Installation
3. HDFS Commands
4. MongoDB Installation and Commands
5. Hive Installation
6. Pig Installation
7. MapReduce
Experiment 1
Lab-1
Hadoop Installation
Introduction
Hadoop is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving
massive amounts of data and computation.
Components of Hadoop:
1. Hadoop Common: The common utilities that support the other Hadoop modules.
2. Hadoop Distributed File System (HDFS): A distributed file system that provides high throughput access to application data.
3. Hadoop YARN: A framework for job scheduling and cluster resource management.
4. Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
Installation:
Step 1: Download the Java 8 Package. Save this file in your home directory.
Step 5: Add the Hadoop and Java paths in the bash file (.bashrc).
Open the .bashrc file and add the Hadoop and Java paths.
Command: gedit .bashrc
export HADOOP_HOME="$HOME/hadoop-2.7.7"
export HADOOP_CONF_DIR="$HOME/hadoop-2.7.7/etc/hadoop"
export HADOOP_MAPRED_HOME="$HOME/hadoop-2.7.7"
export HADOOP_COMMON_HOME="$HOME/hadoop-2.7.7"
export HADOOP_HDFS_HOME="$HOME/hadoop-2.7.7"
export YARN_HOME="$HOME/hadoop-2.7.7"
export PATH="$PATH:$HOME/hadoop-2.7.7/bin"
export JAVA_HOME="/usr/lib/jvm/java-8-oracle"
export PATH="$PATH:JAVA_HOME/bin"
core-site.xml informs the Hadoop daemons where the NameNode runs in the cluster. It contains the configuration settings of Hadoop core, such as the I/O settings that are common to HDFS and MapReduce.
Command: gedit core-site.xml
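The file contents are not reproduced above; a minimal pseudo-distributed example, assuming the NameNode listens on localhost at port 9000 (host and port here are assumptions), would be:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- tells every daemon and client where the HDFS NameNode runs -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>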
Step 8: Edit hdfs-site.xml and edit the property mentioned below inside configuration tag:
hdfs-site.xml contains configuration settings of HDFS daemons (i.e. NameNode, DataNode, Secondary NameNode). It also includes the
replication factor and block size of HDFS.
Command: gedit hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Step 9: Edit the mapred-site.xml file and edit the property mentioned below inside configuration tag:
mapred-site.xml contains the configuration settings of the MapReduce application, such as the number of JVMs that can run in parallel, the size of the mapper and reducer processes, the CPU cores available for a process, etc.
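The property itself is not reproduced above; a typical minimal mapred-site.xml (often created by copying mapred-site.xml.template), which simply tells MapReduce jobs to run on YARN, is:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>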
Step 10: Edit yarn-site.xml and edit the property mentioned below inside configuration tag:
yarn-site.xml contains the configuration settings of the ResourceManager and NodeManager, such as the memory available to applications and the auxiliary services (for example, the shuffle service) required by MapReduce.
Command: gedit yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Step 11: Edit hadoop-env.sh and add the Java Path as mentioned below:
hadoop-env.sh contains the environment variables that are used in the script to run Hadoop like Java home path, etc.
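The exact line is not shown above; assuming the same Oracle Java 8 location used in .bashrc, the entry added to hadoop-env.sh would be:

export JAVA_HOME=/usr/lib/jvm/java-8-oracle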
Step 13: Once the NameNode is formatted, go to the hadoop-2.7.7/sbin directory and start all the daemons.
Command: cd hadoop-2.7.7/sbin
You can either start all the daemons with a single command or start them individually.
For starting dfs daemon:
Command: ./start-dfs.sh
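To bring up the YARN daemons as well and confirm that everything is running, the following commands (a sketch, run from the same sbin directory) can be used:

./start-yarn.sh   # starts the ResourceManager and NodeManager daemons
jps               # lists running Java processes; NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager should appear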
Step 14: Now open the Mozilla browser and go to localhost:50070/dfshealth.html to check the NameNode interface.
Experiment 2
Lab 3
HDFS Commands
Introduction:
HDFS is the primary or major component of the Hadoop ecosystem which is responsible for storing large data sets of structured or
unstructured data across various nodes and thereby maintaining the metadata in the form of log files.
The HDFS commands are as follows:
1). ls: Lists all the files and directories at the given path. Use -ls -R (recursive) when you want the full hierarchy of a folder.
Syntax:
bin/hdfs dfs -ls <path>
Execution:
bin/hdfs dfs -ls /user
2). mkdir: Creates a directory. In HDFS there is no home directory by default, so let's first create one.
Syntax:
bin/hdfs dfs -mkdir <folder name>
Execution:
bin/hdfs dfs -mkdir /user/khalid
3). touchz: Creates an empty file at the given HDFS path.
Syntax:
bin/hdfs dfs -touchz <file path>
Execution:
bin/hdfs dfs -touchz /user/khalid/tst.txt
4). copyFromLocal: Copies files/folders from the local file system to HDFS. This is one of the most frequently used commands. Local file system means the files present on the OS.
Syntax:
bin/hdfs dfs -copyFromLocal <local file path> <dest(present on hdfs)>
Execution:
bin/hdfs dfs -copyFromLocal /home/nitdelhipc22/Downloads/hadoopfile.doc /user/khalid/
5). copyToLocal: Copies files/folders from HDFS to the local file system.
Syntax:
bin/hdfs dfs -copyToLocal <src(on hdfs)> <local dest>
Execution:
bin/hdfs dfs -copyToLocal /user/khalid/tst.txt /home/nitdpc22
6). cp: Copies a file from one HDFS directory to another. Multiple files can also be copied with this command.
Syntax:
bin/hdfs dfs -cp <srcfile(on hdfs)> <dest(hdfs)>
Execution:
bin/hdfs dfs -cp /user/khalid/tst.txt /user/demo
7). moveFromLocal: Moves a file from the local file system to HDFS (the local copy is removed).
Syntax:
bin/hdfs dfs -moveFromLocal <local src> <dest(on hdfs)>
Execution:
bin/hdfs dfs -moveFromLocal /home/nitdpc22/csa.txt /user/khalid
8). moveToLocal: Moves a file from HDFS to the local file system.
Syntax:
bin/hdfs dfs -moveToLocal <src(on hdfs)> <local dest>
Execution:
bin/hdfs dfs -moveToLocal /user/khalid/tst.txt /home/nitdpc22
9). mv: Moves a file from one HDFS directory to another. It also allows moving multiple files.
Syntax:
bin/hdfs dfs -mv <src(on hdfs)> <dest(on hdfs)>
Execution:
bin/hdfs dfs -mv /user/khalid/tst.txt /user/demo
10). du: Shows the disk usage, in bytes, of the files and directories under the given path.
Syntax:
bin/hdfs dfs -du <path(on hdfs)>
Execution:
bin/hdfs dfs -du /user/khalid
11). text: Takes a source file and outputs the file in text format.
Syntax:
bin/hdfs dfs -text <src(on hdfs)>
Execution:
bin/hdfs dfs -text /user/khalid/tst.txt
12). count: Counts the number of directories, files, and bytes under the paths that match the specified file pattern.
Syntax:
bin/hdfs dfs -count <src(on hdfs)>
Execution:
bin/hdfs dfs -count /user
Experiment 3
Lab-4
MongoDB Installation and Commands
Introduction:
MongoDB is a cross-platform, document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.
MongoDB Installation:
STEP 1: Importing the public key.
Commands:
sudo apt-key adv --keyserver
hkp://keyserver.ubuntu.com:80 --recv 0C49F3730359A14518585931BC711F9BA15703C6
STEP 2: Create the MongoDB source list file.
Commands:
echo "deb http://repo.mongodb.org/apt/ubuntu xenial/mongodb-org/3.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.4.list
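The remaining installation steps are not reproduced here; a typical continuation for the 3.4 repository added above is sketched below:

sudo apt-get update                  # refresh the package index with the new repository
sudo apt-get install -y mongodb-org  # install the MongoDB server, shell and tools
sudo service mongod start            # start the MongoDB daemon
mongo                                # open the mongo shell to run the queries below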
MongoDB Commands:
The find() method: To query and display all documents from a collection.
Syntax:
db.COLLECTION_NAME.find()
Execution:
user.mycollection.find()
The pretty() method: To display the results in a formatted way.
Syntax:
db.COLLECTION_NAME.find().pretty()
Execution:
user.mycollection.find().pretty()
1). Equality: To match documents where the key equals the given value.
Syntax:
{<key>:<value>}
Execution:
user.mycollection.find({"id":190}).pretty()
2). Less Than: To check the less than condition.
Syntax:
{<key>:{$lt:<value>}}
Execution:
user.mycollection.find({"id":{$lt:150}}).pretty()
3). Less Than Equals: To check the less than and equals condition.
Syntax:
{<key>:{$lte:<value>}}
Execution:
user.mycollection.find({"id":{$lte:150}}).pretty()
4). Greater Than: To check the greater than condition.
Syntax:
{<key>:{$gt:<value>}}
Execution:
user.mycollection.find({"id":{$gt:150}}).pretty()
5). Greater Than Equals: To check the greater than and equals condition.
Syntax:
{<key>:{$gte:<value>}}
Execution:
user.mycollection.find({"id":{$gte:150}}).pretty()
6). Not Equals: To check the not equals condition.
Syntax:
{<key>:{$ne:<value>}}
Execution:
user.mycollection.find({"id":{$ne:150}}).pretty()
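The queries above assume a collection named mycollection that already contains documents with an id field. A minimal sketch for creating such data from the mongo shell, where the variable user is just a handle to a (hypothetical) database named user obtained with db.getSiblingDB, might be:

var user = db.getSiblingDB("user")                        // hypothetical database name
user.mycollection.insert({ "id": 190, "name": "khalid" }) // illustrative documents
user.mycollection.insert({ "id": 120, "name": "demo" })
user.mycollection.find({ "id": { $gt: 150 } }).pretty()   // matches only the document with id 190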
Experiment 4
Lab 5
HIVE Installation
Introduction:
Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives a
SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop.
Installation:
STEP 1: Download Hive tar.
Command: wget http://archive.apache.org/dist/hive/hive-2.1.0/apache-hive-2.1.0-bin.tar.gz
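The remaining setup steps are not shown above; a typical continuation is to extract the archive and point HIVE_HOME at it in .bashrc (the paths below assume the tar was downloaded to the home directory):

tar -xzf apache-hive-2.1.0-bin.tar.gz
export HIVE_HOME="$HOME/apache-hive-2.1.0-bin"
export PATH="$PATH:$HIVE_HOME/bin"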
Hive Commands:
1).Create database:
Syntax:
CREATE DATABASE [IF NOT EXISTS] <database_name>
Execution:
CREATE DATABASE IF NOT EXISTS khalid
2).Drop:
Syntax:
DROP DATABASE [IF EXISTS] database_name
Execution:
DROP DATABASE IF EXISTS khalid
3).Create Table:
Syntax:
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.] table_name
Execution:
CREATE TABLE IF NOT EXISTS employee ( eid int, name String,
salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
4).Load Data:
Syntax:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
Execution:
LOAD DATA LOCAL INPATH 'home/user/khalid.txt' OVERWRITE INTO TABLE employee
5).Select Data:
Syntax:
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
Execution:
SELECT * FROM employee
Execution:
SELECT * FROM employee WHERE salary > 5000
Execution:
SELECT * FROM employee ORDER BY salary
Experiment 5
Lab 6
PIG INSTALLATION
Introduction:
Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin.
Installation:
STEP 1: Download the Apache Pig tar file.
http://mirrors.estointernet.in/apache/pig/pig-0.16.0/
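As with Hive, the extraction and environment setup are not reproduced; a sketch, assuming the downloaded file pig-0.16.0.tar.gz sits in the home directory, is:

tar -xzf pig-0.16.0.tar.gz
export PIG_HOME="$HOME/pig-0.16.0"
export PATH="$PATH:$PIG_HOME/bin"
pig -x local    # start the Grunt shell in local mode to try the commands below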
Pig Commands:
1).Load Data:
Syntax:
relation_name = LOAD 'info' [USING FUNCTION] [AS SCHEMA];
Execution:
A = LOAD 'user/khalid/pload.txt' USING PigStorage(',') AS (a1:int,a2:int,a3:int,a4:int);
2).Distinct Operator:
Execution:
A = LOAD 'user/khalid/pload.txt' USING PigStorage(',') AS (a1:int,a2:int,a3:int,a4:int);
Result = DISTINCT A;
3).ForEach operator:
Execution:
A = LOAD 'user/khalid/pload.txt' USING PigStorage(',') AS (a1:int,a2:int,a3:int,a4:int);
fe = FOREACH A GENERATE a1, a2;
4).Group operator:
Execution:
A = LOAD 'user/khalid/pload.txt' USING PigStorage(',') AS (a1:int,a2:int,a3:int,a4:int);
GroupByA1 = GROUP A BY a1;
5).Order by operator:
Execution:
A = LOAD 'user/khalid/pload.txt' USING PigStorage(',') AS (a1:int,a2:int,a3:int,a4:int);
Result = ORDER A BY a1 DESC;
6).Show data:
Command:
Dump Result;
Experiment 6
Lab 7,8
MapReduce
Introduction:
MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel,
distributed algorithm on a cluster. A MapReduce program is composed of a map procedure, which performs filtering and sorting, and a
reduce method, which performs a summary operation.
Algorithm:
Generally the MapReduce paradigm is based on sending the computation to where the data resides.
MapReduce program executes in three stages, namely map stage, shuffle stage, and reduce stage:
Map stage − The map or mapper’s job is to process the input data. Generally the input data is in the form of file or directory and is
stored in the Hadoop file system (HDFS). The input file is passed to the mapper function line by line. The mapper processes the data
and creates several small chunks of data.
Reduce stage − This stage is the combination of the Shuffle stage and the Reduce stage. The Reducer’s job is to process the data that
comes from the mapper. After processing, it produces a new set of output, which will be stored in the HDFS.
During a MapReduce job, Hadoop sends the Map and Reduce tasks to the appropriate servers in the cluster.
The framework manages all the details of data-passing such as issuing tasks, verifying task completion, and copying data around the
cluster between the nodes.
Most of the computing takes place on nodes with data on local disks that reduces the network traffic.
After completion of the given tasks, the cluster collects and reduces the data to form an appropriate result, and sends it back to the
Hadoop server.
Record Reader
This is the first phase of MapReduce where the Record Reader reads every line from the input text file as text and yields output as key-
value pairs.
Output − Forms the key-value pairs. The following is the set of expected key-value pairs.
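The sample input and its key-value pairs are not reproduced above. Judging from the word counts listed at the end of this section, the input text and the Record Reader output (byte offset of each line as key, the line itself as value) would be roughly:

Input text:
What do you mean by Object
What do you know about Java
What is Java Virtual Machine
How Java enabled High Performance

Record Reader output:
<0, What do you mean by Object>
<27, What do you know about Java>
<55, What is Java Virtual Machine>
<84, How Java enabled High Performance>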
Map Phase
The Map phase takes input from the Record Reader, processes it, and produces the output as another set of key-value pairs.
Input − The following key-value pair is the input taken from the Record Reader.
The Map phase reads each key-value pair, divides each word from the value using StringTokenizer, treats each word as key and the
count of that word as value. The following code snippet shows the Mapper class and the map function.
public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);   // count emitted with every word
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException
    {
        // split the input line into words and emit (word, 1) for each token
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens())
        {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
Combiner Phase
The Combiner phase takes each key-value pair from the Map phase, processes it, and produces the output as key-value collection
pairs.
Input − The following key-value pair is the input taken from the Map phase.
The Combiner phase reads each key-value pair, combines the common words as key and values as collection. Usually, the code and
operation for a Combiner is similar to that of a Reducer. Following is the code snippet for Mapper, Combiner and Reducer class
declaration.
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
Reducer Phase
The Reducer phase takes each key-value collection pair from the Combiner phase, processes it, and passes the output as key-value pairs. Note that the Combiner functionality is the same as that of the Reducer.
Input − The following key-value pair is the input taken from the Combiner phase.
The Reducer phase reads each key-value pair. Following is the code snippet for the Reducer (the same class also serves as the Combiner).
public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
{
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException
    {
        // sum the counts of every occurrence of the key and emit (word, total)
        int sum = 0;
        for (IntWritable val : values)
        {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
Record Writer
This is the last phase of MapReduce where the Record Writer writes every key-value pair from the Reducer phase and sends the output
as text.
Input − Each key-value pair from the Reducer phase along with the Output format.
Output − It gives you the key-value pairs in text format. Following is the expected output.
What 3
do 2
you 2
mean 1
by 1
Object 1
know 1
about 1
Java 3
is 1
Virtual 1
Machine 1
How 1
enabled 1
High 1
Performance 1
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: tokenizes each input line and emits (word, 1)
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as Combiner): sums the counts for each word
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
Save the above program as WordCount.java. The compilation and execution of the program are given below.
Follow the steps given below to compile and execute the above program.
Step 1 − Use the following command to create a directory to store the compiled java classes.
$ mkdir units
Step 2 − Download Hadoop-core-1.2.1.jar, which is used to compile and execute the MapReduce program. You can download the jar
from mvnrepository.com.
Step 3 − Use the following commands to compile the WordCount.java program and to create a jar for the program.
Step 5 − Use the following command to copy the input file named input.txt in the input directory of HDFS.
Step 6 − Use the following command to verify the files in the input directory.
Step 7 − Use the following command to run the Word count application by taking input files from the input directory.
Wait for a while till the job finishes. After execution, the output shows the number of input splits, Map tasks, and Reducer tasks.
Step 8 − Use the following command to verify the resultant files in the output folder.
Step 9 − Use the following command to see the output in Part-00000 file. This file is generated by HDFS.
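The individual commands for Steps 3 to 9 are not reproduced above; a plausible end-to-end sequence, assuming a class directory named units, a jar named units.jar, and HDFS directories /input_dir and /output_dir (all of these names are illustrative), is:

javac -classpath "$(hadoop classpath)" -d units WordCount.java   # Step 3: compile against the installed Hadoop's jars
jar -cvf units.jar -C units/ .                                   # Step 3: package the compiled classes
hadoop fs -mkdir /input_dir                                      # create the HDFS input directory
hadoop fs -put input.txt /input_dir                              # Step 5: copy the input file to HDFS
hadoop fs -ls /input_dir                                         # Step 6: verify the input files
hadoop jar units.jar WordCount /input_dir /output_dir            # Step 7: run the word count job
hadoop fs -ls /output_dir                                        # Step 8: verify the output files
hadoop fs -cat /output_dir/part-*                                # Step 9: view the result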
What 3
do 2
you 2
mean 1
by 1
Object 1
know 1
about 1
Java 3
is 1
Virtual 1
Machine 1
How 1
enabled 1
High 1
Performance 1