Practical-1
Aim :- Make a single node cluster in Hadoop.
Solution:-
To install Hadoop, you should have Java version 1.8 installed on your system.
Check your Java version with the following command at the command prompt:
java -version
Extract the downloaded Hadoop archive to a folder and create a user variable named HADOOP_HOME whose value is the path of that folder. Likewise, create a new user variable with the name JAVA_HOME and, as its value, the path of the Java (JDK) installation directory.
Now we need to add the Hadoop bin directory and the Java bin directory to the system Path variable.
Click on New and add the bin directory paths of Hadoop and Java to it.
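For reference, a typical set of values looks like the following (the exact paths are assumptions; adjust them to wherever you extracted Hadoop and installed the JDK):

JAVA_HOME   = C:\Java\jdk1.8.0_202
HADOOP_HOME = C:\hadoop
Path        = ...;%JAVA_HOME%\bin;%HADOOP_HOME%\bin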
Configurations
Now we need to edit some configuration files located in the etc\hadoop folder of the Hadoop installation directory. The files that need to be edited are described below.
Edit the file core-site.xml in the etc\hadoop directory and copy the required XML property into the <configuration> section of the file.
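The exact property from the original screenshot is not reproduced here; on a single-node cluster it is typically the default file system URI, for example (the localhost host name and port 9000 below are assumed values, adjust as needed):

<property>
  <!-- Default file system used by HDFS clients; host and port are assumed values -->
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>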
Edit the file hdfs-site.xml and add below property in the configuration.
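Again, the property from the screenshot is not reproduced; a commonly used single-node configuration sets the replication factor to 1 and points the NameNode and DataNode at the folders created in the step below (the C:\hadoop\data\dfs paths are assumptions):

<property>
  <!-- Single-node cluster: keep only one replica of each block -->
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <!-- Assumed path of the namenode folder created below -->
  <name>dfs.namenode.name.dir</name>
  <value>C:\hadoop\data\dfs\namenode</value>
</property>
<property>
  <!-- Assumed path of the datanode folder created below -->
  <name>dfs.datanode.data.dir</name>
  <value>C:\hadoop\data\dfs\datanode</value>
</property>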
Edit the file yarn-site.xml and add below property in the configuration.
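As the original screenshot is not reproduced, the following are standard single-node values rather than the exact ones used:

<property>
  <!-- Enables the shuffle service that MapReduce jobs need -->
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>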
Create a folder named dfs inside the data folder, and inside dfs create two folders named 'datanode' and 'namenode'.
Hadoop needs Windows-specific binaries (such as winutils.exe) that do not come with the default Hadoop download; you can download them from GitHub.
Verify the installation by running:
hadoop version
Since this command does not throw an error and successfully shows the Hadoop version, Hadoop has been installed successfully on the system.
Format the NameNode before starting the cluster:
hdfs namenode -format
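After formatting, the HDFS and YARN daemons can be started from the sbin folder. A minimal sketch, assuming Hadoop is installed at C:\hadoop:

C:\hadoop\sbin>start-dfs.cmd
C:\hadoop\sbin>start-yarn.cmd

This starts the NameNode, DataNode, ResourceManager and NodeManager; the NameNode web UI (http://localhost:9870 on Hadoop 3.x, or http://localhost:50070 on Hadoop 2.x) then shows the cluster and NameNode information referred to below.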
Cluster
Namenode information:
Conclusion:
Thus, in this Practical we learnt about installing and setting up the Hadoop framework and
making a single node cluster in Hadoop.
PRACTICAL-2
AIM:- Run a word count program in Hadoop on a 250 MB dataset.
SOLUTION:
1. Create a text file with some content. We'll pass this file as input to
the wordcount MapReduce job for counting words.
C:\file1.txt:- This is the 250 MB data file that we will use for the word count problem.
Prerequisite: Hadoop must already be installed (see Practical-1).
2. Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt')
to be used for counting words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input
3. Copy the text file (say 'file1.txt') from the local disk to the newly created 'input'
directory in HDFS.
C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input
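4. Run the word count MapReduce job on the 'input' directory, writing the results to 'output'. The exact jar name depends on the installed Hadoop version; a sketch using the bundled examples jar (the 3.2.1 version number here is an assumption) is:

C:\hadoop>bin\yarn jar share\hadoop\mapreduce\hadoop-mapreduce-examples-3.2.1.jar wordcount input output

The counters printed at the end of the job include the following: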
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=145
CPU time spent (ms)=1418
Physical memory (bytes) snapshot=368246784
Virtual memory (bytes) snapshot=513716224
Total committed heap usage (bytes)=307757056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=55
File Output Format Counters
Bytes Written=59
5. Check output.
C:\hadoop>bin\hdfs dfs -cat output/*
Example 1
Hadoop 2
Install 1
Mapreduce 1
Run 1
Wordcount 1
Practical-3
Aim: Understand the logs generated by MapReduce program.
Solution:
Job Client
Job Client is used by the user to facilitate execution of the MapReduce job.
When a user writes a MapReduce job, they will typically invoke the job client in
their main class to configure and launch the job. In this example, we will be
using SleepJob to cause mapper tasks to sleep for an extended period of time.
When the job is submitted, the job client first copies the job resources into a
.staging directory in HDFS:
/user/gpadmin/.staging/job_1389385968629_0025/appTokens
/user/gpadmin/.staging/job_1389385968629_0025/job.jar
/user/gpadmin/.staging/job_1389385968629_0025/job.split
/user/gpadmin/.staging/job_1389385968629_0025/job.splitmetainfo
/user/gpadmin/.staging/job_1389385968629_0025/job.xml
/user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1.jhist
/user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1_conf.xml
After .staging is created, job client will submit the job to the resource manager
service (application manager port 8032). Then job client will continue to
monitor the execution of the job and report back to the console with the
progress of the map and reduce containers. That is why you see the "map 5%
reduce 0%" while the job is running. Once the job completes, job client will
return some statistics about the job that it collected during execution.
Remember that job client gets map and reduce container statuses from the
Application Master directly. We will talk a bit more about that later but for
now here is an example of running the sleep job, so it hangs for a really long
time while we observe the map containers execute.
[gpadmin@hdm1 ~]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar sleep -m 3 -r 1 -mt 6000000
Note: You can kill the MapReduce job using the following command:
[root@hdw3 yarn]# yarn application -kill application_1389385968629_0025
Output:
Application Master
Once the application manager service has decided to start running the job,
it chooses one of the NodeManagers to launch the MapReduce application
master class, which is called org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
The application master will be launched on one of the NodeManager
servers running in the environment. The NodeManager selected by the
resource manager is largely dependent on the available resources within the
cluster. The NodeManager service will generate shell scripts in the local
application cache, which are used to execute the application master
container. The container's local application cache then contains files such as:
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.split
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/appTokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.splitmetainfo
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.launch_container.sh.crc
The container executor class running in the NodeManager service will then use
launch_container.sh to execute the Application Master class. As shown below, all
stdout and stderr logs are redirected to ${yarn.nodemanager.log-dirs}, which is
defined in yarn-site.xml.
[gpadmin@hdw3 yarn]# tail -1 nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh
Once launched, the Application Master will issue resource allocation requests
for the map and reduce containers in the queue to the ResourceManager
service. When the resource manager determines that there are enough
resources on the cluster to grant the allocation request, it will inform the
Application Master, which then asks the chosen NodeManager to launch the
map or reduce container.
The container executor class in the NodeManager will do the same for a map or
reduce container as it did with the Application Master class. All files and shell
scripts will be added into the container's application cache within the nm-local-dir:
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.job.xml.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.launch_container.sh.crc
Note: job.jar is only a soft link that points to the actual job.jar in the
application's filecache directory. This is how YARN handles the distributed cache
for containers:
[root@hdw1 yarn]# ls -l nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/
total 96
Note: By setting the following parameter, the above container launch scripts and
user cache will remain on the system for a specified period of time; otherwise
these files are deleted after the application completes.
<property>
<name>yarn.nodemanager.delete.debug-delay-sec</name>
<value>10000000</value>
</property>
During run time you will see all the container logs in ${yarn.nodemanager.log-dirs}:
[root@hdw3 yarn]# find userlogs/ -print
userlogs/
userlogs/application_1389385968629_0025
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/syslog
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/syslog
Once the job has completed, the NodeManager will keep the logs for each
container for ${yarn.nodemanager.log.retain-seconds}, which is 10800 seconds
(3 hours) by default, and delete them once they have expired. But if
${yarn.log-aggregation-enable} is enabled, then the NodeManager will immediately
concatenate all of the container logs into one file, upload them into HDFS under
${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>,
and delete them from the local userlogs directory. Log aggregation is enabled
by default in PHD and it makes log collection convenient.
/yarn/apps/gpadmin/logs/application_1389385968629_0025/
Found 3 items
/yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw1.hadoop.local_30825
-rw-r----- 3 gpadmin hadoop 5378 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw2.hadoop.local_36429
/yarn/apps/gpadmin/logs/applica
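When log aggregation is enabled, the aggregated files above can also be read back conveniently with the yarn logs command, for example (using the application ID from this run):

[gpadmin@hdm1 ~]$ yarn logs -applicationId application_1389385968629_0025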
PRACTICAL-4
AIM: Run two different Datasets/Different size of Datasets on Hadoop and
Compare the Logs
Solution:
Step 1: Adding the input dataset files to HDFS
Step 2: Running the job on Dataset 1
Step 3: Running the job on Dataset 2
A sketch of the commands corresponding to these steps is given below.
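A minimal sketch of the three steps, assuming the word count example job from Practical-2 and two placeholder input files dataset1.txt and dataset2.txt (the jar version number is also an assumption):

$ hdfs dfs -mkdir -p input
$ hdfs dfs -put dataset1.txt dataset2.txt input
$ hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount input/dataset1.txt output1
$ hadoop jar hadoop-mapreduce-examples-3.2.1.jar wordcount input/dataset2.txt output2

The counters and logs produced by each run (see Practical-3) can then be compared for the two datasets, for example map input records, bytes read, and elapsed CPU and GC time.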
Practical-5
Aim: Develop a MapReduce application to sort a given file or perform aggregation on
some parameter.
Solution:
Steps to run a Map Reduce Application
1. Create the following files:
A. SalesDriver.java
package com.kamlesh;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class SalesDriver
{
public static void main(String[] args)
{
JobClient my_client = new JobClient();
JobConf job_conf = new JobConf(SalesDriver.class);
job_conf.setJobName("SalePerCountry");
job_conf.setOutputKeyClass(Text.class);
job_conf.setOutputValueClass(IntWritable.class);
job_conf.setMapperClass(SalesMapper.class);
job_conf.setReducerClass(SalesReducer.class);
job_conf.setInputFormat(TextInputFormat.class);
job_conf.setOutputFormat(TextOutputFormat.class);
FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));
my_client.setConf(job_conf);
try
{
JobClient.runJob(job_conf);
}
catch (Exception e)
{
e.printStackTrace();
} } }
B. SalesMapper.java
package com.kamlesh;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
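Only the imports of SalesMapper.java are reproduced above; the class body itself appeared as a screenshot. A minimal sketch of a body consistent with the driver configuration (Text keys, IntWritable values) is given here, assuming the country name is the 8th comma-separated field of each SalesJan2009.csv record:

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private static final IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        // Each input line is one CSV record; field index 7 (an assumption) holds the country name.
        String[] fields = value.toString().split(",");
        output.collect(new Text(fields[7]), one);
    }
}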
C. SalesReducer.java
package com.kamlesh;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
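As with the mapper, only the imports are reproduced above; a minimal sketch of a reducer body that sums the per-country counts emitted by the mapper would be:

public class SalesReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        int frequencyForCountry = 0;
        // Add up the 1s emitted by the mapper for this country key.
        while (values.hasNext())
        {
            frequencyForCountry += values.next().get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}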
Next, create the required jar file Hadoop_Aggregation.jar by exporting the project
from the workspace.
4. Start the Hadoop DFS daemons and the Hadoop MapReduce/YARN daemons.
Start the Hadoop daemons using the commands $ start-dfs.sh and $ start-yarn.sh.
After successfully starting the required daemons, we check the input files
using the command $ hdfs dfs -ls /user/kamlesh/input.
Since the required file is not present, we copy the required file, SalesJan2009.csv,
into HDFS using the command $ hdfs dfs -put /home/kamlesh/Desktop/SalesJan2009.csv /user/kamlesh/input.
Now run the aggregation MapReduce application. Here we use the command:
$ hadoop jar /home/kamlesh/Desktop/Hadoop_Aggregation.jar com.kamlesh.SalesDriver /user/kamlesh/input/SalesJan2009.csv /user/kamlesh/output/CountryAndProducts
7. Getting Results
First check the names of the resultant files using the command $ hdfs dfs -ls /user/kamlesh/output.
Then copy the resultant file to the local file system using the command
$ hdfs dfs -get /user/kamlesh/output/CountryAndProducts/part-0000 /home/kamlesh/Desktop/Results.
8. Output
9. Stop Daemons
Finally, after the output has been obtained, stop the Hadoop daemons (the HDFS
daemons and the YARN daemons) using the commands $ stop-dfs.sh and $ stop-yarn.sh.
Conclusion:
In this practical we performed sorting and aggregation on the dataset using a
MapReduce application.
Practical-6
Solution:
Website: Data Portal Of India
URL: https://data.gov.in/
Dataset 2: Weather
This dataset describes the rainfall that occurred during the hot-weather season,
by district, in Tamil Nadu during 2016-17.
URL: https://tn.data.gov.in/resources/rainfall-occurred-during-hot-weather-season-districts-tamil-nadu-2016-17#web_catalog_tabs_block_10