
Big data analysis practicals kaushal chavda(180280107021)

Practical-1
Aim :- Make a single node cluster in Hadoop.

Solution:-
To install Hadoop, you should have Java version 1.8 on your system.
Check your Java version with this command at the command prompt:

java -version

After installing Java version 1.8, download Hadoop version 3.2.1 and extract it to a folder.

L.D COLLEGE OF ENGINEERING(COMPUTER DEPARTMENT) 1



Setup System Environment Variables

Create a new user variable. Set the variable name to HADOOP_HOME and the
variable value to the path of the bin folder where you extracted Hadoop.


Likewise, create a new user variable with variable name as JAVA_HOME and variable
value as the path of the bin folder in the Java directory.

Now we need to add the Hadoop bin directory path and the Java bin directory
path to the system Path variable.

Edit Path in system variable


Click on New and add the bin directory path of Hadoop and Java in it.


Configurations

Now we need to edit some configuration files located in the etc/hadoop directory of
the folder where we installed Hadoop: core-site.xml, mapred-site.xml, hdfs-site.xml and yarn-site.xml.


Edit the file core-site.xml in the hadoop directory and copy this XML property
into the configuration block of the file.
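The screenshot of the property is not reproduced here. For reference, a typical single-node core-site.xml for this setup looks like the following (the hdfs://localhost:9000 address is the conventional choice for a single-node cluster, not taken from the screenshot):

```xml
<configuration>
  <!-- Default filesystem URI; all HDFS commands resolve against this -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```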


Edit mapred-site.xml and copy this property in the configuration.
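As with core-site.xml, the screenshot is not reproduced; the property conventionally used at this step tells MapReduce to run on YARN:

```xml
<configuration>
  <!-- Run MapReduce jobs on the YARN resource manager -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```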


Edit the file hdfs-site.xml and add the property below to its configuration.

Edit the file yarn-site.xml and add the property below to its configuration.
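The screenshots are not reproduced; typical single-node values for these two files are sketched below. The install path C:\hadoop-3.2.1 and the data folder locations are assumptions matching the folders created in the next step:

```xml
<!-- hdfs-site.xml: one replica on a single node, plus the
     namenode/datanode folders (paths are assumptions) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>C:\hadoop-3.2.1\data\dfs\namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>C:\hadoop-3.2.1\data\dfs\datanode</value>
  </property>
</configuration>

<!-- yarn-site.xml: enable the shuffle service that MapReduce needs -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```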


Create a folder 'data' in the hadoop directory.

Inside data create a folder named 'dfs', and inside dfs create the folders
'datanode' and 'namenode'.
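The folder layout above can be created in one step. This sketch uses the Unix-style mkdir -p (on Windows cmd, md creates intermediate folders the same way), run from inside the Hadoop directory:

```shell
# Creates data/dfs/namenode and data/dfs/datanode in one go.
# These are the folders later referenced from hdfs-site.xml.
mkdir -p data/dfs/namenode data/dfs/datanode
ls -R data
```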


Hadoop needs Windows-specific binary files (such as winutils.exe) that do not come
with the default download of Hadoop; you can download them from GitHub.

Check whether Hadoop is successfully installed by running this command in cmd:


hadoop version

Since the command runs without an error and shows the Hadoop version, Hadoop
is successfully installed on the system.

Format the NameNode

Formatting the NameNode is done once, when Hadoop is first installed, and not
each time the Hadoop filesystem is run; formatting an existing filesystem deletes
all the data inside HDFS. Run this command:

hdfs namenode -format

The output will look something like this:


Access Hadoop Services in Browser

Start the HDFS and YARN daemons (on Windows, start-dfs.cmd and start-yarn.cmd in
the sbin folder). The Hadoop NameNode web UI then starts on port 9870.

Cluster


Namenode information:


Data node information:


Conclusion:

Thus, in this Practical we learnt about installing and setting up the Hadoop framework and
making a single node cluster in Hadoop.

PRACTICAL-2
AIM:- Run the word count program in Hadoop with a 250 MB data set.

SOLUTION:

We'll run the wordcount MapReduce job available in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar.

1. Create a text file with some content. We'll pass this file as input to
the wordcount MapReduce job for counting words.

C:\file1.txt is the 250 MB data file which we will use for the word count
problem. (The sample run shown below uses a small file with the same kind of
content, which is why the listing reports only 55 bytes.)

For example, assume that the content of the file is:


Install Hadoop

Run Hadoop Wordcount Mapreduce Example

1. Create a directory (say 'input') in HDFS to keep all the text files (say 'file1.txt')
to be used for counting words.
C:\Users\abhijitg>cd c:\hadoop
C:\hadoop>bin\hdfs dfs -mkdir input

2. Copy the text file(say 'file1.txt') from local disk to the newly created 'input'
directory in HDFS.
C:\hadoop>bin\hdfs dfs -copyFromLocal c:/file1.txt input

3. Check content of the copied file.

C:\hadoop>hdfs dfs -ls input

Found 1 items
-rw-r--r--   1 ABHIJITG supergroup   55 2014-02-03 13:19 input/file1.txt

C:\hadoop>bin\hdfs dfs -cat input/file1.txt


Install Hadoop
Run Hadoop Wordcount Mapreduce Example

4. Run the wordcount MapReduce job provided in
%HADOOP_HOME%\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.2.0.jar.
C:\hadoop>bin\yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount input output
14/02/03 13:22:02 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:03 INFO input.FileInputFormat: Total input paths to process : 1
14/02/03 13:22:03 INFO mapreduce.JobSubmitter: number of splits:1
:
:
14/02/03 13:22:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1391412385921_0002


14/02/03 13:22:04 INFO impl.YarnClientImpl: Submitted application application_1391412385921_0002 to ResourceManager at /0.0.0.0:8032
14/02/03 13:22:04 INFO mapreduce.Job: The url to track the job: http://ABHIJITG:8088/proxy/application_1391412385921_0002/
14/02/03 13:22:04 INFO mapreduce.Job: Running job: job_1391412385921_0002
14/02/03 13:22:14 INFO mapreduce.Job: Job job_1391412385921_0002 running in uber mode : false
14/02/03 13:22:14 INFO mapreduce.Job:  map 0% reduce 0%
14/02/03 13:22:22 INFO mapreduce.Job:  map 100% reduce 0%
14/02/03 13:22:30 INFO mapreduce.Job:  map 100% reduce 100%
14/02/03 13:22:30 INFO mapreduce.Job: Job job_1391412385921_0002 completed successfully
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
14/02/03 13:22:31 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=89
FILE: Number of bytes written=160142
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=171
HDFS: Number of bytes written=59
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=5657
Total time spent by all reduces in occupied slots (ms)=6128
Map-Reduce Framework
Map input records=2
Map output records=7
Map output bytes=82
Map output materialized bytes=89
Input split bytes=116
Combine input records=7
Combine output records=6
Reduce input groups=6
Reduce shuffle bytes=89
Reduce input records=6
Reduce output records=6
Spilled Records=12
Shuffled Maps =1

Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=145
CPU time spent (ms)=1418
Physical memory (bytes) snapshot=368246784
Virtual memory (bytes) snapshot=513716224
Total committed heap usage (bytes)=307757056
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=55
File Output Format Counters
Bytes Written=59

5. Check output.
C:\hadoop>bin\hdfs dfs -cat output/*
Example 1
Hadoop 2
Install 1
Mapreduce 1
Run 1
Wordcount 1
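The counting that the example jar performs can be sketched in plain Java, with no Hadoop involved. This is an illustration of the word count logic, not the jar's actual source:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of word count: split on whitespace, sum per word.
// A TreeMap keeps the words sorted, matching the order of the job output above.
public class WordCountSketch {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same content as file1.txt above
        String file1 = "Install Hadoop\nRun Hadoop Wordcount Mapreduce Example";
        // Prints {Example=1, Hadoop=2, Install=1, Mapreduce=1, Run=1, Wordcount=1}
        System.out.println(count(file1));
    }
}
```

In the real job, the map tasks emit (word, 1) pairs, the combiner pre-sums them per split, and the reduce task produces the final per-word totals shown in step 5.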


Practical-3
Aim: Understand the logs generated by a MapReduce program.

Solution :
Job Client

The job client is used by the user to facilitate execution of the MapReduce job.
When a user writes a MapReduce job, they will typically invoke the job client in
their main class to configure and launch the job. In this example, we will use
SleepJob to make the mapper tasks sleep for an extended period of time, so we
can see how the NodeManager executes the jobs.


Job job = createJob(numMapper, numReducer, mapSleepTime,
    mapSleepCount, reduceSleepTime, reduceSleepCount);
return job.waitForCompletion(true) ? 0 : 1;

The call to job.waitForCompletion will first create the /user/gpadmin/.staging
directory if it does not exist, and create job.xml and job.<timestamp>.conf.xml
containing all the Hadoop parameters used to execute the job.

It also uploads the "hadoop-mapreduce-client-jobclient.jar" jar file used in the
hadoop jar command into this directory, renaming it to job.jar. "job.jar" will
then be used by all the containers to execute the MapReduce job.
Note: the .staging directory is created under the path /user/${username}. In
this article, gpadmin is the user.
[gpadmin@hdw1 ~]$ hdfs dfs -ls /user/gpadmin/.staging/job_1389385968629_0025
Found 7 items
-rw-r--r--   3 gpadmin hadoop       7 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/appTokens
-rw-r--r--  10 gpadmin hadoop 1383034 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.jar
-rw-r--r--  10 gpadmin hadoop     151 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.split
-rw-r--r--   3 gpadmin hadoop      19 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.splitmetainfo
-rw-r--r--   3 gpadmin hadoop   64874 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job.xml
-rw-r--r--   3 gpadmin hadoop       0 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1.jhist
-rw-r--r--   3 gpadmin hadoop   75335 2014-02-01 15:28 /user/gpadmin/.staging/job_1389385968629_0025/job_1389385968629_0025_1_conf.xml

After .staging is created, job client will submit the job to the resource manager
service (application manager port 8032). Then job client will continue to
monitor the execution of the job and report back to the console with the
progress of the map and reduce containers. That is why you see the "map 5%
reduce 0%" while the job is running. Once the job completes, job client will
return some statistics about the job that it collected during execution.
Remember that job client gets map and reduce container statuses from the
Application Master directly. We will talk a bit more about that later but for
now here is an example of running the sleep job, so it hangs for a really long
time while we observe the map containers execute.
[gpadmin@hdm1 ~]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-client-jobclient.jar sleep -m 3 -r 1 -mt 6000000

14/02/01 15:27:59 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.
14/02/01 15:27:59 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.
14/02/01 15:28:00 INFO mapreduce.JobSubmitter: number of splits:3
14/02/01 15:28:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1389385968629_0025
14/02/01 15:28:01 INFO client.YarnClientImpl: Submitted application application_1389385968629_0025 to ResourceManager at hdm1.hadoop.local/192.168.2.101:8032
14/02/01 15:28:01 INFO mapreduce.Job: The url to track the job: http://hdm1.hadoop.local:8088/proxy/application_1389385968629_0025/
14/02/01 15:28:01 INFO mapreduce.Job: Running job: job_1389385968629_0025
14/02/01 15:28:12 INFO mapreduce.Job: Job job_1389385968629_0025 running in uber mode : false
14/02/01 15:28:12 INFO mapreduce.Job:  map 0% reduce 0%
14/02/01 15:29:06 INFO mapreduce.Job:  map 1% reduce 0%
14/02/01 15:30:37 INFO mapreduce.Job:  map 2% reduce 0%
14/02/01 15:32:08 INFO mapreduce.Job:  map 3% reduce 0%
14/02/01 15:33:38 INFO mapreduce.Job:  map 4% reduce 0%

Note: You can kill the MapReduce job using the following command:

[root@hdw3 yarn]# yarn application -kill application_1389385968629_0025

Output:

14/02/01 16:53:30 INFO client.YarnClientImpl: Killing application application_1389385968629_0025
14/02/01 16:53:30 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is stopped.

Application Master

Once the application manager service has decided to start running the job, it
chooses one of the NodeManagers to launch the MapReduce application master
class, which is called org.apache.hadoop.mapreduce.v2.app.MRAppMaster.
The application master will be launched on one of the NodeManager servers
running in the environment. The NodeManager selected by the resource manager
depends largely on the available resources within the cluster. The NodeManager
service will generate shell scripts in the local application cache, which are
used to execute the application master container.

Here we see the application master's container directory located under the
NodeManager's ${yarn.nodemanager.local-dirs} defined in yarn-site.xml:

nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.split
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/appTokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/jobSubmitDir/job.splitmetainfo
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/.launch_container.sh.crc

The container executor class running in the NodeManager service will then use
launch_container.sh to execute the application master class. As shown below,
all logs for stdout and stderr are redirected to ${yarn.nodemanager.log-dirs}
defined in yarn-site.xml:

[gpadmin@hdw3 yarn]# tail -1 nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000001/launch_container.sh

exec /bin/bash -c "$JAVA_HOME/bin/java -Dlog4j.configuration=container-log4j.properties -Dyarn.app.mapreduce.container.log.dir=/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001 -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout 2>/data/dn/yarn/userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr"

Once launched, the application master will issue resource allocation requests
for the map and reduce containers to the ResourceManager service. When the
resource manager determines that there are enough resources on the cluster to
grant the allocation request, it will inform the application master which
NodeManager service is available to execute the container. The application
master will then send a request to that NodeManager to launch the container.
Map or Reduce Container

The container executor class in the NodeManager will do the same for a map or
reduce container as it did with the application master class. All files and
shell scripts will be added into the container's application cache within the
nm-local-dir:

nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.xml
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.default_container_executor.sh.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/launch_container.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/tmp
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.job.xml.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.container_tokens.crc
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/container_tokens
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/job.jar
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/default_container_executor.sh
nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/.launch_container.sh.crc

Note: job.jar is only a soft link that points to the actual job.jar in the
application's filecache directory. This is how YARN handles the distributed
cache for containers:

[root@hdw1 yarn]# ls -l nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/container_1389385968629_0025_01_000003/
total 96
-rw-r--r-- 1 yarn yarn   108 Feb  1 15:25 container_tokens
-rwx------ 1 yarn yarn   450 Feb  1 15:25 default_container_executor.sh
lrwxrwxrwx 1 yarn yarn   122 Feb  1 15:25 job.jar -> /data/dn/yarn/nm-local-dir/usercache/gpadmin/appcache/application_1389385968629_0025/filecache/4395983903529068766/job.jar
-rw-r----- 1 yarn yarn 76430 Feb  1 15:25 job.xml
-rwx------ 1 yarn yarn  2898 Feb  1 15:25 launch_container.sh
drwx--x--- 2 yarn yarn  4096 Feb  1 15:25 tmp

Note: By setting the parameter below, the container launch scripts and user
cache shown above will remain on the system for a specified period of time;
otherwise these files are deleted after the application completes.

<property>
  <name>yarn.nodemanager.delete.debug-delay-sec</name>
  <value>10000000</value>
</property>

Resulting log file locations

During run time you will see all the container logs in ${yarn.nodemanager.log-dirs}:

[root@hdw3 yarn]# find userlogs/ -print
userlogs/
userlogs/application_1389385968629_0025
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000001/syslog
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stdout
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/stderr
userlogs/application_1389385968629_0025/container_1389385968629_0025_01_000002/syslog

Once the job has completed, the NodeManager will keep the log for each
container for ${yarn.nodemanager.log.retain-seconds}, which is 10800 seconds
(3 hours) by default, and delete them once they have expired. But if
${yarn.log-aggregation-enable} is enabled, then the NodeManager will
immediately concatenate all of the container logs into one file, upload them
into HDFS at ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>,
and delete them from the local userlogs directory. Log aggregation is enabled
by default in PHD and makes log collection convenient.
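The behaviour described above maps onto these yarn-site.xml properties. The values shown are the defaults implied by the text; /yarn/apps is the remote directory consistent with the HDFS paths in the listing that follows:

```xml
<!-- Local logs are kept 3 hours (10800 s) unless aggregation is on -->
<property>
  <name>yarn.nodemanager.log.retain-seconds</name>
  <value>10800</value>
</property>
<!-- With aggregation enabled, container logs are merged and moved to HDFS -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<!-- HDFS root for aggregated logs (an assumption matching /yarn/apps/...) -->
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/yarn/apps</value>
</property>
```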

Here is an example when log aggregation is enabled. We know there were four
containers executed in this MapReduce job, because "-m" specified 3 mappers and
the fourth container is the application master. Each NodeManager got at least
one container, so all of them uploaded a log file.

[gpadmin@hdm1 ~]$ hdfs dfs -ls /yarn/apps/gpadmin/logs/application_1389385968629_0025/
Found 3 items
-rw-r----- 3 gpadmin hadoop    4496 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw1.hadoop.local_30825
-rw-r----- 3 gpadmin hadoop    5378 2014-02-01 16:54 /yarn/apps/gpadmin/logs/application_1389385968629_0025/hdw2.hadoop.local_36429
-rw-r----- 3 gpadmin hadoop 1877950 2014-02-01 16:54 /yarn/apps/gpadmin/logs/applica

PRACTICAL-4
AIM: Run two different Datasets/Different size of Datasets on Hadoop and
Compare the Logs

Solution:
Step 1: Adding input dataset files to HDFS.


Step 2: Running the program on Dataset 1.

Step 3: Running the job on Dataset 2.


Practical-5
Aim: Develop Map Reduce Application to sort a given file or do aggregation on
some parameter.
Solution:
Steps to run a Map Reduce Application
1. Create following files
A. SalesDriver.java
package com.kamlesh;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesDriver {
    public static void main(String[] args) {
        JobClient my_client = new JobClient();
        JobConf job_conf = new JobConf(SalesDriver.class);
        job_conf.setJobName("SalePerCountry");
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);
        job_conf.setMapperClass(SalesMapper.class);
        job_conf.setReducerClass(SalesReducer.class);
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));
        my_client.setConf(job_conf);
        try {
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

B. SalesMapper.java

package com.kamlesh;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String valueString = value.toString();
        String[] SingleCountryData = valueString.split(",");
        // Field index 7 of the CSV line is the country; emit (country, 1)
        output.collect(new Text(SingleCountryData[7]), one);
    }
}

C. SalesReducer.java

package com.kamlesh;

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text t_key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        Text key = t_key;
        int frequencyForCountry = 0;
        while (values.hasNext()) {
            // Sum the ones emitted by the mapper for this country
            IntWritable value = (IntWritable) values.next();
            frequencyForCountry += value.get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
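What the mapper and reducer compute together can be sketched in plain Java, with no Hadoop involved. The sample CSV lines below are made up for illustration; only the 8th field (index 7, the country column) matters here:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of SalesMapper + SalesReducer: the mapper emits
// (country, 1) from field index 7 of each line; the reducer sums the ones.
public class SalesLogicSketch {
    public static Map<String, Integer> salesPerCountry(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String country = line.split(",")[7];    // same as SingleCountryData[7]
            counts.merge(country, 1, Integer::sum); // the reducer's running sum
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical rows shaped like SalesJan2009.csv
        List<String> sample = Arrays.asList(
            "1/2/09,Product1,1200,Mastercard,Jane,Austin,TX,United States",
            "1/2/09,Product1,1200,Visa,John,Cork,Cork,Ireland",
            "1/3/09,Product2,3600,Visa,Ann,Boston,MA,United States");
        // Counts: United States=2, Ireland=1 (map print order may vary)
        System.out.println(salesPerCountry(sample));
    }
}
```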


2. Import the required jar files from the Hadoop installation.


3. Export the jar file

First create the required jar file Hadoop_Aggregation.jar and then export
the jar file to the workspace.

4. Start the Hadoop DFS, MapReduce and YARN daemons

Start the Hadoop daemons using the commands $ start-dfs.sh and $ start-yarn.sh.

5. Copying files to the NameNode filesystem

After successfully starting the required daemons, check the input files using
the command $ hdfs dfs -ls /user/kamlesh/input.

The required file is not present yet, so copy SalesJan2009.csv using the
command $ hdfs dfs -put /home/kamlesh/Desktop/SalesJan2009.csv /user/kamlesh/input.


6. Running the MapReduce application

Now run the aggregation MapReduce application using the command
$ hadoop jar /home/kamlesh/Desktop/Hadoop_Aggregation.jar com.kamlesh.SalesDriver
/user/kamlesh/input/SalesJan2009.csv /user/kamlesh/output/CountryAndProducts.


7. Getting the results

First check the names of the resultant files using the command
$ hdfs dfs -ls /user/kamlesh/output.

Then copy the resultant file to the local file system using the command
$ hdfs dfs -get /user/kamlesh/output/CountryAndProducts/part-00000 /home/kamlesh/Desktop/Results.


8. Output

Get the output using the command
$ hdfs dfs -cat /user/kamlesh/output/CountryAndProducts/part-00000.


9. Stop daemons

Finally, after getting the output, stop the Hadoop DFS and YARN daemons using
the commands $ stop-dfs.sh and $ stop-yarn.sh.

Conclusion:
In this practical we performed sorting and aggregation on the Data Set using
Map Reduce Application.


Practical-6

Aim: Download any two datasets from authentic websites.

Solution:
Website: Data Portal Of India

URL: https://data.gov.in/

Dataset 1: Agriculture Data


This dataset provides information on agricultural produce, machinery,
research, etc. Detailed information on government policies, schemes,
agriculture loans, market prices, animal husbandry, fisheries, horticulture,
loans & credit, sericulture, etc. is also available.
URL: https://kerala.data.gov.in/catalog/agriculture-departrment-pmkisan

Dataset 2: Weather
This dataset describes the rainfall that occurred during the hot-weather
season, by district, in Tamil Nadu, 2016-17.
URL: https://tn.data.gov.in/resources/rainfall-occurred-during-hot-weather-season-districts-tamil-nadu-2016-17#web_catalog_tabs_block_10

