DSBDA Assignment 11
Title: Log file processing using Hadoop MapReduce.
Problem Statement:
Design a distributed application using MapReduce which processes a log file of a
system.
Objective:
By completing this task, students will learn the following:
1. Hadoop Distributed File System.
2. MapReduce Framework.
Software/Hardware Requirements:
64-bit open-source OS (Linux), Java Development Kit (JDK), Hadoop.
Theory:
Hadoop starts its daemons over SSH, so the hadoop user needs passwordless SSH access to the local machine. The first time you connect, you will be asked to authenticate the host by adding its RSA key to the known hosts. Type yes and hit Enter to authenticate localhost.
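For reference, a typical passwordless-SSH setup for the hadoop user looks like the commands below; the RSA key type and file names are the usual OpenSSH defaults and are assumptions here, since the key-generation step is not shown in the original:
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys
ssh localhost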
Rename the extracted folder to remove the version information. This step is optional; if you do not rename the folder, adjust the remaining configuration paths accordingly.
mv hadoop-3.3.6 hadoop
Next, you will need to configure the Hadoop and Java environment variables on your system. Open the ~/.bashrc file in your favorite text editor. Here nano is used: paste the code with Ctrl+Shift+V, then save the file with Ctrl+X, press Y to confirm, and hit Enter:
nano ~/.bashrc
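The exact values depend on where Java and Hadoop are installed. Assuming Hadoop was extracted to /home/hadoop/hadoop and OpenJDK 11 is the installed JDK (both are assumptions for this sketch), the variables appended to ~/.bashrc typically look like this:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
After saving the file, load the new variables into the current shell session:
source ~/.bashrc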
You also need to configure JAVA_HOME in the hadoop-env.sh file. Edit the Hadoop environment variable file in the text editor:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh
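Inside hadoop-env.sh, uncomment or add the JAVA_HOME line; the path below assumes OpenJDK 11 and may differ on your system:
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64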
Next, create the directories that will hold the NameNode and DataNode data:
mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}
Next, edit the core-site.xml file and update with your system hostname:
nano $HADOOP_HOME/etc/hadoop/core-site.xml
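A minimal single-node core-site.xml sets the default file system URI; the hostname and port below (localhost:9000) are an assumption and should be replaced with your system hostname:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>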
The hdfs-site.xml, mapred-site.xml, and yarn-site.xml files in the same directory are edited in the same way. The mapred-site.xml configuration ends with the following properties, which point MapReduce jobs at the Hadoop installation:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME/home/hadoop/hadoop/bin/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME/home/hadoop/hadoop/bin/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME/home/hadoop/hadoop/bin/hadoop</value>
</property>
</configuration>
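Before the NameNode can be started for the first time it has to be formatted; with the environment variables above on the PATH, this is done with:
hdfs namenode -format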
Once the namenode directory has been successfully formatted with the HDFS file system, you will see the message “Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted”.
Then start the Hadoop cluster with the following command.
start-all.sh
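start-all.sh launches both the HDFS daemons (NameNode, DataNode, SecondaryNameNode) and the YARN daemons (ResourceManager, NodeManager). They can also be started separately:
start-dfs.sh
start-yarn.sh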
You can now check the status of all Hadoop services using the jps command:
jps
To check that all the Hadoop services are up and running, and to build and run the MapReduce program, run the commands below.
Step 1) jps
Step 2) cd
Step 7) cd mapreduce_bhargavi/
Step 8) ls
Step 12) ls
Step 13) cd SalesCountry/
Step 18) ls
Step 19) cd
Step 27) Now open the Mozilla Firefox browser and go to http://localhost:9870/dfshealth.html (for Hadoop 3.x; Hadoop 2.x used port 50070) to check the NameNode web interface.
Mapper Class:
package SalesCountry;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // Each input record is a comma-separated line; the country field
        // (assumed here to be at index 7) is emitted with a count of 1.
        String[] fields = value.toString().split(",");
        output.collect(new Text(fields[7]), new IntWritable(1));
    }
}
Reducer Class:
package SalesCountry;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        // Sum the counts emitted by the mapper for this country key.
        int frequencyForCountry = 0;
        while (values.hasNext()) {
            IntWritable value = values.next();
            frequencyForCountry += value.get();
        }
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}
Driver Class:
package SalesCountry;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountryDriver {
    public static void main(String[] args) {
        JobClient my_client = new JobClient();
        // Create a job configuration object for this driver class
        JobConf job_conf = new JobConf(SalesCountryDriver.class);
        job_conf.setJobName("SalePerCountry");
        // Data types of the output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);
        // Mapper and Reducer classes defined above
        job_conf.setMapperClass(SalesCountry.SalesMapper.class);
        job_conf.setReducerClass(SalesCountry.SalesCountryReducer.class);
        // Input and output formats (plain text)
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);
        // args[0] = name of input directory on HDFS, and args[1] = name of output
        // directory to be created to store the output files.
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));
        my_client.setConf(job_conf);
        try {
            // Submit the job and wait for it to complete
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Note : The paths and directory names will change according to your folder. Change the names and
paths accordingly.
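For reference, the classes above are typically compiled, packaged, and run on the cluster as sketched below; the file and directory names (sales.csv, ProductSalePerCountry.jar, /inputMapReduce, /mapreduce_output) are placeholders chosen for this sketch and should be replaced with your own:
cd ~/mapreduce_bhargavi
javac -classpath "$(hadoop classpath)" -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java
jar cf ProductSalePerCountry.jar SalesCountry/*.class
hdfs dfs -mkdir /inputMapReduce
hdfs dfs -put sales.csv /inputMapReduce
hadoop jar ProductSalePerCountry.jar SalesCountry.SalesCountryDriver /inputMapReduce /mapreduce_output
hdfs dfs -cat /mapreduce_output/part-00000
The last command prints each key together with the number of records counted for it.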
Conclusion:
In this assignment, we learnt how to process a log file using the Hadoop MapReduce framework on a distributed system.