
Assignment 11

Title:

To design a distributed application using MapReduce which processes a log file of a system.

Problem Statement:
Design a distributed application using MapReduce which processes a log file of a
system.

Objective:
By completing this task, students will learn the following:
1. Hadoop Distributed File System.
2. MapReduce Framework.

Software/Hardware Requirements:
64-bit open-source OS (Linux), Java Development Kit (JDK), Hadoop.

Theory:

Hadoop: Hadoop is an open-source distributed computing framework designed to handle and process large volumes of data across clusters of commodity hardware. It
was inspired by Google's MapReduce and Google File System (GFS) papers and is
written in Java. Apache Hadoop provides a scalable, reliable, and distributed
computing environment for processing and analyzing big data.
Map Reduce: MapReduce is a programming model and framework for processing
and generating large datasets in a distributed and parallel manner. It consists of two
main phases: the Map phase, where input data is divided into smaller chunks and
processed independently, and the Reduce phase, where the results from the Map
phase are aggregated and combined to produce the final output.
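For example (illustrative values only): if three log entries contain the error codes 404, 500, and 404, the Map phase emits the intermediate pairs <404, 1>, <500, 1>, and <404, 1>; the framework then groups these pairs by key, so the Reduce phase receives <404, [1, 1]> and <500, [1]> and produces the final counts <404, 2> and <500, 1>.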
Mapper Class: The Mapper class is a crucial component in a MapReduce job,
responsible for processing each input record and generating intermediate key-value
pairs.
In the context of processing a log file, the Mapper class parses each log entry and
extracts relevant information.
The typical steps involved in implementing a Mapper class for log file processing
include:
1. Input Parsing: Read each line of the log file.
2. Data Extraction: Extract relevant information from each log entry, such as
timestamps, error codes, or other data points of interest.
3. Data Transformation: Convert the extracted information into key-value pairs.
For example, if the goal is to analyze error frequencies, the Mapper might emit
<error_code, 1> pairs for each occurrence of an error code in the log entry.
4. Output Emission: Emit the key-value pairs to the MapReduce framework for
further processing.
The Mapper class typically extends the Mapper class provided by the MapReduce
framework and overrides the map() method to define custom logic for processing
input records.
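As a concrete illustration of these steps, here is a minimal Mapper sketch. It uses the newer org.apache.hadoop.mapreduce API; the class name LogErrorMapper and the assumption that the error/status code is the ninth space-separated field (as in an Apache access log) are hypothetical and would need to be adapted to your actual log format.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LogErrorMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text errorCode = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumed log layout: space-separated fields with the status/error code as the ninth field
        String[] fields = value.toString().split(" ");
        if (fields.length > 8) {
            errorCode.set(fields[8]);
            context.write(errorCode, one);   // emit <error_code, 1>
        }
    }
}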
Reducer Class: The Reducer class is another crucial component in a MapReduce job,
responsible for aggregating and processing the intermediate key-value pairs
generated by the Mapper class.
In the context of processing a log file, the Reducer class receives key-value pairs
where the key represents a unique identifier (e.g., an error code) and the value
represents the count of occurrences.
The typical steps involved in implementing a Reducer class for log file processing
include:
1. Input Aggregation: Receive key-value pairs grouped by key from the
MapReduce framework.
2. Data Aggregation: Aggregate the counts of occurrences for each unique key.
3. Output Generation: Produce the final output, which may include aggregated
statistics, summaries, or any other desired analysis results.
The Reducer class typically extends the Reducer class provided by the MapReduce
framework and overrides the reduce() method to define custom logic for aggregating
intermediate results.
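Continuing the same illustration, a matching Reducer sketch (again using the newer org.apache.hadoop.mapreduce API; LogErrorReducer is a placeholder name, and the assignment's own code later in this document uses the older org.apache.hadoop.mapred API instead) might look like:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LogErrorReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        // Sum the occurrence counts emitted by the mapper for this key (e.g., an error code)
        for (IntWritable value : values) {
            total += value.get();
        }
        context.write(key, new IntWritable(total));   // emit <error_code, total_count>
    }
}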
Driver Class : The Driver class in a MapReduce application is responsible for
configuring the job, setting up input and output paths, specifying mapper and
reducer classes, and submitting the job for execution. Here's a breakdown of the key
components typically found in a Driver class:
1. Configuration Setup: In the Driver class, you initialize a Hadoop configuration
object (Configuration) which holds various settings and parameters for the
MapReduce job. This includes properties such as input/output paths, mapper
and reducer classes, and any other job-specific configurations.
2. Job Initialization: Using the configuration object, you create a Job object (Job),
which represents the entire MapReduce job to be executed. This involves
specifying the name of the job, setting input/output formats, and configuring
the mapper and reducer classes.
3. Input and Output Paths: Specify the input and output paths for the job. This
tells Hadoop where to find the input data (e.g., log file) and where to write the
output of the job.
4. Mapper and Reducer Classes: Set the mapper and reducer classes to be used
in the MapReduce job. This involves specifying the Java classes that
implement the Mapper and Reducer interfaces and defining the logic for data
processing and aggregation.
5. InputFormat and OutputFormat: Configure the input and output formats for
the job, which define how input data is read and how output data is written.
Hadoop provides default input and output formats, but you can also use
custom formats if needed.
6. Output Key-Value Types: Specify the types of keys and values that the mapper
and reducer classes will emit. These types should match the output types of
the mapper and reducer classes.
7. Job Submission: Finally, submit the MapReduce job to the Hadoop or
MapReduce framework for execution. This involves calling the
job.waitForCompletion() method, which initiates the job execution and waits
for it to complete.
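Putting these pieces together, a minimal Driver sketch along these lines (newer org.apache.hadoop.mapreduce API, reusing the hypothetical LogErrorMapper and LogErrorReducer classes sketched above; the assignment's own Driver later in this document uses JobConf/JobClient from the older API) could be:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class LogErrorDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                // 1. configuration setup
        Job job = Job.getInstance(conf, "log error count");      // 2. job initialization
        job.setJarByClass(LogErrorDriver.class);
        job.setMapperClass(LogErrorMapper.class);                // 4. mapper and reducer classes
        job.setReducerClass(LogErrorReducer.class);
        job.setOutputKeyClass(Text.class);                       // 6. output key/value types
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // 3. input path (default TextInputFormat, see point 5)
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // 3. output path
        System.exit(job.waitForCompletion(true) ? 0 : 1);        // 7. submit the job and wait for completion
    }
}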
Log file : A log file is a file that records events, actions, or messages that occur within
a software application, operating system, or system component. These files are
commonly used for troubleshooting, debugging, monitoring, auditing, and analysis
purposes. Here's some key information about log files:
Log files serve various purposes, including:
Recording system events: Log files often record events such as system startups,
shutdowns, errors, warnings, and informational messages.
Debugging: Developers use log files to debug software by analyzing logs to identify
and fix issues.
Monitoring and performance analysis: System administrators use log files to monitor
system health, performance, and resource usage.
Auditing and compliance: Log files are sometimes used to track user activities for
auditing and compliance purposes.
Log files can be stored in various formats, including plain text, XML, JSON, and
structured formats. The choice of format depends on the logging framework or
application generating the logs and the requirements of downstream analysis tools.
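For instance, a single entry of an Apache-style access log (illustrative values only) might look like:
10.223.157.186 - - [15/Jul/2009:14:58:59 -0700] "GET / HTTP/1.1" 403 202
The MapReduce code later in this assignment splits such lines on the "-" character and counts occurrences of the first field (the client IP address), i.e., the number of requests made by each client.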

Steps to install Hadoop (Linux):

Step 1 : Install Java Development Kit


The default Ubuntu repositories contain both Java 8 and Java 11. I am using Java 8 because Hive only works with this version. Use the following command to install it:
sudo apt update && sudo apt install openjdk-8-jdk

Step 2 : Verify the Java version :


Once you have successfully installed it, check the current Java version:
java -version

Step 3 : Install SSH :


SSH (Secure Shell) installation is vital for Hadoop as it enables secure communication
between nodes in the Hadoop cluster. This ensures data integrity, confidentiality,
and allows for efficient distributed processing of data across the cluster.
sudo apt install ssh

Step 4 : Create the hadoop user :


All the Hadoop components will run as the user that you create for Apache Hadoop,
and the user will also be used for logging in to Hadoop’s web interface.
Run the following command to create the user and set a password:
sudo adduser hadoop

Step 5 : Switch user :


Switch to the newly created hadoop user:
su - hadoop

Step 6 : Configure SSH :


Now configure password-less SSH access for the newly created hadoop user. When prompted for the file in which to save the key and for a passphrase, simply press Enter to accept the defaults. Generate an SSH key pair first:
ssh-keygen -t rsa
Step 7 : Set permissions :
Copy the generated public key to the authorized key file and set the proper
permissions:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 640 ~/.ssh/authorized_keys

Step 8 : SSH to the localhost


ssh localhost

You will be asked to authenticate hosts by adding RSA keys to known hosts. Type yes
and hit Enter to authenticate the localhost.

Step 9 : Switch user


Switch to the hadoop user again:
su - hadoop

Step 10 : Install hadoop


Download hadoop 3.3.6
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

Once you’ve downloaded the file, extract it to a folder.


tar -xvzf hadoop-3.3.6.tar.gz

Rename the extracted folder to remove version information. This is an optional step,
but if you don’t want to rename, then adjust the remaining configuration paths.
mv hadoop-3.3.6 hadoop

Next, you will need to configure the Hadoop and Java environment variables on your system. Open the ~/.bashrc file in your favorite text editor. Here I am using the nano editor: paste the code with Ctrl+Shift+V, then save and exit with Ctrl+X, Y, and Enter:
nano ~/.bashrc

Append the below lines to the file.


export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
Load the above configuration in the current environment.
source ~/.bashrc
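(Optional) To confirm that the variables were loaded into the current shell, you can print one of them:
echo $HADOOP_HOME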

You also need to configure JAVA_HOME in hadoop-env.sh file. Edit the Hadoop
environment variable file in the text editor:
nano $HADOOP_HOME/etc/hadoop/hadoop-env.sh

Search for the “export JAVA_HOME” line and configure it:

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
Step 11 : Configuring Hadoop :
First, you will need to create the namenode and datanode directories inside the
Hadoop user home directory. Run the following command to create both directories:
cd hadoop/

mkdir -p ~/hadoopdata/hdfs/{namenode,datanode}

Next, edit the core-site.xml file and update with your system hostname:
nano $HADOOP_HOME/etc/hadoop/core-site.xml

Change the following name as per your system hostname:


<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
Save and close the file.
Then, edit the hdfs-site.xml file:
nano $HADOOP_HOME/etc/hadoop/hdfs-site.xml

Change the NameNode and DataNode directory paths as shown below:


<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
Then, edit the mapred-site.xml file:
nano $HADOOP_HOME/etc/hadoop/mapred-site.xml

Make the following changes:


<configuration>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>

Then, edit the yarn-site.xml file:


nano $HADOOP_HOME/etc/hadoop/yarn-site.xml

Make the following changes:


<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Save the file and close it.

Step 12 : Start Hadoop cluster:


Before starting the Hadoop cluster, you will need to format the NameNode as the hadoop user.
Run the following command to format the Hadoop NameNode:
hdfs namenode -format

Once the NameNode directory has been successfully formatted with the HDFS file system, you will see the message “Storage directory /home/hadoop/hadoopdata/hdfs/namenode has been successfully formatted”.
Then start the Hadoop cluster with the following command.
start-all.sh

You can now check the status of all Hadoop services using the jps command:
jps

Steps to compile and run the MapReduce job on the log file:

Step 1) jps (check that all the Hadoop services are up and running)

Step 2) cd

Step 3) sudo mkdir mapreduce_bhargavi

Step 4) sudo chmod 777 -R mapreduce_bhargavi/

Step 5) sudo chown -R bhargavi mapreduce_bhargavi/

Step 6) sudo cp /home/bhargavi/Desktop/logfiles1/* ~/mapreduce_bhargavi/

Step 7) cd mapreduce_bhargavi/

Step 8) ls

Step 9) sudo chmod +r *.*

Step 10) export CLASSPATH="/home/bhargavi/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar:/home/bhargavi/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-client-common-3.3.6.jar:/home/bhargavi/hadoop-3.3.6/share/hadoop/common/hadoop-common-3.3.6.jar:~/mapreduce_bhargavi/SalesCountry/*:$HADOOP_HOME/lib/*"

Step 11) javac -d . SalesMapper.java SalesCountryReducer.java SalesCountryDriver.java

Step 12) ls
Step 13) cd SalesCountry/

Step 14) ls (check if the class files are created)


Step 15) cd ..

Step 16) gedit Manifest.txt

(add the following line to it:
Main-Class: SalesCountry.SalesCountryDriver)

Step 17) jar -cfm mapreduce_bhargavi.jar Manifest.txt SalesCountry/*.class

Step 18) ls

Step 19) cd

Step 20) cd mapreduce_bhargavi/

Step 21) sudo mkdir /input200


Step 22) sudo cp access_log_short.csv /input200

Step 23) $HADOOP_HOME/bin/hdfs dfs -put /input200 /

Step 24) $HADOOP_HOME/bin/hadoop jar mapreduce_bhargavi.jar /input200 /output200

Step 25) hadoop fs -ls /output200

Step 26) hadoop fs -cat /output200/part-00000

Step 27) Now open the Mozilla browser and go to localhost:9870/dfshealth.html to check the NameNode interface (in Hadoop 3.x the NameNode web UI runs on port 9870).

Java Code to process logfile

Mapper Class:
package SalesCountry;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String valueString = value.toString();
        // Split the log entry on "-"; in an access log the first field is the client IP address
        String[] SingleCountryData = valueString.split("-");
        // Emit <first field, 1> so the reducer can count occurrences per key
        output.collect(new Text(SingleCountryData[0]), one);
    }
}

Reducer Class:

package SalesCountry;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SalesCountryReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text t_key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        Text key = t_key;
        int frequencyForCountry = 0;
        // Sum the counts emitted by the mapper for this key
        while (values.hasNext()) {
            IntWritable value = (IntWritable) values.next();
            frequencyForCountry += value.get();
        }
        // Emit the total count for this key
        output.collect(key, new IntWritable(frequencyForCountry));
    }
}

Driver Class:

package SalesCountry;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class SalesCountryDriver {

    public static void main(String[] args) {
        JobClient my_client = new JobClient();

        // Create a configuration object for the job
        JobConf job_conf = new JobConf(SalesCountryDriver.class);

        // Set a name of the Job
        job_conf.setJobName("SalePerCountry");

        // Specify data type of output key and value
        job_conf.setOutputKeyClass(Text.class);
        job_conf.setOutputValueClass(IntWritable.class);

        // Specify names of Mapper and Reducer Class
        job_conf.setMapperClass(SalesCountry.SalesMapper.class);
        job_conf.setReducerClass(SalesCountry.SalesCountryReducer.class);

        // Specify formats of the data type of Input and output
        job_conf.setInputFormat(TextInputFormat.class);
        job_conf.setOutputFormat(TextOutputFormat.class);

        // Set input and output directories using command line arguments:
        // args[0] = name of input directory on HDFS, args[1] = name of output directory to be created to store the output file
        FileInputFormat.setInputPaths(job_conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(job_conf, new Path(args[1]));

        my_client.setConf(job_conf);

        try {
            // Run the job
            JobClient.runJob(job_conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Note : The paths and directory names will change according to your folder. Change the names and

paths accordingly.

Conclusion :
In this assignment, we learnt how to process a log file using the Hadoop framework on a distributed system.
