Group B PR 3 DSBDA

This document provides a guide on how to analyze weather data using Hadoop with Java/Scala, including steps for installing Hadoop without HDFS and setting up the environment. It details the implementation of a MapReduce job with three main components: WeatherMapper, WeatherReducer, and WeatherAnalysisDriver, which processes a weather dataset to calculate average temperature, dew point, and wind speed. Additionally, it includes instructions for compiling the Java files, creating a JAR file, and running the job in both HDFS and local modes.


Group B – Big Data Analytics – JAVA/SCALA (Any three)

Locate a dataset (e.g., sample_weather.txt) for working on weather data. Read the
text input files and find the average temperature, dew point, and wind speed.

How to Install Hadoop without HDFS:

Step 1: Install Java (JDK)

sudo apt update


sudo apt install openjdk-11-jdk -y
java -version

Step 2: Download and Extract Hadoop


wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop

Step 3: 🔧 Set Environment Variables


Edit ~/.bashrc:

nano ~/.bashrc

Add these at the end:

export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

Apply the changes:


source ~/.bashrc

hadoop version

WeatherAnalysis/
├── WeatherMapper.java
├── WeatherReducer.java
├── WeatherAnalysisDriver.java
├── sample_weather.txt

📁 WeatherMapper.java
/*
 * Input: each line of the weather CSV file (as Text)
 * Output: key-value pairs such as:
 *   ("Temperature", 28.5)
 *   ("DewPoint", 20.0)
 *   ("WindSpeed", 5.6)
 *
 * 🔁 Logic:
 *   1. Skip the header line (if (!line.contains("Date")))
 *   2. Split the line on commas to extract:
 *        fields[1] → Temperature
 *        fields[2] → Dew Point
 *        fields[3] → Wind Speed
 *   3. Parse and emit the three metrics as key-value pairs.
 */

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WeatherMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {

    private final Text metric = new Text();
    private final FloatWritable valueOut = new FloatWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (!line.contains("Date")) {            // skip the CSV header
            String[] fields = line.split(",");
            if (fields.length >= 4) {
                try {
                    float temperature = Float.parseFloat(fields[1]);
                    float dewPoint = Float.parseFloat(fields[2]);
                    float windSpeed = Float.parseFloat(fields[3]);

                    metric.set("Temperature");
                    valueOut.set(temperature);
                    context.write(metric, valueOut);

                    metric.set("DewPoint");
                    valueOut.set(dewPoint);
                    context.write(metric, valueOut);

                    metric.set("WindSpeed");
                    valueOut.set(windSpeed);
                    context.write(metric, valueOut);
                } catch (NumberFormatException e) {
                    // Ignore malformed lines
                }
            }
        }
    }
}
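To see what the mapper does to a single record, here is a minimal sketch in plain Java (no Hadoop needed) that applies the same split-and-parse logic to one data row from sample_weather.txt. The class name MapperSplitDemo is illustrative, not part of the assignment.

```java
// Illustrative sketch: how one CSV line is split into the three metrics
// that WeatherMapper emits. Runs standalone, without Hadoop.
public class MapperSplitDemo {
    public static void main(String[] args) {
        String line = "2024-04-01,28.5,20.0,5.6"; // one data row from the sample file
        String[] fields = line.split(",");
        float temperature = Float.parseFloat(fields[1]); // 28.5
        float dewPoint = Float.parseFloat(fields[2]);    // 20.0
        float windSpeed = Float.parseFloat(fields[3]);   // 5.6
        // These are exactly the three (key, value) pairs the mapper writes:
        System.out.println("Temperature\t" + temperature);
        System.out.println("DewPoint\t" + dewPoint);
        System.out.println("WindSpeed\t" + windSpeed);
    }
}
```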

WeatherReducer.java
/*
 * Calculates the average for each metric.
 *
 * Input: for each key (e.g., "Temperature"), all of its values
 *        (e.g., every temperature reading)
 * Output: a single key-value pair, e.g., ("Temperature", 26.2)
 *
 * 🔁 Logic:
 *   1. Sum all float values.
 *   2. Count them.
 *   3. Emit the average (sum / count).
 */
import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WeatherReducer extends Reducer<Text, FloatWritable, Text, FloatWritable> {

    private final FloatWritable result = new FloatWritable();

    @Override
    public void reduce(Text key, Iterable<FloatWritable> values, Context context)
            throws IOException, InterruptedException {
        float sum = 0;
        int count = 0;

        for (FloatWritable val : values) {
            sum += val.get();
            count++;
        }

        result.set(sum / count);
        context.write(key, result);
    }
}
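The reducer's arithmetic can be checked without a cluster. This plain-Java sketch applies the same sum/count logic to the six Temperature readings from sample_weather.txt (the class and method names here are illustrative).

```java
// Illustrative sketch of the reducer's averaging logic, applied to the
// Temperature values from the sample dataset. Runs standalone.
public class ReducerAverageDemo {
    static float average(float[] values) {
        float sum = 0;
        int count = 0;
        for (float v : values) {  // same loop the reducer runs over its Iterable
            sum += v;
            count++;
        }
        return sum / count;       // same formula the reducer emits
    }

    public static void main(String[] args) {
        // All Temperature readings from sample_weather.txt
        float[] temps = {28.5f, 30.0f, 29.0f, 27.5f, 31.0f, 26.0f};
        System.out.println("Temperature\t" + average(temps)); // about 28.67
    }
}
```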

WeatherAnalysisDriver.java
/*
 * Component              Purpose
 * ---------------------  ----------------------------------------
 * WeatherMapper          Emits metric names with their values
 * WeatherReducer         Computes the average for each metric
 * WeatherAnalysisDriver  Configures and runs the MapReduce job
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherAnalysisDriver {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WeatherAnalysisDriver <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Weather Data Average");

        job.setJarByClass(WeatherAnalysisDriver.class);
        job.setMapperClass(WeatherMapper.class);
        job.setReducerClass(WeatherReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
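One design point worth noting (an aside, not stated in the original lab sheet): the driver sets no combiner, and WeatherReducer should not be reused as one. Averaging per-split averages is not the same as the global average when splits hold different numbers of readings, as this standalone sketch shows with made-up values:

```java
// Illustrative sketch: why averaging partial averages gives the wrong
// answer when input splits differ in size (values are made up).
public class CombinerCaveatDemo {
    public static void main(String[] args) {
        // Split 1 holds two readings, split 2 holds one.
        float split1Avg = (10.0f + 20.0f) / 2; // 15.0
        float split2Avg = 30.0f;               // 30.0

        float averageOfAverages = (split1Avg + split2Avg) / 2;  // 22.5 — wrong
        float trueAverage = (10.0f + 20.0f + 30.0f) / 3;        // 20.0 — correct

        System.out.println("average of averages: " + averageOfAverages);
        System.out.println("true average:        " + trueAverage);
    }
}
```

A correct combiner would have to emit (sum, count) pairs rather than averages; with only one reduce key per metric, simply omitting the combiner, as the driver above does, is the safe choice.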

sample_weather.txt
Date,Temperature,DewPoint,WindSpeed
2024-04-01,28.5,20.0,5.6
2024-04-02,30.0,22.5,6.0
2024-04-03,29.0,21.0,5.8
2024-04-04,27.5,19.5,4.9
2024-04-05,31.0,23.0,6.3
2024-04-06,26.0,18.5,4.5

javac -classpath `hadoop classpath` -d . WeatherMapper.java WeatherReducer.java WeatherAnalysisDriver.java
jar -cvf weather-analysis.jar *.class

Set Up Input in HDFS

hdfs dfs -mkdir -p /weather/input


hdfs dfs -put sample_weather.txt /weather/input/

hadoop jar weather-analysis.jar WeatherAnalysisDriver /weather/input /weather/output


See Output
hdfs dfs -cat /weather/output/part-r-00000
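To sanity-check part-r-00000, the expected averages can be computed directly from sample_weather.txt with this standalone sketch (class name ExpectedOutputDemo is illustrative):

```java
// Illustrative sketch: computes the averages the MapReduce job should
// produce for sample_weather.txt, so the job output can be verified.
public class ExpectedOutputDemo {
    public static void main(String[] args) {
        String[] rows = {
            "2024-04-01,28.5,20.0,5.6",
            "2024-04-02,30.0,22.5,6.0",
            "2024-04-03,29.0,21.0,5.8",
            "2024-04-04,27.5,19.5,4.9",
            "2024-04-05,31.0,23.0,6.3",
            "2024-04-06,26.0,18.5,4.5",
        };
        float t = 0, d = 0, w = 0;
        for (String row : rows) {
            String[] f = row.split(",");
            t += Float.parseFloat(f[1]);
            d += Float.parseFloat(f[2]);
            w += Float.parseFloat(f[3]);
        }
        int n = rows.length;
        System.out.printf("Temperature\t%.2f%n", t / n); // 28.67
        System.out.printf("DewPoint\t%.2f%n", d / n);    // 20.75
        System.out.printf("WindSpeed\t%.2f%n", w / n);   // 5.52
    }
}
```

The job's actual output prints full float precision rather than two decimals, but the values should match these to within rounding.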

✅ Step-by-Step Fix
🟡 1. Make sure you're in the correct folder
You should be inside the folder where your .java files are located.

Run this to confirm:

ls

🟡 2. Compile the Java files

javac -classpath `hadoop classpath` -d . WeatherMapper.java WeatherReducer.java WeatherAnalysisDriver.java

🟡 3. Then create the JAR file

Now you can run:

jar -cvf weather-analysis.jar *.class

✅ Option 1: You're Using Hadoop Local/Standalone Mode (most likely)

In local mode, you don't need to use hdfs dfs. You can simply use your local file system paths.

🔁 Update: Run the Job Using Local File System Paths


Just run your job like this:

hadoop jar weather-analysis.jar WeatherAnalysisDriver sample_weather.txt output

cat output/part-r-00000
