Group B PR 3 DSBDA

This document provides a guide on how to analyze weather data using Hadoop with Java/Scala, including steps for installing Hadoop without HDFS and setting up the environment. It details the implementation of a MapReduce job with three main components: WeatherMapper, WeatherReducer, and WeatherAnalysisDriver, which processes a weather dataset to calculate average temperature, dew point, and wind speed. Additionally, it includes instructions for compiling the Java files, creating a JAR file, and running the job in both HDFS and local modes.


Group B – Big Data Analytics – JAVA/SCALA (Any three)

Locate a dataset (e.g., sample_weather.txt) for working on weather data. Read the
text input files and find the average temperature, dew point, and wind speed.

How to Install Hadoop without HDFS:

Step 1: Install Java (JDK)

sudo apt update


sudo apt install openjdk-11-jdk -y
java -version

Step 2: Download and Extract Hadoop


wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop

Step 3: 🔧 Set Environment Variables


Edit ~/.bashrc:

nano ~/.bashrc

Add these at the end:

export HADOOP_HOME=~/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar

Apply the changes:


source ~/.bashrc

hadoop version

WeatherAnalysis/
├── WeatherMapper.java
├── WeatherReducer.java
├── WeatherAnalysisDriver.java
├── sample_weather.txt

📁 WeatherMapper.java
/*
 * Input: each line of the weather CSV file (as Text)
 * Output: key-value pairs such as:
 *   ("Temperature", 28.5)
 *   ("DewPoint", 20.0)
 *   ("WindSpeed", 5.6)
 *
 * 🔁 Logic:
 *   1. Skip the header line (if (!line.contains("Date")))
 *   2. Split the line on commas to extract:
 *        fields[1] → Temperature
 *        fields[2] → Dew Point
 *        fields[3] → Wind Speed
 *   3. Parse and emit the three metrics as key-value pairs.
 */

import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WeatherMapper extends Mapper<LongWritable, Text, Text, FloatWritable> {

    private final Text metric = new Text();
    private final FloatWritable valueOut = new FloatWritable();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (!line.contains("Date")) {            // skip the CSV header
            String[] fields = line.split(",");
            if (fields.length >= 4) {
                try {
                    float temperature = Float.parseFloat(fields[1]);
                    float dewPoint = Float.parseFloat(fields[2]);
                    float windSpeed = Float.parseFloat(fields[3]);

                    metric.set("Temperature");
                    valueOut.set(temperature);
                    context.write(metric, valueOut);

                    metric.set("DewPoint");
                    valueOut.set(dewPoint);
                    context.write(metric, valueOut);

                    metric.set("WindSpeed");
                    valueOut.set(windSpeed);
                    context.write(metric, valueOut);
                } catch (NumberFormatException e) {
                    // Ignore malformed lines
                }
            }
        }
    }
}
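To see what the mapper does to a single record, here is a minimal sketch in plain Java (no Hadoop needed) that applies the same split-and-parse logic to one data row from sample_weather.txt. The class name MapperSplitDemo is illustrative, not part of the assignment.

```java
// Illustrative sketch: how one CSV line is split into the three metrics
// that WeatherMapper emits. Runs standalone, without Hadoop.
public class MapperSplitDemo {
    public static void main(String[] args) {
        String line = "2024-04-01,28.5,20.0,5.6"; // one data row from the sample file
        String[] fields = line.split(",");
        float temperature = Float.parseFloat(fields[1]); // 28.5
        float dewPoint = Float.parseFloat(fields[2]);    // 20.0
        float windSpeed = Float.parseFloat(fields[3]);   // 5.6
        // These are exactly the three (key, value) pairs the mapper writes:
        System.out.println("Temperature\t" + temperature);
        System.out.println("DewPoint\t" + dewPoint);
        System.out.println("WindSpeed\t" + windSpeed);
    }
}
```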

WeatherReducer.java
/*
 * Calculates the average for each metric.
 *
 * Input: for each key (e.g., "Temperature"), all of its values
 *        (e.g., every temperature reading)
 * Output: a single key-value pair, e.g., ("Temperature", 26.2)
 *
 * 🔁 Logic:
 *   1. Sum all float values.
 *   2. Count them.
 *   3. Emit the average (sum / count).
 */
import java.io.IOException;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WeatherReducer extends Reducer<Text, FloatWritable, Text, FloatWritable> {

    private final FloatWritable result = new FloatWritable();

    @Override
    public void reduce(Text key, Iterable<FloatWritable> values, Context context)
            throws IOException, InterruptedException {
        float sum = 0;
        int count = 0;

        for (FloatWritable val : values) {
            sum += val.get();
            count++;
        }

        result.set(sum / count);
        context.write(key, result);
    }
}
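The reducer's arithmetic can be checked without a cluster. This plain-Java sketch applies the same sum/count logic to the six Temperature readings from sample_weather.txt (the class and method names here are illustrative).

```java
// Illustrative sketch of the reducer's averaging logic, applied to the
// Temperature values from the sample dataset. Runs standalone.
public class ReducerAverageDemo {
    static float average(float[] values) {
        float sum = 0;
        int count = 0;
        for (float v : values) {  // same loop the reducer runs over its Iterable
            sum += v;
            count++;
        }
        return sum / count;       // same formula the reducer emits
    }

    public static void main(String[] args) {
        // All Temperature readings from sample_weather.txt
        float[] temps = {28.5f, 30.0f, 29.0f, 27.5f, 31.0f, 26.0f};
        System.out.println("Temperature\t" + average(temps)); // about 28.67
    }
}
```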

WeatherAnalysisDriver.java
/*
 * Component              Purpose
 * ---------------------  ----------------------------------------
 * WeatherMapper          Emits metric names with their values
 * WeatherReducer         Computes the average for each metric
 * WeatherAnalysisDriver  Configures and runs the MapReduce job
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherAnalysisDriver {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WeatherAnalysisDriver <input path> <output path>");
            System.exit(-1);
        }

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Weather Data Average");

        job.setJarByClass(WeatherAnalysisDriver.class);
        job.setMapperClass(WeatherMapper.class);
        job.setReducerClass(WeatherReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
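One design point worth noting (an aside, not stated in the original lab sheet): the driver sets no combiner, and WeatherReducer should not be reused as one. Averaging per-split averages is not the same as the global average when splits hold different numbers of readings, as this standalone sketch shows with made-up values:

```java
// Illustrative sketch: why averaging partial averages gives the wrong
// answer when input splits differ in size (values are made up).
public class CombinerCaveatDemo {
    public static void main(String[] args) {
        // Split 1 holds two readings, split 2 holds one.
        float split1Avg = (10.0f + 20.0f) / 2; // 15.0
        float split2Avg = 30.0f;               // 30.0

        float averageOfAverages = (split1Avg + split2Avg) / 2;  // 22.5 — wrong
        float trueAverage = (10.0f + 20.0f + 30.0f) / 3;        // 20.0 — correct

        System.out.println("average of averages: " + averageOfAverages);
        System.out.println("true average:        " + trueAverage);
    }
}
```

A correct combiner would have to emit (sum, count) pairs rather than averages; with only one reduce key per metric, simply omitting the combiner, as the driver above does, is the safe choice.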

sample_weather.txt
Date,Temperature,DewPoint,WindSpeed
2024-04-01,28.5,20.0,5.6
2024-04-02,30.0,22.5,6.0
2024-04-03,29.0,21.0,5.8
2024-04-04,27.5,19.5,4.9
2024-04-05,31.0,23.0,6.3
2024-04-06,26.0,18.5,4.5

javac -classpath `hadoop classpath` -d . WeatherMapper.java WeatherReducer.java WeatherAnalysisDriver.java
jar -cvf weather-analysis.jar *.class

Set Up Input in HDFS

hdfs dfs -mkdir -p /weather/input


hdfs dfs -put sample_weather.txt /weather/input/

hadoop jar weather-analysis.jar WeatherAnalysisDriver /weather/input /weather/output


See Output
hdfs dfs -cat /weather/output/part-r-00000
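To sanity-check part-r-00000, the expected averages can be computed directly from sample_weather.txt with this standalone sketch (class name ExpectedOutputDemo is illustrative):

```java
// Illustrative sketch: computes the averages the MapReduce job should
// produce for sample_weather.txt, so the job output can be verified.
public class ExpectedOutputDemo {
    public static void main(String[] args) {
        String[] rows = {
            "2024-04-01,28.5,20.0,5.6",
            "2024-04-02,30.0,22.5,6.0",
            "2024-04-03,29.0,21.0,5.8",
            "2024-04-04,27.5,19.5,4.9",
            "2024-04-05,31.0,23.0,6.3",
            "2024-04-06,26.0,18.5,4.5",
        };
        float t = 0, d = 0, w = 0;
        for (String row : rows) {
            String[] f = row.split(",");
            t += Float.parseFloat(f[1]);
            d += Float.parseFloat(f[2]);
            w += Float.parseFloat(f[3]);
        }
        int n = rows.length;
        System.out.printf("Temperature\t%.2f%n", t / n); // 28.67
        System.out.printf("DewPoint\t%.2f%n", d / n);    // 20.75
        System.out.printf("WindSpeed\t%.2f%n", w / n);   // 5.52
    }
}
```

The job's actual output prints full float precision rather than two decimals, but the values should match these to within rounding.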

✅ Step-by-Step Fix
🟡 1. Make sure you're in the correct folder
You should be inside the folder where your .java files are located.

Run this to confirm:

ls

🟡 2. Compile the Java files

javac -classpath `hadoop classpath` -d . WeatherMapper.java WeatherReducer.java WeatherAnalysisDriver.java

🟡 3. Then create the JAR file

Now you can run:

jar -cvf weather-analysis.jar *.class

✅ Option 1: You're Using Hadoop Local/Standalone Mode (most likely)

In local mode, you don't need to use hdfs dfs. You can simply use your local file system paths.

🔁 Update: Run the Job Using Local File System Paths


Just run your job like this:

hadoop jar weather-analysis.jar WeatherAnalysisDriver sample_weather.txt output

cat output/part-r-00000
