Weather Data Analysis

This document analyzes weather data from India over 117 years using machine learning algorithms. It explores trends in monthly and seasonal temperatures and rainfall across 36 meteorological subdivisions. Clustering and regression analysis found that temperatures are increasing overall, indicating global warming, with maximum temperatures rising each year except 1920-1925. Rainfall is highest from June to September. Future work could extend these techniques to other large datasets.
THIAGARAJAR COLLEGE OF ENGINEERING, MADURAI - 625015

Department of Computer Science and Engineering


Weather Data Analysis
20C112 – Ajay Kumar | 20C134 – Sri Krishna M

ABSTRACT
India's weather varies widely with season and geography. Several states
experience extremely high temperatures, the climate changes from state to
state, and Chennai receives heavy rainfall. Studying and predicting these
weather conditions effectively requires advanced scientific methods, such
as machine learning algorithms, applied to historical weather data sets.
KEYWORDS
Geographical conditions, Temperatures, weather, clustering, classification
BACKGROUND
In this section, weather forecasting is discussed, and the central topic of
big data is clarified. Lastly, the factors that have an impact on weather
forecast performance are described.

INTRODUCTION

In this project we analyzed weather data for various regions, used
reference weather data in our application, and predicted the rainfall
expected in a particular region. Reporting this prediction is our goal.
Most databases contain information that has been accumulated over years.
These databases can become a valuable source of information for analysts
who perform various operations on the data. The analysis was done on the
weather data sets using machine learning algorithms.

METHODOLOGY

Data mining is the process of extracting patterns from large data sets by
combining methods from statistics and artificial intelligence with
database management. It is seen as an increasingly important tool by
modern business for transforming data into business intelligence, giving
an informational advantage. It is currently used in a wide range of
profiling practices, such as marketing, surveillance, fraud detection, and
scientific discovery. The related terms data dredging, data fishing and
data snooping refer to the use of data mining techniques to sample
portions of the larger population data set that are (or may be) too small
for reliable statistical inferences to be made about the validity of any
patterns discovered. These techniques can, however, be used in the
creation of new hypotheses to test against the larger data populations.

EXPLORATORY DATA ANALYSIS

Each year's economic growth depends on the amount and duration of monsoon
rain. A bad monsoon can destroy some crops, which may result in scarcity
of some agricultural products, which in turn can cause food inflation,
insecurity and public unrest. In our analysis we try to understand the
behavior of rainfall in India over the years, by month and by subdivision.

The data set was downloaded from the “data.gov.in” website. It covers 117
years (1901–2017) and consists of monthly and seasonal data for all 36
meteorological subdivisions of India, so in total we have 117*12*36 =
50,544 monthly observations. About 0.7% of the values in the data set were
missing. For the subdivision Arunachal Pradesh the first 15 years (1901 to
1915) were missing, so when analyzing India as a whole we considered data
from 1916 to 2017 for all subdivisions. For the remaining missing values
we used a sequential imputation technique. The table below shows the mean
rainfall observed for each month over the years. Average rainfall is
highest in July and August, followed by June and September.
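
The observation count and the month-wise averaging described above can be
sketched as follows. This is a minimal illustration, not the project's own
code: the record layout (subdivision, year, month, rainfall in mm), the
function name monthly_mean_rainfall and the sample values are assumptions
made for the example, and missing readings are simply skipped here rather
than imputed.

```python
from collections import defaultdict

YEARS = 2017 - 1901 + 1      # 117 years of records
MONTHS = 12
SUBDIVISIONS = 36
TOTAL_OBSERVATIONS = YEARS * MONTHS * SUBDIVISIONS   # 50,544 as stated in the text


def monthly_mean_rainfall(records, first_year=1916):
    """Average rainfall per calendar month over all subdivisions and years.

    records: iterable of (subdivision, year, month, rainfall_mm) tuples
             (hypothetical layout); rainfall_mm is None when missing.
    Years before first_year are dropped, mirroring the 1916 cut-off in the text.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _subdivision, year, month, rainfall_mm in records:
        if year < first_year or rainfall_mm is None:
            continue
        totals[month] += rainfall_mm
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Made-up sample values, for illustration only.
sample = [
    ("Kerala", 1916, 7, 700.0),
    ("Kerala", 1917, 7, 900.0),
    ("Kerala", 1916, 1, 10.0),
    ("Kerala", 1917, 1, None),               # missing value is skipped
    ("Arunachal Pradesh", 1910, 7, 500.0),   # before 1916, ignored
]
means = monthly_mean_rainfall(sample)
```

With real data, the months with the largest values in the returned
dictionary would be the ones reported as the high-rainfall months.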

Cluster0: The annual maximum temperature went up to 30 °C. The temperature
varies across seasons: it is low during winter (25 °C), rises to a peak
during the Mar-May season (32 °C), starts to fall in the Jun-Sep season
(31 °C), and falls further in the rainy season (28 °C). The mean maximum
temperature rose from 1900 to 2012, and the same holds for each season
(Jan-Feb, Mar-May, Jun-Sep, Oct-Dec) as well as for the annual figure. The
lowest values for the Annual, Jan-Feb, Mar-May, Jun-Sep and Oct-Dec
periods fall in cluster4, i.e. in the year 1905, and the highest in
cluster0. The maximum temperature is increasing year by year, with no
decline except during the Jun-Sep season of 1920-25. The annual and
seasonal maximum temperature data therefore indicate that the earth is
warming year by year, owing to many factors. The data sets with mean
temperature were clustered and are kept in table 3.3 for further analysis.
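
The clustering step can be illustrated with a small k-means sketch on
one-dimensional temperature values. It is only an illustration under
stated assumptions: the function name kmeans_1d, the choice of k, the
deterministic starting centroids and the sample temperatures are all made
up for the example and are not taken from the project.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on a list of numbers.

    Returns (centroids, labels). Centroids start at evenly spaced positions
    in the sorted data so that the result is deterministic.
    """
    ordered = sorted(values)
    if k == 1:
        centroids = [ordered[len(ordered) // 2]]
    else:
        centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, new_labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
        if new_labels == labels:
            break   # assignments stopped changing
        labels = new_labels
    return centroids, labels


# Made-up annual maximum temperatures in degrees Celsius, for illustration only.
annual_max = [28.1, 28.3, 28.2, 29.0, 29.1, 29.2, 30.0, 30.1, 30.2]
centroids, labels = kmeans_1d(annual_max, k=3)
```

Each year is then described by the cluster it falls into, which is how
statements such as "low in cluster4 and high in cluster0" can be read.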

DATASET LINK :

Weather data Indian cities (1990 to 2022) | Kaggle

ANALYSIS:
The mean temperature is rising year by year, with a slight decline during
1955 – 1965 followed by a renewed rise. This indicates that the earth is
warming year by year, owing to many factors.

Annual and seasonal minimum (night) temperatures are averaged over the
country as a whole for the period 1901-2012. The averages are based on
surface air temperature data (measured at 1.2 m above ground level) from
more than 350 stations spread over the country. The highest annual minimum
temperature, 20.3 °C, occurs in 1995 and the lowest, 18.61 °C, in 1975.
The fitted regression trend line is a polynomial equation.

$$y = -3\times10^{-11}x^{6} + 7\times10^{-9}x^{5} - 4\times10^{-7}x^{4} - 1\times10^{-5}x^{3} + 0.001x^{2} - 0.025x + 19.36$$
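
The polynomial above can be evaluated numerically with Horner's rule, as
in the sketch below. The report does not define x, so the code makes no
claim about it beyond evaluating the curve; reading x as a year index
counted from the start of the record is only an assumption. The
coefficients are copied as printed, rounded to one significant digit, so
values for large x are rough. The names TREND_COEFFICIENTS and
min_temp_trend are made up for the example.

```python
# Coefficients of the printed trend line, highest power (x^6) first.
TREND_COEFFICIENTS = [-3e-11, 7e-09, -4e-07, -1e-05, 0.001, -0.025, 19.36]


def min_temp_trend(x):
    """Evaluate the degree-6 trend polynomial at x using Horner's rule."""
    result = 0.0
    for coefficient in TREND_COEFFICIENTS:
        result = result * x + coefficient
    return result


# x is assumed (not stated in the report) to be a year index.
value_at_start = min_temp_trend(0)
value_at_ten = min_temp_trend(10)
```

At x = 0 the curve returns its constant term, 19.36, which lies between
the lowest (18.61) and highest (20.3) annual minimum temperatures quoted
in the text.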

DATASET DESCRIPTION:

A data set is only as good as its description, so here is a brief
explanation:

The data set contains temperature data (minimum, average, maximum) in
degrees Celsius and precipitation data in mm. It holds daily temperature
and precipitation records from 01/01/1990 to 20/07/2022.
Data for the following cities is present:
• Delhi
• Bangalore
• Chennai
• Lucknow
• Rajasthan
• Mumbai
• Bhubaneswar
• Rourkela
The station Geolocation file will give you the approximate location from where
these measurements are taken.
What can you do with this data set?

• Can you find the hottest/coldest years for each city?
• Can you find precipitation averages and tell when rainfall was abnormally
low or abnormally high?
• Can you prove that temperature is increasing and, if so, at what rate
(degree increase/year)?
• Can you create effective visualizations to convey the same?
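
The first and third questions above can be approached as in the sketch
below. It is an illustration under stated assumptions: the row layout
(city, year, daily average temperature), the function names and the sample
readings are invented for the example, and an ordinary least-squares slope
is used as one simple way to express a rate in degrees per year. A
positive slope shows a trend in the sample; it is not by itself a proof.

```python
from collections import defaultdict


def annual_means(daily_rows):
    """Average daily mean temperature per (city, year).

    daily_rows: iterable of (city, year, tavg_celsius) tuples
                (hypothetical layout); tavg_celsius is None when missing.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for city, year, tavg in daily_rows:
        if tavg is None:
            continue
        entry = sums[(city, year)]
        entry[0] += tavg
        entry[1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def hottest_and_coldest(means, city):
    """Return (hottest_year, coldest_year) for one city."""
    by_year = {year: t for (c, year), t in means.items() if c == city}
    return max(by_year, key=by_year.get), min(by_year, key=by_year.get)


def warming_rate(means, city):
    """Least-squares slope of annual mean temperature against year (deg C/year)."""
    points = sorted((year, t) for (c, year), t in means.items() if c == city)
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Made-up daily readings, for illustration only.
rows = [
    ("Delhi", 1990, 24.0), ("Delhi", 1990, 26.0),
    ("Delhi", 1991, 25.5), ("Delhi", 1991, 25.5),
    ("Delhi", 1992, 26.0), ("Delhi", 1992, None),
]
means = annual_means(rows)
extremes = hottest_and_coldest(means, "Delhi")
rate = warming_rate(means, "Delhi")
```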

INFERENCES REPORT:

(PDF) Analysis of Indian Weather Data Sets Using Data Mining Techniques
(researchgate.net)

CONCLUSION:

Over 112 years of temperature data, temperature is found to be increasing
gradually, which is an indication that global warming is taking place. The
increase holds irrespective of whether the minimum, maximum or mean
temperature is considered, and was found through k-means cluster analysis.
Predictions can be made effectively using the linear regression line
equations that were found. As future scope, the approach can be extended
to any large data set with various attributes or parameters for effective
analysis and accurate prediction.
