
Shaheed Zulfikar Ali Bhutto Institute of Science & Technology

The document appears to be a Python code submission for a student named Jiya Ali with registration number 2110815. The code imports various Python libraries for data analysis and machine learning. It then loads and cleans temperature data for various cities. The code performs exploratory data analysis on the Chicago temperature data, including plotting, time series decomposition and stationarity tests. It then builds and evaluates ARIMA models to forecast future temperature values.

Uploaded by

Jia Ali

Shaheed Zulfikar Ali Bhutto Institute of Science & Technology

COMPUTER SCIENCE DEPARTMENT

Submitted To: MURTAZA MEHDI

Student Name: JIYA ALI

Reg Number: 2110815

AI LAB BSAI 3A SZABIST-ISB



PYTHON CODE
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import folium
import imageio
from tqdm import tqdm_notebook
from folium.plugins import MarkerCluster
import geoplot as gplt
import geopandas as gpd
import geoplot.crs as gcrs
import mapclassify as mc
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller
# The legacy statsmodels.tsa.arima_model.ARIMA class is no longer usable in recent statsmodels releases.
from statsmodels.tsa.arima.model import ARIMA
import scipy
from itertools import product
import seaborn as sns
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import pacf
from statsmodels.tsa.stattools import acf

plt.style.use('ggplot')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.serif'] = 'Ubuntu'
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['image.cmap'] = 'jet'
plt.rcParams['image.interpolation'] = 'none'
plt.rcParams['figure.figsize'] = (12, 10)
plt.rcParams['axes.grid'] = True
plt.rcParams['lines.linewidth'] = 2
plt.rcParams['lines.markersize'] = 8

# Named xkcd colors; the name must follow the 'xkcd:' prefix without a space.
colors = ['xkcd:pale orange', 'xkcd:sea blue', 'xkcd:pale red', 'xkcd:sage green', 'xkcd:terra cotta',
          'xkcd:dull purple', 'xkcd:teal', 'xkcd:goldenrod', 'xkcd:cadet blue',
          'xkcd:scarlet']

data = pd.read_csv('GlobalLandTemperaturesByMajorCity.csv')


from geopy.geocoders import Nominatim

# Assumption: city_data holds one row per city (its definition is missing from this listing).
city_data = data.drop_duplicates(['City']).reset_index(drop=True)
city_data.head()

# Geocode every city, reusing a single Nominatim client.
locator = Nominatim(user_agent="myGeocoder")
LAT = []
LONG = []
for city in city_data.City.tolist():
    location = locator.geocode(city)
    LAT.append(location.latitude)
    LONG.append(location.longitude)

# Assumption: the geocoded coordinates are stored in the columns read by the map loop below.
city_data['Latitude'] = LAT
city_data['Longitude'] = LONG

# Draw one clustered circle marker per city on a world map.
world_map = folium.Map()
marker_cluster = MarkerCluster().add_to(world_map)
for i in range(len(city_data)):
    lat = city_data.iloc[i]['Latitude']
    long = city_data.iloc[i]['Longitude']
    radius = 5
    folium.CircleMarker(location=[lat, long], radius=radius, fill=True,
                        color='darkred', fill_color='darkred').add_to(marker_cluster)

explodes = (0, 0.3)
plt.pie(data[data['City']=='Chicago'].AverageTemperature.isna().value_counts(), explode=explodes,
        startangle=0, colors=['firebrick','indianred'],
        labels=['Non NaN elements','NaN elements'], textprops={'fontsize': 20})


# Work on an explicit copy so the column assignments below do not raise SettingWithCopyWarning.
chicago_data = data[data['City']=='Chicago'].copy()
# Back-fill missing readings with the next available value.
chicago_data['AverageTemperature'] = chicago_data.AverageTemperature.bfill()
chicago_data['AverageTemperatureUncertainty'] = chicago_data.AverageTemperatureUncertainty.bfill()
chicago_data = chicago_data.reset_index()
chicago_data = chicago_data.drop(columns=['index'])
chicago_data['dt'] = pd.to_datetime(chicago_data['dt'])

# Split the timestamp into calendar components.
chicago_data['Year'] = chicago_data['dt'].dt.year
chicago_data['Month'] = chicago_data['dt'].dt.month
chicago_data['Day'] = chicago_data['dt'].dt.day
chicago_data['Weekday'] = chicago_data['dt'].dt.weekday

# Record the row index at which each new year starts.
change_year_index = []
change_year = []
year_list = chicago_data['Year'].tolist()
for y in range(0, len(year_list)-1):
    if year_list[y] != year_list[y+1]:
        change_year.append(year_list[y+1])
        change_year_index.append(y+1)
chicago_data.loc[change_year_index].head()

# Pick ten evenly spaced years as tick labels and look up the row where each one begins.
x_ticks_year_list = np.linspace(min(year_list), max(year_list), 10).astype(int)
change_year_index = np.array(change_year_index)
x_ticks_year_index = []
# The first year never appears in change_year, so the loop starts at 1.
for i in range(1, len(x_ticks_year_list)):
    x_ticks_year_index.append(change_year_index[np.where(np.array(change_year)==x_ticks_year_list[i])][0])

sns.scatterplot(x=chicago_data.index, y=chicago_data.AverageTemperature, s=25, color='firebrick')
# Skip the first label so tick positions and labels have the same length.
plt.xticks(x_ticks_year_index, x_ticks_year_list[1:])
plt.title('Temperature vs Year Scatter plot', color='firebrick', fontsize=40)
plt.xlabel('Year')
plt.ylabel('Average Temperature')
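The plotting calls in this listing rely on three helpers, plot_timeseries, get_timeseries and plot_from_data, whose definitions are not part of the submission. The block below is only a minimal sketch of what they might look like, inferred from how they are called. The inclusive year range, the optional frame argument, the date-based x-axis and the synthetic demo data are all assumptions, not the author's code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def get_timeseries(start_year, end_year, frame=None):
    """Hypothetical helper: rows with start_year <= Year <= end_year, re-indexed from 0."""
    if frame is None:
        frame = chicago_data  # assumed to be the DataFrame built earlier in the listing
    mask = (frame['Year'] >= start_year) & (frame['Year'] <= end_year)
    return frame[mask].reset_index(drop=True)


def plot_from_data(values, time, c='firebrick', with_ticks=True, label=None):
    """Hypothetical helper: draw values against their dates on the current axes."""
    plt.plot(list(time), list(values), '-o', c=c, ms=4, linewidth=2, label=label)
    if with_ticks:
        plt.xticks(rotation=45)
    plt.xlabel('Date')
    plt.ylabel('Temperature')


def plot_timeseries(start_year, end_year, frame=None):
    """Hypothetical helper: plot AverageTemperature for the selected years."""
    window = get_timeseries(start_year, end_year, frame)
    plot_from_data(window['AverageTemperature'], window['dt'])
    return window


# Synthetic stand-in for chicago_data: ten years of monthly values (assumption).
dates = pd.date_range('1990-01-01', periods=120, freq='MS')
demo = pd.DataFrame({
    'dt': dates,
    'AverageTemperature': 10 + 12 * np.sin(2 * np.pi * (dates.month.to_numpy() - 4) / 12),
    'Year': dates.year.to_numpy(),
})
window = plot_timeseries(1992, 1994, frame=demo)
```

With definitions of this shape, a call such as plot_timeseries(1800, 1810) would read from chicago_data directly, because frame defaults to it.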

plt.figure(figsize=(20,20))
plt.suptitle('Plotting 4 decades', fontsize=40, color='firebrick')
plt.subplot(2,2,1)
plt.title('Starting year: 1800, Ending Year: 1810', fontsize=15)
plot_timeseries(1800,1810)
plt.subplot(2,2,2)
plt.title('Starting year: 1900, Ending Year: 1910', fontsize=15)
plot_timeseries(1900,1910)
plt.subplot(2,2,3)
plt.title('Starting year: 1950, Ending Year: 1960', fontsize=15)
# Each panel plots the decade named in its title.
plot_timeseries(1950,1960)
plt.subplot(2,2,4)
plt.title('Starting year: 2000, Ending Year: 2010', fontsize=15)
plot_timeseries(2000,2010)
plt.tight_layout()

fig = plt.figure(figsize=(12,8))
ax1 = fig.add_subplot(211)
fig = sm.graphics.tsa.plot_acf(chicago_data.AverageTemperature, ax=ax1, color='firebrick')
ax2 = fig.add_subplot(212)
fig = sm.graphics.tsa.plot_pacf(chicago_data.AverageTemperature, ax=ax2, color='firebrick')

result = adfuller(chicago_data.AverageTemperature)
print('ADF Statistic on the entire dataset: {}'.format(result[0]))
print('p-value: {}'.format(result[1]))
print('Critical Values:')
for key, value in result[4].items():
    print('\t{}: {}'.format(key, value))

result = adfuller(chicago_data.AverageTemperature[0:120])
print('ADF Statistic on the first decade: {}'.format(result[0]))
print('p-value: {}'.format(result[1]))
print('Critical Values:')
for key, value in result[4].items():
    print('\t{}: {}'.format(key, value))

plt.title('The dataset used for prediction', fontsize=30,color='firebrick')


plot_timeseries(1992,2013)


temp = get_timeseries(1992,2013)
N = len(temp.AverageTemperature)
# Hold out the last 5% of the observations as the test set.
split = 0.95
training_size = round(split*N)
test_size = round((1-split)*N)
series = temp.AverageTemperature[:training_size]
date = temp.dt[:training_size]
# The test slice begins at the last training observation, so the two sets overlap by one point.
test_series = temp.AverageTemperature[len(date)-1:len(temp)]
test_date = temp.dt[len(date)-1:len(temp)]
#test_date = test_date.reset_index().dt
#test_series = test_series.reset_index().AverageTemperature
plot_from_data(series, date, label='Training Set')
plot_from_data(test_series, test_date, 'navy', with_ticks=False, label='Test Set')
plt.legend()

def optimize_ARIMA(order_list, exog):
    """
    Return dataframe with parameters and corresponding AIC

    order_list - list with (p, d, q) tuples
    exog - the series to fit (passed to SARIMAX as the endogenous variable)
    """
    results = []
    for order in tqdm_notebook(order_list):
        #try:
        model = SARIMAX(exog, order=order).fit(disp=-1)
        #except:
        #    continue
        results.append([order, model.aic])
    #print(results)
    result_df = pd.DataFrame(results)
    result_df.columns = ['(p, d, q)', 'AIC']
    #Sort in ascending order, lower AIC is better
    result_df = result_df.sort_values(by='AIC', ascending=True).reset_index(drop=True)
    return result_df

ps = range(0, 10, 1)
d = 0
qs = range(0, 10, 1)
# Create a list with all possible combination of parameters
parameters = product(ps, qs)
parameters_list = list(parameters)
order_list = []
for each in parameters_list:
    each = list(each)
    each.insert(1, d)
    each = tuple(each)
    order_list.append(each)
result_d_0 = optimize_ARIMA(order_list, exog = series)

# Assumption: result_d_1 comes from the same search with d = 1 (that step is missing from this listing).
order_list_d_1 = [(p, 1, q) for p, q in parameters_list]
result_d_1 = optimize_ARIMA(order_list_d_1, exog = series)
result_d_1.head()

# DataFrame.append was removed in pandas 2.0; pd.concat gives the same combined table.
final_result = pd.concat([result_d_0, result_d_1])

best_models = final_result.sort_values(by='AIC', ascending=True).reset_index(drop=True).head()
best_model_params_0 = best_models[best_models.columns[0]][0]
best_model_params_1 = best_models[best_models.columns[0]][1]
best_model_0 = SARIMAX(series, order=best_model_params_0).fit()
print(best_model_0.summary())
best_model_1 = SARIMAX(series, order=best_model_params_1).fit()
print(best_model_1.summary())

#plt.plot(forec)
plt.figure(figsize=(12,12))
plt.subplot(2,1,1)
plt.fill_between(x1, lower_test, upper_test, alpha=0.2, label='Test set error range', color='navy')
plt.plot(test_set, marker='.', label="Actual", color='navy')
plt.plot(forec, marker='d', label="Forecast", color='firebrick')
plt.xlabel('Index Datapoint')
plt.ylabel('Temperature')
#plt.fill_between(x1, s_ci['lower AverageTemperature'], s_ci['upper AverageTemperature'], alpha=0.3, label='Confidence interval (95%)', color='firebrick')
plt.legend()
plt.subplot(2,1,2)
#plt.fill_between(x1, lower_test, upper_test, alpha=0.2, label='Test set error range', color='navy')
plt.plot(test_set, marker='.', label="Actual", color='navy')
plt.plot(s_forec, marker='d', label="Forecast", color='firebrick')
plt.fill_between(x1, ci['lower AverageTemperature'], ci['upper AverageTemperature'], alpha=0.3,
                 label='Confidence interval (95%)', color='firebrick')
plt.legend()
plt.xlabel('Index Datapoint')
plt.ylabel('Temperature')

plt.fill_between(np.arange(0,len(test_set),1), lower_test, upper_test, alpha=0.2,
                 label='Test set error range', color='navy')
plot_from_data(test_set, test_date, c='navy', label='Actual')
plot_from_data(forec['f'], test_date, c='firebrick', label='Forecast')
plt.legend(loc=2)
