AI Final Report 129,132,160
AI Final Report 129,132,160
Submitted by
AAYUSH SRIVASTAVA[RA2111003011129]
SWETHA SREE S [RA2111003011132]
AISHNA SHARMA[RA2111003011160]
BACHELOR OF TECHNOLOGY
in
1
COLLEGE OF ENGINEERING & TECHNOLOGY
SRM INSTITUTE OF SCIENCE & TECHNOLOGY
S.R.M. NAGAR, KATTANKULATHUR – 603 203
Chengalpattu District
BONAFIDE CERTIFICATE
Certified that the Mini project report titled Traffic Prediction using Machine
Learning is the bona fide work of AAYUSH SRIVASTAVA[RA2111003011129],
SWETHA SREE S [RA2111003011132], AISHNA SHARMA[RA2111003011160]
who carried out the minor project under my supervision. Certified further, that to the
best of my knowledge, the work reported herein does not form any other project report
or dissertation on the basis of which a degree or award was conferred on an earlier
occasion on this or any other candidate.
SIGNATURE
2
ABSTRACT
The paper deals with traffic prediction that can be done in intelligent transportation
systems which involve the prediction between previous year’s data set and he recent
year data which ultimately provides the accuracy and mean square error. This prediction
will be helpful for the people who are in need to check the immediate traffic state. The
traffic data is predicated on a basis of 1 hour time gap. Live statistics of the traffic is
analyzed from this prediction. So, this will be easier to analyze when the user is on
driving too. The system compares the data of all roads and determines the most
populated roads of the city. I propose the regression model in order to predict the traffic
using machine learning by importing Sklearn, Keras and TensorFlow libraries
Keywords: Traffic, Regression, Intelligent Transport System (ITS), Machine
learning,Prediction.
3
Table of Contents
ABSTRACT iii
TABLE OF CONTENTS iv
LIST OF FIGURES v
ABBREVIATIONS vi
1 INTRODUCTION 7
2 LITERATURE SURVEY 9
3 OVERVIEW 10
4 METHODOLOGY 11
5 SOFTWARE IMPLEMENTATION 14
REFERENCES 23
4
LIST OF FIGURES
5
ABBREVIATIONS
6
INTRODUCTION
Machine Learning (ML) is one of the most important and popular emerging
branches these days as it is a part of Artificial Intelligence (AI). In recent times,
machine learning becomes an essential and upcoming research area for
transportation engineering, especially in traffic prediction. Traffic congestion
affects the country’s economy directly or indirectly by its means. Traffic
congestion also takes people’s valuable time, cost of fuel every single day. As
traffic congestion is a major problem for all classes in society, there has to be a
small-scale traffic prediction for the people’s sake of living their lives without
frustration or tension. For ensuring the country’s economic growth, the road
user’s ease is required in the first place. This is possible only when the traffic
flow is smooth. To deal with this, Traffic prediction is needed so that we can
estimate or predict the future traffic to some extent. In addition to the country’s
economy, pollution can also be reduced. The government is also investing in the
intelligent transportation system (ITS) to solve these issues. The plot of this
research paper is to find different machine learning algorithms and speculating
the models by utilizing python3. The goal of traffic flow prediction is to predict
the traffic to the users as soon as possible. Nowadays the traffic becomes really
hectic and this cannot be determined by the people when they are on roads. So,
this research can be helpful to predict traffic. Machine learning is usually done
using anaconda software but, in this paper, I have used the python program using
command prompt window which is much easier than the usual way of predicting
the data. In summary, the constructs of this paper consist of ten major sections.
These are: Introduction, Purpose of Traffic Prediction, Problem Statement,
Related Work, Overview, Methodology, Software Implementation and
Conclusion with Future work.
7
Purpose of statement: Many reports of the traffic data are of actual time but it is
not favorable and accessible to many users as we need to have prior decision in
which route we need to travel. For example, during working days, we need to
have daily traffic information or at times we need hourly traffic information but
then the traffic congestion occurs; for solving this issue the user needs to have
actual time traffic prediction. Many factors are responsible for the traffic
congestion. This can be predicted by taking two datasets; one with the past year
and one with the recent year’s data set. If traffic is so heavy then the traffic can
be predicted by referring the same time in the past year’s data set and analyzing
how congested the traffic would be. With the increasing cost of the fuel, the traffic
congestion changes drastically. The goal of this prediction is to provide real-time
gridlock and snarl up information. The traffic on the city becomes complex and
are out of control these days, so such kind of systems are not sufficient for
prediction. Therefore, research on traffic flow prediction plays a major role in
ITS.
8
LITERATURE SURVEY
9
OVERVIEW
In traffic congestion forecasting there are data collection and prediction model.
The methodology has to be done correctly so that there won’t be any flaws while
predicting. After data collection, the vital role is the data processing which is to train
and test the datasets that is taken as the input. After processing the data, the validation
of the model is done by using necessary models. Figure 1 highlights the outline of traffic
prediction using machine learning.
10
METHODOLOGY
Many researchers have been used various discussed approaches. This paper contains
the technique of predicting the traffic using regression model using various libraries
like Pandas, NumPy, OS, Matplotlib.pyplot, Keras and Sklearn.
Data set:
Traffic congestion is raising a lot these days. Factors like expanding urban
populations, uncoordinated traffic signal timing and a lack of real-time data. The
effect of the traffic congestion is very huge these days. Data collected in this paper are
from the Kaggle website for the implementations of machine learning algorithms
using python3 to show outputs in the traffic prediction.
Date: The Date Column contains the date on which the data were recorded in the
format DD/MM/YYYY.
Day: The Day Column contains the weekday on which the data was collected. This is
done to make the dataset more usable in terms of predicting the likelihood of traffic
dependent on what day of the week it is.
Coded Day: Each day of the week is assigned a code number by the coded day.
Because we are not forced to write string functions for converting the given days to
codes, predicting traffic depending on the day is considerably easier. The following
are the day codes: -Monday - 1 Tuesday - 2 Wednesday - 3 Thursday - 4 Friday - 5
Saturday - 6 Sunday – 7
Zone: This column contains the zone number for which traffic data is collected. The
weather in this column has been coded. This is based on a variety of typical weather
conditions. The amount of traffic fluctuates depending on the weather in each zone.
This covers factors such as humidity, mist, visibility, and precipitation, among others.
Temperature: This column contains the temperature for the given zone on a given day.
Temperature has a significant impact on traffic forecasting.
11
Traffic: This is the column that serves as the training dataset as well as a predictor.
This column's traffic is coded on a five-level scale. The following are the levels: -1 -
Less than 5 cars. 2 - 5 to 15 cars. 3 - 15 to 30 cars. 4 - 30 to 50 cars. 5 - More than 50
cars.
12
JupyterLab is a browser based communal development. JupyterLab is a limber and
which can construct and exhibit the user interface to support a far flung of metadata in
machine learning. Python3 is the status quo environment where the code is implemented
in Jupyter notebook. This can be accesses/installed using command prompt. This is done
in order to get the access from the local drive. So, the Jupyter notebook is installed
through command prompt and then a local host is created. The file is accessed through
this host and the prediction are done using various libraries and models in the python
environment.
13
SOFTWARE IMPLEMENTATION
Simulation: The command prompt is the local host in this paper to initialize the
jupyter notebook.
The local host contains the nbextenisons which we modify to our convenience.
14
CODING AND TESTING
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [4]:
dataset=pd.read_csv("Dataset.csv")
In [5]:
dataset.head()
Out[5]:
0 Wednesday 01-06-18 3 2 35 17 2
1 Wednesday 01-06-18 3 3 36 16 3
2 Wednesday 01-06-18 3 4 27 25 5
3 Wednesday 01-06-18 3 5 23 23 3
4 Wednesday 01-06-18 3 6 18 42 2
In [6]:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
In [8]:
dataset['Date']= le.fit_transform(dataset['Date'])
In [9]:
dataset.tail(10)
Out[9]:
15
Day Date CodedDay Zone Weather Temperature Traffic
In [11]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1439 entries, 0 to 1438
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Day 1439 non-null object
1 Date 1439 non-null int32
2 CodedDay 1439 non-null int64
3 Zone 1439 non-null int64
4 Weather 1439 non-null int64
5 Temperature 1439 non-null int64
6 Traffic 1439 non-null int64
dtypes: int32(1), int64(5), object(1)
memory usage: 73.2+ KB
In [12]:
X = dataset.iloc[:, 2:6].values
y = dataset.iloc[:, 6:7].values
In [13]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
In [14]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.fit_transform(X_test)
In [15]:
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators = 300, random_state = 0)
regressor.fit(X_train,y_train)
16
C:\Users\sweth\AppData\Local\Temp\ipykernel_15884\1592691624.py:3: DataConversion
Warning: A column-vector y was passed when a 1d array was expected. Please change
the shape of y to (n_samples,), for example using ravel().
regressor.fit(X_train,y_train)
Out[15]:
RandomForestRegressor
RandomForestRegressor(n_estimators=300, random_state=0)
In [16]:
y_pred = regressor.predict(X_test)
In [17]:
if(y_pred.all()<2.5):
y_pred=np.round(y_pred-0.5)
else:
y_pred=np.round(y_pred+0.5)
In [18]:
df1=(y_pred-y_test)/y_test
df1=round(df1.mean()*100,2)
print("Error = ",df1,"%")
a=100-df1
print("Accuracy= ",a,"%")
Error = 13.42 %
Accuracy= 86.58 %
In [19]:
17
2.98060755, 3.72745618, 4.06289067, 3.18461613, 2.79760641,
3.16032156, 2.40107766, 3.05992469, 2.50326614, 3.14948545,
2.0258538 , 2.39031143, 2.5313014 , 2.06376862, 3.25822173,
3.8331732 , 2.91792523, 4.16926464, 2.99944268, 2.72893788,
2.7935679 , 3.48603414, 3.09625424, 3.2274822 , 3.1877698 ,
3.1412397 , 3.34430832, 2.67296499, 2.60448657, 3.05688571,
1.89781436, 2.80615316, 3.03905493, 3.21732211, 2.68923666,
2.68586496, 2.92857837, 3.68705538, 3.5105518 , 2.78635346,
3.16770217, 2.09719915, 3.14931621, 3.16040971, 2.34756506,
2.97929857, 2.58705337, 3.40824713, 3.39293322, 3.9374141 ,
2.37849792, 3.00871069, 2.72592051, 3.18156876, 2.13153741,
3.01812279, 3.11791498, 2.17607694, 3.1761826 , 2.8550817 ,
3.16434755, 3.1920135 , 2.83936959, 2.60467229, 3.84780699,
3.17861604, 2.99417862, 3.28979924, 2.95488478, 2.30669607,
2.95799886, 2.77125492, 3.52502281, 3.42332299, 3.06387125,
3.36793637, 3.18167001, 3.10368359, 3.64836723, 2.24413845,
2.85459664, 3.42208017, 3.15932066, 1.98435782, 4.06736121,
3.51295149, 3.32246528, 3.70481761, 3.87743913, 3.3851674 ,
2.7729928 , 3.05428736, 2.68014413, 3.71865779, 3.56637636,
3.10693693, 2.69362507, 2.90795588, 2.49206221, 2.38230613,
2.80610236, 2.2751092 , 3.35312855, 3.19958008, 2.85885598,
2.81472787, 2.97602789, 2.24323176, 3.53810023, 2.57869998,
2.54667703, 2.7750127 , 2.93120167, 2.02760259, 3.13390563,
3.0352923 , 2.46270591, 2.86611534, 3.01681107, 3.25683056,
3.33614346, 3.29854881, 3.0633859 , 3.03701775, 3.13264911,
3.27237022, 1.99578218, 2.96019533, 3.07517058, 2.75037391,
2.69742165, 3.05560167, 2.7811108 , 3.29256002, 3.09761597,
2.94515573, 3.67508548, 2.79910461, 2.68429918, 2.35135477,
1.78037451, 2.06058057, 2.78031096, 2.96887932, 2.45162779,
2.34716929, 2.60769396, 3.32031932, 2.83096498, 3.96318253,
1.95025452, 2.96047266, 2.60408857, 3.01211478, 2.76417688,
3.12770243, 2.8292839 , 3.48226775, 2.66415284, 3.60667452,
3.22824346, 1.98119445, 3.04303888, 2.61650804, 3.09453429,
3.77630571, 3.30109344, 3.12562853, 3.08593669, 3.42617757,
3.40788194, 3.81374563, 2.0364313 , 2.90612445, 2.9860682 ,
2.4440315 , 2.84854671, 3.02571857, 3.0058111 , 3.36237316,
2.92237107, 2.8212406 , 2.68600375, 3.06650889, 2.71026894,
2.48229778, 2.78905095, 3.05137147, 2.30409362, 3.14095599,
1.9830805 , 3.31410349, 2.8334792 , 3.41044366, 3.71151193,
2.94174763, 2.53802189, 3.32556579, 2.59622144, 2.7976202 ,
3.16463529, 2.89358632, 3.666639 , 2.59841178, 3.10452024,
3.96164453, 2.62040129, 3.54119895, 3.26069901, 2.27431143,
2.36339918, 2.48121324, 3.17323898, 2.35699081, 3.76301981,
3.24130716, 2.5939703 , 2.64012962, 3.26218151, 3.40286614,
3.51389568, 3.20716058, 3.20192922, 2.69489033, 3.37639684,
2.83542016, 3.69721289, 2.76678089, 3.25447156, 3.3156072 ,
2.82490969, 3.33686767, 3.06560594, 2.67029599, 2.39938476,
3.33374851, 3.31654043, 3.1761943 ])
18
In [21]:
if(y_pred.all()<2.5):
y_pred=np.round(y_pred-0.5)
else:
y_pred=np.round(y_pred+0.5)
y_pred
Out[21]:
array([2., 3., 3., 3., 2., 2., 2., 2., 3., 2., 3., 3., 2., 2., 2., 2., 2.,
2., 3., 2., 3., 2., 3., 2., 2., 3., 2., 3., 3., 2., 2., 3., 2., 3.,
3., 2., 2., 4., 3., 2., 2., 3., 2., 3., 3., 2., 2., 2., 2., 3., 2.,
3., 4., 3., 2., 3., 2., 3., 2., 3., 2., 2., 2., 2., 3., 3., 2., 4.,
2., 2., 2., 3., 3., 3., 3., 3., 3., 2., 2., 3., 1., 2., 3., 3., 2.,
2., 2., 3., 3., 2., 3., 2., 3., 3., 2., 2., 2., 3., 3., 3., 2., 3.,
2., 3., 2., 3., 3., 2., 3., 2., 3., 3., 2., 2., 3., 3., 2., 3., 2.,
2., 2., 2., 3., 3., 3., 3., 3., 3., 3., 2., 2., 3., 3., 1., 4., 3.,
3., 3., 3., 3., 2., 3., 2., 3., 3., 3., 2., 2., 2., 2., 2., 2., 3.,
3., 2., 2., 2., 2., 3., 2., 2., 2., 2., 2., 3., 3., 2., 2., 3., 3.,
3., 3., 3., 3., 3., 3., 1., 2., 3., 2., 2., 3., 2., 3., 3., 2., 3.,
2., 2., 2., 1., 2., 2., 2., 2., 2., 2., 3., 2., 3., 1., 2., 2., 3.,
2., 3., 2., 3., 2., 3., 3., 1., 3., 2., 3., 3., 3., 3., 3., 3., 3.,
3., 2., 2., 2., 2., 2., 3., 3., 3., 2., 2., 2., 3., 2., 2., 2., 3.,
2., 3., 1., 3., 2., 3., 3., 2., 2., 3., 2., 2., 3., 2., 3., 2., 3.,
3., 2., 3., 3., 2., 2., 2., 3., 2., 3., 3., 2., 2., 3., 3., 3., 3.,
3., 2., 3., 2., 3., 2., 3., 3., 2., 3., 3., 2., 2., 3., 3., 3.])
In [22]:
df1=(y_pred-y_test)/y_test
df1=round(df1.mean()*100,2)
print("Error = ",df1,"%")
Error = 12.16 %
In [23]:
a=100-df1
print("Accuracy= ",a,"%")
Accuracy= 87.84 %
In [24]:
print("Error = ",df1,"%")
print("Accuracy= ",a,"%")
Error = 12.16 %
Accuracy= 87.84 %
19
SCREENSHOTS AND RESULTS
20
CONCLUSION AND FUTURE ENHANCEMENTS
Conclusion: In the system, it has been concluded that we develop the traffic flow
prediction system by using a machine learning algorithm. By using regression model,
the prediction is done. The public gets the benefits such as the current situation of the
traffic flow, they can also check what will be the flow of traffic on the right after one
hour of the situation and they can also know how the roads are as they can know mean
of the vehicles passing through a particular junction that is 4 here. The weather
conditions have been changing from years to years. The cost of fuel is also playing a
major role in the transportation system. Many people are not able to afford the vehicle
because of the fuel cost. So, there can be many variations in the traffic data. There is
one more scenario where people prefer going on their own vehicle without carpooling,
this also matters in the traffic congestion. So, this prediction can help judging the traffic
flow by comparing them with these 2 years data sets. The forecasting or the prediction
can help people or the users in judging the road traffic easier beforehand and even they
can decide which way to go using their navigator and also this will prediction will be
also helpful.
Future: Work In the future, the system is often further improved using more factors
that affect traffic management using other methods like deep learning, artificial neural
network, and even big data. The users can then use this technique to seek out which
route would be easiest to achieve on destination. The system can help in suggesting the
users with their choice of search and also it can help to find the simplest choice where
traffic isn't in any crowded environment. Many forecasting methods have already been
applied in road traffic jam forecasting. While there's more scope to create the congestion
prediction more precise, there are more methods that give precise and accurate results
from the prediction. Also, during this period, the employment of the increased available
traffic data by applying the newly developed forecasting models can improve the
prediction accuracy. These days, traffic prediction is extremely necessary for pretty
21
much a part of the state and also worldwide. So, this method of prediction would be
helpful in predicting the traffic before and beforehand. For better congestion prediction,
the grade and accuracy are prominent in traffic prediction. within the future, the
expectation is going to be the estimation of established order accuracy prediction with
much easier and user-friendly methods so people would find the prediction model useful
and that they won’t be wasting their time and energy to predict the information. There
will be some more accessibility like weather outlook, GPS that's the road and accident-
prone areas will be highlighted in order that people wouldn't prefer using the paths
which aren't safe and simultaneously they'll predict the traffic. This will be done by deep
learning, big data, and artificial neural networks.
22
REFERENCES
1. Big data-driven machine learning-enabled traffic flow prediction Anhui Kong3, 2018.
2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845248/
3. https://jupyter.org/
4. Identification Traffic Flow Prediction Parameters Anuchit Ratanaparadorn
Department
5. https://www.kaggle.com/fedesoriano/traffic-prediction-dataset
6. https://www.hindawi.com/journals/jat/2021/8878011/
7. https://machinelearningmastery.com/how-to-connect-model-input-data-
withpredictions-for-machine-learning/
8. https://www.shanelynn.ie/pandas-iloc-loc-select-rows-and-columns-dataframe/
9. https://matplotlib.org/2.0.2/api/pyplot_api.html
10. https://www.catalyzex.com/s/Traffic%20Prediction
11. https://www.geeksforgeeks.org/formatting-dates-in-python/
12. https://www.scitepress.org/Papers/2016/58957/pdf/index.ht
23