A Machine Learning Model For Flight Delay Prediction: Certificate
Certificate
Project group: SAURABH KUMAR, SAMEER AKHTER, PAYAL KUMARI, RAMESH KUMAR
Under my guidance and supervision the synopsis of the project
_____________________________________________________________of 4th
year Information and technology is submitted.
ACKNOWLEDGEMENT
I owe a deep sense of gratitude to my respected mentor, Prof. ANUPAM
BERA, Department of Information and Technology, Netaji Subhash
Engineering College, Kolkata, for his meticulous and expert guidance,
constructive criticism, patient hearing and benevolent behaviour throughout
the present research. I shall remain grateful to him for his
cordial, cooperative attitude and his wise and knowledgeable counsel, which
acted as an impetus in the successful completion of my project titled MACHINE
LEARNING MODEL FOR FLIGHT DELAY PREDICTION.
I would particularly like to thank the Head of the Department for giving me
guidance and inspiration during my study in the department. I will never forget
the kind help extended by the HOD, nor the help provided by all the
faculty members.
Last but not least, my friends in the department deserve some words
of thanks.
CONTENT
Abstract
Introduction
Proposed Work
5.1 Classification
System Design
The various modules of the project would be divided into the segments as described.
I. Data Collection
II. Pre Processing
III. Training the Machine
IV. Data Scoring
Conclusion
Future work
References
Abstract
Growth in the aircraft industry has resulted in air-traffic congestion, causing
flight delays. Flight delays not only have an economic impact but also
Introduction
Project Goals and Scope
will focus exclusively on predicting flight delay. The project will
prediction. More so, the project will analyse the accuracies of these
prediction.
1. Origin
2. Dest
3. Unique_Carrier
4. Day_of_Week
5. Dep_Hour
6. Arr_Delay.
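
As a rough sketch of how the attributes listed above might be pulled out of a raw record, the following Python snippet keeps the five predictor columns and derives a binary label from Arr_Delay. It assumes that Arr_Delay is the quantity being predicted and that it is measured in minutes; the 15-minute cut-off, the function name to_example and the sample record are illustrative assumptions, not details taken from the project.

```python
# Column names follow the attribute list above.
FEATURES = ["Origin", "Dest", "Unique_Carrier", "Day_of_Week", "Dep_Hour"]
TARGET = "Arr_Delay"

# Assumed cut-off in minutes; the report does not state a threshold.
DELAY_THRESHOLD_MIN = 15


def to_example(row):
    """Split one raw record into (feature dict, binary delayed label)."""
    features = {name: row[name] for name in FEATURES}
    delayed = 1 if float(row[TARGET]) > DELAY_THRESHOLD_MIN else 0
    return features, delayed


# Hypothetical raw record; the extra Tail_Num column is dropped.
sample = {
    "Origin": "JFK", "Dest": "LAX", "Unique_Carrier": "AA",
    "Day_of_Week": 3, "Dep_Hour": 17, "Arr_Delay": "22",
    "Tail_Num": "N123AA",
}
features, delayed = to_example(sample)
```

Keeping the column names in one list makes it easy to change which attributes are used without touching the rest of the pipeline.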
4.2 Tools
look like pseudo code. This is useful when pseudo code given in
Proposed Work
5.1 Classification
Classification is an instance of supervised learning where a set is
values or the data are given, classification draws some conclusion from
the observed values. If more than one input is given, then classification
will try to predict one or more outcomes for the same. A few classifiers
that are used here for flight delay prediction include the random
take the aggregate decision of a random subset of decision trees and yield a
final class or result based on the votes of that random subset of decision
trees.
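
The voting step described above can be illustrated with a minimal Python sketch. It assumes each tree has already produced a class label for one flight; the function name majority_vote and the label strings are hypothetical. Note that some libraries combine trees by averaging class probabilities rather than counting hard votes, so this shows the idea rather than any particular implementation.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class label predicted by the largest number of trees."""
    # most_common(1) gives [(label, count)] for the top label; on a tie,
    # the label that appeared first in the input is returned.
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical predictions from five trees for a single flight.
votes = ["delayed", "on_time", "delayed", "delayed", "on_time"]
final_class = majority_vote(votes)
```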
Parameters
weighted fraction of the sum total of weights of all the input samples
System Design
The first step is the conversion of this raw data into processed data.
This is done using feature extraction, since in the raw data collected
there are multiple attributes but only a few of those attributes are
extraction, where the key attributes are extracted from the whole list of
attributes available in the raw dataset. Feature extraction starts from an
ease of management, while still precisely and totally depicting the first
classification process wherein the data that was obtained after feature
belongs. The training data set is used to train the model whereas the
test data is used to predict the accuracy of the model. The splitting is
done in a way that the training data maintains a higher proportion than the
decision trees to analyze the data. In layman's terms, from the total
look for specific attributes in the data. This is known as data splitting. In
this case, since the end goal of our proposed system is to predict the
System Architecture
I. Data Collection
Data collection is a very basic module and the initial step towards the
project. It generally deals with the collection of the right dataset. The
the dataset by adding more data that are external. Our data mainly
analyzing the Kaggle dataset and according to the accuracy, we will be
using the model with the data to analyze the predictions accurately.
looking for categorical values, splitting the data-set into training and
test sets, and finally performing feature scaling to limit the range of variables so
touch up the test data. The training sets are used to tune and fit the
models. The test sets are untouched, as a model should not be judged
the model using the training data. Tuning models are meant to
score, for individual sets of hyperparameters. Then, we select the best
some initial values with the dataset and then optimize the parameters
the optimal values. Thus, we take the predictions from the trained
model on the inputs from the test dataset. Hence, the dataset is divided in the
ratio of 80:20, where 80% is for the training set and the remaining 20% for a
as scoring the data. The technique used to process the dataset is the
results. The last module thus describes how the result of the model can
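
The 80:20 division and the scoring of predictions on the held-out data mentioned in this section can be sketched in plain Python as follows. The function names split_80_20 and accuracy, the fixed random seed and the toy data are assumptions made for illustration; the project's actual tooling may differ.

```python
import random


def split_80_20(rows, seed=0):
    """Shuffle the rows and return (training set, test set) in an 80:20 ratio."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split repeatable
    cut = len(rows) * 80 // 100        # integer arithmetic avoids rounding issues
    return rows[:cut], rows[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy data: 100 record ids, and four true/predicted label pairs.
train, test = split_80_20(range(100))
score = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Shuffling before cutting avoids a split that follows the order in which the records were collected.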
Conclusion
decision tree and can predict fairly accurately whether a flight’s arrival
will be delayed or not.
Future work
References:
[1] C. Cetek, E. Cinar, F. Aybek, and A. Cavcar, “Capacity and delay analysis for
Aerospace Technology, vol. 86, no. 1, pp. 43–55, 2013. [Online]. Available:
https://doi.org/10.1108/AEAT-04-2012-0058
References
1. https://www.researchgate.net/publication/315382748_A_Review_on_Flight_Delay_Prediction
2.