
A Machine Learning Model For Flight Delay Prediction: Certificate

This document provides a synopsis for a project that aims to build a machine learning model to predict flight delays. It will apply classification algorithms like decision trees and random forest classifiers to historical flight data to determine if a given flight's arrival will be delayed or not. The goals are to improve understanding of flight delays and help customers. It will focus only on prediction and not solutions. The data will come from publicly available sources and Python will be used for analysis and modeling. Key steps include data preprocessing, training a model, and evaluating accuracy on test data.

Uploaded by

Ramesh Kumar

PROJECT SYNOPSIS ON

A Machine Learning Model for


Flight Delay Prediction
By

PAYAL KUMARI (10900216037)

Under the guidance of:


(ANUPAM BERA)

Department of Information and Technology.

Netaji Subhash Engineering College

Garia, Kolkata – 700152

Certificate
Project group: SAURABH KUMAR, SAMEER AKHTER, PAYAL KUMARI, RAMESH KUMAR
Under my guidance and supervision, the synopsis of the project
_____________________________________________________________ of 4th
year Information and Technology is submitted.

(signature of Project guide)


--------------------------------------
ANUPAM BERA
Information and Technology
Netaji Subhash Engineering College
Garia, Kolkata - 700152

ACKNOWLEDGEMENT
I owe a deep sense of gratitude to my respected mentor, Prof. ANUPAM BERA, Department of Information and Technology, Netaji Subhash Engineering College, Kolkata, for his meticulous and expert guidance, constructive criticism, patient hearing and benevolent behaviour throughout the present research. I shall remain grateful to him for his cordial, cooperative attitude and his wise and knowledgeable counsel, which acted as an impetus in the successful completion of my project titled MACHINE LEARNING MODEL FOR FLIGHT DELAY PREDICTION.
I would particularly like to thank the Head of the Department for giving me guidance and inspiration during my study in the department. I will never forget the kind help extended by the HOD, nor the help provided by all the faculty members.
Last but not least, my friends in the department deserve some words of thanks.

CONTENT

Abstract
Introduction
Project Goals and Scope
Data and Tools
    4.1 Data Used
        4.1.1 Choosing the Dataset
    4.2 Tools
        Python and associated packages
Proposed Work
    5.1 Classification
System Design
    I. Data Collection
    II. Pre Processing
    III. Training the Machine
    IV. Data Scoring
Conclusion
Future work
References

Abstract
Growth in the aviation industry has resulted in air-traffic congestion, causing flight delays. Flight delays have not only an economic impact but also harmful environmental effects. Air-traffic management is becoming increasingly challenging. In this project I apply machine learning algorithms such as decision tree classifiers to predict whether a given flight's arrival will be delayed or not.

Introduction

Delay is one of the most remembered performance indicators of any transportation system. In commercial aviation, delay is understood as the period by which a flight is late or postponed. A delay may therefore be represented by the difference between the scheduled and actual times of departure or arrival of a plane. National regulatory authorities maintain a multitude of indicators related to tolerance thresholds for flight delays. Flight delay is thus an essential subject in the context of air transportation systems. In 2013, 36% of flights were delayed by more than five minutes in Europe, 31.1% of flights were delayed by more than 15 minutes in the United States, and 16.3% of flights were cancelled or suffered delays greater than 30 minutes in Brazil. These figures show how relevant this indicator is, regardless of the scale of the airline network.
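The definition of delay as the difference between scheduled and actual times can be sketched in a few lines of Python. This is only an illustration: the function names are my own, the timestamps are made up, and the 15-minute threshold is an assumption borrowed from the United States figure quoted above, not a value fixed by this project.

```python
from datetime import datetime

def delay_minutes(scheduled: datetime, actual: datetime) -> float:
    """Delay = actual time minus scheduled time, in minutes (negative if early)."""
    return (actual - scheduled).total_seconds() / 60

def is_delayed(scheduled: datetime, actual: datetime, threshold_minutes: float = 15) -> bool:
    """Treat a flight as delayed when it exceeds the assumed tolerance threshold."""
    return delay_minutes(scheduled, actual) > threshold_minutes

# Made-up example times for a single arrival.
scheduled = datetime(2013, 6, 1, 14, 30)
actual = datetime(2013, 6, 1, 14, 52)

print(delay_minutes(scheduled, actual))  # 22.0
print(is_delayed(scheduled, actual))     # True
```

A different regulator's threshold can be tried by passing another value for threshold_minutes.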

Project Goals and Scope

A chief goal of this project is to add to the academic understanding of flight delay prediction. The hope is that, with a greater understanding of how flights come to be delayed, customers will be better equipped to avoid delays.

It is important here to define the scope of the project. This project will focus exclusively on predicting flight delay. The project will make no attempt to decide how much money to allocate to each prediction. Rather, the project will analyse the accuracy of these predictions.

Data and Tools

4.1 Data Used

4.1.1 Choosing the Dataset

We have selected a dataset available on kaggle.com. The features contained in the dataset are as follows:

1. Origin
2. Dest
3. Unique_Carrier
4. Day_of_Week
5. Dep_Hour
6. Arr_Delay
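As a rough sketch of how these six columns could be read and turned into a binary target, the snippet below parses a few rows with the standard csv module. The sample rows are invented for illustration and are not taken from the Kaggle file, and the 15-minute cut-off on Arr_Delay is an assumption, since the synopsis does not fix one.

```python
import csv
import io

# Invented sample rows with the six columns listed above (not real Kaggle data).
SAMPLE_CSV = """Origin,Dest,Unique_Carrier,Day_of_Week,Dep_Hour,Arr_Delay
JFK,LAX,AA,1,9,-4
ORD,SFO,UA,5,18,37
ATL,MIA,DL,3,7,12
"""

DELAY_THRESHOLD_MINUTES = 15  # assumed cut-off, not specified in the synopsis

def load_flights(text):
    """Parse CSV text and add a binary Delayed label derived from Arr_Delay."""
    flights = []
    for record in csv.DictReader(io.StringIO(text)):
        arr_delay = float(record["Arr_Delay"])
        flights.append({
            "Origin": record["Origin"],
            "Dest": record["Dest"],
            "Unique_Carrier": record["Unique_Carrier"],
            "Day_of_Week": int(record["Day_of_Week"]),
            "Dep_Hour": int(record["Dep_Hour"]),
            "Arr_Delay": arr_delay,
            "Delayed": int(arr_delay > DELAY_THRESHOLD_MINUTES),
        })
    return flights

flights = load_flights(SAMPLE_CSV)
print([f["Delayed"] for f in flights])  # [0, 1, 0]
```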

4.2 Tools

Python and associated packages


Python was the language of choice for this project. This was an easy decision for multiple reasons.

1. Python as a language has an enormous community behind it. Any problems that might be encountered can usually be solved with a trip to Stack Overflow. Python is among the most popular languages on the site, which makes it very likely there will be a direct answer to any query.

2. Python has an abundance of powerful tools ready for scientific computing. Packages such as NumPy, Pandas, and SciPy are freely available, performant, and well documented. Packages such as these can dramatically reduce and simplify the code needed to write a given program. This makes iteration quick.

3. Python as a language is forgiving and allows for programs that look like pseudocode. This is useful when pseudocode given in academic papers needs to be implemented and tested. Using Python, this step is usually reasonably trivial.

Proposed Work

This project uses classification.

5.1 Classification
Classification is an instance of supervised learning in which a set of observations is analysed and categorized based on a common attribute. From the values or data given, classification draws a conclusion about the observed value: given one or more inputs, a classifier tries to predict one or more outcomes for them. The classifiers used here for flight delay prediction include the random forest classifier and the SVM classifier.

Random Forest Classification and Logistic Regression

Random Forest Classifier

A random forest classifier is a type of ensemble classifier and also a supervised learning algorithm. It creates a set of decision trees, each of which yields a result. The basic approach of the random forest classifier is to aggregate the decisions of trees built on random subsets and to yield a final class based on the votes of those trees.
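The voting step can be illustrated with a toy example. The three "trees" below are hand-written rules standing in for trained decision trees, and the carrier code "XX" is a placeholder; none of them is learned from the project's data. The point is only how individual votes are aggregated into one class.

```python
from collections import Counter

# Hand-written stand-ins for trained decision trees (hypothetical rules).
def tree_a(flight):
    return int(flight["Dep_Hour"] >= 17)          # evening departures

def tree_b(flight):
    return int(flight["Day_of_Week"] in (5, 7))   # assumed busy days

def tree_c(flight):
    return int(flight["Unique_Carrier"] == "XX")  # placeholder carrier code

def forest_predict(trees, flight):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(tree(flight) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
evening_flight = {"Dep_Hour": 18, "Day_of_Week": 5, "Unique_Carrier": "AA"}
morning_flight = {"Dep_Hour": 8, "Day_of_Week": 2, "Unique_Carrier": "AA"}

print(forest_predict(trees, evening_flight))  # 1 (two of three trees vote "delayed")
print(forest_predict(trees, morning_flight))  # 0
```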

Parameters

The parameters of the random forest classifier include n_estimators, the total number of decision trees, and other hyperparameters such as oob_score, which controls whether out-of-bag samples are used to estimate the generalization accuracy of the random forest, and max_features, the number of features considered when looking for the best split. min_weight_fraction_leaf is the minimum weighted fraction of the sum total of weights of all the input samples required to be at a leaf node. Samples have equal weight when sample weight is not provided.
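To make these parameters more concrete, the sketch below works through the ideas behind them using only the standard library. It is not the classifier's implementation: the helper names are my own, the square-root rule for max_features is just one common choice that I assume here, and the sample counts are made up.

```python
import math
import random

def bootstrap_and_oob(n_samples, seed):
    """Draw a bootstrap sample (with replacement) and list the rows left out.

    The left-out ("out-of-bag") rows are what an OOB score is computed on.
    """
    rng = random.Random(seed)
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(drawn))
    return drawn, out_of_bag

def min_leaf_weight(sample_weights, min_weight_fraction_leaf):
    """Smallest total weight a leaf must hold for a split to be allowed."""
    return min_weight_fraction_leaf * sum(sample_weights)

def features_per_split(n_features):
    """Assumed square-root rule for the number of features tried per split."""
    return max(1, int(math.sqrt(n_features)))

drawn, out_of_bag = bootstrap_and_oob(n_samples=200, seed=0)
equal_weights = [1.0] * 200  # equal weights when no sample weight is given

print(len(out_of_bag), "rows were never drawn for this tree")
print(min_leaf_weight(equal_weights, 0.05))  # about 10 samples' worth of weight
print(features_per_split(5))                 # 2 of the 5 input features
```

With equal weights, a min_weight_fraction_leaf of 0.05 on 200 samples therefore means that each leaf must contain roughly 10 samples.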

System Design

The first step is the conversion of the raw data into processed data. This is done using feature extraction, since the raw data collected contains many attributes but only a few of them are useful for prediction. In this step the key attributes are extracted from the whole list of attributes available in the raw dataset. Feature extraction starts from an initial set of measured data and builds derived values, or features. These features are intended to be informative and non-redundant, facilitating the subsequent learning and generalization steps. Feature extraction is a dimensionality reduction process, in which the initial set of raw variables is reduced to a more manageable set of features that still describes the original data set accurately and completely.

Feature extraction is followed by classification, the problem of identifying to which of a set of categories a new observation belongs. For this, the data obtained after feature extraction is split into two distinct segments: the training data set is used to train the model, whereas the test data set is used to estimate the accuracy of the model. The split is made so that the training data forms a larger proportion than the test data. The random forest algorithm uses a collection of random decision trees to analyse the data. In layman's terms, out of the total number of decision trees in the forest, each group of trees looks at specific attributes in the data. In this case, the end goal of the proposed system is to predict flight delay from historical data.
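A minimal sketch of the feature extraction step is given below. It keeps only the key attributes and derives Dep_Hour from a raw departure time. The raw column names Dep_Time, Tail_Num and Flight_Num, the HHMM time format and the sample values are assumptions made for illustration; the actual raw dataset may be laid out differently.

```python
# Attributes copied straight from the raw record (see the Data Used section).
KEY_ATTRIBUTES = ["Origin", "Dest", "Unique_Carrier", "Day_of_Week", "Arr_Delay"]

def extract_features(raw_record):
    """Keep the key attributes and derive Dep_Hour from a raw departure time."""
    features = {name: raw_record[name] for name in KEY_ATTRIBUTES}
    # Dep_Time is assumed to be an HHMM string such as "1835".
    features["Dep_Hour"] = int(raw_record["Dep_Time"][:2])
    return features

raw = {
    "Origin": "JFK", "Dest": "LAX", "Unique_Carrier": "AA",
    "Day_of_Week": 5, "Dep_Time": "1835", "Arr_Delay": 37,
    "Tail_Num": "N000XX", "Flight_Num": "101",  # hypothetical extra columns
}

features = extract_features(raw)
print(features)
```

The extra columns are dropped and the six remaining features match the list given under Data Used.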

System Architecture

The project is divided into the modules described below.

I. Data Collection
Data collection is a very basic module and the initial step of the project. It deals with collecting the right dataset. The dataset to be used for prediction has to be filtered on various aspects. Data collection also covers enhancing the dataset by adding external data. Our data mainly consists of the previous year's flight timetable. Initially we will analyse the Kaggle dataset and, depending on the accuracy obtained, use the model with this data to make predictions.

II. Pre Processing


Data pre-processing is a part of data mining that involves transforming raw data into a more coherent format. Raw data is usually inconsistent or incomplete and often contains many errors. Data pre-processing involves checking for missing values, identifying categorical values, splitting the dataset into training and test sets, and finally applying feature scaling to limit the range of the variables so that they can be compared on a common scale.
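Three of these pre-processing steps can be sketched with plain Python as follows. The helper names and the sample rows are my own, and the choices made here (dropping incomplete rows, integer codes for categories, min-max scaling) are assumptions; the synopsis does not say which variants will be used.

```python
def drop_missing(rows):
    """Remove rows in which any value is empty or None."""
    return [r for r in rows if all(v not in ("", None) for v in r.values())]

def encode_categories(rows, column):
    """Replace each category in a column with an integer code."""
    codes = {value: i for i, value in enumerate(sorted({r[column] for r in rows}))}
    return [{**r, column: codes[r[column]]} for r in rows], codes

def min_max_scale(rows, column):
    """Rescale a numeric column to the range 0..1."""
    values = [float(r[column]) for r in rows]
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid dividing by zero for a constant column
    return [{**r, column: (float(r[column]) - low) / span} for r in rows]

# Invented sample rows; the third one has a missing carrier.
rows = [
    {"Unique_Carrier": "UA", "Dep_Hour": 6},
    {"Unique_Carrier": "AA", "Dep_Hour": 18},
    {"Unique_Carrier": "", "Dep_Hour": 12},
    {"Unique_Carrier": "DL", "Dep_Hour": 12},
]

clean = drop_missing(rows)
encoded, codes = encode_categories(clean, "Unique_Carrier")
scaled = min_max_scale(encoded, "Dep_Hour")

print(codes)                             # {'AA': 0, 'DL': 1, 'UA': 2}
print([r["Dep_Hour"] for r in scaled])   # [0.0, 1.0, 0.5]
```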

III. Training the Machine


Training the machine means feeding data to the algorithm so that it can fit a model. The training sets are used to tune and fit the models. The test sets are left untouched, as a model should be judged only on data it has not seen. Training the model includes cross-validation, which gives a well-grounded estimate of the model's performance using only the training data. Tuning a model means tuning hyperparameters such as the number of trees in a random forest. We perform the entire cross-validation loop for each set of hyperparameter values and calculate a cross-validated score for each set. Then we select the best hyperparameters. The idea behind training the model is that we start with some initial values, fit them to the dataset, and then optimize the parameters of the model. This is repeated until we get the optimal values. Finally, we take the predictions from the trained model on the inputs from the test dataset. For this purpose the dataset is divided in the ratio 80:20, where 80% is the training set and the remaining 20% is the testing set.
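The 80:20 split and the cross-validation loop can be sketched as below. To keep the example short and self-contained, a tiny nearest-neighbour rule on the departure hour stands in for the random forest, and its number of neighbours plays the role of the hyperparameter being tuned. The data is synthetic and all function names are my own.

```python
import random
from collections import Counter

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def k_folds(rows, k):
    """Yield (fit_part, validation_part) pairs for k-fold cross-validation."""
    for i in range(k):
        validation = rows[i::k]
        fit_part = [r for j, r in enumerate(rows) if j % k != i]
        yield fit_part, validation

def knn_predict(fit_part, dep_hour, n_neighbours):
    """Majority label among the rows with the closest departure hours."""
    nearest = sorted(fit_part, key=lambda r: abs(r[0] - dep_hour))[:n_neighbours]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cv_score(train, n_neighbours, k=5):
    """Mean validation accuracy over the k folds, using training data only."""
    scores = []
    for fit_part, validation in k_folds(train, k):
        hits = sum(knn_predict(fit_part, h, n_neighbours) == y for h, y in validation)
        scores.append(hits / len(validation))
    return sum(scores) / len(scores)

# Synthetic rows (dep_hour, delayed): later flights are delayed more often.
rng = random.Random(42)
rows = []
for _ in range(200):
    hour = rng.randrange(24)
    rows.append((hour, int(rng.random() < hour / 24)))

train, test = train_test_split(rows)          # 160 training rows, 40 test rows
candidates = [1, 5, 15]                       # hyperparameter values to compare
best = max(candidates, key=lambda n: cv_score(train, n))

# The test set is used once, after the hyperparameter has been chosen.
test_accuracy = sum(knn_predict(train, h, best) == y for h, y in test) / len(test)
print(best, test_accuracy)
```

The same structure would apply to a random forest: only the model and the list of candidate hyperparameter values change.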

IV. Data Scoring


The process of applying a predictive model to a set of data is referred to as scoring the data. The technique used to process the dataset is the random forest algorithm. Random forest is an ensemble method that is commonly used for classification as well as regression. Based on the learning models, we obtain the prediction results. The last module thus describes how the result of the model can help to predict the probability of a flight delay based on certain parameters. It also shows the vulnerabilities of a particular entity. User authentication is implemented to make sure that only authorized entities access the results.
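Scoring and evaluating a model can be sketched as follows. The model here is a hypothetical one-line rule rather than the trained random forest, and the records and actual labels are invented; the sketch only shows how predictions are produced for a set of data and compared with the known outcomes.

```python
def score(model, records):
    """Apply the model to every record and return the predicted labels."""
    return [model(r) for r in records]

def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def confusion_counts(predicted, actual):
    """Count true/false positives and negatives (1 = delayed)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        key = ("t" if p == a else "f") + ("p" if p == 1 else "n")
        counts[key] += 1
    return counts

# Hypothetical stand-in for a trained model: evening departures are "delayed".
def evening_rule(record):
    return int(record["Dep_Hour"] >= 17)

records = [{"Dep_Hour": 8}, {"Dep_Hour": 18}, {"Dep_Hour": 20}, {"Dep_Hour": 10}]
actual = [0, 1, 0, 1]  # invented ground truth

predicted = score(evening_rule, records)
print(predicted)                            # [0, 1, 1, 0]
print(accuracy(predicted, actual))          # 0.5
print(confusion_counts(predicted, actual))  # one of each: tp, fp, tn, fn
```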

Conclusion

In this project, I successfully apply machine learning algorithms to predict flight arrival delay and show that simple classifiers such as decision trees can predict fairly accurately whether a flight's arrival will be delayed or not.

Future work

In future work I would like to improve my model further, perhaps with more training data, a deeper neural network, or both, in order to improve the accuracy.

References

[1] C. Cetek, E. Cinar, F. Aybek, and A. Cavcar, “Capacity and delay analysis for airport manoeuvring areas using simulation,” Aircraft Engineering and Aerospace Technology, vol. 86, no. 1, pp. 43–55, 2013. [Online]. Available: https://doi.org/10.1108/AEAT-04-2012-0058

[2] K. B. Nogueira, P. H. Aguiar, and L. Weigang, “Using ant algorithm to arrange taxiway sequencing in airport,” International Journal of Computer Theory and Engineering, vol. 6, no. 4, p. 357, 2014.

[3] R. R. Clewlow, I. Simaiakis, and H. Balakrishnan, “Impact of arrivals on departure taxi operations at airports,” 2010.

[4] https://www.researchgate.net/publication/315382748_A_Review_on_Flight_Delay_Prediction
