
Predicting Flight Delays

The team analyzed flight delay data to build a predictive model of arrival delays. Their best model used departure delay, flight distance, and air time to predict arrival delay, reaching an adjusted R² of about 0.88. Adding other variables, such as departure and destination airports, airline, and dates, did not significantly improve the model. Weather, especially snowfall, appeared to have the largest impact on delays, but it could not be reliably predicted or included in the model. The key drivers of arrival delay were departure delay and flight characteristics.

Uploaded by

salvador lozano
Copyright © All Rights Reserved

EMBA WEEKLY TEAM 7

Predicting flight delays:


We first organized the data we received so we could understand it better. We computed the average arrival delay per day and plotted a trend line, which showed that arrival delay decreased as the year went by. We then tried to find a logical reason behind this: perhaps at the beginning of the year people are more enthusiastic and have more money from holiday bonuses, so they travel more and there are more flights per day.
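A minimal sketch of this first step (average arrival delay per day, then the slope of a least-squares trend line) is below. It assumes each record is a (date, arrival delay in minutes) pair, with None for flights that have no arrival record; the function names and the toy data are our own illustration, not the team's actual workbook.

```python
from collections import defaultdict
from datetime import date


def daily_average_delay(flights):
    """Average arrival delay (minutes) per calendar day, skipping missing values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, arr_delay in flights:
        if arr_delay is None:  # assumed marker for flights with no arrival record
            continue
        totals[day] += arr_delay
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in sorted(totals)}


def trend_slope(daily):
    """Least-squares slope of the daily average delay, in minutes per day."""
    days = sorted(daily)
    xs = [(d - days[0]).days for d in days]  # days since the first date
    ys = [daily[d] for d in days]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up records: (date, arrival delay in minutes)
flights = [
    (date(2014, 1, 1), 30),
    (date(2014, 1, 1), 10),
    (date(2014, 6, 1), 5),
    (date(2014, 12, 1), 0),
    (date(2014, 12, 1), None),
]
daily = daily_average_delay(flights)
print(daily)
print(trend_slope(daily))  # a negative slope means delays fall over the year
```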
After that, we thought that departure delay might be closely linked to the departure hour, since flights in rush hours are more likely to be delayed. We ran a correlation analysis and several models with the two variables, and discovered that the model behaved exactly the same whether we used departure hour and departure delay or departure delay alone. We concluded that departure delay alone already accounts for the departure hour and the other factors that delay a flight at the departure airport.
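The correlation analysis between departure hour and departure delay can be reproduced with the Pearson coefficient. A small standard-library sketch follows; the sample hours and delays are invented for illustration. Note that a correlation on its own does not show the hour is redundant; that conclusion came from comparing models with and without it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up example: scheduled departure hour vs. departure delay (minutes)
dep_hour = [6, 9, 12, 17, 20]
dep_delay = [0, 2, 5, 15, 25]
print(round(pearson(dep_hour, dep_delay), 3))
```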
Then we ran a couple of models using the variables we considered most correlated with arrival delay, to see what we could predict from the equations we obtained.
For our first model, we saw a very strong correlation between departure delay and arrival delay, which sounds obvious, so we ran a model using these two variables and obtained an adjusted R² of .85, which was quite good.
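The first model is a simple linear regression of arrival delay on departure delay. The sketch below fits it by least squares and reports R² and adjusted R² (with one predictor, adjusted R² uses n − 2 in the denominator). The data are made up, so the numbers will not match the team's .85.

```python
def simple_ols(xs, ys):
    """Fit y = a + b*x by least squares; return (a, b, r2, adjusted_r2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - 2)  # one predictor
    return a, b, r2, adj_r2


# Made-up example: departure delay -> arrival delay (minutes)
dep = [0, 10, 20, 40, 60]
arr = [-5, 8, 15, 38, 55]
a, b, r2, adj = simple_ols(dep, arr)
print(f"arr_delay ~ {a:.1f} + {b:.2f} * dep_delay, adjusted R^2 = {adj:.3f}")
```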
So we started thinking about how to improve the model by adding other variables, for example the departure airport, since it may be linked to the traffic at each airport. These variables were not numerical, so we encoded them in a binary table that made the data easier to use:
Origin   is EWR   is JFK
EWR      1        0
JFK      0        1
other    0        0
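The binary table above is a standard dummy (one-hot) encoding with one airport left out. A minimal sketch, assuming origins are stored as airport codes; "LGA" is an assumed third origin used only to show the all-zero row, and the function name is our own.

```python
def dummy_encode(values, levels):
    """Return one 0/1 column per entry in `levels`.

    A value not listed in `levels` gets all zeros, so it acts as the
    baseline category.
    """
    return [[1 if value == level else 0 for level in levels] for value in values]


# "LGA" is an assumed third origin airport, used only for illustration.
origins = ["EWR", "JFK", "LGA"]
print(dummy_encode(origins, ["EWR", "JFK"]))  # [[1, 0], [0, 1], [0, 0]]
```

Leaving one airport as the all-zero baseline avoids perfect collinearity with the regression intercept (the "dummy variable trap"), which is why the table has two columns for three possibilities.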

After reviewing the model, we saw that this variable did not have the effect we expected, since the adjusted R² stayed about the same, and adding variables that contribute nothing just makes the model harder to understand, so we removed it.
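Deciding whether the airport dummies earn their place comes down to comparing adjusted R² with and without them. Adjusted R² is 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors, so a variable that barely raises R² can lower the adjusted value. The pure-Python ordinary least squares fit below is a sketch under those definitions (a spreadsheet or statistics package would normally do this step); the data and function names are made up.

```python
def solve(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    k = len(rhs)
    aug = [list(map(float, row)) + [float(rhs[i])] for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, k))
        solution[i] = (aug[i][k] - tail) / aug[i][i]
    return solution


def fit_ols(X, y):
    """OLS with an intercept; X is a list of predictor rows.

    Returns (coefficients, r2, adjusted_r2); coefficients[0] is the intercept.
    """
    n, p = len(X), len(X[0])
    A = [[1.0] + [float(v) for v in row] for row in X]  # prepend intercept column
    k = p + 1
    # Normal equations: (A^T A) beta = A^T y
    ata = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    aty = [sum(A[r][i] * y[r] for r in range(n)) for i in range(k)]
    beta = solve(ata, aty)
    fitted = [sum(b * a for b, a in zip(beta, row)) for row in A]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2


# Made-up data: departure delay, arrival delay, and EWR/JFK dummies
dep = [0, 10, 20, 40, 60, 5, 30, 15]
arr = [-5, 8, 15, 38, 55, 2, 27, 14]
origin_dummies = [[1, 0], [0, 1], [0, 0], [1, 0], [0, 1], [0, 0], [1, 0], [0, 1]]
_, _, adj_base = fit_ols([[d] for d in dep], arr)
_, _, adj_full = fit_ols([[d] + dums for d, dums in zip(dep, origin_dummies)], arr)
print(f"adjusted R^2 without airports: {adj_base:.3f}, with airports: {adj_full:.3f}")
```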
We thought another interesting variable could be acceleration: even if a flight left the departure airport after its scheduled time, the pilot might increase speed to make up time and give customers better service. So we computed the difference between departure delay and arrival delay, considered that flights which recovered more than 30 minutes had accelerated, and built a similar binary variable (which we linked to distance and air time) to integrate into the model, but the result remained the same.
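The acceleration variable is a threshold on recovered time: departure delay minus arrival delay greater than 30 minutes. A minimal sketch, with made-up delays and a function name of our own:

```python
def accelerated_flags(dep_delays, arr_delays, threshold=30):
    """1 if a flight made up more than `threshold` minutes between departure and arrival."""
    return [1 if (dep - arr) > threshold else 0 for dep, arr in zip(dep_delays, arr_delays)]


# Made-up delays in minutes
dep_delays = [60, 45, 10]
arr_delays = [20, 20, 0]
print(accelerated_flags(dep_delays, arr_delays))  # [1, 0, 0]
```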
Then we added the variables distance and air time directly, and the adjusted R² finally increased to .88. We thought this was a very good number, but we wanted to improve it a little more, so we looked for a way to capture the idea that when a departure was late and the distance was considerable (200 km), the airplane could accelerate. We related the flights that accelerated to the airline they belong to, and the model improved by .05.
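The report does not say exactly how accelerated flights were linked to airlines. One possible reading, sketched here as an assumption, is to compute each airline's share of accelerated flights and use it as a per-airline feature. The function name and carrier codes are hypothetical.

```python
from collections import defaultdict


def acceleration_rate_by_airline(carriers, accelerated):
    """Share of each airline's flights flagged as accelerated (0/1 flags)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for carrier, flag in zip(carriers, accelerated):
        totals[carrier] += 1
        hits[carrier] += flag
    return {c: hits[c] / totals[c] for c in sorted(totals)}


# Made-up example; carrier codes are illustrative only.
carriers = ["AA", "AA", "UA", "UA", "UA"]
accelerated = [1, 0, 1, 1, 0]
rates = acceleration_rate_by_airline(carriers, accelerated)
print(rates)
```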
We also ran a correlation analysis to find which airlines were more correlated with arrival delay, so we could separate good airlines from bad ones, still defining a delay as >= 30 min. We obtained positive correlations of only .04 and .06 for two of the airlines, which we considered too low, so we concluded that the airline does not affect arrival delay much.
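When both variables are 0/1 (an airline indicator and a "delayed >= 30 min" flag), Pearson's correlation equals the phi coefficient, which can be computed from a 2×2 table of counts. A standard-library sketch with invented data follows; it illustrates the calculation and does not reproduce the team's figures.

```python
import math


def phi_coefficient(a, b):
    """Correlation between two 0/1 lists (equal to Pearson's r on binary data)."""
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


# Made-up flights; carrier codes are illustrative only.
arr_delay = [40, 35, 0, 5, 50, 10]
carrier = ["AA", "AA", "UA", "UA", "DL", "DL"]
delayed = [1 if d >= 30 else 0 for d in arr_delay]  # delay defined as >= 30 min
for code in sorted(set(carrier)):
    is_code = [1 if c == code else 0 for c in carrier]
    print(code, round(phi_coefficient(is_code, delayed), 3))
```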
One of the few remaining variables we considered important was the date. It seems logical that if you travel on a holiday or another busy date, you are more likely to arrive late, since the airport handles more flights than on normal days. So we graphed the number of flights per day for each month and found a pattern: there were far fewer flights in January and February, and these flights had more arrival delay than flights during the rest of the year. Since we are analyzing New York, where winter hits hard, we thought this might be due to snowfall, so we added this variable to our model, but the R² stayed almost the same. We also saw that on some days of the week the delay was above the rest, so we ran a correlation analysis of day of the week and “winter” against arrival delay, but the model did not improve.
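Counting flights and averaging arrival delay by month or by day of the week only needs a group-by. A standard-library sketch, assuming (date, arrival delay) records; `summarize_by` is our own name and the records are made up. Python's `date.weekday()` returns 0 for Monday.

```python
from collections import defaultdict
from datetime import date


def summarize_by(flights, key):
    """Group (date, arrival_delay) records by key(date).

    Returns {group: (number_of_flights, average_arrival_delay)}.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for day, arr_delay in flights:
        group = key(day)
        counts[group] += 1
        totals[group] += arr_delay
    return {g: (counts[g], totals[g] / counts[g]) for g in sorted(counts)}


# Made-up records: (date, arrival delay in minutes)
flights = [(date(2014, 1, 6), 30), (date(2014, 1, 7), 10), (date(2014, 7, 7), 0)]
by_month = summarize_by(flights, lambda d: d.month)
by_weekday = summarize_by(flights, lambda d: d.weekday())  # 0 = Monday
print(by_month)  # {1: (2, 20.0), 7: (1, 0.0)}
print(by_weekday)
```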
We got a little frustrated at this point, since we had already tried many ways to integrate the different variables and none of them seemed to work. However, the correlation analysis showed that the correlation with delay was high in winter 2014, so we searched the news for the specific days with many arrival delays and found that there had been a heavy snowfall on all of them. We concluded that if snowfall is possible on the date you plan to travel, your delay is likely to increase. Then we suddenly realized that all the variables we had been analyzing were related to departure delay, so they were already represented by that variable, which was why our model was not getting any better. We therefore decided to focus only on variables that can affect arrival delay independently of departure, namely the destination airport.
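The news cross-check starts from a short list of the worst days, and a confirmed event such as snowfall can then be turned back into a 0/1 day flag. A minimal sketch: the daily averages are made up, the snow date is hypothetical, and both function names are our own.

```python
from datetime import date


def worst_days(daily_avg, k=5):
    """Return the k dates with the highest average arrival delay, worst first."""
    return sorted(daily_avg, key=daily_avg.get, reverse=True)[:k]


def event_flag(days, event_days):
    """1 for each date listed in event_days (e.g. snowfall confirmed in the news), else 0."""
    event_set = set(event_days)
    return [1 if d in event_set else 0 for d in days]


# Made-up daily averages (minutes); the snow date below is hypothetical.
daily_avg = {date(2014, 1, 2): 55.0, date(2014, 1, 3): 12.0, date(2014, 2, 13): 70.0}
candidates = worst_days(daily_avg, k=2)
print(candidates)  # dates to look up in the news
snow_days = [date(2014, 2, 13)]
print(event_flag(sorted(daily_avg), snow_days))  # [0, 0, 1]
```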
We calculated the average arrival delay for each destination airport and created a binary variable equal to 1 for airports with an average arrival delay above 10 minutes and 0 for the rest. We ran the regression again and the model remained the same, so we concluded that the destination airport does not have a significant effect on arrival delay.
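The destination variable described above can be sketched as: average the arrival delay per destination, then flag destinations above 10 minutes. The function name and sample airports are illustrative assumptions.

```python
from collections import defaultdict


def slow_destination_flags(dests, arr_delays, threshold=10.0):
    """1 for flights whose destination averages more than `threshold` minutes of arrival delay."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for dest, delay in zip(dests, arr_delays):
        totals[dest] += delay
        counts[dest] += 1
    slow = {d for d in totals if totals[d] / counts[d] > threshold}
    return [1 if dest in slow else 0 for dest in dests]


# Made-up example; airport codes are only illustrative.
dests = ["ATL", "ATL", "BOS", "ORD"]
arr_delays = [20, 10, 5, 12]
print(slow_destination_flags(dests, arr_delays))  # [1, 1, 0, 1]
```

Because this flag is built from the same arrival delays the model predicts, it is safer to compute the airport averages on one part of the data and evaluate the model on another, so the variable does not leak the answer into the regression.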

Figure 1 (Average arrival delay related to date)

Figure 2 (Number of flights per day)


Figure 3 (Initial correlation analysis)

Figure 4 (Final regression, adjusted R² of .888)


Figure 5 (Prediction model)
