620 Case Study 3
The FlightDelays.csv file contains a data set with the information on all commercial flights departing the
Washington, DC area and arriving at one of the New York area airports. For each flight, there is information
on the departure and arrival airports, the distance of the route, the scheduled time and day of the flight,
and so on. The outcome that you need to predict is the flight arrival status, i.e., whether a flight is delayed
or on time (FL_STATUS). A delay is defined as an arrival that is at least 15 minutes later than scheduled.
The table below describes each of the columns in the data set.
Questions
3. Compare results of logistic regression model vs. classification tree model for the same data set.
a. Present and compare in your report the validation confusion matrix for the logistic
regression model in 2c of this case versus the validation confusion matrix using the
GridSearchCV() algorithm for the classification tree in the previous case study #2. Using
the accuracy value (or, equivalently, the misclassification rate = 1 - accuracy), which model would you recommend applying
for classification (prediction) of flight arrival status? Briefly explain your answer.
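
A minimal sketch of the comparison in 3a, assuming each validation confusion matrix is available as a 2x2 list of counts (rows = actual class, columns = predicted class). The helper names accuracy_from_confusion and recommend are hypothetical, and the counts below are placeholders for illustration only, not results from FlightDelays.csv; substitute the matrices from your own logistic regression and GridSearchCV() tree runs.

```python
def accuracy_from_confusion(matrix):
    """Return (accuracy, misclassification_rate) for a square confusion matrix.

    matrix[i][j] is the number of records of actual class i predicted as class j,
    so correct predictions sit on the diagonal.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix has no observations")
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    accuracy = correct / total
    return accuracy, 1.0 - accuracy


def recommend(models):
    """Given {model name: confusion matrix}, return (best name, {name: accuracy})."""
    scores = {name: accuracy_from_confusion(m)[0] for name, m in models.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Placeholder counts only (hypothetical); replace with your validation matrices.
placeholder = {
    "logistic regression": [[80, 20], [10, 90]],
    "classification tree": [[70, 30], [20, 80]],
}

best, scores = recommend(placeholder)
for name, acc in scores.items():
    print(f"{name}: accuracy={acc:.3f}, misclassification={1 - acc:.3f}")
print("higher validation accuracy:", best)
```

Accuracy alone can hide differences in how each model treats the delayed class, so it is worth reading the off-diagonal cells of both matrices alongside the single accuracy figure when explaining the recommendation.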