Flight Price Predictions
Surandai
Tenkasi (Dt.) - 627859.
APRIL - 2023
Team ID : NM2023TMID22068
01 Introduction
02 Problem Definition & Design Thinking
03 Result
04 Advantages & Disadvantages
05 Application
06 Conclusion
07 Future Scope
08 Appendix
INTRODUCTION:
In this project, we explore how machine learning techniques can be used to predict
flight ticket prices and help customers save money.
Airline ticket prices are highly dynamic and can fluctuate rapidly based on
factors such as time of year, airline, and route. By using machine learning
algorithms, we can analyze historical data and predict future ticket prices.
Our goal is to build a predictive model that can accurately forecast future
ticket prices and provide recommendations to users on the optimal time to book
their flights. This project will involve data collection, data cleaning, feature
engineering, and model building, so that users can make informed decisions
about their flight bookings and save money by booking at the right time. We look
forward to embarking on this exciting journey with you!
In the future, machine learning models will become even more accurate and
effective in their predictions, which will enable travelers to make informed
decisions about when to book their tickets.
The data used in this project includes historical ticket prices, as well as
information about various factors that affect ticket prices, such as time of year,
airline, and route. Using this data, the machine learning model will be trained to
predict ticket prices.
The benefits of this project are numerous. For travelers, it can help them
save money by identifying the optimal time to book tickets. For airlines, it can
improve their revenue management by allowing them to better predict demand and
adjust prices accordingly. Overall, this project has the potential to revolutionize the
way people purchase airline tickets, making it more convenient and cost-effective.
Overview:
Travelers booking flights must balance their budget constraints with their
preferred travel dates and airline preferences. Machine learning (ML) can help
travelers make more informed decisions by predicting flight prices from
historical data and current market trends.
ML algorithms can analyze this data to identify patterns and trends that
influence flight prices. By using these predictions, travelers can decide when
to book.
With the help of ML-powered price prediction tools, travelers can optimize
their flight booking decisions and save money by booking at the right time
and choosing the best airline and travel dates for their needs. Additionally,
airlines and travel agencies can benefit from these tools by improving their
pricing strategies.
Overall, the use of ML in flight booking can greatly improve the travel
experience for both travelers and businesses, making it easier and more
cost-effective.
Purpose:
The purpose of using machine learning for optimizing flight booking decisions
through price predictions is to help travelers make more informed decisions about
their flights, leading to cost savings and a better travel experience. The project aims
to provide accurate predictions of flight prices based on historical data and current
market trends, which can help travelers determine the best time to book their
flights and choose the most cost-effective travel dates and airlines.
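As a minimal sketch of how predicted prices could be turned into a booking recommendation, the snippet below picks the cheapest option from a set of predictions. The function name `cheapest_option`, the airline labels, and the sample prices are hypothetical; in practice the prices would come from a trained model.

```python
from datetime import date


def cheapest_option(predicted_prices):
    """Return the (travel_date, airline) key with the lowest predicted price.

    predicted_prices maps (travel_date, airline) to a predicted fare.
    """
    if not predicted_prices:
        raise ValueError("no predictions supplied")
    return min(predicted_prices, key=predicted_prices.get)


# Illustrative values only; these are not results from the project.
sample = {
    (date(2023, 5, 1), "Airline A"): 5400.0,
    (date(2023, 5, 2), "Airline A"): 4900.0,
    (date(2023, 5, 2), "Airline B"): 5100.0,
}
best = cheapest_option(sample)
print(best, sample[best])
```

The same lookup works whether the candidates differ by travel date, by airline, or by both, since each combination is simply one key in the dictionary.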
The use of machine learning in this project also benefits airlines and travel
agencies, which can use price predictions to improve their pricing strategies.
Overall, the purpose of this project is to leverage the power of machine learning to
improve the travel experience for both travelers and businesses, making it easier
and more cost-effective.
Empathy map:
Brainstorming Map:
Result:
The use of machine learning for optimizing flight booking decisions
through price prediction has the potential to revolutionize the travel
industry. By analyzing historical data and current market trends,
machine learning algorithms can make accurate price predictions and help
travelers make more informed decisions.
Advantages:
Cost savings:
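Machine learning algorithms can analyze large amounts
of data and predict the best time to buy tickets, which can help customers
save money. Customers benefit directly when they can book flights at lower
prices.
Competitive advantage:
Airlines that predict demand more accurately can better allocate their
resources, including planes and crew, reducing the risk of flying with empty
seats.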
Disadvantages:
Limited accuracy:
Machine learning algorithms are only as good as the data they are
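trained on, and there is always a risk of inaccurate predictions. This can
lead travelers to make poor decisions when booking flights.
Ethical concerns:
There is a risk that airlines may misuse the information they gather.
Technical challenges:
Building and maintaining accurate models requires technical expertise and
accurate, up-to-date data.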
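Application:
Machine learning for flight price prediction can be applied in
the areas of online travel agencies, airline websites, and travel search
engines. By using price predictions to guide booking
decisions, companies can provide their customers with more accurate and
cost-effective recommendations.
This also extends to airlines and travel agencies, as they can use
price predictions to improve their pricing strategies.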
Conclusion:
In conclusion, machine learning
algorithms can make accurate price predictions and help travelers make more
informed decisions.
The benefits of using machine learning for flight booking decisions include
increased accuracy, time and cost savings, customization, and improved pricing
strategies for airlines and travel agencies. However, there are potential drawbacks,
such as the need for accurate and up-to-date data, technical expertise, and
unforeseen events that can affect prices.
Future Scope:
The future scope of this project is vast and promising, with the
potential to further improve the travel experience for both travelers and
businesses.
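Appendix:
import numpy as np
import pandas as pd
import warnings
import pickle
warnings.filterwarnings('ignore')
# Load the data set. The file name is a placeholder; the listing does not show this step.
data=pd.read_csv('flight_data.csv')
data.head()
# List the unique values in each column
for i in data:
    print(i,data[i].unique())
data.info()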
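# Split Date_of_Journey (day/month/year) into separate columns
data.Date_of_Journey=data.Date_of_Journey.str.split('/')
data.Date_of_Journey
data['Date']=data.Date_of_Journey.str[0]
data['Month']=data.Date_of_Journey.str[1]
data['Year']=data.Date_of_Journey.str[2]
data.head()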
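The date split can also be sketched with the standard library alone. This version assumes the journey date is written as day/month/year, which is what the indexing in the listing implies; the function name `journey_date_features` and the sample date are hypothetical.

```python
from datetime import datetime


def journey_date_features(text):
    """Parse a 'DD/MM/YYYY' string into integer day, month and year."""
    parsed = datetime.strptime(text, "%d/%m/%Y")
    return {"Date": parsed.day, "Month": parsed.month, "Year": parsed.year}


print(journey_date_features("24/03/2019"))
```

Compared with a plain string split, parsing yields integers rather than strings and raises a `ValueError` for impossible dates, which can surface bad rows early.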
data.Total_Stops.unique()
data.Route=data.Route.str.split('->')
data.Route
data['City1']=data.Route.str[0]
data['City2']=data.Route.str[1]
data['City3']=data.Route.str[2]
data.head()
#data.dropna(inplace=True)
#data.isnull().sum()
#In a similar manner, we split the Dep_Time column and create separate
#columns for departure hours and minutes
data['Dep_hour'] = pd.to_datetime(data['Dep_Time']).dt.hour
data['Dep_min'] = pd.to_datetime(data['Dep_Time']).dt.minute
data.drop('Dep_Time',axis=1,inplace=True)
data['Arrival_hour'] = pd.to_datetime(data['Arrival_Time']).dt.hour
data['Arrival_min'] = pd.to_datetime(data['Arrival_Time']).dt.minute
data.drop('Arrival_Time',axis=1,inplace=True)
data.head()
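# Duration is assumed to look like '2h 50m': split on the space, then strip the 'h' and 'm' suffixes
data.Duration=data.Duration.str.split(' ')
data['Travel_Hours']=data.Duration.str[0]
data['Travel_Hours']=data['Travel_Hours'].str.split('h')
data['Travel_Hours']=data['Travel_Hours'].str[0]
data.Travel_Hours=pd.to_numeric(data.Travel_Hours,errors='coerce')
data['Travel_Mins']=data.Duration.str[1]
data['Travel_Mins']=data.Travel_Mins.str.split('m')
data['Travel_Mins']=data.Travel_Mins.str[0]
# Durations with no minutes part (e.g. '19h') leave Travel_Mins empty, so fill with 0
data.Travel_Mins=pd.to_numeric(data.Travel_Mins,errors='coerce').fillna(0)
data.head()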
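Duration strings are easy to mis-split when one part is missing. The sketch below shows one possible way to handle that with a regular expression, assuming durations are written as hours followed by `h` and minutes followed by `m`, with either part optional. The function name and sample strings are hypothetical.

```python
import re


def duration_to_hours_minutes(text):
    """Convert strings such as '2h 50m', '19h' or '45m' to (hours, minutes)."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    if hours is None and minutes is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return (
        int(hours.group(1)) if hours else 0,
        int(minutes.group(1)) if minutes else 0,
    )


for sample in ["2h 50m", "19h", "45m"]:
    print(sample, duration_to_hours_minutes(sample))
```

Because each part is matched independently, a missing part becomes 0 instead of shifting the other value into the wrong column.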
data.Additional_Info.unique()
data.isnull().sum()
categorical=['Airline','Source','Destination','Additional_Info','City1']
numerical=['Total_Stops','Date','Month','Year','Dep_hour','Dep_min',
           'Arrival_hour','Arrival_min','Travel_Hours','Travel_Mins']
data.head()
#Label Encoder
from sklearn.preprocessing import LabelEncoder
le=LabelEncoder()
data.Airline=le.fit_transform(data.Airline)
data.Source=le.fit_transform(data.Source)
data.Destination=le.fit_transform(data.Destination)
data.Total_Stops=le.fit_transform(data.Total_Stops)
data.Additional_Info=le.fit_transform(data.Additional_Info)
data.head(10)
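To make the effect of label encoding concrete, here is a small pure-Python sketch of the idea: each distinct value receives an integer code, assigned in sorted order as scikit-learn's `LabelEncoder` does. The helper name `fit_label_mapping` and the sample values are chosen for illustration only.

```python
def fit_label_mapping(values):
    """Map each distinct value to an integer code, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(values)))}


airlines = ["IndiGo", "Air India", "IndiGo", "SpiceJet"]  # illustrative sample
mapping = fit_label_mapping(airlines)
encoded = [mapping[name] for name in airlines]
print(mapping)
print(encoded)
```

Keeping one mapping per column makes it possible to decode predictions later. Note that the listing reuses a single `le` object for every column, so only the classes from the last fit remain available on it. Integer codes also imply an ordering that tree-based models tolerate but distance-based models such as KNN may misread.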
data=data[['Airline','Source','Destination','Date','Month','Year','Dep_hour','Dep_min',
           'Arrival_hour','Arrival_min','Price']]
data.head()
#Descriptive Statistics
data.describe()
#Visual Analysis
import matplotlib.pyplot as plt
import seaborn as sns
c=1
plt.figure(figsize=(20,45))
categorical=['Airline','Source','Destination','Additional_Info']
#for i in categorical:
#    plt.subplot(6,3,c)
#    sns.countplot(x=data[i])
#    plt.xticks(rotation=90)
#    plt.tight_layout(pad=3.0)
#    c=c+1
#plt.show()
# displot creates its own figure, so the size is set through height and aspect
sns.displot(data.Price,height=8,aspect=15/8)
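from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor,GradientBoostingRegressor,AdaBoostRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score
y=data['Price']
x=data.drop(columns=['Price'])
# Scaling and splitting are not shown in the listing; StandardScaler and an
# 80/20 split with a fixed seed are assumptions
ss=StandardScaler()
x_scaled=ss.fit_transform(x)
x_scaled=pd.DataFrame(x_scaled,columns=x.columns)
x_scaled.head()
x_train,x_test,y_train,y_test=train_test_split(x_scaled,y,test_size=0.2,random_state=42)
x_train.head()
rf=RandomForestRegressor()
gb=GradientBoostingRegressor()
ad=AdaBoostRegressor()
# Print the models whose train and test R2 scores differ by at most 0.2
for i in [rf,gb,ad]:
    i.fit(x_train,y_train)
    y_pred=i.predict(x_test)
    test_score=r2_score(y_test,y_pred)
    train_score=r2_score(y_train,i.predict(x_train))
    if abs(train_score-test_score)<=0.2:
        print(i)
# knn and svm are not defined in the listing; KNeighborsRegressor and SVR are assumed
knn=KNeighborsRegressor()
svm=SVR()
dt=DecisionTreeRegressor()
# Same check with a stricter gap of 0.1
for i in [knn,svm,dt]:
    i.fit(x_train,y_train)
    y_pred=i.predict(x_test)
    test_score=r2_score(y_test,y_pred)
    train_score=r2_score(y_train,i.predict(x_train))
    if abs(train_score-test_score)<=0.1:
        print(i)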
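The selection rule used in the loops above, keeping a model only when its train and test scores are close, can be isolated in a few lines. The helper name `passes_gap_check`, the model names, and the scores below are hypothetical and are not results from this project.

```python
def passes_gap_check(train_score, test_score, max_gap):
    """True when the train and test scores differ by at most max_gap."""
    return abs(train_score - test_score) <= max_gap


# Hypothetical (train, test) scores for illustration only.
scores = {
    "model_a": (0.95, 0.80),
    "model_b": (0.99, 0.60),
    "model_c": (0.70, 0.68),
}
kept = [
    name
    for name, (train, test) in scores.items()
    if passes_gap_check(train, test, max_gap=0.2)
]
print(kept)
```

A small gap only indicates that a model behaves consistently on seen and unseen data. It says nothing about how good the scores are, so the test score itself is worth comparing as well before choosing a model.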
#for i in range(2,5):
#    cv=cross_val_score(rf,x,y,cv=i)
#    print(rf,cv.mean())
#Accuracy
rf=RandomForestRegressor(n_estimators=10,max_features='sqrt',max_depth=None)
rf.fit(x_train,y_train)
y_train_pred=rf.predict(x_train)
y_test_pred=rf.predict(x_test)
# r2_score expects the true values first, then the predictions
print("Train Accuracy",r2_score(y_train,y_train_pred))
print("Test Accuracy",r2_score(y_test,y_test_pred))
# Price is continuous, so the regressor variant of KNN is used
knn=KNeighborsRegressor(n_neighbors=2,algorithm='auto',metric_params=None,n_jobs=1)
knn.fit(x_train,y_train)
y_train_pred=knn.predict(x_train)
y_test_pred=knn.predict(x_test)
print("Train Accuracy",r2_score(y_train,y_train_pred))
print("Test Accuracy",r2_score(y_test,y_test_pred))
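The "accuracy" reported here is the R2 score (coefficient of determination). A from-scratch sketch of the formula, 1 - SS_res / SS_tot, shows why the argument order matters: scikit-learn's `r2_score` takes the true values first and the predictions second, and swapping them generally gives a different number. The function name `r2` and the sample values are illustrative.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


actual = [100.0, 200.0, 300.0]      # illustrative prices
predicted = [150.0, 200.0, 250.0]   # illustrative predictions

print(r2(actual, predicted))   # true values first
print(r2(predicted, actual))   # arguments swapped
```

With these sample values the score is 0.75 in the correct order and 0.0 when the arguments are swapped. The sketch does not guard against a constant `y_true`, for which SS_tot is zero.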
#price_list=pd.DataFrame({'Price':data})
#price_list
data.head()
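The listing imports `pickle` but does not show it in use. A likely purpose is saving the trained model for later reuse, which might look like the sketch below. The dictionary is a hypothetical stand-in for a fitted model (a scikit-learn estimator can be pickled the same way), and the file name is illustrative.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model.
model = {"name": "stand-in", "mean_price": 5000.0}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "model.pkl")  # illustrative file name
    # Serialize the object to disk
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    # Load it back, as a prediction service would at start-up
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == model)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.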