"Never Assume You Can't Do Something. Push Yourself To Redefine The Boundaries." Brian Chesky, CEO of Airbnb
"Never Assume You Can't Do Something. Push Yourself To Redefine The Boundaries." Brian Chesky, CEO of Airbnb
Agenda
• Problem Statement
• Data Analysis & Visualization
• Proposed Solution
• Solution Details
• Challenges
• Model Creation
• Insights
• Conclusion
Problem Statement
• Airbnb is an online marketplace that lets users post listings on its website, and it
earns a commission on every booking.
• At present, when someone wants to list an Airbnb rental, they have to manually analyze
similar properties near their location and decide the price themselves.
• The idea of our project is to build a model that estimates the right price for a rental
given the features of the property.
• Dataset: 77,000+ records with 97 columns.
• Visualizing the correlation between the different features showed that the 'beds'
column is strongly correlated with 'accommodates'.
• The 'beds' column was therefore removed.
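The check described above can be sketched in a few lines of pandas. This is only an illustration: the toy values, the 0.9 cut-off, and the name CORR_THRESHOLD are assumptions, not figures from the project.

```python
import pandas as pd

# Toy listings; column names follow the slides, values are made up.
listings = pd.DataFrame({
    "accommodates": [2, 4, 6, 8, 3, 5],
    "beds": [1, 2, 3, 4, 2, 3],
    "price": [60.0, 110.0, 170.0, 240.0, 85.0, 150.0],
})

# Assumed cut-off for "strong" correlation (not stated in the slides).
CORR_THRESHOLD = 0.9

# Pearson correlation between the two candidate features.
r = listings["beds"].corr(listings["accommodates"])

# Keep only one of two near-duplicate features.
if abs(r) > CORR_THRESHOLD:
    listings = listings.drop(columns=["beds"])

print(f"corr(beds, accommodates) = {r:.2f}")
print(list(listings.columns))
```

Dropping one of two strongly correlated features keeps the model from receiving the same signal twice; which of the pair to keep is a judgment call.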
London borough vs. price
• This bar plot, drawn with barplot(), shows the relationship between london_borough
and price in the data.
• Performed Exploratory Data Analysis (EDA) to identify and remove categorical columns
that had unbalanced data.
• Performed data cleaning to standardize the columns and remove noise, which improves the
quality of the training data and supports better decision-making.
• Developed a common function that identifies category-wise outliers in the feature passed
to it, with the flexibility to either remove each outlier or replace it with the threshold value.
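A function of this kind might look like the sketch below. The slides do not say how the threshold was defined, so the 1.5 × IQR fences used here are an assumption, as are the function name handle_outliers and its parameters.

```python
import pandas as pd

def handle_outliers(df, feature, category, action="replace", k=1.5):
    """Find outliers in `feature` separately for each `category` value.

    Assumption: an outlier lies outside [Q1 - k*IQR, Q3 + k*IQR] of its group.
    action="remove" drops those rows; action="replace" clips them to the fence.
    """
    grouped = df.groupby(category)[feature]
    # Per-row quartiles of the group each row belongs to.
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    if action == "remove":
        return df[df[feature].between(lower, upper)].copy()
    if action == "replace":
        out = df.copy()
        out[feature] = out[feature].clip(lower=lower, upper=upper)
        return out
    raise ValueError("action must be 'remove' or 'replace'")

# Made-up example data: one extreme price in each room type.
demo = pd.DataFrame({
    "room_type": ["Entire home"] * 5 + ["Shared room"] * 5,
    "price": [100.0, 110.0, 120.0, 130.0, 900.0,
              20.0, 22.0, 24.0, 26.0, 300.0],
})
removed = handle_outliers(demo, "price", "room_type", action="remove")
replaced = handle_outliers(demo, "price", "room_type", action="replace")
```

Computing the fences per category matters here: a price that is ordinary for an entire home can be extreme for a shared room.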
• We then checked the correlation between amenities and discarded amenities that were
very highly correlated with another amenity, e.g. bathroom essentials and bath towel,
or cooking basics and dishes and silverware.
• Removed the amenities that are most common or most uncommon in the data.
• The data also has many different property type values, so we replaced those
whose count is 100 or fewer with “Others”.
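The property type rule in the last bullet can be sketched as follows. The helper name group_rare and the example counts are hypothetical; only the cut-off of 100 and the label "Others" come from the slides.

```python
import pandas as pd

def group_rare(series, max_count=100, other="Others"):
    """Replace every value that occurs max_count times or fewer with `other`."""
    counts = series.value_counts()
    rare = counts[counts <= max_count].index
    # Keep values that are not rare; substitute the label everywhere else.
    return series.where(~series.isin(rare), other)

# Made-up property types: two frequent, two at or below the cut-off.
property_type = pd.Series(
    ["Apartment"] * 150 + ["House"] * 101 + ["Boat"] * 100 + ["Yurt"] * 3
)
grouped = group_rare(property_type)
result = grouped.value_counts().to_dict()
print(result)
```

Collapsing rare categories keeps the number of encoded columns manageable and avoids fitting the model to property types seen only a handful of times.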
Challenges:
• After Exploratory Data Analysis (EDA) on our data, we found that some records in
the 'price' column were suspicious and could lead to an incorrect model.
• The predicted price increases as accommodates increases.
• Importance of availability in the model
• There is a steady increase in predicted price as the availability increases.
• Importance of extra_people in the model
• There is a decline in predicted price as listings start charging more for extra people.
• Importance of minimum_nights in the model
• There is a decline in predicted price as hosts start increasing the minimum nights.
• Importance of room_type in the model
• An entire home can fetch the maximum price, whereas a shared room fetches the minimum price.
• Importance of property_type in the model
• A hotel can fetch the maximum price, whereas a hostel fetches the minimum price.
• Importance of location in the model
• This may be their primary residence that they make available to Airbnb guests when
they themselves go on holiday, or it may be a second home in a city.
• A large proportion of hosts on Airbnb share their home by listing a private bedroom in
their primary residence. This allows hosts the flexibility to maximise space in their
home, benefiting from the additional income, and social interaction, without having to
commit to a full-time tenant.
• What makes predicting the rental price most challenging is that Airbnb gives the host
complete flexibility to choose their rental price, so the price of a 5-bedroom house can
range anywhere from $25 to $2,000+.
• We chose this dataset over a Kaggle dataset because we wanted to analyse a real-world
use case instead of just creating a model on pre-cleaned data.
Honour Code & Team Contribution
• We hereby declare that the solution developed by us is entirely our own work and is
not plagiarized in any way.
• Vrushank Gude: 30%
• Shehzada Alam: 20%
• Sagar Bhutada: 15%
• Sameer Pophali: 20%
• Shreyas Wankhede: 15%
Thank You