
"Never Assume You Can't Do Something. Push Yourself To Redefine The Boundaries." Brian Chesky, CEO of Airbnb

The document summarizes a proposed solution to predict the correct rental price for Airbnb listings given various property features. It discusses analyzing a dataset of 77,000 Airbnb listings, cleaning and preprocessing the data, selecting important features, and creating random forest and XGBoost regression models to predict price bins and estimated prices. Challenges addressed include outliers, misleading data, and hosts having flexibility to choose prices. The models accurately predict higher prices in central London locations and prices increasing with more accommodations.

Uploaded by

Lara Ivanovic

“Never assume you can't do something. Push yourself to redefine the boundaries.”
Brian Chesky, CEO of Airbnb
Agenda
• Problem Statement
• Data Analysis & Visualization.
• Proposed Solution
• Solution Details
• Challenges
• Model Creation
• Insights
• Conclusion
Problem Statement
• Airbnb is an online marketplace that lets users post listings on its website and earns a commission on every booking.
• At present, someone who wants to list an Airbnb rental has to manually analyze similar properties near their location and decide the price themselves.
• The idea of our project is to build a model that estimates what the correct price of a rental should be, given the features of the property.
• Dataset: 77,000+ records and 97 columns.

Dataset source: http://insideairbnb.com/get-the-data.html


Data Analysis & Visualization.
Correlation Matrix:

• After visualizing the correlation between the different features, we found that the 'beds' column is strongly correlated with the 'accommodates' column.
• The 'beds' column was therefore removed.
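
The correlation check above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the helper name highly_correlated_pairs, the 0.8 threshold, and the toy values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.8):
    """Return (col_a, col_b, corr) for numeric column pairs with |corr| >= threshold.

    Hypothetical helper; the 0.8 default is an assumption, not a value from the slides.
    """
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the upper triangle so that each pair is reported once.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    return [(a, b, round(float(v), 3))
            for (a, b), v in upper.stack().items()
            if v >= threshold]

# Toy data (made up): 'beds' tracks 'accommodates' closely.
listings = pd.DataFrame({
    "accommodates": [1, 2, 2, 4, 6, 8],
    "beds": [1, 1, 2, 2, 3, 4],
    "minimum_nights": [3, 1, 7, 2, 1, 5],
})

pairs = highly_correlated_pairs(listings)
# Drop one column of the correlated pair, as the slide does with 'beds'.
reduced = listings.drop(columns=["beds"])
print(pairs)
```

Which column of a correlated pair to drop is a judgment call; here 'beds' is dropped only because that is what the slide reports.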
London-borough vs price
• This plot shows the relationship between london_borough and price in the data using barplot().

• From this plot, Westminster, Kensington and Chelsea, and City of London have higher prices. The black lines are error bars that show the uncertainty in the values, which is highest for Enfield.
Property-types vs price
• This plot shows the relationship between property_type and price in the data using barplot().

• From this plot, we can see that some property types are present in very few listings. Such categories were grouped together into 'Other'.
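
Grouping rare categories could look roughly like the sketch below. The function name group_rare_categories and the toy data are assumptions for illustration; the default threshold of 100 follows the count mentioned later in the deck.

```python
import pandas as pd

def group_rare_categories(series, min_count=100, other_label="Other"):
    """Replace categories seen min_count times or fewer with other_label.

    Hypothetical helper written for this example.
    """
    counts = series.value_counts()
    rare = counts[counts <= min_count].index
    # Keep values that are not rare; replace the rest with the label.
    return series.where(~series.isin(rare), other_label)

# Toy data (made up); min_count=1 is used only because the sample is tiny.
property_type = pd.Series(["Apartment"] * 5 + ["House"] * 3 + ["Yurt", "Boat"])
grouped = group_rare_categories(property_type, min_count=1)
print(grouped.value_counts())
```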
Bedrooms vs price
• This visualization shows how spread out the price is for each number of bedrooms in the data.

• We can see from the plot that the lower bounds of the price ranges for the different bedroom counts overlap.

• This shows that users are free to post any price for their listings without any moderation from Airbnb, which makes our task more challenging.
Proposed Solution
Solution Details: Cleaning & Preprocessing Phase
• Removed the unwanted features from our dataset.

• Removed the features which could leak information (Reviews).

• Performed Exploratory Data Analysis (EDA) in order to remove categorical columns which had unbalanced data.

• Performed data cleaning to standardize the columns and remove noise, which improves the quality of the training data and supports better decision-making.

• Developed a common function which identifies category-wise outliers in the feature passed to it and gives the flexibility to either remove the value or replace it with the threshold.

• Handled the misleading information in our data.
E.g.: Some listings gave the room type as Shared room with a price for 1 person, while the features of the house specified how many people it can accommodate in total.
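
A category-wise outlier function of the kind described above might be sketched as follows. The slides do not say how the threshold was defined, so the interquartile-range rule, the function name handle_outliers, and its parameters are assumptions chosen for illustration.

```python
import pandas as pd

def handle_outliers(df, feature, category, action="cap", k=1.5):
    """Treat values outside [Q1 - k*IQR, Q3 + k*IQR] within each category as outliers.

    action="remove" drops those rows; action="cap" replaces them with the bound.
    The IQR rule is an assumption; the original threshold is not stated.
    """
    grouped = df.groupby(category)[feature]
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    if action == "remove":
        return df[(df[feature] >= lower) & (df[feature] <= upper)].copy()
    if action == "cap":
        out = df.copy()
        out[feature] = out[feature].clip(lower=lower, upper=upper)
        return out
    raise ValueError("action must be 'remove' or 'cap'")

# Toy data (made up): one private room is priced far above its peers.
listings = pd.DataFrame({
    "room_type": ["Private room"] * 5 + ["Entire home"] * 5,
    "price": [30.0, 35.0, 40.0, 45.0, 500.0, 100.0, 110.0, 120.0, 130.0, 140.0],
})

removed = handle_outliers(listings, "price", "room_type", action="remove")
capped = handle_outliers(listings, "price", "room_type", action="cap")
print(len(removed), capped["price"].max())
```

Computing the bounds per category matters here: a price of 140 is ordinary for an entire home but would be flagged if the private-room bounds were applied to every row.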
Feature Selection & Feature Engineering
• After performing data cleaning and removing the unwanted features from our data, we selected features like 'amenities' for feature engineering.

• Sample value from the amenities column:
{TV,"Cable TV",Internet,Wifi,Breakfast,"Pets live on this property",Dog(s),Heating,"Family/kid friendly","Fire extinguisher",Shampoo}

• We transformed it into individual features using the MultiLabelBinarizer.

• We then checked the correlation between the amenities and discarded amenities that were very highly correlated with others.
E.g.: bathroom essentials and Bath towel; cooking basics and Dishes and silverware; etc.

• Removed the amenities which are most common or most uncommon in the data.

• The data also has many different property type values, so we replaced those whose count is 100 or below with 'Other'.
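
The amenities step can be sketched as below, using scikit-learn's MultiLabelBinarizer as named in the slide. The parsing helper parse_amenities, the share cut-offs (0.3 and 0.9), and the short extra rows are assumptions for the example; the first row is the sample value shown above.

```python
import csv
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

def parse_amenities(raw):
    """Turn '{TV,"Cable TV",Wifi}' into ['TV', 'Cable TV', 'Wifi'] (hypothetical helper)."""
    inner = raw.strip().strip("{}")
    if not inner:
        return []
    # csv.reader copes with quoted items that contain spaces or commas.
    return [item.strip() for item in next(csv.reader([inner])) if item.strip()]

amenities = pd.Series([
    '{TV,"Cable TV",Internet,Wifi,Breakfast,"Pets live on this property",Dog(s),'
    'Heating,"Family/kid friendly","Fire extinguisher",Shampoo}',
    "{Wifi,Heating,Kitchen}",   # made-up rows from here on
    "{Wifi}",
    "{Wifi,Kitchen}",
])

mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(amenities.map(parse_amenities)),
    columns=mlb.classes_,
    index=amenities.index,
)

# Drop amenities that almost every listing has or that almost none has.
share = encoded.mean()
kept = encoded.loc[:, (share > 0.3) & (share < 0.9)]
print(list(kept.columns))
```

On real data the cut-offs would be much closer to 0 and 1; the values here are only wide enough to show the effect on four rows.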
Challenges:
• After Exploratory Data Analysis (EDA) on our data, we found that some records in the 'price' column were suspicious and could lead to an incorrect model.

• On careful analysis, we found that there was a problem in the Airbnb website which caused traditional web scrapers that rely on tools like BeautifulSoup to fetch incorrect prices.

• So we used the Selenium package, which is capable of scraping JS-rendered pages, to get the correct price for such listings.

• The data also had many listings with misleading information.
E.g.: A host wants to rent out a private room for one guest but has mentioned the total number of bedrooms in her house.
Model Selection and Creation
• After the feature engineering step, we created 2 bins for 'price': 0-100 and 101-2001.
• Split the data into train and test sets (70-30).
• Before performing regression, we first performed classification to predict Price_bins.
• We chose Random Forest and Logistic Regression because we wanted algorithms that allow class weights to be assigned to handle the class imbalance problem.

Classification Models       Accuracy (Train)   Accuracy (Test)   Precision   Recall
Random Forest Classifier    0.8873             0.8688            0.88        0.87
Logistic Regression         0.8683             0.8633            0.87        0.86
Vote Classifier             0.8788             0.8668            0.87        0.87
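
The binning and class-weighted classification steps can be sketched roughly as follows. The data is synthetic, and the feature set, the "balanced" class-weight setting, and the hyperparameters are assumptions; the slides do not state the actual weights or settings used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic listings (made up): price grows with the number of guests.
rng = np.random.default_rng(0)
n = 600
accommodates = rng.integers(1, 9, size=n)
price = np.clip(20 * accommodates + rng.normal(0, 15, size=n), 10, 2001)
X = pd.DataFrame({
    "accommodates": accommodates,
    "minimum_nights": rng.integers(1, 8, size=n),
})

# Two bins as on the slide: 0-100 -> 0 and 101-2001 -> 1.
price_bin = pd.cut(price, bins=[0, 100, 2001], labels=False)

# 70-30 split, stratified so both sets keep the same bin proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, price_bin, test_size=0.3, stratify=price_bin, random_state=0
)

# class_weight="balanced" reweights classes inversely to their frequency.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

LogisticRegression accepts the same class_weight argument, which is consistent with the reason given above for choosing these two algorithms.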
• After performing classification on price_bins, we built an XGBRegressor model for each price bin.
• We trained the models on the log-transformed target variable, since price is best compared in relative terms.
• We used L1 regularization to prevent overfitting.

Regression Models              Median Absolute Error (Train)   Median Absolute Error (Test)
XGBRegressor for Price_bin 1   4.9123                          7.8473
XGBRegressor for Price_bin 2   16.73                           22.54
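
The per-bin regression on a log-transformed target can be sketched as below. To keep the example dependent on scikit-learn only, GradientBoostingRegressor stands in for XGBRegressor, so it is not the model from the slides and has no L1 penalty; in XGBoost the L1 term is set through the reg_alpha parameter. The data is synthetic, and the use of log1p/expm1 is an assumption about how the log transform was applied.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import median_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic listings (made up) with multiplicative, i.e. relative, noise.
rng = np.random.default_rng(0)
n = 500
accommodates = rng.integers(1, 9, size=n)
price = 25.0 * accommodates * np.exp(rng.normal(0, 0.2, size=n))
X = accommodates.reshape(-1, 1).astype(float)
price_bin = (price > 100).astype(int)   # same two bins as the classifier

errors = {}
for b in (0, 1):
    mask = price_bin == b
    X_train, X_test, y_train, y_test = train_test_split(
        X[mask], price[mask], test_size=0.3, random_state=0
    )
    # Stand-in model; the slides use XGBRegressor with L1 regularization.
    model = GradientBoostingRegressor(random_state=0)
    model.fit(X_train, np.log1p(y_train))         # fit on log(1 + price)
    predicted = np.expm1(model.predict(X_test))   # back to the price scale
    errors[b] = median_absolute_error(y_test, predicted)

print(errors)
```

The error is computed after inverting the transform, so it is expressed in price units and is comparable to the figures in the table above.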
Insights
• Importance of accommodates in the model
• The predicted price increases as accommodates increases.

• Importance of availability in the model
• There is a steady increase in predicted price as the availability increases.

• Importance of extra_people in the model
• There is a decline in predicted price as the listings start charging more for extra people.

• Importance of minimum_nights in the model
• There is a decline in predicted price as the host starts increasing the minimum nights.

• Importance of room_type in the model
• An entire home can fetch the maximum price, whereas a shared room fetches the minimum price.

• Importance of property_type in the model
• A hotel can fetch the maximum price, whereas a hostel fetches the minimum price.

• Importance of location in the model
• The model accurately predicts that the maximum rental price is in central London.
• The corresponding lat, long combination is near Buckingham Palace.
Conclusion
• There are a variety of ways that UK residents can choose to host on Airbnb. Just over
half of the hosts on the platform choose to rent out their entire home.

• This may be their primary residence that they make available to Airbnb guests when they themselves go on holiday, or it may be a second home in a city.

• A large proportion of hosts on Airbnb share their home by listing a private bedroom in
their primary residence. This allows hosts the flexibility to maximise space in their
home, benefiting from the additional income, and social interaction, without having to
commit to a full-time tenant.

• What makes predicting the rental price most challenging is that Airbnb gives the host complete flexibility to choose their rental price. So the price of a 5-bedroom house can range anywhere from $25 to $2000+.

• We chose this dataset over any Kaggle dataset because we wanted to analyse a real-world use case instead of just creating a model on pre-cleaned data.
Honour Code & Team Contribution
• We hereby declare that the solution developed by us is entirely our own work and is not plagiarized by any means.

• Vrushank Gude-30%
• Shehzada Alam-20%
• Sagar Bhutada-15%
• Sameer Pophali-20%
• Shreyas Wankhede-15%
Thank You
