
SLF - ReCell Project - Presentation

The document analyzes the refurbished phones/tablets market, highlighting key factors influencing the price of used devices, such as screen size and camera resolution. A linear regression model was developed to predict prices, revealing that the price of used devices is significantly affected by the price of new devices and brand popularity. Recommendations include focusing on customer demographics for deeper insights and recognizing that certain features have minimal impact on pricing.

Uploaded by sanjaycj99

Analysis of Refurbished Phones/Tablets Market
ReCell: Supervised Learning - Foundations
29-Jul-2023

Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
Contents / Agenda

● Executive Summary

● Business Problem Overview and Solution Approach

● EDA Results

● Data Preprocessing

● Model Performance Summary

● Appendix

Executive Summary
Based on the analysis, the following are the actionable insights and recommendations.
Actionable Insights:
1) Screen size, main camera mp, selfie camera mp, ram, days used, normalized new price and 4g_yes have
positive coefficients, which means that the price of a used device is expected to increase as these
variables increase.
2) The price of a used device is highly dependent on the price of a new device. A unit increase in the
normalized new price increases the normalized used price by 0.428 units, assuming all other variables
are held constant.
3) Years since release has a negative coefficient, which means that the older the device, the lower its
used price.
4) The brand names Lenovo, Nokia and Xiaomi seem to increase the price of a used device; these brands may
be in demand.
Recommendations:
1. Features such as the number of days used, battery and weight seem to have little or no impact on the
price of a used device. It may be inferred that dealers maintain a certain quality of used devices by
making the necessary checks to keep the refurbished market attractive. Therefore, the company need not be
overly concerned about these factors.
2. Adding customer attributes such as gender, age and income to the analysis could give more insight into
the used devices market.
Business Problem Overview and Solution Approach
Problem Overview / Statement
Over the past decade, the use of refurbished devices has increased significantly. “As per the forecast, by
2023 the market would be worth $52bn, with a compound annual growth rate of 13.6% w.r.t. 2018.”
Apart from offering cost-effective options to both buyers and sellers, the used phone market offers a
number of other benefits, such as:
● used devices can also be sold with warranties,
● buying used phones helps reduce the environmental impact of manufacturing phones/tablets and its
negative effects on the health of the people involved in manufacturing them,
● third-party vendors provide attractive offers to customers on refurbished devices, etc.

Start-ups such as ReCell intend to venture into this growing market and are interested in knowing the
price of used devices and the factors influencing it.

Solution approach / methodology


Using the given data on refurbished devices, a linear regression model was built in Python to
predict the price of used devices and to identify the factors affecting the price of used phones/tablets.

Data Overview
Missing values

Observations:
● The data frame has 3454 rows (entries) and 15 columns
● Of the 15 columns, 9 are of float type, 2 are integers and 4 are objects
● 6 columns have missing values, as shown in the table of missing values
● The statistical summary shows that for some features the mean is slightly higher than the
median, which suggests a slightly right-skewed distribution, while for days_used the mean
is lower than the median, so its distribution is expected to be left-skewed.
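The overview checks above can be reproduced with a few pandas calls: row and column counts, data types, missing values, and the mean compared with the median. Below is a minimal sketch. The helper name `overview` and the toy frame are illustrative assumptions, not the project's code. Comparing the mean with the median gives only a rough hint of skew direction.

```python
import pandas as pd


def overview(df: pd.DataFrame) -> dict:
    """Hypothetical helper: shape, dtype counts, missing values and a skew hint."""
    numeric = df.select_dtypes(include="number")
    mean, median = numeric.mean(), numeric.median()
    # Mean above the median hints at a right skew; mean below it hints at a left skew.
    skew_hint = {
        col: "right" if mean[col] > median[col] else "left" if mean[col] < median[col] else "none"
        for col in numeric.columns
    }
    missing = df.isna().sum()
    return {
        "shape": df.shape,
        "dtype_counts": df.dtypes.astype(str).value_counts().to_dict(),
        "missing": missing[missing > 0].to_dict(),  # only columns that have gaps
        "skew_hint": skew_hint,
    }


# Toy data for illustration only (not the ReCell data set).
toy = pd.DataFrame({
    "brand_name": ["Honor", "Samsung", "Nokia", "Xiaomi"],
    "days_used": [120, 700, 800, 900],
    "ram": [4.0, None, 4.0, 12.0],
})
summary = overview(toy)
print(summary)
```

In the toy frame, days_used has a mean (630) below its median (750), so it is flagged as left-skewed. This mirrors the pattern the slide reports for days_used.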
EDA Results_Univariate Analysis

Observations:
● The normalized used price, normalized new price and
screen size approximately follow a normal distribution, with
outliers on both sides
● The battery data is slightly skewed to the right, with outliers on
the right side only.

EDA Results_Univariate Analysis (Cont’d)

Observations:
● The resolution of the main camera seems to follow a normal distribution
with outliers on the right side only
● The resolution of the selfie camera has a right-skewed distribution with
outliers on the right side only
● The weight column follows a right-skewed distribution with a large
number of outliers on the right and a few on the left side
● The number of days used shows a left-skewed distribution with no
outliers
EDA Results_Univariate Analysis (Cont’d)

Observations:
● Most of the brand names are masked under the "Others" category
● Samsung seems to be the most in-demand brand
● About 93% of refurbished devices run the Android operating system
● In the refurbished market, 4G devices are more widely available than
5G devices
● More of the refurbished devices were released in 2014 than in any other year

EDA Results_Bivariate Analysis

Observations:
● The heat map shows a high positive correlation between normalized new
price and normalized used price
● There is a high positive correlation between weight and screen size
● Battery and screen size are also highly correlated
● The box plot indicates that OnePlus devices offer more RAM than other brands
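The heat-map findings can be made concrete by ranking column pairs by absolute correlation. Below is a minimal sketch. It assumes the heat map showed Pearson correlation, which is the pandas default. The helper name and the toy values are hypothetical.

```python
import pandas as pd


def top_correlations(df: pd.DataFrame, n: int = 3) -> list:
    """Return the n numeric column pairs with the largest absolute Pearson correlation."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    # Visit each unordered pair once (upper triangle), skipping the diagonal.
    pairs = [
        ((a, b), float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
    ]
    return sorted(pairs, key=lambda p: abs(p[1]), reverse=True)[:n]


# Toy data for illustration only.
toy = pd.DataFrame({
    "normalized_new_price": [1.0, 2.0, 3.0, 4.0, 5.0],
    "normalized_used_price": [1.1, 2.0, 3.2, 3.9, 5.1],
    "days_used": [5, 1, 4, 2, 3],
})
top = top_correlations(toy, n=2)
print(top)
```

Sorting by absolute value keeps strong negative correlations in the ranking. This matters when deciding which predictors may be collinear.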

EDA Results_Bivariate Analysis (Cont’d)
Figures: Brand vs large screen size; Brand vs high selfie camera; Brand vs high main camera

Observations:
● The box plot shows that Apple devices are the heaviest, likely because Apple offers devices
with high-capacity batteries
● Huawei has the highest percentage of devices (13.6%) with a screen size greater than 6 inches
(15.24 cm)
● Huawei has the highest percentage of devices (13.3%) with a selfie camera greater than 8 MP
● Sony has the highest percentage of devices (39.4%) with a main camera greater than 16 MP
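The brand percentages above can be computed with a filter followed by a normalized count. The slides do not say whether each percentage is a brand's share of all qualifying devices or the share of that brand's own devices. The sketch below assumes the former. Because screen_size is recorded in cm (see the data dictionary), 6 inches is converted to 15.24 cm. The helper name and the toy frame are hypothetical.

```python
import pandas as pd

CM_PER_INCH = 2.54  # screen_size is recorded in cm


def brand_share_above(df: pd.DataFrame, column: str, threshold: float) -> pd.Series:
    """Percentage of the devices with `column` > `threshold` that belong to each brand."""
    selected = df[df[column] > threshold]
    return selected["brand_name"].value_counts(normalize=True).mul(100).round(1)


# Toy data for illustration only.
toy = pd.DataFrame({
    "brand_name": ["Huawei", "Huawei", "Sony", "Apple"],
    "screen_size": [16.0, 15.5, 14.0, 15.3],
})
# Devices with screens larger than 6 inches (15.24 cm).
large_screen = brand_share_above(toy, "screen_size", 6 * CM_PER_INCH)
print(large_screen)
```

The same call with `("selfie_camera_mp", 8)` or `("main_camera_mp", 16)` covers the camera observations under the same assumption.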
EDA Results_Bivariate Analysis (Cont’d)

Observations:
● The line plot shows that the price of used devices increases with the year of release, i.e. newer devices fetch higher prices
● The box plots indicate that used devices with 5G are priced higher than those with 4G

Data Preprocessing_ Missing value treatment

Table-1: Missing values in all the columns before treatment.
Table-2: Results after imputing the missing values with the column medians grouped by “release_year”
and “brand_name”.
Table-3: Results after imputing the remaining missing values with the column medians grouped by
“brand_name”.
Table-4: Results after imputing the remaining missing values in “main_camera_mp” with the column median.
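The three-step imputation described in the table captions can be sketched with grouped medians. The sketch assumes that the grouping columns themselves (`release_year`, `brand_name`) have no missing values. The helper name and the toy values are hypothetical. For simplicity, the final column-median step is applied to every listed column, although the slides needed it only for `main_camera_mp`.

```python
import pandas as pd


def impute_by_group_median(df: pd.DataFrame, cols: list) -> pd.DataFrame:
    """Fill gaps with grouped medians: (release_year, brand_name), then brand_name, then column."""
    out = df.copy()
    for col in cols:
        # Step 1 (Table-2): median within the same release year and brand.
        out[col] = out[col].fillna(
            out.groupby(["release_year", "brand_name"])[col].transform("median")
        )
        # Step 2 (Table-3): median within the same brand for the remaining gaps.
        out[col] = out[col].fillna(out.groupby("brand_name")[col].transform("median"))
        # Step 3 (Table-4): overall column median for anything still missing.
        out[col] = out[col].fillna(out[col].median())
    return out


# Toy data for illustration only.
toy = pd.DataFrame({
    "release_year": [2019, 2019, 2020, 2020, 2021, 2020, 2021],
    "brand_name": ["Nokia", "Nokia", "Nokia", "Nokia", "Nokia", "Sony", "Apple"],
    "main_camera_mp": [8.0, None, 16.0, 18.0, None, None, 30.0],
})
missing_cols = [c for c in toy.columns if toy[c].isna().any()]
filled = impute_by_group_median(toy, missing_cols)
print(filled["main_camera_mp"].tolist())
```

In the toy frame, each gap is filled at a different step. The 2019 Nokia row takes its year-and-brand median, and the 2021 Nokia row falls back to the Nokia median. The Sony row, which has no Sony values to borrow from, takes the overall column median.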

Data Preprocessing_Duplication check & Feature engg. (Cont’d)
Figures: Duplicate value check results; Feature engineering results

Observations:
● The duplication check showed that no complete row is duplicated
● In feature engineering, a new column called years_since_release was introduced and the
release_year column was dropped; the feature engineering results table shows the statistics
of the newly introduced column.
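Both steps are short in pandas. The sketch below assumes that years_since_release is measured from 2021, the year the data was collected according to the Data Background appendix. The slides do not state the reference year. The helper name and the toy rows are hypothetical. The toy frame deliberately contains one duplicate so that the count is visible.

```python
import pandas as pd

REFERENCE_YEAR = 2021  # assumption: the year the data was collected


def engineer_age(df: pd.DataFrame) -> tuple:
    """Count fully duplicated rows, then replace release_year with years_since_release."""
    n_duplicates = int(df.duplicated().sum())
    out = df.assign(years_since_release=REFERENCE_YEAR - df["release_year"])
    return out.drop(columns="release_year"), n_duplicates


# Toy data for illustration only.
toy = pd.DataFrame({
    "brand_name": ["Nokia", "Sony", "Sony"],
    "release_year": [2013, 2020, 2020],
})
engineered, n_dup = engineer_age(toy)
print(n_dup, engineered["years_since_release"].tolist())
```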

Data Preprocessing _Outliers check (Cont’d)

Observations:
● Outliers are present in all the numerical columns except years_since_release and days_used.
● In this analysis, the outliers are not treated, as treating them may result in losing useful
information.
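The slides do not state how outliers were identified. The sketch below assumes the 1.5 x IQR rule, which is the default whisker length of matplotlib and seaborn box plots. It only counts the outliers and does not remove them, in line with the decision above. The helper name and the toy values are hypothetical.

```python
import pandas as pd


def iqr_outlier_counts(df: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in every numeric column."""
    numeric = df.select_dtypes(include="number")
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    # The bounds are Series indexed by column, so each column is checked against its own limits.
    below = numeric.lt(q1 - k * iqr)
    above = numeric.gt(q3 + k * iqr)
    return (below | above).sum()


# Toy data for illustration only.
toy = pd.DataFrame({
    "days_used": [100, 200, 300, 400, 500],
    "weight": [150.0, 160.0, 170.0, 180.0, 900.0],
})
counts = iqr_outlier_counts(toy)
print(counts.to_dict())
```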

Data Preprocessing_Data preparation for modeling
(Cont’d)

Observations:
● After creating dummy variables for the categorical columns, the number of columns increased
from 15 to 49
● After splitting the data, 2417 rows were assigned to the training data and 1037 to the test
data.
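The split sizes are consistent with a 70/30 split. With `test_size=0.3`, scikit-learn's `train_test_split` assigns ceil(0.3 * 3454) = 1037 rows to the test set and the remaining 2417 to training. The sketch below assumes that setting and assumes the first level of each categorical column was dropped when creating dummies. The random seed, helper name and toy frame are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def prepare_for_model(df: pd.DataFrame, target: str = "normalized_used_price",
                      test_size: float = 0.3, seed: int = 1):
    """One-hot encode categorical columns (dropping the first level) and split train/test."""
    X = df.drop(columns=target)
    y = df[target]
    # With columns=None, get_dummies encodes every object/string/category column.
    X = pd.get_dummies(X, drop_first=True, dtype=float)
    return train_test_split(X, y, test_size=test_size, random_state=seed)


# Toy data for illustration only.
toy = pd.DataFrame({
    "brand_name": ["Apple", "Nokia", "Sony"] * 3 + ["Apple"],
    "os": ["iOS", "Android"] * 5,
    "screen_size": [10.0, 12.5, 15.0, 15.3, 14.2, 16.1, 12.0, 13.4, 15.9, 11.0],
    "normalized_used_price": [4.3, 3.9, 4.1, 4.6, 4.2, 4.8, 3.8, 4.0, 4.7, 4.4],
})
X_train, X_test, y_train, y_test = prepare_for_model(toy)
print(X_train.shape, X_test.shape, list(X_train.columns))
```

Dropping the first level avoids a perfectly collinear set of dummies. In the same way, a yes/no column such as 4g becomes a single `4g_yes` indicator.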

Overview of initial ML model (Before applying VIF & P-Value check)

Observations:
● The adjusted R-squared value is 0.842, which shows that the model
is reasonably good.
● The constant is 1.31, which is the predicted normalized used price
when all predictor variables are zero.
● A long list of dummy variables was introduced for the
categorical data (brands and operating system); however, most of
them have p-value > 0.05, which means they are not statistically
significant, so they will be dropped from the final model.
● By dropping the high p-value variables, the number of predictor
variables was reduced from 45 to 11 in the final model. (See final
model in next slide)

Summary of final ML model factors for prediction

Observations:

1) Screen size, main camera mp, selfie camera mp, ram,
days used, normalized new price and 4g_yes have
positive coefficients, which means that the price of a used
device is expected to increase as these variables increase.
2) The price of a used device is highly dependent on the
price of a new device. A unit increase in the normalized new
price increases the normalized used price by 0.428 units,
assuming all other variables are held constant.
3) Years since release has a negative coefficient, which means
that the older the device, the lower its used price.
4) The brand names Lenovo, Nokia and Xiaomi seem to increase
the price of a used device; these brands may be in demand.

Training & Test data performance metrics
Observations:

● The model can explain ~84% of the variance, which
means that the model is not underfitted
● Low and comparable training and test RMSE and MAE
indicate that the model is not overfitted.
● The model can predict the price of a used device with a mean
absolute error of 0.185 on the test data.
● The MAPE on the test data suggests that the price of a
used device can be predicted within about 4.5% of its value on average.
Figure: Test performance
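The metrics above follow their standard definitions. In this model, RMSE and MAE are in the units of the target (normalized used price), MAPE is a percentage, and adjusted R-squared penalizes R-squared for the number of predictors. Below is a minimal sketch of these formulas. The helper name and the four toy values are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred, n_predictors: int) -> dict:
    """RMSE, MAE, MAPE (%), R-squared and adjusted R-squared for a fitted regression."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    n = y_true.size
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    # Adjusted R-squared penalises R-squared for the number of predictors used.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "MAE": float(np.mean(np.abs(resid))),
        "MAPE": float(np.mean(np.abs(resid / y_true)) * 100),
        "R2": float(r2),
        "Adj_R2": float(adj_r2),
    }


metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [2.5, 3.5, 6.5, 7.5], n_predictors=1)
print(metrics)
```

Computing the same dictionary on the training and the test predictions gives the side-by-side comparison used to judge over- and underfitting.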

APPENDIX

Data Background and Contents
The data contains various attributes of used/refurbished phones and tablets. The data was
collected in 2021. The detailed data dictionary is given below.
● brand_name: Name of manufacturing brand
● os: OS on which the device runs
● screen_size: Size of the screen in cm
● 4g: Whether 4G is available or not
● 5g: Whether 5G is available or not
● main_camera_mp: Resolution of the rear camera in megapixels
● selfie_camera_mp: Resolution of the front camera in megapixels
● int_memory: Amount of internal memory (ROM) in GB
● ram: Amount of RAM in GB
● battery: Energy capacity of the device battery in mAh
● weight: Weight of the device in grams
● release_year: Year when the device model was released
● days_used: Number of days the used/refurbished device has been used
● normalized_new_price: Normalized price of a new device of the same model in euros
● normalized_used_price: Normalized price of the used/refurbished device in euros
EDA Results_Univariate Analysis

Observations:
● The ram column has an approximately normal distribution with outliers on both sides
● Internal memory has a right-skewed distribution with outliers on the right side only

Model Assumptions
The following are the assumptions of the model:

1. No Multicollinearity
2. Linearity of variables
3. Independence of error terms
4. Normality of error terms
5. No Heteroscedasticity

Linear regression assumptions check
Test for Multicollinearity
● 7 variables [screen_size, weight, brand_name_Apple, brand_name_Huawei, brand_name_Others, brand_name_Samsung, os_iOS] were observed to
have VIF > 5 (see Table-1 in appendix)
● After dropping brand_name_Apple, brand_name_Others and weight, the VIF of all predictor variables fell below 5 (see Table-2
in appendix); therefore, the no-multicollinearity assumption is satisfied.

Test for Linearity & Independence


● The residuals vs fitted values plot shows no pattern; hence, the
assumptions of linearity and independence are satisfied.

Linear regression assumptions check (Cont’d)

Test for Normality


● The histogram of the residuals shows a bell-shaped curve, i.e. a close-to-normal distribution

● On the Q-Q plot, except for the tail values, the residuals lie on a straight line, indicating that
the distribution is approximately normal

● According to the Shapiro-Wilk test, the residuals are not normally distributed, as the p-value is less than 0.05

● Together, these results indicate that the distribution is not exactly normal, but it can be assumed
to be close to normal

Test for Homoscedasticity


● The homoscedasticity assumption is satisfied, as the p-value is greater than 0.05 (the null hypothesis of constant error variance is not rejected).

Results of Multicollinearity treatment

Table-3
Table-1 Table-2
Happy Learning!
