Project Re-Cell by Patel Dakshesh Maheshbhai

The document outlines a project titled 'Re-Cell' aimed at developing a dynamic pricing strategy for used and refurbished phones and tablets through data analysis and linear regression modeling. It includes sections on problem understanding, data overview, exploratory data analysis, data preprocessing, model building, and insights. The analysis reveals significant market growth for used devices, with Android dominating the market and various factors influencing pricing, such as camera resolution and battery capacity.

Uploaded by

daksheshpatel1419

PROJECT

re-cell

BY-PATEL DAKSHESH MAHESHBHAI

Content / Sub-Content

1. Problem Statement and Understanding
   • Business Context
   • Objective
   • Data Description & Dictionary

2. Data Overview
   • Checking the shape of the dataset
   • Checking the datatype of columns of the dataset
   • Checking duplicate values in the dataset
   • Checking missing values

3. Exploratory Data Analysis
   Univariate Analysis:
   o What does the distribution of normalized used device prices look like?
   o What percentage of the used device market is dominated by Android devices?
   Bivariate Analysis:
   • Which attributes are highly correlated with the normalized price of a used device?
   • The amount of RAM is important for the smooth functioning of a device. How does
     the amount of RAM vary with the brand?
   • A large battery often increases a device's weight, making it feel uncomfortable
     in the hands. How does the weight vary for phones and tablets offering large
     batteries (more than 4500 mAh)?
   • Bigger screens are desirable for entertainment purposes as they offer a better
     viewing experience. How many phones and tablets are available across different
     brands with a screen size larger than 6 inches?
   • A lot of devices nowadays offer great selfie cameras, allowing us to capture our
     favourite moments with loved ones. What is the distribution of devices offering
     greater than 8MP selfie cameras across brands?

4. Data Preprocessing
   • Missing Value Imputation
   • Feature Engineering
   • Outlier Check
   • Data Preparation for Modeling

5. Model Building - Linear Regression
   • OLS Regression
   • Model Performance Check

6. Linear Regression Assumptions
   • Test for Multicollinearity
   • Removing Multicollinearity
   • Dropping high p-values
   • Test for Linearity and Independence
   • Test for Normality
   • Test for Homoscedasticity

7. Final Model Summary

8. Insights and Recommendations

Problem Statement & Understanding
• Business Context:
Buying and selling used phones and tablets used to be something that happened on a
handful of online marketplace sites. But the used and refurbished device market has
grown considerably over the past decade, and a new IDC (International Data
Corporation) forecast predicts that the used phone market would be worth $52.7bn by
2023 with a compound annual growth rate (CAGR) of 13.6% from 2018 to 2023. This
growth can be attributed to an uptick in demand for used phones and tablets that offer
considerable savings compared with new models.

Refurbished and used devices continue to provide cost-effective alternatives to both
consumers and businesses that are looking to save money when purchasing a device. There
are plenty of other benefits associated with the used device market. Used and
refurbished devices can be sold with warranties and can also be insured with proof of
purchase. Third-party vendors/platforms, such as Verizon, Amazon, etc., provide
attractive offers to customers for refurbished devices. Maximizing the longevity of
devices through second-hand trade also reduces their environmental impact and helps
in recycling and reducing waste. The impact of the COVID-19 outbreak may further
boost this segment as consumers cut back on discretionary spending and buy phones
and tablets only for immediate needs.

• Objective:
The rising potential of this comparatively under-the-radar market fuels the need for an
ML-based solution to develop a dynamic pricing strategy for used and refurbished
devices. Re-Cell, a startup aiming to tap the potential in this market, has hired you as a
data scientist. They want you to analyze the data provided and build a linear regression
model to predict the price of a used phone/tablet and identify factors that significantly
influence it.

• Data Description and Dictionary:
The data contains the different attributes of used/refurbished phones and tablets. The
data was collected in the year 2021. The detailed data dictionary is given below.

Column                  Description
brand_name              Name of manufacturing brand
os                      OS on which the device runs
screen_size             Size of the screen in cm
4g                      Whether 4G is available or not
5g                      Whether 5G is available or not
main_camera_mp          Resolution of the rear camera in megapixels
selfie_camera_mp        Resolution of the front camera in megapixels
int_memory              Amount of internal memory (ROM) in GB
ram                     Amount of RAM in GB
battery                 Energy capacity of the device battery in mAh
weight                  Weight of the device in grams
release_year            Year when the device model was released
days_used               Number of days the used/refurbished device has been used
normalized_new_price    Normalized price of a new device of the same model in euros
normalized_used_price   Normalized price of the used/refurbished device in euros

• There are 34 different brands and 4 operating systems; the 4g and 5g columns take
"yes" or "no" values.
• Android is the most common operating system, with 3,246 devices using it.
• 2,359 devices have 4G connectivity, while only 152 devices support 5G.
• The average values for features like screen size, camera megapixels, internal
memory, battery, weight, and prices are greater than the median, indicating
right-skewed data.
• The average and median values for RAM are almost the same, showing little to
no skewness.
• The average number of days a used device has been in use is less than the
median, indicating left-skewed data.

Data Overview
Checking the shape of the dataset
   brand_name       os  screen_size   4g   5g  main_camera_mp  selfie_camera_mp  int_memory
0       Honor  Android        14.50  yes   no            13.0               5.0        64.0
1       Honor  Android        17.30  yes  yes            13.0              16.0       128.0
2       Honor  Android        16.69  yes  yes            13.0               8.0       128.0
3       Honor  Android        25.50  yes  yes            13.0               8.0        64.0
4       Honor  Android        15.32  yes   no            13.0               8.0        64.0

   ram  battery  weight  release_year  days_used  normalized_used_price  normalized_new_price
0  3.0   3020.0   146.0          2020        127               4.307572              4.715100
1  8.0   4300.0   213.0          2020        325               5.162097              5.519018
2  8.0   4200.0   213.0          2020        162               5.111084              5.884631
3  6.0   7250.0   480.0          2020        345               5.135387              5.630961
4  3.0   5000.0   185.0          2020        293               4.389995              4.947837

The dataset has 3,454 rows and 15 columns, which matches the 2,417 training rows and
1,037 test rows used later. A high percentage of the devices run Android. The most
recent devices in the data were released in 2020.

Checking the data types of the columns for the dataset

There are 11 numerical (float and integer) types and 4 object types in the dataset.
The target variable is the normalized price of a used device and is a float type.

Checking for duplicate values

There are no duplicate rows in this dataset.

Checking for missing values

The following columns have missing values:

• main_camera_mp
• selfie_camera_mp
• int_memory
• ram
• battery
• weight
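The four checks in this section (shape, datatypes, duplicates, missing values) can be sketched with pandas as follows. This is a minimal illustration on a small hypothetical frame rather than the Re-Cell data; the values and variable names are assumptions.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (hypothetical values, not the Re-Cell data)
df = pd.DataFrame({
    "brand_name": ["Honor", "Honor", "Others", "Samsung"],
    "os": ["Android", "Android", "Android", "Others"],
    "ram": [3.0, 8.0, np.nan, 4.0],
    "days_used": [127, 325, 162, 345],
})

n_rows, n_cols = df.shape              # shape of the dataset (rows, columns)
dtypes = df.dtypes                     # datatype of each column
n_duplicates = df.duplicated().sum()   # number of fully duplicated rows
missing = df.isnull().sum()            # number of missing values per column

print(n_rows, n_cols)
print(dtypes)
print(n_duplicates)
print(missing[missing > 0])            # only the columns that need imputation
```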

Exploratory Data Analysis
Univariate Analysis:
Q1. What does the distribution of normalized used device prices look like?

• normalized_used_price

The distribution of the normalized used price appears approximately normal, with
outliers on both the lower and higher ends. The average normalized used price is 4.36
(a normalized value, not an amount in currency).

• normalized_new_price

The distribution of the normalized new price also resembles a normal curve, with
outliers at both the lower and higher extremes. The average normalized new price is
5.23 (again on the normalized scale).

• screen_size

The distribution of screen size does not show a clear pattern, although it most closely
resembles a normal distribution. There seem to be outliers on both the lower and higher
ends. The average screen size is 13.71 cm.

• main_camera_mp

The distribution of the main camera resolution appears to be skewed slightly left.
There seem to be outliers on the upper end. The average main camera resolution is
9.46 MP.

• selfie_camera_mp

The distribution of the selfie camera resolution appears to be skewed slightly right.
There seem to be outliers on the upper end. The average selfie camera resolution is
6.55 MP.

• int_memory

The distribution of internal memory appears to be skewed right. There seem to be
outliers on the upper end. The average amount of internal memory is 54.57 GB.

• ram

RAM does not appear to follow a discernible distribution. The average amount of RAM
is 4.04 GB.

• weight

The weight distribution of the devices is slightly right-skewed, with numerous outliers
at the higher end and a few at the lower end. The average weight is 182.75 grams.

• battery

The distribution of battery energy capacity appears to be multimodal. There seem to be
outliers on the upper end. The average energy capacity is about 3,133 mAh.

• days_used

The distribution of the number of days a refurbished device has been used appears to be
skewed slightly to the left. There seem to be no outliers. The average is about
675 days.

• brand_name

The "Others" category, which groups various other brand names, is the largest in the
used device data, comprising 14.5% of the market. It is followed by Samsung and Huawei
with 9.9% and 7.3%, respectively.

• os

Q2: What percentage of the used device market is dominated by Android devices?

The majority of the used device market is dominated by Android devices, which make up
93.1% of it. iOS has the smallest market share, at 1.0%.
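A percentage breakdown like the one above can be obtained with a normalized value count. The following is a minimal sketch on hypothetical data; the series contents are assumptions and do not reproduce the 93.1% figure.

```python
import pandas as pd

# Hypothetical operating-system column: 8 Android devices out of 10
os_col = pd.Series(["Android"] * 8 + ["iOS", "Others"], name="os")

# Share of each category as a percentage of all devices
os_share = os_col.value_counts(normalize=True).mul(100).round(1)
print(os_share)
```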

• 4g

The majority of the devices (67.6%) support 4G.

• 5g

The majority of the devices (95.6%) do not support 5G.

• release_year

The most common release year is 2014 (18.6% of devices), followed by 2013 and
2015 with 16.5% and 14.9%, respectively.

Bivariate Analysis
Q3: Which attributes are highly correlated with the normalized price of a used device?

• Correlation Check

The normalized used price of a device is highly positively correlated with the
normalized new price, battery capacity, selfie camera resolution, and screen size. It is
negatively correlated with the number of days the device has been used.
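A correlation check of this kind can be sketched by computing the correlation matrix of the numeric columns and reading off the column for the target. The values below are hypothetical and chosen only to illustrate the mechanics; the column names follow the data dictionary.

```python
import pandas as pd

# Hypothetical sample (not the Re-Cell data)
df = pd.DataFrame({
    "normalized_used_price": [4.3, 5.2, 5.1, 5.1, 4.4],
    "normalized_new_price": [4.7, 5.5, 5.9, 5.6, 4.9],
    "days_used": [400, 150, 200, 180, 380],
    "os": ["Android", "Android", "Android", "Others", "Android"],
})

# Pearson correlation is only defined for numeric columns
corr = df.select_dtypes(include="number").corr()

# Correlation of every numeric feature with the target, strongest first
target_corr = (
    corr["normalized_used_price"]
    .drop("normalized_used_price")
    .sort_values(ascending=False)
)
print(target_corr)
```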

• RAM

Q4: The amount of RAM is important for the smooth functioning of a device. How does
the amount of RAM vary with the brand?

RAM does not appear to follow a discernible distribution across brands. The average
amount of RAM is 4 GB. The brand with the most RAM is OnePlus and the brand with the
least RAM is Celkon.

• Battery

Q5: People who travel frequently require devices with large batteries to run through
the day. But a large battery often increases weight, making the device feel
uncomfortable in the hands. How does the weight vary for phones and tablets offering
large batteries (more than 4500 mAh)?

Among devices with large batteries, the weight distribution varies noticeably across
brands, with none appearing normally distributed and some showing outliers. Google has
the heaviest devices, while Micromax has the lightest.

• Screen

Q6: Bigger screens are desirable for entertainment purposes as they offer a better
viewing experience. How many phones and tablets are available across different brands
with a screen size larger than 6 inches?

People who buy phones and tablets primarily for entertainment purposes prefer a large
screen as they offer a better viewing experience. The brand that has the greatest
number of devices with screen sizes larger than 6 inches is Huawei, taking up 13.6% of
the market. This is followed by Samsung and other miscellaneous brands with market
shares of 10.8% and 9%, respectively.
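Because screen_size is recorded in cm while the question is posed in inches, the threshold has to be converted first (6 inches is 15.24 cm). The following is a minimal sketch of the count per brand on hypothetical rows; the data values are assumptions.

```python
import pandas as pd

INCH_TO_CM = 2.54  # exact conversion factor

# Hypothetical devices (screen_size in cm, as in the data dictionary)
df = pd.DataFrame({
    "brand_name": ["Huawei", "Huawei", "Samsung", "Honor", "Samsung"],
    "screen_size": [15.9, 25.5, 12.7, 15.0, 16.0],
})

# Keep only devices whose screen is larger than 6 inches
large_screen = df[df["screen_size"] > 6 * INCH_TO_CM]

# Number of large-screen devices per brand, and each brand's share in percent
counts = large_screen["brand_name"].value_counts()
share = (counts / counts.sum() * 100).round(1)
print(counts)
print(share)
```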

• Camera

Q7: A lot of devices nowadays offer great selfie cameras, allowing us to capture our
favorite moments with loved ones. What is the distribution of devices offering greater
than 8MP selfie cameras across brands?

Huawei leads in the number of devices with selfie cameras over 8 MP, holding 13.3% of
the market, followed by Vivo at 11.9% and Oppo at 11.5%.

• Rear Camera

Rear cameras typically have higher resolution than front cameras, with a threshold of
16 MP set for analysis. Sony leads in devices with main cameras over 16 MP, capturing
39.4% of the market, followed by Motorola at 11.7% and other brands at 9.6%.

• Price

Prices of used devices across years.

There appears to be a positive relationship between the release year of a device and
its normalized used price. The more recent the release year, the higher the normalized
used price.

Prices for used phones and tablets offering 4G and 5G networks.

• Devices with 4G availability appear to have a higher normalized price than
devices without it.
• Devices with 5G availability appear to have a higher normalized price than
devices without it.
• Devices with 5G have a higher normalized price than devices with 4G.

Data Preprocessing:
• Missing Value Imputation

There are 6 variables with missing values:

• main_camera_mp has 179 missing values
• selfie_camera_mp has 2 missing values
• int_memory has 4 missing values
• ram has 4 missing values
• battery has 6 missing values
• weight has 7 missing values

We first impute the missing values with the column medians grouped by release_year
and brand_name.

We then impute the remaining missing values with the column medians grouped by
brand_name.

Finally, we fill the remaining missing values in the main_camera_mp column with the
overall column median.

All missing values have been treated.
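The staged median imputation described above can be sketched as follows. This is a minimal illustration on a hypothetical toy frame, shown for main_camera_mp only; the values are assumptions, and the same pattern would apply to the other columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in main_camera_mp
df = pd.DataFrame({
    "brand_name":     ["A", "A", "A", "B", "B", "C"],
    "release_year":   [2019, 2019, 2020, 2019, 2019, 2020],
    "main_camera_mp": [12.0, np.nan, np.nan, 8.0, 16.0, np.nan],
})
col = "main_camera_mp"

# Stage 1: median within each (release_year, brand_name) group
group_median = df.groupby(["release_year", "brand_name"])[col].transform("median")
df[col] = df[col].fillna(group_median)

# Stage 2: median within each brand for values still missing
brand_median = df.groupby("brand_name")[col].transform("median")
df[col] = df[col].fillna(brand_median)

# Stage 3: overall column median for anything left
df[col] = df[col].fillna(df[col].median())

print(df)
```

Groups that contain only missing values produce a missing median, which is why the later, coarser stages are needed.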

• Feature Engineering

• Let's create a new column years_since_release from the release_year column.
• We will consider the year of data collection, 2021, as the baseline.
• We will drop the release_year column.
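A minimal sketch of this feature engineering step, assuming a pandas frame with a release_year column (the sample years are hypothetical):

```python
import pandas as pd

DATA_COLLECTION_YEAR = 2021  # baseline year stated in the report

df = pd.DataFrame({"release_year": [2020, 2014, 2013]})

# Age of the device model at the time of data collection
df["years_since_release"] = DATA_COLLECTION_YEAR - df["release_year"]

# release_year is now redundant, so it is dropped
df = df.drop(columns="release_year")
print(df)
```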

• Outlier Check

Let's check for outliers in the data.

• There are quite a few outliers in the data.
• However, we will not treat them, as they are genuine values.
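The report does not state how outliers were identified. A common choice, assumed here, is the 1.5 x IQR rule that boxplots use. The helper name `count_iqr_outliers` and the sample weights are hypothetical.

```python
import pandas as pd

def count_iqr_outliers(series: pd.Series) -> int:
    """Count values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] (assumed boxplot rule)."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return int(((series < lower) | (series > upper)).sum())

# Hypothetical device weights in grams; 480 g is a tablet among phones
weights = pd.Series([140, 150, 160, 170, 180, 480])
n_outliers = count_iqr_outliers(weights)
print(n_outliers)
```

A heavy tablet is flagged here even though its weight is a real measurement, which illustrates why such values are kept rather than treated.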

• Data Preparation for Modeling

• We want to predict the normalized price of used devices.
• Before we proceed to build a model, we'll have to encode categorical features.
• We'll split the data into train and test sets to be able to evaluate the model that
we build on the train data.
• We will build a Linear Regression model using the train data and then check its
performance.

Splitting the data in a 70:30 ratio for train to test data

• Number of rows in train data = 2417
• Number of rows in test data = 1037
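The encoding and splitting steps can be sketched as below. This assumes one-hot encoding with the first level dropped and scikit-learn's `train_test_split`; the toy data and the `random_state` value are assumptions, not details taken from the report.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical 10-row sample with two categorical columns
df = pd.DataFrame({
    "os": ["Android"] * 8 + ["Others"] * 2,
    "4g": ["yes", "no"] * 5,
    "ram": [3.0, 8.0, 8.0, 6.0, 3.0, 4.0, 4.0, 2.0, 4.0, 6.0],
    "normalized_used_price": [4.31, 5.16, 5.11, 5.14, 4.39,
                              4.50, 4.62, 3.90, 4.40, 4.95],
})

X = df.drop(columns="normalized_used_price")   # predictors
y = df["normalized_used_price"]                # target

# One-hot encode categoricals; drop_first avoids a redundant dummy per feature
X = pd.get_dummies(X, columns=["os", "4g"], drop_first=True, dtype=float)

# 70:30 split; random_state is an arbitrary choice for reproducibility
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
print(x_train.shape[0], x_test.shape[0])
```

With 3,454 rows, a 30% test share rounds up to 1,037 test rows and leaves 2,417 training rows, which agrees with the counts reported above.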

Model Building - Linear Regression
Printing x_train and y_train datatype:

• OLS Regression

o Adjusted R-squared: It reflects the fit of the model.

  • Adjusted R-squared values generally range from 0 to 1, where a higher value
    generally indicates a better fit, assuming certain conditions are met.
  • In our case, the value for adjusted R-squared is 0.845, which is good.

o const coefficient: It is the Y-intercept.

  • It means that if all the predictor variables are zero, then the expected
    output (i.e., Y) would be equal to the const coefficient.
  • In our case, the value for the const coefficient is 1.6815.

o Coefficient of a predictor variable: It represents the change in the output Y due to
a one-unit change in the predictor variable (everything else held constant).

  • For example, the coefficient of normalized_new_price is 0.4146.

OLS Regression Result

• Model Performance Check

Let us check the performance of the model using different metrics.

• We will be using metric functions defined in sklearn for RMSE, MAE, and R-squared.
• We will define a function to calculate MAPE and adjusted R-squared.
• We will create a function which will print out all the above metrics in one go.
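A minimal sketch of such a metrics helper is shown below. The function names (`adj_r2_score`, `mape_score`, `model_performance`) are hypothetical, and RMSE is computed as the square root of scikit-learn's mean squared error.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def adj_r2_score(y_true, y_pred, n_predictors):
    """Adjusted R-squared: penalizes R-squared for the number of predictors."""
    r2 = r2_score(y_true, y_pred)
    n = len(y_true)
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)

def mape_score(y_true, y_pred):
    """Mean absolute percentage error, in percent (targets must be non-zero)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

def model_performance(y_true, y_pred, n_predictors):
    """Collect all metrics in one dictionary."""
    return {
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAE": mean_absolute_error(y_true, y_pred),
        "R-squared": r2_score(y_true, y_pred),
        "Adj. R-squared": adj_r2_score(y_true, y_pred, n_predictors),
        "MAPE": mape_score(y_true, y_pred),
    }

# Hypothetical actual and predicted values, each off by 0.1
y_true = [4.0, 5.0, 4.5, 5.5, 4.2, 4.8]
y_pred = [4.1, 4.9, 4.6, 5.4, 4.3, 4.7]
perf = model_performance(y_true, y_pred, n_predictors=2)
print(perf)
```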

Linear Regression Assumptions
We will be checking the following Linear Regression assumptions:

• No Multicollinearity
• Linearity of variables
• Independence of error terms
• Normality of error terms
• No Heteroscedasticity

• TEST FOR MULTICOLLINEARITY

• We will test for multicollinearity using the Variance Inflation Factor (VIF).

• General rule of thumb:
  • If VIF is 1, there is no correlation between the k-th predictor and the
    remaining predictor variables.
  • If VIF exceeds 5 or is close to exceeding 5, we say there is moderate
    multicollinearity.
  • If VIF is 10 or more, it shows signs of high multicollinearity.

Let's define a function to check VIF and remove multicollinearity:

Among the high-VIF variables being considered, dropping os_iOS would have the least
impact on the predictive power of the model. We'll drop os_iOS and check the VIF
again.

VIF after dropping os_iOS:

Dropping brand_name_Others:

Among the high-VIF variables being considered, dropping brand_name_Others would have
the least impact on the predictive power of the model. We'll drop brand_name_Others
and check the VIF again.

VIF after dropping brand_name_Others:

Dropping years_since_release:

Among the high-VIF variables being considered, dropping years_since_release would
have the least impact on the predictive power of the model. We'll drop
years_since_release and check the VIF again.

VIF after dropping years_since_release:

Dropping weight:

Among the high-VIF variables being considered, dropping weight would have the least
impact on the predictive power of the model. We'll drop weight and check the VIF
again.

VIF after dropping weight:

There are no more predictors that have multicollinearity and the assumption is
satisfied.

Dropping high p-value variables (if needed):

• We will drop the predictor variables having a p-value greater than 0.05, as they do
not significantly impact the target variable.
• But sometimes p-values change after dropping a variable. So, we'll not drop all
variables at once.
• Instead, we will do the following:
  • Build a model, check the p-values of the variables, and drop the column
    with the highest p-value.
  • Create a new model without the dropped feature, check the p-values of
    the variables, and drop the column with the highest p-value.
  • Repeat the above two steps till there are no columns with p-value > 0.05.

The above process can also be done manually by picking one variable at a time that
has a high p-value, dropping it, and building a model again. But that can be
tedious, and using a loop is more efficient.

Checking the p-values on the right dataset:

OLS Regression for updated dataset (no multicollinearity and no insignificant predictors)

Observation:

The final model, olsmod2, includes predictor variables from x_train6, with no p-values
exceeding 0.05. It has an adjusted R-squared of 0.841, explaining ~84% of the variance.
Compared to olsmod1 (adjusted R-squared of 0.845), the dropped variables had
minimal impact. Comparable RMSE and MAE values for train and test sets confirm the
model is not overfitting.

Training Performance

Test Performance

• TEST FOR LINEARITY AND INDEPENDENCE

• We will test for linearity and independence by plotting fitted values against
residuals and checking for patterns.
• If there is no pattern, we say the model is linear and the residuals are
independent.
• Otherwise, the model shows signs of non-linearity and the residuals are not
independent.

Creating a data frame with actual, fitted, and residual values and plotting the fitted
values vs residuals:

The scatter plot of residuals versus fitted values shows how the errors are distributed.
A pattern in the plot would suggest non-linearity in the data, meaning the model does
not account for non-linear effects. Since no pattern is observed, the assumptions of
linearity and independence are met.

• TEST FOR NORMALITY

• We will test for normality by checking the distribution of residuals, by checking
the Q-Q plot of residuals, and by using the Shapiro-Wilk test.
• If the residuals follow a normal distribution, the Q-Q plot will be a straight line;
otherwise it will not.
• If the p-value of the Shapiro-Wilk test is greater than 0.05, we can say the
residuals are normally distributed.

The histogram of residuals has a bell shape.

Checking Q-Q plot:

The residuals follow a straight line except for the tails.

Shapiro Result (statistic=0.9748184084892273, pvalue=3.1171865175534697e-20)

• Since p-value < 0.05, the residuals are not normal as per the Shapiro-Wilk test.
• Strictly speaking, the residuals are not normal.
• However, given the bell-shaped histogram and the mostly straight Q-Q plot, we can
accept this distribution as close to being normal.
• So, we treat the assumption as approximately satisfied.
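The Shapiro-Wilk test itself can be run with SciPy. The following is a minimal sketch on simulated residuals, which are assumptions; a clearly skewed sample is included for contrast.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated residuals: one sample drawn from a normal distribution,
# one from a strongly right-skewed (exponential) distribution
normal_resid = rng.normal(0, 0.2, 500)
skewed_resid = rng.exponential(1.0, 500)

stat_normal, p_normal = stats.shapiro(normal_resid)
stat_skewed, p_skewed = stats.shapiro(skewed_resid)

print(stat_normal, p_normal)
print(stat_skewed, p_skewed)   # very small p-value: normality is rejected
```

With samples in the thousands, the test can reject normality even for small departures such as slightly heavy tails, which is why the histogram and Q-Q plot are read alongside the p-value.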

• TEST FOR HOMOSCEDASTICITY:

• We will test for homoscedasticity by using the Goldfeld-Quandt test.
• If we get a p-value greater than 0.05, we can say that the residuals are
homoscedastic. Otherwise, they are heteroscedastic.

Goldfeld-Quandt Test: [('F statistic', 1.0643431899824787), ('p-value', 0.141633165194831)]

Since p-value > 0.05, we can say that the residuals are homoscedastic. So, this
assumption is satisfied.

Prediction on Test data set:

• We can observe here that our model has returned good prediction results, and the
actual and predicted values are comparable.
• We can also visualize the comparison as a bar graph.

• Final Model Summary

Training Performance:

Test Performance:

• The model explains approximately 84% of the variation in the data.
• Train and test RMSE and MAE are low and comparable, indicating no overfitting.
• The MAPE on the test set suggests predictions are within 4.6% of the actual
normalized used prices.
• The final model, olsmodel_final, is suitable for both prediction and inference.

• Actionable Insights and Recommendations

Insights:

o New vs. Used Price Relationship: The price of a used device strongly correlates
with the price of its new counterpart. Devices that are expensive when new tend to have
a higher resale value, making them key targets for refurbishment.
o Key Features Driving Value: Large screen sizes, higher RAM, better front and rear
cameras, and 4G connectivity significantly boost resale prices. Devices with these
specifications from specific brands are especially profitable.
o Impact of Brand and Features: While certain brands like Samsung and some main
camera configurations negatively affect resale prices, others contribute positively.

Recommendations:

• Prioritize refurbishing devices with large screens, high RAM, and excellent camera
specifications, as these are in high demand.
• Focus on newer models with high initial market prices to maximize revenue
potential.
• Expand to selling other used gadgets, such as smartwatches, to diversify offerings
and attract more customers.
• Collect and analyse customer demographics to refine product selection and cater
to different market segments.
• Retailers should avoid overstocking models or brands that do not retain value
well, like some Samsung models with specific camera configurations.
