0% found this document useful (0 votes)
45 views2 pages

Capstone Project-Naan Mudlvan

The document provides instructions for a capstone project to build a predictive model using a CPU performance dataset. The dataset contains 209 rows and 9 columns with details on CPU vendors, models, specifications and performance scores. The tasks are to preprocess the data by encoding categorical variables, dropping rare vendors and the model column, split the data into train and test sets, build a linear regression model on the training set, calculate performance metrics on both sets, check for multicollinearity using VIF, and predict the performance score for a new CPU instance.

Uploaded by

mohanraj28174
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
45 views2 pages

Capstone Project-Naan Mudlvan

The document provides instructions for a capstone project to build a predictive model using a CPU performance dataset. The dataset contains 209 rows and 9 columns with details on CPU vendors, models, specifications and performance scores. The tasks are to preprocess the data by encoding categorical variables, dropping rare vendors and the model column, split the data into train and test sets, build a linear regression model on the training set, calculate performance metrics on both sets, check for multicollinearity using VIF, and predict the performance score for a new CPU instance.

Uploaded by

mohanraj28174
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 2

Capstone Project

General Instructions:
1. Provide appropriate comments in your code.
2. Perform all task programmatically using Python libraries.

‘XYZ’ hardware service center is specialized in servicing the CPUs. The center has maintained the details
about the CPUs they have serviced which is available in “machine_data.csv”. The dataset has 209 rows
and 9 columns. [Source of raw dataset: UCI repository] The details of the columns are as follows:

 vendor: represents the manufacturer of the CPU.


 model: represents the model number of the CPU.
 cycle_time: represents the time taken for internal data transfer in nanoseconds of the CPU.
 min_memory: represents the minimum main memory required by the CPU.
 max_memory: represents the maximum main memory supported by the CPU.
 cache: represents the size of cache memory required by the CPU.
 min_threads: represents the number of threads that run in the CPU when it is just switched on.
 max_threads: represents the maximum number of threads that can be run on the CPU.
 score: represents the performance score of the CPU.

Based on this data, ‘XYZ’ hardware service center would like to build a predictive model that predicts the
performance score for the new CPUs.
As a data science expert, you are expected to build the best model for the given scenario.

Problem statement:

Perform the following activities to build the model:

1. Import the data set “machine_data.csv”.


2. As part of data preprocessing, perform the following activities:
a. Encode the categorical column – ‘vendor’ using label encoder.
b. Identify the vendors who have manufactured less than 5 CPUs and drop those rows from the
given dataset, corresponding to the identified vendors.
c. Drop the column ‘model’.

[Note: The preprocessed dataset should be used further.]

3. Select ‘score’ as the target variable to be predicted and remaining features as predictors.
4. Split the data into training and testing data set in the ratio 80:20.
5. As part of model building, perform the following activities:
a. Based on the training data, build a Linear Regression model.
b. Find the train and the test score for the built model.
c. Calculate the adjusted R-Squared values on both the train and the test data.
6. Calculate the VIF values for all the features considered while building the model using the train data.
7. Based on the model built, predict the performance score of a new test sample/ new CPU instance
which is given below:

(Note: In the above hardware instance, the vendor value '14' is the label encoded value for vendor
'harris'.)

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy