
PAMLSET2

The document outlines an examination paper for Predictive Analytics and Machine Learning, consisting of multiple questions with a total of 60 marks. Candidates must answer mandatory questions and choose between two options for question 3. The exam includes tasks such as data manipulation, visualization, logistic regression, clustering, principal component analysis, and model evaluation.

Uploaded by

nishthathakkar21

PUSASQF602

PREDICTIVE ANALYTICS & MACHINE LEARNING​

Time: 2 Hours
Total Marks: 60

Note:
- Candidates may attempt either question 3A or question 3B. All other questions are mandatory.
- Numbers to the right indicate full marks.
- Candidates will be provided with the formula sheet and graphs (if required) for the examination.
- Use of an approved scientific calculator is allowed.

Q1. Answer the following (15 Marks)

A. (5 Marks)
Perform the operations below on the “Youtuber.csv” dataset.

i. Read the data “Youtuber.csv” using Pandas (1)

ii. Generate a bar plot of Top 7 YouTube Channels by subscribers. The graph should have titles as mentioned below (2)
    Title: Top 5 YouTube Channels by Subscribers
    X Axis Title: Channel Name
    Y Axis Title: Subscribers (in millions)

iii. Generate a plot for Distribution of Channels by Country. The graph should have titles as mentioned below (2)
    Title: Distribution of Channels by Country
    X Axis Title: Country
    Y Axis Title: Number of Channels
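One possible way to approach parts i to iii is sketched below. The contents of “Youtuber.csv” are not shown in the paper, so the column names `Channel`, `Subscribers_millions` and `Country`, as well as the stand-in rows, are assumptions made only for illustration. The question asks for the Top 7 channels but words the required title as "Top 5"; the sketch keeps both exactly as written.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# i. In the exam this would be: youtubers = pd.read_csv("Youtuber.csv")
# Hypothetical stand-in rows with assumed column names.
youtubers = pd.DataFrame({
    "Channel": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Subscribers_millions": [250, 180, 170, 160, 110, 105, 100, 95],
    "Country": ["IN", "US", "US", "IN", "KR", "US", "IN", "BR"],
})

# ii. Bar plot of the 7 channels with the most subscribers.
top = youtubers.nlargest(7, "Subscribers_millions")
fig1, ax1 = plt.subplots()
ax1.bar(top["Channel"], top["Subscribers_millions"])
ax1.set_title("Top 5 YouTube Channels by Subscribers")  # title as worded in the paper
ax1.set_xlabel("Channel Name")
ax1.set_ylabel("Subscribers (in millions)")

# iii. Number of channels per country.
country_counts = youtubers["Country"].value_counts()
fig2, ax2 = plt.subplots()
ax2.bar(country_counts.index, country_counts.values)
ax2.set_title("Distribution of Channels by Country")
ax2.set_xlabel("Country")
ax2.set_ylabel("Number of Channels")
```

In an interactive session, `plt.show()` would display both figures; the `Agg` backend line can be dropped there.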

​ ​ ​ ​
B. (5 Marks)
Load the dataset FIFA19.csv
i. Filter the data to include only the 'Name', 'Age', 'Nationality', 'Club', 'Value', 'Wage', and 'Overall' columns (1)
ii. Calculate the average “Wage” for each Nationality (1)
iii. Derive any 2 insights from the data (3)
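A minimal sketch of parts i and ii follows. It assumes that the `Wage` column holds text such as `€500K`, which would need converting to a number before averaging; that format, the helper `money_to_number`, and the stand-in rows are assumptions, not something stated in the paper.

```python
import pandas as pd

# In the exam this would be: fifa = pd.read_csv("FIFA19.csv")
# Hypothetical stand-in rows; only the seven named columns come from the question.
fifa = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Age": [31, 33, 24, 26],
    "Nationality": ["Argentina", "Portugal", "Argentina", "Brazil"],
    "Club": ["Club 1", "Club 2", "Club 3", "Club 4"],
    "Value": ["€110M", "€77M", "€89M", "€118M"],
    "Wage": ["€500K", "€400K", "€300K", "€290K"],
    "Overall": [94, 94, 89, 92],
    "Position": ["RF", "ST", "LF", "LW"],  # extra column that the filter drops
})

# i. Keep only the requested columns.
cols = ["Name", "Age", "Nationality", "Club", "Value", "Wage", "Overall"]
players = fifa[cols].copy()


def money_to_number(text):
    """Convert strings such as '€565K' or '€1.5M' to a float (assumed format)."""
    text = str(text).replace("€", "").strip()
    factor = 1.0
    if text.endswith("M"):
        factor, text = 1e6, text[:-1]
    elif text.endswith("K"):
        factor, text = 1e3, text[:-1]
    return float(text) * factor


# ii. Average wage per nationality, highest first.
players["Wage_num"] = players["Wage"].map(money_to_number)
avg_wage = (
    players.groupby("Nationality")["Wage_num"].mean().sort_values(ascending=False)
)
print(avg_wage)
```

If `Wage` is already numeric in the file provided, the conversion step can be skipped and the `groupby` applied to `Wage` directly.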

C. (5 Marks)
i. Run a logistic regression on the data frame given below (1)

import pandas as pd

df = pd.DataFrame({
    'Cust_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    'Salary': [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000,
               2100, 33000, 21000, 21000, 50000, 21000, 45000],
    'EMI': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
})
The data frame consists of 15 customers along with their monthly salaries, used to check their eligibility for No Cost EMI.
    Cust_ID: Customer ID for the inquiry
    Salary: Customer's monthly take-home salary
    EMI: Eligibility for the EMI

ii. Predict whether the customer is EMI worthy or not (2)

iii. Provide the confusion matrix & score (2)
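The three parts above could be sketched as follows with scikit-learn. Two choices here are assumptions rather than requirements of the question: `Salary` is standardised before fitting (its values span a wide range), and the model is fitted and scored on the same 15 rows because the data frame is too small for a meaningful hold-out split.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Cust_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "Salary": [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000,
               2100, 33000, 21000, 21000, 50000, 21000, 45000],
    "EMI": [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# i. Logistic regression with Salary as the only feature (Cust_ID is an identifier).
X = df[["Salary"]]
y = df["EMI"]
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# ii. Predicted EMI eligibility for each customer.
df["EMI_pred"] = model.predict(X)

# iii. Confusion matrix (rows: actual, columns: predicted) and accuracy score.
cm = confusion_matrix(y, df["EMI_pred"])
score = model.score(X, y)
print(cm)
print(score)
```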
​ ​ ​
Q2. Answer the following (15 Marks)

A. (5 Marks)
Generate a random dataset using the code below:

i. X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0) (1)

ii. Plot the dataset (2)

iii. Apply K-Means clustering with a suitable number of clusters (2)
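A short sketch of the make_blobs and K-Means steps is given below. Choosing 4 clusters is an assumption that follows from `centers=4` in the generating call; an elbow plot of inertia against the number of clusters would be another way to justify the choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# i. Generate the dataset exactly as given in the question.
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)

# ii. Scatter plot of the raw points.
fig, (ax_raw, ax_km) = plt.subplots(1, 2, figsize=(10, 4))
ax_raw.scatter(X[:, 0], X[:, 1], s=20)
ax_raw.set_title("Generated dataset")

# iii. K-Means with 4 clusters (assumed from centers=4 above).
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

# Points coloured by assigned cluster, with the fitted centres marked.
ax_km.scatter(X[:, 0], X[:, 1], c=labels, s=20)
ax_km.scatter(centers[:, 0], centers[:, 1], c="red", marker="x", s=100)
ax_km.set_title("K-Means clusters (k=4)")
```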

B. (5 Marks)
Apply Principal Component Analysis on “diamonds.csv” to derive 4 principal components.
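As a rough illustration, PCA with 4 components might be applied as in the sketch below. The helper `first_principal_components` is hypothetical. It keeps only numeric columns and standardises them first, which is a common preprocessing choice and not something the question prescribes. The stand-in data and its column names are assumptions, since the contents of “diamonds.csv” are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def first_principal_components(frame, n_components=4):
    """Standardise the numeric columns of `frame` and project them onto PCs."""
    numeric = frame.select_dtypes(include="number").dropna()
    scaled = StandardScaler().fit_transform(numeric)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(scaled)
    names = [f"PC{i + 1}" for i in range(n_components)]
    return pd.DataFrame(scores, columns=names, index=numeric.index), pca


# In the exam this would be: diamonds = pd.read_csv("diamonds.csv")
# Hypothetical stand-in with assumed column names and random values.
rng = np.random.default_rng(0)
diamonds = pd.DataFrame(
    rng.normal(size=(50, 6)),
    columns=["carat", "depth", "table", "price", "x", "y"],
)
diamonds["cut"] = "Ideal"  # non-numeric column, ignored by the helper

components, pca = first_principal_components(diamonds, n_components=4)
print(components.head())
print(pca.explained_variance_ratio_)  # share of variance captured by each PC
```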

​ ​
C. (5 Marks)
Load the covid_19_india dataset in Python and perform the steps mentioned below

i. Provide the summarised view of “Cured”, “Deaths”, “Confirmed” cases per state (3)

ii. Show no. of covid cases with respect to YYYYMM (Year-Month) on x-axis (2)
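The sketch below shows one way to build both views. It assumes the file has `Date` and `State/UnionTerritory` columns and that the three case counts are cumulative per state, so the latest (maximum) value is used for the summary. If the counts were daily increments instead, `sum` would replace `max`. The stand-in rows are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# In the exam this would be: covid = pd.read_csv("covid_19_india.csv")
# Hypothetical stand-in rows; "Date" and "State/UnionTerritory" are assumed names.
covid = pd.DataFrame({
    "Date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-03-31", "2020-04-01"],
    "State/UnionTerritory": ["Kerala", "Kerala", "Kerala", "Delhi", "Delhi"],
    "Cured": [10, 12, 20, 1, 5],
    "Deaths": [1, 1, 2, 0, 1],
    "Confirmed": [100, 120, 150, 50, 80],
})

# i. Per-state summary, assuming the counts are cumulative.
state_summary = covid.groupby("State/UnionTerritory")[
    ["Cured", "Deaths", "Confirmed"]
].max()

# ii. Confirmed cases by YYYYMM: month-end cumulative value per state,
#     then summed across states.
covid["YYYYMM"] = pd.to_datetime(covid["Date"]).dt.strftime("%Y%m")
monthly = (
    covid.groupby(["YYYYMM", "State/UnionTerritory"])["Confirmed"].max()
    .groupby(level="YYYYMM").sum()
)

fig, ax = plt.subplots()
ax.bar(monthly.index, monthly.values)
ax.set_xlabel("YYYYMM")
ax.set_ylabel("Confirmed cases")
```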

Q3.
A. (30 Marks)
Predict “churn” using the “Bank Customer Churn Prediction.csv”.
i. Load the dataset (1)

ii. Get the insights & correlation for each column vs the output column (5)

iii. Do the outlier treatment & null imputation if required (2)

iv. Shortlist the most important features for predicting the “churn” (3)

v. Split the data into features and target (2)

vi. Perform train test split with a ratio of 20% (2)

vii. Define any 3 classifier models, train them on the train dataset and predict on the test dataset (5)

viii. Calculate the accuracy of the model (2)

ix. Generate the classification report (2)

x. Generate the confusion matrix (2)

xi. Which model is the most suitable one for predicting the output column? (4)
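Steps v to xi could be sketched along the following lines. Because the churn file is not reproduced in the paper, a synthetic dataset from `make_classification` stands in for it. The three classifiers chosen, the 20% figure being read as the test share, and accuracy being used to pick the best model are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# v. Features and target (synthetic stand-in for the churn data).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# vi. Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# vii. Three example classifiers.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)   # viii. accuracy
    print(name, results[name])
    print(classification_report(y_test, y_pred))     # ix. classification report
    print(confusion_matrix(y_test, y_pred))          # x. confusion matrix

# xi. One simple criterion: the model with the highest test accuracy.
best_model = max(results, key=results.get)
print("Best by accuracy:", best_model)
```

For an imbalanced target such as churn, recall or F1 from the classification report may be a better basis for the final choice than accuracy alone.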
​ ​ ​ ​ ​ ​
OR

Q3.
B. (30 Marks)
Predict “Car Purchase Amount” using the “Car_Purchasing_Data.csv”.
i. Load the dataset (1)

ii. Get the insights & correlation for each column vs the output column (5)

iii. Do the outlier treatment & null imputation if required (2)

iv. Shortlist the most important features for predicting the “Car Purchase Amount” (5)

v. Split the data into features and target (2)

vi. Perform train test split with a ratio of 20% (2)

vii. Define any 3 regression models, train them on the train dataset and predict on the test dataset (5)

viii. Calculate the accuracy of the model (2)

ix. Which model is the most suitable one for predicting the output column? (6)
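A compact sketch of the regression workflow is shown below, again on synthetic stand-in data from `make_regression` because the car purchasing file is not shown. Median imputation, IQR clipping, the three regressors, and the use of R² as the "accuracy" measure are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data with hypothetical column names f0..f4 and "target".
X_arr, y_arr = make_regression(n_samples=400, n_features=5, noise=10.0, random_state=0)
data = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(5)])
data["target"] = y_arr
data.loc[0, "f0"] = np.nan  # one missing value to exercise the imputation step

# iii. Null imputation (median) and outlier treatment (clip to 1.5 * IQR fences).
features = data.drop(columns="target")
features = features.fillna(features.median())
q1, q3 = features.quantile(0.25), features.quantile(0.75)
iqr = q3 - q1
features = features.clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr, axis=1)

# ii. / iv. Correlation of each feature with the target, strongest first.
correlations = features.corrwith(data["target"])
ranked = correlations.abs().sort_values(ascending=False)
print(ranked)

# v. / vi. Features, target, and a split with 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    features, data["target"], test_size=0.2, random_state=0
)

# vii. / viii. Three example regressors scored with R^2 on the test set.
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

# ix. One simple criterion: the model with the highest test R^2.
best_model = max(scores, key=scores.get)
print(scores, best_model)
```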