PAMLSET2
Time: 2 Hours
Total Marks: 60
Note:
- The candidate has the option to attempt either question 3A or question 3B. All other questions are mandatory
- Numbers to the right indicate full marks
- Candidates will be provided with the formula sheet and graphs (if required) for the examination
- Use of an approved scientific calculator is allowed
ii. Generate a bar plot of the Top 7 YouTube Channels by subscribers
The graph should have titles as mentioned below (2)
Title: Top 5 YouTube Channels by Subscribers
X Axis Title: Channel Name
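A minimal sketch of the bar plot described above, assuming matplotlib and pandas. The channel names and subscriber counts are invented placeholders, and the column names `Channel` and `Subscribers` are assumptions because the dataset is not named here. The question asks for seven channels while the required title says "Top 5"; the sketch keeps both exactly as worded.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data: names and counts are made up for illustration only.
channels = pd.DataFrame({
    "Channel": [f"Channel {c}" for c in "ABCDEFGHIJ"],
    "Subscribers": [120, 95, 210, 60, 180, 75, 150, 30, 110, 45],
})

# Keep the 7 channels with the most subscribers, largest first.
top = channels.nlargest(7, "Subscribers")

fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(top["Channel"], top["Subscribers"])
ax.set_title("Top 5 YouTube Channels by Subscribers")  # title text as given in the question
ax.set_xlabel("Channel Name")
ax.tick_params(axis="x", rotation=45)  # rotate labels so long names stay readable
fig.tight_layout()
```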
B. 5 Marks
Load the dataset FIFA19.csv
i. Filter the data to include only the 'Name', 'Age', 'Nationality', 'Club', 'Value', 'Wage', and
'Overall' columns (1)
ii. Calculate average “Wage” for each Nationality (1)
iii. Derive any 2 insights from the data (3)
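One possible way to approach steps i and ii with pandas is sketched below. The rows are invented stand-ins for FIFA19.csv, and the sketch assumes that "Wage" is stored as text such as "€100K"; if the column is already numeric, the hypothetical `parse_money` helper can be skipped.

```python
import pandas as pd

# Placeholder rows standing in for FIFA19.csv; every value is invented.
fifa = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Age": [24, 31, 27, 22],
    "Nationality": ["Brazil", "Brazil", "Spain", "Spain"],
    "Club": ["Club A", "Club B", "Club A", "Club C"],
    "Value": ["€10M", "€5M", "€8M", "€2M"],
    "Wage": ["€100K", "€50K", "€80K", "€20K"],
    "Overall": [85, 80, 83, 75],
    "Position": ["ST", "CB", "CM", "GK"],  # extra column, dropped by the filter
})

# Step i: keep only the requested columns.
columns = ["Name", "Age", "Nationality", "Club", "Value", "Wage", "Overall"]
players = fifa[columns].copy()


def parse_money(text):
    """Convert strings such as '€100K' or '€10M' to a number (assumed format)."""
    text = str(text).replace("€", "")
    suffix = text[-1:]
    scale = {"K": 1e3, "M": 1e6}.get(suffix, 1)
    digits = text[:-1] if suffix in ("K", "M") else text
    return float(digits) * scale


# Step ii: average wage per nationality, highest first.
players["Wage_num"] = players["Wage"].map(parse_money)
avg_wage = (
    players.groupby("Nationality")["Wage_num"].mean().sort_values(ascending=False)
)
print(avg_wage)
```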
C. 5 Marks
i. Run a logistic regression on the dataframe given below (1)
import pandas as pd

df = pd.DataFrame({
    'Cust_ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    'Salary': [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000,
               2100, 33000, 21000, 21000, 50000, 21000, 45000],
    'EMI': [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1]
})
The data frame consists of 15 customers along with their monthly salaries, used to check their eligibility for No Cost EMI
Cust_ID: Customer ID for the inquiry
Salary: Customer's monthly take home salary
EMI: Checks eligibility for the EMI
ii. Predict whether the customer is EMI worthy or not (2)
iii. Provide the confusion matrix & score (2)
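The three steps above could be sketched with scikit-learn as follows. This is one possible approach, not a prescribed solution: it assumes "Salary" is the only predictor, standardises it before fitting, and fits and scores on the same 15 rows because the question does not ask for a split.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame({
    "Cust_ID": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    "Salary": [1000, 1100, 10000, 1000, 11000, 1110, 21000, 30000,
               2100, 33000, 21000, 21000, 50000, 21000, 45000],
    "EMI": [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

X = df[["Salary"]]  # assumption: Salary is the only feature; Cust_ID is an identifier
y = df["EMI"]

# Scaling keeps the solver well conditioned for salaries in the tens of thousands.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# Step ii: predicted eligibility for each customer.
df["EMI_pred"] = model.predict(X)

# Step iii: confusion matrix (rows = actual, columns = predicted) and accuracy score.
cm = confusion_matrix(y, df["EMI_pred"])
score = model.score(X, y)
print(cm)
print(score)
```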
Q2. Answer the following 15 Marks
A. 5 Marks
ii. Plot the dataset. (2)
iii. Apply K Means clustering with suitable number of clusters (2)
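A rough sketch of the clustering step, assuming scikit-learn. The dataset for this question is not named here, so the sketch generates synthetic points with `make_blobs` purely as a placeholder. Choosing the number of clusters by silhouette score is one common heuristic; the elbow method on inertia would be an equally reasonable choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic placeholder data; replace with the numeric columns of the real dataset.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)

# Try several cluster counts and record the silhouette score of each.
scores = {}
for k in range(2, 7):
    labels_k = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels_k)

# Keep the cluster count with the highest silhouette score and refit.
best_k = max(scores, key=scores.get)
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_
print(best_k, scores[best_k])
```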
B. 5 Marks
Apply Principal Component Analysis on “diamonds.csv” to derive 4 principal components.
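As a hedged illustration of the PCA step, the sketch below standardises the numeric columns and projects them onto 4 components with scikit-learn. The column names are assumptions about what diamonds.csv contains, and the values are random placeholders; with the real file, load it with `pd.read_csv` and select its numeric columns instead.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random placeholder values; the column names are assumed, not taken from the file.
rng = np.random.default_rng(0)
numeric_cols = ["carat", "depth", "table", "price", "x", "y", "z"]
data = pd.DataFrame(rng.normal(size=(200, len(numeric_cols))), columns=numeric_cols)

# PCA is scale sensitive, so standardise each column first.
scaled = StandardScaler().fit_transform(data[numeric_cols])

# Derive 4 principal components.
pca = PCA(n_components=4)
components = pca.fit_transform(scaled)
pc_df = pd.DataFrame(components, columns=["PC1", "PC2", "PC3", "PC4"])

# Share of total variance captured by each component, largest first.
explained = pca.explained_variance_ratio_
print(explained)
```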
C. 5 Marks
Load the covid_19_india dataset in Python and perform the steps below
i. Provide the summarised view of "Cured", "Deaths", "Confirmed" cases per state (3)
ii. Show the number of COVID cases by YYYYMM (Year-Month), with YYYYMM on the x-axis (2)
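The two steps above might be sketched with pandas as follows. The rows are invented, the column names `Date` and `State` are assumptions about the file, and the sketch treats each row as a daily count so that `sum` is the right aggregate. If the file stores running totals instead, `max` or the last value per state would be the appropriate summary.

```python
import pandas as pd

# Invented placeholder rows; column names are assumed.
covid = pd.DataFrame({
    "Date": ["2020-03-30", "2020-03-31", "2020-04-01", "2020-03-31", "2020-04-01"],
    "State": ["A", "A", "A", "B", "B"],
    "Cured": [1, 2, 4, 0, 1],
    "Deaths": [0, 0, 1, 0, 0],
    "Confirmed": [10, 15, 25, 5, 9],
})
covid["Date"] = pd.to_datetime(covid["Date"])

# Step i: summarised view per state (assumes rows are daily counts).
by_state = covid.groupby("State")[["Cured", "Deaths", "Confirmed"]].sum()

# Step ii: confirmed cases per Year-Month key.
covid["YYYYMM"] = covid["Date"].dt.strftime("%Y%m")
by_month = covid.groupby("YYYYMM")["Confirmed"].sum()

print(by_state)
print(by_month)
# by_month.plot(kind="bar") would put YYYYMM on the x-axis when matplotlib is available.
```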
Q3.
A. 30 Marks
Predict “churn” using the “Bank Customer Churn Prediction.csv”.
i. Load the dataset (1)
ii. Get the insights & Correlation for each column vs the output column (5)
iii. Do the outlier treatment & null imputation if required. (2)
iv. Shortlist the most important features for predicting the “churn” (3)
vi. Perform a train-test split with a ratio of 20% (2)
vii. Define any 3 classifier models, train each model on the train dataset and predict on the test dataset (5)
viii. Calculate the accuracy of the model (2)
ix. Which model is the most suitable one for predicting the output column? (4)
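The split, training, and accuracy steps above can be sketched as below, assuming scikit-learn. The data is synthetic (`make_classification`) and stands in for the cleaned churn features. The three classifiers are arbitrary picks, and the 20% ratio is interpreted here as the test share, which is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for the prepared feature matrix and the churn label.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Any three classifiers would do; these are illustrative choices.
models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train on the train set, predict on the test set, record accuracy.
accuracies = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

best_model = max(accuracies, key=accuracies.get)
print(accuracies)
print("Highest test accuracy:", best_model)
```

Accuracy alone can mislead when churners are a small minority, so precision, recall, or ROC AUC may be worth reporting alongside it when comparing models.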
OR
Q3.
B. 30 Marks
Predict “Car Purchase Amount” using the “Car_Purchasing_Data.csv”.
i. Load the dataset (1)
ii. Get the insights & Correlation for each column vs the output column (5)
iii. Do the outlier treatment & null imputation if required. (2)
iv. Shortlist the most important features for predicting the “Car Purchase Amount” (5)
vi. Perform a train-test split with a ratio of 20% (2)
vii. Define any 3 regression models, train each model on the train dataset and predict on the test dataset (5)
viii. Calculate the accuracy of the model (2)
ix. Which model is the most suitable one for predicting the output column? (6)
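A sketch of the regression workflow above, assuming scikit-learn. Synthetic data from `make_regression` stands in for the prepared features and the "Car Purchase Amount" target. Since accuracy is not defined for a continuous target, the sketch reports R² and RMSE instead, which is an interpretation rather than something the question states.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic placeholder for the prepared features and the purchase amount.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows for testing (assumed meaning of the 20% ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Any three regressors would do; these are illustrative choices.
models = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Train on the train set, predict on the test set, record R^2 and RMSE.
results = {}
for name, reg in models.items():
    reg.fit(X_train, y_train)
    pred = reg.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": mean_squared_error(y_test, pred) ** 0.5,
    }

best_model = max(results, key=lambda name: results[name]["r2"])
print(results)
print("Highest test R^2:", best_model)
```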