ML Lab Question Set - 21
1. Write a Python program using Scikit-learn to split the Iris dataset into 70% train data and 30% test
data. Out of a total of 150 records, the training set will contain 105 records and the test set will contain 45.
Predict the response for the test dataset using the features SepalLengthCm, SepalWidthCm,
PetalLengthCm, and PetalWidthCm.
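A minimal sketch of question 1, assuming scikit-learn's bundled copy of Iris (`load_iris`). Its columns are named "sepal length (cm)" and so on, rather than the SepalLengthCm-style names used in the Kaggle CSV. The question does not name a model, so the k-nearest-neighbours classifier here is an assumed choice.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Four measurements (sepal/petal length and width) and the species label
X, y = load_iris(return_X_y=True)

# test_size=0.3 on 150 rows gives 105 training and 45 test records
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Assumed classifier: the question leaves the model open
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict the response (species) for the 45 test records
y_pred = model.predict(X_test)
print("Train/test sizes:", len(X_train), len(X_test))
print("Accuracy:", accuracy_score(y_test, y_pred))
```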
2. Consider the following dataset:
YearsExperience Salary
1.1 45000
3.5 60000
6.8 85000
8.5 95000
10.0 120000
9.0 105000
2.0 52000
3.8 65000
12.5 150000
15.6 180000
Write an R program to predict the salary for 5 years of experience using Linear Regression.
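The question asks for R, where the usual approach is `lm(Salary ~ YearsExperience, data = df)` followed by `predict(model, data.frame(YearsExperience = 5))`. To keep every example in one language, this sketch shows the equivalent ordinary least squares fit in Python with scikit-learn, using the ten rows from the table.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data copied from the table above
years = np.array([1.1, 3.5, 6.8, 8.5, 10.0, 9.0, 2.0, 3.8, 12.5, 15.6]).reshape(-1, 1)
salary = np.array([45000, 60000, 85000, 95000, 120000, 105000, 52000, 65000, 150000, 180000])

# Ordinary least squares fit: salary = intercept + slope * years
model = LinearRegression().fit(years, salary)
predicted = model.predict(np.array([[5.0]]))[0]

print(f"slope = {model.coef_[0]:.2f}, intercept = {model.intercept_:.2f}")
print(f"Predicted salary for 5 years of experience: {predicted:.2f}")
```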
3. Write a Python program using Scikit-learn to split the Motorcycle dataset into 80% train data and
20% test data. Out of a total of 303 records, the training set will contain 80% of the records and the test set
will contain the remaining 20%. Predict the response for the test dataset using a classifier.
4. Consider the following dataset with an odd number of observations, arranged in descending order: 23,
21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2. Find the mean, median, mode, range, standard deviation,
variance, and five-number summary, and draw a boxplot, using Python's NumPy, SciPy, and Matplotlib libraries.
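A sketch of question 4 on the 13 given values. Two conventions are assumptions worth stating. First, variance and standard deviation use the sample formula (`ddof=1`). Second, quartiles use NumPy's default linear interpolation, which gives Q1 = 7 and Q3 = 16; Tukey's hinges would give 6.5 and 17 instead. Every value occurs once, so there is no unique mode and the code reports all tied values.

```python
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

data = np.array([23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2])

mean = np.mean(data)
median = np.median(data)
# Mode: every value appears exactly once here, so all 13 values tie
values, counts = np.unique(data, return_counts=True)
modes = values[counts == counts.max()]
data_range = data.max() - data.min()
variance = np.var(data, ddof=1)   # sample variance; ddof=0 gives the population version
std_dev = np.std(data, ddof=1)
# Five-number summary: min, Q1, median, Q3, max (NumPy's default linear interpolation)
five_number = np.percentile(data, [0, 25, 50, 75, 100])
summary = stats.describe(data)    # SciPy cross-check: n, min/max, mean, variance, skewness, kurtosis

print("Mean:", mean, "Median:", median, "Mode(s):", modes)
print("Range:", data_range, "Variance:", variance, "Std dev:", std_dev)
print("Five-number summary:", five_number)
print(summary)

plt.boxplot(data)
plt.title("Boxplot of the observations")
plt.savefig("boxplot.png")
```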
5. Write a Python program to apply pre-processing techniques such as handling missing data, scaling,
and encoding categorical variables to a retail sales dataset. Then implement a neural network trained with
the backpropagation algorithm to predict customer purchase behavior.
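A minimal sketch of question 5, assuming a hypothetical retail table with invented columns (age, annual_spend, region), injected missing values, and an invented purchase rule. The preprocessing uses scikit-learn's `SimpleImputer`, `StandardScaler`, and `OneHotEncoder`. Backpropagation is written out in NumPy for one hidden layer with a binary cross-entropy loss, so the forward and backward passes are visible. This is not a tuned implementation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 400
# Hypothetical retail records; the columns and the purchase rule are invented for illustration
df = pd.DataFrame({
    "age": rng.integers(18, 70, n).astype(float),
    "annual_spend": rng.normal(500, 150, n),
    "region": pd.Series(rng.choice(["north", "south", "east", "west"], n), dtype=object),
})
df.loc[rng.choice(n, 30, replace=False), "age"] = np.nan     # simulate missing numeric values
df.loc[rng.choice(n, 20, replace=False), "region"] = np.nan  # simulate missing categories
y = (df["annual_spend"] > 500).to_numpy(dtype=float).reshape(-1, 1)  # 1 = purchased

# Pre-processing: impute, scale numeric columns, one-hot encode the categorical column
numeric = Pipeline([("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())])
categorical = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                        ("encode", OneHotEncoder(handle_unknown="ignore"))])
prep = ColumnTransformer([("num", numeric, ["age", "annual_spend"]),
                          ("cat", categorical, ["region"])], sparse_threshold=0)
X = prep.fit_transform(df)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer of 8 sigmoid units and one sigmoid output unit
W1 = rng.normal(0, 0.5, (X.shape[1], 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros((1, 1))
lr = 0.5
for epoch in range(3000):
    h = sigmoid(X @ W1 + b1)                 # forward pass: hidden activations
    p = sigmoid(h @ W2 + b2)                 # forward pass: purchase probability
    d_out = (p - y) / len(X)                 # dLoss/dz at the output for binary cross-entropy
    d_hidden = (d_out @ W2.T) * h * (1 - h)  # error propagated back through the hidden layer
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_hidden); b1 -= lr * d_hidden.sum(axis=0, keepdims=True)

p = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
train_accuracy = float(((p >= 0.5) == (y == 1)).mean())
print(f"Training accuracy: {train_accuracy:.3f}")
```

In practice, `sklearn.neural_network.MLPClassifier` performs the same kind of backpropagation training and can replace the manual loop.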
6. While evaluating a regression model, you notice that the Mean Squared Error (MSE) is significantly
higher than the Mean Absolute Error (MAE). What does this indicate about your data, and how would
you address it?
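MSE is in squared units, so the fairer comparison is RMSE (its square root) against MAE. The two are equal only when every error has the same magnitude. A large gap means a few large residuals dominate, which often points to outliers or a heavy-tailed target. This toy sketch uses invented numbers: it contrasts five errors of size 1 with the same errors where one becomes 11.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
predictions = {
    "all errors = 1": np.array([11.0, 11.0, 12.0, 12.0, 13.0]),
    "one error = 11": np.array([11.0, 11.0, 12.0, 12.0, 23.0]),
}

results = {}
for name, y_pred in predictions.items():
    mae = mean_absolute_error(y_true, y_pred)
    mse = mean_squared_error(y_true, y_pred)
    results[name] = (mae, mse)
    # RMSE is in the target's units, so it can be compared with MAE directly
    print(f"{name}: MAE={mae:.2f}  MSE={mse:.2f}  RMSE={np.sqrt(mse):.2f}")
```

A single large error triples MAE (1 to 3) but multiplies MSE by 25 (1 to 25). Typical responses are to inspect and clean outliers, transform the target, or fit with a robust loss such as scikit-learn's `HuberRegressor`.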
7. Import a dataset of employee job satisfaction with attributes such as age, years_at_company,
job_role, education_level, and salary. Implement the Naive Bayes classification algorithm using Python
to predict whether an employee will stay with the company or leave (for example, predicting attrition
from job satisfaction).
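A sketch of question 7 on hypothetical data: the employee records and the attrition rule are invented so the example runs. It uses `GaussianNB` on scaled numeric columns plus one-hot encoded categoricals, which is a common simplification. `CategoricalNB` is an alternative if every feature is discretized.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(1)
n = 500
# Hypothetical employee records; the attrition rule below is invented to give the model a signal
df = pd.DataFrame({
    "age": rng.integers(22, 60, n),
    "years_at_company": rng.integers(0, 20, n),
    "job_role": pd.Series(rng.choice(["engineer", "sales", "hr"], n), dtype=object),
    "education_level": pd.Series(rng.choice(["bachelor", "master", "phd"], n), dtype=object),
    "salary": rng.normal(60000, 15000, n),
})
left = ((df["salary"] < 55000) | (df["years_at_company"] < 3)).astype(int)  # 1 = leaves

prep = ColumnTransformer(
    [("num", StandardScaler(), ["age", "years_at_company", "salary"]),
     ("cat", OneHotEncoder(handle_unknown="ignore"), ["job_role", "education_level"])],
    sparse_threshold=0,  # GaussianNB needs a dense matrix
)
model = make_pipeline(prep, GaussianNB())

X_train, X_test, y_train, y_test = train_test_split(
    df, left, test_size=0.25, random_state=0, stratify=left)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred, target_names=["stays", "leaves"]))
```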
8. Write a Python program to apply pre-processing techniques to the vote dataset, which includes
details like population, gender, education, age, superfund, crime, etc. Then implement a Decision Tree
classifier to predict the class outcome and display the decision tree visually.
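A sketch of question 8, assuming the vote attributes are y/n answers with "?" marking a missing vote. That encoding matches the UCI congressional voting records, but the rows and the party rule here are synthetic. Missing votes are imputed with the most frequent answer and ordinal-encoded before the tree is fit. `export_text` gives a text view and `plot_tree` draws the tree.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; use plt.show() in an interactive session
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text, plot_tree

rng = np.random.default_rng(2)
n = 300
features = ["education", "superfund", "crime"]
# Hypothetical y/n votes with '?' marking a missing vote; the class rule is invented
df = pd.DataFrame({f: rng.choice(["y", "n", "?"], n, p=[0.45, 0.45, 0.10]) for f in features},
                  dtype=object)
party = np.where(df["crime"] == "y", "republican", "democrat")

pipe = Pipeline([
    ("impute", SimpleImputer(missing_values="?", strategy="most_frequent")),
    ("encode", OrdinalEncoder()),  # categories are sorted, so 'n' -> 0 and 'y' -> 1
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
])
X_train, X_test, y_train, y_test = train_test_split(df, party, test_size=0.3, random_state=0)
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
print("Test accuracy:", accuracy)

tree = pipe.named_steps["tree"]
print(export_text(tree, feature_names=features))
plt.figure(figsize=(10, 6))
plot_tree(tree, feature_names=features, class_names=[str(c) for c in tree.classes_], filled=True)
plt.savefig("vote_tree.png")
```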
9. You are evaluating a regression model whose target variable has a skewed distribution. Which metrics
would be most appropriate for assessing the model's performance?
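One way to approach question 9, sketched on synthetic right-skewed data (an exponential of a linear signal plus noise, chosen for illustration). The model is fit on the log of the target through `TransformedTargetRegressor`. It is then scored with metrics that are less dominated by the long tail: median absolute error and root mean squared log error (RMSLE), reported alongside MAE and R² for comparison.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import (mean_absolute_error, mean_squared_log_error,
                             median_absolute_error, r2_score)
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 3))
# Right-skewed synthetic target (assumption): exp of a linear signal plus noise
y = np.exp(1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(0, 0.3, 1000))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# Fit in log space; predictions are mapped back with exp, so they stay positive
model = TransformedTargetRegressor(regressor=LinearRegression(), func=np.log, inverse_func=np.exp)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

metrics = {
    "MAE": mean_absolute_error(y_test, y_pred),
    "MedAE": median_absolute_error(y_test, y_pred),  # robust to a few huge errors
    "RMSLE": np.sqrt(mean_squared_log_error(y_test, y_pred)),  # relative error scale
    "R2": r2_score(y_test, y_pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```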
10. Import a student performance dataset with the attributes study_hours, attendance,
previous_grades, participation, study_material_usage, and age. Implement the Bagging ensemble
algorithm using Python's scikit-learn to predict the final exam score of a student based on these
features.
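A sketch of question 10 on a hypothetical student table. The columns follow the question, but the score formula is invented so the example runs. `BaggingRegressor` defaults to decision-tree base estimators, each trained on a bootstrap sample, and averages their predictions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 400
# Hypothetical student records; the final_score formula is invented for illustration
df = pd.DataFrame({
    "study_hours": rng.uniform(0, 20, n),
    "attendance": rng.uniform(50, 100, n),
    "previous_grades": rng.uniform(40, 100, n),
    "participation": rng.integers(0, 10, n),
    "study_material_usage": rng.uniform(0, 1, n),
    "age": rng.integers(17, 25, n),
})
df["final_score"] = (1.5 * df["study_hours"] + 0.3 * df["attendance"]
                     + 0.4 * df["previous_grades"] + rng.normal(0, 5, n))

X = df.drop(columns="final_score")
y = df["final_score"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# 50 decision trees (the default base estimator), each fit on a bootstrap sample
bag = BaggingRegressor(n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
y_pred = bag.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MAE = {mean_absolute_error(y_test, y_pred):.2f}")
```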
11. Import a dataset of species in a forest ecosystem and apply the K-Means clustering algorithm
with n_clusters=2, n_clusters=3, and n_clusters=4 using Python. The dataset includes attributes like
height, leaf_size, growth_rate, and environmental_factors. Analyze the accuracy of the clustering and
visualize the distribution of the species into different groups based on these features.
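A sketch of question 11 on synthetic blob data. The four features stand in for height, leaf_size, growth_rate, and environmental_factors, and the true species labels are known only because the data is generated. Clustering has no accuracy in the supervised sense, so the sketch reports two scores. The silhouette score needs no labels. The adjusted Rand index (ARI) compares clusters with known species labels, when those labels exist.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 4 features for height, leaf_size, growth_rate, environmental_factors
X, species = make_blobs(n_samples=300, n_features=4, centers=3, random_state=5)
X = StandardScaler().fit_transform(X)

scores = {}
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, k in zip(axes, [2, 3, 4]):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # Silhouette uses only the data; ARI compares clusters with the known species labels
    scores[k] = (silhouette_score(X, labels), adjusted_rand_score(species, labels))
    ax.scatter(X[:, 0], X[:, 1], c=labels, s=10)
    ax.set_title(f"k={k}")
    ax.set_xlabel("height (scaled)")
    ax.set_ylabel("leaf_size (scaled)")
fig.savefig("kmeans_clusters.png")

for k, (sil, ari) in scores.items():
    print(f"k={k}: silhouette={sil:.3f}, ARI={ari:.3f}")
```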
12. Consider the following dataset of years of experience and salary:
10.1 - 39343.00
10.3 - 46205.00
10.5 - 37731.00
20.0 - 43525.00
20.2 - 69891.00
22.9 - 118882.00
34.0 - 110150.00
35.2 - 134445.00
36.2 - 144445.00
38.7 - 157189.00
Predict the salary for 55 years of experience using a regression model in Python.
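A sketch of question 12 using simple linear regression on the ten rows listed above. A linear model is an assumed choice; the question only says "a regression model". The value 55 is far beyond the largest observed experience (38.7), so the prediction is an extrapolation and should be read with caution.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Data copied from the table above (years of experience, salary)
years = np.array([10.1, 10.3, 10.5, 20.0, 20.2, 22.9, 34.0, 35.2, 36.2, 38.7]).reshape(-1, 1)
salary = np.array([39343.0, 46205.0, 37731.0, 43525.0, 69891.0,
                   118882.0, 110150.0, 134445.0, 144445.0, 157189.0])

model = LinearRegression().fit(years, salary)
# 55 lies well beyond the largest observed value (38.7), so this is an extrapolation
predicted_55 = model.predict(np.array([[55.0]]))[0]
print(f"R^2 on the data: {model.score(years, salary):.3f}")
print(f"Predicted salary for 55 years of experience: {predicted_55:.2f}")
```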
13. Apply pre-processing techniques to a dataset that includes employee_age,
years_at_company, department, job_satisfaction, and performance_score. Implement the Support
Vector Machine (SVM) algorithm using Python's scikit-learn to classify whether an employee is likely
to be promoted or not. Visualize the decision boundaries and assess the model's accuracy.
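A sketch of question 13 on hypothetical employee data with an invented promotion rule. The full feature set is scaled and one-hot encoded for an RBF-kernel `SVC`, and accuracy is measured on a held-out split. Decision boundaries can only be drawn in two dimensions, so a second SVC is refit on two scaled features (years_at_company, performance_score) purely for the plot.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(6)
n = 400
# Hypothetical employee data; the 'promoted' rule is invented for illustration
df = pd.DataFrame({
    "employee_age": rng.integers(22, 60, n),
    "years_at_company": rng.integers(0, 25, n),
    "department": pd.Series(rng.choice(["it", "sales", "ops"], n), dtype=object),
    "job_satisfaction": rng.integers(1, 6, n),
    "performance_score": rng.uniform(1, 5, n),
})
promoted = ((df["performance_score"] > 3.5) & (df["years_at_company"] > 3)).astype(int)

prep = ColumnTransformer([
    ("num", StandardScaler(), ["employee_age", "years_at_company",
                               "job_satisfaction", "performance_score"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["department"]),
])
X_train, X_test, y_train, y_test = train_test_split(df, promoted, test_size=0.25, random_state=0)
svm = make_pipeline(prep, SVC(kernel="rbf", C=1.0))
svm.fit(X_train, y_train)
accuracy = accuracy_score(y_test, svm.predict(X_test))
print("Test accuracy:", accuracy)

# Decision boundary: refit on two scaled numeric features so it can be drawn in 2D
two = ["years_at_company", "performance_score"]
scaler = StandardScaler().fit(X_train[two])
Z_train = scaler.transform(X_train[two])
svm2 = SVC(kernel="rbf", C=1.0).fit(Z_train, y_train)
xx, yy = np.meshgrid(
    np.linspace(Z_train[:, 0].min() - 0.5, Z_train[:, 0].max() + 0.5, 200),
    np.linspace(Z_train[:, 1].min() - 0.5, Z_train[:, 1].max() + 0.5, 200))
zz = svm2.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
plt.contourf(xx, yy, zz, alpha=0.3)
plt.scatter(Z_train[:, 0], Z_train[:, 1], c=y_train, s=10)
plt.xlabel("years_at_company (scaled)")
plt.ylabel("performance_score (scaled)")
plt.savefig("svm_boundary.png")
```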
14. Write a Python program to split the customer churn dataset into 70% train data and 30% test data.
Out of a total of 1000 records, the training set will contain 700 records and the test set will contain 300
records. Predict whether a customer will churn using a Random Forest classifier.
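A sketch of question 14 with a placeholder for the churn data: 1,000 synthetic customers from `make_classification`, about 20% labelled as churners. That class balance is an assumption, not from the question. A 30% test share on 1,000 rows yields exactly 700 training and 300 test records.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Placeholder churn data: 1,000 synthetic customers, roughly 20% churners (assumption)
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
print(len(X_train), len(X_test))  # 700 300

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)  # 1 = churn, 0 = stays
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(confusion_matrix(y_test, y_pred))
```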
15. Consider the following housing dataset:
House Age (Years) - Number of Bedrooms - Price
1 - 2 - 300000
3 - 3 - 250000
5 - 3 - 200000
8 - 4 - 180000
10 - 4 - 150000
Write a Python program to predict the house price based on the number of bedrooms and house age
using a Linear Regression model.
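A sketch of question 15 fitting price on house age and number of bedrooms from the five rows above. The query house (4 years old, 3 bedrooms) is a hypothetical input for illustration. With only five rows and two correlated features, the coefficients are unstable and should not be over-interpreted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: house age (years), number of bedrooms; target: price (from the table above)
X = np.array([[1, 2], [3, 3], [5, 3], [8, 4], [10, 4]], dtype=float)
price = np.array([300000, 250000, 200000, 180000, 150000], dtype=float)

model = LinearRegression().fit(X, price)
print("Coefficients (age, bedrooms):", model.coef_, "Intercept:", model.intercept_)

# Hypothetical query: a 4-year-old house with 3 bedrooms (not part of the question)
new_house = np.array([[4, 3]], dtype=float)
print("Predicted price:", model.predict(new_house)[0])
```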
16. Write a Python program to perform feature selection using Recursive Feature Elimination (RFE)
for a dataset and implement classification using Support Vector Machines.
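A sketch of question 16 using scikit-learn's bundled breast-cancer data as an assumed example dataset (30 numeric features). `RFE` needs an estimator that exposes `coef_` or `feature_importances_`, so features are ranked with a linear-kernel `SVC`. The final classifier is an RBF-kernel SVC trained on the 10 surviving features. The choice of 10 is arbitrary.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example dataset (assumption): 569 samples, 30 numeric features, binary target
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, random_state=0, stratify=data.target)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # RFE repeatedly drops the feature with the smallest linear-SVM weight
    ("rfe", RFE(SVC(kernel="linear"), n_features_to_select=10, step=1)),
    ("svm", SVC(kernel="rbf")),  # final classifier on the 10 kept features
])
pipe.fit(X_train, y_train)
accuracy = accuracy_score(y_test, pipe.predict(X_test))

selected = data.feature_names[pipe.named_steps["rfe"].support_]
print("Selected features:", list(selected))
print("Test accuracy:", accuracy)
```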
17. Apply pre-processing techniques to a vehicle dataset containing details such as car_model,
year_of_manufacture, mileage, fuel_type, and service_history. Implement the Random Forest algorithm
to predict whether a car will require major repairs in the next 12 months based on these features. Use
Python to preprocess the data and evaluate the model's performance.
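A sketch of question 17 on hypothetical vehicle records, with an invented repair rule and injected missing values. Numeric columns are median-imputed and categorical columns are mode-imputed and one-hot encoded. No scaling is applied, because tree ensembles do not need it. The Random Forest is evaluated on a held-out split with a classification report.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(7)
n = 600
# Hypothetical vehicle records; the repair rule is invented to create a learnable target
df = pd.DataFrame({
    "car_model": pd.Series(rng.choice(["sedan_a", "suv_b", "hatch_c"], n), dtype=object),
    "year_of_manufacture": rng.integers(2005, 2024, n),
    "mileage": rng.uniform(5_000, 200_000, n),
    "fuel_type": pd.Series(rng.choice(["petrol", "diesel", "electric"], n), dtype=object),
    "service_history": pd.Series(rng.choice(["regular", "irregular"], n, p=[0.7, 0.3]),
                                 dtype=object),
})
needs_repair = ((df["mileage"] > 120_000) | (df["service_history"] == "irregular")).astype(int)
df.loc[rng.choice(n, 40, replace=False), "mileage"] = np.nan  # simulate missing entries
df.loc[rng.choice(n, 30, replace=False), "service_history"] = np.nan

prep = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["year_of_manufacture", "mileage"]),
    ("cat", make_pipeline(SimpleImputer(strategy="most_frequent"),
                          OneHotEncoder(handle_unknown="ignore")),
     ["car_model", "fuel_type", "service_history"]),
])
model = make_pipeline(prep, RandomForestClassifier(n_estimators=200, random_state=0))

X_train, X_test, y_train, y_test = train_test_split(
    df, needs_repair, test_size=0.25, random_state=0, stratify=needs_repair)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred, target_names=["no repair", "major repair"]))
```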
18. Import a weather dataset with the attributes temperature, humidity, wind speed, and
condition, and implement a Naive Bayes classifier to predict the weather condition (Sunny, Cloudy,
Rainy) using Python.
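A sketch of question 18 that generates hypothetical weather readings. Each condition gets its own invented mean temperature, humidity, and wind speed. `GaussianNB` then models each feature as a per-class normal distribution, which matches how this synthetic data was drawn. The final line classifies one hypothetical new reading.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(8)
# Hypothetical class means for (temperature in C, humidity in %, wind speed in km/h)
profiles = {"Sunny": (30, 40, 10), "Cloudy": (22, 60, 15), "Rainy": (18, 85, 20)}
X_parts, y_parts = [], []
for condition, means in profiles.items():
    X_parts.append(rng.normal(means, (3, 8, 4), size=(150, 3)))
    y_parts += [condition] * 150
X = np.vstack(X_parts)
y = np.array(y_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
nb = GaussianNB().fit(X_train, y_train)
accuracy = accuracy_score(y_test, nb.predict(X_test))
print("Accuracy:", accuracy)

# Predict the condition for a new, hypothetical reading
print(nb.predict([[29.0, 42.0, 9.0]]))
```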
21. Write a Python program using Scikit-learn to split the MNIST dataset into 80% train data and 20%
test data. Train a Logistic Regression classifier on the dataset and evaluate the accuracy on the test set.
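A sketch of question 21 that runs offline by substituting scikit-learn's bundled `load_digits` (1,797 images of 8x8 pixels) for MNIST. The full 70,000-image MNIST can be downloaded with `fetch_openml("mnist_784", version=1, as_frame=False)` and passed through the same split and model. Pixels are standardized so that logistic regression converges within the iteration limit.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Offline stand-in for MNIST: 8x8 digit images flattened to 64 pixel features
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```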
22. Implement a Python program to apply Principal Component Analysis (PCA) on the Iris dataset
to reduce the dimensionality to 2 principal components and visualize the results in a 2D scatter plot.
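A sketch of question 22. The four Iris measurements are standardized before PCA. That is a common choice but not required by the question; without it, petal length dominates the first component. The data is then projected onto two components and each species is plotted in its own colour.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Standardize so every measurement contributes on the same scale
X_scaled = StandardScaler().fit_transform(iris.data)
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

for label, name in enumerate(iris.target_names):
    mask = iris.target == label
    plt.scatter(X_2d[mask, 0], X_2d[mask, 1], label=name, s=15)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.title("Iris projected onto two principal components")
plt.savefig("iris_pca.png")
```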
23. Write a Python program using Scikit-learn to split a loan approval dataset into 75% train data and
25% test data. The dataset includes features like income, credit_score, loan_amount,
employment_status, and debt_to_income_ratio. Preprocess the data and train a Random Forest classifier
to predict whether a loan application will be approved or denied.
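A sketch of question 23 on hypothetical loan applications. The feature names follow the question, but the values and the approval rule are invented. Only employment_status needs encoding; the numeric columns pass through unscaled to the Random Forest. A 25% test share of 800 rows gives 200 test applications.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(10)
n = 800
# Hypothetical loan applications; the approval rule is invented for illustration
df = pd.DataFrame({
    "income": rng.normal(55_000, 20_000, n).clip(10_000),
    "credit_score": rng.integers(300, 851, n),
    "loan_amount": rng.uniform(5_000, 50_000, n),
    "employment_status": pd.Series(
        rng.choice(["employed", "self_employed", "unemployed"], n, p=[0.7, 0.2, 0.1]),
        dtype=object),
    "debt_to_income_ratio": rng.uniform(0.05, 0.6, n),
})
approved = ((df["credit_score"] > 650) & (df["debt_to_income_ratio"] < 0.4)
            & (df["employment_status"] != "unemployed")).astype(int)

prep = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["employment_status"])],
    remainder="passthrough",  # numeric columns go to the forest unchanged
)
model = make_pipeline(prep, RandomForestClassifier(n_estimators=200, random_state=0))

X_train, X_test, y_train, y_test = train_test_split(
    df, approved, test_size=0.25, random_state=0, stratify=approved)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print(classification_report(y_test, y_pred, target_names=["denied", "approved"]))
```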
24. Implement a Python program to perform feature scaling using StandardScaler on a dataset with
numerical features and then train a K-Nearest Neighbors (KNN) classifier for classification tasks.
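A sketch of question 24 using scikit-learn's bundled wine data as an assumed example. Its 13 features span very different ranges, so the effect of scaling on a distance-based model is easy to see. The scaler sits inside a pipeline so its means and standard deviations are learned from the training split only. An unscaled KNN is fit alongside for comparison.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Example dataset (assumption): 178 wines, 13 numeric features on different scales
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# StandardScaler is fit on the training split; the pipeline reuses it on the test split
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)
scaled_acc = accuracy_score(y_test, scaled_knn.predict(X_test))

# Same model without scaling, for comparison
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
raw_acc = accuracy_score(y_test, raw_knn.predict(X_test))
print(f"KNN with StandardScaler: {scaled_acc:.3f}, without: {raw_acc:.3f}")
```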
25. Write a Python program to implement the Naive Bayes classifier on a spam email dataset.
Predict whether an email is spam or not based on features like word frequency, length, etc.
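A sketch of question 25 on a tiny invented corpus of eight emails, included only so the example runs; a real spam dataset would replace it. `CountVectorizer` turns each email into word-frequency counts, and `MultinomialNB` models those counts per class. Other features, such as message length, could be appended as extra columns but are left out here.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented corpus, only to make the sketch runnable
emails = [
    "win a free prize now", "free money claim your prize",
    "cheap loans win cash now", "claim your free gift card",
    "meeting rescheduled to monday", "please review the attached report",
    "lunch with the team tomorrow", "project deadline is next friday",
]
labels = ["spam"] * 4 + ["ham"] * 4

# Word counts feed a multinomial Naive Bayes model (Laplace smoothing, alpha=1.0 by default)
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

predictions = spam_filter.predict(["claim your free prize", "review the project report"])
print(predictions)  # ['spam' 'ham']
```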
26. Apply Linear Discriminant Analysis (LDA) on a fraud detection dataset using Python, and compare
the classification results with a Logistic Regression model. The dataset includes features such as
transaction_amount, transaction_time, user_id, location, and device_used. Evaluate both models to
predict whether a transaction is fraudulent or not.
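A sketch of question 26 on hypothetical transactions: the values and the fraud-generating rule are invented. user_id is left out of the features because it is an identifier rather than a measurement. Categorical columns are one-hot encoded with the first level dropped, which avoids perfectly collinear inputs for LDA. Fraud is rare, so the two models are compared by ROC AUC as well as per-class precision and recall.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(9)
n = 2000
# Hypothetical transactions; the fraud-generating logit is invented for illustration
df = pd.DataFrame({
    "transaction_amount": rng.lognormal(4, 1, n),
    "transaction_time": rng.integers(0, 24, n),  # hour of day
    "user_id": rng.integers(1, 300, n),
    "location": pd.Series(rng.choice(["home", "abroad"], n, p=[0.9, 0.1]), dtype=object),
    "device_used": pd.Series(rng.choice(["mobile", "desktop", "new_device"], n), dtype=object),
})
logit = (-4 + 0.01 * df["transaction_amount"] + 2.0 * (df["location"] == "abroad")
         + 1.5 * (df["device_used"] == "new_device"))
fraud = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def make_prep():
    # A fresh transformer per model, so the two pipelines never share fitted state
    return ColumnTransformer(
        [("num", StandardScaler(), ["transaction_amount", "transaction_time"]),
         ("cat", OneHotEncoder(drop="first"), ["location", "device_used"])],
        sparse_threshold=0,  # LDA needs dense input; user_id is not listed, so it is dropped
    )

X_train, X_test, y_train, y_test = train_test_split(
    df, fraud, test_size=0.25, random_state=0, stratify=fraud)
models = {
    "LDA": make_pipeline(make_prep(), LinearDiscriminantAnalysis()),
    "Logistic Regression": make_pipeline(make_prep(), LogisticRegression(max_iter=1000)),
}
auc = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC AUC = {auc[name]:.3f}")
    print(classification_report(y_test, model.predict(X_test), zero_division=0))
```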
27. Write a Python program to implement the AdaBoost algorithm using Scikit-learn on a binary
classification dataset (e.g., predicting whether a customer will purchase or not).
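A sketch of question 27, using synthetic data from `make_classification` as a stand-in for a "will the customer purchase?" dataset (1 = purchase). By default, scikit-learn's `AdaBoostClassifier` boosts depth-1 decision trees (stumps). Each round up-weights the samples the previous stumps misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a purchase / no-purchase dataset (1 = purchase)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4, random_state=11)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# 100 boosting rounds over decision stumps; learning_rate shrinks each stump's contribution
ada = AdaBoostClassifier(n_estimators=100, learning_rate=0.5, random_state=0)
ada.fit(X_train, y_train)
accuracy = accuracy_score(y_test, ada.predict(X_test))
print(f"AdaBoost test accuracy: {accuracy:.3f}")
```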
28. Implement a Python program to use the Gradient Boosting classifier on a dataset, predict the class
labels, and evaluate its accuracy.
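A sketch of question 28 using scikit-learn's bundled breast-cancer data as an assumed example dataset. Gradient boosting fits shallow regression trees in sequence, each one to the gradient of the loss left by the ensemble so far. The hyperparameters shown are scikit-learn's defaults, written out explicitly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Example dataset (assumption): scikit-learn's bundled breast-cancer data
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
gb.fit(X_train, y_train)
y_pred = gb.predict(X_test)  # predicted class labels
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(confusion_matrix(y_test, y_pred))
```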
29. Import the Boston Housing dataset, and implement a Random Forest regressor to predict housing prices based on
features such as crime rate, average number of rooms, etc.
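A sketch of question 29. scikit-learn's `load_boston` was removed in version 1.2, so with a current install the Boston data must come from a separate copy, for example a CSV read with pandas. To stay self-contained, this sketch uses synthetic data from `make_regression`. The feature names are hypothetical labels borrowed from the question's wording, not real Boston columns.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Boston data; the names below are hypothetical labels only
feature_names = ["crime_rate", "avg_rooms", "age", "distance", "tax_rate"]
X, price = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
y_pred = forest.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"R^2 = {r2:.3f}, MAE = {mean_absolute_error(y_test, y_pred):.2f}")

# Impurity-based importances, largest first
for name, importance in sorted(zip(feature_names, forest.feature_importances_),
                               key=lambda t: -t[1]):
    print(f"{name:12s} {importance:.3f}")
```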