
ML Lab Question Set - 21

The document outlines a series of programming tasks involving data science and machine learning using Python and R. It includes tasks such as dataset splitting, regression and classification model implementation, data preprocessing, and evaluation of model performance across various datasets. Specific algorithms mentioned include Linear Regression, Naive Bayes, Random Forest, K-Means, and others, along with techniques like PCA and feature scaling.

Uploaded by

Bala Krish

1. Write a Python program using Scikit-learn to split the Iris dataset into 70% train data and 30% test
data. Out of a total of 150 records, the training set will contain 105 records and the test set will
contain the remaining 45. Predict the response for the test dataset from the features SepalLengthCm,
SepalWidthCm, PetalLengthCm, and PetalWidthCm.
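
One possible sketch for question 1. It assumes scikit-learn's bundled copy of Iris (`load_iris`) instead of a CSV with the `SepalLengthCm`-style column names, and it uses a k-nearest-neighbours classifier purely for illustration, since the question does not name a model:

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Four features per flower: sepal length/width and petal length/width.
X, y = load_iris(return_X_y=True)

# 70% train / 30% test; stratify keeps the three species balanced.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42, stratify=y
)

# Assumption: KNN with k=5 is an arbitrary choice of classifier.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Predict the species for the held-out records.
y_pred = model.predict(X_test)

print("Train records:", len(X_train))
print("Test records:", len(X_test))
print("Test accuracy:", accuracy_score(y_test, y_pred))
```

The `random_state` value is arbitrary; fixing it only makes the split repeatable.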

2. Consider the below salary dataset:

YearsExperience Salary
1.1 45000
3.5 60000
6.8 85000
8.5 95000
10.0 120000
9.0 105000
2.0 52000
3.8 65000
12.5 150000
15.6 180000
Write an R program to predict the salary for 5 years of experience using Linear Regression.

3. Write a Python program using Scikit-learn to split the Motorcycle dataset into 80% train data and
20% test data. Out of a total of 303 records, the training set will contain 80% of the records and the
test set the remaining 20%. Predict the response for the test dataset using a classifier.

4. Consider the given dataset with an odd number of observations arranged in descending order: 23,
21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, and 2. Find the mean, median, mode, range, standard deviation,
variance, and five-number summary, and draw a boxplot, using Python's numpy, scipy and matplotlib
libraries.
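
A minimal sketch of the statistics in question 4. It assumes the sample (n - 1) definitions of variance and standard deviation, which the question does not specify. It uses `collections.Counter` for the mode, whereas the question suggests scipy (`scipy.stats.mode` is the usual call there). The hypothetical helper `save_boxplot` imports matplotlib only when called:

```python
from collections import Counter

import numpy as np

data = np.array([23, 21, 18, 16, 15, 13, 12, 10, 9, 7, 6, 5, 2])

mean = data.mean()
median = np.median(data)
value_range = data.max() - data.min()
variance = data.var(ddof=1)  # assumption: sample variance (divide by n - 1)
std_dev = data.std(ddof=1)   # assumption: sample standard deviation

# Mode: collect every value that shares the highest frequency.
# All 13 values occur once here, so the data has no single mode.
counts = Counter(data.tolist())
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)

# Five-number summary: min, Q1, median, Q3, max (NumPy's default linear interpolation).
q1, q3 = np.percentile(data, [25, 75])
five_number = (data.min(), q1, median, q3, data.max())


def save_boxplot(values, path="boxplot.png"):
    """Hypothetical helper: draw a boxplot and save it to `path`."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen so no display is required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.boxplot(values)
    ax.set_title("Boxplot of the 13 observations")
    fig.savefig(path)
    plt.close(fig)


print("Mean:", mean, "Median:", median, "Range:", value_range)
print("Variance:", variance, "Std dev:", std_dev)
print("Modes:", modes)
print("Five-number summary:", five_number)
```

Calling `save_boxplot(data)` writes the plot to `boxplot.png` in the working directory.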

5. Write a Python program to apply pre-processing techniques such as handling missing data, scaling,
and encoding categorical variables for a retail sales dataset. Then implement the backpropagation
algorithm using a neural network to predict customer purchase behavior.

6. You evaluate a regression model and notice that the Mean Squared Error (MSE) is significantly
higher than the Mean Absolute Error (MAE). What does this indicate about your data, and how would
you address it?
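
The effect behind question 6 can be illustrated with a small made-up example (the numbers below are hypothetical, not from any dataset in this set). Squaring magnifies large residuals, so a single badly predicted point inflates MSE far more than MAE:

```python
import numpy as np

# Hypothetical targets and predictions; the last prediction is far off.
y_true = np.array([10.0, 12.0, 11.0, 13.0, 12.0])
y_pred = np.array([11.0, 11.0, 12.0, 12.0, 32.0])

errors = y_true - y_pred
mae = np.mean(np.abs(errors))
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)  # same units as MAE, so easier to compare

# Recompute both metrics without the single largest residual.
keep = np.abs(errors) < np.abs(errors).max()
mae_trimmed = np.mean(np.abs(errors[keep]))
mse_trimmed = np.mean(errors[keep] ** 2)

print(f"MAE={mae:.2f}  MSE={mse:.2f}  RMSE={rmse:.2f}")
print(f"Without the outlier: MAE={mae_trimmed:.2f}  MSE={mse_trimmed:.2f}")
```

A large gap between RMSE and MAE therefore suggests outliers or a heavy-tailed error distribution. Common responses include inspecting or removing the outlying records, transforming the target, or fitting with a robust loss such as Huber.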

7. Import a dataset of employee job satisfaction, which has attributes such as age, years_at_company,
job_role, education_level, and salary. Implement the Naive Bayes classification algorithm using Python
to predict whether an employee will stay with the company or leave (e.g., predicting if an employee will
leave based on job satisfaction).

8. Write a Python program to apply pre-processing techniques for the vote dataset, which includes
details like population, gender, education, age, superfund, crime, etc. Then implement a Decision Tree
classifier for predicting class outcomes and display the decision tree visually.

9. Evaluate a regression model where the target variable has a skewed distribution. What metrics
would be most appropriate for assessing model performance?

10. Import a dataset for student performance, with attributes containing study_hours, attendance,
previous_grades, participation, study_material_usage, and age. Implement the Bagging ensemble
algorithm using Python's scikit-learn to predict the final exam score of a student based on these
features.

11. Import a dataset of species in a forest ecosystem and apply the K-Means clustering algorithm
with n_clusters=2, n_clusters=3, and n_clusters=4 using Python. The dataset includes attributes like
height, leaf_size, growth_rate, and environmental_factors. Analyze the accuracy of the clustering and
visualize the distribution of the species into different groups based on these features.
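
A rough sketch for question 11. The forest dataset is not provided, so synthetic points from `make_blobs` stand in for the four numeric attributes (an assumption). Clustering has no ground-truth labels, so the silhouette score is used here as a proxy for "accuracy":

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data: 300 samples with 4 numeric features.
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)

# K-Means is distance based, so put all features on the same scale first.
X_scaled = StandardScaler().fit_transform(X)

scores = {}
for k in (2, 3, 4):
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X_scaled)
    # Silhouette ranges from -1 to 1; higher means better-separated clusters.
    scores[k] = silhouette_score(X_scaled, labels)
    print(f"n_clusters={k}: inertia={kmeans.inertia_:.1f}, silhouette={scores[k]:.3f}")
```

To visualize the groups, the cluster labels can be used as the colour argument of a scatter plot of any two features.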

12. Consider the below salary dataset:

Years of Experience - Salary
10.1 - 39343.00
10.3 - 46205.00
10.5 - 37731.00
20.0 - 43525.00
20.2 - 69891.00
22.9 - 118882.00
34.0 - 110150.00
35.2 - 134445.00
36.2 - 144445.00
38.7 - 157189.00
Predict the salary for 55 years of experience using a regression model in Python.
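
A minimal sketch for question 12, assuming a simple straight-line fit is acceptable as the "regression model". It uses `numpy.polyfit`; scikit-learn's `LinearRegression` would give the same line:

```python
import numpy as np

# Data copied from the table in question 12.
years = np.array([10.1, 10.3, 10.5, 20.0, 20.2, 22.9, 34.0, 35.2, 36.2, 38.7])
salary = np.array(
    [39343, 46205, 37731, 43525, 69891, 118882, 110150, 134445, 144445, 157189],
    dtype=float,
)

# Least-squares fit of salary = slope * years + intercept.
slope, intercept = np.polyfit(years, salary, deg=1)

# 55 years lies well outside the observed range (10.1 to 38.7),
# so this is an extrapolation and should be read with caution.
predicted = slope * 55 + intercept

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"Predicted salary at 55 years: {predicted:.2f}")
```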

13. Apply pre-processing techniques for a dataset that includes details of employee_age,
years_at_company, department, job_satisfaction, and performance_score. Implement the Support
Vector Machine (SVM) algorithm using Python's scikit-learn to classify whether an employee is likely
to be promoted or not. Visualize the decision boundaries and assess the model's accuracy.

14. Write a Python program to split the customer churn dataset into 70% train data and 30% test data.
Out of a total of 1000 records, the training set will contain 700 records and the test set will contain 300
records. Predict whether a customer will churn using a Random Forest classifier.

15. Consider the following housing dataset:

House Age (Years) - Number of Bedrooms - Price
1 - 2 - 300000
3 - 3 - 250000
5 - 3 - 200000
8 - 4 - 180000
10 - 4 - 150000
Write a Python program to predict the house price based on the number of bedrooms and house age
using a Linear Regression model.
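
A short sketch for question 15 using scikit-learn's `LinearRegression`. The query house (6 years old, 3 bedrooms) is a hypothetical input chosen for illustration, since the question does not say which house to predict:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Columns: house age in years, number of bedrooms (from the table in question 15).
X = np.array([[1, 2], [3, 3], [5, 3], [8, 4], [10, 4]], dtype=float)
y = np.array([300000, 250000, 200000, 180000, 150000], dtype=float)

model = LinearRegression()
model.fit(X, y)

# Hypothetical query: a 6-year-old house with 3 bedrooms.
new_house = np.array([[6.0, 3.0]])
predicted_price = model.predict(new_house)[0]

print("Coefficients (age, bedrooms):", model.coef_)
print("Intercept:", model.intercept_)
print(f"Predicted price: {predicted_price:.2f}")
```

With only five rows, and with age and bedroom count rising together, the individual coefficients are unstable. The sketch illustrates the API rather than providing a reliable price model.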

16. Write a Python program to perform feature selection using Recursive Feature Elimination (RFE)
for a dataset and implement classification using Support Vector Machines.
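
A possible sketch for question 16. The question names no dataset, so scikit-learn's bundled breast cancer data is assumed. Keeping 5 features is likewise an arbitrary choice:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: a bundled binary classification dataset with 30 features.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RFE needs an estimator that exposes coefficients, hence the linear kernel.
# It repeatedly drops the weakest feature until 5 remain.
selector = RFE(estimator=SVC(kernel="linear"), n_features_to_select=5)

# Scale, select features, then classify with an RBF-kernel SVM.
pipeline = make_pipeline(StandardScaler(), selector, SVC(kernel="rbf"))
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
n_selected = int(pipeline.named_steps["rfe"].support_.sum())

print("Selected feature count:", n_selected)
print("Test accuracy:", accuracy)
```

Putting the selector inside the pipeline means features are chosen from the training split only, which avoids leaking information from the test set.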

17. Apply pre-processing techniques for a vehicle dataset containing details such as car_model,
year_of_manufacture, mileage, fuel_type, and service_history. Implement the Random Forest algorithm
to predict whether a car will require major repairs in the next 12 months based on these features. Use
Python to preprocess the data and evaluate the model's performance.

18. Import the weather dataset with attributes like "temperature, humidity, wind speed, and
condition" and implement a Naive Bayes classifier to predict the weather condition (Sunny, Cloudy,
Rainy) using Python.


19. Consider the following dataset on online transactions:

Transaction Amount - Fraudulent (0 = No, 1 = Yes)
50 - 0
100 - 1
75 - 0
200 - 1
150 - 0
Write a Python program to predict whether a transaction is fraudulent based on the transaction amount
using Logistic Regression.
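
A minimal sketch for question 19. The transaction amount of 120 used for the prediction is a hypothetical input, since the question does not give one:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Data from the table in question 19; scikit-learn expects a 2D feature array.
amounts = np.array([[50], [100], [75], [200], [150]], dtype=float)
fraud = np.array([0, 1, 0, 1, 0])

# max_iter is raised because the feature is not scaled.
model = LogisticRegression(max_iter=1000)
model.fit(amounts, fraud)

# Hypothetical new transaction to classify.
new_amount = np.array([[120.0]])
predicted_label = model.predict(new_amount)[0]
fraud_probability = model.predict_proba(new_amount)[0, 1]

print("Predicted label:", predicted_label)
print(f"Estimated fraud probability: {fraud_probability:.3f}")
```

Five rows are far too few for a trustworthy fraud model; the sketch only shows the fit-and-predict workflow.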

20. Import the social media dataset with features such as age, number of followers, likes per post,
and posts per day. Implement the K-Nearest Neighbors (KNN) algorithm to predict the category of the
user (Influencer, Regular User) using Python.

21. Write a Python program using Scikit-learn to split the MNIST dataset into 80% train data and 20%
test data. Train a Logistic Regression classifier on the dataset and evaluate the accuracy on the test set.
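
A sketch of the workflow in question 21. To stay self-contained it swaps in scikit-learn's bundled 8x8 digits data (`load_digits`, 1,797 images) for the full MNIST set, which is normally downloaded separately, for example with `fetch_openml("mnist_784")`. The split and training steps are otherwise the same:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: the small bundled digits set stands in for MNIST.
X, y = load_digits(return_X_y=True)

# 80% train / 20% test, stratified so each digit appears in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42, stratify=y
)

# Scaling the pixel values helps the solver converge.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print("Train records:", len(X_train), "Test records:", len(X_test))
print("Test accuracy:", accuracy)
```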

22. Implement a Python program to apply Principal Component Analysis (PCA) on the Iris dataset
to reduce the dimensionality to 2 principal components and visualize the results in a 2D scatter plot.
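
A minimal sketch for question 22. Standardizing the features before PCA is an assumption (the question does not require it). The hypothetical helper `save_scatter` imports matplotlib only when called:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()

# Assumption: standardize first so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(iris.data)

# Project the 4 original features onto the first 2 principal components.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
explained = pca.explained_variance_ratio_.sum()


def save_scatter(points, labels, path="iris_pca.png"):
    """Hypothetical helper: save a 2D scatter plot coloured by species."""
    import matplotlib
    matplotlib.use("Agg")  # render off-screen so no display is required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(points[:, 0], points[:, 1], c=labels)
    ax.set_xlabel("Principal component 1")
    ax.set_ylabel("Principal component 2")
    ax.set_title("Iris projected onto 2 principal components")
    fig.savefig(path)
    plt.close(fig)


print("Reduced shape:", X_2d.shape)
print(f"Variance explained by 2 components: {explained:.3f}")
```

Calling `save_scatter(X_2d, iris.target)` writes the plot to `iris_pca.png`.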

23. Write a Python program using Scikit-learn to split a loan approval dataset into 75% train data and
25% test data. The dataset includes features like income, credit_score, loan_amount,
employment_status, and debt_to_income_ratio. Preprocess the data and train a Random Forest classifier
to predict whether a loan application will be approved or denied.

24. Implement a Python program to perform feature scaling using StandardScaler on a dataset with
numerical features and then train a K-Nearest Neighbors (KNN) classifier for classification tasks.
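
One way to sketch question 24. The question names no dataset, so scikit-learn's bundled wine data is assumed; its features have very different ranges, which makes scaling matter for a distance-based model. An unscaled KNN is fitted as well for comparison:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: a bundled dataset with 13 numerical features and 3 classes.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline fits the scaler on the training data only, then applies it to the test data.
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scaled_knn.fit(X_train, y_train)
scaled_accuracy = scaled_knn.score(X_test, y_test)

# Same classifier without scaling, for comparison.
raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
raw_accuracy = raw_knn.score(X_test, y_test)

print(f"Accuracy with StandardScaler: {scaled_accuracy:.3f}")
print(f"Accuracy without scaling:     {raw_accuracy:.3f}")
```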

25. Write a Python program to implement the Naive Bayes classifier on a spam email dataset.
Predict whether an email is spam or not based on features like word frequency, length, etc.

26. Apply Linear Discriminant Analysis (LDA) on a fraud detection dataset using Python, and compare
the classification results with a Logistic Regression model. The dataset includes features such as
transaction_amount, transaction_time, user_id, location, and device_used. Evaluate both models to
predict whether a transaction is fraudulent or not.

27. Write a Python program to implement the AdaBoost algorithm using Scikit-learn on a binary
classification dataset (e.g., predicting whether a customer will purchase or not).
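
A brief sketch for question 27. A synthetic binary dataset from `make_classification` stands in for the customer purchase data, which is not provided (an assumption). The number of estimators is an arbitrary choice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 500 customers, 8 features, label 1 = purchase.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# AdaBoost fits weak learners (decision stumps by default) in sequence,
# giving more weight to the samples earlier learners got wrong.
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test accuracy:", accuracy)
```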

28. Implement a Python program to use the Gradient Boosting classifier on a dataset, predict the class
labels, and evaluate its accuracy.

29. Import the Boston Housing dataset, and implement a Random Forest regressor to predict housing prices based on
features such as crime rate, average number of rooms, etc.
