Shwet Mlds
Shwet Mlds
Acropolis Institute of
Technology and
Research, Indore Department of CSE
Submitted To: Dr. Mayur Rathi
(Artificial Intelligence & Machine
Learning)
Submitted By:
Shwet Pardhi
Enrollment No. : 0827AL211039
Class/Year/Sem : AL/4th / 7th
CERTIFICATE
This is to certify that the experimental work entered in this journal as per
In this lab Students gain hands-on experience with data preprocessing, model building, and
evaluation using popular algorithms. The lab emphasizes the use of Python and data science libraries
like scikit-learn and pandas. By the end, participants are equipped with practical skills to analyze and
interpret complex datasets.
➢ DO’S
✓ While entering into the LAB students should wear their ID cards.
✓ Students should sign in the LOGIN REGISTER before entering into the
laboratory.
✓ Students should come with observation and record note book to the laboratory.
✓ After completing the laboratory exercise, make sure to shutdown the system
properly.
➢ DONT’S
PREREQUISITE:-
➢ Course Objectives
1. To make the student learn a machine learning algorithm and implementation.
2. To teach the student to implement machine learning algorithm in Python programming to
solve the problems
➢ Course Outcomes
At the end of the course student will be able to:
• Apply practical solutions using predictive analytics.
• Understand the importance of various algorithms in Data Science.
• Create competitive advantage from both structured and unstructured data.
• Predict outcomes with supervised machine learning techniques.
• Unearth patterns in customer behavior with unsupervised technique
Index
Date of Page Date of Grade &
S.No Exp. Name of the Experiment No. Submission Sign of the
Faculty
1 Prepare a data set for Indian Stock Market data for
financial sector company.
Additional remarks
Tutor
1 Title
Data Preparation for Indian Stock Market Data (Financial Sector Companies)
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Identify Data Sources:
• Use stock exchange APIs (e.g., NSE or BSE) or web scraping to gather the data.
• Ensure the data spans a significant timeframe to provide comprehensive insights.
2. Collect Data:
• Retrieve data for multiple financial sector companies.
• Include historical prices, volume, and key financial metrics.
3. Preprocess Data:
• Clean missing or erroneous data.
• Format data columns appropriately (e.g., date, numerical values).
4. Store Data:
• Organize data in a structured format, such as a DataFrame, and export to a CSV file.
• Ensure the dataset is well-documented with headers and descriptions.
3.2 Program
import yfinance as yf
import pandas as pd
# Save to CSV
all_data.to_csv('indian_stock_market_data.csv')
4 Tabulation Sheet
INPUT OUTPUT
Stock Symbols Historical Data Table
Tutor
1 Title
House Price Prediction Using Linear Regression
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Data Collection:
• Use a publicly available housing dataset, such as the "California Housing" dataset from
scikit-learn or data from Kaggle.
2. Data Preprocessing:
• Handle missing values by using methods like mean/mode substitution.
• Convert categorical data into numerical values using one-hot encoding.
• Split data into training and testing sets.
3. Model Development:
• Use a simple linear regression model from scikit-learn.
• Train the model on the training data and make predictions.
4. Model Evaluation:
• Use metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-
squared to evaluate performance.
3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
INPUT OUTPUT
Features (X) Predicted House Prices
Bathrooms, Location
5 Results
• A structured and cleaned dataset has been prepared for the financial sector companies in the
Indian stock market.
• The dataset includes essential features like historical stock prices, trading volume, and
company identifiers.
• The data is saved in indian_stock_market_data.csv and is ready for further analysis.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Loan Approval Prediction Using Machine
Data Science (AL704) Learning in Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1 Title
Loan Approval Prediction Using Machine Learning in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
• Use a publicly available loan dataset, such as the "Loan Prediction Problem Dataset" from Kaggle.
2 Data Preprocessing:
• Fill missing values with appropriate statistics (mean, median, or mode).
• Encode categorical variables (like gender, education, property area) using label encoding or one-hot
encoding.
• Scale features if necessary.
3 Model Development:
• Train multiple models such as Logistic Regression, Decision Trees, or Random Forest using scikit-
learn.
• Select the best-performing model based on evaluation metrics.
4 Model Evaluation:
• Use metrics such as accuracy, precision, recall, F1-score, and a confusion matrix.
3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
# Make predictions
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)
4 Tabulation Sheet
INPUT OUTPUT
Features (X) Prediction: Approved/Not
5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Car Price Prediction Using Linear Regression
Data Science (AL704) with Scikit-Learn in Python.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1 Title
Car Price Prediction Using Linear Regression with Scikit-Learn in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
▪ Use a car price dataset (for example, from Kaggle or a custom CSV file).
2 Data Preprocessing:
• Handle any missing or null values appropriately.
• Convert categorical data (like car make and model) to numerical format using encoding techniques.
▪ Normalize or scale numerical features if needed to ensure better model performance.
3 Model Development:
• Use the Linear Regression class from scikit-learn to build and train the model.
• Split the dataset into training and testing sets to validate the model.
4 Model Evaluation:
• Evaluate the model using R-squared and Mean Squared Error (MSE) to understand its predictive
power.
3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
4 Tabulation Sheet
INPUT OUTPUT
Features (X) Prediction: Approved/Not
5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Group / Title: Given credit card transactions for a customer in
a month, identify those transactions that were made by the
Lab: Machine Learning for
customer and those that were not. A program with a model of
Data Science (AL704)
this decision could refund those transactions that were
fraudulent.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1 Title
Given credit card transactions for a customer in a month, identify those transactions that were made by the
customer and those that were not. A program with a model of this decision could refund those transactions
that were fraudulent.
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Input Data: Collect transaction data (amount, merchant, date, location) and historical
customer behavior.
2. Data Preprocessing:
• Clean data, handle missing values.
• Create features like transaction frequency and typical amounts.
• Normalize continuous features (e.g., amount).
3. Label Transactions: Label historical data as legitimate or fraudulent.
4. Train-Test Split: Divide data into training and test sets.
5. Model Selection: Choose a classifier (e.g., Random Forest, SVM, Logistic Regression).
6. Model Training: Train the model on the training data.
7. Prediction: Use the model to predict fraudulent vs. legitimate transactions.
8. Evaluation: Measure accuracy, precision, recall, and F1-score.
9. Flag Fraudulent Transactions: Identify fraudulent transactions and trigger refund or
review.
10. Deployment: Integrate the model into the system for real-time classification.
3.2 Program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Sample DataFrame (In practice, this would be a larger dataset with more
features)
# Columns: ['amount', 'merchant', 'location', 'time_of_day', 'is_fraudulent']
data = {
'amount': [100, 200, 150, 300, 250, 120, 500, 1500],
'merchant': ['A', 'B', 'A', 'B', 'C', 'A', 'C', 'B'],
'location': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA', 'NY', 'LA'],
'time_of_day': ['morning', 'evening', 'afternoon', 'morning', 'evening',
'afternoon', 'morning', 'evening'],
'is_fraudulent': [0, 0, 0, 1, 0, 1, 0, 1] # 0 = legitimate, 1 = fraudulent
}
# Convert to DataFrame
df = pd.DataFrame(data)
# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Predictions
y_pred = model.predict(X_test_scaled)
4 Tabulation Sheet
INPUT OUTPUT
5 Results
Transaction 2 is identified as fraudulent and will be refunded, while Transactions 1 and 3 are legitimate.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Develop logistic regression model for a loan
Data Science (AL704) dataset
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1. Title
Develop Logistic Regression Model for a loan dataset.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Data Preparation:
• Clean data (handle missing values, encode categorical features, scale numerics).
• Split data into training and test sets.
2. Model Development:
• Define and train a logistic regression model on training data.
3. Model Evaluation:
• Predict on test data.
• Evaluate with accuracy, precision, recall, and ROC AUC score.
3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score,
confusion_matrix, roc_curve
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
# Data Preparation
# Separate features and target
X = data.drop('loan_status', axis=1) # Replace 'loan_status' with the actual target column name in your
dataset
y = data['loan_status']
# Print metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"ROC AUC: {roc_auc:.2f}")
print("Confusion Matrix:\n", conf_matrix)
4. Tabulation Sheet
INPUT OUTPUT
Accuracy: 0.85
Precision: 0.80
Recall: 0.75
ROC AUC: 0.87
Confusion Matrix: [[85 15] [20 80]]
5. Results
Accuracy (85%): Indicates the overall correct prediction rate.
Precision (80%): Shows the reliability of positive predictions; 80% of loans predicted to
be approved were truly eligible.
Recall (75%): Reflects the model’s ability to identify eligible loans, successfully
capturing 75% of those cases.
ROC AUC (0.87): High AUC suggests good separation between approved and declined
loans.
Confusion Matrix: Breaks down predictions:
o True Negatives (85): Loans correctly predicted as not approved.
o False Positives (15): Loans incorrectly predicted as approved.
o False Negatives (20): Eligible loans incorrectly predicted as not approved.
o True Positives (80): Loans correctly predicted as approved.
These results help gauge how well your logistic regression model performs, guiding
improvements such as tuning parameters or adjusting the feature set if necessary.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Implement Naïve Bayes Classification in
Data Science (AL704) Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1. Title
Implement Naïve Bayes Classification in Python.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
The Naïve Bayes algorithm follows these steps:
1. Calculate Prior Probabilities: Compute the prior probability of each class based on the
training data (i.e., the frequency of each class).
2. Calculate Likelihood: For each feature in the data, compute the likelihood of the feature
given each class. This involves calculating the probability of each feature value given the
class, often assuming feature independence.
3. Apply Bayes' Theorem: For each class, calculate the posterior probability of the class given
the features by multiplying the prior and likelihoods.
4. Make a Prediction: Assign the class label with the highest posterior probability as the
prediction for each data point.
In Python, the algorithm can be implemented using formulas based on Gaussian distributions
for continuous features (Gaussian Naïve Bayes) or frequency counts for categorical features
(Multinomial/Bernoulli Naïve Bayes).
3.2 Program
import numpy as np
class NaiveBayesClassifier:
def fit(self, X, y):
# Separate data by class
self.classes = np.unique(y)
self.mean = {}
self.variance = {}
self.priors = {}
for c in self.classes:
X_c = X[y == c]
# Calculate mean, variance, and prior for each class
self.mean[c] = X_c.mean(axis=0)
self.variance[c] = X_c.var(axis=0)
self.priors[c] = X_c.shape[0] / X.shape[0]
# Example usage
if name == " main ":
# Sample data
X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) # Features
y = np.array([0, 0, 0, 1, 1, 1]) # Labels
INPUT OUTPUT
X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) Predictions: [0 1]
y = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1, 20], [4, 22]])
5. Results
The classifier predicts the following:
• For the test sample [1, 20], the predicted class is 0.
• For the test sample [4, 22], the predicted class is 1.
Therefore, the output of the classifier for the test data X_test = [[1, 20], [4, 22]] is Predictions:
[0 1]. This means that the first sample is likely to belong to class 0, and the second sample is
likely to belong to class 1.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Build KNN Classification model for a given
Data Science (AL704) house price prediction.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular
Additional remarks
Tutor
1. Title
Build KNN Classification model for a given house price prediction.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Prepare Your Data: Gather data about houses (like square footage and number of bedrooms)
and their prices.
2. Choose the Number of Neighbors (K): Decide on K, the number of neighbors to consider
when making predictions. A typical starting point is K=3 or K=5.
3. Calculate Distances: For a new house, calculate the distance to all other houses in the dataset.
The closer a house is to the new house, the more likely it has a similar price.
4. Select the Nearest Neighbors: Pick the K houses closest to the new house based on the
calculated distances.
5. Predict the Price: Calculate the average price of these K nearest neighbors. This average will
be the predicted price for the new house.
3.2 Program
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error
class KNNRegressor:
def init (self, k=3):
self.k = k
# Example Usage
if name == " main ":
# Sample dataset (square footage, bedrooms) and prices
X = np.array([[1500, 3], [2000, 4], [1700, 3], [2400, 4], [3000, 5]]) # Features: [sqft, bedrooms]
y = np.array([300000, 400000, 320000, 450000, 550000]) # House prices
INPUT OUTPUT
5. Results
The KNN model predicted the following house prices:
• For a house with 1600 square feet and 3 bedrooms, the predicted price was $310,000, while
the actual price was $320,000.
• For a house with 2600 square feet and 4 bedrooms, the predicted price was $475,000, while
the actual price was $450,000.
The Mean Absolute Error (MAE) of the model is $7,500, indicating that the average
difference between the predicted and actual prices is $7,500.