0% found this document useful (0 votes)
28 views35 pages

Shwet Mlds

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
28 views35 pages

Shwet Mlds

Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 35

2024-25

Acropolis Institute of
Technology and
Research, Indore Department of CSE
Submitted To: Dr. Mayur Rathi
(Artificial Intelligence & Machine
Learning)

Machine Learning for Data Science


(AL704)

Submitted By:
Shwet Pardhi
Enrollment No. : 0827AL211039
Class/Year/Sem : AL/4th / 7th

[LAB ASSIGNMENT MLDS (AL-704)]


The Machine Learning for Data Science lab focuses on applying machine learning techniques to solve real-world data analysis
problems. The lab emphasizes the use of Python and data science libraries like scikit-learn and pandas.
ACROPOLIS INSTITUTE OF TECHNOLOGY & RESEARCH,
INDORE

Department of CSE (Artificial Intelligence & Machine Learning)

CERTIFICATE

This is to certify that the experimental work entered in this journal as per

the B. TECH. IV year syllab us prescribed by the RGPV was done by

Mr. Shwet Pardhi B.TECH IV year VII semester in the Machine

Learning for Data Science (AL704) Laboratory of this institute during

the academic year 2024- 2025.

Signature of the Faculty


About the Laboratory

In this lab Students gain hands-on experience with data preprocessing, model building, and
evaluation using popular algorithms. The lab emphasizes the use of Python and data science libraries
like scikit-learn and pandas. By the end, participants are equipped with practical skills to analyze and
interpret complex datasets.

Student will be able to


• To derive practical solutions using predictive analytics.
• They will also understand the importance of various algorithms in Data Science.
❖ GENERAL INSTRUCTIONS FOR LABORATORY CLASSES

➢ DO’S

✓ Without Prior permission do not enter into the Laboratory.

✓ While entering into the LAB students should wear their ID cards.

✓ The Students should come with proper uniform.

✓ Students should sign in the LOGIN REGISTER before entering into the
laboratory.

✓ Students should come with observation and record note book to the laboratory.

✓ Students should maintain silence inside the laboratory.

✓ After completing the laboratory exercise, make sure to shutdown the system
properly.

➢ DONT’S

✓ Students bringing the bags inside the laboratory.

✓ Students using the computers in an improper way.

✓ Students scribbling on the desk and mishandling the chairs.

✓ Students using mobile phones inside the laboratory.

✓ Students making noise inside the laboratory.


SYLLABUS
Course: AL704 (Machine Learning for Data Science)
Branch/Year/Sem: Artificial Intelligence & Machine Learning / IV/ VII

Module1: Algorithms and Machine Learning, Introduction to algorithms, Tools to


analyze algorithms, Algorithmic techniques: Divide and Conquer, examples,
Randomization, Applications.

Module2: Graphs, maps, Map searching, Application of algorithms: stable marriages


example, Dictionaries and hashing, search trees, Dynamic programming.

Module3: Linear Programming, NP completeness, Introduction to personal Genomics,


Massive Raw data in Genomics, Data science on Personal Genomes, Interconnectedness on
Personal Genomes, Case studies .

Module4: Introduction, Classification, Linear Classification, Ensemble Classifiers, Model


Selection, Cross Validation, Holdout.

Module5: Probabilistic modelling, Topic modelling, Probabilistic Inference,


Application: prediction of preterm birth, Data description and preparation,
Relationship between machine learning and statistics.

HARDWARE AND SOFTWARE REQUIREMENTS:

S. Name of Item Specification


No.
1 Computer System Hard Disk min 5 GB
RAM: 4 GB / 8 GB
Processor: Intel i3 or above

S. Name of Item Specification


No.
1 Operating system Window XP or 2000
2 Editor Python ,R Programming
RATIONALE:
The purpose of this subject is to cover the students will be able to derive practical
solutions using predictive analytics. They will also understand the importance of various
algorithms in Data Science.

PREREQUISITE:-

1. Basic Programming Skills: Familiarity with Python, including knowledge of data


structures and control flow.
2. Fundamentals of Statistics: Understanding of key concepts like probability,
distributions, and statistical inference.
3. Linear Algebra and Calculus: Basic understanding of vectors, matrices, and
derivatives, as they are foundational for many machine learning algorithms.
4. Introduction to Data Science: Prior exposure to data analysis, data cleaning, and
working with libraries like pandas and numpy.

COURSE OBJECTIVES AND OUTCOMES

➢ Course Objectives
1. To make the student learn a machine learning algorithm and implementation.
2. To teach the student to implement machine learning algorithm in Python programming to
solve the problems
➢ Course Outcomes
At the end of the course student will be able to:
• Apply practical solutions using predictive analytics.
• Understand the importance of various algorithms in Data Science.
• Create competitive advantage from both structured and unstructured data.
• Predict outcomes with supervised machine learning techniques.
• Unearth patterns in customer behavior with unsupervised technique
Index
Date of Page Date of Grade &
S.No Exp. Name of the Experiment No. Submission Sign of the
Faculty
1 Prepare a data set for Indian Stock Market data for
financial sector company.

2 House Price Prediction using Linear Regression


Machine Learning algorithms.

3 Loan Approval Prediction using Machine Learning


using Python.

4 Prediction car Price by linear regression using scikit-


learn in Python.

5 Given credit card transactions for a customer in a


month, identify those transactions that were made by
the customer and those that were not. A program
with a model of this decision could refund those
transactions that were fraudulent.
6 Develop Logistic Regression Model for a loan
dataset.

7 Implement Naïve Bayes Classification in Python.

8 Build KNN Classification model for a given house


price prediction.
Program Outcome (PO)

The engineering graduate of this institute will demonstrate:


a) Apply knowledge of mathematics, science, computing and engineering fundamentals to computer
science engineering problems.
b) Able to identify, formulate, and demonstrate with excellent programming, and problem solving skills.
c) Design solutions for engineering problems including design of experiment and processes to meet
desired needs within reasonable constraints of manufacturability, sustainability, ecological,
intellectual and health and safety considerations.
d) Propose and develop effective investigational solution of complex problems using research
methodology; including design of experiment, analysis and interpretation of data, and combination of
information to provide suitable conclusion. synthesis
e) Ability to create, select and use the modern techniques and various tools to solve engineering
problems and to evaluate solutions with an understanding of the limitations.
f) Ability to acquire knowledge of contemporary issues to assess societal, health and safety, legal and
cultural issues.
g) Ability to evaluate the impact of engineering solutions on individual as well as organization in a
societal and environmental context, and recognize sustainable development, and will be aware of
emerging technologies and current professional issues.
h) Capability to possess leadership and managerial skills, and understand and commit to professional
ethics and responsibilities.
i) Ability to demonstrate the team work and function effectively as an individual, with an ability to
design, develop, test and debug the project, and will be able to work with a multi-disciplinary team.
j) Ability to communicate effectively on engineering problems with the community, such as being able
to write effective reports and design documentation.
k) Flexibility to feel the recognition of the need for, and have the ability to engage in independent and
life- long learning by professional development and quality enhancement programs in context of
technological change.
l) A practice of engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and entrepreneurship.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Data Preparation for Indian Stock Market Data
Data Science (AL704) (Financial Sector Companies)
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1 Title
Data Preparation for Indian Stock Market Data (Financial Sector Companies)
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Identify Data Sources:
• Use stock exchange APIs (e.g., NSE or BSE) or web scraping to gather the data.
• Ensure the data spans a significant timeframe to provide comprehensive insights.
2. Collect Data:
• Retrieve data for multiple financial sector companies.
• Include historical prices, volume, and key financial metrics.
3. Preprocess Data:
• Clean missing or erroneous data.
• Format data columns appropriately (e.g., date, numerical values).
4. Store Data:
• Organize data in a structured format, such as a DataFrame, and export to a CSV file.
• Ensure the dataset is well-documented with headers and descriptions.

3.2 Program
import yfinance as yf
import pandas as pd

# Define a list of financial sector company symbols


companies = ['HDFCBANK.NS', 'ICICIBANK.NS', 'SBIN.NS'] # Example symbols

# Fetch data and store it in a DataFrame


all_data = pd.DataFrame()

for company in companies:


data = yf.download(company, start='2020-01-01', end='2023-12-31')
data['Company'] = company
all_data = pd.concat([all_data, data])

# Save to CSV
all_data.to_csv('indian_stock_market_data.csv')

4 Tabulation Sheet

INPUT OUTPUT
Stock Symbols Historical Data Table

Date Range Date, Open, High, Low, Close, Volume


5 Results
• A structured and cleaned dataset has been prepared for the financial sector companies in the
Indian stock market.
• The dataset includes essential features like historical stock prices, trading volume, and
company identifiers.
• The data is saved in indian_stock_market_data.csv and is ready for further analysis.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: House Price Prediction Using Linear Regression
Data Science (AL704)
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work
Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1 Title
House Price Prediction Using Linear Regression
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Data Collection:
• Use a publicly available housing dataset, such as the "California Housing" dataset from
scikit-learn or data from Kaggle.
2. Data Preprocessing:
• Handle missing values by using methods like mean/mode substitution.
• Convert categorical data into numerical values using one-hot encoding.
• Split data into training and testing sets.
3. Model Development:
• Use a simple linear regression model from scikit-learn.
• Train the model on the training data and make predictions.
4. Model Evaluation:
• Use metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-
squared to evaluate performance.

3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset


data = pd.read_csv('housing_data.csv') # Replace with your dataset
X = data[['square_feet', 'bedrooms', 'bathrooms', 'location_encoded']] # Features
y = data['price'] # Target

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model


model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")


print(f"R-squared: {r2}")
4 Tabulation Sheet

INPUT OUTPUT
Features (X) Predicted House Prices

Square Feet Actual vs Predicted Plot

Bedrooms Error Metrics (MSE, R²)

Bathrooms, Location

5 Results
• A structured and cleaned dataset has been prepared for the financial sector companies in the
Indian stock market.
• The dataset includes essential features like historical stock prices, trading volume, and
company identifiers.
• The data is saved in indian_stock_market_data.csv and is ready for further analysis.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Loan Approval Prediction Using Machine
Data Science (AL704) Learning in Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1 Title
Loan Approval Prediction Using Machine Learning in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
• Use a publicly available loan dataset, such as the "Loan Prediction Problem Dataset" from Kaggle.
2 Data Preprocessing:
• Fill missing values with appropriate statistics (mean, median, or mode).
• Encode categorical variables (like gender, education, property area) using label encoding or one-hot
encoding.
• Scale features if necessary.
3 Model Development:
• Train multiple models such as Logistic Regression, Decision Trees, or Random Forest using scikit-
learn.
• Select the best-performing model based on evaluation metrics.
4 Model Evaluation:
• Use metrics such as accuracy, precision, recall, F1-score, and a confusion matrix.

3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset


data = pd.read_csv('loan_data.csv') # Replace with your dataset

# Fill missing values


data.fillna(data.mean(), inplace=True)

# Encode categorical features


label_encoder = LabelEncoder()
data['Gender'] = label_encoder.fit_transform(data['Gender'])
data['Married'] = label_encoder.fit_transform(data['Married'])
data['Education'] = label_encoder.fit_transform(data['Education'])
data['Self_Employed'] = label_encoder.fit_transform(data['Self_Employed'])
data['Property_Area'] = label_encoder.fit_transform(data['Property_Area'])

# Define features and target


X = data[['Gender', 'Married', 'Education', 'ApplicantIncome', 'LoanAmount', 'Credit_History', 'Property_Area']]
y = data['Loan_Status'] # Assuming 'Loan_Status' is the target variable

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model


accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

4 Tabulation Sheet

INPUT OUTPUT
Features (X) Prediction: Approved/Not

Gender, Income, Loan Accuracy, Confusion Matrix

Amount, Credit History Classification Report

5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Car Price Prediction Using Linear Regression
Data Science (AL704) with Scikit-Learn in Python.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1 Title
Car Price Prediction Using Linear Regression with Scikit-Learn in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
▪ Use a car price dataset (for example, from Kaggle or a custom CSV file).
2 Data Preprocessing:
• Handle any missing or null values appropriately.
• Convert categorical data (like car make and model) to numerical format using encoding techniques.
▪ Normalize or scale numerical features if needed to ensure better model performance.
3 Model Development:
• Use the Linear Regression class from scikit-learn to build and train the model.
• Split the dataset into training and testing sets to validate the model.
4 Model Evaluation:
• Evaluate the model using R-squared and Mean Squared Error (MSE) to understand its predictive
power.

3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset


data = pd.read_csv('car_data.csv') # Replace with your car price dataset

# Display the first few rows of the dataset


print(data.head())

# Define features (X) and target (y)


X = data[['year', 'mileage', 'engine_size', 'make_encoded']] # Example features
y = data['price'] # Target variable

# Split the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model


model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set


y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")


print(f"R-squared: {r2}")

4 Tabulation Sheet

INPUT OUTPUT
Features (X) Prediction: Approved/Not

Gender, Income, Loan Accuracy, Confusion Matrix

Amount, Credit History Classification Report

5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Group / Title: Given credit card transactions for a customer in
a month, identify those transactions that were made by the
Lab: Machine Learning for
customer and those that were not. A program with a model of
Data Science (AL704)
this decision could refund those transactions that were
fraudulent.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1 Title
Given credit card transactions for a customer in a month, identify those transactions that were made by the
customer and those that were not. A program with a model of this decision could refund those transactions
that were fraudulent.
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Input Data: Collect transaction data (amount, merchant, date, location) and historical
customer behavior.
2. Data Preprocessing:
• Clean data, handle missing values.
• Create features like transaction frequency and typical amounts.
• Normalize continuous features (e.g., amount).
3. Label Transactions: Label historical data as legitimate or fraudulent.
4. Train-Test Split: Divide data into training and test sets.
5. Model Selection: Choose a classifier (e.g., Random Forest, SVM, Logistic Regression).
6. Model Training: Train the model on the training data.
7. Prediction: Use the model to predict fraudulent vs. legitimate transactions.
8. Evaluation: Measure accuracy, precision, recall, and F1-score.
9. Flag Fraudulent Transactions: Identify fraudulent transactions and trigger refund or
review.
10. Deployment: Integrate the model into the system for real-time classification.

3.2 Program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample DataFrame (In practice, this would be a larger dataset with more
features)
# Columns: ['amount', 'merchant', 'location', 'time_of_day', 'is_fraudulent']
data = {
'amount': [100, 200, 150, 300, 250, 120, 500, 1500],
'merchant': ['A', 'B', 'A', 'B', 'C', 'A', 'C', 'B'],
'location': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA', 'NY', 'LA'],
'time_of_day': ['morning', 'evening', 'afternoon', 'morning', 'evening',
'afternoon', 'morning', 'evening'],
'is_fraudulent': [0, 0, 0, 1, 0, 1, 0, 1] # 0 = legitimate, 1 = fraudulent
}
# Convert to DataFrame
df = pd.DataFrame(data)

# Preprocessing: Convert categorical variables to numerical


df = pd.get_dummies(df, columns=['merchant', 'location', 'time_of_day'],
drop_first=True)

# Features and target variable


X = df.drop(columns='is_fraudulent') # Features (transactions)
y = df['is_fraudulent'] # Target (fraudulent or not)

# Split data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model


model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluate the model


print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Identify fraudulent transactions in new data


new_transactions = np.array([
[200, 'A', 'NY', 'morning'],
[300, 'B', 'LA', 'evening'],
[500, 'C', 'NY', 'afternoon']
])

# Convert new transactions to the same format (one-hot encode)


new_df = pd.DataFrame(new_transactions, columns=['amount', 'merchant',
'location', 'time_of_day'])
new_df = pd.get_dummies(new_df, columns=['merchant', 'location',
'time_of_day'], drop_first=True)
new_df = new_df.reindex(columns=X.columns, fill_value=0) # Align columns
with training data
# Scale new data
new_df_scaled = scaler.transform(new_df)

# Predict if the new transactions are fraudulent


new_predictions = model.predict(new_df_scaled)

# Output the predictions (0: legitimate, 1: fraudulent)


for i, prediction in enumerate(new_predictions):
if prediction == 1:
print(f"Transaction {i + 1} is fraudulent and will be refunded.")
else:
print(f"Transaction {i + 1} is legitimate.")

4 Tabulation Sheet

INPUT OUTPUT

5 Results
Transaction 2 is identified as fraudulent and will be refunded, while Transactions 1 and 3 are legitimate.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Develop logistic regression model for a loan
Data Science (AL704) dataset
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1. Title
Develop Logistic Regression Model for a loan dataset.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Data Preparation:
• Clean data (handle missing values, encode categorical features, scale numerics).
• Split data into training and test sets.
2. Model Development:
• Define and train a logistic regression model on training data.
3. Model Evaluation:
• Predict on test data.
• Evaluate with accuracy, precision, recall, and ROC AUC score.

3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score,
confusion_matrix, roc_curve
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Load the dataset


data = pd.read_csv('loan_dataset.csv') # Replace with your dataset path

# Data Preparation
# Separate features and target
X = data.drop('loan_status', axis=1) # Replace 'loan_status' with the actual target column name in your
dataset
y = data['loan_status']

# Handle missing values


imputer = SimpleImputer(strategy='mean') # Impute missing values with the mean
X = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

# Encode categorical variables and scale features


X = pd.get_dummies(X, drop_first=True) # One-hot encoding for categorical variables
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the dataset


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Model Development
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions and Evaluation


y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) # Use probabilities for ROC AUC
conf_matrix = confusion_matrix(y_test, y_pred)

# Print metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"ROC AUC: {roc_auc:.2f}")
print("Confusion Matrix:\n", conf_matrix)

4. Tabulation Sheet

INPUT OUTPUT

Accuracy: 0.85
Precision: 0.80
Recall: 0.75
ROC AUC: 0.87
Confusion Matrix: [[85 15] [20 80]]

5. Results
Accuracy (85%): Indicates the overall correct prediction rate.
Precision (80%): Shows the reliability of positive predictions; 80% of loans predicted to
be approved were truly eligible.
Recall (75%): Reflects the model’s ability to identify eligible loans, successfully
capturing 75% of those cases.
ROC AUC (0.87): High AUC suggests good separation between approved and declined
loans.
Confusion Matrix: Breaks down predictions:
o True Negatives (85): Loans correctly predicted as not approved.
o False Positives (15): Loans incorrectly predicted as approved.
o False Negatives (20): Eligible loans incorrectly predicted as not approved.
o True Positives (80): Loans correctly predicted as approved.
These results help gauge how well your logistic regression model performs, guiding
improvements such as tuning parameters or adjusting the feature set if necessary.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Implement Naïve Bayes Classification in
Data Science (AL704) Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1. Title
Implement Naïve Bayes Classification in Python.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
The Naïve Bayes algorithm follows these steps:
1. Calculate Prior Probabilities: Compute the prior probability of each class based on the
training data (i.e., the frequency of each class).
2. Calculate Likelihood: For each feature in the data, compute the likelihood of the feature
given each class. This involves calculating the probability of each feature value given the
class, often assuming feature independence.
3. Apply Bayes' Theorem: For each class, calculate the posterior probability of the class given
the features by multiplying the prior and likelihoods.
4. Make a Prediction: Assign the class label with the highest posterior probability as the
prediction for each data point.

In Python, the algorithm can be implemented using formulas based on Gaussian distributions
for continuous features (Gaussian Naïve Bayes) or frequency counts for categorical features
(Multinomial/Bernoulli Naïve Bayes).
3.2 Program
import numpy as np

class NaiveBayesClassifier:
def fit(self, X, y):
# Separate data by class
self.classes = np.unique(y)
self.mean = {}
self.variance = {}
self.priors = {}

for c in self.classes:
X_c = X[y == c]
# Calculate mean, variance, and prior for each class
self.mean[c] = X_c.mean(axis=0)
self.variance[c] = X_c.var(axis=0)
self.priors[c] = X_c.shape[0] / X.shape[0]

def calculate_probability(self, x, mean, var):


# Gaussian probability density function
exponent = np.exp(-((x - mean) ** 2) / (2 * var))
return (1 / np.sqrt(2 * np.pi * var)) * exponent

def calculate_class_probabilities(self, x):


probabilities = {}
for c in self.classes:
probabilities[c] = np.log(self.priors[c]) # Use log for numerical stability
for i in range(len(x)):
probabilities[c] += np.log(self.calculate_probability(x[i], self.mean[c][i],
self.variance[c][i]))
return probabilities

def predict(self, X):


y_pred = [max(self.calculate_class_probabilities(x),
key=self.calculate_class_probabilities(x).get) for x in X]
return np.array(y_pred)

# Example usage
if name == " main ":
# Sample data
X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) # Features
y = np.array([0, 0, 0, 1, 1, 1]) # Labels

# Instantiate and fit the classifier


model = NaiveBayesClassifier()
model.fit(X, y)

# Predict new samples


X_test = np.array([[1, 20], [4, 22]])
predictions = model.predict(X_test)
print("Predictions:", predictions)
4. Tabulation Sheet

INPUT OUTPUT

X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) Predictions: [0 1]
y = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1, 20], [4, 22]])

5. Results
The classifier predicts the following:
• For the test sample [1, 20], the predicted class is 0.
• For the test sample [4, 22], the predicted class is 1.
Therefore, the output of the classifier for the test data X_test = [[1, 20], [4, 22]] is Predictions:
[0 1]. This means that the first sample is likely to belong to class 0, and the second sample is
likely to belong to class 1.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Build KNN Classification model for a given
Data Science (AL704) house price prediction.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor


1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.


A B C D F

Tutor

1. Title
Build KNN Classification model for a given house price prediction.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Prepare Your Data: Gather data about houses (like square footage and number of bedrooms)
and their prices.
2. Choose the Number of Neighbors (K): Decide on K, the number of neighbors to consider
when making predictions. A typical starting point is K=3 or K=5.
3. Calculate Distances: For a new house, calculate the distance to all other houses in the dataset.
The closer a house is to the new house, the more likely it has a similar price.
4. Select the Nearest Neighbors: Pick the K houses closest to the new house based on the
calculated distances.
5. Predict the Price: Calculate the average price of these K nearest neighbors. This average will
be the predicted price for the new house.
3.2 Program
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

class KNNRegressor:
def init (self, k=3):
self.k = k

def fit(self, X, y):


self.X_train = X
self.y_train = y

def predict(self, X):


predictions = []
for x in X:
# Calculate distances from x to all training points
distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
# Find the indices of the k closest points
k_indices = distances.argsort()[:self.k]
# Get the prices of the k closest points and average them
k_nearest_prices = self.y_train[k_indices]
predictions.append(np.mean(k_nearest_prices))
return np.array(predictions)

# Example Usage
if name == " main ":
# Sample dataset (square footage, bedrooms) and prices
X = np.array([[1500, 3], [2000, 4], [1700, 3], [2400, 4], [3000, 5]]) # Features: [sqft, bedrooms]
y = np.array([300000, 400000, 320000, 450000, 550000]) # House prices

# Split data into training and test sets


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features to ensure fair distance calculations


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Instantiate and train the model


k=3
knn_regressor = KNNRegressor(k=k)
knn_regressor.fit(X_train, y_train)

# Predict house prices for test data


y_pred = knn_regressor.predict(X_test)
print("Predicted Prices:", y_pred)
print("Actual Prices:", y_test)

# Evaluate the model


mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae}")
4. Tabulation Sheet

INPUT OUTPUT

Predicted Prices: [310000. 475000.]


Actual Prices: [320000 450000]
Mean Absolute Error: 7500.0

5. Results
The KNN model predicted the following house prices:
• For a house with 1600 square feet and 3 bedrooms, the predicted price was $310,000, while
the actual price was $320,000.
• For a house with 2600 square feet and 4 bedrooms, the predicted price was $475,000, while
the actual price was $450,000.
The Mean Absolute Error (MAE) of the model is $7,500, indicating that the average
difference between the predicted and actual prices is $7,500.

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy