0% found this document useful (0 votes)

28 views35 pages

Shwet Mlds

Uploaded by

blazehimanshu143316

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

28 views35 pages

Shwet Mlds

Uploaded by

blazehimanshu143316

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

2024-25

Acropolis Institute of
Technology and
Research, Indore Department of CSE
Submitted To: Dr. Mayur Rathi
(Artificial Intelligence & Machine
Learning)

Machine Learning for Data Science

(AL704)

Submitted By:
Shwet Pardhi
Enrollment No. : 0827AL211039
Class/Year/Sem : AL/4th / 7th

[LAB ASSIGNMENT MLDS (AL-704)]

The Machine Learning for Data Science lab focuses on applying machine learning techniques to solve real-world data analysis
problems. The lab emphasizes the use of Python and data science libraries like scikit-learn and pandas.
ACROPOLIS INSTITUTE OF TECHNOLOGY & RESEARCH,
INDORE

Department of CSE (Artificial Intelligence & Machine Learning)

CERTIFICATE

This is to certify that the experimental work entered in this journal as per

the B. TECH. IV year syllab us prescribed by the RGPV was done by

Mr. Shwet Pardhi B.TECH IV year VII semester in the Machine

Learning for Data Science (AL704) Laboratory of this institute during

the academic year 2024- 2025.

Signature of the Faculty

About the Laboratory

In this lab Students gain hands-on experience with data preprocessing, model building, and
evaluation using popular algorithms. The lab emphasizes the use of Python and data science libraries
like scikit-learn and pandas. By the end, participants are equipped with practical skills to analyze and
interpret complex datasets.

Student will be able to

• To derive practical solutions using predictive analytics.
• They will also understand the importance of various algorithms in Data Science.
❖ GENERAL INSTRUCTIONS FOR LABORATORY CLASSES

➢ DO’S

✓ Without Prior permission do not enter into the Laboratory.

✓ While entering into the LAB students should wear their ID cards.

✓ The Students should come with proper uniform.

✓ Students should sign in the LOGIN REGISTER before entering into the
laboratory.

✓ Students should come with observation and record note book to the laboratory.

✓ Students should maintain silence inside the laboratory.

✓ After completing the laboratory exercise, make sure to shutdown the system
properly.

➢ DONT’S

✓ Students bringing the bags inside the laboratory.

✓ Students using the computers in an improper way.

✓ Students scribbling on the desk and mishandling the chairs.

✓ Students using mobile phones inside the laboratory.

✓ Students making noise inside the laboratory.

SYLLABUS
Course: AL704 (Machine Learning for Data Science)
Branch/Year/Sem: Artificial Intelligence & Machine Learning / IV/ VII

Module1: Algorithms and Machine Learning, Introduction to algorithms, Tools to

analyze algorithms, Algorithmic techniques: Divide and Conquer, examples,
Randomization, Applications.

Module2: Graphs, maps, Map searching, Application of algorithms: stable marriages

example, Dictionaries and hashing, search trees, Dynamic programming.

Module3: Linear Programming, NP completeness, Introduction to personal Genomics,

Massive Raw data in Genomics, Data science on Personal Genomes, Interconnectedness on
Personal Genomes, Case studies .

Module4: Introduction, Classification, Linear Classification, Ensemble Classifiers, Model

Selection, Cross Validation, Holdout.

Module5: Probabilistic modelling, Topic modelling, Probabilistic Inference,

Application: prediction of preterm birth, Data description and preparation,
Relationship between machine learning and statistics.

HARDWARE AND SOFTWARE REQUIREMENTS:

S. Name of Item Specification

No.
1 Computer System Hard Disk min 5 GB
RAM: 4 GB / 8 GB
Processor: Intel i3 or above

S. Name of Item Specification

No.
1 Operating system Window XP or 2000
2 Editor Python ,R Programming
RATIONALE:
The purpose of this subject is to cover the students will be able to derive practical
solutions using predictive analytics. They will also understand the importance of various
algorithms in Data Science.

PREREQUISITE:-

1. Basic Programming Skills: Familiarity with Python, including knowledge of data

structures and control flow.
2. Fundamentals of Statistics: Understanding of key concepts like probability,
distributions, and statistical inference.
3. Linear Algebra and Calculus: Basic understanding of vectors, matrices, and
derivatives, as they are foundational for many machine learning algorithms.
4. Introduction to Data Science: Prior exposure to data analysis, data cleaning, and
working with libraries like pandas and numpy.

COURSE OBJECTIVES AND OUTCOMES

➢ Course Objectives
1. To make the student learn a machine learning algorithm and implementation.
2. To teach the student to implement machine learning algorithm in Python programming to
solve the problems
➢ Course Outcomes
At the end of the course student will be able to:
• Apply practical solutions using predictive analytics.
• Understand the importance of various algorithms in Data Science.
• Create competitive advantage from both structured and unstructured data.
• Predict outcomes with supervised machine learning techniques.
• Unearth patterns in customer behavior with unsupervised technique
Index
Date of Page Date of Grade &
S.No Exp. Name of the Experiment No. Submission Sign of the
Faculty
1 Prepare a data set for Indian Stock Market data for
financial sector company.

2 House Price Prediction using Linear Regression

Machine Learning algorithms.

3 Loan Approval Prediction using Machine Learning

using Python.

4 Prediction car Price by linear regression using scikit-

learn in Python.

5 Given credit card transactions for a customer in a

month, identify those transactions that were made by
the customer and those that were not. A program
with a model of this decision could refund those
transactions that were fraudulent.
6 Develop Logistic Regression Model for a loan
dataset.

7 Implement Naïve Bayes Classification in Python.

8 Build KNN Classification model for a given house

price prediction.
Program Outcome (PO)

The engineering graduate of this institute will demonstrate:

a) Apply knowledge of mathematics, science, computing and engineering fundamentals to computer
science engineering problems.
b) Able to identify, formulate, and demonstrate with excellent programming, and problem solving skills.
c) Design solutions for engineering problems including design of experiment and processes to meet
desired needs within reasonable constraints of manufacturability, sustainability, ecological,
intellectual and health and safety considerations.
d) Propose and develop effective investigational solution of complex problems using research
methodology; including design of experiment, analysis and interpretation of data, and combination of
information to provide suitable conclusion. synthesis
e) Ability to create, select and use the modern techniques and various tools to solve engineering
problems and to evaluate solutions with an understanding of the limitations.
f) Ability to acquire knowledge of contemporary issues to assess societal, health and safety, legal and
cultural issues.
g) Ability to evaluate the impact of engineering solutions on individual as well as organization in a
societal and environmental context, and recognize sustainable development, and will be aware of
emerging technologies and current professional issues.
h) Capability to possess leadership and managerial skills, and understand and commit to professional
ethics and responsibilities.
i) Ability to demonstrate the team work and function effectively as an individual, with an ability to
design, develop, test and debug the project, and will be able to work with a multi-disciplinary team.
j) Ability to communicate effectively on engineering problems with the community, such as being able
to write effective reports and design documentation.
k) Flexibility to feel the recognition of the need for, and have the ability to engage in independent and
life- long learning by professional development and quality enhancement programs in context of
technological change.
l) A practice of engineering and management principles and apply these to one’s own work, as a
member and leader in a team, to manage projects and entrepreneurship.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Data Preparation for Indian Stock Market Data
Data Science (AL704) (Financial Sector Companies)
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1 Title
Data Preparation for Indian Stock Market Data (Financial Sector Companies)
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Identify Data Sources:
• Use stock exchange APIs (e.g., NSE or BSE) or web scraping to gather the data.
• Ensure the data spans a significant timeframe to provide comprehensive insights.
2. Collect Data:
• Retrieve data for multiple financial sector companies.
• Include historical prices, volume, and key financial metrics.
3. Preprocess Data:
• Clean missing or erroneous data.
• Format data columns appropriately (e.g., date, numerical values).
4. Store Data:
• Organize data in a structured format, such as a DataFrame, and export to a CSV file.
• Ensure the dataset is well-documented with headers and descriptions.

3.2 Program
import yfinance as yf
import pandas as pd

# Define a list of financial sector company symbols

companies = ['HDFCBANK.NS', 'ICICIBANK.NS', 'SBIN.NS'] # Example symbols

# Fetch data and store it in a DataFrame

all_data = pd.DataFrame()

for company in companies:

data = yf.download(company, start='2020-01-01', end='2023-12-31')
data['Company'] = company
all_data = pd.concat([all_data, data])

# Save to CSV
all_data.to_csv('indian_stock_market_data.csv')

4 Tabulation Sheet

INPUT OUTPUT
Stock Symbols Historical Data Table

Date Range Date, Open, High, Low, Close, Volume

5 Results
• A structured and cleaned dataset has been prepared for the financial sector companies in the
Indian stock market.
• The dataset includes essential features like historical stock prices, trading volume, and
company identifiers.
• The data is saved in indian_stock_market_data.csv and is ready for further analysis.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: House Price Prediction Using Linear Regression
Data Science (AL704)
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work
Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1 Title
House Price Prediction Using Linear Regression
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Data Collection:
• Use a publicly available housing dataset, such as the "California Housing" dataset from
scikit-learn or data from Kaggle.
2. Data Preprocessing:
• Handle missing values by using methods like mean/mode substitution.
• Convert categorical data into numerical values using one-hot encoding.
• Split data into training and testing sets.
3. Model Development:
• Use a simple linear regression model from scikit-learn.
• Train the model on the training data and make predictions.
4. Model Evaluation:
• Use metrics such as Mean Absolute Error (MAE), Mean Squared Error (MSE), and R-
squared to evaluate performance.

3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset

data = pd.read_csv('housing_data.csv') # Replace with your dataset
X = data[['square_feet', 'bedrooms', 'bathrooms', 'location_encoded']] # Features
y = data['price'] # Target

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a linear regression model

model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")

print(f"R-squared: {r2}")
4 Tabulation Sheet

INPUT OUTPUT
Features (X) Predicted House Prices

Square Feet Actual vs Predicted Plot

Bedrooms Error Metrics (MSE, R²)

Bathrooms, Location

5 Results
• A structured and cleaned dataset has been prepared for the financial sector companies in the
Indian stock market.
• The dataset includes essential features like historical stock prices, trading volume, and
company identifiers.
• The data is saved in indian_stock_market_data.csv and is ready for further analysis.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Loan Approval Prediction Using Machine
Data Science (AL704) Learning in Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1 Title
Loan Approval Prediction Using Machine Learning in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
• Use a publicly available loan dataset, such as the "Loan Prediction Problem Dataset" from Kaggle.
2 Data Preprocessing:
• Fill missing values with appropriate statistics (mean, median, or mode).
• Encode categorical variables (like gender, education, property area) using label encoding or one-hot
encoding.
• Scale features if necessary.
3 Model Development:
• Train multiple models such as Logistic Regression, Decision Trees, or Random Forest using scikit-
learn.
• Select the best-performing model based on evaluation metrics.
4 Model Evaluation:
• Use metrics such as accuracy, precision, recall, F1-score, and a confusion matrix.

3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the dataset

data = pd.read_csv('loan_data.csv') # Replace with your dataset

# Fill missing values

data.fillna(data.mean(), inplace=True)

# Encode categorical features

label_encoder = LabelEncoder()
data['Gender'] = label_encoder.fit_transform(data['Gender'])
data['Married'] = label_encoder.fit_transform(data['Married'])
data['Education'] = label_encoder.fit_transform(data['Education'])
data['Self_Employed'] = label_encoder.fit_transform(data['Self_Employed'])
data['Property_Area'] = label_encoder.fit_transform(data['Property_Area'])

# Define features and target

X = data[['Gender', 'Married', 'Education', 'ApplicantIncome', 'LoanAmount', 'Credit_History', 'Property_Area']]
y = data['Loan_Status'] # Assuming 'Loan_Status' is the target variable

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
class_report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(conf_matrix)
print("Classification Report:")
print(class_report)

4 Tabulation Sheet

INPUT OUTPUT
Features (X) Prediction: Approved/Not

Gender, Income, Loan Accuracy, Confusion Matrix

Amount, Credit History Classification Report

5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Car Price Prediction Using Linear Regression
Data Science (AL704) with Scikit-Learn in Python.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1 Title
Car Price Prediction Using Linear Regression with Scikit-Learn in Python
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1 Data Collection:
▪ Use a car price dataset (for example, from Kaggle or a custom CSV file).
2 Data Preprocessing:
• Handle any missing or null values appropriately.
• Convert categorical data (like car make and model) to numerical format using encoding techniques.
▪ Normalize or scale numerical features if needed to ensure better model performance.
3 Model Development:
• Use the Linear Regression class from scikit-learn to build and train the model.
• Split the dataset into training and testing sets to validate the model.
4 Model Evaluation:
• Evaluate the model using R-squared and Mean Squared Error (MSE) to understand its predictive
power.

3.2Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset

data = pd.read_csv('car_data.csv') # Replace with your car price dataset

# Display the first few rows of the dataset

print(data.head())

# Define features (X) and target (y)

X = data[['year', 'mileage', 'engine_size', 'make_encoded']] # Example features
y = data['price'] # Target variable

# Split the data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train the linear regression model

model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set

y_pred = model.predict(X_test)
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")

print(f"R-squared: {r2}")

4 Tabulation Sheet

INPUT OUTPUT
Features (X) Prediction: Approved/Not

Gender, Income, Loan Accuracy, Confusion Matrix

Amount, Credit History Classification Report

5 Results
• Model Performance:
• Accuracy: Value from output
• Confusion Matrix: Shows true positives, false positives, etc.
• Classification Report: Precision, recall, F1-score for each class
• The logistic regression model provides a reasonably accurate prediction of loan approval.
Performance can be improved with more advanced models or data preprocessing techniques.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Group / Title: Given credit card transactions for a customer in
a month, identify those transactions that were made by the
Lab: Machine Learning for
customer and those that were not. A program with a model of
Data Science (AL704)
this decision could refund those transactions that were
fraudulent.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1 Title
Given credit card transactions for a customer in a month, identify those transactions that were made by the
customer and those that were not. A program with a model of this decision could refund those transactions
that were fraudulent.
2 Neatly Drawn and labeled experimental setup
3 Theoretical solution of the instant problem
3.1 Algorithm
1. Input Data: Collect transaction data (amount, merchant, date, location) and historical
customer behavior.
2. Data Preprocessing:
• Clean data, handle missing values.
• Create features like transaction frequency and typical amounts.
• Normalize continuous features (e.g., amount).
3. Label Transactions: Label historical data as legitimate or fraudulent.
4. Train-Test Split: Divide data into training and test sets.
5. Model Selection: Choose a classifier (e.g., Random Forest, SVM, Logistic Regression).
6. Model Training: Train the model on the training data.
7. Prediction: Use the model to predict fraudulent vs. legitimate transactions.
8. Evaluation: Measure accuracy, precision, recall, and F1-score.
9. Flag Fraudulent Transactions: Identify fraudulent transactions and trigger refund or
review.
10. Deployment: Integrate the model into the system for real-time classification.

3.2 Program
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Sample DataFrame (In practice, this would be a larger dataset with more
features)
# Columns: ['amount', 'merchant', 'location', 'time_of_day', 'is_fraudulent']
data = {
'amount': [100, 200, 150, 300, 250, 120, 500, 1500],
'merchant': ['A', 'B', 'A', 'B', 'C', 'A', 'C', 'B'],
'location': ['NY', 'NY', 'LA', 'LA', 'NY', 'LA', 'NY', 'LA'],
'time_of_day': ['morning', 'evening', 'afternoon', 'morning', 'evening',
'afternoon', 'morning', 'evening'],
'is_fraudulent': [0, 0, 0, 1, 0, 1, 0, 1] # 0 = legitimate, 1 = fraudulent
}
# Convert to DataFrame
df = pd.DataFrame(data)

# Preprocessing: Convert categorical variables to numerical

df = pd.get_dummies(df, columns=['merchant', 'location', 'time_of_day'],
drop_first=True)

# Features and target variable

X = df.drop(columns='is_fraudulent') # Features (transactions)
y = df['is_fraudulent'] # Target (fraudulent or not)

# Split data into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)

# Feature scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Logistic Regression model

model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Predictions
y_pred = model.predict(X_test_scaled)

# Evaluate the model

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

# Identify fraudulent transactions in new data

new_transactions = np.array([
[200, 'A', 'NY', 'morning'],
[300, 'B', 'LA', 'evening'],
[500, 'C', 'NY', 'afternoon']
])

# Convert new transactions to the same format (one-hot encode)

new_df = pd.DataFrame(new_transactions, columns=['amount', 'merchant',
'location', 'time_of_day'])
new_df = pd.get_dummies(new_df, columns=['merchant', 'location',
'time_of_day'], drop_first=True)
new_df = new_df.reindex(columns=X.columns, fill_value=0) # Align columns
with training data
# Scale new data
new_df_scaled = scaler.transform(new_df)

# Predict if the new transactions are fraudulent

new_predictions = model.predict(new_df_scaled)

# Output the predictions (0: legitimate, 1: fraudulent)

for i, prediction in enumerate(new_predictions):
if prediction == 1:
print(f"Transaction {i + 1} is fraudulent and will be refunded.")
else:
print(f"Transaction {i + 1} is legitimate.")

4 Tabulation Sheet

INPUT OUTPUT

5 Results
Transaction 2 is identified as fraudulent and will be refunded, while Transactions 1 and 3 are legitimate.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Develop logistic regression model for a loan
Data Science (AL704) dataset
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1. Title
Develop Logistic Regression Model for a loan dataset.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Data Preparation:
• Clean data (handle missing values, encode categorical features, scale numerics).
• Split data into training and test sets.
2. Model Development:
• Define and train a logistic regression model on training data.
3. Model Evaluation:
• Predict on test data.
• Evaluate with accuracy, precision, recall, and ROC AUC score.

3.2 Program
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score,
confusion_matrix, roc_curve
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

# Load the dataset

data = pd.read_csv('loan_dataset.csv') # Replace with your dataset path

# Data Preparation
# Separate features and target
X = data.drop('loan_status', axis=1) # Replace 'loan_status' with the actual target column name in your
dataset
y = data['loan_status']

# Handle missing values

imputer = SimpleImputer(strategy='mean') # Impute missing values with the mean
X = pd.DataFrame(imputer.fit_transform(X), columns=X.columns)

# Encode categorical variables and scale features

X = pd.get_dummies(X, drop_first=True) # One-hot encoding for categorical variables
scaler = StandardScaler()
X = scaler.fit_transform(X)

# Split the dataset

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# Model Development
model = LogisticRegression()
model.fit(X_train, y_train)

# Predictions and Evaluation

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]) # Use probabilities for ROC AUC
conf_matrix = confusion_matrix(y_test, y_pred)

# Print metrics
print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"ROC AUC: {roc_auc:.2f}")
print("Confusion Matrix:\n", conf_matrix)

4. Tabulation Sheet

INPUT OUTPUT

Accuracy: 0.85
Precision: 0.80
Recall: 0.75
ROC AUC: 0.87
Confusion Matrix: [[85 15] [20 80]]

5. Results
Accuracy (85%): Indicates the overall correct prediction rate.
Precision (80%): Shows the reliability of positive predictions; 80% of loans predicted to
be approved were truly eligible.
Recall (75%): Reflects the model’s ability to identify eligible loans, successfully
capturing 75% of those cases.
ROC AUC (0.87): High AUC suggests good separation between approved and declined
loans.
Confusion Matrix: Breaks down predictions:
o True Negatives (85): Loans correctly predicted as not approved.
o False Positives (15): Loans incorrectly predicted as approved.
o False Negatives (20): Eligible loans incorrectly predicted as not approved.
o True Positives (80): Loans correctly predicted as approved.
These results help gauge how well your logistic regression model performs, guiding
improvements such as tuning parameters or adjusting the feature set if necessary.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Implement Naïve Bayes Classification in
Data Science (AL704) Python
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1. Title
Implement Naïve Bayes Classification in Python.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
The Naïve Bayes algorithm follows these steps:
1. Calculate Prior Probabilities: Compute the prior probability of each class based on the
training data (i.e., the frequency of each class).
2. Calculate Likelihood: For each feature in the data, compute the likelihood of the feature
given each class. This involves calculating the probability of each feature value given the
class, often assuming feature independence.
3. Apply Bayes' Theorem: For each class, calculate the posterior probability of the class given
the features by multiplying the prior and likelihoods.
4. Make a Prediction: Assign the class label with the highest posterior probability as the
prediction for each data point.

In Python, the algorithm can be implemented using formulas based on Gaussian distributions
for continuous features (Gaussian Naïve Bayes) or frequency counts for categorical features
(Multinomial/Bernoulli Naïve Bayes).
3.2 Program
import numpy as np

class NaiveBayesClassifier:
def fit(self, X, y):
# Separate data by class
self.classes = np.unique(y)
self.mean = {}
self.variance = {}
self.priors = {}

for c in self.classes:
X_c = X[y == c]
# Calculate mean, variance, and prior for each class
self.mean[c] = X_c.mean(axis=0)
self.variance[c] = X_c.var(axis=0)
self.priors[c] = X_c.shape[0] / X.shape[0]

def calculate_probability(self, x, mean, var):

# Gaussian probability density function
exponent = np.exp(-((x - mean) ** 2) / (2 * var))
return (1 / np.sqrt(2 * np.pi * var)) * exponent

def calculate_class_probabilities(self, x):

probabilities = {}
for c in self.classes:
probabilities[c] = np.log(self.priors[c]) # Use log for numerical stability
for i in range(len(x)):
probabilities[c] += np.log(self.calculate_probability(x[i], self.mean[c][i],
self.variance[c][i]))
return probabilities

def predict(self, X):

y_pred = [max(self.calculate_class_probabilities(x),
key=self.calculate_class_probabilities(x).get) for x in X]
return np.array(y_pred)

# Example usage
if name == " main ":
# Sample data
X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) # Features
y = np.array([0, 0, 0, 1, 1, 1]) # Labels

# Instantiate and fit the classifier

model = NaiveBayesClassifier()
model.fit(X, y)

# Predict new samples

X_test = np.array([[1, 20], [4, 22]])
predictions = model.predict(X_test)
print("Predictions:", predictions)
4. Tabulation Sheet

INPUT OUTPUT

X = np.array([[1, 20], [2, 21], [1, 22], [4, 20], [5, 21], [6, 22]]) Predictions: [0 1]
y = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[1, 20], [4, 22]])

5. Results
The classifier predicts the following:
• For the test sample [1, 20], the predicted class is 0.
• For the test sample [4, 22], the predicted class is 1.
Therefore, the output of the classifier for the test data X_test = [[1, 20], [4, 22]] is Predictions:
[0 1]. This means that the first sample is likely to belong to class 0, and the second sample is
likely to belong to class 1.
Acropolis Institute of Technology and Research, Indore
Department of CSE (Artificial Intelligence & Machine Learning)
Lab: Machine Learning for Group / Title: Build KNN Classification model for a given
Data Science (AL704) house price prediction.
EVALUATION RECORD Type/ Lab Session:
Name Shwet Pardhi Enrollment No. 0827AL211039
Performing on First submission Second submission
Extra Regular

Grade and Remarks by the Tutor

1. Clarity about the objective of experiment
2. Clarity about the Outcome
3. Submitted the work in desired format
4. Shown capability to solve the problem
5. Contribution to the team work

Additional remarks

Grade: Cross the grade.

A B C D F

Tutor

1. Title
Build KNN Classification model for a given house price prediction.
2. Neatly Drawn and labeled experimental setup
3. Theoretical solution of the instant problem
3.1Algorithm
1. Prepare Your Data: Gather data about houses (like square footage and number of bedrooms)
and their prices.
2. Choose the Number of Neighbors (K): Decide on K, the number of neighbors to consider
when making predictions. A typical starting point is K=3 or K=5.
3. Calculate Distances: For a new house, calculate the distance to all other houses in the dataset.
The closer a house is to the new house, the more likely it has a similar price.
4. Select the Nearest Neighbors: Pick the K houses closest to the new house based on the
calculated distances.
5. Predict the Price: Calculate the average price of these K nearest neighbors. This average will
be the predicted price for the new house.
3.2 Program
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error

class KNNRegressor:
def init (self, k=3):
self.k = k

def fit(self, X, y):

self.X_train = X
self.y_train = y

def predict(self, X):

predictions = []
for x in X:
# Calculate distances from x to all training points
distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
# Find the indices of the k closest points
k_indices = distances.argsort()[:self.k]
# Get the prices of the k closest points and average them
k_nearest_prices = self.y_train[k_indices]
predictions.append(np.mean(k_nearest_prices))
return np.array(predictions)

# Example Usage
if name == " main ":
# Sample dataset (square footage, bedrooms) and prices
X = np.array([[1500, 3], [2000, 4], [1700, 3], [2400, 4], [3000, 5]]) # Features: [sqft, bedrooms]
y = np.array([300000, 400000, 320000, 450000, 550000]) # House prices

# Split data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Standardize features to ensure fair distance calculations

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Instantiate and train the model

k=3
knn_regressor = KNNRegressor(k=k)
knn_regressor.fit(X_train, y_train)

# Predict house prices for test data

y_pred = knn_regressor.predict(X_test)
print("Predicted Prices:", y_pred)
print("Actual Prices:", y_test)

# Evaluate the model

mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error: {mae}")
4. Tabulation Sheet

INPUT OUTPUT

Predicted Prices: [310000. 475000.]

Actual Prices: [320000 450000]
Mean Absolute Error: 7500.0

5. Results
The KNN model predicted the following house prices:
• For a house with 1600 square feet and 3 bedrooms, the predicted price was $310,000, while
the actual price was $320,000.
• For a house with 2600 square feet and 4 bedrooms, the predicted price was $475,000, while
the actual price was $450,000.
The Mean Absolute Error (MAE) of the model is $7,500, indicating that the average
difference between the predicted and actual prices is $7,500.

IILM-University-Fee-Structure-Brochure-2025
No ratings yet
IILM-University-Fee-Structure-Brochure-2025
14 pages
Kramer Vp728
No ratings yet
Kramer Vp728
69 pages
The Impact of E-Commerce On The Digital Economy
No ratings yet
The Impact of E-Commerce On The Digital Economy
6 pages
DSS Products Comparison List - V8.2.0 - 20221222
No ratings yet
DSS Products Comparison List - V8.2.0 - 20221222
22 pages
Lista de Precios 14.03.22 t.c3.72
No ratings yet
Lista de Precios 14.03.22 t.c3.72
68 pages
RISHABH ML FILE
No ratings yet
RISHABH ML FILE
102 pages
MK37 VT
No ratings yet
MK37 VT
38 pages
Ml Lab Manual
No ratings yet
Ml Lab Manual
43 pages
Cy3681 Ez Usb Fx2 Development Kit 14
No ratings yet
Cy3681 Ez Usb Fx2 Development Kit 14
462 pages
Machine learning record
No ratings yet
Machine learning record
52 pages
ML Practical Format
No ratings yet
ML Practical Format
82 pages
ML Manual2025_IV YEar (1)
No ratings yet
ML Manual2025_IV YEar (1)
39 pages
Pitfalls of data visualization
No ratings yet
Pitfalls of data visualization
11 pages
ML Lab Manual-1
No ratings yet
ML Lab Manual-1
64 pages
CSE-DS Power BI Updated Lab Manual
No ratings yet
CSE-DS Power BI Updated Lab Manual
99 pages
Vinay_SDE
No ratings yet
Vinay_SDE
1 page
Malla Reddy Engineering College: Main Campus
No ratings yet
Malla Reddy Engineering College: Main Campus
46 pages
ML Manual2024 - IV YEar
No ratings yet
ML Manual2024 - IV YEar
39 pages
ML Lab Manual Simplified
No ratings yet
ML Lab Manual Simplified
40 pages
ML-MANUAL
No ratings yet
ML-MANUAL
42 pages
FML_lab_manual
No ratings yet
FML_lab_manual
49 pages
DE LA_CRUZ_JULIUS_EDMAN-203_FINAL-EXAMINATION
No ratings yet
DE LA_CRUZ_JULIUS_EDMAN-203_FINAL-EXAMINATION
15 pages
Machine Learning (r22a6602)
No ratings yet
Machine Learning (r22a6602)
125 pages
CSE Sem7 N 8
No ratings yet
CSE Sem7 N 8
51 pages
AEC 5 UNITS NOTES NEW completed
No ratings yet
AEC 5 UNITS NOTES NEW completed
64 pages
ML Lab Mannual Final
No ratings yet
ML Lab Mannual Final
54 pages
MLA LabManual1
No ratings yet
MLA LabManual1
52 pages
VMTW_ML_LAB_MANUAL
No ratings yet
VMTW_ML_LAB_MANUAL
37 pages
190-00357-01 500W Quick Ref
No ratings yet
190-00357-01 500W Quick Ref
22 pages
FDS Lab Manual Student Manual
No ratings yet
FDS Lab Manual Student Manual
50 pages
AI_(AL-304)__Lab_Manual
No ratings yet
AI_(AL-304)__Lab_Manual
37 pages
ML Record_unlocked
No ratings yet
ML Record_unlocked
67 pages
1 DSML Intro
No ratings yet
1 DSML Intro
12 pages
New Syllabus
No ratings yet
New Syllabus
4 pages
Assignment PDF
No ratings yet
Assignment PDF
2 pages
(Fadia PDF
No ratings yet
(Fadia PDF
464 pages
Crime Data Mediante Machine Learning
No ratings yet
Crime Data Mediante Machine Learning
6 pages
ME P4252-II Semester - MACHINE LEARNING
No ratings yet
ME P4252-II Semester - MACHINE LEARNING
46 pages
EM 538_ISE 489 Syllabus (1)
No ratings yet
EM 538_ISE 489 Syllabus (1)
11 pages
python lab record AIDS (1)
No ratings yet
python lab record AIDS (1)
79 pages
ML Lab Manual 20-06
No ratings yet
ML Lab Manual 20-06
40 pages
III SEM - AI19442 - FOML
No ratings yet
III SEM - AI19442 - FOML
42 pages
AD3461 MACHINE LEARNING LABORATORY SYLLABUS
No ratings yet
AD3461 MACHINE LEARNING LABORATORY SYLLABUS
2 pages
The Rise and Fall of The HP Way
100% (1)
The Rise and Fall of The HP Way
6 pages
Experiment List. DSPYL
No ratings yet
Experiment List. DSPYL
10 pages
DWDM Lab Manual
No ratings yet
DWDM Lab Manual
31 pages
III-II AIDS R22 ML
No ratings yet
III-II AIDS R22 ML
25 pages
AML Lab record index
No ratings yet
AML Lab record index
4 pages
Anant MLDS File
No ratings yet
Anant MLDS File
38 pages
21ai66 ML Lab Manual
No ratings yet
21ai66 ML Lab Manual
41 pages
Final Scheme and Syllabus for Minor (ML) - TT2 1
No ratings yet
Final Scheme and Syllabus for Minor (ML) - TT2 1
3 pages
21ucs608 Ai&Ml Lab Manual (1) (1)
No ratings yet
21ucs608 Ai&Ml Lab Manual (1) (1)
28 pages
Group 2 - IAP
No ratings yet
Group 2 - IAP
11 pages
Major Project
No ratings yet
Major Project
20 pages
ai file
No ratings yet
ai file
36 pages
ML Using Python IT UPDATED
No ratings yet
ML Using Python IT UPDATED
53 pages
PA_LAB_MDM[1]
No ratings yet
PA_LAB_MDM[1]
4 pages
21UCS608 AI&ML Lab Manual (1)
No ratings yet
21UCS608 AI&ML Lab Manual (1)
29 pages
Computer Science & Engineering Micro Project: MLR Institute of Technology
No ratings yet
Computer Science & Engineering Micro Project: MLR Institute of Technology
33 pages
FlashFX Pro 3.3 With BCM5892 User Manual
No ratings yet
FlashFX Pro 3.3 With BCM5892 User Manual
3 pages
1 22csu601-Aiml Syllabus
No ratings yet
1 22csu601-Aiml Syllabus
4 pages
FINAL UNIT 4
No ratings yet
FINAL UNIT 4
107 pages
Institute's Vision
No ratings yet
Institute's Vision
57 pages
ML Unit I
No ratings yet
ML Unit I
13 pages
LG P925 Thrill 4G Service Manual
No ratings yet
LG P925 Thrill 4G Service Manual
278 pages
DS Minor Degree Specialization Scheme 2022 23
No ratings yet
DS Minor Degree Specialization Scheme 2022 23
19 pages
Skill Based Projects - Data - Science (See List On Last Page)
No ratings yet
Skill Based Projects - Data - Science (See List On Last Page)
4 pages
Primary (Class 3 To 5) Paragraph (150 Words) / Painting
No ratings yet
Primary (Class 3 To 5) Paragraph (150 Words) / Painting
7 pages
Precision Navigator II: The Professional River Radar With Integrated ECDIS Map View and Inland AIS
No ratings yet
Precision Navigator II: The Professional River Radar With Integrated ECDIS Map View and Inland AIS
8 pages
Machine Learning & Deep Learning Prodegree
No ratings yet
Machine Learning & Deep Learning Prodegree
6 pages
DSBDA Lab Manual
No ratings yet
DSBDA Lab Manual
167 pages
ML LAB MANNUAL R22 CSE(DS)
No ratings yet
ML LAB MANNUAL R22 CSE(DS)
46 pages
4Ja1-Tc & 4Jh1-Tc Engine: Engine Management System Operation & Diagnosis
100% (1)
4Ja1-Tc & 4Jh1-Tc Engine: Engine Management System Operation & Diagnosis
91 pages
Grade Three Exercise 3
No ratings yet
Grade Three Exercise 3
4 pages
INF385T IMLsyllabus
No ratings yet
INF385T IMLsyllabus
4 pages
202046702 Artificial Intelligence and Machine Learning
No ratings yet
202046702 Artificial Intelligence and Machine Learning
4 pages
20 - 1-Introduction To Kinematics of Machines
No ratings yet
20 - 1-Introduction To Kinematics of Machines
11 pages
7th Sem Updated Lab Manual
No ratings yet
7th Sem Updated Lab Manual
14 pages
3P Machine Learning IAC VI Sem BTECH CSE Lab
No ratings yet
3P Machine Learning IAC VI Sem BTECH CSE Lab
2 pages
Ford Systemsthinking
No ratings yet
Ford Systemsthinking
41 pages
Adding Bullets to Wiki Markup (1)
No ratings yet
Adding Bullets to Wiki Markup (1)
9 pages
Life of A Reservoir
100% (2)
Life of A Reservoir
8 pages
Actual4Test: Actual4test - Actual Test Exam Dumps-Pass For IT Exams
No ratings yet
Actual4Test: Actual4test - Actual Test Exam Dumps-Pass For IT Exams
4 pages
Nikhil MOOC Report
No ratings yet
Nikhil MOOC Report
16 pages
Hibernate Tool
No ratings yet
Hibernate Tool
50 pages
Artifical Intelligence and Machine Learning Lab
No ratings yet
Artifical Intelligence and Machine Learning Lab
109 pages
Data Science Course in Hyderabad - Innomatics
No ratings yet
Data Science Course in Hyderabad - Innomatics
10 pages
ME P4252-II Semester - MACHINE LEARNING
100% (1)
ME P4252-II Semester - MACHINE LEARNING
48 pages
Machine Learning Mastery for Engineers
From Everand
Machine Learning Mastery for Engineers
Abdellatif Sadeq
No ratings yet
BTCS9202 Data Sciences Lab Manual
No ratings yet
BTCS9202 Data Sciences Lab Manual
39 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.