Logistic Regression

Uploaded by

dgdangelodg

6/1/23, 10:31 PM logistic_regression

Here are some important questions and answers on logistic regression

What is logistic regression, and when is it used?

Logistic regression is a regression-based model used for classification. Unlike linear regression, which predicts a continuous value, logistic regression predicts the probability that an observation belongs to a class, so it is used for categorical (most often binary) outcomes such as male or female, true or false, and yes or no.

When is it used?

1. Predictive modelling
2. Medical research
3. Credit scoring
4. Market and customer analytics

What is the logistic function (also known as the sigmoid function), and why is it used in
logistic regression?

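The logistic function, also known as the sigmoid function, is a mathematical function that maps any real number to a value between 0 and 1. It is an S-shaped curve given by the formula σ(z) = 1 / (1 + e^(-z))

where σ(z) is the output (a probability) and z is the input to the function. In logistic regression, z is a linear combination of the features, z = b0 + b1*x1 + ... + bn*xn, and the sigmoid turns it into the probability that the observation belongs to the positive class. That is why it is used: its output always lies between 0 and 1, so it can be read as a probability and thresholded (typically at 0.5) to assign a class.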
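To make the formula concrete, here is a minimal NumPy sketch of the sigmoid and of how a logistic regression model turns features into a probability and then a class. The weights `w`, the intercept `b` and the toy inputs are made-up illustrative values, not coefficients learned from this notebook's data.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(X, w, b):
    """P(y = 1 | x) for a logistic regression model with weights w and intercept b."""
    z = X @ w + b            # linear combination of the features
    return sigmoid(z)        # squashed into a probability

def predict(X, w, b, threshold=0.5):
    """Hard 0/1 labels obtained by thresholding the probability."""
    return (predict_proba(X, w, b) >= threshold).astype(int)

# Hypothetical two-feature example; w and b are illustrative, not fitted.
X = np.array([[0.0, 0.0],
              [2.0, 1.0],
              [-3.0, 0.5]])
w = np.array([1.5, -0.5])
b = 0.0

print(sigmoid(0.0))          # 0.5
print(predict(X, w, b))      # [1 1 0]
```

For very negative z, `np.exp(-z)` can overflow and emit a RuntimeWarning (the result still tends to 0); `scipy.special.expit` is a numerically stable implementation of the same function.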

How do you evaluate the performance of a logistic regression model?

Here are some commonly used evaluation methods for logistic regression:

1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. F1 Score
6. ROC Curve
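The first five metrics can all be computed from the four cells of the confusion matrix. A plain-Python sketch, assuming binary labels coded 0/1 with 1 as the positive class; the helper names are hypothetical, and scikit-learn's `accuracy_score`, `precision_score`, `recall_score` and `f1_score` compute the same quantities. The ROC curve is left out here because it needs predicted probabilities rather than labels.

```python
def confusion_counts(y_true, y_pred):
    """Return (TN, FP, FN, TP) for 0/1 labels, treating 1 as the positive class."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def classification_metrics(y_true, y_pred):
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # precision: of the rows predicted positive, how many really are positive
    precision = tp / (tp + fp) if tp + fp else 0.0
    # recall: of the truly positive rows, how many the model found
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(confusion_counts(y_true, y_pred))        # (2, 2, 1, 3)
print(classification_metrics(y_true, y_pred))
```

Here accuracy is 0.625, precision 0.6, recall 0.75 and F1 about 0.667, which shows how the metrics can disagree on the same predictions.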

Import Necessary Libraries

In [ ]: import numpy as np
import pandas as pd

Load the dataset

In [ ]: df = pd.read_csv('ft.csv')
df.head()

file:///C:/Users/rinki/Downloads/logistic_regression.html 1/28

Out[ ]:
   male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp
0     1   39        4.0              0         0.0     0.0                0             0
1     0   46        2.0              0         0.0     0.0                0             0
2     1   48        1.0              1        20.0     0.0                0             0
3     0   61        3.0              1        30.0     0.0                0             1
4     0   46        3.0              1        23.0     0.0                0             0

Perform EDA (exploratory data analysis)

In [ ]: df.shape

Out[ ]: (4238, 16)

In [ ]: # view non-null counts per column (missing values = 4238 - non-null count)

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4238 entries, 0 to 4237
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 male 4238 non-null int64
1 age 4238 non-null int64
2 education 4133 non-null float64
3 currentSmoker 4238 non-null int64
4 cigsPerDay 4209 non-null float64
5 BPMeds 4185 non-null float64
6 prevalentStroke 4238 non-null int64
7 prevalentHyp 4238 non-null int64
8 diabetes 4238 non-null int64
9 totChol 4188 non-null float64
10 sysBP 4238 non-null float64
11 diaBP 4238 non-null float64
12 BMI 4219 non-null float64
13 heartRate 4237 non-null float64
14 glucose 3850 non-null float64
15 TenYearCHD 4238 non-null int64
dtypes: float64(9), int64(7)
memory usage: 529.9 KB
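`df.info()` reports non-null counts, so missing values have to be worked out by subtraction (for example glucose: 4238 - 3850 = 388 missing, about 9 % of rows). A small pandas sketch that reports missing counts and percentages directly; `missing_summary` is a hypothetical helper and the toy frame is illustrative only.

```python
import numpy as np
import pandas as pd

def missing_summary(frame):
    """Count and percentage of missing values per column, largest first, zeros dropped."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)

# Toy frame for illustration only; with the notebook's data call missing_summary(df).
toy = pd.DataFrame({
    "glucose": [77.0, np.nan, 70.0, np.nan],
    "BMI": [26.9, 28.7, np.nan, 28.6],
    "age": [39, 46, 48, 61],
})
print(missing_summary(toy))
```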

In [ ]: # view descriptive statistics

df.describe()

Out[ ]:
              male          age    education  currentSmoker   cigsPerDay       BPMeds  preval
count  4238.000000  4238.000000  4133.000000    4238.000000  4209.000000  4185.000000  42
mean      0.429212    49.584946     1.978950       0.494101     9.003089     0.029630
std       0.495022     8.572160     1.019791       0.500024    11.920094     0.169584
min       0.000000    32.000000     1.000000       0.000000     0.000000     0.000000
25%       0.000000    42.000000     1.000000       0.000000     0.000000     0.000000
50%       0.000000    49.000000     2.000000       0.000000     0.000000     0.000000
75%       1.000000    56.000000     3.000000       1.000000    20.000000     0.000000
max       1.000000    70.000000     4.000000       1.000000    70.000000     1.000000

In [ ]: # check for duplicate rows

duplicate_rows = df.duplicated()
#count the number of True values
num_dup_rows = duplicate_rows.sum()
num_dup_rows

Out[ ]: 0

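In [ ]: import matplotlib.pyplot as plt
import seaborn as sns

# NOTE: the exported PDF cuts this list off after "prev..."; only the feature names
# that are visible are reproduced here. Add the remaining columns as needed.
num_feat = ["male", "age", "education", "currentSmoker", "cigsPerDay", "BPMeds"]

# one histogram (with a KDE curve) per feature
for feature in num_feat:
    plt.figure(figsize=(7, 7))
    sns.histplot(df[feature], kde=True)
    plt.title(f"Histogram of {feature}")
    plt.show()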

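In [ ]: # NOTE: this list is also cut off in the exported PDF after "prev..."; only the
# visible feature names are reproduced here.
num_feat = ["male", "age", "education", "currentSmoker", "cigsPerDay", "BPMeds"]
# pairwise scatter plots, with each feature's distribution on the diagonal
sns.pairplot(df[num_feat])
plt.show()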

In [ ]: for feature in num_feat:
    plt.figure(figsize=(6, 4))
    sns.boxplot(x=df[feature])
    plt.title(f'boxplot of {feature}')
    plt.show()
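Matplotlib and seaborn box plots draw the whiskers out to at most 1.5 times the interquartile range (IQR) beyond the quartiles and plot points outside that range individually as outliers. The sketch below counts those points per column with pandas so the box plots can be summarised as numbers; `iqr_outlier_counts` is a hypothetical helper, and its quartile interpolation may differ slightly from the plotting library's. The rule is only meaningful for continuous columns such as age, cigsPerDay or sysBP: for a 0/1 flag like BPMeds the IQR is 0, so every 1 would be counted as an outlier.

```python
import pandas as pd

def iqr_outlier_counts(frame, columns, k=1.5):
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] in each column (NaNs ignored)."""
    counts = {}
    for col in columns:
        s = frame[col].dropna()
        q1, q3 = s.quantile(0.25), s.quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        counts[col] = int(((s < lower) | (s > upper)).sum())
    return pd.Series(counts, name="n_outliers")

# Toy data for illustration; with the notebook's data use e.g.
# iqr_outlier_counts(df, ["age", "cigsPerDay", "sysBP"]).
toy = pd.DataFrame({"sysBP": [106, 121, 127, 150, 130, 180, 295],
                    "age": [39, 46, 48, 61, 46, 43, 63]})
print(iqr_outlier_counts(toy, ["sysBP", "age"]))
```

With these toy values only the sysBP reading of 295 falls outside the whiskers.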

In [ ]: X = df[['age','prevalentHyp','sysBP','diaBP','glucose']]
y = df['TenYearCHD']

In [ ]: X.isnull().sum()

Out[ ]: age 0
prevalentHyp 0
sysBP 0
diaBP 0
glucose 388
dtype: int64

In [ ]: # X is a column selection from df; take an explicit copy so that filling values
# modifies X itself and does not trigger pandas' SettingWithCopyWarning.
X = X.copy()
# replace missing glucose readings with the mean glucose level
X['glucose'] = X['glucose'].fillna(value=df['glucose'].mean())

In [ ]: from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_st

In [ ]: from sklearn.linear_model import LogisticRegression

lr = LogisticRegression()
lr.fit(x_train, y_train)

Out[ ]: LogisticRegression()
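In the notebook, glucose is filled with the mean of the whole dataset before the train/test split, so test rows influence the training features, and the model is fitted on unscaled features. A hedged alternative sketch with scikit-learn's `Pipeline`: the imputer and scaler are fitted on the training split only and then reapplied to the test split. `stratify`, `StandardScaler` and `class_weight="balanced"` are additions not present in the original (the last is aimed at the class imbalance visible in the confusion matrix and typically trades some accuracy for recall), and synthetic `make_classification` data stands in for the CSV.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X / y: 5 features, roughly 85 % of rows in class 0.
X_demo, y_demo = make_classification(n_samples=1000, n_features=5, weights=[0.85],
                                     random_state=0)
rng = np.random.default_rng(0)
X_demo[rng.random(X_demo.shape) < 0.05] = np.nan     # blank out ~5 % of the values

# stratify keeps the class ratio the same in the train and test splits
x_tr, x_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3,
                                          stratify=y_demo, random_state=0)

model = make_pipeline(
    SimpleImputer(strategy="mean"),   # column means are learned from x_tr only
    StandardScaler(),                 # standardised features help the default lbfgs solver
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(x_tr, y_tr)

proba = model.predict_proba(x_te)[:, 1]               # P(class 1), the input for ROC / AUC
print("test accuracy:", round(model.score(x_te, y_te), 3))
```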

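In [ ]: # mean accuracy on the training data; optimistic, because the model has already
# seen these rows (the held-out test set is evaluated with the confusion matrix below)
score = lr.score(x_train, y_train)
score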

Out[ ]: 0.8486176668914363

In [ ]: from sklearn.metrics import confusion_matrix

y_pred = lr.predict(x_test)
y_true = y_test
confusion_matrix(y_true, y_pred)

Out[ ]: array([[1080,    4],
               [ 182,    6]])
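In scikit-learn's `confusion_matrix` the rows are the true classes and the columns the predicted classes, both in sorted order [0, 1], so the output above reads [[TN, FP], [FN, TP]]. Plugging these counts into the standard formulas is plain arithmetic on the notebook's own numbers:

```python
# Counts copied from the confusion matrix above: rows = true class, columns = predicted.
tn, fp = 1080, 4
fn, tp = 182, 6
total = tn + fp + fn + tp                            # 1272 test rows

accuracy = (tp + tn) / total                         # ~0.854
precision = tp / (tp + fp)                           # 0.6
recall = tp / (tp + fn)                              # ~0.032: 6 of 188 CHD cases found
f1 = 2 * precision * recall / (precision + recall)   # ~0.061
baseline = (tn + fp) / total                         # always predicting 0: ~0.852

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f} majority-baseline={baseline:.3f}")
```

On this split the model predicts the positive class only 10 times, so its accuracy (about 0.854) is barely above the 0.852 obtained by always predicting "no CHD", while recall is about 3 %. For an imbalanced target like TenYearCHD, recall, F1 and ROC AUC say more than accuracy.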

In [ ]: score = np.array(score).reshape(-1, 1)

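In [ ]: from sklearn.metrics import roc_curve, auc

# roc_curve sweeps a threshold over a continuous score; with hard 0/1 labels (y_pred)
# the curve has only one interior point. Use the predicted probability of class 1.
y_score = lr.predict_proba(x_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true, y_score)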

In [ ]: roc_auc = auc(fpr, tpr)

In [ ]: plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' %
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
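Why the ROC curve should be built from probabilities: an ROC curve sweeps a decision threshold over a continuous score, and with hard 0/1 predictions there is only one threshold to sweep. The NumPy sketch below (hypothetical helpers `roc_points` and `auc_trapezoid`, toy labels and scores) computes the curve both ways; `sklearn.metrics.roc_curve` and `roc_auc_score` give the same kind of result when passed `lr.predict_proba(x_test)[:, 1]`.

```python
import numpy as np

def roc_points(y_true, scores):
    """FPR and TPR at every distinct score threshold (highest first), starting at (0, 0)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]
    for thr in np.unique(scores)[::-1]:           # np.unique sorts ascending; reverse it
        predicted_pos = scores >= thr
        tpr.append(np.sum(predicted_pos & (y_true == 1)) / n_pos)
        fpr.append(np.sum(predicted_pos & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve using the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

y = np.array([0, 0, 1, 1, 0, 1])
proba = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # hypothetical P(class 1)
hard = (proba >= 0.5).astype(int)                     # the same predictions as 0/1 labels

fpr_p, tpr_p = roc_points(y, proba)
fpr_h, tpr_h = roc_points(y, hard)
print("AUC from probabilities:", round(auc_trapezoid(fpr_p, tpr_p), 3))
print("AUC from hard labels:  ", round(auc_trapezoid(fpr_h, tpr_h), 3))
```

With the toy probabilities the AUC is 8/9 (about 0.889); with the same predictions thresholded at 0.5 the curve collapses to a single interior point and the AUC drops to 5/6 (about 0.833).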
