Logistic Regression

This document discusses logistic regression, including what it is, when it is used, how the logistic function relates to logistic regression, and how to evaluate logistic regression models. It also includes code to load and explore a sample dataset.

6/1/23, 10:31 PM logistic_regression

Important Questions and Answers on Logistic Regression

What is logistic regression, and when is it used?

Logistic regression belongs to the regression family, but unlike other regression
methods, which predict a continuous value, it is used for classification. It models the
probability of a categorical outcome such as male or female, true or false, yes or no.

When is it used?

1. Predictive modelling
2. Medical research
3. Credit scoring
4. Market and customer analytics

What is the logistic function (also known as the sigmoid function), and why is it used in
logistic regression?

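The logistic function, also known as the sigmoid function, is a mathematical function that
maps any real number to a value between 0 and 1. Its graph is an S-shaped curve, and it is
given by the formula σ(z) = 1 / (1 + e^(-z))

where σ(z) represents the output (probability) and z represents the input to the function.
In logistic regression, z is a linear combination of the input features, so the sigmoid
turns that unbounded score into a value that can be read as the probability of the positive class.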
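
The formula can be checked numerically with a few lines of Python. This is a minimal sketch using only the standard library. The function name `sigmoid` and the branch for negative inputs are illustrative choices, not part of the original notebook.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function: 1 / (1 + e^(-z)), always between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z use the equivalent form e^z / (1 + e^z),
    # which avoids overflow in exp(-z) when z is very negative.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 sits exactly in the middle at 0.5.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid(z), 4))
```

Because σ(-z) = 1 - σ(z), the curve is symmetric around the point (0, 0.5), which is why 0.5 is the usual default cut-off between the two classes.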

How do you evaluate the performance of a logistic regression model?

Here are some commonly used evaluation methods for logistic regression:

1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. F1 Score
6. ROC Curve
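
The first five items in the list are all derived from the same four counts (true negatives, false positives, false negatives, true positives). The sketch below computes them by hand on a small made-up label set. The helper names `confusion_counts` and `classification_metrics` and the toy labels are hypothetical and only serve the illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels coded as 0 and 1."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return tn, fp, fn, tp

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from the confusion counts."""
    tn, fp, fn, tp = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, invented for the example
y_true_toy = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred_toy = [0, 0, 0, 1, 1, 1, 0, 0]
metrics = classification_metrics(y_true_toy, y_pred_toy)
print(metrics)
```

Precision answers "of the cases predicted positive, how many were right", while recall answers "of the actual positives, how many were found". F1 is their harmonic mean.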

Import Necessary Libraries

In [ ]: import numpy as np
import pandas as pd

Load Dataset

In [ ]: df = pd.read_csv('ft.csv')
df.head()

Out[ ]:    male  age  education  currentSmoker  cigsPerDay  BPMeds  prevalentStroke  prevalentHyp
        0     1   39        4.0              0         0.0     0.0                0             0
        1     0   46        2.0              0         0.0     0.0                0             0
        2     1   48        1.0              1        20.0     0.0                0             0
        3     0   61        3.0              1        30.0     0.0                0             1
        4     0   46        3.0              1        23.0     0.0                0             0

Perform EDA

In [ ]: df.shape

Out[ ]: (4238, 16)

In [ ]: # view non-null counts and dtypes for each column
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4238 entries, 0 to 4237
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 male 4238 non-null int64
1 age 4238 non-null int64
2 education 4133 non-null float64
3 currentSmoker 4238 non-null int64
4 cigsPerDay 4209 non-null float64
5 BPMeds 4185 non-null float64
6 prevalentStroke 4238 non-null int64
7 prevalentHyp 4238 non-null int64
8 diabetes 4238 non-null int64
9 totChol 4188 non-null float64
10 sysBP 4238 non-null float64
11 diaBP 4238 non-null float64
12 BMI 4219 non-null float64
13 heartRate 4237 non-null float64
14 glucose 3850 non-null float64
15 TenYearCHD 4238 non-null int64
dtypes: float64(9), int64(7)
memory usage: 529.9 KB
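
The number of missing values per column follows directly from the summary above: it is the total row count minus the non-null count. The short sketch below does that subtraction for the columns that are not fully populated. The figures are copied from the `df.info()` output, and the variable names are illustrative.

```python
total_rows = 4238  # from "RangeIndex: 4238 entries"

# Non-null counts copied from the df.info() output (incomplete columns only)
non_null = {
    "education": 4133,
    "cigsPerDay": 4209,
    "BPMeds": 4185,
    "totChol": 4188,
    "BMI": 4219,
    "heartRate": 4237,
    "glucose": 3850,
}

# Missing count = total rows - non-null count
missing = {col: total_rows - n for col, n in non_null.items()}

# Print the columns from most to least missing, with the share of rows affected
for col, n in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:12s} {n:4d} missing ({100 * n / total_rows:.1f}%)")
```

On a loaded DataFrame the same counts can be obtained with `df.isnull().sum()`, which the notebook uses later on the feature subset.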

In [ ]: # view descriptive statistics
df.describe()

Out[ ]:               male          age    education  currentSmoker   cigsPerDay       BPMeds  preval
        count  4238.000000  4238.000000  4133.000000    4238.000000  4209.000000  4185.000000  42
        mean      0.429212    49.584946     1.978950       0.494101     9.003089     0.029630
        std       0.495022     8.572160     1.019791       0.500024    11.920094     0.169584
        min       0.000000    32.000000     1.000000       0.000000     0.000000     0.000000
        25%       0.000000    42.000000     1.000000       0.000000     0.000000     0.000000
        50%       0.000000    49.000000     2.000000       0.000000     0.000000     0.000000
        75%       1.000000    56.000000     3.000000       1.000000    20.000000     0.000000
        max       1.000000    70.000000     4.000000       1.000000    70.000000     1.000000

In [ ]: # check for duplicate rows
duplicate_rows = df.duplicated()
# count the number of True values
num_dup_rows = duplicate_rows.sum()
num_dup_rows

Out[ ]: 0

In [ ]: import matplotlib.pyplot as plt
import seaborn as sns

num_feat = ["male","age","education","currentSmoker","cigsPerDay","BPMeds","prev
for feature in num_feat:
    plt.figure(figsize=(7, 7))
    sns.histplot(df[feature], kde=True)
    plt.title(f"Histogram of {feature}")

In [ ]: num_feat = ["male","age","education","currentSmoker","cigsPerDay","BPMeds","prev
sns.pairplot(df[num_feat])
plt.show()

In [ ]: for feature in num_feat:
    plt.figure(figsize=(6,4))
    sns.boxplot(x=df[feature])
    plt.title(f'boxplot of {feature}')
    plt.show()

In [ ]: # .copy() gives X its own data, so assigning to a column later does not raise SettingWithCopyWarning
X = df[['age','prevalentHyp','sysBP','diaBP','glucose']].copy()
y = df['TenYearCHD']

In [ ]: X.isnull().sum()

Out[ ]: age               0
prevalentHyp      0
sysBP             0
diaBP             0
glucose         388
dtype: int64

In [ ]: # fill the missing glucose values with the column mean
X['glucose'] = X['glucose'].fillna(value=df['glucose'].mean())

In [ ]: from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X,y,test_size=0.3, random_st
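
With `test_size=0.3` and 4238 rows, the sizes of the two partitions can be worked out by hand. The sketch below assumes the test count is rounded up to a whole number of rows, which is consistent with the confusion matrix later in the notebook summing to 1272 predictions. The variable names are illustrative.

```python
import math

n_samples = 4238   # rows in the dataset, from df.shape
test_size = 0.3    # fraction passed to train_test_split

# Assumption: the fractional test count is rounded up to whole rows
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(f"train rows: {n_train}, test rows: {n_test}")
```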

In [ ]: from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr.fit(x_train, y_train)

Out[ ]: LogisticRegression()
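
To make the fitting step less of a black box, the sketch below fits a one-feature logistic regression with plain gradient descent on the log-loss. It is a simplified illustration and not what scikit-learn runs internally: `LogisticRegression()` by default uses the `lbfgs` solver with L2 regularization. The function name `fit_logistic_1d`, the step size, the epoch count and the toy data are all assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, step=0.1, epochs=2000):
    """Estimate weight w and intercept b by gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # predicted probability minus label
            grad_w += err * x
            grad_b += err
        w -= step * grad_w / n
        b -= step * grad_b / n
    return w, b

# Toy data (invented): larger x tends to mean class 1, with two overlapping points
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic_1d(xs, ys)
print("w =", round(w, 3), "b =", round(b, 3))
print("P(y=1 | x=2)  =", round(sigmoid(w * 2.0 + b), 3))
print("P(y=1 | x=-2) =", round(sigmoid(w * -2.0 + b), 3))
```

The learned weight is positive, so the predicted probability rises with x. The fitted model plays the same role as `lr` in the notebook, only with one feature instead of five.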

In [ ]: # accuracy on the training data (not the held-out test set)
score = lr.score(x_train, y_train)
score

Out[ ]: 0.8486176668914363

In [ ]: from sklearn.metrics import confusion_matrix
y_pred = lr.predict(x_test)
y_true = y_test
confusion_matrix(y_true, y_pred)

Out[ ]: array([[1080,    4],
       [ 182,    6]])
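
Reading the matrix in scikit-learn's layout (rows are true classes, columns are predicted classes, so the cells are TN, FP, FN, TP), the remaining metrics can be computed directly from these four numbers. The sketch below does the arithmetic. The variable names are illustrative, and the baseline is simply "always predict class 0".

```python
# Cells of the confusion matrix reported above: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 1080, 4, 182, 6
total = tn + fp + fn + tp

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * tp / (2 * tp + fp + fn)

# Accuracy of a model that predicts "no CHD" for every patient
baseline_accuracy = (tn + fp) / total

print(f"accuracy  = {accuracy:.3f}")
print(f"precision = {precision:.3f}")
print(f"recall    = {recall:.3f}")
print(f"F1        = {f1:.3f}")
print(f"baseline  = {baseline_accuracy:.3f}")
```

The accuracy of about 0.85 looks good in isolation, but it is almost identical to the all-negative baseline, and the recall shows that only 6 of the 188 positive cases in the test set were detected. On an imbalanced target like this one, accuracy alone is a weak guide.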

In [ ]: score = np.array(score).reshape(-1, 1)

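In [ ]: from sklearn.metrics import roc_curve, auc
# score with the predicted probability of class 1 so the curve spans many thresholds
y_score = lr.predict_proba(x_test)[:, 1]
fpr, tpr, thresholds = roc_curve(y_true, y_score)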

In [ ]: roc_auc = auc(fpr, tpr)

In [ ]: plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' %
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
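
The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it that way in plain Python and compares continuous scores with hard 0/1 labels obtained by thresholding at 0.5. The function name `pairwise_auc` and the toy scores are hypothetical.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (invented): true labels and predicted probabilities of class 1
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.6, 0.4, 0.45, 0.7, 0.9]

# Hard predictions from the usual 0.5 cut-off
hard = [1 if p >= 0.5 else 0 for p in probs]

auc_from_probs = pairwise_auc(labels, probs)
auc_from_hard = pairwise_auc(labels, hard)
print("AUC from probabilities:", auc_from_probs)
print("AUC from hard labels:  ", auc_from_hard)
```

Hard labels collapse every score to 0 or 1, so most pairs become ties and the curve has a single interior point. Scoring with probabilities, for example `lr.predict_proba(x_test)[:, 1]`, keeps the full ranking and gives a more informative ROC curve.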
