Logistic Regression
Logistic regression belongs to the regression family, but unlike other regression
methods, which predict a continuous value, it is used for classification problems
with a categorical outcome, such as male or female, true or false, yes or no.
When is it used?
1. Predictive modelling
2. Medical research
3. Credit scoring
4. Market and customer analytics
What is the logistic function (also known as the sigmoid function), and why is it used in
logistic regression?
The logistic function, also known as the sigmoid function, is a mathematical function that
maps any real number to a value between 0 and 1. It has an S-shaped curve and is
given by the formula σ(z) = 1 / (1 + e^(-z)),
where σ(z) is the output (interpreted as a probability) and z is the input to the function.
Logistic regression uses it to turn a linear combination of the features into the
probability of the positive class.
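The formula σ(z) = 1 / (1 + e^(-z)) can be checked numerically. The following is a minimal sketch using NumPy; the helper name `sigmoid`, the sample inputs, and the 0.5 cut-off are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: sigma(z) = 1 / (1 + e^(-z))."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical inputs: negative, zero and positive values of z
z_values = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
probs = sigmoid(z_values)
print(probs)  # every value lies strictly between 0 and 1; sigmoid(0) is 0.5

# Assumed decision rule: predict class 1 when the probability is at least 0.5
labels = (probs >= 0.5).astype(int)
print(labels)
```

Large negative z gives values near 0 and large positive z gives values near 1, which is what produces the S shape.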
How is a logistic regression model evaluated?
1. Confusion matrix
2. Accuracy
3. Precision
4. Recall
5. F1 Score
6. ROC Curve
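The first five items in the list above can be computed with `sklearn.metrics`. This is a minimal sketch on made-up labels; `y_true` and `y_pred` are hypothetical arrays, not results from the dataset used in this notebook.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical ground-truth labels and model predictions (1 = positive class)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

acc = accuracy_score(y_true, y_pred)    # (TP + TN) / all samples
prec = precision_score(y_true, y_pred)  # TP / (TP + FP)
rec = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)           # harmonic mean of precision and recall
print(acc, prec, rec, f1)
```

On imbalanced data, accuracy alone can look good while recall on the positive class stays low, so it helps to read these metrics together.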
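In [ ]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used by the plotting cells below
import seaborn as sns  # used for histplot and pairplot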
Load dataset
In [ ]: df = pd.read_csv('ft.csv')
df.head()
file:///C:/Users/rinki/Downloads/logistic_regression.html 1/28
6/1/23, 10:31 PM logistic_regression
Perform EDA
In [ ]: df.shape
In [ ]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4238 entries, 0 to 4237
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 male 4238 non-null int64
1 age 4238 non-null int64
2 education 4133 non-null float64
3 currentSmoker 4238 non-null int64
4 cigsPerDay 4209 non-null float64
5 BPMeds 4185 non-null float64
6 prevalentStroke 4238 non-null int64
7 prevalentHyp 4238 non-null int64
8 diabetes 4238 non-null int64
9 totChol 4188 non-null float64
10 sysBP 4238 non-null float64
11 diaBP 4238 non-null float64
12 BMI 4219 non-null float64
13 heartRate 4237 non-null float64
14 glucose 3850 non-null float64
15 TenYearCHD 4238 non-null int64
dtypes: float64(9), int64(7)
memory usage: 529.9 KB
Out[ ]: 0
num_feat = ["male","age","education","currentSmoker","cigsPerDay","BPMeds","prev
for feature in num_feat:
    plt.figure(figsize=(7, 7))
    sns.histplot(df[feature], kde=True)
    plt.title(f"Histogram of {feature}")
In [ ]: num_feat = ["male","age","education","currentSmoker","cigsPerDay","BPMeds","prev
sns.pairplot(df[num_feat])
plt.show()
In [ ]: X = df[['age','prevalentHyp','sysBP','diaBP','glucose']]
y = df['TenYearCHD']
In [ ]: X.isnull().sum()
Out[ ]: age 0
prevalentHyp 0
sysBP 0
diaBP 0
glucose 388
dtype: int64
In [ ]: # Fill missing glucose values with the column mean. assign() returns a new
# DataFrame, which avoids pandas' SettingWithCopyWarning on the slice X.
X = X.assign(glucose=X['glucose'].fillna(df['glucose'].mean()))
Out[ ]: LogisticRegression()
Out[ ]: 0.8486176668914363
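A score like the one above is typically obtained by splitting the data, fitting the model, and scoring it on the held-out part. The cells that do this are not shown here, so the following is only a sketch on synthetic data: `X_demo`, `y_demo`, the 80/20 split, and the random seeds are all assumptions, and the printed accuracy is unrelated to the value above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): two features, binary target whose
# log-odds are a linear function of the features
rng = np.random.default_rng(0)
n_samples = 500
X_demo = rng.normal(size=(n_samples, 2))
logits = 1.5 * X_demo[:, 0] - 1.0 * X_demo[:, 1]
y_demo = (rng.random(n_samples) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Hold out 20% of the rows for evaluation (assumed split ratio)
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

model = LogisticRegression()
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows
accuracy = model.score(X_test, y_test)
print(accuracy)

# Predicted probability of the positive class, useful later for a ROC curve
proba = model.predict_proba(X_test)[:, 1]
```

Keeping `predict_proba` output rather than only the hard predictions is what makes threshold-based tools such as the ROC curve possible.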
In [ ]: score = np.array(score).reshape(-1, 1)
In [ ]: plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' %
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic')
plt.legend(loc="lower right")
plt.show()
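The plot above relies on `fpr` and `tpr` values whose computation is not shown. One common way to obtain them is `sklearn.metrics.roc_curve`; the sketch below uses tiny hypothetical arrays `y_true` and `y_score` rather than the notebook's own predictions.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Hypothetical true labels and predicted probabilities for the positive class
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# roc_curve sweeps the decision threshold and returns one (fpr, tpr) pair per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Area under the ROC curve via the trapezoidal rule
roc_auc = auc(fpr, tpr)
print(fpr, tpr, roc_auc)
```

An area of 0.5 corresponds to the dashed diagonal (random guessing) and 1.0 to a perfect ranking of positives above negatives.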