Supervised Learning With Scikit-Learn: Preprocessing Data
Preprocessing data
Supervised Learning with scikit-learn
Dummy variables
Automobile dataset
● mpg: Target Variable
● Origin: Categorical Feature
In [4]: print(df_origin.head())
    mpg  displ   hp  weight  accel  size  origin_Asia  origin_Europe  \
0  18.0  250.0   88    3139   14.5  15.0            0              0
1   9.0  304.0  193    4732   18.5  20.0            0              0
2  36.1   91.0   60    1800   16.4  10.0            1              0
3  18.5  250.0   98    3525   19.0  15.0            0              0
4  34.3   97.0   78    2188   15.8  10.0            0              1

   origin_US
0          1
1          1
2          0
3          1
4          0
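The output above has one binary column per category of the origin feature. A minimal sketch of how such columns can be produced with pandas `get_dummies`, assuming a hypothetical five-row frame whose `origin` values are read back from the dummy columns shown above (the real dataset and the exact call used on the slide are not shown):

```python
import pandas as pd

# Hypothetical miniature of the automobile data: 'mpg' is the target,
# 'origin' is the categorical feature. Only two columns are kept for brevity.
df = pd.DataFrame({
    "mpg": [18.0, 9.0, 36.1, 18.5, 34.3],
    "origin": ["US", "US", "Asia", "US", "Europe"],
})

# Create one binary (dummy) column per category of 'origin'.
# dtype=int requests 0/1 values instead of booleans.
df_origin = pd.get_dummies(df, columns=["origin"], dtype=int)

print(df_origin.head())
```

Each row has exactly one 1 across the three `origin_*` columns, which is why one of them carries no extra information.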
In [6]: print(df_origin.head())
    mpg  displ   hp  weight  accel  size  origin_Europe  origin_US
0  18.0  250.0   88    3139   14.5  15.0              0          1
1   9.0  304.0  193    4732   18.5  20.0              0          1
2  36.1   91.0   60    1800   16.4  10.0              0          0
3  18.5  250.0   98    3525   19.0  15.0              0          1
4  34.3   97.0   78    2188   15.8  10.0              1          0
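In this second output the `origin_Asia` column is gone: a car that is neither European nor from the US must be Asian, so the column is redundant. A small sketch of two ways this could be done, again on a hypothetical five-row frame; which of the two the slide used is not shown:

```python
import pandas as pd

# Hypothetical miniature of the automobile data (illustrative rows only).
df = pd.DataFrame({
    "mpg": [18.0, 9.0, 36.1, 18.5, 34.3],
    "origin": ["US", "US", "Asia", "US", "Europe"],
})

# Option 1: encode every category, then drop the redundant column by name.
df_encoded = pd.get_dummies(df, columns=["origin"], dtype=int)
df_reduced = df_encoded.drop("origin_Asia", axis=1)

# Option 2: let pandas drop the first category (alphabetically 'Asia') directly.
df_first = pd.get_dummies(df, columns=["origin"], drop_first=True, dtype=int)

print(df_reduced.head())
```

Both options give the same frame here because `Asia` sorts first among the categories.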
Handling missing data
In [2]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
pregnancies 768 non-null int64
glucose 768 non-null int64
diastolic 768 non-null int64
triceps 768 non-null int64
insulin 768 non-null int64
bmi 768 non-null float64
dpf 768 non-null float64
age 768 non-null int64
diabetes 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
None
diabetes
0 1
1 0
2 1
3 0
4 1
In [11]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
pregnancies 768 non-null int64
glucose 768 non-null int64
diastolic 768 non-null int64
triceps 541 non-null float64
insulin 394 non-null float64
bmi 757 non-null float64
dpf 768 non-null float64
age 768 non-null int64
diabetes 768 non-null int64
dtypes: float64(4), int64(5)
memory usage: 54.1 KB
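Compared with the first `df.info()` call, triceps, insulin and bmi now report fewer than 768 non-null entries and have become float64. One way this can happen is that zeros in those columns were treated as missing and replaced with NaN. The sketch below assumes that interpretation and uses a hypothetical four-row frame with made-up values:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: a value of 0 in these columns is assumed to mean
# "not recorded" rather than a real measurement.
df = pd.DataFrame({
    "insulin": [0, 94, 168, 0],
    "triceps": [35, 0, 23, 29],
    "bmi": [33.6, 26.6, 0.0, 28.1],
})

# Replace the placeholder zeros with NaN, one column at a time.
# Integer columns become float64 because NaN is a floating-point value.
for col in ["insulin", "triceps", "bmi"]:
    df[col] = df[col].replace(0, np.nan)

print(df.isna().sum())
print(df.dtypes)
```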
In [13]: df.shape
Out[13]: (393, 9)
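A shape of (393, 9) is consistent with dropping every row that contains at least one missing value, which discards roughly half of the 768 rows. A minimal sketch of that step with `DataFrame.dropna`, on a hypothetical four-row frame (the call that produced the shape above is not shown on the slide):

```python
import numpy as np
import pandas as pd

# Hypothetical rows with missing values in different columns.
df = pd.DataFrame({
    "glucose": [148, 85, 183, 89],
    "insulin": [np.nan, 94.0, np.nan, 168.0],
    "bmi": [33.6, np.nan, 23.3, 28.1],
})

print(df.shape)  # shape before dropping

# Keep only rows in which every column has a value.
df_complete = df.dropna()

print(df_complete.shape)  # shape after dropping
```

Only one of the four rows survives here, which illustrates how costly row-wise deletion can be when missing values are spread across columns.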
In [3]: imp.fit(X)
In [4]: X = imp.transform(X)
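The two calls above rely on an imputer object `imp` whose construction is not shown. A sketch of one possible setup, assuming mean imputation with `SimpleImputer` from current scikit-learn; the strategy and the small array are illustrative assumptions, not taken from the slide:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical feature matrix with one missing value in each column.
X = np.array([
    [1.0, np.nan],
    [3.0, 4.0],
    [np.nan, 8.0],
])

# Assumed configuration: replace each NaN with the mean of its column.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

imp.fit(X)                    # learns the column means (2.0 and 6.0)
X_imputed = imp.transform(X)  # fills the NaNs with those means

print(X_imputed)
```

Unlike dropping rows, imputation keeps every observation, at the cost of filling gaps with estimated values.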
Centering and scaling
Scaling in scikit-learn
In [2]: from sklearn.preprocessing import scale
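A short sketch of what `scale` does once imported: it standardizes each column to zero mean and unit variance. The array below is a hypothetical example with two features on very different ranges:

```python
import numpy as np
from sklearn.preprocessing import scale

# Hypothetical features on very different scales.
X = np.array([
    [1.0, 100.0],
    [2.0, 300.0],
    [3.0, 500.0],
])

# Subtract each column's mean and divide by its standard deviation.
X_scaled = scale(X)

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```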
Scaling in a pipeline
In [6]: from sklearn.preprocessing import StandardScaler
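Placing `StandardScaler` in a pipeline means the scaling parameters are learned from the training data only and then reused on the test data. The sketch below assumes a k-nearest-neighbors classifier as the final step and synthetic data from `make_classification`; both are illustrative choices, since the slide shows only the import:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: any numeric feature matrix would do).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=21
)

# Step names ('scaler', 'knn') are arbitrary labels chosen for this sketch.
steps = [
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
]
pipeline = Pipeline(steps)

# fit() scales the training data, then fits the classifier on the result.
pipeline.fit(X_train, y_train)

# score() applies the already-fitted scaler to X_test before predicting.
accuracy = pipeline.score(X_test, y_test)
print(accuracy)
```

Fitting the scaler inside the pipeline avoids leaking test-set statistics into training.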
Final thoughts
Congratulations!