
LOAN / TABULAR ML – END-TO-END COOKBOOK

======================================

SECTION 1 · DATA LOADING FROM S3


----------------------------------
import boto3, pandas as pd
from io import BytesIO, StringIO

bucket = "my-data-bucket"
key = "loan/loan_data.csv"

s3 = boto3.client("s3")
obj = s3.get_object(Bucket=bucket, Key=key)
df = pd.read_csv(StringIO(obj["Body"].read().decode("utf-8")))
# For Parquet:
# df = pd.read_parquet(BytesIO(obj["Body"].read()))
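# If the optional s3fs package is installed, pandas can also read the object
# straight from its S3 URI, without the explicit boto3 call:
# df = pd.read_csv(f"s3://{bucket}/{key}")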

------------------------------------------------------------

SECTION 2 · MISSING-VALUE IMPUTATION


--------------------------------------
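num_cols and cat_cols are assumed throughout the cookbook; one common way to derive
them from the loaded frame (assuming "target" is the label column, as in later sections):

num_cols = df.select_dtypes(include="number").columns.tolist()
cat_cols = df.select_dtypes(include=["object", "category"]).columns.tolist()
num_cols = [c for c in num_cols if c != "target"]   # keep the label out of the feature lists
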
from sklearn.impute import SimpleImputer
num_imputer = SimpleImputer(strategy="median")
cat_imputer = SimpleImputer(strategy="most_frequent")
df[num_cols] = num_imputer.fit_transform(df[num_cols])
df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])

------------------------------------------------------------

SECTION 3 · NUMERIC SCALING & TRANSFORMS


------------------------------------------
from sklearn.preprocessing import StandardScaler, MinMaxScaler
df[num_cols] = StandardScaler().fit_transform(df[num_cols])
# or MinMaxScaler / PowerTransformer
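Loan amounts and incomes are often heavily right-skewed; a PowerTransformer sketch
(Yeo-Johnson, the scikit-learn default) as a drop-in alternative to plain scaling:

from sklearn.preprocessing import PowerTransformer
df[num_cols] = PowerTransformer(method="yeo-johnson").fit_transform(df[num_cols])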

------------------------------------------------------------

SECTION 4 · CATEGORICAL ENCODING


----------------------------------
# 4.1 One-Hot
df = pd.get_dummies(df, columns=cat_cols, dtype=int)

# 4.2 Ordinal / Label


from sklearn.preprocessing import OrdinalEncoder
df[cat_cols] = OrdinalEncoder(handle_unknown="use_encoded_value",
                              unknown_value=-1).fit_transform(df[cat_cols])

# 4.3 Target / Mean Encoding


import category_encoders as ce
enc = ce.TargetEncoder(cols=cat_cols)
df[cat_cols] = enc.fit_transform(df[cat_cols], df["target"])
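
Note that fitting a target encoder on the full frame leaks label information into the
features; a safer sketch, assuming the data have already been split into train and test
partitions (X_train, y_train, X_test), re-fits the encoder on the training rows only:

enc = ce.TargetEncoder(cols=cat_cols)
X_train[cat_cols] = enc.fit_transform(X_train[cat_cols], y_train)
X_test[cat_cols] = enc.transform(X_test[cat_cols])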

------------------------------------------------------------

SECTION 5 · DATE-TIME FEATURES


--------------------------------
df["issue_d"] = pd.to_datetime(df["issue_d"])
df["issue_year"] = df["issue_d"].dt.year
df["issue_qtr"] = df["issue_d"].dt.quarter
df["issue_wkday"]= df["issue_d"].dt.weekday

------------------------------------------------------------

SECTION 6 · TEXT VECTORS (TF-IDF)


-----------------------------------
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(max_features=10_000, ngram_range=(1, 2),
                        stop_words="english")
X_text = tfidf.fit_transform(df["review_text"])
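
The TF-IDF output is a sparse matrix; to use it alongside the numeric loan features,
one option is to stack the two blocks horizontally (a sketch, keeping everything sparse):

from scipy.sparse import hstack, csr_matrix
X_all = hstack([X_text, csr_matrix(df[num_cols].to_numpy())])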

------------------------------------------------------------
SECTION 7 · POLYNOMIAL / INTERACTIONS
---------------------------------------
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=2, include_bias=False)
df_poly = pd.DataFrame(poly.fit_transform(df[num_cols]),
                       columns=poly.get_feature_names_out(num_cols))

------------------------------------------------------------

SECTION 8 · BINNING CONTINUOUS VARS


-------------------------------------
df["income_band"] = pd.cut(df["annual_inc"],
bins=[0,40_000,80_000,120_000, float("inf")],
labels=["Low","Mid","High","VeryHigh"])

------------------------------------------------------------

SECTION 9 · CLASS BALANCING (UPSAMPLING)


------------------------------------------
from sklearn.utils import resample
maj = df[df.target == 0]; minority = df[df.target == 1]
min_up = resample(minority, replace=True, n_samples=len(maj), random_state=42)
df_bal = pd.concat([maj, min_up]).sample(frac=1, random_state=42).reset_index(drop=True)
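
If the optional imbalanced-learn package is available, synthetic oversampling (SMOTE)
is an alternative to duplicating rows; a minimal sketch, assuming a fully numeric
feature matrix X and label vector y:

from imblearn.over_sampling import SMOTE
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)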

------------------------------------------------------------

SECTION 10 · FEATURE SELECTION


-------------------------------
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
vt = VarianceThreshold(threshold=0.01)
X_vt = vt.fit_transform(df.drop(columns="target"))
skb = SelectKBest(mutual_info_classif, k=30)
X_k = skb.fit_transform(df.drop(columns="target"), df["target"])
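
SelectKBest returns a bare array; the surviving column names can be recovered from
its support mask:

kept_cols = df.drop(columns="target").columns[skb.get_support()]
print(list(kept_cols))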

------------------------------------------------------------

SECTION 11 · END-TO-END PIPELINE + EXPORT


-----------------------------------------
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
import joblib

num_pipe = Pipeline([("impute", SimpleImputer(strategy="median")),
                     ("scale", StandardScaler())])

cat_pipe = Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                     ("onehot", ce.OneHotEncoder(use_cat_names=True,
                                                 handle_unknown="ignore"))])

prep = ColumnTransformer([("num", num_pipe, num_cols),
                          ("cat", cat_pipe, cat_cols)])

rf_model = Pipeline([("prep", prep),
                     ("clf", RandomForestClassifier(random_state=42))])

rf_model.fit(X_train, y_train)
joblib.dump(rf_model, "rf_full_pipeline.pkl")
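
X_train and y_train are assumed throughout; a typical way to produce them from the
frame used above (stratified on the label):

from sklearn.model_selection import train_test_split
X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)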

------------------------------------------------------------

SECTION 12 · TIME-SERIES FEATURE ENGINEERING


---------------------------------------------
for lag in [1,7,30]: df[f"y_lag_{lag}"] = df["y"].shift(lag)
df["y_roll_mean_7"] = df["y"].rolling(7).mean()
df["y_ema_0_9"] = df["y"].ewm(alpha=0.9).mean()
df_ts = df.dropna()
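
With lagged features like these, random shuffling leaks future information into the
training set; a chronological cross-validation sketch using scikit-learn's TimeSeriesSplit:

from sklearn.model_selection import TimeSeriesSplit
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(df_ts):
    train_fold, test_fold = df_ts.iloc[train_idx], df_ts.iloc[test_idx]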

------------------------------------------------------------

SECTION 13 · DIMENSIONALITY REDUCTION


--------------------------------------
# PCA
from sklearn.decomposition import PCA
X_pca = PCA(n_components=0.95).fit_transform(df[num_cols])  # keep components explaining 95% of variance
# UMAP
import umap
X_umap = umap.UMAP(random_state=42).fit_transform(df[num_cols])
# Truncated SVD (sparse)
from sklearn.decomposition import TruncatedSVD
X_svd = TruncatedSVD(n_components=300, random_state=42).fit_transform(X_text)

------------------------------------------------------------

SECTION 14 · HYPER-PARAMETER OPTIMISATION


------------------------------------------
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from skopt import BayesSearchCV

# Full GridSearch example for RF omitted here for brevity; a minimal sketch follows
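
A minimal GridSearchCV sketch over the rf_model pipeline from Section 11 (the grid
values are illustrative assumptions, not tuned recommendations):

param_grid = {"clf__n_estimators": [200, 500],
              "clf__max_depth": [None, 10, 20]}
grid = GridSearchCV(rf_model, param_grid, cv=5, scoring="roc_auc", n_jobs=-1)
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)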

------------------------------------------------------------

SECTION 15 · QUICK-START SNIPPETS: COMMON ML MODELS


----------------------------------------------------
# 1 Logistic Regression
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2 Decision Tree
from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# 3 Random Forest
from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_train, y_train)

# 4 Gradient Boosting
from sklearn.ensemble import GradientBoostingClassifier
gb = GradientBoostingClassifier().fit(X_train, y_train)

# 5 XGBoost
import xgboost as xgb
xgb_clf = xgb.XGBClassifier(random_state=42).fit(X_train, y_train)

# 6 LightGBM
import lightgbm as lgb
lgb_clf = lgb.LGBMClassifier(random_state=42).fit(X_train, y_train)

# 7 Support Vector Machine


from sklearn.svm import SVC
svc = SVC(kernel="rbf", probability=True).fit(X_train, y_train)

# 8 k-Nearest Neighbours
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=15).fit(X_train, y_train)

# 9 Naive Bayes
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB().fit(X_train, y_train)

#10 Multi-layer Perceptron


from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500,
                    random_state=42).fit(X_train, y_train)
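
Any of the fitted classifiers above can be sanity-checked on the held-out split
(X_test / y_test from the earlier train_test_split); a minimal sketch using the random forest:

from sklearn.metrics import accuracy_score, roc_auc_score, classification_report
y_pred = rf.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1]))
print(classification_report(y_test, y_pred))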

------------------------------------------------------------

END OF COOKBOOK
