Ee 708 Report

This project report presents a hybrid framework for predicting company bankruptcy using a combination of Gaussian Naive Bayes (GNB) and Deep Neural Network (DNN) models. The approach includes extensive exploratory data analysis, data preprocessing techniques like SMOTE for class imbalance, and feature selection through ANOVA, achieving a test accuracy of 97.23% and an F1-score of 0.51. The results demonstrate the effectiveness of the ensemble model in accurately predicting bankruptcy despite significant class imbalance.

Uploaded by

csk.312.13


EE708 Course Project Report, Indian Institute of Technology Kanpur

Company Bankruptcy Prediction: A Hybrid Framework Leveraging Combined Probabilities from Gaussian and Neural Networks

Ch V Sai Koushik (230312), skoushik23@iitk.ac.in
Chilamakuri Kundan Sai (230330), ckundans23@iitk.ac.in
Challa Kethan (230317), kethanc23@iitk.ac.in
Srijani Gadupudi (231033), srijanig23@iitk.ac.in
Macha Mohana Harika (230612), mharika23@iitk.ac.in

I. INTRODUCTION

Predicting a company's bankruptcy has become a critical task in today's uncertain economic conditions. This project aims to build a robust machine learning model that predicts bankruptcy from a range of factors that indicate the financial status of the company.

This report is structured as follows: we begin with Exploratory Data Analysis (EDA) to examine the dataset and uncover key patterns and correlations; next, we discuss Data Preprocessing, detailing how the data is cleaned, normalized, and balanced; then, we describe the Model Architecture, outlining the machine learning models used for bankruptcy prediction and the rationale behind their design; and finally, we evaluate the models using Performance Metrics such as accuracy, precision, recall, and F1 score to assess their practical effectiveness.

II. EXPLORATORY DATA ANALYSIS (EDA)

Exploratory Data Analysis (EDA) and feature engineering involve several key steps to understand and preprocess the dataset effectively. The primary goal was to understand the data's characteristics, identify key features, and prepare the dataset for model training.

The dataset used for bankruptcy prediction comprises 5,455 rows, each corresponding to a company, and 96 financial features. The target variable distribution revealed a significant class imbalance, with 5,301 non-bankrupt companies (97.2%) and 154 bankrupt companies (2.8%). This imbalance was considered in subsequent model development.

In a preliminary analysis, extreme values were identified in several numerical features, potentially indicating data errors. Features with over 800 erroneous entries were discarded, while features with fewer than 200 errors underwent median imputation.

To mitigate multicollinearity and enhance model interpretability, a feature correlation analysis was conducted. Feature pairs with a Pearson correlation coefficient exceeding 0.9 were examined, and from each pair the feature with the weaker correlation to the target variable was removed. This process reduced the number of features from 86 to 62. A detailed correlation heatmap (Fig. 1) illustrates the relationships among the retained features.

Figure 1: Feature Correlation Heatmap of 62 features

III. DATA PREPROCESSING

A. Analysis of Variance (ANOVA):

Analysis of Variance (ANOVA) was then used to select the most relevant features for predicting bankruptcy. ANOVA F-scores were calculated for each feature to assess its significance.

$$F\text{-score} = \frac{\text{Variance between groups}}{\text{Variance within groups}}$$

$$\text{Variance between groups} = \sum_{i=1}^{k} N_i (\mu_i - \mu)^2$$

$$\text{Variance within groups} = \sum_{i=1}^{k} \sigma_i^2$$

Here $k$ is the number of classes, $N_i$ and $\mu_i$ are the size and mean of class $i$, $\mu$ is the overall mean, and $\sigma_i^2$ is the variance within class $i$.

A high F-score signifies substantial variation in feature values across target variable classes. The SelectKBest method, utilizing the f_classif scoring function, was employed to identify the top 30 features. These selected features ensured that the model concentrated on those exhibiting the most significant differences in behavior between the two classes.
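To make the F-score ranking concrete, the following is a minimal from-scratch sketch, not the report's code. It uses the standard one-way ANOVA form, in which the between-group and within-group sums of squares are divided by their degrees of freedom (k - 1 and N - k); this is the statistic that scikit-learn's f_classif computes. The function names, feature names, and toy data are assumptions made for illustration.

```python
from statistics import fmean


def anova_f_score(values, labels):
    """One-way ANOVA F statistic of a single feature against class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    # Between-group sum of squares: N_i * (mu_i - mu)^2 summed over classes.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    # Within-group sum of squares: squared deviations from each class mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    # Divide each by its degrees of freedom to obtain mean squares.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(columns, labels, k):
    """Return the names of the k features with the largest F statistic."""
    scores = {name: anova_f_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 0 = non-bankrupt, 1 = bankrupt.
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "debt_ratio": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],  # separates the two classes
    "noise": [0.5, 0.1, 0.9, 0.4, 0.2, 0.8],       # unrelated to the label
}
top_features = select_top_k(columns, labels, k=1)
print(top_features)
```

In the toy data the class means of debt_ratio lie far apart relative to the spread inside each class, so it receives the larger F statistic and is the feature retained.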
B. Oversampling using SMOTE:

The dataset exhibited a significant class imbalance, with 5,301 non-bankrupt (97.2%) and 154 bankrupt (2.8%) companies. To address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data.

Following an 80-20 train-test split, the training set contained 4,241 non-bankrupt companies and 123 bankrupt companies. SMOTE was used to generate synthetic samples for the minority class, balancing the training set to 4,241 instances in each class. This oversampling was restricted to the training data to prevent biasing the test set. SMOTE operates by selecting a minority class sample, identifying its k-nearest neighbors, and generating a new synthetic data point through linear interpolation between the selected sample and one of its neighbors, thereby increasing the representation of the minority class.

C. Standardisation:

To ensure uniform feature scaling, StandardScaler was applied, transforming all features to have a mean of 0 and a standard deviation of 1. This prevents features with larger magnitudes from dominating.

IV. MODEL ARCHITECTURE

We developed a hybrid bankruptcy prediction model by ensembling a Deep Neural Network (DNN) and a Gaussian Naive Bayes (GNB) classifier. The objective was to leverage the probabilistic nature of GNB alongside the deep feature learning capabilities of the DNN to improve performance.

The first model, the DNN, was developed to capture complex, non-linear relationships between financial indicators. It consists of an input layer, three hidden layers, and an output layer. The input layer receives the selected 30 features. The first hidden layer comprises 256 neurons, utilizing the ReLU activation function, batch normalization, and dropout (50%) to prevent overfitting. The second hidden layer has 128 neurons, also incorporating batch normalization and dropout (50%). The third hidden layer refines the feature representations further with 64 neurons and a reduced dropout (40%). The output layer contains a single neuron with a sigmoid activation function, outputting a probability score for bankruptcy classification.

The DNN model was compiled with the Adam optimizer (learning rate = 0.0005) and the binary cross-entropy loss function. It was trained for 200 epochs with a batch size of 64 and a 20% validation split.

GNB applies Bayes' theorem, assuming that the features are conditionally independent and normally distributed given the class label. It computes the posterior probability for each class using the prior probability and the likelihood, where the likelihood of continuous features is modeled using a normal distribution. The model predicts the class with the highest posterior probability.

Initially, employing only the DNN with feature selection yielded a maximum F1-score of 0.46. By integrating GNB into the ensemble, the model effectively leveraged probabilistic classification, improving the F1-score to 0.51.

To leverage both models, we applied an ensemble approach using soft voting. The probability outputs from the DNN and GNB models were averaged to compute the final bankruptcy probability. Instead of using the default classification threshold of 0.5, we fine-tuned the threshold by evaluating F1-scores across multiple threshold values (between 0.30 and 0.60). The threshold that maximized the F1-score (0.45) was selected for final predictions.

V. PERFORMANCE METRICS OF THE MODEL

The Gaussian Naive Bayes and DNN ensemble model reached 97.23% accuracy on the test set. The remaining metrics are given in the classification report (Fig. 3) and the confusion matrix (Fig. 2).

Figure 2: Confusion Matrix

Figure 3: Classification Report

VI. CONCLUSION

This study introduced a hybrid bankruptcy prediction framework that integrates a Deep Neural Network (DNN) with a Gaussian Naive Bayes (GNB) classifier. Rigorous exploratory data analysis and feature engineering, including SMOTE for class imbalance, ANOVA-based feature selection, and standardization, ensured robust data preprocessing.

The ensemble, which combines deep feature learning and probabilistic inference through soft voting over the two models and a fine-tuned decision threshold, achieved a test accuracy of 97.23% and an F1-score of 0.51 while balancing the precision-recall trade-off. This result is noteworthy given the extreme class imbalance, with a class ratio of roughly 35:1. These results underscore the framework's potential for accurate bankruptcy prediction, laying the groundwork for future enhancements in feature selection and ensemble strategies.
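The SMOTE interpolation step described in Section III-B can be illustrated with a short from-scratch sketch. The report does not name the implementation that was used, so the function names, the neighbour count k, the random seed, and the toy points below are all assumptions. The block only shows the mechanism: pick a minority sample, find its nearest minority neighbours, and interpolate towards one of them.

```python
import math
import random


def smote_sample(minority, k, rng):
    """Create one synthetic point by interpolating towards a near neighbour."""
    base = rng.choice(minority)
    # The k nearest minority-class neighbours of the chosen sample (itself excluded).
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    return [b + gap * (n - b) for b, n in zip(base, neighbour)]


def oversample(minority, target_size, k=5, seed=0):
    """Grow the minority class to target_size using synthetic samples."""
    rng = random.Random(seed)
    n_new = target_size - len(minority)
    synthetic = [smote_sample(minority, k, rng) for _ in range(n_new)]
    return minority + synthetic


# Hypothetical minority-class points with two features each.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
balanced_minority = oversample(minority, target_size=10, k=2)
print(len(balanced_minority))
```

Because every synthetic point lies on a line segment between two existing minority samples, the new points stay inside the region already occupied by the minority class. As the report notes, this step belongs on the training split only.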

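The soft-voting and threshold-tuning procedure from Section IV can be sketched as follows. This is an illustrative sketch under stated assumptions, not the report's code: the probability lists stand in for the DNN and GNB outputs, the labels are made up, and the helper names are hypothetical. The report does not say which data split the threshold search was run on, so the sketch leaves that choice to the caller.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive (bankrupt) class; 0.0 when there is nothing to score."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def soft_vote(p_dnn, p_gnb):
    """Average the two models' bankruptcy probabilities."""
    return [(a + b) / 2 for a, b in zip(p_dnn, p_gnb)]


def best_threshold(y_true, probs, thresholds):
    """Return (threshold, F1) with the highest F1; ties keep the earliest threshold."""
    scored = [(f1_score(y_true, [int(p >= t) for p in probs]), t) for t in thresholds]
    best_f1, best_t = max(scored, key=lambda s: s[0])
    return best_t, best_f1


# Hypothetical labels and model outputs (1 = bankrupt).
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
p_dnn = [0.10, 0.20, 0.44, 0.30, 0.62, 0.38, 0.05, 0.90]
p_gnb = [0.20, 0.10, 0.30, 0.20, 0.50, 0.46, 0.15, 0.80]

ensemble_probs = soft_vote(p_dnn, p_gnb)
thresholds = [t / 100 for t in range(30, 61, 5)]  # 0.30, 0.35, ..., 0.60
threshold, f1 = best_threshold(y_true, ensemble_probs, thresholds)
print(threshold, f1)
```

On this toy data one bankrupt company has an averaged probability of 0.42, so the default cut-off of 0.5 would miss it, while a threshold of 0.40 recovers it without admitting any false positives. That is the kind of gain the threshold search is meant to find.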
