Ee 708 Report
B. Oversampling using SMOTE:
The dataset exhibited a significant class imbalance, with 5,301 non-bankrupt (97.2%) and 154 bankrupt (2.8%) companies. To address this, the Synthetic Minority Oversampling Technique (SMOTE) was applied to the training data.
Following an 80-20 train-test split, the training set contained 4,241 non-bankrupt companies and 123 bankrupt companies. SMOTE was used to generate synthetic samples for the minority class, balancing the training set to 4,241 instances in each class. This oversampling was restricted to the training data so that no synthetic samples could bias the test set. SMOTE works by selecting a minority-class sample, identifying its k nearest neighbors, and generating a new synthetic data point through linear interpolation between the selected sample and one of those neighbors. This increases the representation of the minority class.

C. Standardisation:
To ensure uniform feature scaling, StandardScaler was applied, transforming every feature to have a mean of 0 and a standard deviation of 1. This prevents features with larger magnitudes from dominating the model.

into the ensemble, the model effectively leveraged probabilistic classification, improving the F1-score to 0.51.
To leverage both models, we applied an ensemble approach using soft voting: the probability outputs of the DNN and GNB models were averaged to compute the final bankruptcy probability. Instead of using the default classification threshold of 0.5, we tuned the threshold by evaluating the F1-score at multiple threshold values between 0.30 and 0.60. The threshold that maximized the F1-score (0.45) was selected for the final predictions.

V. PERFORMANCE METRICS OF THE MODEL
The Gaussian Naive Bayes and DNN ensemble model reached 97.23% accuracy on the test set. The remaining metrics are shown below in the classification report (Fig. 3) and the confusion matrix (Fig. 2).
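The resampling and scaling steps in Sections B and C can be illustrated with a small standard-library Python sketch. It is a from-scratch SMOTE written only to show the interpolation rule. It is not the implementation used in this report, which does not name one. The function names (`smote`, `balance_training_set`, `fit_standardizer`, `standardize`), the default k = 5, the fixed seed, and the convention that label 1 marks a bankrupt company are all assumptions. The scaler follows StandardScaler's convention of dividing by the population standard deviation.

```python
import math
import random


def _euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between a random minority
    sample and one of its k nearest minority-class neighbours.
    Needs at least two minority samples."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: _euclidean(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance_training_set(X_train, y_train, k=5, seed=0):
    """Oversample label 1 (assumed: bankrupt) until both classes are equal.
    Only the training split is passed in, so the test set stays untouched."""
    minority = [x for x, y in zip(X_train, y_train) if y == 1]
    n_majority = sum(1 for y in y_train if y == 0)
    extra = smote(minority, n_majority - len(minority), k=k, seed=seed)
    return X_train + extra, y_train + [1] * len(extra)


def fit_standardizer(rows):
    """Per-feature mean and population standard deviation (ddof=0)."""
    n = len(rows)
    columns = list(zip(*rows))
    means = [sum(c) / n for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(columns, means)]
    # Leave constant features unscaled instead of dividing by zero.
    stds = [s if s > 0 else 1.0 for s in stds]
    return means, stds


def standardize(rows, means, stds):
    """Map every feature to zero mean and unit standard deviation."""
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


# Toy usage: 2 minority (label 1) vs 5 majority (label 0) training rows.
X_train = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0], [5.0, 6.0], [6.0, 6.0], [7.0, 7.0]]
y_train = [1, 1, 0, 0, 0, 0, 0]
X_bal, y_bal = balance_training_set(X_train, y_train)
means, stds = fit_standardizer(X_bal)
X_bal_scaled = standardize(X_bal, means, stds)
print(y_bal.count(0), y_bal.count(1))  # → 5 5
```

With the report's training counts, `balance_training_set` would add 4,241 - 123 = 4,118 synthetic bankrupt rows. The report does not state whether the scaler was fitted before or after oversampling. In either case, the usual practice is to compute its statistics on the training split only and reuse them unchanged on the test split.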
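The soft-voting ensemble and threshold search described in this report average the DNN and GNB bankruptcy probabilities and then keep the threshold between 0.30 and 0.60 that maximises the F1-score. These two steps can be sketched in plain Python as follows. The 0.05 grid step, the `>=` decision rule, label 1 for bankrupt, and the function names (`soft_vote`, `f1_score`, `best_threshold`) are assumptions. The report also does not say which data split the threshold was tuned on. In a real pipeline, the two probability lists would come from the trained models' predicted-probability outputs.

```python
def soft_vote(prob_dnn, prob_gnb):
    """Equal-weight soft voting: average each sample's bankruptcy probability."""
    return [(a + b) / 2 for a, b in zip(prob_dnn, prob_gnb)]


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1 = bankrupt, by assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, lo=0.30, hi=0.60, step=0.05):
    """Scan thresholds from lo to hi and return (threshold, F1) with the
    highest F1. Ties keep the lowest threshold."""
    n_steps = round((hi - lo) / step)
    best = (lo, -1.0)
    for i in range(n_steps + 1):
        t = round(lo + i * step, 10)  # avoid drift such as 0.45000000000000007
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best[1]:
            best = (t, score)
    return best


# Toy usage with made-up probabilities (not the report's data).
p_dnn = [1.0, 0.75, 0.25, 0.5]
p_gnb = [0.5, 0.25, 0.0, 0.375]
probs = soft_vote(p_dnn, p_gnb)  # [0.75, 0.5, 0.125, 0.4375]
print(best_threshold([1, 1, 0, 0], probs))  # → (0.45, 1.0)
```

If the threshold is chosen on the same data later used for the final evaluation, the reported F1-score can be optimistic. Searching on a separate validation split avoids this.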
VII. REFERENCES
[1] J. A. Ohlson, "Financial ratios and the probabilistic prediction of bankruptcy," Journal of Accounting Research, 1980, pp. 109-131.
[2] Kong Yiqing, "Semi-supervised learning and its application research," dissertation, Jiangnan University, Wuxi, 2009, pp. 33-39.
[3] Dong Liyan, Sui Peng, Sun Peng, Li Yongli, "A new naive Bayesian algorithm based on semi-supervised learning," Journal of Jilin University (Engineering Science Edition), 2016, 46(3), pp. 884-889.
[4] Shubhair A. Abdullah, Ahmed Al-Ashoor, "An artificial deep neural network for the binary classification of network traffic," Journal of Advanced Computer Science and Applications, vol. 11, no. 1, 2010.
[5] Zeng-jun Bi, Yao-quan Han, Cai-quan Huang, Min Wang, "Gaussian naive Bayesian data classification model based on clustering algorithm," Advances in Intelligent Systems Research, volume 168399.