Abstract
Sample imbalance is pervasive in practical classification problems. Federated learning, in which multiple participants share models rather than data, further complicates its handling. The difficulty is particularly pronounced in vertical federated learning, which requires a high degree of overlap among the participants' samples: aligning samples while preserving privacy can significantly reduce the number of available samples, exacerbating the existing imbalance. To address this while ensuring data privacy, we propose a secure privacy-preserving SMOTE (SP2-SMOTE) sampling method. It extends traditional SMOTE by allowing parties to independently generate synthetic samples without exposing their data, while effectively preventing unauthorized label inference through minority-class nearest-neighbor interpolation. Experiments on imbalanced KEEL data, divided between two participants according to feature importance, demonstrate that SP2-SMOTE significantly improves the classification performance of vertical federated learning, as validated by a series of metrics. This work offers a robust solution to the challenge of imbalanced data in vertical federated learning while rigorously preserving privacy for practical applications.
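For readers unfamiliar with the baseline, the following is a minimal sketch of traditional SMOTE interpolation among minority-class nearest neighbors. It is not the SP2-SMOTE protocol, whose details are not given in the abstract. The function names `smote_pairs` and `interpolate`, the random data, and the 4/2 feature split are hypothetical, and the neighbor search here runs in plaintext over the joined features, which a privacy-preserving method would have to replace with a secure computation. The sketch only illustrates one property that makes per-party generation plausible: linear interpolation acts on each feature separately, so two parties holding different columns obtain consistent synthetic rows when they use the same sample pairs and interpolation gaps.

```python
import numpy as np

def smote_pairs(X_min, n_new, k=5, rng=None):
    """Choose (base, neighbor, gap) triples as in classic SMOTE.

    Assumption: X_min holds only minority-class rows, in plaintext.
    """
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)
    # Pairwise squared Euclidean distances among minority samples.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)           # a sample is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]     # k nearest minority neighbors per row
    base = rng.integers(0, n, size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random(n_new)               # interpolation coefficients in [0, 1)
    return base, neigh, gap

def interpolate(X_part, base, neigh, gap):
    """Apply x_new = x_base + gap * (x_neigh - x_base) to any column subset."""
    return X_part[base] + gap[:, None] * (X_part[neigh] - X_part[base])

# Hypothetical data: 20 minority samples with 6 features.
X_min = np.random.default_rng(0).normal(size=(20, 6))
base, neigh, gap = smote_pairs(X_min, n_new=30, k=5, rng=1)

full = interpolate(X_min, base, neigh, gap)          # centralized SMOTE
part_a = interpolate(X_min[:, :4], base, neigh, gap) # columns held by party A
part_b = interpolate(X_min[:, 4:], base, neigh, gap) # columns held by party B
```

Stacking `part_a` and `part_b` column-wise reproduces `full`, which is why the feature-wise step can in principle be carried out locally once the pairs and gaps are agreed on. How those pairs are selected without revealing features or labels is the part a secure method has to solve.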
Acknowledgments
This work is supported by the National Natural Science Foundation of China (61903262), the Natural Science Foundation of Liaoning Province (2024-MS-133), and the Fundamental Research Funds for the Universities of Liaoning Province (20240206).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Du, W., Wang, H., Shen, J., Meng, G., Guo, Y., Zhou, W. (2025). Secure Privacy-Preserving SMOTE for Vertical Federated Learning. In: Sheng, Q.Z., et al. Advanced Data Mining and Applications. ADMA 2024. Lecture Notes in Computer Science, vol 15388. Springer, Singapore. https://doi.org/10.1007/978-981-96-0814-0_20
DOI: https://doi.org/10.1007/978-981-96-0814-0_20
Publisher Name: Springer, Singapore
Print ISBN: 978-981-96-0813-3
Online ISBN: 978-981-96-0814-0
eBook Packages: Computer Science, Computer Science (R0)