Secure Privacy-Preserving SMOTE for Vertical Federated Learning

  • Conference paper
Advanced Data Mining and Applications (ADMA 2024)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 15388)

Abstract

In practical classification problems, class imbalance is pervasive. Federated learning, in which multiple participants share models without sharing data, makes imbalance harder to handle. The difficulty is most pronounced in vertical federated learning, which requires a high degree of overlap among participants' samples: aligning samples while preserving privacy can sharply reduce the number of usable samples and worsen the existing imbalance. To address this while ensuring data privacy, we propose a secure privacy-preserving SMOTE (SP2-SMOTE) sampling method. It extends traditional SMOTE by allowing parties to independently generate synthetic samples without exposing their data, while preventing unauthorized label inference through minority-class nearest-neighbor interpolation. Evaluation on the imbalanced KEEL dataset, divided between two participants according to feature importance, demonstrates that SP2-SMOTE significantly improves the classification performance of vertical federated learning, as validated by a series of metrics. This work offers a robust solution to imbalanced data in vertical federated learning while rigorously preserving privacy for practical applications.
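
For readers unfamiliar with the interpolation step that SP2-SMOTE builds on, the following is a minimal sketch of traditional, single-party SMOTE, not the authors' secure protocol. The function name `smote_interpolate`, its parameters, and the toy data are illustrative assumptions; the privacy-preserving exchange between parties described in the abstract is not shown here.

```python
import numpy as np

def smote_interpolate(minority, n_new, k=5, seed=0):
    """Classic SMOTE: interpolate between minority samples and their neighbours.

    Hypothetical helper for illustration only; it runs on one party's
    plaintext data and includes none of the secure steps of SP2-SMOTE.
    """
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new point.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]

    # Place the synthetic point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return minority[base] + gap * (minority[picked] - minority[base])

# Toy minority class: the four corners of the unit square.
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_interpolate(corners, n_new=10, k=2)
```

Each synthetic point lies on a line segment between two existing minority samples, so it stays within the region the minority class already occupies. In a vertical setting the feature columns are held by different parties, which is why a plain implementation like this one cannot be applied directly.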

Acknowledgments

This work is supported by the National Natural Science Foundation of China (61903262), the Natural Science Foundation of Liaoning Province (2024-MS-133), and the Fundamental Research Funds for the Universities of Liaoning Province (20240206).

Corresponding author

Correspondence to Wenyou Du.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Du, W., Wang, H., Shen, J., Meng, G., Guo, Y., Zhou, W. (2025). Secure Privacy-Preserving SMOTE for Vertical Federated Learning. In: Sheng, Q.Z., et al. Advanced Data Mining and Applications. ADMA 2024. Lecture Notes in Computer Science, vol 15388. Springer, Singapore. https://doi.org/10.1007/978-981-96-0814-0_20

  • DOI: https://doi.org/10.1007/978-981-96-0814-0_20

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-96-0813-3

  • Online ISBN: 978-981-96-0814-0

  • eBook Packages: Computer Science, Computer Science (R0)
