Abstract
Among the many proposed solutions in graph embedding, traditional random walk-based embedding methods have shown their promise in several fields. However, when the graph contains high-degree nodes, random walks often neglect low- or middle-degree nodes and tend to prefer stepping through high-degree ones instead. This results in random-walk samples providing a very accurate topological representation of neighbourhoods surrounding high-degree nodes, which contrasts with a coarse-grained representation of neighbourhoods surrounding middle and low-degree nodes. This in turn affects the performance of the subsequent predictive models, which tend to overfit high-degree nodes and/or edges having high-degree nodes as one of the vertices. We propose a solution to this problem, which relies on a degree normalization approach. Experiments with popular RW-based embedding methods applied to edge prediction problems involving eight protein-protein interaction (PPI) graphs from the STRING database show the effectiveness of the proposed approach: degree normalization not only improves predictions but also provides more stable results, suggesting that our proposal has a regularization effect leading to a more robust convergence.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Ben-Israel, A., Greville, T.N.E.: Generalized Inverses: Theory and Applications, vol. 15. Springer, Heidelberg (2003). https://doi.org/10.1007/b97366
Campbell, S.L., Meyer, C.D.: Generalized Inverses of Linear Transformations. SIAM (2009)
Cappelletti, L., et al.: GRAPE: fast and scalable graph processing and embedding. arXiv preprint arXiv:2110.06196 (2022)
Chicco, D., Jurman, G.: The advantages of the Matthews correlation coefficient (MCC) over f1 score and accuracy in binary classification evaluation. BMC Genom. 21(1), 1–13 (2020)
Cuzzocrea, A., Cappelletti, L., Valentini, G.: A neural model for the prediction of pathogenic genomic variants in mendelian diseases. In: Proceedings of the 1st International Conference on Advances in Signal Processing and Artificial Intelligence (ASPAI 2019), Barcelona, Spain, pp. 34–38 (2019)
Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 855–864 (2016)
Li, M.M., Huang, K., Zitnik, M.: Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 6(12), 1353–1369 (2022)
Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
Pedregosa, F., et al.: Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011)
Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: online learning of social representations. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 701–710 (2014)
Perozzi, B., Kulkarni, V., Chen, H., Skiena, S.: Don’t walk, skip! Online learning of multi-scale network embeddings. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, pp. 258–265 (2017)
Petrini, A., et al.: parSMURF, a high-performance computing tool for the genome-wide detection of pathogenic variants. GigaScience 9(5), giaa052 (2020)
Radhakrishna Rao, C., Mitra, S.K., et al.: Generalized inverse of a matrix and its applications. In: Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 601–620. University of California Press, Oakland (1972)
Szklarczyk, D., et al.: The STRING database in 2021: customizable protein-protein networks, and functional characterization of user-uploaded gene/measurement sets. Nucleic Acids Res. 49(D1), D605–D612 (2021)
Yi, H.-C., You, Z.-H., Huang, D.-S., Kwoh, C.K.: Graph representation learning in bioinformatics: trends, methods and applications. Brief. Bioinform. 23(1), bbab340 (2021)
Yue, X., et al.: Graph embedding on biomedical networks: methods, applications and evaluations. Bioinformatics 36(4), 1241–1251 (2019)
Acknowledgment
This research was supported by the “National Center for Gene Therapy and Drugs based on RNA Technology”, PNRR-NextGenerationEU program [G43C22001320007].
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
A Additional Results
A Additional Results
Rights and permissions
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Cappelletti, L. et al. (2023). Degree-Normalization Improves Random-Walk-Based Embedding Accuracy in PPI Graphs. In: Rojas, I., Valenzuela, O., Rojas Ruiz, F., Herrera, L.J., Ortuño, F. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2023. Lecture Notes in Computer Science(), vol 13920. Springer, Cham. https://doi.org/10.1007/978-3-031-34960-7_26
Download citation
DOI: https://doi.org/10.1007/978-3-031-34960-7_26
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34959-1
Online ISBN: 978-3-031-34960-7
eBook Packages: Computer ScienceComputer Science (R0)