Abstract
For many diseases, large patient cohorts are needed to obtain enough statistical power to discover the important factors. In this setting, it is very important to preserve the privacy of each patient and, ideally, to remove the need to gather all data in one place. Examples include genomic research on cancer, infectious diseases and Alzheimer’s disease. This motivates the development of privacy-preserving machine learning algorithms. Existing studies either compute one specific function privately, and therefore lack generality, or rely on computationally expensive encryption to preserve privacy, which slows down the computation significantly. In this study, we propose a framework based on randomized encoding in which the four basic arithmetic operations (addition, subtraction, multiplication and division) can be performed, allowing machine learning algorithms that involve one type of these operations to be computed privately. Among the suitable machine learning algorithms, we apply the oligo kernel and the radial basis function kernel to the coreceptor usage prediction problem of HIV, employing the framework to calculate the kernel functions. The results show that privacy does not come at the cost of predictive performance in terms of F1-score and AUROC. Furthermore, in the oligo kernel experiments, the execution time of the framework is comparable to that of the non-private computation. In the radial basis function kernel experiments, our framework is also considerably faster than the existing approaches based on integer vector homomorphic encryption and, consequently, than homomorphic encryption based solutions, which indicates that our approach has potential for application to many other diseases and data types.
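To make the idea of randomized encoding concrete, the following minimal sketch shows the textbook encodings of addition and multiplication over a prime field, followed by the standard radial basis function kernel formula. It is an illustration under stated assumptions, not necessarily the construction used in the paper: the modulus `P`, the function names and the single-process setup (one function sees both inputs and draws all randomness) are hypothetical choices made only to check correctness.

```python
import math
import secrets

# Assumed public prime modulus; chosen only for illustration.
P = 2**61 - 1


def encode_add(x1, x2):
    """Encode x1 + x2 (mod P) as two values that are each uniform on their own."""
    r = secrets.randbelow(P)
    return (x1 + r) % P, (x2 - r) % P


def decode_add(c1, c2):
    """Recover x1 + x2 (mod P); the mask r cancels out."""
    return (c1 + c2) % P


def encode_mul(x1, x2):
    """Encode x1 * x2 (mod P) with three random masks r1, r2, r3."""
    r1, r2, r3 = (secrets.randbelow(P) for _ in range(3))
    c1 = (x1 + r1) % P
    c2 = (x2 + r2) % P
    c3 = (r2 * x1 + r3) % P
    c4 = (r1 * x2 + r1 * r2 - r3) % P
    return c1, c2, c3, c4


def decode_mul(c1, c2, c3, c4):
    """Recover x1 * x2 (mod P): c1*c2 - c3 - c4 cancels every masked term."""
    return (c1 * c2 - c3 - c4) % P


def rbf_from_sq_dist(sq_dist, gamma):
    """Standard RBF kernel value exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sq_dist)


# Correctness check on small inputs.
assert decode_add(*encode_add(3, 4)) == 7
assert decode_mul(*encode_mul(6, 7)) == 42
```

In an actual multi-party setting the random masks would have to be agreed on by the data owners and kept hidden from the party that decodes; how the paper arranges this is not stated in the abstract, so the sketch leaves it out.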
Acknowledgement
This study is supported by the DFG Cluster of Excellence “Machine Learning – New Perspectives for Science”, EXC 2064/1, project number 390727645. Furthermore, NP and MA acknowledge funding from the German Federal Ministry of Education and Research (BMBF) within the ‘Medical Informatics Initiative’ (DIFUTURE, reference number 01ZZ1804D).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Ünal, A.B., Akgün, M., Pfeifer, N. (2019). A Framework with Randomized Encoding for a Fast Privacy Preserving Calculation of Non-linear Kernels for Machine Learning Applications in Precision Medicine. In: Mu, Y., Deng, R., Huang, X. (eds) Cryptology and Network Security. CANS 2019. Lecture Notes in Computer Science, vol 11829. Springer, Cham. https://doi.org/10.1007/978-3-030-31578-8_27
DOI: https://doi.org/10.1007/978-3-030-31578-8_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31577-1
Online ISBN: 978-3-030-31578-8
eBook Packages: Computer Science (R0)