Abstract
Incomplete data are common in data analysis and can significantly affect the conclusions drawn from the data. They also introduce uncertainty, which leads to unreliable results, so effective techniques for imputing missing values are crucial. Missing or incomplete data and noise are two common sources of uncertainty. This paper introduces a method for imputing missing values that is robust to the uncertainty arising from both incompleteness and noise. First, a kernel-based method is designed to remove the noise. Next, the class of each incomplete instance is determined using belief function theory. Finally, every missing dimension is imputed with the mean value of the same dimension over the members of the determined class. The method has been evaluated on real-world data sets from the UCI repository and compared with state-of-the-art methods; the results show that the proposed method achieves higher classification accuracy.
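The final imputation step can be illustrated with a minimal sketch. It covers only the class-mean fill and assumes the class has already been determined; the kernel-based noise removal and the belief-function class assignment are not reproduced here. The function name `impute_with_class_mean`, the use of NaN to mark missing dimensions, and the toy data are assumptions made for illustration, not the authors' implementation.

```python
import math

def impute_with_class_mean(x, train_rows, train_labels, cls):
    """Fill NaN entries of x with the per-dimension mean of the rows labeled cls.

    Assumption: missing dimensions are encoded as NaN, and cls is the class
    already chosen for x by some earlier step (hypothetical here).
    """
    # Collect the training rows that belong to the determined class.
    members = [row for row, label in zip(train_rows, train_labels) if label == cls]
    filled = list(x)
    for d, value in enumerate(x):
        if math.isnan(value):
            # Average only the observed values of dimension d within the class.
            observed = [row[d] for row in members if not math.isnan(row[d])]
            if observed:
                filled[d] = sum(observed) / len(observed)
    return filled

# Toy data: two members of class 0 and one member of class 1.
train_rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
train_labels = [0, 0, 1]

# An incomplete instance whose second dimension is missing.
x = [2.5, float("nan")]
filled = impute_with_class_mean(x, train_rows, train_labels, cls=0)
```

In this sketch, a dimension that is unobserved in every member of the class is left as NaN rather than guessed.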

Cite this article
Hamidzadeh, J., Moradi, M. Enhancing data analysis: uncertainty-resistance method for handling incomplete data. Appl Intell 50, 74–86 (2020). https://doi.org/10.1007/s10489-019-01514-4
DOI: https://doi.org/10.1007/s10489-019-01514-4