M Akaba 2019
Abstract— Dealing with missing values in data is an important feature engineering task in data science to prevent negative impacts on machine learning classification models in terms of accurate prediction. However, it is often unclear what the underlying cause of the missing values in real-life data is, or rather which missing data mechanism is causing the missingness. Thus, it becomes necessary to evaluate several missing data approaches for a given dataset. In this paper, we perform a comparative study of several approaches for handling missing values in data, namely listwise deletion, mean, mode, k-nearest neighbors, expectation-maximization, and multiple imputation by chained equations. The comparison is performed on two real-world datasets, using the following evaluation metrics: accuracy, root mean squared error, receiver operating characteristics, and the F1 score. Most classifiers performed well across the missing data strategies. However, based on the results obtained, the support vector classifier overall performed marginally better for the numerical data, and the naïve Bayes classifier for the categorical data, when compared to the other evaluated classifiers.

Keywords — missing data; imputation methods; performance metrics; machine learning; classification

I. INTRODUCTION

Approaches to dealing with missing data have been well researched in the literature, using either statistical [1], [2] or computational intelligence (such as machine learning (ML)) [3], [4] approaches. Missing values in data are broadly categorized into three missingness mechanisms [1], [2]: data missing completely at random (MCAR), when the probability of an instance or variable having a missing value depends neither on the known value itself nor on any other value or variable in the given dataset; data missing at random (MAR), when the probability of an instance or variable having a missing value depends on other known variables but not on the value of the missing data itself; and data missing not at random (MNAR), when the probability of an instance or variable having a missing value depends on the value of that variable itself.

Missing data are now a common problem in many real-world datasets across numerous domains, such as fraud detection, sensor readings and anomaly detection. The missingness can be attributed to numerous sources and reasons, such as measurement error, mechanical faults, non-response or deletion of values [5]. Missing data, if not addressed during the preprocessing stage prior to feeding the data into an ML model, could complicate the data analysis and affect the performance of ML algorithms in terms of the conclusions that can be inferred from the data, because of reduced data samples and bias in the estimation of the algorithms' parameters. Numerous missing data handling techniques have been developed [6], which can be broadly categorized as listwise or case deletion, single imputation and multiple imputation. Researchers continue to develop enhanced variants. On the other hand, some researchers have carried out comparative evaluations of the current missing data techniques to provide more insight and guidance on the choice of technique, depending on the percentage, pattern and mechanism underlying the missingness in a dataset [3], [5], [7]-[10].

This study compares six missing data-handling methods, namely listwise deletion (LD), mean, mode, k-nearest neighbors (k-NN), expectation-maximization single imputation (EMSI) and multiple imputation by chained equations (MICE), on six ML algorithms: logistic regression (LR), k-NN, support vector machine (SVM), random forest (RF), naïve Bayes (NB) and artificial neural network (ANN). Two real-life datasets are used, evaluated on the following performance metrics: accuracy, root mean squared error (RMSE), receiver operating characteristics (ROC) and the F1-score.

The rest of the paper is organized as follows: Section II reviews the missing data methods and imputation strategies, and Section III the classifiers employed in this study. Section IV surveys related work. Section V outlines the study methodology, which comprises the experimental set-up, the datasets used, and the performance metrics for evaluation. Section VI presents the results achieved and a discussion of these. Finally, Section VII concludes the paper.

II. MISSING DATA METHODS

The term missing data refers to the absence of records, values or observations usually expected to be present in a dataset. Missing data strategies are broadly categorized into three: (1) filling with zero, or ignoring, deleting or dropping data with missing values, (2) single imputation strategies and (3) multiple imputation strategies. Four of the methods used in this study are based on single imputation, while one is based on multiple imputation methods (IM). The methods considered in this study are briefly described as follows:

A. Listwise Deletion

LD is a statistical method that handles missing data by deleting or ignoring any record with missing values in a dataset, thus excluding these records from the analysis. Only the complete data are retained, which can result in biased estimations. This method is also referred to as complete-case analysis and assumes that data are MCAR [8].
Authorized licensed use limited to: University College London. Downloaded on May 23,2020 at 09:30:39 UTC from IEEE Xplore. Restrictions apply.
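As a concrete illustration, listwise deletion can be sketched in Python with pandas (our choice of library for the sketch; the paper does not prescribe an implementation, and the toy records below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Toy records in the spirit of a water quality table; NaN marks a missing value.
df = pd.DataFrame({
    "ph":        [7.1, np.nan, 6.8, 7.4],
    "turbidity": [1.2, 0.9, np.nan, 1.1],
    "label":     [0, 1, 0, 1],
})

# Listwise (case) deletion: drop every record that contains at least one
# missing value, so only complete cases enter the analysis.
complete_cases = df.dropna()
print(complete_cases.shape[0])  # 2 of the 4 records survive
```

Note how quickly records are lost: half the rows are discarded here, which is exactly the sample-size reduction and potential estimation bias the method is criticized for.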
B. Imputation Methods

Imputation is an approach to handling missing data by estimating the missing values in a dataset. IM can be subdivided into single and multiple IM. The methods considered in this paper are briefly described as follows:

1) Mean/Mode: This method consists of replacing the missing data for a given variable by the mean or mode of all known values of that variable. Generally, the mean method is suitable for numerical variables and the mode for categorical variables. Mean or mode imputation usually assumes MCAR [1].

2) k-Nearest Neighbors: k-NN defines a set of nearest neighbors for each sample and then replaces the missing data for a given variable by averaging the (non-missing) values of its neighbors. The size of the dataset to be analyzed and the optimal k value are crucial for this method. k-NN usually assumes data are MCAR [8].

3) Expectation maximization (EM): EM is an iterative means of imputing one or more plausible values for the missing data (EM single or multiple imputation), resulting in a complete new dataset through a repeated procedure [2], [11]. EM usually assumes that data are MAR.

4) Multiple imputation by chained equations: The MICE method is an iterative algorithm based on chained equations that uses an imputation model specified separately for each variable, involving the other variables as estimators. MICE is a multiple imputation method that involves imputing missing values in a dataset not once, but many times [1]. MICE usually assumes that data are MAR.

The criteria and justification for choosing the missing data methods are based on their popularity and how often they have been cited and used in the literature, as suggested in Table 1.

III. MACHINE LEARNING MODELS

The six classifiers are selected based on their different forms of learning methods. This ensures a broader consideration of families of algorithms according to their learning philosophies: linear, density-based, instance-based, tree-based and neural network-based models [12]. These allow a robust assessment of the missing data methods.

1) Logistic Regression (LR): LR is a linear classifier that calculates the linear output, followed by a squashing function over the regression output. LR is an easy, fast and simple ML method.

2) k-Nearest Neighbors: The k-NN classifier is an instance-based method in which a new query instance is classified according to the majority category of its k nearest neighbors, using the Euclidean distance. The basic logic of k-NN is to explore the nearest neighbors by assigning an initial neighborhood size k [13]. One of the main advantages of k-NN is that it is an easy and simple ML algorithm.

3) Support Vector Machine: SVM is a supervised ML algorithm that uses a technique called the kernel trick to transform the dataset, and from the transformation it finds the best boundary between the possible results.

4) Random Forest (RF): The RF model is an ensemble, tree-based learning method that can be used to build predictive models. It combines a number of decision tree classifiers and averages their predictive accuracy, in the process improving the overall model performance. Ensemble learning uses multiple learning models to gain better predictive results [12].

5) Naïve Bayes: The NB classifier is a probabilistic learning technique based on the Bayes theorem, which assumes features are statistically independent. NB uses prior knowledge to calculate the probability of a sample belonging to a certain category [12].

6) Artificial Neural Networks: An ANN examines the relationship between inputs and outputs by using the training dataset, without much detail about the system; it mimics the workings of the human brain [12].

IV. RELATED WORK

A considerable number of research articles deal with missing values across several domains. Some of the earlier works focused on developing enhanced missing data IM, such as [4], while others focused on a comparative analysis of existing missing data methods on different ML algorithms, such as [3], [7], [14]. Most of the articles apply single imputation strategies in dealing with missing values, since it is very often unclear what the underlying causes of missing values in any given data are, and it is hard to know in advance which missing value method is ideal for a given dataset or problem [10]. In addition, applying missing data imputation is likely to distort variable distributions and associated interactions, and in a way also affects the ML model. It is for this reason that we conduct an experimental comparison of several missing data approaches on our real-world datasets against different ML classification algorithms. In this way we can gain valuable insights into the biases introduced by these missing value strategies and how they affect different classification algorithms for our given datasets. From the summary of related works outlined in Table 1, it appears that the following missing data methods are the most popularly used: mean/mode, k-NN, EM and multiple imputations such as MICE.

V. STUDY METHODOLOGY

A. Experimental Set-up

The aim of this experiment was to carry out a comparative analysis and evaluate the impact of six missing data-handling methods against six ML classification algorithms, with four performance metrics, using two real-world datasets.
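The imputation strategies of Section II can be sketched with scikit-learn (an assumption on our part, consistent with the Python environment used in this study; `IterativeImputer` serves here only as a MICE-style chained-equations stand-in, and the toy arrays are invented for illustration):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

X = np.array([[7.1, 1.2],
              [np.nan, 0.9],
              [6.8, np.nan],
              [7.4, 1.1]])

# Mean single imputation: each hole is filled with its column mean.
X_mean = SimpleImputer(strategy="mean").fit_transform(X)

# k-NN imputation: average the feature over the k nearest rows (here k=2).
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)

# Chained-equations imputation: model each column on the others, iterating.
X_mice = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)

# Mode (most frequent) imputation suits a categorical attribute.
X_cat = np.array([["red"], ["blue"], [np.nan], ["red"]], dtype=object)
X_mode = SimpleImputer(strategy="most_frequent").fit_transform(X_cat)
```

There is no direct scikit-learn routine for EMSI; an EM-based imputer would be a separate implementation, which is why it is omitted from this sketch.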
TABLE 1. SUMMARY OF RELATED WORKS
Our experiments were conducted in ‘SPyDER’ (Scientific Python Development EnviRonment) on the Anaconda Python distribution, each time using one missing data method to test the chosen ML algorithms. The experimental simulation is a three-way repeated-measures strategy, which allows the main effect factors (6 classifiers, 6 missing data methods and 4 performance metrics) to be evaluated against interaction with the random effect factor (numerical and categorical datasets). Throughout the experimentation, we kept the default settings of the presented classifiers. However, for the categorical data, we only considered the LD and most frequent (mode) missing data strategies, because of the size of the dataset, the number of missing values, and our observation that the k-NN, EMSI and MICE strategies did not show much difference on the numeric dataset, as shown in Table 3.

B. Dataset

The experiments were carried out using two real-life datasets, namely the Gauteng road traffic and water quality datasets. The characteristics of the datasets are summarized in Table 2.

TABLE 2. CHARACTERISTICS OF THE DATASETS

Dataset              | Data Type            | Instances | Attributes | Class | Missing values | Missing values %
Gauteng road traffic | Nominal categorical  | 672       | 4          | 3     | 21             | 3.12
Water quality data   | Continuous numerical | 1000      | 9          | 2     | 200            | 20
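The repeated-measures design (missing data methods crossed with classifiers and scored on several metrics) can be sketched as a simple evaluation grid. The synthetic data and the two-method, two-classifier subset below are our own illustration, not the paper's datasets or exact protocol:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

# Synthetic stand-in dataset with roughly 10% MCAR missingness injected.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.10] = np.nan

imputers = {"mean": SimpleImputer(strategy="mean"),
            "knn":  KNNImputer(n_neighbors=5)}
classifiers = {"LR": LogisticRegression(max_iter=1000),
               "NB": GaussianNB()}

results = {}
for im_name, imputer in imputers.items():
    X_full = imputer.fit_transform(X)  # one missing data method at a time
    X_tr, X_te, y_tr, y_te = train_test_split(X_full, y, random_state=0)
    for clf_name, clf in classifiers.items():
        y_pred = clf.fit(X_tr, y_tr).predict(X_te)
        results[(im_name, clf_name)] = (accuracy_score(y_te, y_pred),
                                        f1_score(y_te, y_pred))
```

Extending the dictionaries to all six methods, all six classifiers and all four metrics reproduces the full grid of the study design.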
C. Performance Metrics

The following performance metrics were used to evaluate the performance of the models after applying the missing data methods: accuracy, RMSE, ROC and F1-score. The four chosen metrics are the most popular methods used for evaluating classification ML algorithms [17].

VI. RESULTS AND DISCUSSION

Table 3 and Figure 1 show the results for the numerical water quality data, while Table 4 and Figure 2 show the results for the categorical Gauteng road traffic data. The results report the performance of the examined classifiers under the different missing data methods with a constant percentage of missing values. The following is observed:

With regard to the numerical data, generally all classifiers performed well across the different missing data strategies used in this study. However, overall SVC performed consistently and slightly better in terms of all the performance metrics evaluated, with the NB classifier showing the marginally lowest performance except when using the LD and mode methods. In addition, LD, mean and mode performed well across all the classifiers compared to the more advanced k-NN and EMSI. The reasons for their performance, apart from ease of implementation, are the low occurrence of missing values in the numerical dataset and variance reduction. Moreover, we observed that the MICE method performed well for all the classifiers. One possible reason is that it takes into account the uncertainties resulting from guesses created by other IM, by taking into cognizance all the available information from other variables in the data and averaging their results for better estimates of the unknown true missing value. It can thus provide more valid standard errors, p-values and final inferences. However, computational cost is one of MICE's drawbacks.

With regard to the categorical data, overall the NB classifier seems to perform slightly better on both the LD and mode strategies in comparison to the other classifiers. One reason for this is that, generally, NB performs well on a smaller dataset with a low missing rate. On the other hand, ANN had the lowest RMSE for the LD and mode methods in comparison to all the other classifiers, indicating a better fit of the ANN model and better classification accuracy. Furthermore, all the classifiers examined performed slightly better with the mode strategy than with the LD method. Because data are lost when using the LD method, complexity could be added in terms of variance and bias. In general, we observed that the results obtained varied depending on the classifier, the type of data (numerical or categorical), and the percentage of missing values. This means that no single missing data method is superior or fits all dataset types and problems. We have seen in our case that results varied with both the numerical and categorical datasets, for reasons such as how correlated the attributes are, the data distribution pattern, the data size, the missing value rate and the data type. Different missing value methods induce biases, particularly if the methods are based on certain assumptions, as pointed out earlier in Section II.
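The pooling behaviour credited to MICE above (several imputations averaged into one estimate of the unknown value) can be sketched with repeated stochastic draws. Using scikit-learn's `IterativeImputer` with `sample_posterior=True` as the draw mechanism is our assumption, and full MICE inference would also pool variances, not just point estimates; the data below are synthetic:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
X[:, 2] += X[:, 0]                     # make column 2 predictable from column 0
X[rng.random(200) < 0.2, 2] = np.nan   # roughly 20% missingness in column 2

# Draw m stochastic imputations and average the filled-in values
# (Rubin-style pooling of the point estimates).
m = 5
draws = [IterativeImputer(sample_posterior=True, random_state=s).fit_transform(X)
         for s in range(m)]
X_pooled = np.mean(draws, axis=0)      # observed entries are left untouched
```

Averaging over draws is what smooths out the single-guess uncertainty, at the computational cost of running the imputer m times, matching the MICE trade-off noted in the discussion.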
[Results table fragment (column headers lost in extraction):]
RF   1.00  1.00  1.00  1.00  0.994   1.00
ANN  1.00  1.00  1.00  1.00  0.9825  1.00
Fig. 2. Performance results of the ML classifiers vs the missing data strategies on Dataset 2 (categorical)
VII. CONCLUSION

The aim of this work was to evaluate the performance of six ML classifiers under different missing data strategies, using numerical and categorical datasets. We observed a very marginal difference in overall performance across all the classifiers. However, SVC performed marginally better for the numerical dataset, while the NB classifier did the same for the categorical dataset, across the missing data methods examined. In addition, ANN had the lowest RMSE of all the classifiers on the categorical dataset, indicating a better fit of the ANN model. Nonetheless, for the categorical dataset, we noticed slightly improved performance by the classifiers with the mode method in comparison to the LD method. We intend to test other missing value strategies, including ML-based missing data methods, in the future, using larger datasets and different missing value rates. The authors would like to pay detailed attention to employing ML approaches to handling missing data, statistical quantification of biases, and sensitivity analysis for the missing data strategies as areas of interest in future work. Finally, our preliminary submission is that knowing the cause of missing values in a dataset is key to tackling the missingness problem, since the missing value methods are based on certain assumptions.

ACKNOWLEDGMENT

We would like to thank the University of Johannesburg for funding and making the resources available to complete this work. The authors are also thankful to Mikros Traffic Monitoring (Pty) Ltd and Prof. T. Bartz-Beielstein for making the datasets available.

REFERENCES

[1] R. J. A. Little and D. B. Rubin, Statistical Analysis with Missing Data, Third Edition. 2019. DOI: 10.1002/9781119482260.
[2] D. B. Rubin, "Inference and missing data," Biometrika, vol. 63, (3), pp. 581-592, 1976.
[3] J. M. Jerez, L. Molina, P. J. García-Laencina, E. Alba, N. Ribelles, M. Martín, and L. Franco, "Missing data imputation using statistical and machine learning methods in a real breast cancer problem," Artificial Intelligence in Medicine, vol. 50, (2), pp. 105-115, 2010. Available: https://www.clinicalkey.es/playcontent/1-s2.0-S0933365710000679. DOI: 10.1016/j.artmed.2010.05.002.
[4] H. de Silva and A. S. Perera, "Missing data imputation using evolutionary k-nearest neighbour algorithm for gene expression data," Sep 2016. Available: https://ieeexplore.ieee.org/document/7829911. DOI: 10.1109/ICTER.2016.7829911.
[5] P. Schmitt, J. Mandel and M. Guedj, "A comparison of six methods for missing data imputation," Journal of Biometrics & Biostatistics, vol. 6, (1), 2015. DOI: 10.4172/2155-6180.1000224.
[6] H. Kang, "The prevention and handling of the missing data," Korean Journal of Anesthesiology, vol. 64, (5), pp. 402-406, 2013. Available: http://synapse.koreamed.org/search.php?where=aview&id=10.4097/kjae.2013.64.5.402&code=0011KJAE&vmode=FULL. DOI: 10.4097/kjae.2013.64.5.402.
[7] B. Twala, "An empirical comparison of techniques for handling incomplete data using decision trees," Applied Artificial Intelligence, vol. 23, (5), pp. 373-405, 2009. Available: http://www.tandfonline.com/doi/abs/10.1080/08839510902872223. DOI: 10.1080/08839510902872223.
[8] T. Nkonyana and B. Twala, Eds., Impact of Poor Data Quality in Remotely Sensed Data. (Artificial Intelligence and Evolutionary Computations in Engineering Systems ed.) Singapore: Springer Nature.
[9] M. R. Stavseth, T. Clausen and J. Røislien, "How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data," SAGE Open Medicine, vol. 7, pp. 2050312118822912, 2019. Available: https://www.ncbi.nlm.nih.gov/pubmed/30671242.
[10] T. Marwala, Computational Intelligence for Missing Data Imputation, Estimation, and Management. 2009. Available: https://ebookcentral.proquest.com/lib/[SITE_ID]/detail.action?docID=3309570.
[11] A. P. Dempster, N. M. Laird and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39, (1), pp. 1-38, 1977. Available: http://www.econis.eu/PPNSET?PPN=388257237.
[12] B. Twala and F. Mekuria, "Ensemble multisensor data using state-of-the-art classification methods," Sep 2013. Available: https://ieeexplore.ieee.org/document/6757711. DOI: 10.1109/AFRCON.2013.6757711.
[13] B. Twala, "Dancing with dirty road traffic accidents data: The case of Gauteng Province in South Africa," Journal of Transportation Safety & Security, vol. 4, (4), pp. 323-335, 2012. Available: http://www.tandfonline.com/doi/abs/10.1080/19439962.2012.702711. DOI: 10.1080/19439962.2012.702711.
[14] D. Ferreira-Santos, M. Monteiro-Soares and P. P. Rodrigues, "Impact of imputing missing data in Bayesian network structure learning for obstructive sleep apnea diagnosis," Studies in Health Technology and Informatics, vol. 247, pp. 126-130, 2018. Available: https://www.ncbi.nlm.nih.gov/pubmed/29677936.
[15] G. Chhabra, V. Vashisht and J. Ranjan, "A comparison of multiple imputation methods for data with missing values," Indian Journal of Science and Technology, vol. 10, (19), pp. 1-7, 2017. DOI: 10.17485/ijst/2017/v10i19/110646.
[16] M. Singh, "Learning Bayesian networks from incomplete data," in AAAI/IAAI, pp. 539, 1997.
[17] C. Ferri, J. Hernández-Orallo and R. Modroiu, "An experimental comparison of performance measures for classification," Pattern Recognition Letters, vol. 30, (1), pp. 27-38, 2009.