

Challenges in KNN Classification


Shichao Zhang, Senior Member, IEEE

Abstract—The KNN algorithm is one of the most popular data mining algorithms. It has been widely and successfully applied to data analysis applications across a variety of research topics in computer science. This paper illustrates that, despite its success, many challenges remain in KNN classification, including K computation, nearest neighbor selection, nearest neighbor search, and classification rules. After establishing these issues, the paper examines recent approaches to their resolution in more detail, thereby providing a potential roadmap for ongoing KNN-related research, and proposes some new classification rules for tackling the issue of training sample imbalance. To evaluate the proposed approaches, experiments were conducted on 15 UCI benchmark datasets.

Index Terms—Data mining, lazy learning, KNN classification, classification rule

The author is with the College of Computer Science and Technology, Central South University, Changsha 410083, PR China. E-mail: zhangsc@csu.edu.cn.
Manuscript received 21 Aug. 2019; revised 21 Dec. 2020; accepted 31 Dec. 2020. Date of publication 5 Jan. 2021; date of current version 12 Sept. 2022. (Corresponding author: Shichao Zhang.) Recommended for acceptance by Y. Zhang. Digital Object Identifier no. 10.1109/TKDE.2021.3049250.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

1 INTRODUCTION

NN (nearest neighbor) classification is an efficient solution to approximation, first proposed as a nonparametric discrimination method in statistics [1]. However, it has long suffered from the key issue of overfitting [2], [3]. K nearest neighbor (KNN) classification was also advocated by Fix and Hodges [1] as a possible solution to this issue. Here, K objects are found in a training dataset that are closest to a test object/data. A label is then assigned according to the predominance of the majority class in this neighborhood. This is the standard prediction approach to KNN classification, known as the majority rule.

KNN classification has the remarkable property that, under very mild conditions, the error rate of a KNN algorithm tends towards being Bayes optimal as the sample size tends towards infinity [4]. For any data analysis application, if establishing a model with some training dataset is proving troublesome, it is likely that a KNN algorithm will provide the best solution [5]. As a result, KNN algorithms have been widely used in research and are considered to be one of the top-10 data mining algorithms [6]. In this era of big data, KNN approaches provide a particularly efficient way of identifying useful patterns and developing case-based reasoning algorithms for artificial intelligence (AI) [7], [8].

As with other classification algorithms, KNN classification is a two-phase procedure: model training and test data prediction. In the training phase, KNN only involves finding a suitable K for a given training dataset [9]. The most common method for this is cross-validation. In the prediction phase, the first step is a search for the K data points in the training dataset that are most relevant to a query (test data/sample). Without other information, the K most relevant data points are taken to be the K nearest neighbors of the test data within the training dataset. After this, a prediction is made on the basis of the class that most frequently occurs amongst the K neighbors. This is referred to as the majority rule (which is similar to the Bayesian rule). The above procedure of KNN classification indicates that there are four main challenging issues: K computation, nearest neighbor selection, nearest neighbor search, and the classification rule.

It can be very difficult to set a suitable K for a given training dataset. Training samples often have different distributions in the sample space, which can lead to there being no obviously suitable K for the whole training sample space. This has resulted in two research directions. One is to set different K values for different sample subspaces [10], [11]. The other is to set different K values for different test samples [12]. According to Zhang et al., the efficiency is quite good when clustering the sample space into 3-5 subspaces [11]. Setting different K values for different test samples, in contrast, is time-consuming.

Nearest neighbor selection has been studied extensively. It is essentially a procedure of determining a proximity measure. Much of the related research has focused on constructing distance functions for measuring proximity. Song et al. proposed two KNN methods for measuring the informativeness of test data, Locally Informative-KNN (LI-KNN) and Globally Informative-KNN (GI-KNN) [13]. Each of them acts as a query-based distance metric to measure the closeness between objects. However, no distance function has yet been identified that is suitable for all training samples regardless of their distribution. In other words, there remains a need for a distance function for the selection of the K nearest neighbor points that works effectively across most training samples. On the other hand, feature selection is useful for choosing the K nearest neighbors [14]. This can be a lazy procedure because it depends on the data mining task.

Nearest neighbor search is particularly challenging and remains unsolved because it requires a complete sample space search when looking for all the K nearest neighbors of each test data. As a result, KNN classification is often referred to as a lazy data mining method. Some efforts have been devoted to resolving the problem of how to undertake a truly effective nearest neighbor search.

Among the latest related reports, most have focused on seeking K approximate nearest neighbors. Li et al. examined 16 approximate nearest neighbor search algorithms in different domains [15], and then proposed a nearest neighbor search method that achieves both high query efficiency and high recall empirically on the majority of the datasets under a wide range of settings.

The majority rule has been used widely and successfully in real applications as a KNN classification principle. However, if a training dataset has unbalanced classes, the majority rule is unable to work effectively. This is why there is little research that has sought to extend the use of KNN algorithms to cost/risk-sensitive learning.

The rest of this paper is organized as follows: Section 2 briefly introduces the problem of K computation. The methods for nearest neighbor selection are reviewed in Section 3. Nearest neighbor search is the topic of Section 4. An overview of the issues surrounding classification rules is provided in Section 5. Section 6 evaluates the efficiency of some new classification rules that have been developed to improve KNN classification. Some conclusions are put forward in Section 7.

2 K COMPUTATION

Setting a suitable K for a given training dataset is a key step in KNN classification. There are two main ways in which this can be accomplished. The first option is for the data analyst employing KNN classification to assume that the users will provide the K for their datasets. However, it is clearly challenging for users to establish effective K values in this way.

The second option is to use all of the samples in the given training dataset, i.e., to attack the issue of K computation with the training samples. There are three main approaches to dealing with K computation. We briefly outline them below.

Setting a Single K Value for the Whole Sample Space. Since its inception, almost all approaches to KNN classification have focused on setting only one suitable K for a given training dataset. A natural way of going about this is to use cross-validation to find a K for the training dataset. This method has been successfully applied in real applications in statistics and data mining [16]. One way of computing an optimal K for a dataset is to use what is known as holdout cross-validation. First of all, a suitable K for each sample in the dataset is searched for. Then, the K that delivers the highest classification efficiency is chosen for the whole sample space. Another technique is to use m-fold cross-validation. This first partitions the training dataset into m mutually-disjoint subsets. Then, cross-validation is used to generate a viable K for each subset. Finally, the K that delivers the best classification efficiency is chosen for the whole sample space.

After the KNN method was invented by Fix and Hodges [1], a lot of research was devoted to setting a single K for the whole sample space. For example, Loftsgaarden and Quesenberry treated the KNN method as a nonparametric solution applied to estimate a multivariate density function [17]. Sebestyen took the KNN method as an important tool for decision making and included it in his monograph "Decision-Making Processes in Pattern Recognition" [18]. Cover and Hart designed a KNN algorithm for pattern classification [2]. Wettschereck and Dietterich experimentally compared the nearest-neighbor and nearest-hyperrectangle algorithms [3]. Hastie and Tibshirani proposed a KNN method for classification and regression [19]; they designed a general model based on a local Linear Discriminant Analysis, called the DANN algorithm. Singh et al. applied the KNN method to image understanding [20]; they designed a nearest neighbor algorithm aiming to find the nearest average distance rather than the nearest maximum number of neighbors. Peng et al. advocated applying other classification rules to KNN classification; a locally adaptive neighborhood morphing classification method was developed to minimize bias [21]. Chen and Shao applied the KNN method to estimate the value of missing data [22]; a jackknife variance estimation was designed for improving nearest-neighbor imputation. Domeniconi and Gunopulos presented an adaptive KNN algorithm for pattern classification [23]; the maximum margin boundary found by the SVM is used to determine the most discriminant direction over the neighborhood of the test data. Tao et al. proposed an RKNN (reverse k nearest neighbor) method for retrieving dynamic multidimensional datasets [24]; it utilizes a conventional data-partitioning index on the dataset and does not require any pre-computation. Zhu and Basir designed a fuzzy KNN algorithm for remote sensing image classification [25]; in that algorithm, each nearest neighbor provides evidence on the belongingness of the input pattern to be classified, and it is evaluated based on a measure of disapproval to achieve adaptive capability during the classification process. Zhang and Zhou applied the KNN method to multilabel classification [26]; they designed the ML-KNN algorithm, a multi-label lazy learning approach. Qin et al. applied the KNN method to cost-sensitive classification [27]; the neighborhood in the minor class is assigned a much higher cost. Liu, Wu and Zhang designed a KNN algorithm for multilabel classification [28]; a nearest neighbor selection was designed for multilabel classification, and the certainty factor is further adopted to address the problem of unbalanced and uncertain data. Gallego et al. developed a clustering-based KNN classification, where the K nearest neighbors are taken as the initial points [29].

Gou et al. proposed a KNN algorithm based on local constraint representation [30]. Specifically, it first finds the K nearest neighbors of the test data in each class of the training data according to the Euclidean metric. Then it constructs K local mean vectors in each class according to the K nearest neighbors from the previous step. Finally, it uses the local mean vectors to fit the test data and combines local constraints to obtain the final representation-based distance metric, through which the class labels of the test data are predicted. In addition, Gou et al. also proposed a KNN algorithm based on local mean representation [31]. It differs from the previous algorithm, although it likewise first constructs a local mean vector in each class of the training data. In the following steps, however, when performing linear representation of the test data, it performs two linear representations to obtain the optimal relationship representation. Finally, it predicts the class label of the test data through a new metric function based on the local mean.
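As a concrete illustration of the single-K strategy just described, the following sketch chooses one K for a whole training set by m-fold cross-validation. It assumes scikit-learn is available; the candidate grid of K values, the stand-in dataset, and the accuracy criterion are illustrative choices rather than the exact protocol used in this paper.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Any labelled dataset will do; Iris is used here purely as a stand-in.
X, y = load_iris(return_X_y=True)

candidate_ks = range(1, 21)   # illustrative grid of K values
m_folds = 10                  # m-fold cross-validation

# For each candidate K, estimate accuracy by m-fold cross-validation and
# keep the K with the best mean score (the single K for the whole dataset).
mean_scores = {
    k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y,
                       cv=m_folds, scoring="accuracy").mean()
    for k in candidate_ks
}
best_k = max(mean_scores, key=mean_scores.get)
print(f"selected K = {best_k}, CV accuracy = {mean_scores[best_k]:.3f}")
```

The selected K is then used unchanged for every query, which is exactly the limitation that motivates the per-sample K methods reviewed next.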

Setting Different K Values for Different Samples. It is well known that instances are nonuniformly distributed in a sample space. It therefore seems reasonable that different test data should be given different K values. Thus, some recent work has proposed setting an optimal K for each test sample. For example, Guo et al. presented an approach where a KNN model is constructed for a sample and replaces the sample itself as the basis of classification [32]. The value of K is automatically determined, can vary according to the sample, and is optimal in terms of classification accuracy. Li et al. proposed an improved KNN classification algorithm that uses different numbers of nearest neighbors for different categories, rather than having a fixed number across all categories [33]. More samples (nearest neighbors) are used to decide whether test data should be classified to a category if there are more samples for that category in the training dataset. Yu et al. put forward a method, called optimally pruned K-nearest neighbors (OP-kNNs), that can compete with other state-of-the-art methods while remaining fast [34]. Setting an optimal K for each sample has also been proposed in the context of graph sparse reconstruction (see [12]), called G-optimal-K. When compared with the preceding three optimal-K methods in experiments, the G-optimal-K approach produced the best results. We will therefore briefly examine this approach in greater detail.

The idea of Zhang et al. was to revise conventional KNN algorithms using a sparse reconstruction framework [12]. This can generate different K values for different test samples and make the best use of prior knowledge in the training dataset. The G-optimal-K approach was designed with three regularization terms. A reconstruction process is adopted between training and test samples that obtains a K value for every test sample. In the reconstruction process, a least squares loss function is applied to achieve the minimal reconstruction error. An $L_1$-norm is then applied to generate element-wise sparsity for selecting the different K values for the different test samples. To improve the reconstruction performance, an $L_{2,1}$-norm is employed to generate row sparsity, thus removing noisy samples. Finally, a Locality Preserving Projection (LPP) regularization term is suggested to preserve the local sample structure.

K Value Approximation. The G-optimal-K approach has since been extended to approximation computation in an algorithm called Ktree (see [35]). This is illustrated in Fig. 1.

Fig. 1. Decision tree with class leaves and K value leaves.

To get an optimal K for a test data, one must compute a K value for each new data item, one by one, before predicting the class of the test data. A major issue with K computation is that it is both expensive and time-consuming if users would like to set different K values to predict different data classifications. Therefore, Zhang et al. advocated approximating the optimal K of the test data with its nearest neighbor's optimal K by using what is called a Ktree [35]. A Ktree is built to rapidly search for the nearest neighbor and the K value for the data. The K computation proceeds as follows. In the training phase, the KNN method is used to build a Ktree for the training dataset, where each leaf node is a training sample with an optimal K value. In the prediction phase, the KNN method is used to search the Ktree to obtain the nearest neighbor of the test data. The optimal K of this nearest neighbor is assigned to the test data.

The K value approximation delivers three results. One is that different K values can be set for the training samples. Another is that the approximated K values work well compared with the K computed in real time. The last is that the approximated K values can be trained before a data mining task is given, whereas the real-time computed K is obtained only after the data mining task is given, which is a lazy procedure.

3 NEAREST NEIGHBOR SELECTION

As mentioned, nearest neighbor selection is really a procedure of determining a proximity measure. Most of the research relating to nearest neighbor selection involves studying distance functions or similarity metrics for measuring the proximity between KNN classification objects. Numerous techniques have been developed for modifying KNN classification in terms of distance measurement selection/construction, and this has become a hot topic in KNN algorithm research [19], [21], [36], [37]. Currently a variety of distance measures are available, such as Euclidean, Hamming, Minkowsky, Mahalanobis, Camberra, Chebychev, Quadratic, Correlation, Chi-square, and hyperrectangle [38], Value Difference Metrics [39] and Minimal Risk Metrics [40], with an additional option being grey distance [41].

However, distance functions generally do not perform consistently well, even under specified conditions [42]. This makes the use of a KNN approach highly experience dependent. Various attempts have been made to remedy this situation. Amongst these, Discriminant Adaptive Nearest Neighbor (DANN) is notable for carrying out a local linear discriminant analysis to deform the distance metric based on the 50 nearest neighbors [19]. Local Flexible Metric based on Support Vector Machine (LFM-SVM) also deforms the metric by feature weighting; here, the weights are inferred by training an SVM on the entire dataset [23]. K-local Hyperplane distance Nearest Neighbor (HkNN) uses a collection of 15-70 nearest neighbors from each class to span a linear subspace for that class. Classification is then based not on the distance to prototypes but on the distance to linear subspaces [37].
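To make concrete the point that nearest neighbor selection is determined by the chosen proximity measure, the short sketch below ranks the same training points under three of the metrics listed above (Euclidean, Manhattan as Minkowsky with p = 1, and Chebychev) and shows that the resulting K nearest neighbors can differ. It uses only NumPy, and the toy data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 2))   # toy training samples
query = np.array([0.2, -0.1])        # toy test data

def knn_indices(X, q, k, metric):
    """Return the indices of the k nearest rows of X to q under `metric`."""
    diff = X - q
    if metric == "euclidean":
        d = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "manhattan":       # Minkowsky with p = 1
        d = np.abs(diff).sum(axis=1)
    elif metric == "chebychev":
        d = np.abs(diff).max(axis=1)
    else:
        raise ValueError(metric)
    return np.argsort(d)[:k]

for metric in ("euclidean", "manhattan", "chebychev"):
    print(metric, knn_indices(X_train, query, k=5, metric=metric))
```

On most random draws the three neighbor sets overlap but are not identical, which is precisely why so much of the work reviewed below concentrates on choosing or learning the metric.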

There are other kinds of distance defined by the data properties. Examples here include: tangent distance using the USPS zip code dataset [43], shape context-based distance using the MNIST digit dataset [44], distances between histograms of texts using the CUReT dataset [45], and geometric blur-based distances using Caltech-101 [46]. These measures can be extended by kernel techniques so as to estimate a curved local neighborhood [47]. This makes the space around the samples nearer to or further from the test data, depending on class-conditional probability distributions. There are also many other efforts to measure the proximity between samples. For example, Blanzieri and Ricci presented a minimum risk metric (MRM) for classification tasks that exploits estimates of the posterior probabilities [40]. The MRM is optimal in the sense that it optimizes the finite misclassification risk, whereas the Short and Fukunaga metric minimizes the difference between finite risk and asymptotic risk. Domeniconi et al. built a locally adaptive nearest-neighbor classification method to try to minimize bias [36]. A chi-squared distance analysis was employed to compute a flexible metric for producing neighborhoods that are highly adaptive to query locations. Neighborhoods are elongated along less relevant feature dimensions and constricted along the most influential ones. Peng et al. advocated an adaptive KNN classification method for minimizing bias [47]; a quasiconformal transformed kernel was applied to compute neighborhoods over which the class probabilities tend to be more homogeneous. Athitsos et al. applied the KNN method to information retrieval [48]; a method, BoostMap, was designed for efficient nearest neighbor retrieval under computationally expensive distance measures. Chen et al. developed a KNN search that utilizes a distance lower bound to avoid calculating the distance itself if the lower bound is already larger than the global minimum distance [49]; they constructed a lower bound tree (LB-tree) by agglomeratively clustering all the sample points to be searched. Li et al. proposed a KNN algorithm with local probability centers of each class [33]. It can reduce the number of negatively contributing points, which are the known samples falling on the wrong side of the ideal decision boundary in a training set, by restricting their influence regions. Liu, Wu and Zhang presented a nearest neighbor selection designed for multilabel classification [28]; the target labels of test data are predicted with the help of relevant and reliable data, explored through the concept of the shelly nearest neighbor. Song et al. proposed two KNN methods for measuring the informativeness of test data [13]; that is, Locally Informative-KNN and Globally Informative-KNN were constructed as query-based distance metrics to measure the closeness between objects. Zhang, Cao and Wang developed a weighted heterogeneous distance metric (WHDM) [50]. With the WHDM, the reduced random subspace-based Bagging (RRSB) algorithm is proposed for constructing an ensemble classifier, which can increase the diversity of the component classifiers without damaging their accuracy. Gou et al. designed a generalized mean distance-based KNN classifier (GMDKNN) [51]; the multi-local mean vectors of a test data in each class are calculated by adopting its class-specific K nearest neighbors.

More recently, a new measure named neighborhood counting has been proposed that can define the similarity between two data points by using the number of neighborhoods [42]. To measure the similarity between two data points, all neighborhoods covering both data points are counted, and the number of such neighborhoods is taken as the measure of similarity. As the features of high-dimensional data are often correlated, the above kinds of measures can easily become meaningless. Some approaches have been designed to deal with this issue, e.g., by applying variable aggregation to define the measure [36], [52], [53]. Aside from the above kinds of measures, another strategy is to consider the geometrical placement of the neighbors rather than their actual distances [54]. This approach is effective in some cases, but conflicts with human intuition when the data lie on a manifold. Lopez et al. proposed a nearest neighbor classification for high-dimensional data [55]. Specifically, it first sorts all the features through the FR strategy. Then it selects the first r features with larger weights. Finally, a new distance function is constructed to predict the class label based on the selected features and the test data. Feng et al. proposed a new distance measurement function to solve the problem of class imbalance [56]. It first reconstructs the data with a projection matrix, and then calculates the KL divergence between different classes. Finally, it obtains a distance metric matrix by solving the proposed objective function. To address the problem of outliers, Mehta et al. proposed a KNN classification based on the harmonic mean distance [57]. It looks for the K nearest centroid neighbors of the test data in each class. In addition, it also calculates a local centroid mean vector in each class, and uses the nearest harmonic mean distance between the test data and the local centroid mean to predict the class label of the test data. Syaliman et al. proposed a KNN algorithm based on local means and distance weights [58]. It not only finds the local mean vector in each class, but also applies weights based on the distance: classes farther from the test data receive smaller weights, while classes closer to the test data receive larger weights. Jiao et al. proposed a paired distance metric for KNN [59]. It avoids the traditional approach of using only one distance metric for the global data. It also sets weights on features and resolves the uncertainty in the output of the classifier. Nguyen et al. proposed a large-margin distance metric learning approach [60]. It can maximize the margin of each training example, and it also solves the problem that the proposed formulation is non-convex. Weinberger et al. proposed a distance measurement function for KNN [61]. It aims to make the K nearest neighbors always belong to the same class and to maximize the distance between different classes. Goldberger et al. proposed a new Mahalanobis distance measure that can learn a low-dimensional linear embedding of the data [62]. In this way, it can reduce the computational complexity of the algorithm and speed up KNN classification. Mensink et al. proposed two distance-based classifiers (i.e., KNN and nearest class mean (NCM)) [63]; in addition, they introduced a new distance measurement function to improve their performance, and their experiments verified that the NCM algorithm has the better performance. Davis et al. used information theory to learn a Mahalanobis distance function [64]. It minimizes the KL divergence between two Gaussian distributions by constraining the distance function, and finally learns a Mahalanobis matrix A by optimizing the objective function.
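Many of the metric-learning methods reviewed above ultimately produce a Mahalanobis-type matrix A and rank neighbors by $d(x, y) = \sqrt{(x - y)^{T} A (x - y)}$. The sketch below applies a given A to rank neighbors; here A is simply the inverse sample covariance, standing in for a matrix learned by any of the cited methods, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
X_train = rng.normal(size=(100, 3))
query = rng.normal(size=3)

# Stand-in for a learned metric: the inverse covariance of the training data.
A = np.linalg.inv(np.cov(X_train, rowvar=False))

def mahalanobis_knn(X, q, A, k):
    """Indices of the k nearest rows of X to q under the metric matrix A."""
    diff = X - q
    d2 = np.einsum("ij,jk,ik->i", diff, A, diff)  # (x - q)^T A (x - q) per row
    return np.argsort(d2)[:k]

print(mahalanobis_knn(X_train, query, A, k=5))
```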

Nguyen et al. proposed a new metric learning method through maximization of the Jeffrey divergence [65]. Specifically, it first generates two multivariate Gaussian distributions from local pairwise constraints. Then it maximizes the Jeffrey divergence of these two distributions to obtain a linear transformation. Finally, it turns the problem into an unconstrained optimization problem and solves it. Globerson et al. proposed a new metric learning method for classification tasks [66]. It makes points of the same class close to each other and points of different classes far apart, and it optimizes the equivalent convex dual form of the proposed function to solve for the metric matrix. Fisher et al. discussed the application of multiple measures to the same principles in classification problems [67]. Wang et al. proposed a feature extraction algorithm [68]. Its core idea is to draw together data of the same class within its neighborhood, and to push data of different classes away from it as much as possible. It avoids the small-sample-size problem of traditional Linear Discriminant Analysis (LDA). In addition, it has also been extended to nonlinear feature extraction through a kernel function.

Lately, a very different approach to nearest neighbor selection has been adopted, called the shell-KNN algorithm, as mentioned in Section 2 [16], [69]. It first selects the K nearest neighbors using distance functions. Then, the shell nearest neighbors are chosen from the K nearest neighbors. This has therefore been described as quadratic-selection.

Generally, there is an expectation that all the selected nearest neighbors for a test data will be ideally distributed around the test data, as illustrated in Fig. 2.

Fig. 2 shows an ideal nearest neighbor distribution for test data A. However, the collection of training samples for real applications is effectively random, and the training samples often have different distributions. Consequently, the nearest neighbors of a test data will not have an ideal distribution. Other potential cases are shown in Figs. 3, 4, and 5.

Fig. 2. Ideal nearest neighbors of test data A.
Fig. 3. Nearest neighbors to the left or right of test data A.
Fig. 4. Nearest neighbors below or above test data A.
Fig. 5. Some nearest neighbors further away from test data A.

According to experiments conducted by Zhang [16], [69], it is much more efficient to take nearest neighbors closely distributed around test data A. Quadratic-selection is used to identify the nearest neighbors within the shell surrounding test data A. First of all, one searches for the K nearest neighbors of test data A in the training dataset. Each feature/attribute is then treated as an axis, and the left and right nearest neighbors along that axis are selected from the K nearest neighbors of test data A. Finally, the nearest neighbors within the shell of test data A are selected from the left and right nearest neighbors on all axes.

Quadratic-selection can only identify those nearest neighbors that are distributed around test data A. It is therefore worth taking note of the following cases.

Point 1. Some nearest neighbors amongst the K nearest neighbors will be selected many times.
Point 2. The number of selected shell-nearest neighbors of an item of test data, S, is less than or equal to K, i.e., S <= K.
Point 3. Different test data will produce different S values.

In the case of Point 1, the number of times a nearest neighbor is selected can be used to compute the weight of that nearest neighbor. This is a new way of setting weights for samples. We will discuss how to make use of this case in detail in Section 5.

In relation to Point 2, the set of selected shell nearest neighbors of an item of test data is a subset of the K nearest neighbors. Note that, when the nearest neighbors of the test data are further away, as shown in Fig. 5, the quadratic-selection approach may fail to locate them.

Point 3 reflects the fact that different test data can be given different K values. In the shell-KNN algorithm, all nearest neighbors are expected to be distributed around the test data, so some points need to be discarded from the selected K nearest neighbors. In other words, the number of shell-nearest neighbors should be less than or equal to K for each item of test data.

4 NEAREST NEIGHBOR SEARCH

It is often suggested in the literature that KNN classification is a lazy form of learning. It does not need to train any model to fit the given training samples, beyond setting the K values. This means that, for each item of test data, KNN classification has to search the whole training sample space to obtain the K nearest neighbors. This is a time-consuming procedure that weakens the range of possible applications for KNN algorithms.
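A minimal illustration of the point just made: the lazy, brute-force search below scans the entire training set for every query, while a pre-built spatial index answers the same query from a fraction of the data. The index uses scikit-learn's KDTree purely as a stand-in for the more sophisticated search structures surveyed below, and the data are synthetic.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(2)
X_train = rng.normal(size=(10_000, 8))
query = rng.normal(size=(1, 8))
k = 5

# Lazy, brute-force search: distances to every training sample.
d = np.linalg.norm(X_train - query, axis=1)
brute_idx = np.argsort(d)[:k]

# Index-based search: the tree is built once, then reused for all queries.
tree = KDTree(X_train)
_, tree_idx = tree.query(query, k=k)

assert set(brute_idx) == set(tree_idx[0])   # both return the exact neighbors
print(sorted(brute_idx))
```

The brute-force pass is linear in the size of the training set per query, which is exactly the cost that the exact and approximate search methods reviewed next try to avoid.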

Therefore, Samet designed an algorithm for finding the K nearest neighbors of a test data [70]. It adopted a pruning technique with the maxnearestdist as a distance upper bound. To support the increasing functionalities of smartphones, it is important to quickly locate one of the nearest objectives for a customer. Consequently, there have recently been several reports on searching for K approximate nearest neighbors. For example, Jegou, Douze and Schmid proposed a product quantization for nearest neighbor search [71]. The idea is to decompose the space into a Cartesian product of low-dimensional subspaces and to quantize each subspace separately. Since then, many improved models have been developed for spatiotemporal data. Li and Hu applied product quantization to develop an approximate nearest neighbor search algorithm for high order data [15]; they incorporated the high order structures of the data into the design of a more effective subspace decomposition. Pan et al. designed a product quantization with dual codebooks for approximate nearest neighbor search [72]. It uses dual codebooks simultaneously to reduce the quantization error in each subspace. Also, a grouping strategy was presented to group the database vectors by their encoding modes, and thus the extra memory cost caused by the dual codebooks can be reduced. Chiu et al. presented a ranking model with learned neighborhood relationships embedded in the index space [73]. In this model, the nearest neighbor probabilities are estimated by employing neural networks to characterize the neighborhood relationships, i.e., the density function of nearest neighbors with respect to the query. Lu et al. advocated an approximate nearest neighbor search via virtual hypersphere partitioning [74]. The idea is to impose a virtual hypersphere, centered at the query, in the original feature space and to examine only points inside the hypersphere. Munoz et al. developed a large scale approximate nearest neighbor search for high dimensional data [75]. They used a nearest neighbor graph created over large collections. This graph is created based on the fusion of multiple hierarchical clustering results, where a minimum-spanning-tree structure is used to connect all elements in a cluster. Malheiros and Walter constructed a data-partitioning, error-controlled strategy for approximate nearest neighbor searching [76]. By changing the size of the candidate neighborhood, precision and performance are balanced when searching for neighbors. Tellez et al. argued that, for intrinsically high-dimensional data, the only possible solution is to compromise and use approximate or probabilistic approaches [77]; they then proposed singleton indexes for nearest neighbor search. Ozan et al. designed a vector quantization method for approximate nearest neighbor search which enables faster and more accurate retrieval on publicly available datasets [78]. The vector quantization is defined as a multiple affine subspace learning problem, and the quantization centroids are explored on multiple affine subspaces. Dasgupta and Sinha theoretically studied three randomized partition trees for nearest neighbor search [79]; they combined classical k-d tree partitioning with randomization and overlapping cells.

To compare these approximate nearest neighbor search algorithms, Li et al. examined 16 representative algorithms selected from different domains [80]. They then proposed a nearest neighbor search method that achieves both high query efficiency and high recall empirically on the majority of the datasets under a wide range of settings.

In view of the fact that KNN classification is an approximate solution, effort has recently been undertaken to overcome the lazy aspects of KNN classification with an improved approach to approximation that can achieve good prediction efficiency and accuracy when compared to standard KNN classification algorithms. This is known as K*Tree [35]. K*Tree is not a classifier. It is just a particular kind of tree that has a range of useful information in its leaf nodes, to facilitate obtaining the K nearest neighbors of test data quickly. An example of a K*Tree is provided in Fig. 6.

Fig. 6. A K*Tree.

K*Tree is an extension of the Ktree illustrated in Fig. 1b, with samples added to each leaf node. These samples include the K nearest neighbors of the data in a leaf node and the K nearest neighbors of the nearest neighbor of the data in the leaf node. In other words, in a K*Tree, each of the samples in its leaf nodes can be expressed as
$(k;\; e_1, \langle e_{11}, e_{12}, \ldots, e_{1k}\rangle;\; e_2, \langle e_{21}, e_{22}, \ldots, e_{2k}\rangle;\; \ldots;\; e_i, \langle e_{i1}, e_{i2}, \ldots, e_{ik}\rangle;\; \ldots;\; e_n, \langle e_{n1}, e_{n2}, \ldots, e_{nk}\rangle)$,
where k is an integer value, $\langle e_1, e_2, \ldots, e_i, \ldots, e_n\rangle$ is a vector of the training samples that have the same k value, and $\langle e_{i1}, e_{i2}, \ldots, e_{ik}\rangle$ is a vector of the k nearest samples of $e_i$. However, there are many samples with the same K value, so information has to be added to the leaf nodes of a K*Tree sample by sample.

K*Tree can be used in the following way. For an item of test data T, the K*Tree can first be searched to obtain the nearest sample, $e_i$, of T in a leaf node. The K value in this leaf node can then be taken as the K value of T. Finally, the K nearest samples of T can be searched for just amongst the samples attached to the nearest sample $e_i$, as follows:

$$
\begin{aligned}
& e_i,\ \langle e_{i1}, e_{i2}, \ldots, e_{ik} \rangle \\
& \langle e_{i1}^{1}, e_{i1}^{2}, \ldots, e_{i1}^{k_{i1}} \rangle \\
& \langle e_{i2}^{1}, e_{i2}^{2}, \ldots, e_{i2}^{k_{i2}} \rangle \\
& \qquad \vdots \\
& \langle e_{ik}^{1}, e_{ik}^{2}, \ldots, e_{ik}^{k_{ik}} \rangle.
\end{aligned} \qquad (1)
$$
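The following sketch illustrates the lookup idea behind Eq. (1) in a much-simplified form: during training, each sample stores its own optimal K and a list of its nearest samples; at prediction time, the query inherits the K of its single nearest training sample and searches for neighbors only amongst that sample's stored list. A flat nearest-sample lookup stands in for the actual tree descent of [35], and the per-sample optimal K values are assumed to have been computed beforehand (random values are used here only to keep the sketch self-contained).

```python
import numpy as np

rng = np.random.default_rng(3)
X_train = rng.normal(size=(200, 4))
# Assumed precomputed (e.g., by the G-optimal-K procedure of [12]).
optimal_k = rng.integers(3, 8, size=len(X_train))

def nearest(X, q):
    return int(np.argmin(np.linalg.norm(X - q, axis=1)))

# "Training": attach to every sample the indices of its own nearest samples.
MAX_K = optimal_k.max()
attached = [
    np.argsort(np.linalg.norm(X_train - x, axis=1))[1:MAX_K + 1]
    for x in X_train
]

def approximate_knn(q):
    """K*Tree-style prediction step: inherit K from the nearest sample and
    search only amongst the samples attached to it."""
    i = nearest(X_train, q)                  # nearest training sample e_i
    k = optimal_k[i]                         # its optimal K becomes the K of the query
    candidates = np.append(attached[i], i)   # e_i plus its stored neighbors
    d = np.linalg.norm(X_train[candidates] - q, axis=1)
    return candidates[np.argsort(d)[:k]]

print(approximate_knn(rng.normal(size=4)))
```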
algorithms, Li, et al. examined 16 representative algorithms

This means that KNN classification using the K*Tree approach does not need to search the whole training sample space, which significantly enhances its performance.

Clearly, K*Tree only approximates the K nearest neighbors of an item of test data. There are some particular things to note about it:

P1. The test data's nearest neighbor will definitely be included in the set of K nearest neighbors for the test data. The other K-1 nearest neighbors will be close enough to the test data.
P2. The prediction efficiency of KNN classification using K*Tree is almost the same as that of KNN classification using Ktree.
P3. KNN classification with K*Tree is much more robust than KNN classification with Ktree and standard KNN classification.

From the above, it can be seen that K*Tree classification approaches are very different from traditional KNN classification methods. KNN classification with K*Tree is not a lazy learning approach, because the K*Tree has to be trained before predicting the test data. Its advantage is that there is no need to search the whole sample space. However, there are still two main limitations in K*Tree classification:

1) K*Tree classification provides only an approximate solution to a query. It is not clear how to estimate the confidence of the answer.
2) It is not clear which tree structure is best for storing the K values and the nearest neighbors for K*Tree classification.

5 CLASSIFICATION RULES

Once the K nearest neighbors of an item of test data have been selected from the training samples, the class label of the test data needs to be predicted using a classification rule/principle. In general, the most popularly-used rules for KNN classification are the majority rule and its various forms of weighting. Recently, two classification rules have been proposed for datasets with imbalanced classes. Apart from recalling these classification rules, some suggestions are advocated in this section to improve the classification rules of shell nearest neighbor classification.

5.1 The Majority Classification Rule

This is a simple yet efficient approach to classification that predicts the class of the test data according to the class of the majority of the K nearest neighbors.

For a training dataset D with n features and a decision attribute Y, let $D = \{(X_i, Y_i)\}$, $X_i = (X_{i1}, X_{i2}, \ldots, X_{in})$, and let $Dom(Y) = \{c_1, c_2, \ldots, c_m\}$ be the domain of the decision attribute Y. One can obtain the K nearest neighbors of a query (test data) $T = (X, Y)$ from the training dataset. $KNN(T)$ is the set of these K nearest neighbors. The majority rule for KNN classification is as follows:

$$Y = \arg\max_{c \in Dom(Y)} \sum_{X_i \in KNN(T)} I(Y_i = c), \qquad (2)$$

where $I(\cdot)$ is an indicator function.

5.2 The Weighting Classification Rule

In the majority rule, the K nearest neighbors are implicitly assumed to have equal weight in any decision, regardless of their distance from the test data. The weighting rule, by contrast, adheres to the notion that it is conceptually preferable to give different weights to the K nearest neighbors according to their distance from the test data, with closer neighbors having greater weight. The distance weighting-based classification rule is as follows:

$$Y = \arg\max_{c \in Dom(Y)} \sum_{X_i \in KNN(T)} I(Y_i = c)\,\frac{1}{d(X_i, X)}, \qquad (3)$$

where $d(X_i, X)$ is the distance between $(X_i, Y_i)$ and the test data T, and $d_{MAX} = \max_{X_i \in KNN(T)} \{d(X_i, X)\}$. Eq. (3) can also be written as Eq. (4):

$$Y = \arg\max_{c \in Dom(Y)} \sum_{X_i \in KNN(T)} I(Y_i = c)\left(1 - \frac{d(X_i, X)}{d_{MAX}}\right). \qquad (4)$$

A general weighting classification rule is as follows:

$$Y = \arg\max_{c \in Dom(Y)} \sum_{X_i \in KNN(T)} w_i \cdot I(Y_i = c). \qquad (5)$$

A kernel function classification rule is as follows:

$$Y = \sum_{X_i \in KNN(T)} Y_i\, K(X_i, X), \qquad (6)$$

where $K(X_i, X)$ is a kernel function. Eq. (6) is constructed for a numerical decision value. If the decision attribute is of a character type, $Y_i$ can be replaced with the indicator function.
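A compact sketch of the two rules formalized in Eqs. (2) and (4): plain majority voting, and distance weighting with weight $1 - d/d_{MAX}$. The neighbor search itself is brute force and the data are synthetic; the intent is only to show how the two rules can disagree for the same K nearest neighbors.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(4)
X_train = rng.normal(size=(60, 2))
y_train = rng.integers(0, 3, size=60)       # labels c in {0, 1, 2}
query = np.array([0.0, 0.0])
k = 7

d = np.linalg.norm(X_train - query, axis=1)
nn = np.argsort(d)[:k]                      # indices of KNN(T)

# Eq. (2): majority rule, every neighbor contributes one vote.
majority = Counter(y_train[nn]).most_common(1)[0][0]

# Eq. (4): distance-weighted rule, weight 1 - d(X_i, X) / d_MAX.
d_max = d[nn].max()
weights = 1.0 - d[nn] / d_max
scores = {c: weights[y_train[nn] == c].sum() for c in np.unique(y_train[nn])}
weighted = max(scores, key=scores.get)

print("majority:", majority, " distance-weighted:", weighted)
```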

5.3 CF (Certainty Factor) Classification Rule

There are numerous variations upon the above classification rules. However, none of these classification rules works well for the imbalanced datasets typical of real applications, such as tumor diagnosis in medical tests or investments in the stock market. Generally, this is a challenge when the data mining tasks are sensitive to cost or risk.

To deal with this issue, [81], [82] proposed ways of increasing the competitiveness of minor classes when undertaking imbalanced data classification, known as Certainty Factor KNN (CF-KNN) classification. These are summarized below.

The CF measure is incorporated into KNN classification as follows. For a test sample $T = (X, Y)$, assume $p(Y = c_i \mid D)$ is the ratio of $c_i$ in the training dataset D, and $p(Y = c_i \mid KNN(T))$ is the ratio of $c_i$ in the set of K nearest neighbors, $KNN(T)$. If $p(Y = c_i \mid KNN(T)) \ge p(Y = c_i \mid D)$, the CF is computed as follows:

$$CF(Y = c_i, KNN(T)) = \frac{p(Y = c_i \mid KNN(T)) - p(Y = c_i \mid D)}{1 - p(Y = c_i \mid D)}. \qquad (7)$$

This means $CF(Y = c_i, KNN(T)) > 0$. If $p(Y = c_i \mid KNN(T)) < p(Y = c_i \mid D)$, the CF is computed as follows:

$$CF(Y = c_i, KNN(T)) = \frac{p(Y = c_i \mid KNN(T)) - p(Y = c_i \mid D)}{p(Y = c_i \mid D)}. \qquad (8)$$

This means $CF(Y = c_i, KNN(T)) < 0$.

The CF strategy for KNN classification can be defined as follows:

$$Y = \arg\max_{1 \le i \le m} CF(Y = c_i, KNN(T)). \qquad (9)$$

According to the description of CF, $CF(Y = c_i, KNN(T))$ will have a value in the range [-1, 1]. If $CF(Y = c_i, KNN(T)) > 0$, it increases the likelihood that the class of the query should be predicted to be $Y = c_i$. If $CF(Y = c_i, KNN(T)) < 0$, however, it decreases that likelihood. If $CF(Y = c_i, KNN(T)) = 0$, the prediction is the same as it would be from the training set D alone; in other words, the class of T is undetermined for binary classification applications.

5.4 Lift Classification Rule

The above CF-KNN classification can be modified by using the lift measure; this is known as Lift-KNN classification:

$$Lift(Y = c_i, KNN(T)) = \frac{p(C = c_i \mid KNN(T))}{p(C = c_i \mid D)}. \qquad (10)$$

Clearly, $Lift(Y = c_i, KNN(T)) > 0$. If the lift of a class is less than or equal to 1, the probability of the class is not increased within the K nearest neighbors. However, if $Lift(Y = c_i, KNN(T)) > 1$, the probability of the class is increased within the K nearest neighbors. In that case, the Lift-KNN strategy for KNN classification can be defined as follows:

$$Y = \arg\max_{1 \le i \le m} Lift(Y = c_i, KNN(T)). \qquad (11)$$

If $Lift(Y = c_i, KNN(T)) = 1$, the class of T is undetermined for binary classification applications.
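The sketch below computes the CF score of Eqs. (7)-(9) and the lift score of Eqs. (10)-(11) from two inputs: the class ratios in the full training set D and the class ratios inside KNN(T). The toy ratios and neighbor labels are invented to mimic an imbalanced two-class problem.

```python
# Toy imbalanced setting: class ratios in D and labels of the K = 5 neighbors.
p_D = {"minor": 0.1, "major": 0.9}          # p(Y = c_i | D)
knn_labels = ["minor", "minor", "major", "major", "major"]

def p_knn(c):
    """p(Y = c | KNN(T)): ratio of class c amongst the K nearest neighbors."""
    return knn_labels.count(c) / len(knn_labels)

def cf(c):
    """Certainty factor, Eqs. (7) and (8)."""
    p_k, p_d = p_knn(c), p_D[c]
    if p_k >= p_d:
        return (p_k - p_d) / (1.0 - p_d)
    return (p_k - p_d) / p_d

def lift(c):
    """Lift, Eq. (10)."""
    return p_knn(c) / p_D[c]

classes = list(p_D)
print("CF prediction  :", max(classes, key=cf))     # Eq. (9)
print("Lift prediction:", max(classes, key=lift))   # Eq. (11)
print("majority vote  :", max(classes, key=p_knn))  # Eq. (2), for contrast
```

In this toy case the majority rule predicts the major class, while both CF and Lift predict the minor class, which is exactly the boost in competitiveness for minor classes that these rules are designed to provide.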

5.5 Shell KNN Classification Rule

Recalling Fig. 2 in Section 3, shell nearest neighbors are well distributed around the test data A. In real applications, the shell nearest neighbors may not be ideal, because training examples are randomly collected. This case is illustrated in Fig. 7.

Fig. 7. The shell nearest neighbors are not ideally distributed around test data A.

With the quadratic-selection rule, sample B is selected many more times than samples C, D, E and F, although all five nearest neighbors are very close to test data A. This is a very interesting case when using KNN classification in data mining applications. In this paper, we formally discuss this case as follows.

Using the quadratic-selection rule, we can obtain 2n samples, left(X1), right(X1), ..., left(Xn), right(Xn), from the set $KNN(T)$, where left(Xi) and right(Xi) are the left and right nearest neighbors of the test data T with respect to feature Xi, respectively. The majority rule of KNN classification can be extended as follows:

$$Y = \arg\max_{c \in Dom(Y)} \sum_{X_i \in KNN(T)} \big( I(left(X_i) = c) + I(right(X_i) = c) \big). \qquad (12)$$

From Eq. (12), if a nearest neighbor is selected many times, it will be the winner. This is another way of addressing imbalanced classes with Point 1 (mentioned in Section 3). For Fig. 7, let the label of B be c1, the label of C be c2, the label of D, E and F be c3, and n = 10. From Eq. (2), the label of T is assigned to c3. From Eq. (12), the label of T is voted to c1. If, instead, we let the label of B be c1 and the label of C, D, E and F be c2, then from Eq. (2) the label of T is predicted to be c2, while under Eq. (12) the label of T is not determined, because c1 and c2 receive the same number of votes. To attack this issue when mining imbalanced data, we can modify Eq. (12) as follows:

$$Y = \arg\max_{c \in Dom(Y)} \big\{ count(X_i, Y_i)^{\lambda}\, I(Y_i = c) \big\}, \qquad (13)$$

where $count(X_i, Y_i)$ is the number of times the nearest neighbor $(X_i, Y_i)$ is selected, and $\lambda$ is greater than 1.
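To tie together the quadratic-selection step of Section 3 and the voting rules of Eqs. (12) and (13), the sketch below selects, for each feature axis, the nearest neighbor to the left and to the right of the test value among the K nearest neighbors, and then votes with the selection counts. Tie handling, the treatment of axes with no neighbor on one side, and the per-class aggregation of $count^{\lambda}$ are simplifications that one reading of Eqs. (12) and (13) suggests; λ = 2 is an arbitrary illustrative choice.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(5)
X_train = rng.normal(size=(80, 3))
y_train = rng.integers(0, 2, size=80)
query = np.zeros(3)
k, lam = 9, 2.0

d = np.linalg.norm(X_train - query, axis=1)
knn = np.argsort(d)[:k]                      # KNN(T)

# Quadratic selection: per feature axis, the closest neighbor on each side.
selected = Counter()                          # how often each neighbor is picked
for j in range(X_train.shape[1]):
    for side in (X_train[knn, j] < query[j], X_train[knn, j] > query[j]):
        cand = knn[side]
        if cand.size:                         # some axes may have no left/right neighbor
            closest = cand[np.argmin(np.abs(X_train[cand, j] - query[j]))]
            selected[closest] += 1            # left(X_j) or right(X_j)

# Eq. (12): one vote per selection; Eq. (13): count^lambda votes per neighbor.
votes_12 = Counter()
votes_13 = Counter()
for idx, cnt in selected.items():
    votes_12[y_train[idx]] += cnt
    votes_13[y_train[idx]] += cnt ** lam

print("Eq. (12) label:", votes_12.most_common(1)[0][0])
print("Eq. (13) label:", votes_13.most_common(1)[0][0])
```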
6 EXPERIMENTS

A large number of new ideas have been presented in this paper. For the sake of simplicity, just a few representatives of the new classification rules in Section 5 will be evaluated in this section. The classification accuracy of the new classification rules was compared with the majority classification rule (referred to here as the standard KNN classification rule) across 15 datasets, as shown in Table 1 below.

TABLE 1
Information About the Downloaded Datasets

Datasets     Number of samples   Number of samples after deletion   Dimensions   Classes
OCCUDS       1,994               1,194                              101          10
Chess        3,196               1,860                              36           2
CNAE         1,080               696                                856          9
German       1,000               762                                20           2
Ionosphere   351                 250                                34           2
Isolet       1,560               936                                617          2
Letter       20,000              11,984                             16           26
Segment      2,130               1,518                              19           7
USPS         9,298               5,758                              256          10
Vehicle      846                 512                                18           4
Waveform     2,746               1,267                              21           3
Yeast        1,484               945                                1,470        10
Arcene       200                 110                                10,000       2
Carcinom     174                 99                                 9,182        11
CLLSUB       111                 61                                 11,340       3

Fig. 8. Classification accuracy for the original datasets using different K values.

6.1 Experimental Setting

The 15 datasets above were downloaded from the UCI machine learning library. They include 5 binary datasets and 10 multi-class datasets. The 15 datasets needed to be slightly modified to meet the requirements of the evaluation. So, 80 percent of the samples belonging to a certain class in the binary datasets were deleted, with all of the remaining samples forming the new datasets. In the multi-class datasets, if the number of classes was odd, we removed 80 percent of the samples belonging to the odd-numbered classes (i.e., 1, 3, 5, ...). If the number of classes was even, we removed 80 percent of the samples belonging to the even-numbered classes (i.e., 2, 4, 6, ...).

Fig. 9. Classification accuracy for the unbalanced datasets with different K values.

A set of experiments was conducted on the above datasets using the classification rules mentioned in Section 5. The primary goal was to compare their performance with that of the standard classification rule, but a further objective was to test their effect on the classification imbalances in the data.
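The sketch below illustrates the dataset modification described above for the binary case: 80 percent of the samples of one chosen class are removed and the rest of the data is kept. The choice of which class to shrink, and the odd/even handling for the multi-class datasets, follow the description above only loosely, and the toy data stand in for the UCI datasets.

```python
import numpy as np

def make_imbalanced(X, y, shrink_class, keep_fraction=0.2, seed=0):
    """Remove 80 percent of the samples of `shrink_class`, keep everything else."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(y))
    target = idx[y == shrink_class]
    kept_target = rng.choice(target, size=int(len(target) * keep_fraction),
                             replace=False)
    kept = np.sort(np.concatenate([idx[y != shrink_class], kept_target]))
    return X[kept], y[kept]

# Toy binary data standing in for one of the UCI datasets.
rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 5))
y = rng.integers(0, 2, size=1000)
X_imb, y_imb = make_imbalanced(X, y, shrink_class=1)
print(np.bincount(y), "->", np.bincount(y_imb))
```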

TABLE 2
Binary Results on Imbalanced Data (ACC, SEN, SPE per dataset)

Datasets    Chess               German              Ionosphere          Isolet              Arcene
            ACC   SEN   SPE     ACC   SEN   SPE     ACC   SEN   SPE     ACC   SEN   SPE     ACC   SEN   SPE
Majority    0.690 0.261 0.772   0.662 0.684 0.297   0.836 0.904 0.040   0.681 0.731 0.305   0.750 0.905 0.132
Weighting   0.729 0.174 0.853   0.897 0.972 0.027   0.875 0.971 0.012   0.745 0.858 0.179   0.741 0.881 0.181
CF          0.720 0.174 0.840   0.916 0.995 0.003   0.891 0.989 0.040   0.771 0.899 0.130   0.671 0.767 0.290
Lift        0.727 0.144 0.842   0.918 0.993 0.003   0.900 0.987 0.012   0.778 0.905 0.141   0.671 0.767 0.290
ShellKNN    0.821 0     1       0.921 1     0       0.900 1     0       0.833 1     0       0.800 1     0

TABLE 3
Average Classification Accuracy for Multiple Classifications in the Unbalanced Datasets

Datasets CCUDS CNAE Letter Segment USPS Vehicle Waveform Yeast Carcinom CLLSUB
Majority 13.02 13.65 4.71 19.43 15.32 31.05 49.86 48.99 20.61 74.92
Weighting 14.41 14.51 5.38 20.03 16.71 36.33 58.39 42.12 26.57 73.61
CF 15.13 18.68 5.70 20.01 17.16 38.65 54.46 48.99 19.90 56.07
Lift 15.69 18.53 5.82 21.61 16.92 39.45 60.93 49.00 17.58 54.92
ShellKNN 16.75 17.26 6.54 21.74 26.80 42.77 70.86 25.82 10.07 80.33

TABLE 4
Standard Deviation of the Classification Accuracy

Datasets CCUDS CNAE Letter Segment USPS Vehicle Waveform Yeast Carcinom CLLSUB
Majority 0.01 0.12 0.01 0.01 0.15 0.01 0.01 0 0.03 0.02
Weighting 0.02 0.01 0.01 0.01 0.01 0.01 0.01 0.01 0.03 0.03
CF 0.01 0.16 0.01 0.01 0.01 0.02 0.01 0 0.03 0.07
Lift 0.01 0.16 0.01 0.01 0.01 0.01 0.01 0 0.02 0.07
ShellKNN 0 0.04 0.01 0 0 0 0 0 0.01 0.01

Each dataset was first divided into a test set and a training set using 10-fold cross-validation. Then, all the classification rules were examined using the original datasets (no sample deletion, i.e., the unmodified data). This was to ensure that the experiment used different K values. After this, all the classification rules were tested using different K values on the unbalanced datasets. Finally, K = 5 was selected to perform 10 experiments on the unbalanced datasets, to examine the average and variance of the classification accuracy.

It should be noted that, for the binary datasets, we obtained not only the classification accuracy (ACC) but also the sensitivity (SEN) and specificity (SPE) on the unbalanced data.

6.2 KNN Classification of the Original Datasets

The first set of experiments was conducted using the downloaded datasets without any changes. The results are shown in Fig. 8, which presents the classification accuracy of the classification rules for the 15 original datasets with different K values. It can be seen that the accuracy of these classification rules does not differ greatly for most of the datasets, especially the OCCUDS, CNAE, Isolet, Letter, Segment, Vehicle and Waveform datasets. The ShellKNN classification rule performs the worst on the Yeast dataset, but the best on the German, Ionosphere and USPS datasets. Overall, the ShellKNN classification rule performs pretty well. For the other classification rules, there was little difference in their performance on the original datasets, though the weighted classification rule performed particularly poorly on some datasets, such as Chess and Yeast.

6.3 Unbalanced Datasets With Binary Classes

The second set of experiments was conducted with the modified datasets with unbalanced binary classes. The results are presented in Fig. 9.

Fig. 9 shows the classification accuracy of all the classification rules on the class-unbalanced datasets, with Chess, German, Ionosphere and Isolet being the binary datasets. For the German, Ionosphere and Isolet datasets, the performance of the Majority and Weighted classification rules was not very good, because these two classification rules do not consider the importance of small sample classes when dealing with class imbalances.

The CF, Lift and ShellKNN classification rules slightly increased the competitiveness of the small classes in the unbalanced classifications. Thus, in most cases, their performance was much better than that of the Majority and Weighted classification rules.

Table 2 shows the ACC, SEN, and SPE results for the binary datasets. Here, it can be seen that the ShellKNN classification rule performed the best on the binary-class datasets, with the Majority classification rule being the worst.

6.4 Unbalanced Multi-Class Datasets [7] X. Zhu, S. Zhang, W. He, R. Hu, C. Lei, and P. Zhu, “One-step
multi-view spectral clustering,” IEEE Trans. Knowl. Data Eng.,
The third set of experiments were conducted using the vol. 31, no. 10, pp. 2022–2034, Oct. 2019.
modified datasets with multi-class imbalances. The results [8] X. Zhu, S. Zhang, Y. Li, J. Zhang, L. Yang, and Y. Fang, “Low-rank
for the average classification accuracy and variance across sparse subspace for spectral clustering,” IEEE Trans. Knowl. Data
the 10 experiments are presented in Tables 3 and 4. Eng., vol. 31, no. 8, pp. 1532–1543, Aug. 2019.
[9] S. Zhang, D. Cheng, Z. Deng, M. Zong, and X. Deng, “A novel kNN
With regard to the variance, all of the classification rules algorithm with data-driven k parameter computation,” Pattern Rec-
were stable, the variance was small and the results were ognit. Lett., vol. 109, pp. 44–54, 2018.
robust. For the average classification accuracy, the ShellKNN. [10] M. Tan, S. Zhang, and L. Wu, “Mutual kNN based spectral
clustering,” Neural Comput. Appl., vol. 32, no. 11, pp. 6435–6442,
Lift, and CF classification rules improved the average classifi-
2020.
cation accuracy by 2.74, 0.89, and 0.32 percent, respectively, in [11] S. Zhang, J. Zhang, X. Zhu, Y. Qin, and C. Zhang, “Missing value
relation to the Majority classification rule. In particular, t on imputation based on data clustering,” in Transactions on Computa-
For the USPS and Waveform datasets, the ShellKNN classifi- tional Science I. Berlin, Germany: Springer, 2008, pp. 128–138.
[12] S. Zhang, X. Li, M. Zong, X. Zhu, and D. Cheng, “Learning k for
cation rule improved the classification accuracy by 11.48 and kNN classification,” ACM Trans. Intell. Syst. Technol., vol. 8, no. 3,
21 percent, respectively. 2017, Art. no. 43.
Across the various unbalanced datasets, the ShellKnn [13] Y. Song, J. Huang, D. Zhou, H. Zha, and C. L. Giles, “IKNN: Infor-
classification rule generally performed well in comparison mative K-nearest neighbor pattern classification,” in Proc. Eur.
Conf. Princ. Data Mining Knowl. Discov., 2007, pp. 248–264.
to the other classification rules. [14] X. Zhu, S. Zhang, R. Hu, Y. Zhu, and J. Song, “Local and global
structure preservation for robust unsupervised spectral feature
selection,” IEEE Trans. Knowl. Data Eng., vol. 30, no. 3, pp. 517–529,
7 CONCLUSION

This paper has systematically reviewed the latest research on KNN classification, addressing its four most challenging issues. It has focused on introducing the main results recently established within our research group. These can be distinguished from other extant approaches by their interest in delivering new research directions for KNN classification.

Although the approaches presented here are both efficient and promising, there are still some open issues that require further research:
1) How to set different K values for different kinds of test data so that the results delivered by the KNN classification algorithm will offer the best possible performance whilst remaining robust (a naive brute-force baseline is sketched after this list).
2) How to establish the best tree structures for building KTree and K*Tree when a decision tree is not the best data structure for storing or rapidly searching for the K values and the nearest neighbors of a leaf node.
3) How to make KNN classification efficient when mining big data.
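To make the first of these issues concrete, the sketch below shows one naive, brute-force way of giving each test sample its own K: a small grid of candidate K values is scored by leave-one-out accuracy over the query's local region, and the query is classified with the locally best K. The function names, the candidate grid, and the region size m are illustrative assumptions, and this baseline is not the KTree/K*Tree construction studied in this paper; it only spells out what per-sample K selection means.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k):
    """Plain KNN with the majority rule for a single query point x."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]                        # indices of the k nearest training points
    labels, counts = np.unique(y_train[nn], return_counts=True)
    return labels[np.argmax(counts)]

def adaptive_k_predict(X_train, y_train, x, k_grid=(1, 3, 5, 7, 9), m=30):
    """Naive per-sample K selection: score each candidate K by leave-one-out
    accuracy over the m training points closest to the query, then classify
    the query with the locally best K (brute force, for illustration only)."""
    d = np.linalg.norm(X_train - x, axis=1)
    local = np.argsort(d)[:m]                     # the query's local region
    best_k, best_acc = k_grid[0], -1.0
    for k in k_grid:
        hits = 0
        for i in local:                           # leave-one-out inside the region
            mask = np.arange(len(X_train)) != i
            pred = knn_predict(X_train[mask], y_train[mask], X_train[i], k)
            hits += int(pred == y_train[i])
        acc = hits / len(local)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return knn_predict(X_train, y_train, x, best_k), best_k
```

A call such as adaptive_k_predict(X_train, y_train, x_query) returns both the predicted class and the K selected for that particular query; the open question is how to obtain this per-sample flexibility without paying the cost of repeated local validation at prediction time.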
ACKNOWLEDGMENTS

This work was supported in part by the Natural Science Foundation of China under Grants 61836016 and 61672177.
Shichao Zhang (Senior Member, IEEE) received the PhD degree from Deakin University, Australia. He is currently a China National Distinguished Professor with Central South University, China. His research interests include data mining and big data. He has published 90 international journal papers and more than 70 international conference papers. He is a CI for 18 competitive national grants. He is a member of the ACM and serves or has served as an associate editor for four journals.