A Neural Network Approach To Ordinal Regression
Abstract— Ordinal regression is an important type of learning, which has properties of both classification and regression. Here we describe an effective approach to adapt a traditional neural network to learn ordinal categories. Our approach is a generalization of the perceptron method for ordinal regression. On several benchmark datasets, our method (NNRank) outperforms a neural network classification method. Compared with the ordinal regression methods using Gaussian processes and support vector machines, NNRank achieves comparable performance. Moreover, NNRank has the advantages of traditional neural networks: learning in both online and batch modes, handling very large training datasets, and making rapid predictions. These features make NNRank a useful and complementary tool for large-scale data mining tasks such as information retrieval, web page ranking, collaborative filtering, and protein ranking in Bioinformatics. The neural network software is available at: http://www.cs.missouri.edu/~chengji/cheng_software.html.

Jianlin Cheng and Zheng Wang are with the Computer Science Department and Informatics Institute, University of Missouri, Columbia, MO 65211, USA (email: chengji@missouri.edu, zwyw6@missouri.edu); Gianluca Pollastri is with the School of Computer Science and Informatics, University College Dublin, Belfield, Dublin 4, Ireland (email: gianluca.pollastri@ucd.ie).

I. INTRODUCTION

Ordinal regression (or ranking learning) is an important supervised learning problem of learning a ranking or ordering on instances, which has properties of both classification and metric regression. The learning task of ordinal regression is to assign data points to a set of finite, ordered categories. For example, a teacher rates students' performance using A, B, C, D, and E (A > B > C > D > E) [9]. Ordinal regression differs from classification due to the order of the categories. In contrast to metric regression, the response variables (categories) in ordinal regression are discrete and finite.

The research on ordinal regression dates back to ordinal statistics methods in the 1980s [28], [29] and machine learning research in the 1990s [7], [20], [13]. It has attracted considerable attention in recent years due to its potential applications in many data-intensive domains such as information retrieval [20], web page ranking [24], collaborative filtering [18], [3], [41], image retrieval [40], and protein ranking [8] in Bioinformatics.

A number of machine learning methods have been developed or redesigned to address the ordinal regression problem [33], including the perceptron [14] and its kernelized generalization [3], neural networks with gradient descent [7], [5], Gaussian processes [10], [9], [37], large-margin classifiers (or support vector machines) [21], [22], [24], [38], [11], [2], [12], k-partite classifiers [1], boosting algorithms [17], [15], constraint classification [19], regression trees [25], Naive Bayes [42], Bayesian hierarchical experts [32], the binary classification approach [16], [26] that decomposes the original ordinal regression problem into a set of binary classifications, and the optimization of nonsmooth cost functions [6].

Most of these methods can be roughly classified into two categories: the pairwise constraint approach [22], [24], [15], [5] and the multi-threshold approach [14], [38], [9]. The former converts the full ranking relation into pairwise order constraints. The latter tries to learn multiple thresholds to divide data into ordinal categories. Multi-threshold approaches can also be unified under the general, extended binary classification framework [26].

The ordinal regression methods have different advantages and disadvantages. Prank [14], a perceptron approach that generalizes the binary perceptron algorithm to the ordinal multi-class situation, is a fast online algorithm. However, like a standard perceptron method, its accuracy suffers when dealing with non-linear data, although a quadratic kernel version of Prank greatly relieves this problem. One class of accurate large-margin classifier approaches [22], [24] converts the ordinal relations into O(n^2) (n: the number of data points) pairwise ranking constraints for structural risk minimization [39], [36]. Thus, it cannot be applied to medium-size datasets (> 10,000 data points) without discarding some pairwise preference relations. It may also overfit noise due to incomparable pairs.

The other class of powerful large-margin classifier methods [38], [11] generalizes the support vector formulation for ordinal regression by finding K − 1 thresholds on the real line that divide the data into K ordered categories. The size of this optimization problem is linear in the number of training examples. However, as with support vector machines used for classification, the prediction speed is slow when the solution is not sparse, which makes it inappropriate for time-critical tasks. Similarly, another state-of-the-art approach, the Gaussian process method [9], also has difficulty handling large training datasets and suffers from slow prediction in some situations.

Here we describe a new neural network approach for ordinal regression that has the advantages of neural network learning: learning in both online and batch modes, training on very large datasets [5], handling non-linear data, good performance, and rapid prediction. Our method can be considered a generalization of perceptron learning [14] to multi-layer perceptrons (neural networks) for ordinal regression. Our method is also related to the classic generalized linear models (e.g., the cumulative logit model) for ordinal regression [28]. Unlike the neural network method [5] trained on pairs of examples to learn pairwise order relations, our method works on individual data points and uses multiple output nodes to
estimate the probabilities of ordinal categories. Thus, our method falls into the category of multi-threshold approaches. The learning of our method proceeds similarly to traditional neural networks using back-propagation [35].

On the same benchmark datasets, our method yields performance better than standard classification neural networks and comparable to the state-of-the-art methods using support vector machines and Gaussian processes. In addition, our method can learn on very large datasets and make rapid predictions.

II. METHOD

A. Formulation

Let D represent an ordinal regression dataset consisting of n data points (x, y), where x ∈ R^d is an input feature vector and y is its ordinal category from a finite set Y. Without loss of generality, we assume that Y = {1, 2, ..., K} with "<" as the order relation.

For a standard classification neural network that does not consider the order of categories, the goal is to predict the probability of a data point x belonging to one category k (y = k). The input is x, and the target encoding of category k is a vector t = (0, ..., 0, 1, 0, ..., 0), where only the element t_k is set to 1 and all others to 0. The goal is to learn a function that maps the input vector x to a probability distribution vector o = (o_1, o_2, ..., o_k, ..., o_K), where o_k is close to 1 and the other K − 1 elements are close to zero, subject to the constraint $\sum_{i=1}^{K} o_i = 1$.

In contrast, like the perceptron approach [14], our neural network approach considers the order of the categories. If a data point x belongs to category k, it is classified automatically into the lower-order categories (1, 2, ..., k − 1) as well. So the target vector of x is t = (1, 1, ..., 1, 0, ..., 0), where t_i (1 ≤ i ≤ k) is set to 1 and the other elements to 0, as shown in Figure 1. Thus, the goal is to learn a function that maps the input vector x to a probability vector o = (o_1, o_2, ..., o_k, ..., o_K), where o_i (i ≤ k) is close to 1 and o_i (i > k) is close to 0. Here $\sum_{i=1}^{K} o_i$ is an estimate of the number of categories (i.e., k) that x belongs to, instead of 1. The formulation of the target vector is similar to the perceptron approach [14]. It is also related to the classical cumulative probit model for ordinal regression [28], in the sense that we can consider the output probability vector (o_1, ..., o_k, ..., o_K) as a cumulative probability distribution over the categories (1, ..., k, ..., K), i.e., $\frac{1}{K}\sum_{i=1}^{K} o_i$ is the proportion of categories that x belongs to, starting from category 1.

The target encoding scheme of our method is related to, but different from, multi-label learning [4] and multiple-label learning [23] because our method imposes an order on the labels (or categories).

The neural network has d input nodes, matching the number of dimensions of the input feature vector x, and K output nodes corresponding to the K ordinal categories. There can be one or more hidden layers. Without loss of generality, we use one hidden layer to construct a standard two-layer feedforward neural network. As in a standard neural network for classification, input nodes are fully connected with hidden nodes, which in turn are fully connected with output nodes. Likewise, the transfer function of the hidden nodes can be a linear, sigmoid, or tanh function; the tanh function is used in our experiments. The only difference from a traditional neural network lies in the output layer. Traditional neural networks use the softmax (normalized exponential) function $\frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$ for the output nodes, satisfying the constraint that the sum of the outputs $\sum_{i=1}^{K} o_i$ is 1, where $z_i$ is the net input to output node $O_i$.

In contrast, each output node $O_i$ of our neural network uses the standard sigmoid function $\frac{1}{1+e^{-z_i}}$, without including the outputs of the other nodes, as shown in Figure 1. Output node $O_i$ estimates the probability $o_i$ that a data point belongs to category i independently, without the normalization used by traditional neural networks. Thus, for a data point x of category k, the target vector is (1, 1, ..., 1, 0, ..., 0), in which the first k elements are 1 and the others 0. This sets the target value of output nodes $O_i$ (i ≤ k) to 1 and $O_i$ (i > k) to 0. The targets instruct the neural network to adjust its weights to produce probability outputs as close as possible to the target vector. It is worth pointing out that using independent sigmoid functions for the output nodes does not guarantee the monotonic relation ($o_1 \geq o_2 \geq \dots \geq o_K$), which is not necessary but is desirable for making predictions [26]. A more sophisticated approach would impose these inequality constraints on the outputs to improve performance.

Training of the neural network for ordinal regression proceeds very similarly to standard neural networks. The cost function for a data point x can be the relative entropy or the square error between the target vector and the output vector. For relative entropy, the cost function over the output nodes is $f_c = -\sum_{i=1}^{K}\left[t_i \log o_i + (1-t_i)\log(1-o_i)\right]$. For square error, the error function is $f_c = \sum_{i=1}^{K}(t_i - o_i)^2$. Previous studies [34] of neural network cost functions show that relative entropy and square error usually yield very similar results. In our experiments, we use the square error function and standard back-propagation to train the neural network. The errors are propagated back to the output nodes, from the output nodes to the hidden nodes, and finally to the input nodes.

Since the transfer function $f_t$ of output node $O_i$ is the independent sigmoid function $\frac{1}{1+e^{-z_i}}$, its derivative is $\frac{\partial f_t}{\partial z_i} = \frac{e^{-z_i}}{(1+e^{-z_i})^2} = \frac{1}{1+e^{-z_i}}\left(1 - \frac{1}{1+e^{-z_i}}\right)$.
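To make the formulation concrete, here is a minimal NumPy sketch (not the released NNRank code) of the ordinal target encoding, a two-layer forward pass with a tanh hidden layer and independent sigmoid output nodes, the square-error cost, and the output-layer gradient that follows from the sigmoid derivative above. The `decode` rule, which counts the leading outputs above 0.5, is an assumption added for illustration; the exact prediction rule is not shown in this excerpt.

```python
import numpy as np

def encode_ordinal_target(k, K):
    """Target vector for category k (1..K): the first k elements are 1, the rest 0."""
    t = np.zeros(K)
    t[:k] = 1.0
    return t

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Two-layer feedforward pass: tanh hidden layer, independent sigmoid outputs."""
    h = np.tanh(W1 @ x + b1)   # hidden activations
    z = W2 @ h + b2            # net inputs z_i of the K output nodes
    o = sigmoid(z)             # each output squashed independently (no softmax)
    return h, z, o

def square_error_and_output_delta(o, t):
    """f_c = sum_i (t_i - o_i)^2 and its gradient w.r.t. the net inputs z_i,
    using sigmoid'(z_i) = o_i * (1 - o_i) as in the text."""
    cost = np.sum((t - o) ** 2)
    delta = -2.0 * (t - o) * o * (1.0 - o)
    return cost, delta

def decode(o, threshold=0.5):
    """Assumed decoding rule: count the leading outputs above the threshold."""
    k = 0
    for oi in o:
        if oi <= threshold:
            break
        k += 1
    return max(k, 1)

# Toy usage: d = 4 features, H = 3 hidden units, K = 5 ordered categories.
rng = np.random.default_rng(0)
d, H, K = 4, 3, 5
W1, b1 = rng.normal(scale=0.1, size=(H, d)), np.zeros(H)
W2, b2 = rng.normal(scale=0.1, size=(K, H)), np.zeros(K)
x, k_true = rng.normal(size=d), 3
t = encode_ordinal_target(k_true, K)   # -> [1. 1. 1. 0. 0.]
_, _, o = forward(x, W1, b1, W2, b2)
cost, delta = square_error_and_output_delta(o, t)
print(t, np.round(o, 3), round(cost, 3), decode(o))
```

With untrained random weights the outputs hover around 0.5, so the decoded category is not yet meaningful; training with back-propagation drives the first k outputs toward 1 and the rest toward 0.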
The main parameters to tune are the number of hidden units, the number of epochs, and the learning rate. We create a grid for these three parameters, where the hidden unit number is in the range [1..15], the epoch number is in the set (50, 200, 500, 1000, 1500, 2000), and the initial learning rate is in the range [0.01..0.5]. During training, the learning rate is halved if the training errors continuously go up for a pre-defined number (40, 60, 80, or 100) of epochs. For the experiments on each data split, the neural network parameters are fully optimized on the training data without using any test data.

For each experiment, after the parameters are optimized on the training data, we train five models on the training data with the optimal parameters, starting from different initial weights. The ensemble of five trained models is then used to estimate the generalization performance on the test data. That is, the average output of the five neural network models is used to make predictions (see the sketch below).
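Assuming a held-out portion of the training data is used to score parameter combinations (the text only states that no test data is used), the tuning-and-ensembling protocol could be sketched as follows. The trainer below is a deliberately minimal full-batch implementation written for illustration, not the released NNRank program, and the decoding rule inside `mean_abs_error` is the same threshold-counting assumption used earlier.

```python
import itertools
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_network(X, T, hidden, epochs, lr, seed):
    """Minimal full-batch trainer for the ordinal network: tanh hidden layer,
    independent sigmoid outputs, square-error cost (illustration only)."""
    rng = np.random.default_rng(seed)
    d, K = X.shape[1], T.shape[1]
    W1, b1 = rng.normal(scale=0.1, size=(d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(scale=0.1, size=(hidden, K)), np.zeros(K)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        O = sigmoid(H @ W2 + b2)
        dZ2 = -2.0 * (T - O) * O * (1.0 - O)      # output-layer delta
        dZ1 = (dZ2 @ W2.T) * (1.0 - H ** 2)       # back-propagated hidden-layer delta
        W2 -= lr * (H.T @ dZ2) / len(X); b2 -= lr * dZ2.mean(axis=0)
        W1 -= lr * (X.T @ dZ1) / len(X); b1 -= lr * dZ1.mean(axis=0)
    return W1, b1, W2, b2

def predict_proba(model, X):
    W1, b1, W2, b2 = model
    return sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)

def mean_abs_error(O, y):
    """Decode each output vector by counting leading outputs above 0.5 (assumed rule)."""
    k_hat = np.maximum((O > 0.5).cumprod(axis=1).sum(axis=1), 1)
    return np.abs(k_hat - y).mean()

# Toy data with K = 5 categories; the targets follow the (1,...,1,0,...,0) encoding.
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 4))
y = rng.integers(1, 6, size=120)
T = (np.arange(1, 6) <= y[:, None]).astype(float)
X_tr, y_tr, T_tr = X[:90], y[:90], T[:90]   # training portion
X_va, y_va = X[90:], y[90:]                 # held-out validation portion (assumption)

# Small grid over hidden units, epochs, and initial learning rate (subset of the paper's grid).
best, best_err = None, np.inf
for h, e, lr in itertools.product((2, 5, 10), (50, 200), (0.05, 0.1)):
    err = mean_abs_error(predict_proba(train_network(X_tr, T_tr, h, e, lr, 0), X_va), y_va)
    if err < best_err:
        best, best_err = (h, e, lr), err

# Ensemble of five networks trained from different initial weights; outputs are averaged.
models = [train_network(X_tr, T_tr, *best, seed=s) for s in range(5)]
O_avg = np.mean([predict_proba(m, X_va) for m in models], axis=0)
print(best, round(mean_abs_error(O_avg, y_va), 3))
```

Because the toy labels are random, the numbers printed here only exercise the mechanics; on real data, the selected parameters and the five-model average are what produce the reported errors.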
Fig. 1. Comparison between a standard classification neural network and an ordinal regression neural network. Without loss of generality, the neural networks are assumed to have one hidden layer and one output layer with four output nodes. For a data point in category three, the target vector of the standard neural network is (0, 0, 1, 0), while the target vector of the ordinal regression neural network is (1, 1, 1, 0). The transfer function of output node i of the standard neural network is the normalized exponential function $\frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$. In contrast, the ordinal regression neural network uses the sigmoid function $\frac{1}{1+e^{-z_i}}$.

We evaluate our method using the zero-one error and the mean absolute error, as in [9]. The zero-one error is the percentage of wrong assignments of ordinal categories. The mean absolute error is the average absolute difference between the assigned categories ($\hat{k}$) and the true categories ($k$) over all data points. For each dataset, the training and evaluation process is repeated 20 times on 20 data splits. Thus, we compute the average error and the standard deviation of the two metrics as in [9].
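For reference, the two metrics reduce to a few lines given predicted and true category indices; this sketch uses the standard definitions (the zero-one error as the fraction of misassigned categories, and the mean absolute error as the average of $|\hat{k} - k|$).

```python
import numpy as np

def zero_one_error(k_pred, k_true):
    """Fraction of data points assigned to the wrong ordinal category."""
    k_pred, k_true = np.asarray(k_pred), np.asarray(k_true)
    return np.mean(k_pred != k_true)

def mean_absolute_error(k_pred, k_true):
    """Average absolute difference between assigned and true category indices."""
    k_pred, k_true = np.asarray(k_pred), np.asarray(k_true)
    return np.mean(np.abs(k_pred - k_true))

# Example: ten data points with categories in 1..5.
k_true = [1, 2, 3, 3, 4, 5, 2, 1, 4, 5]
k_pred = [1, 2, 2, 3, 4, 4, 2, 1, 5, 5]
print(zero_one_error(k_pred, k_true))        # 0.3
print(mean_absolute_error(k_pred, k_true))   # 0.3
```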
of the protein pairs. The category of each data point was assigned by biologists [31].

A data point representing a query-template protein pair is labeled as fold if the two proteins have similar tertiary structures but do not have an evolutionary relationship; super family if they have similar structures and a weak evolutionary relationship; and family if they have similar structures and a strong evolutionary relationship. Each data point has 62 features, corresponding to the specific criteria used to measure the similarities between a query and a template protein.

The data points are split into a training dataset consisting of 6018 data points (2910 in fold, 1810 in super family, and 1298 in family) and a test dataset containing 747 data points (395 in fold, 166 in super family, and 186 in family). Both NNRank and NNClass are trained on the training dataset and evaluated on the test dataset.

The mean zero-one error and mean absolute error of NNRank are 23.96% and 0.258, respectively. The mean zero-one error and mean absolute error of NNClass are 25.03% and 0.277, respectively. The mean zero-one error of NNRank is 1.1% lower than that of NNClass, and the mean absolute error of NNRank is 0.019 lower than that of NNClass. The experiment shows that NNRank performs better than NNClass on a large, real ordinal regression dataset.

D. Comparison with Gaussian Processes and Support Vector Machines on Standard Benchmarks

To further evaluate the performance of our method, we compare NNRank with two Gaussian process methods (GP-MAP and GP-EP) [9] and a support vector machine method (SVM) [38] implemented in [9]. The results of the three methods are quoted from [9]. Table II reports the zero-one error on the eight datasets. NNRank achieves the best results on Diabetes, Triazines, and Abalone; GP-EP on Pyrimidines, Auto MPG, and Boston; GP-MAP on Machine; and SVM on Stocks.

Table III reports the mean absolute error on the eight datasets. NNRank yields the best results on Diabetes and Abalone; GP-EP on Pyrimidines, Auto MPG, and Boston; GP-MAP on Triazines and Machine; and SVM on Stocks.

In summary, on the eight datasets, the performance of NNRank is comparable to the three state-of-the-art methods for ordinal regression.

IV. DISCUSSION AND FUTURE WORK

We have described a novel approach to adapt traditional neural networks for ordinal regression. Our neural network approach can be considered a generalization of the one-layer perceptron approach [14] to multiple layers. On the standard benchmarks of ordinal regression, our method outperforms standard neural networks used for classification. Furthermore, on the same benchmarks, our method achieves performance similar to two state-of-the-art methods (support vector machines and Gaussian processes) for ordinal regression.

Compared with existing methods for ordinal regression, our method has several advantages of neural networks. First, like the perceptron approach [14], our method can learn in both batch and online modes. The online learning ability makes our method a good tool for adaptive learning in real time. The multi-layer structure of the neural network and the non-linear transfer function give our method stronger fitting ability than perceptron methods.

Second, the neural network can be trained on very large datasets iteratively, although its training is more complex than that of support vector machines and Gaussian processes. Since the training process of our method is the same as that of traditional neural networks, average neural network users can apply this method to their tasks.

Third, the neural network method can make rapid predictions once the models are trained. The ability to learn on very large datasets and to predict in time makes our method a useful and competitive tool for ordinal regression tasks, particularly for time-critical and large-scale ranking problems in information retrieval, web page ranking, collaborative filtering, and the emerging field of Bioinformatics. To facilitate the application of this new approach, we make both NNRank and NNClass accept a general input format and freely available to the community at http://www.cs.missouri.edu/~chengji/cheng_software.html.

There are several directions for further improving the neural network (or multi-layer perceptron) approach for ordinal regression. One direction is to design a transfer function that ensures the monotonic decrease of the outputs of the neural network; another is to derive general error bounds for the method under the binary classification framework [26]. Furthermore, other flavors of implementations of the multi-threshold multi-layer perceptron approach for ordinal regression are possible. Since machine learning ranking is a fundamental problem with wide applications in many diverse domains such as web page ranking, information retrieval, image retrieval, collaborative filtering, bioinformatics, and so on, we believe the further exploration of the neural network (or multi-layer perceptron) approach for ranking and ordinal regression is worthwhile.

REFERENCES

[1] S. Agarwal and D. Roth. Learnability of bipartite ranking functions. In Proc. of the 18th Annual Conference on Learning Theory (COLT-05). 2005.
[2] F. Aiolli and A. Sperduti. Learning preferences for multiclass problems. In Advances in Neural Information Processing Systems 17 (NIPS). 2004.
[3] J. Basilico and T. Hofmann. Unifying collaborative and content-based filtering. In Proceedings of the Twenty-First International Conference on Machine Learning (ICML), page 9. ACM Press, New York, USA, 2004.
[4] C. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, USA, 1996.
[5] C. J. C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender. Learning to rank using gradient descent. In Proc. of International Conference on Machine Learning (ICML-05), pages 89–97. 2005.
[6] C. J. C. Burges, R. Ragno, and Q. V. Le. Learning to rank with nonsmooth cost functions. In Advances in Neural Information Processing Systems (NIPS) 20. MIT Press, Cambridge, MA, 2006.
[7] R. Caruana, S. Baluja, and T. Mitchell. Using the future to sort out the present: Rankprop and multitask learning for medical risk evaluation. In Advances in Neural Information Processing Systems 8 (NIPS). 1996.
[8] J. Cheng and P. Baldi. A machine learning information retrieval approach to protein fold recognition. Bioinformatics, 22:1456–1463, 2006.
[9] W. Chu and Z. Ghahramani. Gaussian processes for ordinal regression. Journal of Machine Learning Research, 6:1019–1041, 2005.
[10] W. Chu and Z. Ghahramani. Preference learning with Gaussian processes. In Proc. of International Conference on Machine Learning (ICML-05), pages 137–144. 2005.
[11] W. Chu and S. S. Keerthi. New approaches to support vector ordinal regression. In Proc. of International Conference on Machine Learning (ICML-05), pages 145–152. 2005.
[12] W. Chu and S. S. Keerthi. Support vector ordinal regression. Neural Computation, 19(3), 2007.
[13] W. W. Cohen, R. E. Schapire, and Y. Singer. Learning to order things. Journal of Artificial Intelligence Research, 10:243–270, 1999.
[14] K. Crammer and Y. Singer. Pranking with ranking. In Advances in Neural Information Processing Systems (NIPS) 14, pages 641–647. MIT Press, Cambridge, MA, 2002.
[15] O. Dekel, J. Keshet, and Y. Singer. Log-linear models for label ranking. In Proc. of the 21st International Conference on Machine Learning (ICML-04), pages 209–216. 2004.
[16] E. Frank and M. Hall. A simple approach to ordinal classification. In Proc. of the European Conference on Machine Learning. 2001.
[17] Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969, 2003.
[18] D. Goldberg, D. Nichols, B. Oki, and D. Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35:61–70, 1992.
[19] S. Har-Peled, D. Roth, and D. Zimak. Constraint classification: a new approach to multiclass classification and ranking. In Advances in Neural Information Processing Systems 15 (NIPS). 2002.
[20] R. Herbrich, T. Graepel, P. Bollmann-Sdorra, and K. Obermayer. Learning preference relations for information retrieval. In Proc. of ICML Workshop on Text Categorization and Machine Learning, pages 80–84. 1998.
[21] R. Herbrich, T. Graepel, and K. Obermayer. Support vector learning for ordinal regression. In Proc. of 9th International Conference on Artificial Neural Networks (ICANN), pages 97–102. 1999.
[22] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In A. J. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 115–132. MIT Press, Cambridge, MA, 2000.
[23] R. Jin and Z. Ghahramani. Learning with multiple labels. In Advances in Neural Information Processing Systems (NIPS) 15. MIT Press, Cambridge, MA, 2003.
[24] T. Joachims. Optimizing search engines using clickthrough data. In David Hand, Daniel Keim, and Raymond Ng, editors, Proc. of 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 133–142. 2002.
[25] S. Kramer, G. Widmer, B. Pfahringer, and M. DeGroeve. Prediction of ordinal classes using regression trees. Fundamenta Informaticae, 47:1–13, 2001.
[26] L. Li and H. Lin. Ordinal regression by extended binary classification. In Advances in Neural Information Processing Systems (NIPS) 20. MIT Press, Cambridge, MA, 2006.
[27] D. J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448–472, 1992.
[28] P. McCullagh. Regression models for ordinal data. Journal of the Royal Statistical Society B, 42:109–142, 1980.
[29] P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London, 1983.
[30] T. P. Minka. A family of algorithms for approximate Bayesian inference. PhD thesis, Massachusetts Institute of Technology, 2001.
[31] A. G. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247:536–540, 1995.
[32] U. Paquet, S. Holden, and A. Naish-Guzman. Bayesian hierarchical ordinal regression. In Proc. of the International Conference on Artificial Neural Networks. 2005.
[33] S. Rajaram, A. Garg, X. S. Zhou, and T. S. Huang. Classification approach towards ranking and sorting problems. In Machine Learning: ECML 2003, vol. 2837 of Lecture Notes in Artificial Intelligence (N. Lavrac, D. Gamberger, H. Blockeel, and L. Todorovski, eds.), pages 301–312. Springer-Verlag, 2003.
[34] M. D. Richard and R. P. Lippmann. Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3:461–483, 1991.
[35] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. I: Foundations, pages 318–362. Bradford Books/MIT Press, Cambridge, MA, 1986.
[36] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA, 2002.
[37] A. Schwaighofer, V. Tresp, and K. Yu. Hierarchical Bayesian modelling with Gaussian processes. In Advances in Neural Information Processing Systems 17 (NIPS). MIT Press, 2005.
[38] A. Shashua and A. Levin. Ranking with large margin principle: two approaches. In Advances in Neural Information Processing Systems 15 (NIPS). 2003.
[39] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, Berlin, Germany, 1995.
[40] H. Wu, H. Lu, and S. Ma. A practical SVM-based algorithm for ordinal regression in image retrieval. pages 612–621, 2003.
[41] S. Yu, K. Yu, V. Tresp, and H. P. Kriegel. Collaborative ordinal regression. In Proc. of 23rd International Conference on Machine Learning, pages 1089–1096. 2006.
[42] H. Zhang, L. Jiang, and J. Su. Augmenting naive Bayes for ranking. In International Conference on Machine Learning (ICML-05). 2005.
TABLE I
The results of NNRank and NNClass on the eight datasets. The results are the average error over 20 trials along with the standard deviation.

              Mean zero-one error               Mean absolute error
Dataset       NNRank         NNClass            NNRank        NNClass
Stocks        12.68±1.8%     16.97±2.3%         0.127±0.01    0.173±0.02
Pyrimidines   37.71±8.1%     41.87±7.9%         0.450±0.09    0.508±0.11
Auto MPG      27.13±2.0%     28.82±2.7%         0.281±0.02    0.307±0.03
Machine       17.03±4.2%     17.80±4.4%         0.186±0.04    0.192±0.06
Abalone       21.39±0.3%     21.74±0.4%         0.226±0.01    0.232±0.01
Triazines     52.55±5.0%     52.84±5.9%         0.730±0.06    0.790±0.09
Boston        26.38±3.0%     26.62±2.7%         0.295±0.03    0.297±0.03
Diabetes      44.90±12.5%    43.84±10.0%        0.546±0.15    0.592±0.09
TABLE II
Zero-one error of NNRank, SVM, GP-MAP, and GP-EP on the eight datasets. SVM denotes the support vector machine method [38], [9]. GP-MAP and GP-EP are two Gaussian process methods using the Laplace approximation [27] and expectation propagation [30], respectively [9]. The results are the average error over 20 trials along with the standard deviation. We use boldface to denote the best results.

TABLE III
Mean absolute error of NNRank, SVM, GP-MAP, and GP-EP on the eight datasets. SVM denotes the support vector machine method [38], [9]. GP-MAP and GP-EP are two Gaussian process methods using the Laplace approximation and expectation propagation, respectively [9]. The results are the average error over 20 trials along with the standard deviation. We use boldface to denote the best results.