Polynomial Kernel - Wikipedia
Polynomial kernel
In machine learning, the polynomial kernel is a kernel function commonly used
with support vector machines (SVMs) and other kernelized models. It represents
the similarity of vectors (training samples) in a feature space over
polynomials of the original variables, allowing the learning of non-linear
models.
Intuitively, the polynomial kernel looks not only at the given features of
input samples to determine their similarity, but also at combinations of these.
In the context of regression analysis, such combinations are known as
interaction features. The (implicit) feature space of a polynomial kernel is
equivalent to that of polynomial regression, but without the combinatorial
blowup in the number of parameters to be learned. When the input features
are binary-valued (booleans), the features correspond to logical
conjunctions of input features.[1]

[Figure: Illustration of the mapping φ. On the left, a set of samples in the
input space; on the right, the same samples in the feature space, where the
polynomial kernel (for some values of the parameters c and d) is the inner
product. The hyperplane learned in feature space by an SVM is an ellipse in
the input space.]

Definition
For degree-d polynomials, the polynomial kernel is defined as[2]

    K(x, y) = (x^T y + c)^d

where x and y are vectors in the input space, i.e. vectors of features computed from training or test samples, and c ≥ 0 is a
free parameter trading off the influence of higher-order versus lower-order terms in the polynomial. When c = 0, the kernel
is called homogeneous.[3] (A further generalized polykernel divides x^T y by a user-specified scalar parameter a.[4])
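The definition above is direct to compute. A minimal sketch in plain Python follows; the function name poly_kernel and the example vectors are illustrative, not from the article.

```python
def poly_kernel(x, y, c=1.0, d=2):
    """K(x, y) = (x.y + c)^d for vectors given as lists of floats."""
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return (dot + c) ** d

x = [1.0, 2.0]
y = [3.0, 0.5]
print(poly_kernel(x, y, c=1.0, d=2))  # (3 + 1 + 1)^2 = 25.0
print(poly_kernel(x, y, c=0.0, d=2))  # homogeneous case: (3 + 1)^2 = 16.0
```

Note that the kernel is evaluated directly on the input vectors; the high-dimensional feature map φ is never materialized, which is the point of the kernel trick.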
The nature of φ can be seen from an example. Let d = 2, so we get the special case of the quadratic kernel. After using the
multinomial theorem (twice; the outermost application is the binomial theorem) and regrouping,

    K(x, y) = (x^T y + c)^2
            = sum_i (x_i^2)(y_i^2) + sum_{i<j} (sqrt(2) x_i x_j)(sqrt(2) y_i y_j) + sum_i (sqrt(2c) x_i)(sqrt(2c) y_i) + c^2

from which it follows that the feature map is given by

    φ(x) = <x_1^2, ..., x_n^2, sqrt(2) x_1 x_2, ..., sqrt(2) x_{n-1} x_n, sqrt(2c) x_1, ..., sqrt(2c) x_n, c>.
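The equivalence between the quadratic kernel and its explicit feature map can be checked numerically. A hedged sketch, assuming the feature map just derived; the names phi and quad_kernel and the test vectors are illustrative.

```python
import math

def quad_kernel(x, y, c=1.0):
    """Quadratic polynomial kernel: (x.y + c)^2."""
    return (sum(xi * yi for xi, yi in zip(x, y)) + c) ** 2

def phi(x, c=1.0):
    """Explicit feature map for the quadratic kernel (d = 2)."""
    n = len(x)
    feats = [xi * xi for xi in x]                       # squared terms
    feats += [math.sqrt(2) * x[i] * x[j]                # cross terms, i < j
              for i in range(n) for j in range(i + 1, n)]
    feats += [math.sqrt(2 * c) * xi for xi in x]        # linear terms
    feats.append(c)                                     # constant term
    return feats

x, y = [1.0, 2.0], [3.0, 0.5]
lhs = quad_kernel(x, y, c=1.0)
rhs = sum(a * b for a, b in zip(phi(x, 1.0), phi(y, 1.0)))
print(abs(lhs - rhs) < 1e-9)  # the kernel equals the inner product of phi's
```

For n input features, phi produces n + n(n-1)/2 + n + 1 dimensions, which illustrates the combinatorial growth the kernel formulation avoids.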
Practical use
Although the RBF kernel is more popular in SVM classification than the polynomial kernel, the latter is quite popular in
natural language processing (NLP).[1][5] The most common degree is d = 2 (quadratic), since larger degrees tend to overfit
on NLP problems.
Various ways of computing the polynomial kernel (both exact and approximate) have been devised as alternatives to the
usual non-linear SVM training algorithms, including:
- full expansion of the kernel prior to training/testing with a linear SVM,[5] i.e. full computation of the mapping φ as in
  polynomial regression;
- basket mining (using a variant of the apriori algorithm) for the most commonly occurring feature conjunctions in a
  training set to produce an approximate expansion.[6]
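The first of these strategies, expanding the features explicitly and then training a linear model, can be sketched on a toy problem. This is a hedged illustration: a simple perceptron stands in for the linear SVM, and the XOR data, phi, and all names are illustrative, not from the article. XOR is not linearly separable in the input space, but it becomes separable after the quadratic feature map.

```python
import math

def phi(x, c=1.0):
    """Explicit quadratic (d = 2) feature map, as in polynomial regression."""
    n = len(x)
    out = [xi * xi for xi in x]
    out += [math.sqrt(2) * x[i] * x[j] for i in range(n) for j in range(i + 1, n)]
    out += [math.sqrt(2 * c) * xi for xi in x]
    out.append(c)
    return out

# XOR: no linear separator exists in the 2-D input space.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [-1, 1, 1, -1]

# Expand once, then train an ordinary linear classifier on the expansion.
Z = [phi(x) for x in X]
w = [0.0] * len(Z[0])
for _ in range(1000):                 # perceptron training epochs
    mistakes = 0
    for z, t in zip(Z, y):
        if t * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
            w = [wi + t * zi for wi, zi in zip(w, z)]   # update on a mistake
            mistakes += 1
    if mistakes == 0:                 # converged: data separable in phi-space
        break

preds = [1 if sum(wi * zi for wi, zi in zip(w, z)) > 0 else -1 for z in Z]
print(preds == y)
```

The linear decision boundary learned in the expanded space corresponds to a non-linear (quadratic) boundary in the original input space, mirroring the ellipse in the figure above.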
1 of 2 7/12/2020, 12:08 AM
One problem with the polynomial kernel is that it may suffer from numerical instability: when x^T y + c < 1,
K(x, y) = (x^T y + c)^d tends to zero with increasing d, whereas when x^T y + c > 1, K(x, y) tends to infinity.[4]
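This instability is easy to observe directly. A small sketch; the example vectors and the value of c are illustrative.

```python
def poly_kernel(x, y, c, d):
    """K(x, y) = (x.y + c)^d."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

x_small, y_small = [0.1, 0.2], [0.3, 0.1]   # x.y + c = 0.05 + 0.5 < 1
x_big, y_big = [1.0, 2.0], [2.0, 1.0]       # x.y + c = 4.0 + 0.5 > 1

# As d grows, one kernel value collapses toward 0, the other blows up.
for d in (2, 10, 50):
    print(d,
          poly_kernel(x_small, y_small, c=0.5, d=d),
          poly_kernel(x_big, y_big, c=0.5, d=d))
```

In practice this is one reason inputs are often normalized before applying the kernel, keeping x^T y + c near 1.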
References
1. Yoav Goldberg and Michael Elhadad (2008). splitSVM: Fast, Space-Efficient, non-Heuristic, Polynomial Kernel Computation for NLP Applications. Proc. ACL-08: HLT.
2. "Archived copy" (https://web.archive.org/web/20130415231446/http://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf) (PDF). Archived from the original (https://www.cs.tufts.edu/~roni/Teaching/CLT/LN/lecture18.pdf) (PDF) on 2013-04-15. Retrieved 2012-11-12.
3. Shashua, Amnon (2009). "Introduction to Machine Learning: Class Notes 67577". arXiv:0904.3664v1 (https://arxiv.org/abs/0904.3664v1) [cs.LG (https://arxiv.org/archive/cs.LG)].
4. Lin, Chih-Jen (2012). Machine learning software: design and practical use (http://www.csie.ntu.edu.tw/~cjlin/talks/mlss_kyoto.pdf) (PDF). Machine Learning Summer School. Kyoto.
5. Chang, Yin-Wen; Hsieh, Cho-Jui; Chang, Kai-Wei; Ringgaard, Michael; Lin, Chih-Jen (2010). "Training and testing low-degree polynomial data mappings via linear SVM" (http://jmlr.csail.mit.edu/papers/v11/chang10a.html). Journal of Machine Learning Research. 11: 1471–1490.
6. Kudo, T.; Matsumoto, Y. (2003). Fast methods for kernel-based text analysis. Proc. ACL.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this site, you agree to the Terms
of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.