Scalable Kernel Ordinal Regression via Doubly Stochastic Gradients


Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 32, no. 8, pp. 3677-3689
Main Authors: Gu, Bin, Geng, Xiang, Li, Xiang, Shi, Wanli, Zheng, Guansheng, Deng, Cheng, Huang, Heng
Format: Journal Article
Language:English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.08.2021
ISSN: 2162-237X, 2162-2388
Description
Summary: Ordinal regression (OR) is one of the most important machine learning tasks. The kernel method is a major technique for achieving nonlinear OR. However, traditional kernel OR solvers are inefficient due to the increased complexity introduced by multiple ordinal thresholds as well as the cost of kernel computation. Doubly stochastic gradient (DSG) is a very efficient and scalable kernel learning algorithm that combines random feature approximation with stochastic functional optimization. However, the theory and algorithm of DSG can only support optimization tasks within a unique reproducing kernel Hilbert space (RKHS), which is not suitable for OR problems, where the multiple ordinal thresholds usually lead to multiple RKHSs. To address this problem, we construct a kernel whose RKHS can contain the decision function with multiple thresholds. Based on this new kernel, we further propose a novel DSG-like algorithm, DSGOR. In each iteration of DSGOR, we update the decision function as well as the function bias, with an appropriately set learning rate for each. Our theoretical analysis shows that DSGOR achieves an O(1/t) convergence rate, which is as good as that of DSG, even though it deals with a much harder problem. Extensive experimental results demonstrate that our algorithm is much more efficient than traditional kernel OR solvers, especially on large-scale problems.
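
For illustration, the sketch below shows the kind of update the abstract describes: random Fourier features approximate an RBF kernel, and the decision function and the ordinal thresholds are updated with separate, decaying learning rates. This is a minimal hypothetical sketch, not the authors' DSGOR implementation; the all-threshold hinge loss, all names, and all hyperparameters (sigma, eta_w, eta_b, lam) are assumptions, and for brevity the random features are drawn once up front, whereas DSG proper resamples random features at every iteration (the second source of stochasticity).

```python
import numpy as np

# Minimal sketch of a DSG-style update for kernel ordinal regression.
# NOT the authors' DSGOR code; an illustrative approximation only.

rng = np.random.default_rng(0)

d, D = 10, 200      # input dimension, number of random features
K = 5               # number of ordinal levels -> K - 1 thresholds
sigma = 1.0         # assumed RBF bandwidth

# Random Fourier features: phi(x) = sqrt(2/D) * cos(W x + u) approximates
# the RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2)).
W = rng.normal(0.0, 1.0 / sigma, size=(D, d))
u = rng.uniform(0.0, 2.0 * np.pi, size=D)

def phi(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + u)

w = np.zeros(D)      # weights of the decision function f(x) = w . phi(x)
b = np.zeros(K - 1)  # ordinal thresholds (the "function bias" terms)

def dsgor_step(x, y, t, eta_w=1.0, eta_b=0.1, lam=1e-4):
    """One stochastic step on sample (x, y) at iteration t, with separate
    learning rates for the decision function (eta_w) and biases (eta_b)."""
    z = phi(x)
    f = w @ z
    # All-threshold hinge loss: for each threshold j, the sample should
    # satisfy f - b[j] >= +1 if y > j, and f - b[j] <= -1 if y <= j.
    for j in range(K - 1):
        s = 1.0 if y > j else -1.0
        if s * (f - b[j]) < 1.0:          # margin violated: subgradient step
            w[:] += (eta_w / t) * s * z
            b[j] -= (eta_b / t) * s
    w[:] *= 1.0 - lam * eta_w / t         # regularization shrinkage

# Toy usage: stream random samples whose label grows with the first feature.
for t in range(1, 2001):
    x = rng.normal(size=d)
    y = int(np.clip(x[0] + 2.0, 0, K - 1))
    dsgor_step(x, y, t)
```

Note that a production implementation would also enforce the monotonic ordering of the thresholds b[0] <= ... <= b[K-2]; that constraint is omitted here to keep the sketch short.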
DOI:10.1109/TNNLS.2020.3015937