Adaptive Learning-Based k-Nearest Neighbor Classifiers With Resilience to Class Imbalance
| Published in: | IEEE Transactions on Neural Networks and Learning Systems, Vol. 29, no. 11, pp. 5713–5725 |
|---|---|
| Main Authors: | , , |
| Format: | Journal Article |
| Language: | English |
| Published: | IEEE, 01.11.2018 |
| Subjects: | |
| ISSN: | 2162-237X, 2162-2388 |
| Summary: | The classification accuracy of a *k*-nearest neighbor (*k*NN) classifier is largely dependent on the choice of the number of nearest neighbors, denoted by *k*. However, given a data set, it is a tedious task to optimize the performance of *k*NN by tuning *k*. Moreover, the performance of *k*NN degrades in the presence of class imbalance, a situation characterized by disparate representation of different classes. We aim to address both issues in this paper and propose a variant of *k*NN called the Adaptive *k*NN (Ada-*k*NN). The Ada-*k*NN classifier uses the density and distribution of the neighborhood of a test point and learns a suitable point-specific *k* for it with the help of artificial neural networks. We further improve our proposal by replacing the neural network with a heuristic learning method guided by an indicator of the local density of a test point, using information about its neighboring training points. The proposed heuristic learning algorithm preserves the simplicity of *k*NN without incurring a serious computational burden. We call this method Ada-*k*NN2. Ada-*k*NN and Ada-*k*NN2 perform very competitively when compared with *k*NN, five of *k*NN's state-of-the-art variants, and other popular classifiers. Furthermore, we propose a class-based global weighting scheme (Global Imbalance Handling Scheme, or GIHS) to compensate for the effect of class imbalance. We perform extensive experiments on a wide variety of data sets to establish the improvement shown by Ada-*k*NN and Ada-*k*NN2 using the proposed GIHS, when compared with *k*NN and its 12 variants specifically tailored for imbalanced classification. |
|---|---|
| DOI: | 10.1109/TNNLS.2018.2812279 |
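The class-based global weighting idea summarized in the abstract can be illustrated with a minimal sketch. The version below weights each neighbor's vote by the inverse frequency of its class in the training set, so minority-class neighbors are not drowned out by the majority class. This is an assumed, simplified reading of a class-based global weighting scheme; the paper's actual GIHS formulation and the point-specific choice of *k* in Ada-*k*NN may differ, and the function name and weight definition here are hypothetical.

```python
import numpy as np

def class_weighted_knn_predict(X_train, y_train, x_test, k=5):
    """Sketch of a kNN vote with global class-frequency weights.

    Hypothetical illustration: each neighbor's vote is scaled by the
    inverse frequency of its class, a simple imbalance-compensation
    heuristic. The paper's actual GIHS weighting may differ.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Global class weights: inverse of each class's frequency in the training set
    classes, counts = np.unique(y_train, return_counts=True)
    weight = {c: len(y_train) / n for c, n in zip(classes, counts)}
    # Find the k nearest training points by Euclidean distance
    dists = np.linalg.norm(X_train - np.asarray(x_test, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    # Accumulate class-weighted votes over the neighborhood
    votes = {}
    for idx in nearest:
        c = y_train[idx]
        votes[c] = votes.get(c, 0.0) + weight[c]
    # Return the class with the largest weighted vote
    return max(votes, key=votes.get)
```

With a 4:1 imbalanced training set, a test point whose 3-neighborhood contains one minority point and two majority points is assigned the minority class, whereas an unweighted majority vote would pick the majority class.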