Similarity measure method based on spectra subspace and locally linear embedding algorithm

Bibliographic Details
Published in:Infrared physics & technology Vol. 100; pp. 57 - 61
Main Authors: Qin, Yuhua, Duan, Kai, Wu, Lijun, Xu, Baoding
Format: Journal Article
Language:English
Published: Elsevier B.V. 01.08.2019
ISSN:1350-4495, 1879-0275
Description
Summary:The high dimensionality, redundancy, noise, and nonlinearity of near-infrared (NIR) spectral data make similarity measurement difficult. This paper presents SSLLE, a similarity measure method based on spectral subspaces and the locally linear embedding (LLE) algorithm. First, the high-dimensional spectral data are divided into several subspaces according to the absorption bands of the major chemical components, which avoids the influence of irrelevant features and noise and reduces the dimensionality and computational complexity of LLE. Next, the LLE algorithm is modified to use geodesic distance instead of Euclidean distance, which addresses the weakness of Euclidean distance as a measure in high-dimensional space. The distance calculation in LLE is also modified so that the samples are more evenly distributed. For each spectral subspace, a distance matrix is calculated from the embedding obtained by mapping the high-dimensional data with the modified LLE. The spectral similarity matrix of the sample set is then obtained by summing the distance matrices of all subspaces, so that the sample with the highest similarity can be found. To evaluate the algorithm, the spectral projections of the samples were analyzed first; the results showed that SSLLE distinguished tobacco samples from different areas significantly better than principal component analysis (PCA) and LLE. Second, the methods were compared on the task of finding the sample most spectrally similar to a target tobacco; SSLLE gave the smallest differences in chemical composition and the highest consistency with expert recommendations, compared with PCA and LLE. It also showed good robustness and precision.
DOI:10.1016/j.infrared.2019.05.006
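
The pipeline outlined in the summary (split the spectra into band subspaces, embed each subspace with a geodesic-distance LLE, sum the per-subspace distance matrices, pick the closest sample) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that geodesic distance is approximated by shortest paths on a k-nearest-neighbour graph and is used only to choose LLE neighbours. The second modification mentioned in the summary (the changed distance calculation for a more even sample distribution) is not specified in this record and is left out. The function names, the band boundaries, the parameter values, and the random data standing in for NIR spectra are all hypothetical.

```python
import numpy as np

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d = np.sqrt(np.maximum(d2, 0.0))
    np.fill_diagonal(d, 0.0)
    return d

def geodesic_dist(X, k):
    """Assumed geodesic approximation: shortest paths on a k-NN graph."""
    n = X.shape[0]
    d = pairwise_dist(X)
    d_no_self = d.copy()
    np.fill_diagonal(d_no_self, np.inf)
    idx = np.argsort(d_no_self, axis=1)[:, :k]      # k nearest neighbours
    rows = np.repeat(np.arange(n), k)
    g = np.full((n, n), np.inf)
    g[rows, idx.ravel()] = d[rows, idx.ravel()]
    g = np.minimum(g, g.T)                          # make the graph undirected
    np.fill_diagonal(g, 0.0)
    for m in range(n):                              # Floyd-Warshall relaxation
        g = np.minimum(g, g[:, m:m + 1] + g[m:m + 1, :])
    unreachable = np.isinf(g)
    g[unreachable] = d[unreachable]                 # fallback if graph is disconnected
    return g

def lle_geodesic(X, k=8, n_components=2, reg=1e-3):
    """LLE whose neighbours are chosen by geodesic distance (an assumption)."""
    n = X.shape[0]
    g = geodesic_dist(X, k)
    np.fill_diagonal(g, np.inf)
    nbrs = np.argsort(g, axis=1)[:, :k]
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]                       # neighbours centred on sample i
        C = Z @ Z.T                                 # local Gram matrix
        C += np.eye(k) * reg * max(np.trace(C), 1e-12)
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()                 # reconstruction weights sum to 1
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    _, vecs = np.linalg.eigh(M)                     # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]              # skip the constant eigenvector

def sslle_distance(X, bands, k=8, n_components=2):
    """Sum the embedding-space distance matrices over all band subspaces."""
    total = np.zeros((X.shape[0], X.shape[0]))
    for lo, hi in bands:
        Y = lle_geodesic(X[:, lo:hi], k, n_components)
        total += pairwise_dist(Y)
    return total

def most_similar(D, target):
    """Index of the sample with the smallest summed distance to the target."""
    d = D[target].copy()
    d[target] = np.inf
    return int(np.argmin(d))

# Synthetic stand-in for NIR spectra: 40 samples, 60 wavelength points.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 60))
bands = [(0, 20), (20, 40), (40, 60)]               # hypothetical band boundaries
D = sslle_distance(X, bands)
best = most_similar(D, target=0)
```

Summing the per-subspace matrices gives every band equal weight in the final ranking. Whether the original method weights or normalises the subspaces is not stated in this record.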