Attention-based vector quantized variational autoencoder for anomaly detection by using orthogonal subspace constraints

Bibliographic Details
Published in: Pattern Recognition, Vol. 164, p. 111500
Main Authors: Yu, Qien; Dai, Shengxin; Dong, Ran; Ikuno, Soichiro
Format: Journal Article
Language: English
Published: Elsevier Ltd, 1 August 2025
ISSN: 0031-3203
Online Access: Full text
Description
Abstract: This paper introduces a framework built on a vector quantized variational autoencoder (VQVAE) enhanced by orthogonal subspace constraints (OSC) and pyramid criss-cross attention (PCCA). The framework is designed for anomaly detection in industrial product image datasets. Previous approaches to modeling low-dimensional feature distributions have been unable to distinguish effectively between normal features and noisy or abnormal information; this study addresses that limitation with OSC. The vector quantization mechanism is then applied within the two complementary subspaces to obtain normal and abnormal embedding subspaces, with discrete representations for normal and noisy information, respectively. The proposed approach robustly represents low-dimensional discrete manifolds, capturing the information in normal data with a limited number of feature vectors. Additionally, two PCCA modules are proposed to capture feature maps from different layers of the encoder and decoder, benefiting the low-dimensional mapping and reconstruction processes. Features from different layers serve as the query (Q), key (K), and value (V), so the attention captures both low-level and high-level features and incorporates comprehensive contextual information. The effectiveness of the framework is assessed by comparing its performance with that of state-of-the-art approaches on several publicly available industrial product image datasets.
• Introduced a novel vector quantized variational autoencoder (VQVAE) framework for robust anomaly detection.
• Utilized pyramid criss-cross attention (PCCA) modules to enhance feature encoding and reconstruction.
• Effectively represented normal and abnormal features by using orthogonal subspace constraints (OSC).
• Achieved notable improvements in anomaly detection on several industrial image datasets.
DOI: 10.1016/j.patcog.2025.111500
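
The abstract describes quantizing features separately in two complementary, mutually orthogonal subspaces. The sketch below is only an illustration of that general idea, not the authors' implementation: the record does not give the paper's actual constraint, codebook sizes, or training procedure. All names (`project`, `quantize`, `orthogonality_penalty`, the bases, and the codebooks) are hypothetical, and the hand-picked numbers stand in for quantities that a real model would learn.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def project(vector, basis):
    """Coordinates of `vector` along each row of `basis` (rows assumed orthonormal)."""
    return [sum(b * x for b, x in zip(row, vector)) for row in basis]


def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`."""
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(vector, codebook[k]))
    return index, codebook[index]


def orthogonality_penalty(basis_a, basis_b):
    """Sum of squared inner products between rows of the two bases.

    This equals the squared Frobenius norm of A B^T and is zero exactly
    when every row of `basis_a` is orthogonal to every row of `basis_b`.
    It is one common way to express an orthogonality constraint and is
    an assumption here, not the loss used in the paper.
    """
    return sum(
        sum(a * b for a, b in zip(row_a, row_b)) ** 2
        for row_a in basis_a
        for row_b in basis_b
    )


# Hypothetical 4-D feature split into a "normal" and an "abnormal" subspace.
feature = [0.9, 0.1, 0.0, 0.8]
normal_basis = [[1, 0, 0, 0], [0, 1, 0, 0]]
abnormal_basis = [[0, 0, 1, 0], [0, 0, 0, 1]]

# One small, hand-picked codebook per subspace (learned in a real model).
normal_codebook = [[1.0, 0.0], [0.0, 1.0]]
abnormal_codebook = [[0.0, 0.0], [0.0, 1.0]]

normal_index, normal_code = quantize(project(feature, normal_basis), normal_codebook)
abnormal_index, abnormal_code = quantize(project(feature, abnormal_basis), abnormal_codebook)
penalty = orthogonality_penalty(normal_basis, abnormal_basis)
```

In this toy setup each feature receives one discrete index per subspace, and the penalty term is the quantity a training loop would push toward zero to keep the two subspaces orthogonal.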