CLIP-MDC: CLIP encoder based multimodal defect classification with synthetic anomaly generation for real-time surface defect detection.
| Title: | CLIP-MDC: CLIP encoder based multimodal defect classification with synthetic anomaly generation for real-time surface defect detection. |
|---|---|
| Authors: | Ha, Taewon (AUTHOR), Hwang, Chaeseon (AUTHOR), Jeong, Jongpil (AUTHOR) jpjeong@skku.edu |
| Source: | Journal of Intelligent Manufacturing. Jan2026, p1-23. |
| Subject Terms: | *DEFECT tracking (Computer software development), *INDUSTRY 4.0, OUTLIER detection |
| Abstract: | In this study, using various text prompts that combine objects and defect types, we establish a semantic space linking images and texts, enabling explainable defect predictions using natural language. We introduce contrastive language–image pre-training-based multimodal defect classification (CLIP-MDC), a framework designed for multimodal defect detection and classification in smart manufacturing. The model integrates a lightweight backbone network with contrastive language–image pre-training (CLIP) encoders to perform both pixel-level anomaly segmentation and image-level defect classification effectively in supervised and weakly supervised settings. Additionally, we incorporate a Perlin noise-based synthetic anomaly generation technique to facilitate learning in environments with limited labeled data, and the dual prediction architecture enables accurate simultaneous inference of defect location and type. In experiments on the MVTec AD and KSDD2 datasets, the model achieved outstanding performance with an area under the receiver operating characteristic curve (AUROC) of 99.9%, an area under the per-region overlap curve (AUPRO) of 98.6%, a pixel-level AUROC (P-AUROC) of 99.9%, and an average precision for localization ($AP_{loc}$) of 87.6%. It also demonstrated real-time capability, registering an average inference speed of 6.6 |
| Database: | Business Source Index |
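
The abstract mentions text prompts that combine objects and defect types. As a rough illustration only, the sketch below builds such a prompt set in Python. The object names, defect names, prompt wording, and the `build_prompts` helper are all assumptions made for this example and are not taken from the paper; the actual prompt templates used by CLIP-MDC may differ.

```python
from itertools import product

# Hypothetical vocabularies, chosen only for illustration.
OBJECTS = ["bottle", "cable"]
DEFECTS = ["scratch", "contamination"]


def build_prompts(objects, defects,
                  template="a photo of a {obj} with a {defect} defect"):
    """Return a dict mapping (object, label) to a text prompt.

    Each object gets one defect-free prompt (label "good") and one
    prompt per defect type. The wording is an assumed template.
    """
    prompts = {}
    for obj in objects:
        prompts[(obj, "good")] = f"a photo of a {obj} without defects"
    for obj, defect in product(objects, defects):
        prompts[(obj, defect)] = template.format(obj=obj, defect=defect)
    return prompts


prompts = build_prompts(OBJECTS, DEFECTS)
for key, text in prompts.items():
    print(key, "->", text)
```

In a CLIP-style setup, each prompt would typically be passed through a text encoder and compared with the image embedding, so that the best-matching prompt doubles as a natural-language label for the prediction.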