Generative hybrid models for fraud detection in auto insurance with a comparative analysis of VAE, GAN, and diffusion approaches

Bibliographic details
Published in:	Discover Artificial Intelligence Vol. 5; no. 1; pp. 313 - 23
Main authors:	Bekkaye, Chadia, Oukhouya, Hassan, Zari, Tarek, Guerbaz, Raby, El Bouanani, Hicham
Format:	Journal Article
Language:	English
Published:	Cham Springer International Publishing 01.12.2025 Springer Nature B.V Springer
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Auto insurance; Automobile insurance; Classification; Classification metrics; Comparative analysis; Computer Science; Datasets; Engineering; Feature selection; Fraud detection; Fraud prevention; Generative hybrid models; Health insurance; Insurance claims; Insurance fraud; Isolation forest; Machine learning algorithms; Medical research; Methods; Model stability; Neural networks; Oversampling methods; Performance comparison; Probabilistic calibration; Support vector machines; Survival analysis
ISSN:	2731-0809

Description
Summary:	Fraud claim detection in auto insurance remains a vital yet complex challenge, mainly due to imbalanced data sets, non-linear feature interactions, and the need for explainable predictions. While traditional Machine Learning (ML) approaches show promise, they frequently suffer from poor generalization, limited interpretability, and inadequate treatment of rare fraudulent cases. The present paper proposes a new hybrid approach that combines generative models, namely Variational AutoEncoders (VAEs), Generative Adversarial Networks (GANs), and Diffusion Models (DMs), with an ensemble of classifiers including eXtreme Gradient Boosting (XGBoost), Random Forest (RF), and Light Gradient Boosting (Light GBM), coupled with Isolation Forest (IF) for anomaly detection and oversampling techniques (SMOTE and ADASYN) to improve class balance. In total, 18 hybrid combinations were developed and evaluated on classification performance (AUC-ROC, Accuracy, Precision, Recall, F1-score), probabilistic calibration (Brier Score and Log loss), and stochastic stability (Monte Carlo Variance and Bootstrap Variance). The experimental findings, supported by graphical analysis based on radar plots, ROC curves, 3D metric visualization, and SHAP explainability, indicate that DM coupled with XGBoost and SMOTE (DM_XGBoost_SMOTE) and DM with Light GBM and SMOTE (DM_Light GBM_SMOTE) outperform the alternative combinations. In particular, DM_XGBoost_SMOTE achieves a well-balanced compromise between accuracy, confidence calibration, and robustness. This work underlines the effectiveness of diffusion-based hybrid models in fraud detection and opens the way for their deployment in high-risk, real-world insurance environments.
DOI:	10.1007/s44163-025-00574-5
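
The evaluation design described in the summary can be illustrated with a minimal Python sketch. It rests on assumptions that the record does not confirm: that the 18 hybrid combinations are the full grid of 3 generative models x 3 classifiers x 2 oversamplers, that labels follow the `DM_XGBoost_SMOTE` pattern quoted in the summary, and that "Bootstrap Variance" means the sample variance of a metric over resampled evaluation sets. The function names (`hybrid_labels`, `brier_score`, `log_loss`, `bootstrap_variance`) are hypothetical and are not taken from the paper; no model is trained here.

```python
import math
import random
from itertools import product

# Component names as they appear in the summary.
GENERATORS = ["VAE", "GAN", "DM"]
CLASSIFIERS = ["XGBoost", "RF", "Light GBM"]
OVERSAMPLERS = ["SMOTE", "ADASYN"]


def hybrid_labels():
    """Enumerate generator_classifier_oversampler labels (assumed full grid)."""
    return [f"{g}_{c}_{o}" for g, c, o in product(GENERATORS, CLASSIFIERS, OVERSAMPLERS)]


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def bootstrap_variance(y_true, p_pred, metric, n_resamples=200, seed=0):
    """Sample variance of `metric` over bootstrap resamples (assumed definition)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        scores.append(metric([y_true[i] for i in idx], [p_pred[i] for i in idx]))
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / (len(scores) - 1)
```

Lower Brier score and log loss indicate better-calibrated probabilities, and a smaller bootstrap variance indicates a metric that is less sensitive to the particular evaluation sample. Calling `hybrid_labels()` yields 18 labels under the full-grid assumption, which matches the count stated in the summary.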