Video deepfake detection using a hybrid CNN-LSTM-Transformer model for identity verification

Bibliographic Details
Published in: Multimedia Tools and Applications, Vol. 84, No. 33, pp. 40617-40636
Authors: Petmezas, Georgios; Vanian, Vazgken; Konstantoudakis, Konstantinos; Almaloglou, Elena E. I.; Zarpalas, Dimitris
Format: Journal Article
Language: English
Published: New York: Springer US, 01.10.2025
Springer Nature B.V.
ISSN: 1380-7501, 1573-7721
Online access: Full text
Description
Abstract: The proliferation of deepfake technology poses significant challenges due to its potential for misuse in creating highly convincing manipulated videos. Deep learning (DL) techniques have emerged as powerful tools for analyzing and identifying subtle inconsistencies that distinguish genuine content from deepfakes. This paper introduces a novel approach for video deepfake detection that integrates 3D Morphable Models (3DMMs) with a hybrid CNN-LSTM-Transformer model, aimed at enhancing detection accuracy and efficiency. Our model leverages 3DMMs for detailed facial feature extraction, a CNN for fine-grained spatial analysis, an LSTM for short-term temporal dynamics, and a Transformer for capturing long-term dependencies in sequential data. This architecture effectively addresses critical challenges in current detection systems by handling both local and global temporal information. The proposed model employs an identity verification approach, comparing test videos with reference videos containing genuine footage of the individuals. Trained and validated on the VoxCeleb2 dataset, with further testing on three additional datasets, our model demonstrates superior performance to existing state-of-the-art methods, maintaining robustness across different video qualities, compression levels, and manipulation types. Additionally, it operates efficiently in time-sensitive scenarios, significantly outperforming existing methods in inference speed. By relying solely on pristine, unmanipulated data for training, our approach enhances adaptability to new and sophisticated manipulations, setting a new benchmark for video deepfake detection technologies. This study not only advances the framework for detecting deepfakes but also underscores its potential for practical deployment in areas critical for digital forensics and media integrity.
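The abstract describes the pipeline only at a high level: per-frame spatial features feed an LSTM for short-term dynamics, a Transformer for long-range dependencies, and a video-level embedding is compared against a reference clip of the claimed identity. The sketch below is purely illustrative and not the authors' implementation; the layer sizes, the single-channel 64×64 input (standing in for 3DMM-derived feature maps), and the 0.5 cosine-similarity threshold are all assumptions.

```python
import torch
import torch.nn as nn

class HybridDeepfakeDetector(nn.Module):
    """Illustrative CNN-LSTM-Transformer video encoder (sizes are assumptions)."""
    def __init__(self, feat_dim=128, n_heads=4, n_layers=2):
        super().__init__()
        # Per-frame CNN for fine-grained spatial analysis; input assumed (1, 64, 64)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim),
        )
        # LSTM captures short-term temporal dynamics across frames
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        # Transformer encoder captures long-range temporal dependencies
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.embed = nn.Linear(feat_dim, feat_dim)

    def forward(self, frames):                  # frames: (B, T, 1, 64, 64)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))      # per-frame spatial features
        x = x.view(b, t, -1)
        x, _ = self.lstm(x)                     # short-term dynamics
        x = self.transformer(x)                 # long-range dependencies
        return self.embed(x.mean(dim=1))        # video-level identity embedding

def verify(model, test_clip, reference_clip, threshold=0.5):
    """Identity verification: embeddings of test and reference clips are
    compared by cosine similarity; below threshold -> flagged as deepfake."""
    with torch.no_grad():
        sim = torch.cosine_similarity(model(test_clip), model(reference_clip))
    return sim >= threshold

model = HybridDeepfakeDetector()
clip = torch.randn(1, 8, 1, 64, 64)             # one hypothetical 8-frame clip
print(model(clip).shape)                        # torch.Size([1, 128])
```

The identity-verification framing means training needs only genuine footage: the model learns identity-preserving embeddings, and a manipulated test video is expected to drift from the reference embedding regardless of the manipulation type.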
DOI:10.1007/s11042-024-20548-6