Advance Fake Video Detection via Vision Transformers

Bibliographic details
Title: Advance Fake Video Detection via Vision Transformers
Authors: Joy Battocchio, Stefano Dell'Anna, Andrea Montibeller, Giulia Boato
Source: Proceedings of the ACM Workshop on Information Hiding and Multimedia Security. :1-11
Publication Status: Preprint
Publisher information: ACM, 2025.
Publication year: 2025
Keywords: FOS: Computer and information sciences, Artificial Intelligence (cs.AI), Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, Multimedia forensics, AI-generated video detection, vision transformers, latent video diffusion models, deepfakes, Computer Science - Multimedia, Multimedia (cs.MM)
Description: Recent advancements in AI-based multimedia generation have enabled the creation of hyper-realistic images and videos, raising concerns about their potential use in spreading misinformation. The widespread accessibility of generative techniques, which allow for the production of fake multimedia from prompts or existing media, along with their continuous refinement, underscores the urgent need for highly accurate and generalizable AI-generated media detection methods, a need also underlined by new regulations like the European Digital AI Act. In this paper, we draw inspiration from Vision Transformer (ViT)-based fake image detection and extend this idea to video. We propose an original framework that effectively integrates ViT embeddings over time to enhance detection performance. Our method shows promising accuracy, generalization, and few-shot learning capabilities across a new, large, and diverse dataset of videos generated using five state-of-the-art open-source generative techniques, as well as a separate dataset containing videos produced by proprietary generative methods.
Publication type: Article; Conference object
DOI: 10.1145/3733102.3733129
DOI: 10.48550/arxiv.2504.20669
Access URL: http://arxiv.org/abs/2504.20669
Rights: CC BY
Document code: edsair.doi.dedup.....50d299d936c5cf5e1ba687e35495643f
Database: OpenAIRE