T5-based anomaly-behavior video captioning using semantic relation mining


Bibliographic details
Published in: Applied Soft Computing, Volume 185; p. 113923
Main authors: Kim, Min-Jeong; Chung, Kyungyong
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2025
ISSN:1568-4946
Description
Summary: Video data consist of a series of images that change over time. The sequence of frames in a video provides important information on the motion and continuity of the video, and this dynamic information can be used to analyze the movement and behavior patterns of objects. Video captioning, which is used to explain a video, can describe the content of the video data and provide subtitles or descriptions in various languages. It can also explain the main points of a video with complex content, making the information easier for users to take in. In captioning, semantic analysis is used to identify the overall context of the data and generate correct captions. However, captions are usually generated by focusing on major objects and actions, which makes it difficult to capture details. In this paper, we propose text-to-text transfer transformer (T5)-based abnormal behavior video captioning using semantic relation mining. The proposed method generates captions with semantic features from video data based on environmental factors, and it improves the accuracy of video description by identifying the similarity between captions in order to classify similar videos and captions. This enables the classification and search of video data and is useful in video analysis systems, such as video monitoring and media analysis.

• We propose text-to-text transfer transformer (T5)-based abnormal behavior video captioning.
• A semantic relation mining module is implemented.
• A frequency matrix represents the objects and actions appearing in each frame.
• Complex semantic relationships exist among object locations, visual relationships, and interactions.
• The module is generalizable and can provide insights into abnormal behavior patterns.
DOI:10.1016/j.asoc.2025.113923
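
The summary mentions a frequency matrix of the objects and actions appearing in each frame, and a comparison of captions by similarity. This record does not give the authors' implementation, so the following is only a minimal sketch under stated assumptions: the per-frame labels are hypothetical detector outputs, the function names are invented for illustration, and bag-of-words cosine similarity stands in for whatever similarity measure the paper actually uses.

```python
import math
from collections import Counter


def frequency_matrix(frame_labels):
    """Count how often each label occurs in each frame.

    frame_labels: list of lists; each inner list holds the object/action
    labels observed in one frame (hypothetical detector output).
    Returns (vocab, matrix), where matrix[i][j] is the number of times
    vocab[j] appears in frame i.
    """
    vocab = sorted({label for labels in frame_labels for label in labels})
    column = {label: j for j, label in enumerate(vocab)}
    matrix = []
    for labels in frame_labels:
        row = [0] * len(vocab)
        for label in labels:
            row[column[label]] += 1
        matrix.append(row)
    return vocab, matrix


def caption_similarity(caption_a, caption_b):
    """Cosine similarity between bag-of-words counts of two captions.

    This is an assumed stand-in measure, not the method from the paper.
    """
    counts_a = Counter(caption_a.lower().split())
    counts_b = Counter(caption_b.lower().split())
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical labels for three consecutive frames.
frames = [
    ["person", "person", "walk"],
    ["person", "bag", "run"],
    ["person", "run"],
]
vocab, matrix = frequency_matrix(frames)
```

Each row of `matrix` describes one frame and each column one label, so rows can be compared or aggregated over time. In a fuller pipeline the similarity function would more plausibly operate on learned caption embeddings than on raw word counts.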