T5-based anomaly-behavior video captioning using semantic relation mining

Bibliographic Details
Published in: Applied Soft Computing Vol. 185; p. 113923
Main Authors: Kim, Min-Jeong; Chung, Kyungyong
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2025
ISSN: 1568-4946
Description
Summary: Video data consist of a series of images that change over time. The sequence of frames in a video provides important information on the motion and continuity of the video, and this dynamic information can be used to analyze the movement and behavior patterns of objects. Video captioning, which is used to explain a video, can describe the content of the video data and provide subtitles or descriptions in various languages. It can also explain the main points in a video with complex content, making the information easier for users to understand. In captioning, semantic analysis is used to identify the overall context of the data and generate the correct captions. However, captions are usually generated by focusing on major objects and actions, making it difficult to capture the details. In this paper, we propose text-to-text transfer transformer (T5)-based abnormal behavior video captioning using semantic relation mining. The proposed method generates captions with semantic features from video data based on environmental factors and improves the accuracy of video description by measuring the similarity between captions in order to classify similar videos and captions. This enables the classification and search of video data and is useful in video analysis systems, such as video monitoring and media analysis.
Highlights:
• We propose text-to-text transfer transformer (T5)-based abnormal behavior video captioning.
• A semantic relation mining module is implemented.
• A frequency matrix represents the objects and actions appearing in each frame.
• Complex semantic relationships exist among object locations, visual relationships, and interactions.
• The module is generalizable and can provide insights into abnormal behavior patterns.
DOI: 10.1016/j.asoc.2025.113923
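
Illustrative sketch (not from the paper): the summary mentions a frequency matrix of the objects and actions appearing in each frame, and a similarity comparison between captions. This record does not give the authors' actual formulation, so the following is only a minimal sketch under stated assumptions. The per-frame labels are made-up sample data, the function names `frequency_matrix` and `cosine_similarity` are hypothetical, and a plain bag-of-words cosine similarity stands in for whatever similarity measure the paper uses. No T5 model is involved here.

```python
from collections import Counter
from math import sqrt


def frequency_matrix(frame_labels):
    """Build a frame-by-term count matrix.

    frame_labels: list of lists, one list of object/action labels per frame
    (assumed to come from some upstream detector; hypothetical input).
    Returns (vocab, matrix) where matrix[i][j] is how often vocab[j]
    appears in frame i.
    """
    vocab = sorted({label for labels in frame_labels for label in labels})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in frame_labels]
    for i, labels in enumerate(frame_labels):
        for label in labels:
            matrix[i][index[label]] += 1
    return vocab, matrix


def cosine_similarity(caption_a, caption_b):
    """Bag-of-words cosine similarity between two captions (a stand-in measure)."""
    a = Counter(caption_a.lower().split())
    b = Counter(caption_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty caption is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Made-up labels for three frames, for illustration only.
frames = [
    ["person", "person", "walk"],
    ["person", "bag", "run"],
    ["person", "fall"],
]
vocab, matrix = frequency_matrix(frames)

if __name__ == "__main__":
    print(vocab)
    for row in matrix:
        print(row)
    print(cosine_similarity("a person falls down", "a person falls on the floor"))
```

Each row of the matrix describes one frame and each column one object or action label, so repeated labels within a frame (two people, for instance) show up as counts greater than one. A similarity score between captions like the one above could then be thresholded to group similar videos, though the threshold and grouping strategy would be further assumptions.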