Query-Driven Video Summarization for Long Video Footage Analysis Using Faster-RCNN and Determinantal Point Processes

With the ever-growing volume of video content, video summarization becomes crucial for efficiently condensing lengthy videos into informative representations. This paper introduces a novel approach for video summarization by combining object identification with Determinantal Point Processes (DPP). Objec...

Detailed bibliography
Published in: Procedia computer science, Volume 258; pp. 3989 - 3999
Main authors: Bhute, Maitrey M., Tare, Sanskar S., S, Sridhar Raj
Format: Journal Article
Language: English
Published: Elsevier B.V 2025
Subjects:
ISSN:1877-0509, 1877-0509
Abstract With the ever-growing volume of video content, video summarization becomes crucial for efficiently condensing lengthy videos into informative representations. This paper introduces a novel approach for video summarization by combining object identification with Determinantal Point Processes (DPP). Object identification, powered by advanced computer vision techniques, tracks and recognizes objects across video frames (Convolutional Neural Network/Computer Vision), allowing for identification and prioritization of frames with key objects and their respective interactions. The proposed algorithm leverages DPP and R-CNN concurrently to select frames while avoiding redundancy and preserving diversity. Extensive experiments using diverse video datasets demonstrate that this method performs competitively with existing summarization techniques when tested on the TVSum dataset [12], a popular baseline. The results highlight the efficacy of the proposed algorithm in striking a balance between information retention and summary length. It achieves an accuracy of 84.34% and a recall of 13.09%. This research contributes to the field of multimedia content analysis, with potential applications in video indexing, retrieval, and content recommendation systems. Through the integration of object identification and DPP, this work introduces an innovative dimension to video summarization. The proposed approach holds promise for generating more informative and visually coherent video summaries. This work holds the potential to advance the state of the art in video summarization techniques, benefiting various multimedia applications by aiding content analysis and user experience.
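The abstract describes selecting frames with a DPP so that the summary favors frames containing key objects while avoiding redundancy. The record does not give the authors' kernel or inference procedure, so the following is only a minimal sketch under stated assumptions: each frame has a hypothetical relevance score (for example, a detector confidence for the queried object), a hypothetical feature vector, and frames are chosen by a standard greedy MAP approximation over the kernel L_ij = q_i * q_j * cos(f_i, f_j). The names `build_kernel` and `greedy_dpp` are illustrative and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def build_kernel(quality, features):
    """Assumed DPP kernel: L[i][j] = q_i * q_j * cosine(f_i, f_j)."""
    n = len(quality)
    return [[quality[i] * quality[j] * cosine(features[i], features[j])
             for j in range(n)] for i in range(n)]


def greedy_dpp(kernel, k, eps=1e-9):
    """Greedy MAP approximation: repeatedly add the frame with the largest
    marginal gain in the determinant, using incremental Cholesky-style updates."""
    n = len(kernel)
    d2 = [kernel[i][i] for i in range(n)]  # current marginal gain of each frame
    c = [[] for _ in range(n)]             # partial Cholesky rows per frame
    selected = []
    remaining = list(range(n))
    while len(selected) < k and remaining:
        j = max(remaining, key=lambda i: d2[i])
        if d2[j] < eps:                    # nothing left adds new information
            break
        selected.append(j)
        remaining.remove(j)
        dj = math.sqrt(d2[j])
        for i in remaining:
            e = (kernel[j][i] - sum(a * b for a, b in zip(c[j], c[i]))) / dj
            c[i].append(e)
            d2[i] -= e * e                 # similar frames lose marginal gain
    return selected


# Hypothetical toy data: frames 0 and 1 are near-duplicates with high relevance.
quality = [0.9, 0.85, 0.3, 0.8]
features = [[1.0, 0.0], [0.98, 0.2], [0.6, 0.8], [0.0, 1.0]]
summary = greedy_dpp(build_kernel(quality, features), k=2)
print(summary)
```

In this toy input, frame 1 is highly relevant but nearly identical to frame 0, so after frame 0 is chosen its marginal gain collapses and the dissimilar frame 3 is preferred. This is the redundancy-avoiding behavior the abstract attributes to the DPP component.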
Author S, Sridhar Raj
Bhute, Maitrey M.
Tare, Sanskar S.
Author_xml – sequence: 1
  givenname: Maitrey M.
  surname: Bhute
  fullname: Bhute, Maitrey M.
  organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
– sequence: 2
  givenname: Sanskar S.
  surname: Tare
  fullname: Tare, Sanskar S.
  organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
– sequence: 3
  givenname: Sridhar Raj
  surname: S
  fullname: S, Sridhar Raj
  organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Cites_doi 10.1007/978-3-319-46478-7_47
10.1016/j.neunet.2023.01.047
10.1109/AVSS.2016.7738018
10.1016/j.patcog.2023.109578
10.1109/ICRIS.2019.00060
10.1109/ICME.2006.262855
10.1016/j.patrec.2020.12.016
10.1145/3240508.3240651
10.1145/3477495.3531965
10.1109/JIOT.2019.2950469
10.1007/s11760-020-01791-4
10.1109/CVPR.2015.7299154
10.1109/WACV56688.2023.00554
10.1561/2200000044
10.1007/978-3-030-01219-9_32
10.1109/ICMEW.2014.6890674
10.1016/j.jvcir.2019.06.004
10.1007/s11042-023-14925-w
ContentType Journal Article
Copyright 2025 The Author(s)
Copyright_xml – notice: 2025 The Author(s)
DBID 6I.
AAFTH
AAYXX
CITATION
DOI 10.1016/j.procs.2025.04.650
DatabaseName ScienceDirect Open Access Titles
Elsevier:ScienceDirect:Open Access
CrossRef
DatabaseTitle CrossRef
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1877-0509
EndPage 3999
ExternalDocumentID 10_1016_j_procs_2025_04_650
S1877050925017545
GroupedDBID --K
0R~
1B1
457
5VS
6I.
71M
9DU
AAEDT
AAEDW
AAFTH
AAIKJ
AALRI
AAQFI
AAXUO
AAYWO
ABMAC
ABWVN
ACGFS
ACRPL
ACVFH
ADBBV
ADCNI
ADEZE
ADNMO
ADVLN
AEUPX
AEXQZ
AFPUW
AFTJW
AGHFR
AIGII
AITUG
AKBMS
AKRWK
AKYEP
ALMA_UNASSIGNED_HOLDINGS
AMRAJ
E3Z
EBS
EJD
EP3
FDB
FNPLU
HZ~
IXB
KQ8
M41
M~E
O-L
O9-
OK1
P2P
ROL
SES
SSZ
~HD
AAYXX
CITATION
ID FETCH-LOGICAL-c2150-55d403d016dbb074487c6a4be5cfcf0ba4bed6d38a51ace4fa95df9ad99e99a83
ISSN 1877-0509
IngestDate Thu Nov 27 01:00:22 EST 2025
Wed Dec 10 14:41:06 EST 2025
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Object Detection
Video Summarization
Query-Based personalization
Determinantal Point Process
Faster R-CNN
Language English
License This is an open access article under the CC BY-NC-ND license.
LinkModel OpenURL
MergedId FETCHMERGED-LOGICAL-c2150-55d403d016dbb074487c6a4be5cfcf0ba4bed6d38a51ace4fa95df9ad99e99a83
OpenAccessLink https://dx.doi.org/10.1016/j.procs.2025.04.650
PageCount 11
ParticipantIDs crossref_primary_10_1016_j_procs_2025_04_650
elsevier_sciencedirect_doi_10_1016_j_procs_2025_04_650
PublicationCentury 2000
PublicationDate 2025
2025-00-00
PublicationDateYYYYMMDD 2025-01-01
PublicationDate_xml – year: 2025
  text: 2025
PublicationDecade 2020
PublicationTitle Procedia computer science
PublicationYear 2025
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
References Fu, Wang (bib4771) 2021; 143
Sharghi, A., Borji, A., Li, C., Yang, T., & Gong, B. (2018). Improving sequential determinantal point processes for supervised video summarization. In
Nair, Mohan (bib4770) 2021; 15
Wang, C., & Peng, Z. (2019, June). Design and implementation of an object detection system using faster R-CNN. In 2019 International Conference on Robots & Intelligent System (ICRIS) (pp. 204-206). IEEE.
Sreeja, Kovoor (bib4765) 2019; 62
Li, H., Ke, Q., Gong, M., & Drummond, T. (2023). Progressive video summarization via multimodal self-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 5584-5593).
Darabi, K., & Ghinea, G. (2014, July). Personalized video summarization by highest quality frames. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (pp. 1-6). IEEE.
Muhammad, Hussain, Tanveer, Sannino, de Albuquerque (bib4764) 2019; 7
Lai, P.K., Décombas, M., Moutet, K., & Laganiere, R. (2016, August). Video summarization of surveillance cameras. In 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 286-294). IEEE.
Kulesza, A., & Taskar, B. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3), 123-286.
Borodin, A. (2009). Determinantal point processes. arXiv preprint arXiv:0911.1153.
(pp. 517-533).
Li, Chen, Xie, Han (bib4766) 2023; 161
Zhang, K., Chao, W.L., Sha, F., & Grauman, K. (2016). Video summarization with long short-term memory. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14 (pp. 766-782). Springer International Publishing.
Zhu, Zhao, Hua, Wu (bib4772) 2023; 140
Feng, L., Li, Z., Kuang, Z., & Zhang, W. (2018, October). Extractive video summarizer with memory augmented neural networks. In Proceedings of the 26th ACM international conference on Multimedia (pp. 976-983).
Liu, Y., Walder, C., & Xie, L. (2022, July). Determinantal Point Process Likelihoods for Sequential Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1653-1663).
Song, Y., Vallmitjana, J., Stent, A., & Jaimes, A. (2015). Tvsum: Summarizing web videos using titles. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5179-5187).
Sabha, A., & Selwal, A. (2023). Data-driven enabled approaches for criteria-based video summarization: a comprehensive survey, taxonomy, and future directions. Multimedia Tools and Applications, 1-75.
Zhao, Z., Jiang, S., Huang, Q., & Zhu, G. (2006, July). Highlight summarization in sports video based on replay detection. In 2006 IEEE international conference on multimedia and expo (pp. 1613-1616). IEEE.
10.1016/j.procs.2025.04.650_bib4768
10.1016/j.procs.2025.04.650_bib4779
10.1016/j.procs.2025.04.650_bib4769
Zhu (10.1016/j.procs.2025.04.650_bib4772) 2023; 140
10.1016/j.procs.2025.04.650_bib4775
10.1016/j.procs.2025.04.650_bib4776
Fu (10.1016/j.procs.2025.04.650_bib4771) 2021; 143
10.1016/j.procs.2025.04.650_bib4777
10.1016/j.procs.2025.04.650_bib4767
10.1016/j.procs.2025.04.650_bib4778
Li (10.1016/j.procs.2025.04.650_bib4766) 2023; 161
Nair (10.1016/j.procs.2025.04.650_bib4770) 2021; 15
10.1016/j.procs.2025.04.650_bib4762
Sreeja (10.1016/j.procs.2025.04.650_bib4765) 2019; 62
10.1016/j.procs.2025.04.650_bib4773
10.1016/j.procs.2025.04.650_bib4763
10.1016/j.procs.2025.04.650_bib4774
Muhammad (10.1016/j.procs.2025.04.650_bib4764) 2019; 7
10.1016/j.procs.2025.04.650_bib4780
References_xml – reference: Sabha, A., & Selwal, A. (2023). Data-driven enabled approaches for criteria-based video summarization: a comprehensive survey, taxonomy, and future directions. Multimedia Tools and Applications, 1-75.
– volume: 161
  start-page: 359
  year: 2023
  end-page: 370
  ident: bib4766
  article-title: Video summarization for event-centric videos
  publication-title: Neural Networks
– reference: Song, Y., Vallmitjana, J., Stent, A., & Jaimes, A. (2015). Tvsum: Summarizing web videos using titles. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5179-5187).
– reference: Liu, Y., Walder, C., & Xie, L. (2022, July). Determinantal Point Process Likelihoods for Sequential Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1653-1663).
– volume: 62
  start-page: 340
  year: 2019
  end-page: 358
  ident: bib4765
  article-title: Towards genre-specific frameworks for video summarisation: A survey
  publication-title: Journal of Visual Communication and Image Representation
– reference: Li, H., Ke, Q., Gong, M., & Drummond, T. (2023). Progressive video summarization via multimodal self-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 5584-5593).
– reference: Zhang, K., Chao, W.L., Sha, F., & Grauman, K. (2016). Video summarization with long short-term memory. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14 (pp. 766-782). Springer International Publishing.
– reference: Darabi, K., & Ghinea, G. (2014, July). Personalized video summarization by highest quality frames. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (pp. 1-6). IEEE.
– reference: Sharghi, A., Borji, A., Li, C., Yang, T., & Gong, B. (2018). Improving sequential determinantal point processes for supervised video summarization. In
– reference: Wang, C., & Peng, Z. (2019, June). Design and implementation of an object detection system using faster R-CNN. In 2019 International Conference on Robots & Intelligent System (ICRIS) (pp. 204-206). IEEE.
– volume: 15
  start-page: 735
  year: 2021
  end-page: 742
  ident: bib4770
  article-title: Static video summarization using multi-CNN with sparse autoencoder and random forest classifier
  publication-title: Signal, Image and Video Processing
– reference: Kulesza, A., & Taskar, B. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3), 123-286.
– volume: 140
  start-page: 109578
  year: 2023
  ident: bib4772
  article-title: Topic-aware video summarization using multimodal transformer
  publication-title: Pattern Recognition
– reference: Feng, L., Li, Z., Kuang, Z., & Zhang, W. (2018, October). Extractive video summarizer with memory augmented neural networks. In Proceedings of the 26th ACM international conference on Multimedia (pp. 976-983).
– reference: Zhao, Z., Jiang, S., Huang, Q., & Zhu, G. (2006, July). Highlight summarization in sports video based on replay detection. In 2006 IEEE international conference on multimedia and expo (pp. 1613-1616). IEEE.
– reference: Lai, P.K., Décombas, M., Moutet, K., & Laganiere, R. (2016, August). Video summarization of surveillance cameras. In 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 286-294). IEEE.
– volume: 7
  start-page: 4455
  year: 2019
  end-page: 4463
  ident: bib4764
  article-title: Cost-effective video summarization using deep CNN with hierarchical weighted fusion for IoT surveillance networks
  publication-title: IEEE Internet of Things Journal
– volume: 143
  start-page: 19
  year: 2021
  end-page: 26
  ident: bib4771
  article-title: Self-attention binary neural tree for video summarization
  publication-title: Pattern Recognition Letters
– reference: (pp. 517-533).
– reference: Borodin, A. (2009). Determinantal point processes. arXiv preprint arXiv:0911.1153.
– ident: 10.1016/j.procs.2025.04.650_bib4768
  doi: 10.1007/978-3-319-46478-7_47
– volume: 161
  start-page: 359
  year: 2023
  ident: 10.1016/j.procs.2025.04.650_bib4766
  article-title: Video summarization for event-centric videos
  publication-title: Neural Networks
  doi: 10.1016/j.neunet.2023.01.047
– ident: 10.1016/j.procs.2025.04.650_bib4778
  doi: 10.1109/AVSS.2016.7738018
– volume: 140
  start-page: 109578
  year: 2023
  ident: 10.1016/j.procs.2025.04.650_bib4772
  article-title: Topic-aware video summarization using multimodal transformer
  publication-title: Pattern Recognition
  doi: 10.1016/j.patcog.2023.109578
– ident: 10.1016/j.procs.2025.04.650_bib4774
  doi: 10.1109/ICRIS.2019.00060
– ident: 10.1016/j.procs.2025.04.650_bib4779
  doi: 10.1109/ICME.2006.262855
– volume: 143
  start-page: 19
  year: 2021
  ident: 10.1016/j.procs.2025.04.650_bib4771
  article-title: Self-attention binary neural tree for video summarization
  publication-title: Pattern Recognition Letters
  doi: 10.1016/j.patrec.2020.12.016
– ident: 10.1016/j.procs.2025.04.650_bib4767
  doi: 10.1145/3240508.3240651
– ident: 10.1016/j.procs.2025.04.650_bib4777
  doi: 10.1145/3477495.3531965
– volume: 7
  start-page: 4455
  issue: 5
  year: 2019
  ident: 10.1016/j.procs.2025.04.650_bib4764
  article-title: Cost-effective video summarization using deep CNN with hierarchical weighted fusion for IoT surveillance networks
  publication-title: IEEE Internet of Things Journal
  doi: 10.1109/JIOT.2019.2950469
– volume: 15
  start-page: 735
  year: 2021
  ident: 10.1016/j.procs.2025.04.650_bib4770
  article-title: Static video summarization using multi-CNN with sparse autoencoder and random forest classifier
  publication-title: Signal, Image and Video Processing
  doi: 10.1007/s11760-020-01791-4
– ident: 10.1016/j.procs.2025.04.650_bib4773
  doi: 10.1109/CVPR.2015.7299154
– ident: 10.1016/j.procs.2025.04.650_bib4769
  doi: 10.1109/WACV56688.2023.00554
– ident: 10.1016/j.procs.2025.04.650_bib4780
  doi: 10.1561/2200000044
– ident: 10.1016/j.procs.2025.04.650_bib4762
  doi: 10.1007/978-3-030-01219-9_32
– ident: 10.1016/j.procs.2025.04.650_bib4776
– ident: 10.1016/j.procs.2025.04.650_bib4775
  doi: 10.1109/ICMEW.2014.6890674
– volume: 62
  start-page: 340
  year: 2019
  ident: 10.1016/j.procs.2025.04.650_bib4765
  article-title: Towards genre-specific frameworks for video summarisation: A survey
  publication-title: Journal of Visual Communication and Image Representation
  doi: 10.1016/j.jvcir.2019.06.004
– ident: 10.1016/j.procs.2025.04.650_bib4763
  doi: 10.1007/s11042-023-14925-w
SSID ssj0000388917
Score 2.3425815
SourceID crossref
elsevier
SourceType Index Database
Publisher
StartPage 3989
SubjectTerms Determinantal Point Process
Faster R-CNN
Object Detection
Query-Based personalization
Video Summarization
Title Query-Driven Video Summarization for Long Video Footage Analysis Using Faster-RCNN and Determinantal Point Processes
URI https://dx.doi.org/10.1016/j.procs.2025.04.650
Volume 258
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVHPJ
  databaseName: ROAD: Directory of Open Access Scholarly Resources
  customDbUrl:
  eissn: 1877-0509
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0000388917
  issn: 1877-0509
  databaseCode: M~E
  dateStart: 20100101
  isFulltext: true
  titleUrlDefault: https://road.issn.org
  providerName: ISSN International Centre
linkProvider ISSN International Centre