Query-Driven Video Summarization for Long Video Footage Analysis Using Faster-RCNN and Determinantal Point Processes
With ever growing volume of video content, video summarization becomes crucial for efficiently condensing lengthy videos into informative representations. This paper introduces a novel approach for video summarization by combining object identification with Determinantal Point Processes (DPP). Objec...
| Published in: | Procedia Computer Science, Volume 258, pp. 3989-3999 |
|---|---|
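| Main authors: | Bhute, Maitrey M.; Tare, Sanskar S.; S, Sridhar Raj |
| Format: | Journal Article |
| Language: | English |
| Published: | Elsevier B.V, 2025 |
| Subjects: | Object Detection; Video Summarization; Query-Based personalization; Determinantal Point Process; Faster R-CNN |
| ISSN: | 1877-0509 |
| Online access: | https://dx.doi.org/10.1016/j.procs.2025.04.650 |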
| Abstract | With the ever-growing volume of video content, video summarization becomes crucial for efficiently condensing lengthy videos into informative representations. This paper introduces a novel approach to video summarization that combines object identification with Determinantal Point Processes (DPP). Object identification, powered by computer vision techniques based on convolutional neural networks, tracks and recognizes objects across video frames, allowing frames with key objects and their interactions to be identified and prioritized. The proposed algorithm leverages DPP and R-CNN concurrently to select frames while avoiding redundancy and preserving diversity.
Extensive experiments on diverse video datasets demonstrate that this method performs competitively with existing summarization techniques when tested on the TVSum dataset [12], a popular baseline. The results highlight the efficacy of the proposed algorithm in striking a balance between information retention and summary length. It achieves an accuracy of 84.34% and a recall of 13.09%. This research contributes to the field of multimedia content analysis, with potential applications in video indexing, retrieval, and content recommendation systems. By integrating object identification and DPP, this work adds a new dimension to video summarization and holds promise for generating more informative and visually coherent video summaries. It has the potential to advance the state of the art in video summarization, benefiting multimedia applications by aiding content analysis and user experience. |
|---|---|
| Author | S, Sridhar Raj Bhute, Maitrey M. Tare, Sanskar S. |
| Author_xml | – sequence: 1 givenname: Maitrey M. surname: Bhute fullname: Bhute, Maitrey M. organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India – sequence: 2 givenname: Sanskar S. surname: Tare fullname: Tare, Sanskar S. organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India – sequence: 3 givenname: Sridhar Raj surname: S fullname: S, Sridhar Raj organization: School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India |
| Cites_doi | 10.1007/978-3-319-46478-7_47 10.1016/j.neunet.2023.01.047 10.1109/AVSS.2016.7738018 10.1016/j.patcog.2023.109578 10.1109/ICRIS.2019.00060 10.1109/ICME.2006.262855 10.1016/j.patrec.2020.12.016 10.1145/3240508.3240651 10.1145/3477495.3531965 10.1109/JIOT.2019.2950469 10.1007/s11760-020-01791-4 10.1109/CVPR.2015.7299154 10.1109/WACV56688.2023.00554 10.1561/2200000044 10.1007/978-3-030-01219-9_32 10.1109/ICMEW.2014.6890674 10.1016/j.jvcir.2019.06.004 10.1007/s11042-023-14925-w |
| ContentType | Journal Article |
| Copyright | 2025 The Author(s) |
| Copyright_xml | – notice: 2025 The Author(s) |
| DBID | 6I. AAFTH AAYXX CITATION |
| DOI | 10.1016/j.procs.2025.04.650 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef |
| DatabaseTitle | CrossRef |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 3999 |
| ExternalDocumentID | 10_1016_j_procs_2025_04_650 S1877050925017545 |
| ISSN | 1877-0509 |
| IngestDate | Thu Nov 27 01:00:22 EST 2025 Wed Dec 10 14:41:06 EST 2025 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | Object Detection Video Summarization Query-Based personalization Determinantal Point Process Faster R-CNN |
| Language | English |
| License | This is an open access article under the CC BY-NC-ND license. |
| LinkModel | OpenURL |
| OpenAccessLink | https://dx.doi.org/10.1016/j.procs.2025.04.650 |
| PageCount | 11 |
| ParticipantIDs | crossref_primary_10_1016_j_procs_2025_04_650 elsevier_sciencedirect_doi_10_1016_j_procs_2025_04_650 |
| PublicationCentury | 2000 |
| PublicationDate | 2025 2025-00-00 |
| PublicationDateYYYYMMDD | 2025-01-01 |
| PublicationDate_xml | – year: 2025 text: 2025 |
| PublicationDecade | 2020 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2025 |
| Publisher | Elsevier B.V |
| Publisher_xml | – name: Elsevier B.V |
| References | [1] Sharghi, A., Borji, A., Li, C., Yang, T., & Gong, B. (2018). Improving sequential determinantal point processes for supervised video summarization (pp. 517-533). doi:10.1007/978-3-030-01219-9_32 [2] Sabha, A., & Selwal, A. (2023). Data-driven enabled approaches for criteria-based video summarization: a comprehensive survey, taxonomy, and future directions. Multimedia Tools and Applications, 1-75. doi:10.1007/s11042-023-14925-w [3] Muhammad, Hussain, Tanveer, Sannino, de Albuquerque (2019). Cost-effective video summarization using deep CNN with hierarchical weighted fusion for IoT surveillance networks. IEEE Internet of Things Journal, 7(5), 4455-4463. doi:10.1109/JIOT.2019.2950469 [4] Sreeja, Kovoor (2019). Towards genre-specific frameworks for video summarisation: A survey. Journal of Visual Communication and Image Representation, 62, 340-358. doi:10.1016/j.jvcir.2019.06.004 [5] Li, Chen, Xie, Han (2023). Video summarization for event-centric videos. Neural Networks, 161, 359-370. doi:10.1016/j.neunet.2023.01.047 [6] Feng, L., Li, Z., Kuang, Z., & Zhang, W. (2018, October). Extractive video summarizer with memory augmented neural networks. In Proceedings of the 26th ACM international conference on Multimedia (pp. 976-983). doi:10.1145/3240508.3240651 [7] Zhang, K., Chao, W.L., Sha, F., & Grauman, K. (2016). Video summarization with long short-term memory. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VII 14 (pp. 766-782). Springer International Publishing. doi:10.1007/978-3-319-46478-7_47 [8] Li, H., Ke, Q., Gong, M., & Drummond, T. (2023). Progressive video summarization via multimodal self-supervised learning. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 5584-5593). doi:10.1109/WACV56688.2023.00554 [9] Nair, Mohan (2021). Static video summarization using multi-CNN with sparse autoencoder and random forest classifier. Signal, Image and Video Processing, 15, 735-742. doi:10.1007/s11760-020-01791-4 [10] Fu, Wang (2021). Self-attention binary neural tree for video summarization. Pattern Recognition Letters, 143, 19-26. doi:10.1016/j.patrec.2020.12.016 [11] Zhu, Zhao, Hua, Wu (2023). Topic-aware video summarization using multimodal transformer. Pattern Recognition, 140, 109578. doi:10.1016/j.patcog.2023.109578 [12] Song, Y., Vallmitjana, J., Stent, A., & Jaimes, A. (2015). Tvsum: Summarizing web videos using titles. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 5179-5187). doi:10.1109/CVPR.2015.7299154 [13] Wang, C., & Peng, Z. (2019, June). Design and implementation of an object detection system using faster R-CNN. In 2019 International Conference on Robots & Intelligent System (ICRIS) (pp. 204-206). IEEE. doi:10.1109/ICRIS.2019.00060 [14] Darabi, K., & Ghinea, G. (2014, July). Personalized video summarization by highest quality frames. In 2014 IEEE International Conference on Multimedia and Expo Workshops (ICMEW) (pp. 1-6). IEEE. doi:10.1109/ICMEW.2014.6890674 [15] Borodin, A. (2009). Determinantal point processes. arXiv preprint arXiv:0911.1153. [16] Liu, Y., Walder, C., & Xie, L. (2022, July). Determinantal Point Process Likelihoods for Sequential Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1653-1663). doi:10.1145/3477495.3531965 [17] Lai, P.K., Décombas, M., Moutet, K., & Laganiere, R. (2016, August). Video summarization of surveillance cameras. In 2016 13th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 286-294). IEEE. doi:10.1109/AVSS.2016.7738018 [18] Zhao, Z., Jiang, S., Huang, Q., & Zhu, G. (2006, July). Highlight summarization in sports video based on replay detection. In 2006 IEEE international conference on multimedia and expo (pp. 1613-1616). IEEE. doi:10.1109/ICME.2006.262855 [19] Kulesza, A., & Taskar, B. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3), 123-286. doi:10.1561/2200000044 |
| SSID | ssj0000388917 |
| Score | 2.3425815 |
| SourceID | crossref elsevier |
| SourceType | Index Database Publisher |
| StartPage | 3989 |
| SubjectTerms | Determinantal Point Process Faster R-CNN Object Detection Query-Based personalization Video Summarization |
| Title | Query-Driven Video Summarization for Long Video Footage Analysis Using Faster-RCNN and Determinantal Point Processes |
| URI | https://dx.doi.org/10.1016/j.procs.2025.04.650 |
| Volume | 258 |
| hasFullText | 1 |
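
The abstract describes selecting frames with a DPP so that the summary favors frames containing key objects while avoiding redundancy. This record does not give the paper's kernel construction or inference procedure, so the following is only a minimal sketch under stated assumptions: each frame has a hypothetical quality score (for example, a detector confidence for the queried objects) and a hypothetical feature vector; the kernel is the common quality-times-similarity form L = diag(q) S diag(q); and selection uses greedy log-determinant maximization, a standard approximation that is not necessarily the authors' method. The names `build_kernel` and `greedy_dpp` are illustrative.

```python
import numpy as np


def build_kernel(quality, features):
    """Build an L-ensemble kernel L = diag(q) @ S @ diag(q).

    quality  : per-frame relevance scores (assumed, e.g. detector confidence)
    features : per-frame feature vectors (assumed); S is their cosine similarity
    """
    feats = np.asarray(features, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    similarity = feats @ feats.T
    q = np.asarray(quality, dtype=float)
    return q[:, None] * similarity * q[None, :]


def greedy_dpp(kernel, k):
    """Greedily pick up to k frames, maximizing log det of the chosen submatrix."""
    selected = []
    remaining = list(range(kernel.shape[0]))
    for _ in range(k):
        best_item, best_logdet = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            # Skip candidates that make the submatrix singular or indefinite.
            if sign > 0 and logdet > best_logdet:
                best_item, best_logdet = i, logdet
        if best_item is None:
            break
        selected.append(best_item)
        remaining.remove(best_item)
    return selected


# Toy data: frames 0 and 1 are near-duplicates, frame 2 is visually different.
quality = [0.9, 0.85, 0.5]
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
print(greedy_dpp(build_kernel(quality, features), k=2))  # prints [0, 2]
```

In this toy run, frame 1 scores higher than frame 2 on quality but is skipped because it is almost identical to the already selected frame 0, which is the diversity behavior the abstract attributes to the DPP.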