A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes
Saved in:
| Published in: | The Journal of Supercomputing, Volume 78, Issue 2, pp. 2873–2908 |
|---|---|
| Main authors: | Bilal, Muhammad; Maqsood, Muazzam; Yasmin, Sadaf; Hasan, Najam Ul; Rho, Seungmin |
| Medium: | Journal Article |
| Language: | English |
| Publication details: | New York: Springer US, 01.02.2022 (Springer Nature B.V) |
| ISSN: | 0920-8542, 1573-0484 |
| Online access: | Get full text |
| Abstract | Deep learning-based solutions for computer vision have made life easier for humans. Video data contain a great deal of hidden information and patterns that can be used for Human Action Recognition (HAR). HAR can be applied to many areas, such as behavior analysis, intelligent video surveillance, and robotic vision. Occlusion, viewpoint variation, and illumination are some of the issues that make the HAR task more difficult. Some action classes contain similar actions or overlapping parts; this, among many other problems, is the main contributor to misclassification. Traditional hand-engineered and machine learning-based solutions lack the ability to handle overlapping actions. In this paper, we propose a deep learning-based spatiotemporal HAR framework for overlapping human actions in long videos. Transfer learning techniques are used for deep feature extraction: fine-tuned pre-trained CNN models learn the spatial relationships at the frame level. An optimized deep autoencoder is used to squeeze the high-dimensional deep features, and an RNN with LSTM units is used to learn the long-term temporal relationships. An iterative module added at the end fine-tunes the trained model on new videos so that it learns and adapts to changes. Our proposed framework achieved state-of-the-art performance in spatiotemporal HAR for overlapping human actions in long visual data streams in non-stationary surveillance environments. |
|---|---|
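The data flow the abstract describes (per-frame CNN features, an autoencoder bottleneck, then a recurrent pass over time) can be sketched as follows. This is purely an illustration of the tensor shapes involved, not the authors' model: all dimensions (30 frames, 4096-d features, 512-d bottleneck, 10 classes) are hypothetical, the weights are random and untrained, and a plain tanh RNN cell stands in for the LSTM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): 30 frames per clip,
# 4096-d CNN features, 512-d bottleneck, 256-d hidden state, 10 classes.
T, D_CNN, D_AE, H, C = 30, 4096, 512, 256, 10

frames = rng.normal(size=(T, D_CNN))      # stand-in for per-frame CNN features

# Linear encoder "bottleneck" (untrained; shape illustration only)
W_enc = rng.normal(size=(D_CNN, D_AE)) * 0.01
z = frames @ W_enc                        # (T, 512) compressed features

# Minimal recurrent pass over time (plain RNN cell standing in for LSTM)
W_x = rng.normal(size=(D_AE, H)) * 0.01
W_h = rng.normal(size=(H, H)) * 0.01
h = np.zeros(H)
for t in range(T):
    h = np.tanh(z[t] @ W_x + h @ W_h)     # temporal state update

# Classify the final hidden state over the action classes
W_out = rng.normal(size=(H, C))
logits = h @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                      # softmax over the C classes
```

In a trained system the encoder would be the optimized deep autoencoder and the recurrence an LSTM, but the shapes and the overall frame-to-class flow are the same.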
| Author | Hasan, Najam Ul; Bilal, Muhammad; Rho, Seungmin; Yasmin, Sadaf; Maqsood, Muazzam |
| Authors and affiliations | 1. Muhammad Bilal, Department of Computer Science, COMSATS University Islamabad. 2. Muazzam Maqsood (muazzam.maqsood@cuiatk.edu.pk), Department of Computer Science, COMSATS University Islamabad. 3. Sadaf Yasmin, Department of Computer Science, COMSATS University Islamabad. 4. Najam Ul Hasan, Department of Electrical and Computer Engineering, College of Engineering, Dhofar University. 5. Seungmin Rho (smrho@cau.ac.kr), Department of Industrial Security, Chung-Ang University |
| ContentType | Journal Article |
| Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021. |
| DOI | 10.1007/s11227-021-03957-4 |
| Discipline | Computer Science |
| EISSN | 1573-0484 |
| EndPage | 2908 |
| ISICitedReferencesCount | 32 |
| ISSN | 0920-8542 |
| IsPeerReviewed | true |
| IsScholarly | true |
| Issue | 2 |
| Keywords | Human action recognition; Transfer learning; Overlapping actions; Video content analysis; Deep autoencoder |
| Language | English |
| PageCount | 36 |
| PublicationDate | 2022-02-01 |
| PublicationPlace | New York |
| PublicationSubtitle | An International Journal of High-Performance Computer Design, Analysis, and Use |
| PublicationTitle | The Journal of supercomputing |
| PublicationTitleAbbrev | J Supercomput |
| PublicationYear | 2022 |
| Publisher | Springer US Springer Nature B.V |
| StartPage | 2873 |
| SubjectTerms | Artificial Intelligence based Deep Video Data Analytics; Compilers; Computer Science; Computer vision; Data transmission; Deep learning; Feature extraction; Human activity recognition; Human motion; Interpreters; Iterative methods; Machine learning; Machine vision; Occlusion; Processor Architectures; Programming Languages; Surveillance; Video data |
| Title | A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes |
| URI | https://link.springer.com/article/10.1007/s11227-021-03957-4 https://www.proquest.com/docview/2622619497 |
| Volume | 78 |
| WOSCitedRecordID | wos000673635300003 |
| linkProvider | Springer Nature |
| openUrl | Bilal, Muhammad; Maqsood, Muazzam; Yasmin, Sadaf; Hasan, Najam Ul (2022-02-01). "A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes." The Journal of Supercomputing 78(2), pp. 2873–2908. Springer Nature B.V. ISSN 0920-8542, eISSN 1573-0484. doi:10.1007/s11227-021-03957-4 |