A transfer learning-based efficient spatiotemporal human action recognition framework for long and overlapping action classes

Bibliographic Details
Published in: The Journal of Supercomputing (An International Journal of High-Performance Computer Design, Analysis, and Use), Volume 78, Issue 2, pp. 2873–2908
Main authors: Muhammad Bilal, Muazzam Maqsood, Sadaf Yasmin, Najam Ul Hasan, Seungmin Rho
Format: Journal Article
Language: English
Published: New York: Springer US (Springer Nature B.V.), 1 February 2022
ISSN: 0920-8542; EISSN: 1573-0484
Abstract Deep learning-based solutions for computer vision have made life easier for humans. Video data contain a wealth of hidden information and patterns that can be used for Human Action Recognition (HAR). HAR applies to many areas, such as behavior analysis, intelligent video surveillance, and robotic vision. Occlusion, viewpoint variation, and illumination are among the issues that make the HAR task difficult. Some action classes contain similar actions or overlapping parts; among many other problems, this is the main contributor to misclassification. Traditional hand-engineered and machine learning-based solutions lack the ability to handle such overlapping actions. In this paper, we propose a deep learning-based spatiotemporal HAR framework for overlapping human actions in long videos. Transfer learning techniques are used for deep feature extraction: fine-tuned pre-trained CNN models learn the spatial relationships at the frame level, an optimized deep autoencoder squeezes the high-dimensional deep features, and an RNN with LSTM units learns the long-term temporal relationships. An iterative module added at the end fine-tunes the trained model on new videos so that it learns and adapts to changes. Our proposed framework achieves state-of-the-art performance in spatiotemporal HAR for overlapping human actions in long visual data streams from non-stationary surveillance environments.
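
The abstract describes a four-stage pipeline: per-frame spatial features from a fine-tuned pre-trained CNN, a deep autoencoder that compresses those features, an LSTM that models long-term temporal structure, and an iterative module that keeps adapting the model to new footage. A minimal Python/Keras sketch of that flow follows; the MobileNetV2 backbone, the 256-dimensional bottleneck, the 30-frame clip length, and all layer sizes are illustrative assumptions, not details taken from the paper.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative settings (not from the paper): 30-frame clips, 224x224 RGB
# input, 10 action classes, 256-dimensional autoencoder bottleneck.
FRAMES, H, W, NUM_CLASSES, CODE_DIM = 30, 224, 224, 10, 256

# 1) Transfer learning: a pre-trained CNN extracts per-frame spatial features.
#    Frozen here; its top layers would be unfrozen for fine-tuning.
backbone = tf.keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(H, W, 3))
backbone.trainable = False
FEAT_DIM = backbone.output_shape[-1]  # 1280 for MobileNetV2

# 2) A deep autoencoder squeezes the high-dimensional frame features; only
#    the encoder half feeds the temporal model.
encoder = models.Sequential([
    layers.Input((FEAT_DIM,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(CODE_DIM, activation="relu"),
])
decoder = models.Sequential([
    layers.Input((CODE_DIM,)),
    layers.Dense(512, activation="relu"),
    layers.Dense(FEAT_DIM),
])
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")

# 3) An LSTM learns long-term temporal relationships over the encoded frames.
classifier = models.Sequential([
    layers.Input((FRAMES, CODE_DIM)),
    layers.LSTM(128),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
classifier.compile(optimizer="adam",
                   loss="sparse_categorical_crossentropy",
                   metrics=["accuracy"])

def encode_clip(frames):
    # frames: (FRAMES, H, W, 3) raw RGB array with values in [0, 255]
    x = tf.keras.applications.mobilenet_v2.preprocess_input(
        tf.cast(frames, tf.float32))
    return encoder(backbone(x))  # (FRAMES, CODE_DIM)

# 4) Iterative module: periodically fine-tune on newly collected clips so the
#    model adapts to a non-stationary surveillance environment.
def adapt(new_clips, new_labels, epochs=2):
    codes = np.stack([encode_clip(c).numpy() for c in new_clips])
    classifier.fit(codes, np.asarray(new_labels), epochs=epochs, verbose=0)

In this sketch the autoencoder would first be trained to reconstruct pooled CNN features (autoencoder.fit(feats, feats)) before its encoder half is reused by encode_clip, so the stages run in the order the abstract lists them: spatial extraction, compression, temporal modeling, then periodic adaptation.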
Authors:
– Muhammad Bilal, Department of Computer Science, COMSATS University Islamabad
– Muazzam Maqsood, Department of Computer Science, COMSATS University Islamabad (muazzam.maqsood@cuiatk.edu.pk)
– Sadaf Yasmin, Department of Computer Science, COMSATS University Islamabad
– Najam Ul Hasan, Department of Electrical and Computer Engineering, College of Engineering, Dhofar University
– Seungmin Rho, Department of Industrial Security, Chung-Ang University (smrho@cau.ac.kr)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021.
DOI 10.1007/s11227-021-03957-4
IsPeerReviewed true
IsScholarly true
Keywords: Human action recognition; Transfer learning; Overlapping actions; Video content analysis; Deep autoencoder
Subject terms: Artificial Intelligence based Deep Video Data Analytics; Compilers; Computer Science; Computer vision; Data transmission; Deep learning; Feature extraction; Human activity recognition; Human motion; Interpreters; Iterative methods; Machine learning; Machine vision; Occlusion; Processor Architectures; Programming Languages; Surveillance; Video data
URI https://link.springer.com/article/10.1007/s11227-021-03957-4
https://www.proquest.com/docview/2622619497