Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition




Detailed bibliographic record
Published in:	2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-6
Main authors:	Parchami, Mostafa; Bashbaghi, Saman; Granger, Eric; Sayed, Saif
Format:	Conference paper
Language:	English
Published:	IEEE, 2017-08-01
Subjects:	Cameras, Face, Image reconstruction, Portals, Probes, Robustness, Training

Abstract	Video-based face recognition (FR) is a challenging task in real-world applications. In still-to-video FR, probe facial regions of interest (ROIs) are typically captured with lower-quality video cameras under unconstrained conditions, where facial appearance varies according to pose, illumination, scale, expression, etc. These video ROIs are typically compared against facial models designed with the high-quality reference still ROI of each target individual enrolled in the system. In this paper, an efficient Canonical Face Representation CNN (CFR-CNN) is proposed for accurate still-to-video FR from a single sample per person, where still and video ROIs are captured under different conditions. Given a facial ROI captured under unconstrained video conditions, the CFR-CNN reconstructs it as a high-quality canonical ROI for matching, one that corresponds to the conditions of the reference still ROIs (e.g., well-illuminated, sharp, frontal views with neutral expression). A deep autoencoder network is trained using a novel weighted loss function that can robustly generate similar face embeddings for the same subjects. Then, during operations, the face embeddings of pairs of still and video ROIs from a target individual are accurately matched using a fully-connected classification network. Experimental results obtained with the COX Face and Chokepoint datasets indicate that the proposed CFR-CNN can achieve a convincing level of accuracy. Its computational complexity (number of operations, network parameters and layers) is significantly lower than that of state-of-the-art CNNs for video FR, which suggests that the CFR-CNN represents a cost-effective solution for real-time applications.
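The abstract names three components: a deep autoencoder that maps an unconstrained video ROI to a canonical (still-like) ROI, a weighted loss that pulls same-subject embeddings together, and a fully-connected network that matches still/video embedding pairs. The following is a minimal, untrained NumPy sketch of how such pieces could fit together. It is not the authors' CFR-CNN: the ROI and embedding sizes, the single-layer encoder and decoder, the two-term form of the loss with weights `alpha` and `beta`, and the absolute-difference pair feature are all hypothetical choices made for illustration, whereas the name CFR-CNN indicates a deeper convolutional network.

```python
import numpy as np

# Hypothetical sizes (not from the paper): flattened 32x32 grayscale ROIs,
# a 128-d face embedding, and a 64-unit hidden layer in the matcher.
ROI_DIM, EMB_DIM, HIDDEN = 32 * 32, 128, 64
rng = np.random.default_rng(0)


def init_layer(n_in, n_out):
    """Small random weights and a zero bias for one fully-connected layer."""
    return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


enc_W, enc_b = init_layer(ROI_DIM, EMB_DIM)  # encoder: ROI -> embedding
dec_W, dec_b = init_layer(EMB_DIM, ROI_DIM)  # decoder: embedding -> canonical ROI
fc1_W, fc1_b = init_layer(EMB_DIM, HIDDEN)   # matcher hidden layer
fc2_W, fc2_b = init_layer(HIDDEN, 1)         # matcher output layer


def encode(roi):
    """Map a flattened ROI to a non-negative (ReLU) embedding."""
    return np.maximum(0.0, roi @ enc_W + enc_b)


def decode(z):
    """Reconstruct a canonical ROI with pixel values in (0, 1)."""
    return sigmoid(z @ dec_W + dec_b)


def weighted_loss(video_roi, still_roi, alpha=1.0, beta=0.5):
    """Hypothetical weighted loss: alpha weights reconstruction of the still
    (canonical) ROI from the video ROI; beta weights the distance between
    the embeddings of the two ROIs of the same subject."""
    z_video, z_still = encode(video_roi), encode(still_roi)
    recon = np.mean((decode(z_video) - still_roi) ** 2)
    embed = np.mean((z_video - z_still) ** 2)
    return alpha * recon + beta * embed


def match_score(still_roi, video_roi):
    """Fully-connected matcher: score in (0, 1) that a still/video pair
    shows the same individual. The pair feature is assumed to be the
    absolute difference of the two embeddings."""
    pair = np.abs(encode(still_roi) - encode(video_roi))
    h = np.maximum(0.0, pair @ fc1_W + fc1_b)
    return float(sigmoid(h @ fc2_W + fc2_b)[0])


# Usage with synthetic data: a "still" ROI and a noisy "video" version of it.
still = rng.random(ROI_DIM)
video = np.clip(still + rng.normal(0.0, 0.1, ROI_DIM), 0.0, 1.0)
loss = weighted_loss(video, still)
score = match_score(still, video)
print(f"loss={loss:.4f} score={score:.4f}")
```

With untrained random weights the score sits near 0.5 and carries no meaning; the encoder, decoder and matcher would first have to be trained on labeled still/video pairs.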
Affiliations	Parchami, Mostafa (mostafa.parchami@mavs.uta.edu): Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA
	Bashbaghi, Saman (bashbaghi@livia.etsmtl.ca): Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada
	Granger, Eric (eric.granger@etsmtl.ca): Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada
	Sayed, Saif (saif.sayed@uta.edu): Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/AVSS.2017.8078553
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml	RIE: IEEE/IET Electronic Library (IEL) (UW System Shared), https://ieeexplore.ieee.org/ (source type: Publisher)
DeliveryMethod	fulltext_linktorsrc
EISBN	1538629399 9781538629390
EndPage	6
ExternalDocumentID	8078553
Genre	orig-research
IEDL.DBID	RIE
IngestDate	Thu Jun 29 18:37:00 EDT 2023
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
PageCount	6
ParticipantIDs	ieee_primary_8078553
PublicationCentury	2000
PublicationDate	2017-Aug.
PublicationDateYYYYMMDD	2017-08-01
PublicationDecade	2010
PublicationTitle	2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
PublicationTitleAbbrev	AVSS
PublicationYear	2017
Publisher	IEEE
Score	1.7550116
SourceID	ieee
SourceType	Publisher
StartPage	1
URI	https://ieeexplore.ieee.org/document/8078553
hasFullText	1
inHoldings	1