Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition

Detailed bibliography

Published in: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1 - 6
Main authors: Parchami, Mostafa; Bashbaghi, Saman; Granger, Eric; Sayed, Saif
Format: Conference paper
Language: English
Published: IEEE, 01.08.2017
Subjects: Cameras; Face; Image reconstruction; Portals; Probes; Robustness; Training
Online access: Get full text
Abstract Video-based face recognition (FR) is a challenging task in real-world applications. In still-to-video FR, probe facial regions of interest (ROIs) are typically captured with lower-quality video cameras under unconstrained conditions, where facial appearance varies with pose, illumination, scale, expression, etc. These video ROIs are typically compared against facial models designed with a high-quality reference still ROI of each target individual enrolled in the system. In this paper, an efficient Canonical Face Representation CNN (CFR-CNN) is proposed for accurate still-to-video FR from a single sample per person, where still and video ROIs are captured under different conditions. Given a facial ROI captured under unconstrained video conditions, the CFR-CNN reconstructs it as a high-quality canonical ROI for matching that corresponds to the conditions of the reference still ROIs (e.g., well-illuminated, sharp, frontal views with a neutral expression). A deep autoencoder network is trained using a novel weighted loss function that robustly generates similar face embeddings for the same subject. Then, during operations, the face embeddings belonging to pairs of still and video ROIs from a target individual are accurately matched using a fully-connected classification network. Experimental results obtained with the COX Face and Chokepoint datasets indicate that the proposed CFR-CNN can achieve a convincing level of accuracy. Its computational complexity (number of operations, network parameters, and layers) is significantly lower than that of state-of-the-art CNNs for video FR, suggesting that the CFR-CNN is a cost-effective solution for real-time applications.
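To make the pipeline described in the abstract concrete, the following is a minimal PyTorch sketch under stated assumptions, not the authors' implementation: a small convolutional autoencoder yields a face embedding and a canonical reconstruction, weighted_loss is a hypothetical stand-in for the paper's weighted loss (combining a pixel term with an embedding-similarity term), and MatchNet is a fully-connected classifier over still/video embedding pairs. The 64x64 input size, embedding dimension, layer widths, and the weight alpha are all illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CFRAutoencoder(nn.Module):
    # Encodes a 3x64x64 facial ROI into an embedding z and decodes z back
    # toward a canonical, still-like ROI. Layer sizes are illustrative only.
    def __init__(self, embed_dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1),   # 64x64 -> 32x32
            nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, embed_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(embed_dim, 64 * 16 * 16),
            nn.ReLU(),
            nn.Unflatten(1, (64, 16, 16)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),  # -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),   # -> 64x64
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def weighted_loss(recon, still_roi, z_video, z_still, alpha=0.5):
    # Hypothetical stand-in for the paper's weighted loss: a pixel term
    # pulling the reconstruction toward the reference still ROI, plus an
    # embedding term pulling same-subject embeddings together.
    pixel = F.mse_loss(recon, still_roi)
    embed = F.mse_loss(z_video, z_still)
    return alpha * pixel + (1 - alpha) * embed

class MatchNet(nn.Module):
    # Fully-connected classifier over concatenated (still, video) embeddings;
    # the output logit scores whether the pair shows the same person.
    def __init__(self, embed_dim=256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(2 * embed_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, z_still, z_video):
        return self.fc(torch.cat([z_still, z_video], dim=1))

# Usage sketch: reconstruct a probe video ROI toward canonical conditions,
# then score its embedding against the enrolled reference still ROI.
ae, matcher = CFRAutoencoder(), MatchNet()
video_roi = torch.rand(1, 3, 64, 64)  # unconstrained probe ROI
still_roi = torch.rand(1, 3, 64, 64)  # high-quality enrolled still ROI
recon, z_video = ae(video_roi)
_, z_still = ae(still_roi)
score = matcher(z_still, z_video)    # same-person match logit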
Author – sequence 1: Parchami, Mostafa (mostafa.parchami@mavs.uta.edu), Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA
– sequence 2: Bashbaghi, Saman (bashbaghi@livia.etsmtl.ca), Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada
– sequence 3: Granger, Eric (eric.granger@etsmtl.ca), Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada
– sequence 4: Sayed, Saif (saif.sayed@uta.edu), Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA
ContentType Conference Proceeding
DOI 10.1109/AVSS.2017.8078553
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 1538629399; 9781538629390
EndPage 6
ExternalDocumentID 8078553
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2017-Aug.
PublicationDateYYYYMMDD 2017-08-01
PublicationDecade 2010
PublicationTitle 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
PublicationTitleAbbrev AVSS
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Cameras
Face
Image reconstruction
Portals
Probes
Robustness
Training
Title Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition
URI https://ieeexplore.ieee.org/document/8078553