Using deep autoencoders to learn robust domain-invariant representations for still-to-video face recognition
| Published in: | 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1 - 6 |
|---|---|
| Main Authors: | Parchami, Mostafa; Bashbaghi, Saman; Granger, Eric; Sayed, Saif |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.08.2017 |
| Subjects: | Cameras; Face; Image reconstruction; Portals; Probes; Robustness; Training |
| Online Access: | https://ieeexplore.ieee.org/document/8078553 |
| Abstract | Video-based face recognition (FR) is a challenging task in real-world applications. In still-to-video FR, probe facial regions of interest (ROIs) are typically captured with lower-quality video cameras under unconstrained conditions, where facial appearance varies with pose, illumination, scale, expression, etc. These video ROIs are typically compared against facial models designed with a high-quality reference still ROI of each target individual enrolled in the system. In this paper, an efficient Canonical Face Representation CNN (CFR-CNN) is proposed for accurate still-to-video FR from a single sample per person, where still and video ROIs are captured under different conditions. Given a facial ROI captured under unconstrained video conditions, the CFR-CNN reconstructs it as a high-quality canonical ROI for matching that corresponds to the conditions of the reference still ROIs (e.g., well-illuminated, sharp, frontal views with a neutral expression). A deep autoencoder network is trained using a novel weighted loss function so that it robustly generates similar face embeddings for the same subjects. Then, during operations, face embeddings belonging to pairs of still and video ROIs from a target individual are accurately matched using a fully-connected classification network. Experimental results obtained with the COX Face and Chokepoint datasets indicate that the proposed CFR-CNN achieves a convincing level of accuracy. Its computational complexity (number of operations, network parameters, and layers) is significantly lower than that of state-of-the-art CNNs for video FR, suggesting that the CFR-CNN represents a cost-effective solution for real-time applications. |
|---|---|
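The pipeline the abstract describes (a deep autoencoder that maps a degraded video ROI toward a canonical reconstruction and a face embedding, followed by a fully-connected matcher on still/video embedding pairs) can be sketched in miniature. This is a hedged NumPy illustration under stated assumptions, not the paper's implementation: the single linear encoder/decoder layers, the weighting term `lam`, and the names `encode`, `decode`, `weighted_loss`, and `match_score` are all illustrative choices that the record does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_enc):
    # Toy one-layer "encoder": linear map + ReLU yields a face embedding.
    return np.maximum(W_enc @ x, 0.0)

def decode(z, W_dec):
    # Toy "decoder": reconstruct a canonical-quality ROI from the embedding.
    return W_dec @ z

def weighted_loss(x_video, x_still, W_enc, W_dec, lam=0.5):
    # Illustrative weighted loss (assumed form): reconstruction error toward
    # the high-quality still ROI, plus an embedding term that pulls
    # same-subject still/video embeddings together; lam is a guess.
    z_v = encode(x_video, W_enc)
    z_s = encode(x_still, W_enc)
    recon = np.mean((decode(z_v, W_dec) - x_still) ** 2)
    embed = np.mean((z_v - z_s) ** 2)
    return recon + lam * embed

def match_score(z_a, z_b, w_fc, b_fc=0.0):
    # Hypothetical fully-connected matcher on a concatenated embedding
    # pair; sigmoid output in (0, 1) read as a "same subject" score.
    pair = np.concatenate([z_a, z_b])
    return 1.0 / (1.0 + np.exp(-(w_fc @ pair + b_fc)))

# Tiny example: 16-dim "ROIs", 4-dim embeddings.
d, k = 16, 4
W_enc = 0.1 * rng.standard_normal((k, d))
W_dec = 0.1 * rng.standard_normal((d, k))
w_fc = 0.1 * rng.standard_normal(2 * k)

x_still = rng.standard_normal(d)                  # enrollment still ROI
x_video = x_still + 0.1 * rng.standard_normal(d)  # degraded video probe

loss = weighted_loss(x_video, x_still, W_enc, W_dec)
score = match_score(encode(x_video, W_enc), encode(x_still, W_enc), w_fc)
print(loss, score)
```

In a real system both terms of the loss would be backpropagated through a deep convolutional autoencoder; the sketch only shows how a weighted reconstruction-plus-embedding objective and a pairwise matcher fit together.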
| Authors | Parchami, Mostafa (mostafa.parchami@mavs.uta.edu), Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA; Bashbaghi, Saman (bashbaghi@livia.etsmtl.ca), Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada; Granger, Eric (eric.granger@etsmtl.ca), Ecole de Technol. Super., Univ. du Quebec, Montreal, QC, Canada; Sayed, Saif (saif.sayed@uta.edu), Comput. Sci. & Eng. Dept., Univ. of Texas at Arlington, Arlington, TX, USA |
| DOI | 10.1109/AVSS.2017.8078553 |
| EISBN | 1538629399; 9781538629390 |
| EndPage | 6 |
| ExternalDocumentID | 8078553 |
| Genre | orig-research |
| PageCount | 6 |
| SubjectTerms | Cameras; Face; Image reconstruction; Portals; Probes; Robustness; Training |
| URI | https://ieeexplore.ieee.org/document/8078553 |