Improving transfer learning accuracy by reusing Stacked Denoising Autoencoders

Bibliographic Details
Published in: Conference proceedings - IEEE International Conference on Systems, Man, and Cybernetics, pp. 1380 - 1387
Main Authors: Kandaswamy, Chetak, Silva, Luis M., Alexandre, Luis A., Sousa, Ricardo, Santos, Jorge M., de Sa, Joaquim Marques
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2014
ISSN: 1062-922X
Description
Summary: Transfer learning is a process that allows reusing a learning machine trained on one problem to solve a new problem. Transfer learning studies on shallow architectures show low performance, as they are generally based on hand-crafted features obtained from experts. It is therefore interesting to study transference on deep architectures, which are known to extract features directly from the input data. A Stacked Denoising Autoencoder (SDA) is a deep model able to represent the hierarchical features needed for solving classification problems. In this paper we study the performance of SDAs trained on one problem and reused to solve a different problem, one with not only a different distribution but also a different task. We propose two approaches: 1) unsupervised feature transference, and 2) supervised feature transference using deep transfer learning. We show that SDAs using unsupervised feature transference outperform randomly initialized machines on a new problem. We achieved a 7% relative improvement in the average error rate and a 41% improvement in the average computation time when classifying typed uppercase letters. In the case of supervised feature transference, we achieved a 5.7% relative improvement in the average error rate by reusing the first and second hidden layers, and an 8.5% relative improvement in the average error rate together with a 54% speed-up w.r.t. the baseline by reusing all three hidden layers, on the same data. We also explore transfer learning between geometrical shapes and canonical shapes, where the supervised feature transference approach achieved a 7.4% relative improvement in the average error rate.
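The layer-reuse idea behind the supervised feature transference results above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: the framework (PyTorch), the layer sizes, and the helper names make_sda_classifier and transfer_layers are all assumptions chosen for clarity. It copies the first k hidden layers of a network trained on a source problem into a freshly initialized network for the target problem, which is then fine-tuned with supervision.

```python
# Minimal sketch of supervised feature transference between two
# problems (assumed architecture and names; not the paper's code).
import copy
import torch.nn as nn

def make_sda_classifier(n_in, hidden, n_out):
    """Multilayer perceptron standing in for a fine-tuned SDA:
    sigmoid hidden layers followed by a linear output layer."""
    layers, prev = [], n_in
    for h in hidden:
        layers += [nn.Linear(prev, h), nn.Sigmoid()]
        prev = h
    layers.append(nn.Linear(prev, n_out))
    return nn.Sequential(*layers)

def transfer_layers(source, target, k):
    """Copy the weights of the first k hidden (Linear) layers from the
    source network into the target network; remaining layers keep
    their random initialization."""
    copied = 0
    for s_mod, t_mod in zip(source, target):
        if isinstance(s_mod, nn.Linear) and copied < k:
            t_mod.load_state_dict(copy.deepcopy(s_mod.state_dict()))
            copied += 1
    return target

# Source problem (e.g. 10-class digits) and target problem
# (e.g. 26-class uppercase letters) share the input dimension,
# so hidden layers are shape-compatible and can be reused.
source_net = make_sda_classifier(784, [500, 500, 500], 10)
# ... pretrain and fine-tune source_net on the source problem ...
target_net = make_sda_classifier(784, [500, 500, 500], 26)
target_net = transfer_layers(source_net, target_net, k=3)
# ... fine-tune target_net on the target problem (supervised) ...
```

Under these assumptions, k=2 mirrors the "first and second hidden layer" experiment quoted in the abstract, while k=3 mirrors the variant that reuses all three hidden layers; the output layer is never transferred, since the source and target tasks have different label sets.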
DOI: 10.1109/SMC.2014.6974107