EvalDNN: A Toolbox for Evaluating Deep Neural Network Models

Bibliographic Details
Published in: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, pp. 45–48
Main Authors: Tian, Yongqiang, Zeng, Zhihua, Wen, Ming, Liu, Yepang, Kuo, Tzu-yang, Cheung, Shing-Chi
Format: Conference Proceeding
Language: English
Published: ACM 01.10.2020
Description
Summary: Recent studies have shown that the performance of deep learning models should be evaluated with several important metrics, such as robustness and neuron coverage, besides the widely used prediction-accuracy metric. However, major deep learning frameworks currently only provide APIs to evaluate a model's accuracy. To assess a deep learning model comprehensively, framework users and researchers often need to implement new metrics themselves, which is a tedious job. Worse, due to the large number of hyper-parameters and inadequate documentation, the evaluation results of some deep learning models are hard to reproduce, especially when both the models and the metrics are new. To ease model evaluation in deep learning systems, we have developed EvalDNN, a user-friendly and extensible toolbox that supports multiple frameworks and metrics through a set of carefully designed APIs. Using EvalDNN, a pre-trained model can be evaluated with respect to different metrics in a few lines of code. We have evaluated EvalDNN on 79 models from TensorFlow, Keras, GluonCV, and PyTorch. As a result of our effort to reproduce the evaluation results of existing work, we release a performance benchmark of popular models, which can serve as a useful reference for future research. The tool and benchmark are available at https://github.com/yqtianust/EvalDNN and https://yqtianust.github.io/EvalDNN-benchmark/, respectively. A demo video of EvalDNN is available at https://youtu.be/v69bNJN2bJc.
DOI: 10.1145/3377812.3382133
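
Note: The summary above points out that, beyond accuracy, framework users must typically implement metrics such as neuron coverage by hand. The sketch below illustrates what that hand-rolled evaluation can look like in plain PyTorch; it is not EvalDNN's implementation, and the coverage threshold, the per-activation coverage definition, and the FakeData stand-in for a real dataset are illustrative assumptions.

    # Hand-rolled evaluation of a pre-trained model in plain PyTorch:
    # top-1 accuracy plus a simple neuron-coverage metric (fraction of
    # ReLU activations that exceed a threshold at least once). The
    # threshold and the synthetic data are illustrative assumptions.
    import torch
    import torchvision.models as models
    import torchvision.transforms as T
    from torch.utils.data import DataLoader
    from torchvision.datasets import FakeData  # stand-in for a real dataset

    model = models.resnet18(pretrained=True).eval()

    # Track, per ReLU layer, which activations ever exceed the threshold.
    covered = {}

    def make_hook(name, threshold=0.0):
        def hook(module, inputs, output):
            fired = (output > threshold).flatten(1).any(dim=0)
            prev = covered.get(name, torch.zeros_like(fired))
            covered[name] = prev | fired
        return hook

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.ReLU):
            module.register_forward_hook(make_hook(name))

    loader = DataLoader(
        FakeData(size=64, image_size=(3, 224, 224), num_classes=1000,
                 transform=T.ToTensor()),
        batch_size=16)

    correct = total = 0
    with torch.no_grad():
        for images, labels in loader:
            predictions = model(images).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.numel()

    coverage = torch.cat(list(covered.values())).float().mean().item()
    print(f"top-1 accuracy: {correct / total:.3f}")
    print(f"neuron coverage: {coverage:.3f}")

Per the summary, EvalDNN's aim is to hide this kind of per-framework boilerplate behind a uniform API, so that the same few lines of evaluation code apply across TensorFlow, Keras, GluonCV, and PyTorch models.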