EvalDNN: A Toolbox for Evaluating Deep Neural Network Models

Published in: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering: Companion Proceedings, pp. 45-48
Main authors: Tian, Yongqiang; Zeng, Zhihua; Wen, Ming; Liu, Yepang; Kuo, Tzu-yang; Cheung, Shing-Chi
Format: Conference paper
Language: English
Published: ACM, 01.10.2020
Description
Summary: Recent studies have shown that the performance of deep learning models should be evaluated using various important metrics, such as robustness and neuron coverage, besides the widely used prediction accuracy metric. However, major deep learning frameworks currently only provide APIs to evaluate a model's accuracy. To assess a deep learning model comprehensively, framework users and researchers often need to implement new metrics themselves, which is a tedious job. What is worse, due to the large number of hyper-parameters and inadequate documentation, the evaluation results of some deep learning models are hard to reproduce, especially when the models and metrics are both new.

To ease model evaluation in deep learning systems, we have developed EvalDNN, a user-friendly and extensible toolbox supporting multiple frameworks and metrics with a set of carefully designed APIs. Using EvalDNN, a pre-trained model can be evaluated with respect to different metrics in a few lines of code. We have evaluated EvalDNN on 79 models from TensorFlow, Keras, GluonCV, and PyTorch. As a result of our effort to reproduce the evaluation results of existing work, we release a performance benchmark of popular models, which can serve as a useful reference for future research. The tool and benchmark are available at https://github.com/yqtianust/EvalDNN and https://yqtianust.github.io/EvalDNN-benchmark/, respectively. A demo video of EvalDNN is available at: https://youtu.be/v69bNJN2bJc.
DOI: 10.1145/3377812.3382133
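
The abstract's claim that a pre-trained model can be evaluated against several metrics in a few lines of code might look roughly like the sketch below. The module, class, and method names (PyTorchModel, Accuracy, NeuronCoverage, predict) are illustrative assumptions inferred from the abstract, not EvalDNN's documented API; consult the GitHub repository above for the actual interface.

    # Hypothetical sketch of the "few lines of code" workflow described in the
    # abstract. The evaldnn.* names below are assumptions for illustration only;
    # see https://github.com/yqtianust/EvalDNN for the real API.
    import torchvision.models as models                          # real torchvision call
    from evaldnn.models.pytorch import PyTorchModel              # assumed wrapper class
    from evaldnn.metrics.accuracy import Accuracy                # assumed metric class
    from evaldnn.metrics.neuron_coverage import NeuronCoverage   # assumed metric class

    eval_data = ...  # user-supplied evaluation dataset, e.g. an ImageNet validation loader

    # Wrap a pre-trained framework model so every metric shares one interface.
    model = PyTorchModel(models.resnet50(pretrained=True))

    accuracy = Accuracy()          # top-1/top-5 prediction accuracy
    coverage = NeuronCoverage()    # fraction of neurons activated by the data

    # A single inference pass over the data updates all registered metrics.
    model.predict(eval_data, callbacks=[accuracy.update, coverage.update])
    print(accuracy.report(), coverage.report())

If this reading of the abstract is right, the appeal of such a callback-style design is that adding a new metric does not require a second pass over the data: each metric simply registers an update hook on the same prediction loop.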