PertNAS: Architectural Perturbations for Memory-Efficient Neural Architecture Search

Published in: 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1-6
Main authors: Ahmad, Afzal; Xie, Zhiyao; Zhang, Wei
Format: Conference paper
Language: English
Published: IEEE, 09.07.2023
Description
Summary: Differentiable Neural Architecture Search (NAS) relies on aggressive weight-sharing to reduce its search cost. This leads to GPU-memory bottlenecks that hamper the algorithm's scalability. To resolve these bottlenecks, we propose a perturbation-based evolutionary approach that significantly reduces the memory cost while largely maintaining the efficiency benefits of weight-sharing. Our approach makes minute changes to compact neural architectures and measures their impact on performance, and in this way extracts high-quality motifs from the search space. We use these perturbations to perform NAS on compact models that evolve over time to traverse the search space. Our method disentangles GPU-memory consumption from search-space size, offering exceptional scalability to large search spaces. Results show competitive accuracy on multiple benchmarks, including CIFAR10, ImageNet2012, and NASBench-301. Specifically, our approach improves accuracy on ImageNet and NASBench-301 by 0.3% and 0.87%, respectively. Furthermore, the memory consumption of the search is reduced by roughly 80% against state-of-the-art weight-shared differentiable NAS works, while achieving a search time of only 6 GPU hours.
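
The summary describes a concrete search loop: apply minute, single-operation perturbations to a compact architecture, measure their impact on performance, and keep the motifs that help while the population evolves. The Python sketch below is a minimal, hypothetical illustration of such a loop; the search space, the operation names, and the estimate_accuracy stand-in are assumptions made for illustration and are not the authors' released implementation.

    """Minimal sketch of a perturbation-based evolutionary NAS loop.
    All names (OPERATIONS, estimate_accuracy, perturb) are hypothetical
    illustrations, not the PertNAS implementation."""
    import random

    # Hypothetical cell-based search space: each edge in a compact cell picks one operation.
    OPERATIONS = ["skip_connect", "sep_conv_3x3", "sep_conv_5x5", "max_pool_3x3", "zero"]
    NUM_EDGES = 8  # number of edges in one compact cell (assumption for illustration)

    def random_architecture():
        """Sample a compact architecture: one operation per edge."""
        return [random.choice(OPERATIONS) for _ in range(NUM_EDGES)]

    def estimate_accuracy(arch):
        """Stand-in for measuring a compact model's performance.
        In the paper this would be a real performance measurement; a toy
        scoring function keeps the sketch runnable."""
        score = sum(1.0 for op in arch if "conv" in op) - 0.5 * arch.count("zero")
        return score + random.gauss(0, 0.1)  # noisy measurement

    def perturb(arch, edge, new_op):
        """Apply a minute change: swap the operation on a single edge."""
        child = list(arch)
        child[edge] = new_op
        return child

    def evolve(generations=20, population_size=8):
        """Evolve compact models, keeping single-edge perturbations whose
        measured impact on performance is positive."""
        population = [random_architecture() for _ in range(population_size)]
        for _ in range(generations):
            scored = sorted(population, key=estimate_accuracy, reverse=True)
            parents = scored[: population_size // 2]
            children = []
            for parent in parents:
                edge = random.randrange(NUM_EDGES)
                child = perturb(parent, edge, random.choice(OPERATIONS))
                # Keep the perturbation only if it measurably improves performance.
                children.append(child if estimate_accuracy(child) > estimate_accuracy(parent) else parent)
            population = parents + children
        return max(population, key=estimate_accuracy)

    if __name__ == "__main__":
        print("Best architecture found:", evolve())

Because such a loop only ever instantiates and measures one compact model at a time, its memory footprint does not grow with the size of the search space, which is the scalability property the summary highlights.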
DOI: 10.1109/DAC56929.2023.10247756