Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization

Bibliographic Details
Published in: Nature Electronics, Vol. 5, No. 2, pp. 92–101
Main Authors: Patel, Saavan; Canoza, Philip; Salahuddin, Sayeef
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 01.02.2022
ISSN: 2520-1131
Description
Summary: The restricted Boltzmann machine (RBM) is a stochastic neural network capable of solving a variety of difficult tasks, including non-deterministic polynomial-time hard combinatorial optimization problems and integer factorization. The RBM is ideal for hardware acceleration as its architecture is compact (requiring few weights and biases) and its simple, parallelizable sampling algorithm can find the ground states of difficult problems. However, training the RBM on these problems is challenging, as the training algorithm tends to fail for large problem sizes and efficient mappings can be hard to find. Here we show that multiple small computational modules can be combined to create field-programmable gate-array-based RBMs capable of solving more complex problems than their individually trained parts. Our approach combines developments in training, model quantization and efficient hardware implementation for inference. With our implementation, we demonstrate hardware-accelerated factorization of 16-bit numbers with high accuracy, with a speed improvement of 10,000 times over a central processing unit implementation and 1,000 times over a graphics processing unit implementation, and with a power improvement of 30 and 7 times, respectively.
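The summary's central mechanism, block Gibbs sampling to find the ground states of an RBM, can be illustrated with a minimal sketch. The toy weights `W`, `b`, `c` and the two-unit problem below are hypothetical stand-ins (not taken from the paper), chosen so that the lowest-energy visible states encode the "solutions" of a tiny constraint problem:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy RBM (not the paper's model): 2 visible, 2 hidden units.
# These couplings make v = [1, 0] and v = [0, 1] the lowest-energy
# visible states, standing in for the solutions of a tiny problem.
W = np.array([[2.0, -2.0],
              [-2.0, 2.0]])   # visible-hidden couplings
b = np.zeros(2)               # visible biases
c = np.zeros(2)               # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    # Standard RBM energy: E(v, h) = -v.T W h - b.T v - c.T h
    return -(v @ W @ h + b @ v + c @ h)

# Block Gibbs sampling: given v, all hidden units are conditionally
# independent (and vice versa), so each layer is resampled in parallel --
# the property that makes RBM sampling easy to pipeline in hardware.
v = rng.integers(0, 2, size=2).astype(float)
best_v, best_E = v, np.inf
for _ in range(1000):
    h = (rng.random(2) < sigmoid(v @ W + c)).astype(float)
    v = (rng.random(2) < sigmoid(W @ h + b)).astype(float)
    E = energy(v, h)
    if E < best_E:
        best_E, best_v = E, v.copy()

print(best_v, best_E)   # a ground state, e.g. [1. 0.] with energy -2.0
```

In the paper's setting the same layer-parallel update is what maps naturally onto FPGA fabric; here the chain simply keeps the lowest-energy visible configuration it visits.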
DOI: 10.1038/s41928-022-00714-0