
The AI Hardware Problem

The millennia-old idea of expressing signals and data as a series of discrete states ignited a revolution in the semiconductor industry during the second half of the 20th century. This new information age thrived on the robust and rapidly evolving field of digital electronics. The abundance of automation and tooling made it relatively manageable to scale designs in complexity and performance as demand grew. However, the power consumed by AI and machine learning applications cannot feasibly keep growing at its current rate on existing processing architectures.


In a digital neural network implementation, the weights and input data are stored in system memory and must be fetched and stored continuously throughout the sea of multiply-accumulate operations within the network. As a result, most of the power is dissipated in moving model parameters and input data between memory and the arithmetic logic unit of the CPU, where the actual multiply-accumulate operation takes place. For a typical multiply-accumulate operation within a general-purpose CPU, this data movement consumes more than two orders of magnitude more energy than the computation itself.
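To make the imbalance concrete, here is a minimal back-of-the-envelope sketch. The two energy constants are hypothetical placeholders chosen only to reflect the "two orders of magnitude" ratio described above, not measured figures, and the model assumes every operand comes from system memory with no caching or reuse.

```python
# Hypothetical, illustrative energy costs in picojoules; not measured values.
ENERGY_MAC_PJ = 1.0            # one multiply-accumulate in the ALU (assumed)
ENERGY_MEM_ACCESS_PJ = 100.0   # one operand fetch or store to system memory (assumed)


def dense_layer_energy(n_inputs, n_outputs):
    """Estimate compute vs. data-movement energy for one dense layer,
    assuming every operand is read from or written to system memory."""
    macs = n_inputs * n_outputs
    # Each MAC fetches one weight and one input; each output is stored once.
    memory_accesses = 2 * macs + n_outputs
    compute_pj = macs * ENERGY_MAC_PJ
    movement_pj = memory_accesses * ENERGY_MEM_ACCESS_PJ
    return compute_pj, movement_pj


compute_pj, movement_pj = dense_layer_energy(n_inputs=1024, n_outputs=256)
movement_share = movement_pj / (compute_pj + movement_pj)
print(f"compute:  {compute_pj / 1e6:.3f} uJ")
print(f"movement: {movement_pj / 1e6:.3f} uJ")
print(f"share of energy spent moving data: {movement_share:.1%}")
```

Under these assumptions, nearly all of the layer's energy goes to moving data rather than to arithmetic. Real processors reduce this with caches and operand reuse, which the sketch deliberately ignores.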


The ability of GPUs to process 3D graphics requires a large number of arithmetic logic units coupled to high-speed memory interfaces. This characteristic inherently made them far faster and more efficient for machine learning by allowing hundreds of multiply-accumulate operations to proceed simultaneously. GPUs tend to use floating-point arithmetic, with 32 bits representing a number by its mantissa, exponent, and sign. Because of this, GPU-targeted machine learning applications have been forced to use floating-point numbers.
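The 32-bit layout can be inspected directly. The sketch below assumes the IEEE 754 single-precision format (1 sign bit, 8 exponent bits, 23 mantissa bits); the helper names are illustrative only.

```python
import struct


def float32_fields(value):
    """Split a number, rounded to IEEE 754 single precision, into its
    sign bit, 8-bit biased exponent, and 23-bit mantissa field."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = bits & 0x7FFFFF
    return sign, exponent, mantissa


def float32_value(sign, exponent, mantissa):
    """Rebuild the value of a normal float32 (not zero, subnormal, inf, or NaN)."""
    return (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)


sign, exponent, mantissa = float32_fields(-6.25)
print(sign, exponent, f"{mantissa:023b}")
print(float32_value(sign, exponent, mantissa))
```

Every multiply-accumulate on such values has to handle all three fields, which is part of why narrower integer formats are attractive for dedicated hardware.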


Dedicated AI chips offer dramatically larger amounts of data movement per joule when compared to GPUs and general-purpose CPUs. They came about as a result of the discovery that, with certain types of neural networks, a dramatic reduction in computational precision reduces network accuracy by only a small amount. However, it will soon become infeasible to increase the number of multiply-accumulate units integrated onto a chip or to reduce bit precision further.
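The effect of reduced precision can be illustrated on a single dot product. This is a minimal sketch assuming symmetric uniform quantization and random test data; it measures numerical error in one multiply-accumulate chain, not the accuracy of a trained network.

```python
import random


def quantize(values, bits):
    """Symmetric uniform quantization to signed integers of the given width."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale


def dot(a, b):
    """Plain multiply-accumulate over two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(256)]
inputs = [random.uniform(-1, 1) for _ in range(256)]

exact = dot(weights, inputs)
for bits in (16, 8, 4):
    qw, sw = quantize(weights, bits)
    qx, sx = quantize(inputs, bits)
    approx = dot(qw, qx) * sw * sx  # integer MACs, one rescale at the end
    print(f"{bits:2d}-bit: {approx:+.4f} (exact {exact:+.4f}, "
          f"error {abs(approx - exact):.4f})")
```

The error grows as the bit width shrinks, which hints at why precision cannot be reduced indefinitely.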


Outside of the realm of the digital world, it is known definitively that extraordinarily dense neural networks can operate efficiently with small amounts of power.

Much of the industry believes that the digital aspect of current systems will need to be augmented with a more analog approach in order to take machine learning efficiency further. With analog, computation does not occur in clocked stages of moving data, but rather exploits the inherent properties of a signal and how it interacts with a circuit, combining memory, logic, and computation into a single entity that can operate efficiently in a massively parallel manner. Some companies are beginning to examine a return to the long-outdated technology of analog computing to tackle the challenge. Analog computing attempts to do math by manipulating small electrical currents via common analog circuit building blocks.

These signals can be mixed and compared, replicating the behavior of their digital counterparts. However, while large-scale analog computing has been explored for decades for various potential applications, it has never been successfully executed as a commercial solution. Currently, the most promising approach to the problem is to integrate a programmable analog computing element into large arrays that are similar in principle to digital memory. Once the cells in an array are configured, an analog signal synthesized by a digital-to-analog converter is fed through the network.

As this signal flows through a network of pre-programmed resistors, the currents are added to produce a resultant analog signal, which can be converted back to a digital value via an analog-to-digital converter. Using an analog system for machine learning does, however, introduce several issues. Analog systems are inherently limited in precision by the noise floor. Much like using lower bit-width digital systems, though, this becomes less of an issue for certain types of networks.
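The signal path described above (digital-to-analog conversion, summation of currents through programmed resistors, analog-to-digital conversion) can be sketched numerically. This is an idealized model, not a circuit simulation: the converters are assumed perfect, the conductance values are hypothetical, and noise is modeled as a single Gaussian term added to the column current.

```python
import random


def dac(code, bits, v_ref=1.0):
    """Ideal DAC: map an unsigned integer code to a voltage in [0, v_ref]."""
    return v_ref * code / (2 ** bits - 1)


def adc(value, bits, full_scale):
    """Ideal ADC: clamp to [0, full_scale], then map to an integer code."""
    clamped = min(max(value, 0.0), full_scale)
    return round(clamped / full_scale * (2 ** bits - 1))


def crossbar_column(voltages, conductances, noise_sigma=0.0, rng=None):
    """Current on one column wire: each cell contributes I = G * V (Ohm's law)
    and the shared wire sums the contributions (Kirchhoff's current law)."""
    current = sum(g * v for g, v in zip(conductances, voltages))
    if noise_sigma > 0.0:
        current += (rng or random).gauss(0.0, noise_sigma)
    return current


rng = random.Random(1)
input_codes = [200, 15, 90, 255]            # 8-bit digital inputs
conductances = [2e-6, 5e-6, 1e-6, 4e-6]     # siemens; hypothetical programmed weights
voltages = [dac(code, bits=8) for code in input_codes]
full_scale = sum(conductances) * 1.0        # largest possible column current at v_ref = 1 V

ideal = crossbar_column(voltages, conductances)
noisy = crossbar_column(voltages, conductances,
                        noise_sigma=0.02 * full_scale, rng=rng)
print("ideal code:", adc(ideal, 8, full_scale))
print("noisy code:", adc(noisy, 8, full_scale))
```

The whole weighted sum is produced in one step on the wire, with no per-operand memory fetch. Raising `noise_sigma` shows how the noise floor limits the number of output bits that carry real information.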

If analog circuitry is used for inference, the result may not be deterministic and is more likely to be affected by heat, noise, or other external factors than in a digital system. Another problem with analog machine learning is that of explainability. Unlike digital systems, analog systems offer no easy method to probe or debug the flow of information within them. Some in the industry propose that a solution may lie in using low-precision, high-speed analog processors for most situations, while funneling results that require higher confidence to lower-speed, high-precision, and easily interrogated digital systems.
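One way to picture this hybrid proposal is a confidence-gated fallback. The sketch below is an assumption about how such routing might look, not a description of any shipping system: the analog processor is stood in for by exact dot products plus Gaussian noise, and "confidence" is taken to be the margin between the two highest class scores.

```python
import random


def digital_scores(weights, x):
    """Exact, deterministic reference path: one dot product per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def analog_scores(weights, x, noise_sigma, rng):
    """Stand-in for an analog processor: the same dot products plus noise."""
    return [s + rng.gauss(0.0, noise_sigma) for s in digital_scores(weights, x)]


def classify(weights, x, margin_threshold, noise_sigma, rng):
    """Use the fast noisy path unless the top two scores are too close to call."""
    scores = analog_scores(weights, x, noise_sigma, rng)
    ranked = sorted(scores, reverse=True)
    if ranked[0] - ranked[1] >= margin_threshold:
        return scores.index(ranked[0]), "analog"
    # Low confidence: recompute on the slower, precise, inspectable path.
    scores = digital_scores(weights, x)
    return scores.index(max(scores)), "digital"


rng = random.Random(42)
weights = [[1.0, 0.0], [0.0, 1.0]]          # two classes, hypothetical weights
for x in ([0.9, 0.1], [0.52, 0.48]):
    label, path = classify(weights, x, margin_threshold=0.3,
                           noise_sigma=0.05, rng=rng)
    print(x, "-> class", label, "via", path)
```

The threshold trades speed for certainty: a larger margin sends more inputs to the digital path, where intermediate values can also be logged and inspected.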
