VINCENZO LIGUORI, owner, Ocean Logic
Energy efficiency is becoming a critical issue for AI/ML processors. This concern extends not only to IoT devices but also to large data centers, where even nuclear power is being considered to meet demand. The issue is closely tied to computational efficiency and flexibility.
I expect AI/ML efficiency to be a major trend in 2025. At Ocean Logic, we have been focusing on addressing these often-conflicting goals. Compression, especially after quantization, can help reduce the bandwidth and large storage required by model weights, thereby reducing power consumption.
Our approach involves simultaneously compressing the weights and supporting a variety of user-defined floating point (fp), posit, and integer representations. The implementation is straightforward and allows direct hardware support for well-known formats such as INT8, BFLOAT16, and FP8, as well as for integers of different sizes and fp numbers with user-defined exponent and mantissa widths.
For example, the BFLOAT16 weights of LLama 2 7B can be losslessly compressed by approximately 1.5 times, outperforming both GZIP and BZIP2 with significantly fewer resources and without requiring a large buffer memory for decompression. Weight compression also works well after quantization: the same LLama 2 7B, after being quantized to 7 bits (a lossy process), can then be losslessly compressed to around 3.4 bits per weight.
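The compressor itself is not described here, so the short Python sketch below only illustrates the general principle behind these numbers: once weights have been quantized, their codes are far from uniformly distributed, so a lossless entropy coder can store them in fewer bits than their nominal width. The toy Gaussian tensor, the symmetric 7-bit quantizer, and the use of zlib are illustrative assumptions, not Ocean Logic's scheme, which additionally targets streaming decompression without a large buffer.

```python
import random
import zlib

# Toy stand-in for a weight tensor; real model weights are even less uniform,
# which is what makes post-quantization lossless compression effective.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(100_000)]

# 1) Lossy step: quantize to 7-bit signed integers with a symmetric per-tensor scale.
scale = max(abs(w) for w in weights) / 63.0
codes = [max(-64, min(63, round(w / scale))) for w in weights]

# 2) Lossless step: entropy-code the quantized values. zlib is only a stand-in for
#    a hardware-friendly coder; it still stores the 7-bit codes in fewer than 7 bits.
packed = bytes(c + 64 for c in codes)            # one code per byte, offset to 0..127
compressed = zlib.compress(packed, level=9)

print(f"{8 * len(compressed) / len(codes):.2f} bits per weight after lossless compression")
```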
Furthermore, the AI/ML field is far from settled, with new models continuously emerging, presenting another challenge. Low-resource fp hardware can alleviate this uncertainty by supporting more models more easily.
Exponent Indexed Accumulators (ExIA) are an extremely simple architecture for adding long sequences of fp numbers. They operate in two stages: an accumulation stage, where partial results are accumulated, and a reconstruction stage, where the result is finalized. The ExIA result is exact and potentially hundreds of bits long.
ExIA also do not require normalized fp numbers as input, allowing them to accept integers and fixed-point numbers without conversion. They can be easily fused with a multiplier, providing an efficient MAC.
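The ExIA micro-architecture is not detailed here, so the Python sketch below is only a software illustration of the general idea as described: during accumulation, each incoming number's significand is added, as a plain integer, into a small accumulator selected by its exponent; during reconstruction, the per-exponent partial sums are combined into a single exact, very wide integer. The BFLOAT16 field decoding, the function names, and the round-toward-zero conversion are illustrative assumptions, not Ocean Logic's implementation.

```python
import math
import struct

def bf16_fields(x):
    """Decode a Python float into BFLOAT16-style (sign, exponent, mantissa) fields
    by truncating a float32 to its top 16 bits (round-toward-zero; fine for a sketch)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    return (bits >> 15) & 1, (bits >> 7) & 0xFF, bits & 0x7F

def exia_accumulate(values, num_exponents=256):
    """Accumulation stage: one small signed-integer accumulator per exponent value.
    Each fp addition becomes a plain integer add into the bucket its exponent selects.
    (Inf/NaN handling is omitted to keep the sketch short.)"""
    buckets = [0] * num_exponents
    for x in values:
        sign, exp, man = bf16_fields(x)
        if exp == 0:                 # subnormal: no implicit leading 1, exponent acts as 1
            significand, exp = man, 1
        else:                        # normal: restore the implicit leading 1
            significand = man | 0x80
        buckets[exp] += -significand if sign else significand
    return buckets

def exia_reconstruct(buckets):
    """Reconstruction stage: fold the per-exponent partial sums into one exact wide
    integer; the represented value is this integer times 2**(1 - bias - mant_bits)."""
    total = 0
    for exp, partial in enumerate(buckets):
        if partial:
            total += partial << (exp - 1)   # exponent 1 carries the smallest scale
    return total

# Usage: sum a long sequence exactly and check against math.fsum of the same
# BFLOAT16-rounded inputs.
def to_bf16(x):
    bits = struct.unpack(">I", struct.pack(">f", x))[0] & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]

data = [0.1, 1e8, -1e8, 0.25] * 1000
wide = exia_reconstruct(exia_accumulate(data))
print(wide * 2.0 ** (1 - 127 - 7))               # exact sum, rescaled for display
print(math.fsum(to_bf16(x) for x in data))       # reference: same value
```

The point of the sketch is simply that the running state is a set of narrow integer accumulators and that the final sum can be recovered exactly, with no rounding at any step.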
In FPGAs, ExIA are extremely compact: a BFLOAT16 MAC, capable of an addition and a multiplication every clock cycle, occupies less than 100 LUTs and 1 DSP, producing an exact result more than 256 bits long. ExIA also have the advantage that, with each fp addition, the amount of logic switching is similar to that of an integer accumulator, with clear power implications.
These two technologies should be able to make a meaningful impact on the design of AI/ML processors. I’m looking forward to an interesting 2025.