Tachyum Inc. today announced that it has reached another milestone toward its goal of volume production of the Prodigy Universal Processor in 2021, having completed 96 percent of the silicon design and layout, with only a stable netlist layout remaining before the final netlist and tape-out.
The company has been making steady progress toward Prodigy’s product release next year. While advancing Prodigy’s design to 96 percent completion, Tachyum also confirmed through verification that Prodigy correctly executes instructions fetched directly from DDR5 memory over the coherent mesh, with the Prodigy core producing correct results. The company further confirmed that the latest Prodigy post-layout netlist maintains its clock speed targets with no growth in die size compared with the previous netlist layout milestone. With Prodigy’s first package pin-out complete, the company has also verified cache-miss handling over the coherent interconnect, along with the majority of instructions.
Tachyum has successfully compiled the design to its FPGA emulation of Prodigy and has confirmed that the compiled processor tiles fit into the FPGA emulation system. The full-chip FPGA emulation will be released to layout and manufacturing within the next few weeks. These achievements closely follow Tachyum’s previous milestones in performance testing, hardware connectivity and software compatibility, and they continue to demonstrate that Prodigy is on its way to market in 2021.
“We continue to successfully meet every key metric of our aggressive timeline for producing Prodigy Universal Processor chips for mass deployment in data center, AI and HPC workload environments,” said Dr. Radoslav Danilak, Tachyum founder and CEO. “By nearing 100 percent completion of the silicon design of our product, we are ensuring that organizations will finally have the performance, power efficiency and cost advantages they need to solve the most challenging issues facing them years earlier than previously expected.”
Tachyum’s Prodigy can run HPC applications, convolutional neural net AI, explainable AI, general AI, bio AI and spiking neural networks, as well as normal data center workloads, on a single homogeneous processor platform with a simple and familiar programming model. Using CPUs, GPUs, TPUs and other accelerators in lieu of Prodigy for these different types of workloads is grossly inefficient. A heterogeneous processing fabric, with unique hardware dedicated to each type of workload (e.g., data center, AI, HPC), leaves valuable hardware resources significantly underutilized and creates a more challenging programming environment. Prodigy’s ability to seamlessly switch among these workloads dramatically changes the competitive landscape and the economics of data centers and Big-AI.
Prodigy significantly improves computational performance, energy consumption, hardware (server) utilization and space requirements compared with the processors provisioned in hyperscale data centers today. Prodigy will also allow IoT edge developers to exploit its low power, high performance and simple programming model to deliver AI to the edge.
Prodigy is truly a universal processor. In addition to native Prodigy code, it also runs legacy x86, ARM and RISC-V binaries. And, with a single, highly efficient processor architecture, Prodigy delivers industry-leading performance across data center, AI and HPC workloads. Prodigy, the company’s flagship Universal Processor, will enter volume production in 2021. In April, Prodigy proved its viability with a complete full-chip layout that exceeded clock speed design targets. In August, the processor correctly executed short programs, with results automatically verified against the golden software model, while exceeding the target clock speed. The next step is a fully functional FPGA emulation of the Prodigy chip later this year, the last prototype milestone before final tape-out.
Prodigy outperforms the fastest Xeon processors at 10X lower power on data center workloads, and outperforms NVIDIA’s fastest GPU on HPC, AI training and inference. 125 HPC Prodigy racks can deliver 32 tensor EXAFLOPS. Prodigy’s 3X lower cost per MIPS and 10X lower power translate into a 4X lower data center Total Cost of Ownership (TCO), enabling billions of dollars of savings for hyperscalers such as Google, Facebook, Amazon, Alibaba and others. Because Prodigy is the world’s only processor that can switch between data center, AI and HPC workloads, idle servers can serve as a CAPEX-free AI or HPC cloud, since those servers have already been amortized.