Cadence Design Systems, Inc. today unveiled its next-generation AI IP and software tools to address the escalating demand for on-device and edge AI processing. The new highly scalable Cadence Neo Neural Processing Units (NPUs) deliver a wide range of AI performance in a low-energy footprint, bringing new levels of performance and efficiency to AI SoCs. Delivering up to 80 TOPS performance in a single core, the Neo NPUs support both classic and new generative AI models and can offload AI/ML execution from any host processor—including application processors, general-purpose microcontrollers and DSPs—with a simple and scalable AMBA AXI interconnect. Complementing the AI hardware, the new NeuroWeave Software Development Kit (SDK) provides developers with a “one-tool” AI software solution across Cadence AI and Tensilica IP products for no-code AI development.
“While most of the recent attention on AI has been cloud-focused, there are an incredible range of new possibilities that both classic and generative AI can enable on the edge and within devices,” said Bob O’Donnell, president and chief analyst at TECHnalysis Research. “From consumer to mobile and automotive to enterprise, we’re embarking on a new era of naturally intuitive intelligent devices. For these to come to fruition, both chip designers and device makers need a flexible, scalable combination of hardware and software solutions that allow them to bring the magic of AI to a wide range of power requirements and compute performance, all while leveraging familiar tools. New chip architectures that are optimized to accelerate ML models and software tools with seamless links to popular AI development frameworks are going to be incredibly important parts of this process.”
The flexible Neo NPUs are well suited for ultra-power-sensitive devices as well as high-performance systems with a configurable architecture, enabling SoC architects to integrate an optimal AI inferencing solution in a broad range of products, including intelligent sensors, IoT and mobile devices, cameras, hearables/wearables, PCs, AR/VR headsets and advanced driver-assistance systems (ADAS). New hardware and performance enhancements and key features/capabilities include:
- Scalability: Single-core solution is scalable from 8 GOPS to 80 TOPS, with further extension to hundreds of TOPS with multicore
- Broad configuration range: supports 256 to 32K MACs per cycle, allowing SoC architects to optimize their embedded AI solution to meet power, performance and area (PPA) tradeoffs
- Integrated support for a myriad of network topologies and operators: enables efficient offloading of inferencing tasks from any host processor—including DSPs, general-purpose microcontrollers or application processors—significantly improving system performance and power
- Ease of deployment: shortens the time to market to meet rapidly evolving next-generation vision, audio, radar, natural language processing (NLP) and generative AI pipelines
- Flexibility: Support for Int4, Int8, Int16, and FP16 data types across a wide set of operations that form the basis of CNN, RNN and transformer-based networks allows flexibility in neural network performance and accuracy tradeoffs
- High performance and efficiency: Up to 20X higher performance than the first-generation Cadence AI IP, with 2-5X the inferences per second per area (IPS/mm2) and 5-10X the inferences per second per Watt (IPS/W)
Since software is a critical part of any AI solution, Cadence also upgraded its common software toolchain with the introduction of the NeuroWeave SDK. Providing customers with a uniform, scalable and configurable software stack across Tensilica DSPs, controllers and Neo NPUs to address all target applications, the NeuroWeave SDK streamlines product development and enables an easy migration as design requirements evolve. It supports many industry-standard domain-specific ML frameworks, including TensorFlow, ONNX, PyTorch, Caffe2, TensorFlow Lite, MXNet, JAX and others for automated end-to-end code generation; Android Neural Network Compiler; TF Lite Delegates for real-time execution; and TensorFlow Lite Micro for microcontroller-class devices.
“For two decades and with more than 60 billion processors shipped, industry-leading SoC customers have relied on Cadence processor IP for their edge and on-device SoCs. Our Neo NPUs capitalize on this expertise, delivering a leap forward in AI processing and performance,” said David Glasco, vice president of research and development for Tensilica IP at Cadence. “In today’s rapidly evolving landscape, it’s critical that our customers are able to design and deliver AI solutions based on their unique requirements and KPIs without concern about whether future neural networks are supported. Toward this end, we’ve made significant investments in our new AI hardware platform and software toolchain to enable AI at every performance, power and cost point and to drive the rapid deployment of AI-enabled systems.”
“At Labforge, we use a cluster of Cadence Tensilica DSPs in our Bottlenose smart camera product line to enable best-in-class AI processing for power-sensitive edge applications,” said Yassir Rizwan, CEO of Labforge, Inc. “Cadence’s AI software is an integral part of our embedded low power AI solution, and we’re looking forward to leveraging the new capabilities and higher performance offered by Cadence’s new NeuroWeave SDK. With an end-to-end compiler toolchain flow, we can better solve challenging AI problems in automation and robotics—accelerating our time to market to capitalize on generative AI-based application demand and opening new market streams that may not have been possible otherwise.”
The Neo NPUs and the NeuroWeave SDK support Cadence’s Intelligent System Design™ strategy by enabling pervasive intelligence through SoC design excellence.