2020 VLSI Symposia: Intel Showcases Intelligent Edge and Energy-Efficient Performance Research

Shannon Davis

5 years ago

This week at the 2020 Symposia on VLSI Technology and Circuits, Intel will present a body of research and technical perspectives on the computing transformation driven by data that is increasingly distributed across the core, edge and endpoints. Chief Technology Officer Mike Mayberry will deliver a plenary keynote, “The Future of Compute: How Data Transformation is Reshaping VLSI,” that highlights the importance of transitioning computing from a hardware/program-centric approach to a data/information-centric approach.

“The sheer volume of data flowing across distributed edge, network and cloud infrastructure demands energy-efficient, powerful processing to happen close to where the data is generated, but is often limited by bandwidth, memory and power resources. The research Intel Labs is showcasing at the VLSI Symposia highlights several novel approaches to more efficient computation that show promise for a range of applications — from robotics and augmented reality to machine vision and video analytics. This body of research is focused on addressing barriers to the movement and computation of data, which represent the biggest data challenges of the future.”
–Vivek K. De, Intel fellow and director of Circuit Technology Research, Intel Labs

Several Intel research papers will be presented that explore new techniques for higher levels of intelligence and energy-efficient performance across the edge-network-cloud systems of the future for a growing number of edge applications. Some topics covered in research papers (full list of research at the end of this News Byte) include:

Enhancing Efficiency and Accuracy of 3D Scene Reconstruction for Edge Robotics Using Ray-Casting Hardware Accelerators

Paper: A Ray-Casting Accelerator in 10nm CMOS for Efficient 3D Scene Reconstruction in Edge Robotics and Augmented Reality Applications

Why It Matters: Certain applications, such as edge robotics and augmented reality, require accurate, fast and power-efficient reconstruction of complex 3D scenes from enormous volumes of data generated by ray-casting operations for real-time dense Simultaneous Localization And Mapping (SLAM). In this research paper, Intel highlights a novel ray-casting hardware accelerator that leverages new techniques to maintain scene reconstruction accuracy while achieving superior energy-efficient performance. These innovative approaches — including techniques like voxel overlap search and hardware-assisted approximation of voxels — reduce demand on local memory access, in addition to improving power efficiency for future edge robotics and augmented reality applications.

Reducing Power Expenditure of Deep Learning-Based Video Stream Analysis With an Event-Driven Visual Data Processing Unit (EPU)

Paper: A 0.05pJ/Pixel 70fps FHD 1Meps Event-Driven Visual Data Processing Unit

Why It Matters: Real-time deep learning-based visual data analytics, used in applications like safety and security, involves rapid object detection from multiple video streams and requires many compute cycles and high memory bandwidth. Input frames from the cameras are typically downsampled to reduce that load, which degrades image accuracy. In this research, Intel demonstrates an event-driven visual data processing unit (EPU) that, combined with novel algorithms, can instruct deep learning accelerators to process only motion-based “regions of interest” in the visual input. This approach alleviates the high compute and memory requirements of visual analytics at the edge.
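To make the idea of a motion-based region of interest concrete, here is a minimal software sketch. It is not the EPU's algorithm, which the summary above does not describe. It assumes simple frame differencing on small grayscale frames, and the names `motion_roi`, `crop` and the `threshold` parameter are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # grayscale image as rows of pixel intensities
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def motion_roi(prev: Frame, curr: Frame, threshold: int = 16) -> Optional[Box]:
    """Return the bounding box of pixels that changed by more than `threshold`.

    Assumption: plain frame differencing stands in for event detection here.
    """
    changed = [
        (r, c)
        for r, (prev_row, curr_row) in enumerate(zip(prev, curr))
        for c, (p, q) in enumerate(zip(prev_row, curr_row))
        if abs(q - p) > threshold
    ]
    if not changed:
        return None  # no motion, so nothing needs further processing
    rows = [r for r, _ in changed]
    cols = [c for _, c in changed]
    return min(rows), min(cols), max(rows), max(cols)


def crop(frame: Frame, box: Box) -> Frame:
    """Cut the region of interest out of the full frame."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]


# Two 4x6 frames that differ in only two pixels.
prev_frame = [[0] * 6 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[1][2] = 200
curr_frame[2][3] = 180

roi = motion_roi(prev_frame, curr_frame)
patch = crop(curr_frame, roi) if roi else []
```

In this toy case only a 2x2 patch of the 24-pixel frame would be handed to a downstream detector, which illustrates why restricting work to regions with motion can cut compute and memory traffic.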

Expanding Local Memory Bandwidth for Artificial Intelligence, Machine Learning and Deep Learning Applications

Paper: 2X-Bandwidth Burst 6T-SRAM for Memory Bandwidth Limited Workloads

Why It Matters: Many AI chips, particularly those used for natural language processing in applications such as voice assistants, are increasingly bound by access to local memory. Doubling the frequency or increasing the number of banks to address this bottleneck comes at the cost of worse power and area efficiency, which matters most in area-constrained edge devices. With this research, Intel demonstrates a 6T-SRAM array that provides two times higher read bandwidth on demand in burst mode operation, with 51% higher energy efficiency than frequency doubling and 30% better area efficiency than doubling the number of banks.

All-Digital Binary Neural Network Accelerator

Paper: 617TOPS/W All-Digital Binary Neural Network Accelerator in 10nm FinFET CMOS

Why It Matters: In power- and resource-constrained edge devices, where low-precision outputs are acceptable for some applications, analog Binary Neural Networks (BNNs) have been used as an alternative to higher-precision neural networks that are more computationally demanding and memory intensive. However, analog BNNs have lower prediction accuracy because they are less tolerant of process variations and noise. Through this research, Intel demonstrates an all-digital BNN that delivers energy efficiency similar to that of analog in-memory techniques while providing better robustness and scalability to advanced process nodes.
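For readers unfamiliar with BNNs, the sketch below shows the standard XNOR-and-popcount formulation of a binary dot product, which is what makes these networks cheap to compute in digital logic. It is a generic textbook illustration and does not represent the circuit in Intel's paper. The helper names `pack_bits` and `binary_dot` are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i set means values[i] == +1)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two n-element +1/-1 vectors stored as packed bits.

    XNOR marks positions where the signs agree. Each agreement contributes +1
    and each disagreement -1, so the result is 2 * agreements - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n


activations = [1, -1, 1, 1, -1, -1, 1, -1]
weights = [1, 1, -1, 1, -1, 1, 1, -1]

result = binary_dot(pack_bits(activations), pack_bits(weights), len(weights))
reference = sum(a * w for a, w in zip(activations, weights))

# A binarized neuron then keeps only the sign of the accumulated value.
output = 1 if result >= 0 else -1
```

Here the packed computation and the ordinary multiply-accumulate agree (both give 2), while the packed version needs only bitwise logic and a bit count instead of multipliers.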

Additional Intel research presented during the 2020 VLSI Symposia includes the following papers:

The Future of Compute: How the Data Transformation is Reshaping VLSI
Low-Clock-Power Digital Standard Cell IPs for High-Performance Graphics/AI Processors in 10nm CMOS
An Autonomous Reconfigurable Power Delivery Network (RPDN) for Many-Core SoCs Featuring Dynamic Current Steering
GaN and Si Transistors on 300mm Si (111) enabled by 3D Monolithic Heterogeneous Integration
Low Swing and Column Multiplexed Bitline Techniques for Low-Vmin, Noise-Tolerant, High-Density, 1R1W 8T-bitcell SRAM in 10nm FinFET CMOS
A Dual-Rail Hybrid Analog/Digital LDO with Dynamic Current Steering for Tunable High PSRR and High Efficiency
A 435MHz, 600Kops/J Side-Channel-Attack Resistant Crypto-Processor for Secure RSA-4K Public-Key Encryption in 14nm CMOS
A 0.26% BER, 10^28 Modeling-Resistant Challenge-Response PUF in 14nm CMOS Featuring Stability-Aware Adversarial Challenge Selection
A SCA-Resistant AES Engine with 6000x Time/Frequency-Domain Leakage Suppression Using Non-Linear Digital Low-Dropout Regulator Cascaded with Arithmetic Countermeasures in 14nm CMOS
CMOS Compatible Process Integration of SOT-MRAM with Heavy-Metal Bi-Layer Bottom Electrode and 10ns Field-Free SOT Switching with STT Assist
A 10nm SRAM Design using Gate-Modulated Self-Collapse Write Assist Enabling 175mV VMIN Reduction with Negligible Power Overhead