NVIDIA’s Hopper GPU, The World’s Fastest AI Chip, Was Created With The Help of AI - Features Nearly 13,000 AI-Designed Circuits

In a blog post published on NVIDIA’s Developer page, the company explains how it leveraged its own AI capabilities to design its greatest GPU to date, the Hopper H100. NVIDIA GPUs are mostly designed using state-of-the-art EDA (Electronic Design Automation) tools, but with the help of AI, using the PrefixRL methodology (optimization of parallel prefix circuits with deep reinforcement learning), the company can design smaller, faster, and more power-efficient chips that deliver better performance.

— Rajarshi Roy (@rjrshr) July 8, 2022

Arithmetic circuits in computer chips are constructed from a network of logic gates (such as NAND, NOR, and XOR) and wires. A desirable circuit should have the following characteristics:

- Small: a lower area, so that more circuits can fit on a chip.
- Fast: a lower delay, to improve the performance of the chip.
- Power-efficient: a lower power consumption for the chip.
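The size/speed trade-off above is inherent to parallel prefix circuits, the structures PrefixRL optimizes. As an illustrative sketch (this is our own example, not NVIDIA's code), we can compare two classic prefix-graph shapes for an n-input prefix computation: a serial ripple chain, which uses the fewest combine nodes (small) but has the deepest logic (slow), and a Kogge-Stone graph, which is shallowest (fast) but uses the most nodes (large):

```python
import math

def ripple_metrics(n):
    """Serial prefix chain: n-1 combine nodes, depth n-1 (small but slow)."""
    return {"nodes": n - 1, "depth": n - 1}

def kogge_stone_metrics(n):
    """Kogge-Stone prefix graph: depth ceil(log2 n) (fast), but
    level d contains n - 2**d combine nodes (large)."""
    depth = math.ceil(math.log2(n))
    nodes = sum(n - 2**d for d in range(depth))
    return {"nodes": nodes, "depth": depth}

# For 16 inputs: ripple is 15 nodes at depth 15; Kogge-Stone is 49 nodes
# at depth 4 -- the area/delay tension an RL agent must balance.
for n in (16, 64):
    print(n, ripple_metrics(n), kogge_stone_metrics(n))
```

A reinforcement learning agent like the one in PrefixRL searches the space between these extremes, trading nodes (area/power) against depth (delay).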

NVIDIA used this methodology to design nearly 13,000 AI-assisted circuits that are as fast and functionally equivalent as their EDA-tool counterparts while occupying 25% less area. PrefixRL is, however, a very computationally demanding task: the physical simulation of each GPU takes 256 CPUs and over 32,000 GPU hours. To eliminate this bottleneck, NVIDIA developed Raptor, an in-house distributed reinforcement learning platform that takes special advantage of NVIDIA hardware for this kind of industrial reinforcement learning. Raptor has several features that enhance scalability and training speed, such as job scheduling, custom networking, and GPU-aware data structures. In the context of PrefixRL, Raptor makes it possible to distribute work across a mix of CPUs, GPUs, and Spot instances. Networking in this reinforcement learning application is diverse, and it benefits from the following:
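The core pattern a platform like Raptor implements is decoupling experience generation from learning. A minimal sketch of that actor-learner split (our own illustration using Python threads and a queue, not Raptor's API):

```python
import queue
import random
import threading

# Shared buffer: actors push experience, the learner pulls it.
experience = queue.Queue()

def actor(actor_id, episodes):
    """Simulate an actor producing experience concurrently.
    In PrefixRL an episode would modify a prefix circuit and score its
    area/delay; here we just fake a reward in [0, 1)."""
    rng = random.Random(actor_id)
    for _ in range(episodes):
        experience.put((actor_id, rng.random()))

def learner(total):
    """Consume a fixed number of experience items (stand-in for training)."""
    rewards = []
    for _ in range(total):
        _, reward = experience.get()
        rewards.append(reward)
    return rewards

# Four actors, ten episodes each, feeding one learner.
threads = [threading.Thread(target=actor, args=(i, 10)) for i in range(4)]
for t in threads:
    t.start()
rewards = learner(40)
for t in threads:
    t.join()
print(len(rewards))  # 40
```

In a real deployment the actors and learner would run on separate machines (CPUs, GPUs, Spot instances) and the queue would be replaced by the networked transports described below.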

- The ability to switch to NCCL for point-to-point transfers, moving model parameters directly from the learner GPU to an inference GPU.
- Redis for asynchronous and smaller messages, such as rewards or statistics.
- A JIT-compiled RPC to handle high-volume, low-latency requests, such as uploading experience data.
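The idea behind this mixed networking stack is to route each kind of traffic over the transport suited to it. A hedged sketch of that routing decision (the function and message kinds are illustrative, not NVIDIA's code):

```python
def pick_transport(kind):
    """Choose a transport per message type, mirroring the split described
    in the blog: bulk GPU tensors, small async messages, and experience."""
    if kind == "model_parameters":
        return "NCCL"   # bulk point-to-point GPU-to-GPU transfer
    if kind in ("reward", "statistics"):
        return "Redis"  # small, asynchronous messages
    if kind == "experience":
        return "RPC"    # high-volume, low-latency uploads
    raise ValueError(f"unknown message kind: {kind}")

print(pick_transport("model_parameters"))  # NCCL
print(pick_transport("reward"))            # Redis
print(pick_transport("experience"))        # RPC
```

Keeping small control messages off the GPU transport avoids serializing cheap traffic behind large tensor transfers, while experience uploads get a path optimized for throughput.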

NVIDIA concludes that applying AI to a real-world circuit design problem can lead to better GPU designs in the future. The full paper is published here, and you can also visit the Developer blog here for more information.
