Nvidia's Delayed Blackwell AI Chips Are Overheating in Servers

From PC Mag: Nvidia's upcoming Blackwell GPUs for AI computing may face further delays because they're prone to overheating when connected to each other in server racks, The Information reports.

The issue has reportedly been traced to the server rack Nvidia designed for Blackwell—which can connect up to 72 GPUs at a time. Nvidia has repeatedly redesigned the racks, which could delay GPU server shipments and the opening of new data centers for Google, Microsoft, or Meta.

An August report suggested that a "design flaw" had delayed the Blackwell GPUs' launch by months. It's unclear whether that flaw is the server rack design issue. Nvidia announced Blackwell in March and initially said the GPUs could ship as soon as Q2 2024 before it encountered challenges.

Nvidia indirectly addressed the server rack problem in a statement to Reuters. "Nvidia is working with leading cloud service providers as an integral part of our engineering team and process. The engineering iterations are normal and expected," a company spokesperson said, suggesting a new server design could be on the horizon.
