Nvidia Develops High-Performance ARM-Based "Boulder" Microprocessor

From X-bit Labs: Nvidia Corp. is reportedly working on an ultra-high-performance system-on-chip based on the ARM architecture, which would challenge AMD Opteron and Intel Xeon microprocessors in the server space. The chip is called project Boulder, and it is being designed by Nvidia's graphics processing unit team.

It is no secret that Nvidia is already working on project Denver, a high-performance central processing unit (CPU) running the ARM instruction set, which will be fully integrated on the same chip as an Nvidia graphics processing unit (GPU). The first implementation of project Denver is the graphics processor code-named Maxwell.

Denver and Maxwell fit perfectly into Nvidia's Echelon extreme-scale computing project. The Echelon design incorporates a large number (~1024) of stream cores and a smaller number (~8) of latency-optimized CPU-like cores on a single chip, sharing a common memory system. In the Echelon design described in 2010, eight stream processors (SPs) would form a streaming multiprocessor (SM), and 128 SMs would form a large pool of throughput-optimized processing elements (hence, it does not employ the Kepler paradigm of many SPs per SM). Echelon is expected to become reality only in the 2018-2020 timeframe.
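The core counts quoted above follow from simple multiplication. The short Python sketch below only checks that arithmetic; the constant and function names are illustrative assumptions, and the figures are the approximate ones given for the 2010 Echelon description, not a specification of any shipping product.

```python
# Approximate figures quoted for the Echelon design described in 2010.
# All names here are hypothetical and chosen only for illustration.
SPS_PER_SM = 8       # stream processors (SPs) per streaming multiprocessor (SM)
SMS_PER_CHIP = 128   # streaming multiprocessors per chip
LATENCY_CORES = 8    # latency-optimized CPU-like cores (the article says ~8)


def total_stream_cores(sps_per_sm: int, sms_per_chip: int) -> int:
    """Return the number of stream cores on one chip."""
    return sps_per_sm * sms_per_chip


stream_cores = total_stream_cores(SPS_PER_SM, SMS_PER_CHIP)

# Stream cores per latency-optimized core, using the approximate count of 8.
ratio = stream_cores // LATENCY_CORES

print(f"stream cores per chip: {stream_cores}")
print(f"stream cores per latency-optimized core: {ratio}")
```

Under these assumed figures, 8 SPs times 128 SMs gives the roughly 1024 stream cores cited, or about 128 throughput-optimized cores for every latency-optimized one.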

Nvidia's Denver/Maxwell will allow running an operating system directly on a GPU (or CPU-on-GPU) chip sometime in 2014. Given that Denver is a 64-bit ARMv8-compatible architecture, it should offer fairly high compute performance. Apparently, this is not enough for Nvidia, which is why it is also designing Boulder, an ultra-high-performance system-on-chip with 8-16+ "fat" ARM-compatible cores as well as high-bandwidth interconnects and I/O, reports the Bright Side of News website.

Boulder, which is also due in 2014, is said to be aimed at AMD Opteron and Intel Xeon chips in environments where their x86 nature does not matter much, e.g., high-performance computing. Essentially, Nvidia wants HPC servers featuring Tesla compute accelerators to use Boulder instead of traditional x86 central processing units to perform "serial" tasks.

At present, nothing specific is known about Boulder, but its alleged difference from Denver suggests a high-performance architecture with high-end execution units; massive multi-level, multi-megabyte caches; advanced branch predictors; extremely efficient dispatch; advanced scheduling; and other features found today on advanced x86 central processing units.
