Training large language models is a time-consuming process that demands substantial hardware. Depending on the size of the model, training can take weeks or months. This poses a significant challenge for businesses and organizations that rely on the efficient operation of language models and cannot afford to wait extended periods for training to complete.
NVIDIA, however, recently announced a significant supercomputing breakthrough with the unveiling of the Eos supercomputer. This powerful system is equipped with over 10,000 H100 Tensor Core GPUs, allowing it to train a 175-billion-parameter GPT-3 model on 1 billion tokens in under four minutes. That is three times faster than the previous record on the MLPerf AI industry benchmark, which NVIDIA itself had set just six months earlier.
The Eos supercomputer represents a monumental amount of compute. It harnesses 10,752 GPUs connected through NVIDIA's InfiniBand networking, moving a petabyte of data a second, and employs 860 terabytes of high-bandwidth memory, providing what NVIDIA cites as 40 exaflops of AI processing power. This computational capability is spread across 1,344 nodes, individual servers that companies can rent access to for approximately $37,000 a month, letting them expand their AI capabilities without investing heavily in their own infrastructure.
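As a quick sanity check, the cluster figures above are internally consistent. A minimal sketch using only the numbers quoted in the article (the per-node rental price is the article's approximate figure, not an official NVIDIA rate card):

```python
# Back-of-envelope check on Eos's published specs (figures from the article).
TOTAL_GPUS = 10_752
TOTAL_NODES = 1_344
NODE_RENT_PER_MONTH = 37_000  # approximate monthly rental per node, per the article

# 10,752 / 1,344 = 8 H100 GPUs per node, matching a DGX-style server.
gpus_per_node = TOTAL_GPUS // TOTAL_NODES
print(gpus_per_node)  # 8

# What renting the entire cluster would cost at that per-node rate.
full_cluster_monthly = TOTAL_NODES * NODE_RENT_PER_MONTH
print(f"${full_cluster_monthly:,}")  # $49,728,000
```

Eight GPUs per node is the configuration of NVIDIA's DGX H100 servers, which is consistent with Eos being built from DGX-class nodes.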
In a series of benchmarking tests, NVIDIA set new records using the Eos supercomputer: a 3.9-minute training time for the GPT-3 model, 2.5 minutes to train the Stable Diffusion model using 1,024 Hopper GPUs, about a minute to train DLRM, 55.2 seconds for RetinaNet, 46 seconds for 3D U-Net, and just 7.2 seconds to train the BERT-Large model.
It’s important to note that the benchmark does not train GPT-3 to completion. The model itself is the full 175-billion-parameter version, but it is trained on only 1 billion tokens rather than the full corpus of approximately 3.7 trillion tokens, which keeps training times in the benchmark at a far more manageable scale.
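To put the 1-billion-token slice in perspective, a rough extrapolation from the article's own figures shows what a full run might look like. This assumes throughput stays constant at full-corpus scale, which real training runs would not exactly satisfy:

```python
# Rough extrapolation (assumption: training throughput on Eos stays
# constant; ignores checkpointing, data pipeline stalls, failures, etc.).
benchmark_tokens = 1e9        # tokens used in the MLPerf benchmark run
full_corpus_tokens = 3.7e12   # full training corpus cited in the article
benchmark_minutes = 3.9       # Eos's record GPT-3 benchmark time

scale = full_corpus_tokens / benchmark_tokens   # 3,700x more data
full_run_minutes = benchmark_minutes * scale
print(f"{full_run_minutes / (60 * 24):.1f} days")  # ~10.0 days
```

In other words, even at record-setting benchmark speed, a full-corpus training run would still be measured in days, not minutes, which is why the benchmark uses a fixed token budget.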
The performance improvements in these benchmarking tests came primarily from the use of 10,752 H100 GPUs, a threefold increase from the 3,584 Hopper GPUs used in the previous benchmarking round. Despite tripling the GPU count, NVIDIA achieved a 2.8x speedup, a 93% scaling efficiency, which it credits to effective software optimization.
NVIDIA’s development efforts in supercomputing were complemented by similar advancements from Microsoft’s Azure team. The Azure team submitted a system using 10,752 H100 GPUs for the benchmarking tests, achieving results within two percent of NVIDIA’s.
NVIDIA plans to apply this expanded computing capability to a wide range of tasks, including foundation-model development, AI-assisted GPU design, neural rendering, multimodal generative AI, and autonomous driving systems, underscoring the breadth of the breakthrough.
These developments are crucial in setting new industry standards for generative AI technology. By continually updating benchmarks and testing methodologies, organizations like MLCommons ensure that the market receives credible and reliable performance data for AI hardware and solutions. This helps mitigate concerns about misleading or inaccurate performance claims in the AI industry and provides a benchmark for evaluating the true capabilities of these technologies.
NVIDIA’s emphasis on advancing AI capabilities reflects a broader trend in the industry, as AI technologies continue to evolve and expand into new applications. The company’s commitment to innovation is evident in recent offerings such as the DGX Cloud service and the DGX GH200, further solidifying its position as a leader in AI hardware and supercomputing.