MLCommons Releases MLPerf Training v5.1 Results
MLCommons announced results for the MLPerf Training v5.1 benchmark suite, showcasing advancements in the performance of Generative AI (GenAI) scenarios alongside increased hardware diversity. The new benchmark updates reflect the evolving landscape of the Artificial Intelligence (AI) ecosystem.
The MLPerf Training benchmark suite comprises full system tests that stress models, software, and hardware across a range of Machine Learning (ML) applications. This peer-reviewed benchmark suite is designed to foster competition while promoting performance and energy efficiency.
Version 5.1 recorded remarkable system diversity, with 65 unique submissions utilizing 12 different hardware accelerators. Multi-node submissions rose 86% compared with version 4.1, and they employed a variety of network architectures.
Performance improved over version 5.0, particularly on the tests associated with GenAI applications, where the gains outpaced what Moore’s Law would predict.
“More choices of hardware systems allow customers to compare systems on state-of-the-art MLPerf benchmarks and make informed buying decisions,” said Shriya Rishab, co-chair of the MLPerf Training working group. The increase in multi-node implementations demonstrates a strong focus on scaling and efficiency across the community.
The benchmark results included submissions from 20 organizations, among them major players such as AMD, Cisco, and NVIDIA. New contributors included Datacrunch and the University of Florida, reflecting broader participation.
Submissions for benchmarks targeting GenAI tasks saw a 24% rise, underscoring a concentrated community effort on these applications. “The increased submissions to genAI benchmarks and the performance improvements recorded indicate a significant community focus on generative scenarios,” noted David Kanter, Head of MLPerf at MLCommons.
The MLPerf suite remains adaptable, with two benchmarks replaced to better represent current technology: Llama 3.1 8B replaces BERT (Bidirectional Encoder Representations from Transformers), while Flux.1 takes the place of Stable Diffusion v2. These replacements aim to reflect advancements in language processing and text-to-image models.
“The field of AI is a moving target, constantly evolving with new scenarios and capabilities,” said Paul Baumstarck, co-chair of the MLPerf Training working group. The ongoing evolution of the benchmark is intended to measure what is crucial for stakeholders now and in the future.