Supermicro's NVIDIA B200 systems show performance gains on MLPerf Inference v5.0

Supermicro reported strong results for its NVIDIA HGX B200 systems on several MLPerf Inference v5.0 benchmarks. The company demonstrated more than three times the token generation throughput of prior-generation systems, indicating a substantial advance in artificial intelligence (AI) inference capability.

Charles Liang, president and CEO of Supermicro, stated, “Supermicro remains a leader in the AI industry, as evidenced by the first new benchmarks released by MLCommons in 2025. Our building block architecture enables us to be first-to-market with a diverse range of systems optimized for various workloads. We continue to collaborate closely with NVIDIA to fine-tune our systems and secure a leadership position in AI workloads.”

Supermicro optimized its air-cooled and liquid-cooled NVIDIA HGX B200 systems for peak performance before benchmarking. Both variants achieved comparable performance within their operating margins. With these optimizations, the systems exceeded 1,000 tokens per second on large-model inference, notably outperforming earlier generations.

The SYS-421GE-NBRT-LCC and SYS-A21GE-NBRT systems posted strong performance scores on benchmarks based on models such as Mixtral and Llama3.1. MLCommons acknowledged Supermicro's performance gains over previous-generation systems, stating, “Customers will be pleased by the performance improvements achieved which are validated by the neutral, representative, and reproducible MLPerf results.”

In summary, Supermicro's B200 systems show notable gains in inference performance on MLPerf benchmarks. The company, whose portfolio includes over 100 GPU-optimized system models, attributes its continued improvements in AI performance in part to its ongoing collaboration with NVIDIA.