Supermicro Data Center Stories

Supermicro Leads Whisper Benchmark in MLPerf v6.0 with NVIDIA Blackwell Ultra GPUs

Written by Supermicro Experts | Apr 1, 2026 3:00:02 PM

In the latest MLPerf Inference v6.0 datacenter closed-division results, Supermicro posted the top result on the Whisper benchmark, achieving over 50,562 samples per second. This result reflects our ongoing collaboration with NVIDIA and our shared goal of ensuring that our mutual customers can benefit from the performance demonstrated in the MLPerf results.
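For context, the MLPerf offline-scenario metric behind that headline number is straightforward: total samples completed divided by wall-clock time. A minimal sketch of that arithmetic (the function name and the 60-second run length are illustrative; only the 50,562 samples/s figure comes from the published result):

```python
def offline_throughput(samples_completed: int, elapsed_seconds: float) -> float:
    """Offline-style throughput: total samples divided by wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return samples_completed / elapsed_seconds

# Illustrative: 3,033,720 samples finished in 60 s works out to
# 50,562 samples/s, matching the headline Whisper result.
print(offline_throughput(3_033_720, 60.0))  # 50562.0
```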

The submission used the Supermicro AS-8126GS-NB3RT, built around the 8-GPU NVIDIA HGX B300 platform with Blackwell Ultra GPUs, 5th-generation NVIDIA NVLink at 1.8 TB/s, 2.3 TB of HBM3e GPU memory per system, and dual AMD EPYC™ 9575F CPUs. NVIDIA-powered solutions support a wide range of inference workloads—from generative recommenders to language, vision, and speech AI models—making them a practical platform for varied deployment requirements.

About the Whisper Benchmark

Whisper is an open-source model from OpenAI and serves as the industry’s “gold standard” baseline for speech AI. Whenever a new speech-to-text model is released, developers typically measure its performance against Whisper, making it a useful reference point for comparing systems across speech-processing workloads.

The benchmark covers a range of real-world speech workloads:

  • Transcribing speech in its native language or translating it directly into English.

  • Creating highly accurate subtitles for videos or podcasts without needing a human editor.

  • Processing sensitive audio—such as medical dictation or legal meetings—locally on private servers, without sending data to the cloud.

  • Converting voice notes into text that is then fed into large language models such as Gemini or GPT-4 for summarization.

The benchmark also tests a system’s ability to handle challenging speech conditions:

  • Heavy Accents: Regional dialects that typically trip up AI systems.

  • Background Noise: Technical chatter, ambient sound, or wind interference.

  • Technical Jargon: Specialized terminology across the 99 languages Whisper supports.

Software Efficiency and Infrastructure ROI

Our systems are optimized for use with NVIDIA inference software, including NVIDIA Dynamo, which can deliver efficiency gains on existing AI infrastructure—helping lower cost per token and improve return on investment for operators running inference workloads.
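The ROI framing above reduces to simple arithmetic: higher sustained throughput on the same hardware lowers the cost of each unit of work. A hedged sketch of that relationship (every price and rate below is a made-up illustration, not a Supermicro or NVIDIA figure):

```python
def cost_per_million_tokens(system_cost_per_hour: float,
                            tokens_per_second: float) -> float:
    """Dollars per one million tokens at a given sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return system_cost_per_hour / tokens_per_hour * 1_000_000

# Illustrative only: a $40/hr system sustaining 20,000 tokens/s costs
# about $0.56 per million tokens; doubling throughput halves that cost.
print(round(cost_per_million_tokens(40.0, 20_000), 2))  # 0.56
print(round(cost_per_million_tokens(40.0, 40_000), 2))  # 0.28
```

The same arithmetic is why software-side efficiency gains translate directly into lower cost per token: the hourly cost of the system is fixed, so every extra token per second is divided into the same denominator.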

Submissions Across Multiple Systems

For MLPerf Inference v6.0, Supermicro submitted benchmarks on a variety of systems. Red Hat partnered with Supermicro on one of the submissions.

Supermicro and NVIDIA continue to work together across speech, language, and vision AI workloads, with the goal of giving customers access to well-tested, efficient systems for inference.

For more information about NVIDIA and MLPerf results, please visit: https://developer.nvidia.com/blog/nvidia-extreme-co-design-delivers-new-mlperf-inference-records/