NVIDIA HGX B200
The NVIDIA Blackwell platform has arrived on Together AI.
High-performance clusters designed for teams running large-scale LLM and mixture-of-experts (MoE) models.
Why NVIDIA HGX B200 on Together GPU Clusters?
The world’s most powerful AI infrastructure. Delivered faster. Tuned smarter.
Train faster on 8-GPU HGX nodes
Each HGX B200 system includes 8 Blackwell GPUs with 2nd-gen Transformer Engine and FP8 precision. We tune communication libraries and parallelism strategies for maximum throughput.
Optimized NVLink topologies
We configure 1.8TB/s NVSwitch fabrics per node and extend to spine-leaf or Clos topologies — optimized for dense LLMs or sparsely activated MoE workloads.
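As a back-of-envelope illustration of what that per-node fabric adds up to, the figures quoted above (8 GPUs per node, 1.8 TB/s of GPU-to-GPU NVLink bandwidth each) imply the following aggregate; this is simple arithmetic on the stated numbers, not a measured figure:

```python
# Back-of-envelope NVLink arithmetic for one HGX B200 node,
# using the figures quoted above (8 GPUs, 1.8 TB/s per GPU).
GPUS_PER_NODE = 8
NVLINK_BW_TBPS = 1.8  # per-GPU GPU-to-GPU bandwidth, TB/s

# Aggregate NVLink bandwidth across the baseboard's NVSwitch fabric.
aggregate_tbps = GPUS_PER_NODE * NVLINK_BW_TBPS
print(f"Aggregate NVLink bandwidth per node: {aggregate_tbps:.1f} TB/s")
# -> 14.4 TB/s
```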
High-throughput storage options
Support for AI-native shared storage systems like VAST and Weka, enabling efficient multi-node checkpointing and large-scale dataset streaming.
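A common pattern on shared filesystems like these is per-rank sharded checkpointing, where every rank writes its own shard to the shared mount. A minimal sketch of that pattern is below; the directory layout, file naming, and state dict are illustrative placeholders, not Together AI's checkpointing implementation:

```python
import os
import pickle
import tempfile

# Sketch of per-rank sharded checkpointing to a shared filesystem
# (e.g. a VAST or Weka mount). Names and state are illustrative only.
def save_shard(ckpt_dir, rank, step, state):
    os.makedirs(ckpt_dir, exist_ok=True)
    path = os.path.join(ckpt_dir, f"step{step:08d}-rank{rank:05d}.pkl")
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:   # write to a temp file first...
        pickle.dump(state, f)
    os.replace(tmp, path)        # ...then atomically rename into place
    return path

# Each rank would call this with its own rank id and local state.
ckpt_dir = os.path.join(tempfile.mkdtemp(), "ckpts")
p = save_shard(ckpt_dir, rank=0, step=1000, state={"weights": [0.1, 0.2]})
```

The temp-file-plus-rename step matters on shared storage: a reader never observes a half-written shard.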
Deploy and orchestrate at scale
Together delivers HGX B200 clusters at 1K–10K+ GPU scale with your preferred scheduler (Slurm, Kubernetes, Ray), fully validated across network and storage layout.
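For a sense of what running on such a cluster looks like with Slurm, a multi-node training job file typically resembles the sketch below; the node count, time limit, and training script are hypothetical placeholders, not Together AI defaults:

```shell
#!/bin/bash
# Hypothetical multi-node Slurm job on 8-GPU HGX B200 nodes.
# Script name, config path, and sizes are placeholders.
#SBATCH --job-name=llm-pretrain
#SBATCH --nodes=16               # 16 nodes x 8 GPUs = 128 GPUs
#SBATCH --gpus-per-node=8
#SBATCH --ntasks-per-node=8      # one rank per GPU
#SBATCH --time=24:00:00

# Launch one training process per GPU across all allocated nodes.
srun python train.py --config configs/llm_128gpu.yaml
```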
Delivery in 4–6 weeks, no NVIDIA lottery required
We ship full-rack NVIDIA HGX B200 clusters — not just dev kits — with thousands of GPUs available now. You don’t wait on backorders. You start training.
Run by researchers who train models
Our research team actively runs and tunes training workloads on NVIDIA GB200 systems. You're not just getting hardware — you’re working with experts at the edge of what's possible.

“What truly elevates Together AI to ClusterMax™ Gold is their exceptional support and technical expertise. Together AI’s team, led by Tri Dao — the inventor of FlashAttention — and their Together Kernel Collection (TKC), significantly boost customer performance. We don’t believe the value created by Together AI can be replicated elsewhere without cloning Tri Dao.”
— Dylan Patel, Chief Analyst, SemiAnalysis
Powering reasoning models and AI agents
The NVIDIA HGX B200 is designed for the most demanding AI, data analytics, and high-performance computing (HPC) workloads.

Accelerated Compute
Purpose-built for AI: NVIDIA Blackwell SXMs integrated with a high-speed interconnect to accelerate AI performance at scale.
Advanced networking: Using NVIDIA Quantum-X800 InfiniBand and Spectrum™-X Ethernet, NVIDIA HGX delivers networking speeds of up to 800 Gb/s.
Baseboard powerhouse: Each single baseboard joins 8 Blackwell GPUs via fifth-generation NVLink, delivering GPU-to-GPU bandwidth of 1.8TB/s.
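To put the scale-out networking figure above in context, the arithmetic below converts the quoted 800 Gb/s link speed into per-link bytes and a per-node aggregate. The one-NIC-per-GPU assumption is ours for illustration, not a claim from this page:

```python
# Back-of-envelope scale-out networking arithmetic.
# Quoted above: up to 800 Gb/s per link (Quantum-X800 / Spectrum-X).
# Assumption (not from the page): one 800 Gb/s NIC per GPU, 8 per node.
LINK_GBPS = 800
NICS_PER_NODE = 8

link_gbytes = LINK_GBPS / 8                    # 100 GB/s per link
node_tbps = LINK_GBPS * NICS_PER_NODE / 1000   # 6.4 Tb/s per node
print(f"{link_gbytes:.0f} GB/s per link, {node_tbps:.1f} Tb/s per node")
```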
[Benchmark charts: LLM training speed, LLM inference speed, and energy efficiency & TCO, each relative to NVIDIA H100]
Technical Specs
NVIDIA HGX B200
Powering AI Pioneers
Leading AI companies are ramping up with NVIDIA Blackwell running on Together AI.
Zoom partnered with Together AI to leverage our research and deliver accelerated performance when training the models powering various Zoom AI Companion features.
With Together GPU Clusters accelerated by NVIDIA HGX B200, Zoom experienced a 1.9X improvement in training speed out of the box over previous-generation NVIDIA Hopper GPUs.
Salesforce leverages Together AI across the entire AI journey, from training to fine-tuning to inference of their models, to deliver Agentforce.
Training a Mistral-24B model, Salesforce saw a 2x improvement in training speed when upgrading from NVIDIA HGX H200 to HGX B200. This is accelerating how Salesforce trains models and integrates research results into Agentforce.
During initial tests on the NVIDIA HGX B200, InVideo immediately saw a 25% improvement over NVIDIA HGX H200 when running a training job.
Then, in partnership with our researchers, the team made further optimizations and more than doubled this improvement, making the step up to the NVIDIA Blackwell platform even more appealing.
Our latest research & content
Learn more about running turbocharged NVIDIA GB200 NVL72 GPU clusters on Together AI.