Massive datasets, exploding model sizes, and complex simulations require multiple GPUs with extremely fast interconnects and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform brings together the full power of NVIDIA GPUs, NVIDIA® NVLink®, and NVIDIA InfiniBand networking to provide the highest application performance. With its end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to drive scientific progress.
NVIDIA HGX combines NVIDIA A100 Tensor Core GPUs with high-speed interconnects to form extraordinarily powerful servers. Compared to previous generations, HGX provides up to a 20X AI speedup out of the box with Tensor Float 32 (TF32) and a 2.5X HPC speedup with FP64. Delivering a staggering 10 petaFLOPS, NVIDIA HGX forms the world’s most powerful accelerated scale-up server platform for AI and HPC.
We offer fully managed NVIDIA GPU-based clusters at a fraction of the cost of traditional cloud service providers. These bare-metal servers are completely dedicated to you, with no contention and none of the performance loss that comes with virtualization overhead.
Our flat-rate, no-surprises billing model lets us offer prices up to 30% lower than other cloud service providers. We also don't nickel-and-dime you for moving data into or out of our cloud: there are no ingress or egress fees, so you never receive a supplemental bill.
DLRM on HugeCTR framework, precision = FP16 | NVIDIA A100 80GB batch size = 48 | NVIDIA A100 40GB batch size = 32 | NVIDIA V100 32GB batch size = 32.
Deep learning models are exploding in size and complexity, requiring a system with large amounts of memory, massive computing power, and fast interconnects for scalability. With NVIDIA NVSwitch™ providing high-speed, all-to-all GPU communications, HGX can handle the most advanced AI models. With A100 80GB GPUs, GPU memory is doubled, delivering up to 1.3TB of memory in a single HGX. Emerging workloads on the very largest models like deep learning recommendation models (DLRM), which have massive data tables, are accelerated up to 3X over HGX powered by A100 40GB GPUs.
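The all-to-all bandwidth that NVSwitch provides matters because data-parallel training synchronizes gradients between GPUs with collective operations such as all-reduce on every step. The communication pattern can be sketched in pure Python; this is an illustrative ring all-reduce (the function name and structure are our own, not NVIDIA's NCCL implementation):

```python
def ring_allreduce(grads):
    """grads: list of n equal-length lists, one per rank ("GPU").
    Returns n lists, each holding the elementwise sum across ranks.
    Illustrates the ring pattern only; real libraries overlap these
    sends and receives across dedicated GPU-to-GPU links."""
    n = len(grads)
    size = len(grads[0])
    assert size % n == 0, "gradient length must divide evenly into n chunks"
    step = size // n
    chunks = [list(g) for g in grads]

    def sl(c):  # slice covering chunk index c
        return slice(c * step, (c + 1) * step)

    # Phase 1, reduce-scatter: after n-1 steps, rank r owns the fully
    # summed chunk (r + 1) % n.
    for t in range(n - 1):
        snap = [list(c) for c in chunks]          # simulate simultaneous sends
        for r in range(n):
            c = (r - 1 - t) % n                   # chunk arriving from the left
            left = (r - 1) % n
            chunks[r][sl(c)] = [a + b for a, b in
                                zip(chunks[r][sl(c)], snap[left][sl(c)])]

    # Phase 2, all-gather: circulate the completed chunks around the ring.
    for t in range(n - 1):
        snap = [list(c) for c in chunks]
        for r in range(n):
            c = (r - t) % n
            chunks[r][sl(c)] = snap[(r - 1) % n][sl(c)]
    return chunks
```

Each rank exchanges data only with its ring neighbors, but every element still crosses the interconnect roughly twice, which is why fast, non-blocking GPU-to-GPU links directly bound training throughput.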
HPC applications need to perform an enormous number of calculations per second. Increasing the compute density of each server node dramatically reduces the number of servers required, resulting in huge savings in cost, power, and data center space. For simulations, high-dimension matrix multiplication requires a processor to fetch data from many neighbors for computation, making GPUs connected by NVIDIA NVLink ideal. HPC applications can also leverage TF32 in A100 to achieve up to 11X higher throughput for single-precision, dense matrix-multiply operations than systems from four years earlier. An HGX powered by A100 80GB GPUs delivers a 2X throughput increase over A100 40GB GPUs on Quantum Espresso, a materials simulation, shortening time to insight.
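TF32 achieves its speedup by keeping FP32's 8-bit exponent range while shortening the mantissa to 10 explicit bits, so existing single-precision code runs on Tensor Cores without changes. A small pure-Python sketch of that rounding, using simple round-half-up rather than the hardware's exact rounding behavior (illustrative only):

```python
import struct

def tf32_round(x: float) -> float:
    """Round a float to TF32 precision: FP32 bit layout, but only the
    top 10 of the 23 mantissa bits are kept (round half up on the rest).
    A sketch of the numeric format, not the Tensor Core datapath."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    drop = 23 - 10                       # mantissa bits discarded
    half = 1 << (drop - 1)
    # Adding half then masking rounds the mantissa; a carry correctly
    # propagates into the exponent when the mantissa overflows.
    rounded = (bits + half) & ~((1 << drop) - 1)
    return struct.unpack("<f", struct.pack("<I", rounded))[0]
```

For example, `tf32_round(math.pi)` yields 3.140625: the value keeps FP32's dynamic range but only about three decimal digits of mantissa precision, which is sufficient for many dense linear-algebra workloads.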
Geometric mean of application speedups vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT-Large Fine Tuner], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64 : 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU node with dual-socket CPUs and 4x NVIDIA P100, V100, or A100 GPUs.
With Cirrascale and NVIDIA HGX, it’s also possible to include NVIDIA networking to accelerate and offload data transfers and ensure the full utilization of computing resources. Smart adapters and switches reduce latency, increase efficiency, enhance security, and simplify data center automation to accelerate end-to-end application performance.
The data center is the new unit of computing, and HPC networking plays an integral role in scaling application performance across the entire data center. NVIDIA InfiniBand is paving the way with software-defined networking, In-Network Computing acceleration, remote direct-memory access (RDMA), and the fastest speeds and feeds.