NVIDIA has announced a significant advancement in AI computing with the introduction of the NVIDIA HGX H200, a platform based on the NVIDIA Hopper architecture. It features the NVIDIA H200 Tensor Core GPU, whose advanced memory is designed to handle massive datasets for generative AI and high-performance computing (HPC) workloads.
The H200 is notable for being the first GPU to incorporate HBM3e memory, whose greater speed and capacity accelerate generative AI and large language models while advancing scientific computing for HPC workloads. With HBM3e, the H200 delivers 141GB of memory at 4.8 terabytes per second: nearly double the capacity and 2.4 times the bandwidth of the NVIDIA A100.
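The quoted ratios follow directly from the published specifications. The sketch below reproduces them; the A100 baseline figures (80 GB of HBM2e at roughly 2.0 TB/s) are NVIDIA's published specs, not stated in this article.

```python
# Back-of-envelope check of the H200-vs-A100 figures quoted above.
# A100 baseline (80 GB, ~2.0 TB/s) is a published spec, assumed here.

h200_mem_gb, h200_bw_tbs = 141, 4.8
a100_mem_gb, a100_bw_tbs = 80, 2.0

capacity_ratio = h200_mem_gb / a100_mem_gb    # ~1.76x ("nearly double")
bandwidth_ratio = h200_bw_tbs / a100_bw_tbs   # 2.4x

# Time to stream the entire 141 GB of HBM3e once at full bandwidth:
full_sweep_s = (h200_mem_gb / 1000) / h200_bw_tbs  # ~0.029 s

print(f"capacity: {capacity_ratio:.2f}x, bandwidth: {bandwidth_ratio:.1f}x")
print(f"one full memory sweep: {full_sweep_s * 1000:.1f} ms")
```

At full bandwidth, a single pass over the entire 141 GB takes under 30 ms, which is why memory bandwidth is the headline figure for memory-bound AI workloads.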
The H200-powered systems, developed in collaboration with leading server manufacturers and cloud service providers, are expected to be available for shipping in the second quarter of 2024.
Ian Buck, Vice President of Hyperscale and HPC at NVIDIA, emphasizes that creating intelligence with generative AI and HPC applications requires processing vast amounts of data quickly, which in turn demands large, fast GPU memory. He notes that the H200 strengthens the industry's leading end-to-end AI supercomputing platform, speeding solutions to some of the world's most important challenges.
The NVIDIA Hopper architecture, on which the H200 is based, already delivered a significant performance leap over the prior-generation Ampere architecture. Ongoing software enhancements, including the recent release of powerful open-source libraries like NVIDIA TensorRT™-LLM, continue to drive further performance improvements.
The H200 is expected to deliver additional performance leaps, nearly doubling inference speed on Llama 2, a 70-billion-parameter large language model (LLM), compared with the H100. Future software updates are anticipated to bring further gains.
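One reason memory bandwidth matters so much for LLM inference: in the bandwidth-bound decode phase, every generated token requires reading essentially all model weights once, so the per-GPU token rate is capped at roughly bandwidth divided by weight size. The sketch below is an illustrative upper bound only; it assumes FP16 weights (2 bytes per parameter), ignores KV cache and batching, and uses the H100 SXM's published ~3.35 TB/s bandwidth, which is not stated in this article.

```python
# Bandwidth-bound estimate of single-stream decode throughput for a
# 70B-parameter model in FP16. Illustrative only: real deployments
# use quantization, batching, and multi-GPU sharding.

params = 70e9
bytes_per_param = 2                           # FP16
weights_gb = params * bytes_per_param / 1e9   # 140 GB of weights

for name, bw_tbs in [("H100 (HBM3)", 3.35), ("H200 (HBM3e)", 4.8)]:
    # One full read of the weights per generated token:
    toks_per_s = bw_tbs * 1000 / weights_gb
    print(f"{name}: ~{toks_per_s:.0f} tokens/s upper bound")
```

Bandwidth alone accounts for roughly a 1.4x gain over the H100; the near-2x figure quoted for Llama 2 also reflects software optimizations such as TensorRT-LLM and the larger capacity, which lets a 70B FP16 model fit on a single GPU.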
The NVIDIA H200 comes in various form factors, including NVIDIA HGX H200 server boards in four- and eight-way configurations, which are compatible with both the hardware and software of HGX H100 systems. It is also available in the NVIDIA GH200 Grace Hopper™ Superchip with HBM3e. This flexibility allows deployment in every type of data center environment: on-premises, cloud, hybrid-cloud, and edge.
Leading global server manufacturers, such as ASRock Rack, ASUS, Dell Technologies, GIGABYTE, Hewlett Packard Enterprise, Lenovo, and others, can update their existing systems with the H200. Major cloud service providers like Amazon Web Services, Google Cloud, Microsoft Azure, and Oracle Cloud Infrastructure, along with CoreWeave, Lambda, and Vultr, are expected to deploy H200-based instances starting next year.
HGX H200, powered by NVIDIA NVLink™ and NVSwitch™ high-speed interconnects, delivers exceptional performance across application workloads, including LLM training and inference for models exceeding 175 billion parameters. An eight-way HGX H200 provides over 32 petaflops of FP8 deep learning compute and 1.1TB of aggregate high-bandwidth memory, delivering leading performance in generative AI and HPC applications.
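The eight-way aggregates are consistent with the per-GPU figures quoted earlier; the sketch below simply multiplies across eight GPUs to check them.

```python
# Rough consistency check on the eight-way HGX H200 aggregate numbers.
gpus = 8
mem_per_gpu_gb = 141                  # HBM3e per H200, from the text
total_fp8_pflops = 32                 # aggregate FP8 compute, from the text

total_mem_tb = gpus * mem_per_gpu_gb / 1000   # ~1.13 TB, quoted as "1.1TB"
fp8_per_gpu = total_fp8_pflops / gpus         # implied ~4 PFLOPS FP8 per GPU

print(f"aggregate HBM: {total_mem_tb:.2f} TB")
print(f"implied FP8 per GPU: {fp8_per_gpu:.0f} PFLOPS")
```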
When combined with NVIDIA Grace™ CPUs and ultra-fast NVLink-C2C interconnect, the H200 contributes to the creation of the GH200 Grace Hopper Superchip with HBM3e—an integrated module designed to serve giant-scale HPC and AI applications.
NVIDIA’s accelerated computing platform is supported by robust software tools, including the NVIDIA AI Enterprise suite, enabling developers and enterprises to build and accelerate production-ready applications from AI to HPC.
The NVIDIA H200 is set to be available from global system manufacturers and cloud service providers in the second quarter of 2024. For more in-depth information, see NVIDIA's official press release.