NVIDIA A30 Tensor Core GPU 24GB HBM2


Bring accelerated performance to every enterprise workload with the NVIDIA A30 Tensor Core GPU. With NVIDIA Ampere architecture Tensor Cores and Multi-Instance GPU (MIG), it delivers speedups securely across diverse workloads, including AI inference at scale and high-performance computing (HPC) applications. By combining fast memory bandwidth and low power consumption in a PCIe form factor optimal for mainstream servers, the A30 enables an elastic data center and delivers maximum value for enterprises.


GPU Features NVIDIA A30 24GB
GPU Memory 24GB HBM2
Memory bandwidth 933 GB/s
FP64 Tensor Core 10.3 teraFLOPS
TF32 Tensor Core 82 teraFLOPS | 165 teraFLOPS*
BFLOAT16 Tensor Core 165 teraFLOPS | 330 teraFLOPS*
FP16 Tensor Core 165 teraFLOPS | 330 teraFLOPS*
INT8 Tensor Core 330 TOPS | 661 TOPS*
Max thermal design power (TDP) 165W
Multi-Instance GPU (MIG) 4 GPU instances @ 6GB each; 2 GPU instances @ 12GB each; 1 GPU instance @ 24GB
Interconnect PCIe Gen4: 64GB/s; third-generation NVIDIA® NVLink®: 200GB/s**
Form factor Dual-slot, full-height, full-length (FHFL)
* With sparsity
** NVLink Bridge for up to two GPUs

Incredible Performance Across Workloads

Whether using MIG to partition an A30 GPU into smaller instances or NVIDIA NVLink to connect multiple GPUs to speed up larger workloads, the A30 can readily handle acceleration needs of every size, from the smallest job to the biggest multi-node workload. The A30's versatility means IT managers can maximize the utility of every GPU in their data center, around the clock, with mainstream servers.
The NVIDIA A30 delivers 165 teraFLOPS (TFLOPS) of TF32 deep learning performance. That's 20X more AI training throughput and over 5X more inference performance than the NVIDIA T4 Tensor Core GPU. For HPC, the A30 delivers 10.3 TFLOPS of FP64 performance, nearly 30 percent more than the NVIDIA V100 Tensor Core GPU.
NVIDIA NVLink in the A30 delivers 2X higher throughput than the previous generation. Two A30 PCIe GPUs can be connected via an NVLink Bridge to deliver 330 TFLOPS of deep learning performance.
An A30 GPU can be partitioned into as many as four GPU instances, fully isolated at the hardware level with their own high-bandwidth memory, cache, and compute cores. MIG gives developers access to breakthrough acceleration for all their applications, and IT administrators can offer right-sized GPU acceleration for every job, optimizing utilization and expanding access to every user and application.
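As a hedged sketch of how such a four-way partition might be created with recent NVIDIA drivers (commands need admin rights and a GPU with no running workloads; the `1g.6gb` profile name matches the A30's 4 x 6GB configuration in the table above, while `MIG-<uuid>` and `train.py` are illustrative placeholders):

```shell
# Enable MIG mode on GPU 0 (a GPU or system reset may be required afterward)
sudo nvidia-smi -i 0 -mig 1

# Create four 6GB GPU instances, with a compute instance on each (-C)
sudo nvidia-smi mig -i 0 -cgi 1g.6gb,1g.6gb,1g.6gb,1g.6gb -C

# List devices; each MIG instance appears with its own UUID
nvidia-smi -L

# Pin a job to a single instance via its MIG UUID
CUDA_VISIBLE_DEVICES=MIG-<uuid> python train.py
```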
With 24GB of high-bandwidth memory (HBM2), the A30 delivers 933 GB/s of GPU memory bandwidth, optimal for diverse AI and HPC workloads in mainstream servers.
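The balance between memory bandwidth and compute peak determines which workloads these numbers favor. A rough roofline-style sketch, using the FP64 Tensor Core peak and memory bandwidth from the spec table above (the `bound` helper and the sample intensities are illustrative, not an NVIDIA tool):

```python
def bound(flops_per_byte, peak_tflops=10.3, mem_bw_gbs=933):
    """Classify a kernel as memory- or compute-bound on the A30,
    using the FP64 Tensor Core peak and HBM2 bandwidth from the spec table."""
    # Ridge point: arithmetic intensity at which compute and memory
    # limits meet, roughly 11 FLOPs per byte for these numbers.
    ridge = peak_tflops * 1e12 / (mem_bw_gbs * 1e9)
    return "compute-bound" if flops_per_byte >= ridge else "memory-bound"

print(bound(0.25))   # stream-like kernels with low arithmetic intensity
print(bound(60.0))   # large dense matrix multiplies
```

Kernels well below the ridge point benefit most from the 933 GB/s of bandwidth; kernels above it are limited by the Tensor Core peak instead.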
AI networks have millions to billions of parameters. Not all of these parameters are needed for accurate predictions; some can be converted to zeros, making the models "sparse" without compromising accuracy. Tensor Cores in the A30 can provide up to 2X higher performance for sparse models. While the sparsity feature more readily benefits AI inference, it can also improve the performance of model training.
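The sparsity these Tensor Cores accelerate is structured 2:4 sparsity: in every group of four weights, at most two are nonzero. A minimal illustration of pruning weights to that pattern (the `prune_2_4` helper is a hypothetical magnitude-based sketch, not NVIDIA's pruning tooling):

```python
import numpy as np

def prune_2_4(weights):
    """Zero the two smallest-magnitude values in every group of four,
    producing the 2:4 structured-sparse pattern."""
    w = weights.reshape(-1, 4).copy()
    # Indices of the two smallest |w| entries in each group of four
    idx = np.argsort(np.abs(w), axis=1)[:, :2]
    np.put_along_axis(w, idx, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.01, 0.4])
print(prune_2_4(w))  # each group of four keeps only its two largest-magnitude entries
```

In practice the model is fine-tuned after pruning so accuracy recovers, and the hardware then skips the zeroed positions at inference time.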