GDDR (Graphics Double Data Rate) is memory optimized for capacity per dollar rather than peak bandwidth, the opposite trade-off from [[HBM]].

GDDR is essentially souped-up DDR RAM designed for graphics cards. It's faster than system RAM but slower than [[HBM]], and critically, it is much cheaper per gigabyte. GDDR6X can deliver about 1TB/s of aggregate bandwidth across a full GPU (a 384-bit bus at 21Gbps per pin), but each memory chip delivers far less bandwidth than an HBM stack.

The key advantage: capacity per dollar. GDDR can reach 24GB-48GB+ on a single GPU at reasonable cost. [[HBM]] accelerators carry more in absolute terms (typically 80GB to around 192GB on recent parts), but at much higher cost.

For AI workloads with massive [[Context Window]] requirements during prefill, GDDR's capacity per dollar matters more than raw [[Memory Bandwidth]]. Prefill is mostly compute-bound, and the entire context has to fit in memory before it can be processed. Nvidia's Rubin CPX uses GDDR for exactly this reason.

The physics: GDDR chips sit on the board around the GPU package, farther from the processor than HBM stacks, and each chip has a narrow 32-bit interface versus 1024 bits per HBM stack (through HBM3E). The result is lower bandwidth per chip and more energy per bit moved. But the chips are ordinary packages that need no silicon interposer, so you can place a dozen or more around a GPU cheaply.

---
#deeptech #firstprinciple

Related: [[HBM]] | [[SRAM]] | [[Memory Bandwidth]] | [[Context Window]] | [[Nvidia-Groq - Inference Disaggregation Play]]
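
Back-of-envelope check: the two quantities this note trades off, bandwidth and capacity, both reduce to simple arithmetic. The sketch below is a rough illustration, not a benchmark. The bus widths and pin rates are typical published figures for a GDDR6X flagship card and an HBM3 stack, and the model shape (80 layers, 8 KV heads, head dimension 128, FP16 values) is a hypothetical configuration chosen only to show how fast the KV cache grows with context length.

```python
def bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Peak bandwidth in GB/s: pins times per-pin data rate, divided by 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Approximate KV-cache size: one key and one value vector
    per token, per layer, per KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3

# Assumed, illustrative interface figures (not measurements).
print("GDDR6X, 384-bit bus @ 21 Gbps/pin:", bandwidth_gb_s(384, 21), "GB/s")
print("One GDDR6X chip, 32-bit @ 21 Gbps:", bandwidth_gb_s(32, 21), "GB/s")
print("One HBM3 stack, 1024-bit @ 6.4 Gbps:", bandwidth_gb_s(1024, 6.4), "GB/s")

# Hypothetical model shape, used only to show the scaling with context length.
model = dict(layers=80, kv_heads=8, head_dim=128)
for tokens in (8_192, 131_072, 1_000_000):
    size = kv_cache_bytes(tokens, **model)
    print(f"{tokens:>9} tokens -> {size / GIB:6.1f} GiB of KV cache")
```

Under these assumptions the cache costs 320 KiB per token, so a 128K-token context alone needs about 40 GiB before counting weights. That is why capacity, not per-chip bandwidth, becomes the binding constraint for long-context prefill.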