How researchers are simulating quantum bits with NVIDIA GPUs

This month, NVIDIA made waves in the field of quantum computing: it coupled its Grace Hopper superchip directly to a quantum processor and demonstrated the ability to simulate quantum systems on a classical supercomputer.
NVIDIA is undoubtedly well positioned in quantum computing: it builds the GPUs that power supercomputers and that AI developers covet, and those same GPUs are essential tools for simulating dozens of qubits on classical machines. With the new software, researchers can throw ever more supercomputing resources at problems instead of relying on real quantum computers.
But simulating quantum systems is a uniquely demanding challenge, and those demands continue to grow.
To date, few quantum-computer simulations have been able to use more than a single multi-GPU node, or even more than a single GPU. Recently, however, NVIDIA has made behind-the-scenes advances that could ease these bottlenecks.
Classical computers serve two purposes in simulating quantum hardware:
First, makers of quantum computers can use classical computing to test-run their designs. "Classical simulation is a fundamental aspect of understanding and designing quantum hardware, and is often the only means of verifying these quantum systems," says Jinzhao Sun, a postdoctoral researcher at Imperial College London.
In addition, classical computers can run quantum algorithms in place of actual quantum computers. That capability is especially attractive to researchers studying applications such as molecular dynamics, protein folding, and emerging quantum machine learning, all of which stand to benefit from quantum processing.
Classical simulations are not perfect substitutes for real quantum hardware, but they often make serviceable stand-ins. There are only so many quantum computers in the world, and classical simulations are far more accessible. Simulations also let researchers control the noise that plagues real quantum processors and often disrupts their operation. A classical simulation may be slower than a true quantum run, but because it is free of that noise, researchers can get away with fewer runs and still save time, says Shinjae Yoo, a computer science and machine learning researcher at Brookhaven National Laboratory in Upton, N.Y.
The problem, then, is size. Because each qubit in a quantum system is entangled with the others, the classical resources needed to simulate the system accurately grow exponentially. As a rule of thumb, each additional qubit doubles the classical memory required: going from a single GPU to a full eight-GPU node buys only three more qubits.
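To make that rule of thumb concrete, here is a back-of-the-envelope sketch (my own illustration, not NVIDIA's accounting): a full state vector of n qubits holds 2^n complex amplitudes at roughly 16 bytes each, so each extra qubit doubles the memory and three extra qubits multiply it by eight.

```python
# Rough state-vector memory estimate: 2**n complex128 amplitudes, 16 bytes each.
# Illustrative back-of-the-envelope figures only, not cuQuantum's accounting.

def statevector_bytes(num_qubits: int) -> int:
    """Memory needed to hold a full state vector of `num_qubits` qubits."""
    return (2 ** num_qubits) * 16  # 16 bytes per complex128 amplitude

for n in (30, 31, 32, 33, 36):
    gib = statevector_bytes(n) / 2**30
    print(f"{n} qubits -> {gib:,.0f} GiB")

# Each extra qubit doubles the requirement, so a single 80-GB GPU versus an
# eight-GPU (8 x 80 GB) node differs by a factor of 8, i.e. three qubits.
```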
Many researchers still dream of pushing further down this exponential slope. "Say we are doing a molecular dynamics simulation: we need more atoms and a larger scale to get a more realistic simulation," says Yoo.

cuQuantum Appliance simulates popular quantum algorithms such as the quantum Fourier transform, Shor's algorithm, and quantum supremacy circuits on NVIDIA H100 80GB Tensor Core GPUs 90 to 369 times faster than CPU implementations on dual Intel Xeon Platinum 8480C CPUs.
Now, some behind-the-scenes advances are bringing relief to these bottlenecks. In particular, NVIDIA's cuQuantum software development kit makes it easier for researchers to spread quantum simulations across multiple GPUs. Previously, GPUs had to communicate through the CPU, which created additional bottlenecks; collective communication frameworks like NVIDIA's NCCL let users perform operations such as memory-to-memory copies directly between nodes.
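To see why direct GPU-to-GPU communication matters, here is a small conceptual sketch in plain NumPy (an illustration of the idea, not cuQuantum's or NCCL's actual API). When a state vector is split across two devices, a gate on the highest-order qubit pairs amplitudes that live on different devices, so each device needs a bulk copy of the other's half of the state; that exchange is exactly the kind of transfer collective communication libraries accelerate.

```python
import numpy as np

# Conceptual sketch: a 3-qubit state vector (8 amplitudes) split across two
# "devices". Amplitudes 0-3 live on device 0, amplitudes 4-7 on device 1.
rng = np.random.default_rng(0)
state = rng.normal(size=8) + 1j * rng.normal(size=8)
state /= np.linalg.norm(state)

dev0, dev1 = state[:4].copy(), state[4:].copy()  # the two partitions

# A Hadamard on the highest-order qubit pairs amplitude i on device 0 with
# amplitude i on device 1, so each device needs the other's half of the
# state. This cross-device exchange is what GPU-to-GPU collectives speed up.
h = 1 / np.sqrt(2)
new_dev0 = h * (dev0 + dev1)   # requires dev1's data on device 0
new_dev1 = h * (dev0 - dev1)   # requires dev0's data on device 1

# Check against a single-device reference that applies the same gate globally.
H = np.array([[h, h], [h, -h]])
reference = np.kron(H, np.eye(4)) @ state
assert np.allclose(np.concatenate([new_dev0, new_dev1]), reference)
print("Distributed and single-device results match.")
```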
cuQuantum also pairs with quantum computing toolkits such as PennyLane, from the Canadian startup Xanadu, a stalwart in quantum machine learning that lets researchers use tools like PyTorch with quantum circuits. Although PennyLane was designed to run on real quantum hardware, its developers have specifically added the ability to run across multiple GPU nodes.
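As a rough illustration of that workflow, the sketch below builds a small PennyLane circuit with the PyTorch interface. The `lightning.gpu` device is PennyLane's cuQuantum-backed simulator plugin; it needs an NVIDIA GPU and the pennylane-lightning-gpu package, so this example falls back to the built-in `default.qubit` simulator when the plugin is unavailable (the multi-node MPI setup mentioned in the article is not shown here).

```python
import pennylane as qml
import torch

# Prefer the cuQuantum-backed GPU simulator; fall back to the default
# CPU simulator if pennylane-lightning-gpu (and a GPU) isn't available.
try:
    dev = qml.device("lightning.gpu", wires=4)
except Exception:
    dev = qml.device("default.qubit", wires=4)

@qml.qnode(dev, interface="torch")
def circuit(weights):
    # A small variational circuit: one rotation per qubit plus a ring of CNOTs.
    for wire in range(4):
        qml.RY(weights[wire], wires=wire)
    for wire in range(4):
        qml.CNOT(wires=[wire, (wire + 1) % 4])
    return qml.expval(qml.PauliZ(0))

weights = torch.randn(4, requires_grad=True)
loss = circuit(weights)   # runs on the simulator
loss.backward()           # gradients flow back through PyTorch
print(loss.item(), weights.grad)
```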
GPUs are the key foothold. Yoo believes that replacing CPUs with GPUs could speed up simulations of quantum systems by an order of magnitude.
On paper, these advances would let a classical computer simulate about 36 qubits. In practice, simulations at that scale demand too much node time to be practical; today, the more realistic gold standard is twenty-odd qubits. Still, that is about 10 more qubits than researchers could simulate just a few years ago.

Can classical hardware keep scaling up? The challenge is enormous. The jump from an NVIDIA DGX with 160GB of GPU memory to one with 320GB of GPU memory buys just one more qubit. And Jinzhao Sun argues that classical simulations that try to model more than 100 qubits are likely to fail.
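As a quick sanity check on that one-qubit claim, using the same 16-bytes-per-amplitude estimate as in the sketch above (back-of-the-envelope figures, not NVIDIA's published sizing):

```python
# Same estimate as above: 2**n amplitudes at 16 bytes each.
# 33 qubits -> ~137 GB, fits in 160 GB of GPU memory.
# 34 qubits -> ~275 GB, needs the 320 GB system.
for n, budget_gb in ((33, 160), (34, 320)):
    need_gb = (2 ** n) * 16 / 1e9
    print(f"{n} qubits need ~{need_gb:.0f} GB of a {budget_gb} GB budget")
```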
Real quantum hardware, at least nominally, has long exceeded those qubit counts. IBM, for example, has steadily grown the number of qubits in its general-purpose quantum processors into the hundreds, with ambitious plans to push into the thousands.
That doesn't mean simulation won't matter in a thousand-qubit future. Classical computers can still simulate pieces of those larger systems, validating the hardware or testing algorithms that may one day run at full scale.
As it turns out, 29 qubits can do a lot.