Why IBM is the industry beacon for quantum computing: IBM unveils a 10,000-word blueprint for its ambitious quantum future

Why is IBM the industry beacon for quantum computing? Read this article and you'll find out!
With the advent of the quantum processing unit (QPU), IBM sees the computing paradigm branching for the first time in history. Extracting the full potential of computing and implementing quantum algorithms with super-polynomial speedups will likely require significant advances in quantum error correction technology. At the same time, it should be possible to achieve computational benefits in the near term by combining multiple QPUs through circuit knitting, by using error suppression and error mitigation to improve the quality of solutions, and by focusing on heuristic versions of quantum algorithms with asymptotic speedups.
To achieve this, the performance of quantum computing hardware needs to improve and software needs to seamlessly integrate quantum and classical processors, which IBM says forms a new architecture - a quantum-centric supercomputer.
01 History, Status and Challenges of Quantum Computing
The history of computing is one of advances driven by the need to perform ever more complex computations. Increasingly advanced semiconductor manufacturing processes have led to faster, more efficient chips, and to specialized accelerators like GPUs, TPUs, and AI processors that allow more efficient computation on larger data sets.
Now, for the first time in history, the field of computing is branching out with the advent of quantum computers. When scaled up, quantum computers promise to enable computations that are difficult to achieve with conventional computers: from modeling quantum mechanical systems to linear algebra, factorization, search, and more.
Unlocking the full potential of quantum processors requires running computations with very large numbers of operations. Because the accuracy of quantum gates is substantially lower than that of classical gates, it is believed that error correction is necessary to carry out long computations with millions or billions of gates; therefore, most quantum computing platforms are designed with the long-term goal of implementing error-corrected quantum circuits.
An arbitrarily long quantum circuit can run reliably once the noise rate drops below a constant threshold associated with the architecture, by redundantly encoding each qubit and repeatedly measuring parity-check operators to detect and correct errors. However, the number of qubits required for error-corrected quantum circuits that solve classically hard problems exceeds the size of currently available systems by several orders of magnitude.
At the same time, as the quality and number of qubits in quantum computers continue to grow, we must be able to increase the computational power of quantum circuits. For example, a quantum processing unit (QPU) with 99.99% two-qubit gate fidelity could run circuits with several thousand gates quite reliably without resorting to error correction, even though such circuits are practically impossible to simulate classically on modern supercomputers. This suggests that computational tasks of commercial or scientific relevance could be done more efficiently, economically and accurately by quantum computing, even without error correction.
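As a rough back-of-the-envelope illustration (ours, not a calculation from the IBM paper), the success probability of a circuit can be approximated as the product of its gate fidelities when errors are assumed independent; the short Python sketch below shows why 99.99% two-qubit gate fidelity puts circuits with a few thousand gates within reach.

# Rough illustration (assumption: errors are independent and uncorrelated):
# estimated circuit success probability ~ product of individual gate fidelities.
gate_fidelity = 0.9999          # 99.99% two-qubit gate fidelity

for num_gates in (100, 1000, 3000, 10000):
    success = gate_fidelity ** num_gates
    print(f"{num_gates:>6} gates -> estimated circuit fidelity ~ {success:.2f}")

# At ~3,000 gates the estimate is still ~0.74, while at ~10,000 gates it drops
# to ~0.37, roughly marking where error mitigation or correction becomes
# unavoidable.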
For this to happen, three central questions need to be answered.
How to extract useful data from the output of noisy quantum circuits in the weak-noise regime.
How to design quantum algorithms based on shallow circuits that can potentially solve classically hard problems.
How to improve the efficiency of quantum error correction schemes and reduce the reliance on error correction.
Problem (1) is addressed by quantum error mitigation and circuit knitting. These techniques extend the size of the quantum circuits that can be reliably executed on a given QPU without resorting to error correction. The IBM team estimated the overhead imposed by state-of-the-art error mitigation methods and discussed recent ideas on how to combine error correction and mitigation. Circuit knitting techniques use structural properties of the simulated system, such as geometric locality, to break large quantum circuits into smaller subcircuits or to combine the results produced by multiple QPUs.
Classical simulation algorithms used in computational physics or chemistry are often heuristic: they work well in practice even though they lack strict performance guarantees. Rigorous quantum algorithms designed to simulate time evolution therefore need lower-cost heuristic versions adapted to near-term QPUs; such algorithms would address problem (2).
To approach problem (3), IBM discusses quantum low-density parity-check (LDPC) codes. These codes can fit more logical qubits into a given number of physical qubits, so that as the size of the quantum circuit grows, only a constant fraction of the physical qubits is used for error correction. These more efficient codes require long-range connections between qubits embedded in a two-dimensional grid, but the benefits are expected to outweigh the cost of those long-range connections.
IBM then discusses quantum-centric supercomputing: a new architecture for implementing error mitigation, circuit knitting, and heuristic quantum algorithms alongside a wide range of classical computations. At the heart of this architecture are classical-quantum integration and modularity. Real-time classical integration is needed to condition quantum circuits on classical computation (dynamic circuits) and eventually to implement error correction, while circuit knitting and advanced compilation are handled at compile time. Modularity is needed to scale and accelerate workflows through parallelization. Finally, IBM discusses the requirements on the quantum stack, defining the layers that integrate classical and quantum computing and thereby the requirements for latency, parallelization and computational instructions.
From this, a cluster-like architecture, the quantum-centric supercomputer, can be defined. It consists of many quantum computation nodes, each comprising classical computers, control electronics and QPUs. A quantum runtime executed on the quantum-centric supercomputer can work with the cloud or other classical computers running in parallel.
Of course, realizing the computing power of these machines will require the collaborative efforts of engineers, physicists, computer scientists, and software developers.
02 Towards practical quantum circuits
Although in principle quantum computers can replicate any computation performed on conventional hardware, the vast majority of everyday tasks are not expected to benefit from the effects of quantum mechanics. However, the use of quantum mechanics to store and process information can lead to significant speed gains for certain applications.
Of particular interest are tasks for which the runtime of a quantum algorithm grows only polynomially with the problem size n, for example as n^2 or n^3, while the runtime of the best known classical algorithm grows faster than any polynomial in n, for example as 2^n or 2^sqrt(n). We define the runtime as the number of elementary gates of the circuit that implements the algorithm on a given problem instance. As the problem size n grows, the more favorable scaling of the quantum runtime rapidly compensates for the relatively high cost and slow speed of quantum gates. From a purely theoretical point of view, these exponential or super-polynomial speedups are fascinating, and they provide a compelling practical reason for advancing quantum technology.
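A toy calculation (ours, with assumed timings rather than IBM's numbers) makes this scaling argument concrete: even if each quantum gate is a million times slower than a classical operation, a quadratic-time quantum algorithm eventually overtakes an exponential-time classical one.

# Toy crossover estimate (illustrative assumptions only):
# classical algorithm:  2**n operations at 1 ns each
# quantum algorithm:    n**2 gates at 1 us each (a million times slower per step)
CLASSICAL_OP = 1e-9   # seconds per classical operation
QUANTUM_GATE = 1e-6   # seconds per quantum gate

for n in range(10, 61, 10):
    t_classical = (2 ** n) * CLASSICAL_OP
    t_quantum = (n ** 2) * QUANTUM_GATE
    winner = "quantum" if t_quantum < t_classical else "classical"
    print(f"n={n:2d}  classical={t_classical:10.3g} s  quantum={t_quantum:10.3g} s  -> {winner}")

# Around n ~ 20 the quantum runtime already wins despite the per-gate speed
# penalty, and the gap widens exponentially from there.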
Examples of known tasks with exponential quantum speedups include simulation of quantum many-body systems, number-theoretic problems such as integer factoring, solving certain types of linear systems, computing Betti numbers for topological data analysis, and computing topological invariants of knots and links.
The simulation of quantum many-body systems has attracted the most attention, both because of its numerous scientific and industrial applications and because it was the original value proposition for quantum computing. The ground-state and thermal-equilibrium properties of many-body systems can often be captured by classical heuristic algorithms such as dynamical mean-field theory (DMFT) or perturbative methods; however, understanding their behavior away from equilibrium under coherent dynamics, or performing high-precision ground-state simulations of strongly interacting electrons (e.g., in quantum chemistry), is a well-known challenge for classical computers.
2.1. Quantum error correction
One reason for the ubiquity of classical computers is their ability to store and process information reliably. Small fluctuations of charge or current in a microchip can be tolerated because the logical 0 and 1 states are represented highly redundantly in the collective state of many electrons. Quantum error correction codes provide a similarly redundant representation of quantum states, protecting them from certain types of errors.
A single logical qubit can be encoded into n physical qubits by specifying a pair of orthogonal n-qubit states |0_L⟩ and |1_L⟩, called logical 0 and logical 1. A single-qubit state α|0⟩ + β|1⟩ is then encoded as the logical state α|0_L⟩ + β|1_L⟩. If no operation affecting fewer than d qubits can distinguish the logical states |0_L⟩ and |1_L⟩ or map one onto the other, the code has distance d. More generally, a code may encode k logical qubits into n physical qubits, and the code distance d quantifies how many physical qubits must be corrupted before the logical (encoded) state is destroyed. A good code therefore has a large distance d and a large encoding rate k/n.
Stabilizer codes are by far the most studied and most promising family of codes. A stabilizer code is defined by a commuting list of multi-qubit Pauli observables, called stabilizers, such that the logical states are +1 eigenvectors of every stabilizer. We can think of stabilizers as quantum analogues of classical parity checks. The purpose of syndrome measurement is to identify the stabilizers whose eigenvalues have been flipped by errors. The eigenvalue of each stabilizer is measured repeatedly and the result, called the error syndrome, is sent to a classical decoding algorithm. Provided the number of faulty qubits and gates is sufficiently small, the error syndrome contains enough information to identify the errors (modulo stabilizers). The decoder can then output the operations that must be applied to restore the original logical state.
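To make the stabilizer picture concrete, here is a minimal classical illustration (ours, using the three-bit repetition code rather than any code discussed by IBM) of how parity checks produce a syndrome and how a decoder maps that syndrome back to a correction.

# Minimal illustration with the classical 3-bit repetition code
# (logical 0 -> 000, logical 1 -> 111). Its two parity checks play the role of
# stabilizers; the pair of check outcomes is the error syndrome.
import random

def syndrome(bits):
    # Two parity checks: bits (0,1) and bits (1,2).
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

DECODER = {          # syndrome -> index of the single bit to flip back
    (0, 0): None,    # no error detected
    (1, 0): 0,
    (1, 1): 1,
    (0, 1): 2,
}

logical = [1, 1, 1]                      # encoded logical "1"
noisy = logical.copy()
noisy[random.randrange(3)] ^= 1          # inject one bit-flip error

s = syndrome(noisy)
flip = DECODER[s]
if flip is not None:
    noisy[flip] ^= 1                     # apply the correction
print("syndrome:", s, "recovered state:", noisy)   # always recovers [1, 1, 1]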
Most codes designed for quantum computing are of the LDPC type, which means that each stabilizer acts on only a small number of quantum bits, with each quantum bit participating in a small number of stabilizers. The main advantage of quantum LDPC codes is that syndrome measurement can be done with a simple constant depth quantum circuit, which ensures that syndrome information is collected frequently enough to cope with the accumulation of errors. In addition, the errors caused by the syndrome measurement circuit itself are very benign, since the circuit can only propagate errors within a "light cone" of constant size.
A code must satisfy several requirements to be usable for quantum computing. First, it must have a sufficiently high error threshold, the maximum level of hardware noise it can tolerate: if the error rate is below the threshold, the lifetime of the logical qubits can be made arbitrarily long by choosing a sufficiently large code distance; otherwise, errors accumulate faster than the code can correct them, and the logical qubits become less reliable than their constituent physical qubits. Second, we need a fast decoding algorithm that performs error correction in real time as the quantum computation proceeds; this can be a challenge because the decoding problem for general stabilizer codes is NP-hard in the worst case. Third, we must be able to perform computations on the logical qubits without compromising the protection the code provides.
So far, 2D surface codes are considered the undisputed leader in terms of error threshold, which is close to 1% for the commonly studied depolarizing noise, but they have two important drawbacks. First, assigning a patch of roughly d × d physical qubits to each logical qubit incurs a significant overhead. Unfortunately, it has been shown that the encoding rate k/n = O(1/d^2) of any two-dimensional stabilizer code vanishes at large code distance. This means that the encoding rate approaches zero as one increases the degree of protection provided by the surface code (quantified by the code distance d): as the size of the quantum circuit grows, the vast majority of physical qubits end up being used for error correction. This is a known fundamental limitation of all quantum codes that can be implemented locally in two-dimensional geometries.
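As a rough sense of scale (our estimate, using the commonly quoted approximation of about 2d^2 physical qubits per surface-code logical qubit, not figures from the IBM paper), the sketch below shows how quickly the overhead grows with the code distance d.

# Rough surface-code overhead estimate (common approximation:
# about 2*d**2 physical qubits per logical qubit, i.e. rate k/n ~ 1/(2*d**2)).
for d in (3, 7, 11, 25):
    physical_per_logical = 2 * d * d
    rate = 1 / physical_per_logical
    print(f"distance d={d:2d}: ~{physical_per_logical:4d} physical qubits "
          f"per logical qubit (rate k/n ~ {rate:.4f})")

# Encoding 1,000 logical qubits at d=25 would already require on the order of
# a million physical qubits, which is the overhead referred to in the text.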
To make error correction more practical and minimize the qubit overhead, codes with large encoding rates k/n are preferable. Quantum LDPC codes, for example, can achieve a constant encoding rate independent of the code size; in fact, the encoding rate can be arbitrarily close to 1. By comparison, two-dimensional surface codes have an asymptotically vanishing encoding rate and a distance of at most √n. Some LDPC codes have a beneficial feature known as single-shot error correction: they provide a highly redundant set of low-weight Pauli observables (called gauge operators) that can be measured to obtain a reliable error syndrome. This reduces the number of syndrome measurement rounds per logical gate from O(d) to O(1), resulting in very fast logical gates. The syndrome measurement circuit for a quantum LDPC code requires the qubit connectivity dictated by its stabilizer structure, i.e., it must be possible to couple qubits participating in the same stabilizer. Known examples of LDPC codes with single-shot error correction require 3D or 4D geometries.
A second drawback of surface codes is the difficulty of implementing a computationally universal set of logical gates. Surface codes and their variants, such as color codes or folded surface codes, can implement logical Clifford gates such as CNOT, Hadamard H, and the phase gate S with low overhead. These gates can be realized by changing the set of stabilizers measured at each time step, a technique known as code deformation. However, Clifford gates are not computationally universal on their own. A common strategy to achieve universality is to prepare a special logical auxiliary state, the magic state, which is equivalent (up to Clifford operations) to a single-qubit non-Clifford gate such as the T gate. The Clifford+T gate set is universal and has a rich algebraic structure that allows quantum algorithms to be compiled precisely and close to optimality.
Unfortunately, the overhead of distilling high-fidelity magic states is surprisingly large. Several strategies have recently been proposed to reduce this overhead, including high-yield magic state distillation protocols, better ways of preparing the "raw" noisy magic states so that fewer distillation rounds are required, and better surface-code implementations of the distillation circuits. It remains to be seen how far these methods can reduce the cost of magic state distillation.
2.2. Error mitigation
Although error correction is crucial for implementing large-scale quantum algorithms with powerful computational capabilities, it may be overkill for small- to medium-scale computations; a limited form of error reduction for shallow quantum circuits can be achieved by combining the experimental results of multiple noisy circuits so as to remove the noise contribution from the quantity of interest. These approaches, collectively known as error mitigation, are well suited to today's QPUs because they introduce little overhead in the number of qubits and only a small overhead in additional gates.
The cost of error mitigation, however, is an increase in the number of circuits (experiments) that need to be run. In general this overhead is exponential, but with improvements in hardware and control methods the base of the exponent can be brought close to 1, and the experiments can be run in parallel. In addition, known error mitigation methods apply only to a limited class of quantum algorithms that use the output state of a quantum circuit to estimate the expected value of an observable.
Probabilistic error cancellation (PEC) aims to approximate an ideal quantum circuit by a weighted sum of noisy circuits that can be implemented on a given quantum computer. The weights assigned to each noisy circuit can be computed analytically if the noise in the system is well characterized, or learned from a training set of circuits that can be simulated classically.
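A minimal numerical sketch of the PEC idea (ours, for a single qubit with depolarizing noise, not IBM's implementation): the ideal operation is written as a quasi-probability combination of implementable noisy operations, and sampling that combination with signs gives an unbiased estimator whose sampling overhead is set by the one-norm gamma of the quasi-probabilities.

# Minimal PEC sketch for one qubit (illustrative only).
# Noise model: a depolarizing channel that shrinks Pauli expectation values by
# a factor lam = 1 - 4*p/3. The ideal (noiseless) gate is written as a
# quasi-probability mix of "noisy gate followed by a Pauli correction".
import random

p = 0.01                                 # depolarizing error probability
lam = 1 - 4 * p / 3                      # shrinking factor of the noisy gate

q_id = (3 + lam) / (4 * lam)             # weight of the identity correction
q_pauli = (lam - 1) / (4 * lam)          # (negative) weight of X, Y, Z corrections
gamma = abs(q_id) + 3 * abs(q_pauli)     # sampling overhead (one-norm)

quasi = [("I", q_id), ("X", q_pauli), ("Y", q_pauli), ("Z", q_pauli)]
probs = [abs(q) / gamma for _, q in quasi]

def run_circuit(correction):
    """One shot of <Z> on |0> through the noisy gate, then a Pauli correction."""
    z_exp = lam if correction in ("I", "Z") else -lam   # X or Y flips <Z>
    return 1 if random.random() < (1 + z_exp) / 2 else -1

shots, total = 200_000, 0.0
for _ in range(shots):
    label, q = random.choices(quasi, weights=probs, k=1)[0]
    sign = 1.0 if q >= 0 else -1.0
    total += sign * gamma * run_circuit(label)

print(f"noisy <Z> ~ {lam:.4f}, PEC estimate ~ {total / shots:.4f} (ideal = 1)")
print(f"sampling overhead gamma = {gamma:.4f} per noisy gate")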
A discussion of existing error mitigation proposals can be found in reference [1]. IBM expects the adoption of PEC to grow thanks to recent theoretical and experimental advances in quantum noise metrology; error mitigation will continue to matter even when error-corrected QPUs with 100 or more logical qubits become available.
2.3. Circuit knitting
We can extend the reach of near-term hardware and compensate for its other limitations, such as a small number of qubits or limited qubit connectivity, by using circuit knitting techniques.
Circuit knitting refers to the process of simulating small quantum circuits on a quantum computer and stitching their results into an estimate for a larger quantum circuit. As with error mitigation, known circuit knitting techniques apply to a limited class of quantum algorithms whose goal is to estimate the expected value of an observable.
The best-known example is circuit cutting. In this approach, a large quantum circuit is approximated by a weighted sum of circuits composed of smaller, disjoint subcircuits. Each subcircuit can be executed separately on a small QPU. The overhead introduced by this approach (measured by the number of circuit repetitions) grows exponentially with the number of two-qubit gates or qubits that must be cut to achieve the desired partitioning. Remarkably, it has recently been shown that the number of required cuts can be greatly reduced by running the subcircuits in parallel on non-interacting QPUs that can only exchange classical data. This approach requires hardware capable of implementing dynamic circuits, in which the control electronics are extended across otherwise independent QPUs.
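The following NumPy sketch (ours, a single-qubit toy case rather than IBM's circuit-knitting tooling) shows the reconstruction step of a wire cut: the circuit H followed by H is cut between the two gates, each fragment is evaluated separately, and the results are recombined through the identity rho = (1/2) * sum over O in {I, X, Y, Z} of Tr(O rho) O.

# Minimal wire-cut illustration with NumPy (single-qubit toy example).
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)

# Uncut reference: <Z> after applying H, H to |0> (H*H = identity, so <Z> = 1).
psi = H @ H @ np.array([1, 0], dtype=complex)
exact = np.real(psi.conj() @ Z @ psi)

# Fragment 1: the state produced by the first half of the circuit.
rho1 = H @ np.array([[1, 0], [0, 0]], dtype=complex) @ H.conj().T

# Fragment 2: for each Pauli O, propagate it through the second half and
# measure Z. (In a real experiment O is resolved into eigenstate preparations;
# the linear-algebra shortcut is kept here for brevity.)
reconstructed = 0.0
for O in (I, X, Y, Z):
    coeff = np.real(np.trace(O @ rho1))                 # measured on fragment 1
    frag2 = np.real(np.trace(Z @ H @ O @ H.conj().T))   # evaluated on fragment 2
    reconstructed += 0.5 * coeff * frag2

print(f"uncut <Z> = {exact:.3f}, reconstructed from fragments = {reconstructed:.3f}")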
A second example is entanglement forging, where an entangled variational state is decomposed into a weighted sum of product states, or where the entanglement between a pair of qubit registers is converted into time-like correlations within a single register. The overhead of this approach typically grows exponentially with the amount of entanglement across the chosen partition of the system.
A third example, closely related to circuit knitting, uses embedding approaches to decompose the simulation of a large quantum many-body system into smaller subsystems that can be simulated separately on a QPU. The interactions between subsystems are captured by introducing an effective "bath", which can be either a classical environment or another small quantum system. The decomposition of the original system and the manipulation of the bath parameters are performed on a classical computer that exchanges classical data with the QPU. Well-known examples of quantum embedding methods built on classical counterparts are dynamical mean-field theory, density matrix embedding, and density functional embedding.
2.4. Heuristic quantum algorithms
Heuristic quantum algorithms can be used in the near future to tackle classical optimization, machine learning, and quantum simulation problems. These algorithms fall into two categories: quantum kernel methods and variational quantum algorithms (VQA). Quantum kernel methods have been shown to offer provable speedups for certain classes of data with group structure. For VQA, the basic proposal is very simple: an experimentally controlled trial state is used as a variational wavefunction to minimize the expected energy of a given quantum Hamiltonian, or a classical cost function encoding the problem of interest. The trial state is usually the output of a shallow quantum circuit whose individual gate rotation angles serve as variational parameters; these parameters are tuned by a classical feedback loop to optimize the chosen cost function.
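A toy variational loop (ours, with the cost function evaluated analytically instead of on a QPU) shows the structure of a VQA: a parametrized ansatz, a cost function, and a classical optimizer closing the feedback loop.

# Toy VQA loop: a single-qubit ansatz Ry(theta)|0> is tuned by a classical
# optimizer to minimize the energy <Z>, whose exact value is cos(theta).
# In a real VQA the energy would be estimated from QPU measurements.
import numpy as np

def energy(theta):
    # <Z> for the state Ry(theta)|0>; on hardware this would come from sampling.
    return np.cos(theta)

def gradient(theta):
    # Parameter-shift rule: dE/dtheta = [E(theta + pi/2) - E(theta - pi/2)] / 2
    return 0.5 * (energy(theta + np.pi / 2) - energy(theta - np.pi / 2))

theta, lr = 0.3, 0.4                     # initial parameter and learning rate
for step in range(60):                   # classical feedback loop
    theta -= lr * gradient(theta)

print(f"optimized theta = {theta:.3f} (pi = {np.pi:.3f}), energy = {energy(theta):.4f}")
# Converges toward theta ~ pi, i.e. the ground state |1> with <Z> = -1.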
Currently, there is no mathematical proof that VQA can outperform classical algorithms on any task. In fact, it is well known that a VQA based on a sufficiently shallow (constant-depth) variational circuit with two- or three-dimensional qubit connectivity can be efficiently simulated on a classical computer, which precludes a quantum advantage. Meanwhile, the performance of VQA based on deep variational circuits is severely degraded by noise. However, as QPU error rates decrease, scientists should be able to operate VQA in an intermediate regime where the quantum circuits are already hard to simulate classically but the effects of noise can still be mitigated.
2.5. Summary
In conclusion, the best opportunity to gain a quantum advantage is to focus on problems where an exponential (super-polynomial) quantum speedup can be achieved. Although quantum algorithms that provably achieve such speedups are out of reach for near-term hardware, their very existence is compelling evidence that quantum mechanical effects (such as interference or entanglement) are beneficial for solving the chosen problem.
Second, the only known way to implement large-scale quantum algorithms is to rely on quantum error-correcting codes. Existing techniques based on surface codes are not satisfactory because of their low encoding rate and the high cost of logical non-Clifford gates. Addressing these drawbacks may require advances in quantum coding theory, such as high-threshold fault-tolerant protocols based on quantum LDPC codes, and improved qubit connectivity beyond the two-dimensional lattice of the QPU. Complementing error correction with cheaper alternatives, such as error mitigation and circuit knitting, may provide a more scalable route to running large, high-fidelity quantum circuits.
Third, near-term quantum advantages should be possible by exploring lower-cost, possibly heuristic, versions of the algorithm. These heuristic quantum algorithms lack rigorous performance guarantees, but they may be able to demonstrate the quality of a solution after the fact and provide a way to solve problems that cannot be simulated classically.
IBM says it believes these general guidelines determine the future of quantum computing theory and will lead us to important demonstrations of its benefits for solving scientifically important problems in the coming years.
03 The road to large quantum systems
The perspective above leads to the challenge of quantum hardware. IBM believes that a hybrid approach using error mitigation, circuit knitting and heuristics will deliver near-term advantages. On a longer time frame, systems with partial error correction will be key to running more advanced applications, and further down the road, fault-tolerant systems running on LDPC codes with non-local checks, which have not yet been fully developed, will be key. The first step on all of these paths is the same: we need hardware with more qubits capable of high-fidelity operations, and we need tightly coupled fast classical computation to handle the high rate of circuit executions required by error mitigation and circuit knitting, and later the classical overhead of decoding for error correction.
This motivates identifying a hardware path that starts with small heuristic quantum circuits and progresses all the way to an error-corrected computer.
3.1. Cycles of learning
The first step on this path is to build systems capable of demonstrating near-term advantages in error mitigation and limited forms of error correction.
Just a few years ago, the scale of QPUs was limited by the cost and availability of control electronics, by I/O space, by the quality of control software, and by a problem known as "breaking the plane" [2]: routing microwave control and readout lines to qubits at the center of a dense array. Today, solutions to these scaling barriers have been demonstrated, enabling qubit counts above 100, beyond the threshold at which quantum systems become difficult to simulate classically and demonstrations of quantum advantage become possible.
The next important milestones are (1) increasing the fidelity of QPUs to enable exploration of near-term quantum advantage with circuits using limited error correction, and (2) increasing qubit connectivity beyond two dimensions, either through gates with sparse connectivity and non-trivial topologies or by adding quantum signal layers in three-dimensional integration, to enable the long-term exploration of efficient non-2D LDPC error correction codes. All of these developments are needed for the long-term vision, but they can be pursued simultaneously.
The work of improving the quality of quantum systems by improving gate fidelities involves many cycles of learning: experimenting with coupling schemes, process changes, and innovations to control coupling and crosstalk. Scaling this work to large QPUs capable of demonstrating quantum advantage, and eventually to the extreme system scales anticipated further out, requires combining different technologies with enough reliability that scale becomes limited by cost and demand rather than by technical capability. This raises the reliability, predictability, and manufacturability challenges of QPUs while improved technologies continue to be folded into these complex systems; at the same time, the longer development, manufacturing, and testing times of large systems create a lag in the innovation cycle that must be overcome.
Manufacturing cycle time grows with the complexity of the QPU. Many simple QPUs require only single-layer lithography and can easily be fabricated in a day or two. Even IBM's original 5- and 16-qubit QPUs for its external cloud quantum systems involved only two lithography steps and took about a week to fabricate. In contrast, more advanced packaging solutions, such as those at MIT Lincoln Laboratory or IBM's newer "Eagle" QPU, involve dozens of lithography steps and slow process steps that take months to complete using unique tools in a research facility. The increased cycle time makes it harder to achieve the required fidelity and coherence times, and to debug fabrication and assembly well enough to obtain reliable QPU yields.

An example of a signal-delivery scheme that breaks the plane and is compatible with integrating hundreds of qubits, composed of technologies adapted from conventional CMOS processing.
Reliability is not a new issue in semiconductor manufacturing. Among the component challenges unique to building scaled quantum machines, the conventional semiconductor technologies integrated on chip are generally the most intensively studied. Incorporating them into superconducting technology is less a matter of inventing new methods than of ensuring that the related processes are compatible with one another. The rapid growth in the volumes we expect to need, however, is a major challenge.
Many failure modes in superconducting quantum systems are undetectable until the QPU is cooled to its operating temperature (below 100 mK). This is a serious bottleneck that makes in-line testing (testing device subassemblies for key metrics before the QPU build is complete) and process feed-forward (modifying later process steps to correct small deviations from earlier ones and stabilize overall device performance) difficult or impossible. There are exceptions where simple room-temperature measurements can be tied to final QPU performance: for example, resistance measurements of a Josephson junction accurately predict its critical current, and thus the frequency of the qubit fabricated with it, a key parameter for fixed-frequency systems.
We can use these statistical correlations to make rapid progress in certain parts of the process or in post-process tuning.
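As an illustration of that kind of correlation (a generic textbook estimate with assumed numbers, not IBM's calibration data), the sketch below converts a room-temperature junction resistance into a predicted transmon frequency via the Ambegaokar-Baratoff relation and the approximate transmon formula f01 ≈ sqrt(8*EJ*EC)/h - EC/h.

# Illustrative only: predict a fixed-frequency transmon's f01 from the
# room-temperature resistance of its Al/AlOx/Al Josephson junction.
import math

e    = 1.602176634e-19      # elementary charge [C]
h    = 6.62607015e-34       # Planck constant [J*s]
hbar = h / (2 * math.pi)
DELTA_AL = 180e-6 * e       # superconducting gap of aluminium [J] (assumed)
EC_HZ    = 0.30e9           # charging energy EC/h [Hz] (assumed design value)

def qubit_frequency(junction_resistance_ohm):
    # Ambegaokar-Baratoff: Ic * Rn = pi * Delta / (2e) at low temperature.
    ic = math.pi * DELTA_AL / (2 * e * junction_resistance_ohm)
    ej_hz = hbar * ic / (2 * e) / h                      # EJ/h in Hz
    return math.sqrt(8 * ej_hz * EC_HZ) - EC_HZ          # transmon f01 [Hz]

for r in (6e3, 8e3, 10e3):
    print(f"Rn = {r / 1e3:.0f} kOhm -> predicted f01 ~ {qubit_frequency(r) / 1e9:.2f} GHz")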
If such correlations are not available, we can use a simplified test vehicle; for example, when trying to improve qubit coherence, we can use a simplified device that gives good statistics and fast turnaround, rather than an entire complex signal-delivery stack. Even so, identifying the specific steps that lead to a coherence improvement is not easy. In materials processing it is rarely possible to change just one parameter: changing the metal in a qubit may also change the etch parameters, the chemicals compatible with that metal in subsequent processing, or even the allowable temperature range. Once an improved process has been found, it is difficult to determine exactly which steps were critical and which were merely incidental.
When conducting materials research, we must collect a large amount of statistical data to make the results meaningful and to provide sufficient certainty. We should carefully document any relevant process splits, and we should publish material process changes that lead to neutral or even negative results, not just publish highly successful work.
Similar difficulties arise in device research that is not materials-based. Some gates work well between isolated pairs of qubits but exhibit strong unwanted coupling, making them unsuitable for larger QPUs or compromising single-qubit performance. Three- and four-qubit experiments are no longer challenging from a technical or budgetary point of view; to be relevant to larger QPUs, research needs to move beyond two-qubit demonstrations, especially experiments on single pairs of qubits, where many critical flaws can be masked by luck.
The mix of long-cycle complex devices and short-cycle test vehicles for sub-process development and qubit manipulation is key to continuing to improve QPU quality, and it offers a recipe for sustained R&D contributions as the largest QPUs begin to exceed the capabilities of smaller groups and labs. Nonetheless, long cycle times still need to come down. Some of this will happen naturally: first-of-a-kind processes and QPUs often take longer because they tend to include extra front-end steps, inspections, and in-line tests that later prove unnecessary, even though general best practice suggests them. And while counterintuitive from a cost perspective, building the "same" QPU repeatedly to shake out manufacturing problems and speed up the innovation cycle may be a winning strategy for the largest QPUs with the most complex manufacturing processes.
3.2. Supporting Hardware
Scaling to larger systems also requires extending the classical control hardware and the input/output (I/O) chain into and out of the cryostat. This I/O chain, while still requiring significant customization for the exact QPU being controlled, consists of a large number of fairly conventional components, for example isolators, amplifiers, and scaled signal-delivery systems, along with more exotic alternatives such as non-ferrite isolators and quantum-limited amplifiers that may improve performance, cost, or size. These components have tremendous potential for sharing across groups pursuing quantum computing and, in some cases, can be purchased commercially.
However, assembling these systems at the scale currently required requires high-capacity cryogenic test capabilities that do not currently exist in the quantum ecosystem, creating a short-term need for vertically integrated quantum system fabrication. The challenge here is to establish a supplier and test ecosystem capable of scaled, low-cost production: a challenge made very difficult by the somewhat speculative nature of the demand.
There are also components of which each system needs only one; for example, each quantum computer we deploy requires only one dilution refrigerator, or in many cases just part of one. Manufacturers of dilution refrigerators effectively act as system integrators for cryocoolers, wiring solutions, pumping systems, and even some auxiliary electronics. If we can standardize these interfaces while keeping the flexibility needed to change quickly as systems scale, it would become easy, for example, to move to a more scalable 4 K cooling technology without redesigning the entire cryogenic infrastructure.
Currently, each group building a large QPU has its own custom control hardware. Given the completely different control models and requirements, it is unlikely that the analog front ends of these systems can be shared. However, all types of quantum computers, not just solid-state ones, require low-cost and low-power sequencing logic (branching, local and non-local conditions, loops). As we scale to thousands of qubits or more, this logic may need to be built into a custom processor: an application-specific integrated circuit, or ASIC. On top of that, the software that translates quantum circuits into the low-level representation used by the control hardware is becoming ever more complex and expensive to produce.
Reducing these costs favors the creation of a generic control platform with a custom analog front end, and open specifications like OpenQASM 3 [3] are already paving the way for this transformation.
3.3. Classical parallelization of quantum processors
To reach near-term quantum advantage, techniques such as circuit knitting and error mitigation need to be exploited to effectively extend the capabilities of QPUs: simulating more qubits, or deeper circuits, at the price of additional circuit executions. These workloads can be embarrassingly parallel, with individual circuits executing completely independently on multiple QPUs, or they can benefit from classical communication between circuits that span multiple QPUs. Control hardware able to run multiple QPUs as if they were a single QPU with shared classical logic, or to split a single QPU into multiple virtual QPUs, enables classical parallelization of quantum workloads and is an important near-term technique for pushing this advantage to its limit.
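A schematic sketch (ours; run_on_qpu is a hypothetical stand-in for a real backend call) of the embarrassingly parallel case: independent circuit variants are farmed out to several (virtual) QPUs and their results are combined classically.

# Embarrassingly parallel execution of independent circuits across QPUs.
from concurrent.futures import ThreadPoolExecutor
import random

def run_on_qpu(qpu_id, circuit_id):
    # Placeholder for submitting one circuit to one QPU and returning <O>.
    return random.gauss(0.5, 0.05)

NUM_QPUS = 4
circuits = list(range(32))               # e.g. error-mitigation variants

with ThreadPoolExecutor(max_workers=NUM_QPUS) as pool:
    futures = [pool.submit(run_on_qpu, i % NUM_QPUS, c) for i, c in enumerate(circuits)]
    results = [f.result() for f in futures]

print(f"combined estimate = {sum(results) / len(results):.3f}")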
In the long term, these technologies will play a key enabling role when we start to build quantum systems that span multiple chips and multiple cryogenic boxes, i.e., modular quantum systems.
3.4. Modularity
The introduction of modular quantum systems will be key in taking us from near-term quantum advantage to long-term error-corrected quantum systems.
These systems have repeating cells that can be replaced if defective, and quantum links between chips to entangle cells or perform remote gates. This approach simplifies the design and testing of QPUs and allows us to scale quantum systems at will.
In the short term, with limited or no error correction, the unit cells will require high-bandwidth, high-density links to connect them: there is not enough time for complex protocols such as entanglement distribution. The simplest proposal is to extend the quantum bus across chips, allowing the same gates to be used between distant chips as within a single processor. This "dense modularity" effectively extends the size of the chip. It requires connecting adjacent chips with ultra-low-loss, low-crosstalk links that are short enough to remain effectively single-mode: the chip-to-chip spacing must be comparable to the spacing between qubits on a single chip. Some techniques from classical computing hardware may be applicable to this problem, but other alternatives are needed to make individual cells easy to replace.

In addition to the classical parallelization of QPUs shown in (a), long-range quantum connections come at a significant cost in gate speed and fidelity. As shown in (b)-(e), a large high-fidelity quantum system may involve three levels of modularity: very short-range modularity m, which allows a QPU to be decomposed into multiple chips with minimal cost in gate speed and fidelity; longer-range connections l, used within a single cryogenic environment to bypass I/O bottlenecks and allow non-trivial topologies and routing; and very long-range optical "quantum network" links t, which allow nearby QPUs to work together as a single quantum computation node (QCN). On-chip non-local couplers c, shown in (b), are also needed for exploring LDPC codes. In this diagram, pink lines represent quantum communication and purple lines represent classical communication.
The high density of qubits in such a "dense module" creates a space bottleneck for classical I/O and cooling. Recent proposals to ease this problem include developing high-density connectors and cables, moving classical signals on and off chip, and adding multiplexed control in the time and frequency domains. A longer-term approach is to improve qubit connectivity by using gates performed over long conventional cables, referred to as l modularity.
In addition to freeing us from control and cooling bottlenecks, these long-range couplers enable non-two-dimensional topologies, not only reducing the average distance between qubits but also opening the door to more efficient non-2D LDPC error correction codes. Developing these long-range couplers therefore not only lets us extend near-term systems, it also begins to form the basis for building quantum systems out of multiple QPUs.
Once the techniques for dense modularity and long-range couplers are developed and optimized, they can eventually be ported back onto the qubit chips themselves for non-local, non-2D connectivity. Non-local couplers on these chips will ultimately allow the implementation of high-rate LDPC codes, completing the long-term vision.
Finally, connecting multiple quantum computers on demand will allow larger systems to be created as needed. In this "quantum network" approach, signals are envisioned as leaving the dilution refrigerator, enabled by long-term technological advances in microwave-to-optical transduction that allow photonic t-links between different refrigerators.

Types of modularity in long-term scalable quantum systems
A practical quantum computer will likely employ all five types of modularity described above: classical parallelization, dense chip-to-chip extension of a two-dimensional lattice (m), sparse connections with non-trivial topologies within a dilution refrigerator (l), non-local on-chip couplers for error correction (c), and long-range refrigerator-to-refrigerator quantum networking (t). The optimal feature size for each level of modularity remains an open question. Individual chip-to-chip modules will still be made as large as practical, to maximize fidelity and connection bandwidth. How to compile and run computations on such a system with multiple layers of connectivity is still a research and development problem.
Modularity needs to occur not only at the scale of the QPU, but at all levels of the system. A modular, tiered control system allows subsystems to be easily tested, replaced, and assembled. It is much easier to build a test infrastructure for a large number of small modules each year than for a single monolith that must work reliably every time. The same is true for refrigeration systems, with the added point that transporting and deploying a single enormous refrigeration system is impractical. A large share of today's failure points lie in I/O and signal delivery, so modular solutions with replaceable subassemblies are essential; the challenge is to move the replaceable unit from a single cable to a larger assembly, such as a flexible ribbon cable or other cable assembly.
While the jury is still out on module size and other hardware details, it is certain that the utility of any quantum computer is determined by its ability to solve useful problems with quantum advantage; ultimately, the capabilities provided by the hardware are realized through software, which must be able to program the machine in a flexible, simple, and intuitive way.
04 The Future: Quantum-Centric Supercomputers - The Quantum Stack
For quantum computing to succeed in changing what computing means, we need to change the architecture of computing. Quantum computing will not replace classical computing; it will become an important part of it. IBM believes the future of computing is the quantum-centric supercomputer, in which QPUs, CPUs, and GPUs all work together to accelerate computation. In integrating classical and quantum computing, it is important to determine (1) latency, (2) parallelism (both quantum and classical), and (3) which instructions should run on quantum versus classical processors. These choices determine the different levels of classical-quantum integration.
Before describing the stack, we need to redefine the quantum circuit: a quantum circuit is a computational routine consisting of coherent quantum operations on quantum data (e.g., qubits) together with concurrent (real-time) classical computation. It is an ordered sequence of quantum gates, measurements and resets, which may be conditioned on data from real-time classical calculations. It can be represented at different levels of detail, from abstract unitary operations down to the exact timing and scheduling of physical operations.
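A minimal dynamic-circuit sketch (assuming a recent version of Qiskit; illustrative rather than IBM's own example) shows what such real-time classical conditioning looks like in code: a mid-circuit measurement feeds a classical condition that decides whether a later gate is applied.

# Dynamic circuit: a gate conditioned on a mid-circuit measurement result.
from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister

qr = QuantumRegister(2)
cr = ClassicalRegister(1)
qc = QuantumCircuit(qr, cr)

qc.h(qr[0])
qc.measure(qr[0], cr[0])          # mid-circuit measurement
with qc.if_test((cr[0], 1)):      # real-time classical condition
    qc.x(qr[1])                   # applied only if the measurement returned 1
qc.measure_all()

print(qc)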

Circuits can be represented at different levels. Unit blocks represent circuits from a library. These can be decomposed into parametrized circuits over a generic gate set. Parametrized physical circuits use hardware-supported physical gates, while predefined circuits specify timing, calibration, and pulse shapes.
This definition is sufficient to capture the circuit model, measurement-based models and adiabatic models of computation, as well as special routines such as teleportation. In addition, circuits can be represented at different levels: unit blocks (blocks that represent library circuits such as quantum phase estimation or classical functions), standard decompositions (reduction to a generic gate set, or expression of classical functions as reversible gates), parametrized physical circuits (using hardware-supported physical gates, possibly including ancilla qubits not present in the abstract circuit, with parameters that can easily be updated in real time), and predefined circuits (with complete timing, calibrated gates, or explicitly specified pulse shapes).

The quantum software stack consists of four layers, each targeted at performing a different level of work most efficiently. The bottom layer focuses on the execution of quantum circuits. Above it, the quantum runtime integrates classical and quantum computation, executes primitive programs, and implements error mitigation or correction. The next layer up (quantum serverless) provides a seamless programming environment, offering integrated classical and quantum computing through the cloud without burdening developers with infrastructure management. Finally, the top layer allows users to define workflows and develop software applications.
With this extended definition of quantum circuits, we can define a software stack. The diagram above shows a high-level view of the stack with four layers: dynamic circuits, the quantum runtime, quantum serverless, and software applications. At the bottom level, the software focuses on executing circuits; here a circuit is represented by controller binaries, which depend heavily on the superconducting qubit hardware, the conditional operations and logic it supports, and the control electronics used. The control hardware must be able to move data between components with low latency while maintaining tight synchronization. For superconducting qubits, real-time classical communication requires latencies of around 100 nanoseconds; to achieve this, the controller must sit very close to the QPU.
Today, controllers are built with FPGAs to provide the required flexibility, but as we move to larger numbers of qubits and more advanced conditional logic, we will need ASICs or even cryogenic CMOS. The next layer is the quantum runtime layer. This is the core quantum computing layer: in its most general form, a quantum computer runs quantum circuits and produces non-classical probability distributions at its output.
Most workloads therefore amount to sampling from that distribution or estimating its properties, so the quantum runtime needs to include at least two primitive programs: a sampler and an estimator. The sampler collects samples from the circuit output to reconstruct a quasi-probability distribution; the estimator lets the user efficiently compute expectation values of observables.
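A toy version of the two primitives (ours, a pure-NumPy schematic rather than the Qiskit Runtime API): both take a circuit, reduced here to its output statevector, and return either a sampled output distribution or the expectation value of an observable.

# Toy sampler and estimator primitives over a statevector.
import numpy as np

def sampler(statevector, shots=4000, seed=7):
    rng = np.random.default_rng(seed)
    probs = np.abs(statevector) ** 2
    n_qubits = int(np.log2(len(statevector)))
    counts = rng.multinomial(shots, probs)
    return {format(i, f"0{n_qubits}b"): c / shots for i, c in enumerate(counts) if c}

def estimator(statevector, observable):
    return float(np.real(statevector.conj() @ observable @ statevector))

# Example: the 2-qubit Bell state (|00> + |11>)/sqrt(2) and the observable Z(x)Z.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
Z = np.diag([1.0, -1.0])
ZZ = np.kron(Z, Z)

print("sampled distribution:", sampler(bell))
print("<ZZ> =", estimator(bell, ZZ))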
The circuit sent to the runtime will be a parametrized physical circuit. The software performs run-time compilation, processes the results, and returns corrected results. Run-time compilation updates the parameters, adds error suppression techniques such as dynamical decoupling, handles execution-time scheduling and gate/operation parallelization, and generates the control program code. The runtime also post-processes results with error mitigation techniques and, in the future, will perform error correction. Circuit execution times may be as short as 100 microseconds (or even 1 microsecond with error correction), which cannot be achieved over the cloud; the runtime will therefore need to be installed as part of the quantum computer itself.

Example of a quantum serverless architecture integrating quantum and classical computing. The quantum runtime is illustrated by estimator primitives, and cloud computing by general-purpose classical computing. Specialized classical computing, such as high-performance computing (HPC) or graphics processing units (GPUs), can be integrated into the serverless architecture. In circuit cutting, a specialized classical computer partitions a larger circuit into many smaller circuits; for each smaller circuit an estimator primitive (E1, ..., EN) is executed and, if needed, a classical routine can tune future circuits based on the results of previous estimators, repeating the process as required. In entanglement forging, a 2N-qubit wavefunction is decomposed into a larger number of N-qubit circuits; the forging step may need to be offloaded to a dedicated classical processor. For each N-qubit circuit an estimator EN is executed, and the results are combined to obtain the global result; this process can be repeated when used within a variational algorithm. Quantum embedding separates the parts of a problem that can be simulated classically from the most computationally expensive parts that require quantum computation; a dedicated classical computer can be used to reconcile the partial results. The quantum simulation uses an estimator EN running on a QPU, which can be combined with classical computations running on a general-purpose classical processor. Overall, this set of tools allows larger systems to be simulated with higher accuracy.
At the third level, the software combines advanced classical computing with quantum computing. As described earlier, introducing classical computation enables ideas such as circuit knitting. Here we need to be able to call quantum primitive programs as well as perform classical computations such as circuit partitioning; we call this a workflow (an example workflow for circuit knitting is shown above). We call quantum serverless the software architecture and tools that support this approach and allow developers to focus only on their code rather than on the classical infrastructure. In addition to circuit knitting, this layer will support advanced circuit compilation, which may include synthesis, layout and routing, and optimization: all parts of the circuit simplification that should happen before a circuit is sent for execution.
Finally, at the highest level of abstraction, the platform must allow users to develop software applications in an intuitive way. These applications may require access to data and resources beyond what the quantum computation itself needs, and must provide the user with a solution to a more general problem.
Each layer of the software stack just described brings a different set of classical computing requirements to quantum computing and addresses a different kind of developer. Quantum computing needs to enable at least three types of developer: kernel, algorithm, and model developers. Each creates software, tools, and libraries that support their layer, broadening the reach of quantum computing.
The kernel developer focuses on enabling quantum circuits to run at high quality and speed on quantum hardware. This includes integrating error suppression, error mitigation, and eventually error correction into the runtime environment and returning simplified application programming interfaces (APIs) to the next layer.
Algorithm developers combine the quantum runtime with classical computing to implement circuit knitting and to build heuristic quantum algorithms and circuit libraries; the goal is to achieve quantum advantage. Finally, as examples of quantum advantage are demonstrated, model developers will be able to build software applications that find useful solutions to complex problems in their own domains, enabling companies to derive value from quantum computing. The figure below summarizes the types of developers involved at each layer of the software stack, as well as the time scales involved, which depend on the type of work performed and on how close to the hardware each developer works.

The time scales and resources involved in quantum computing depend on the needs of different types of developers and the level of abstraction at which they work. Quantum researchers and kernel developers work closest to the hardware, while model developers work with the highest-level software concepts.
Putting all of this together and extending it into what we call a quantum-centric supercomputer, we do not see quantum and classical computing fused into a single monolithic architecture. Instead, as the figure below illustrates, the integrated architecture is a cluster of quantum computation nodes coupled to classical computing. The darker the color, the closer the classical and quantum nodes must be located to reduce latency.
A clustered architecture model that integrates classical processors and QPUs to address latency, parallelization, and instruction allocation between classical and quantum processors. The darker the color, the lower the required latency.
A threaded runtime allows the primitives to execute across multiple controllers. Real-time classical communication between controllers can be used to implement functions such as circuit cutting. The figure also shows how a future QPU with quantum parallelization (l and t couplers) could be controlled by a single controller. IBM envisions workloads that require near-time classical communication (i.e., computations based on circuit results that must complete within roughly 100 microseconds) or shared state between primitives, implemented through data structures. Finally, an orchestration layer will be responsible for workflows, serverless execution, nested programs (libraries of combined classical and quantum programs), circuit knitting toolboxes, and circuit compilation.
05 Conclusion: short-, medium- and long-term vision of quantum computing
IBM has described quantum advantages that can be realized in the next few years on a number of scientifically relevant problems, a milestone that will be reached by:
(1) Focusing on problems that admit super-polynomial quantum speedups and advancing the theoretical design of algorithms, possibly heuristic ones based on intermediate-depth circuits, that can outperform state-of-the-art classical methods.
(2) Using a set of error mitigation techniques and hardware-aware software improvements to maximize the quality of hardware results and extract useful data from the outputs of noisy quantum circuits.
(3) Improving the hardware to increase the fidelity of the QPU to 99.99% or higher.
(4) Designing modular architectures that allow circuits to be executed in parallel (with classical communication). Error mitigation techniques with mathematical performance guarantees, such as PEC, despite their exponential classical processing cost, provide a means to quantify the expected running time and the processor quality required for quantum advantage.
These are the near-term prospects for quantum computing.
Advances in the quality and speed of quantum systems will reduce the exponential classical processing cost of error mitigation schemes, and the combination of error mitigation and error correction will drive a gradual transition to fault tolerance. Classical and quantum computing will be tightly integrated, orchestrated, and managed through serverless environments that allow developers to focus only on their code rather than the infrastructure.
This is the medium-term future of quantum computing.
Finally, we have seen that running large-scale quantum algorithms with polynomial runtimes for fully practical applications requires quantum error correction, and that error correction methods like the surface code do not meet the long-term need because of their inefficient implementation of non-Clifford gates and their low encoding rates. The way forward lies in developing more efficient LDPC codes with high error thresholds, together with the modular hardware with non-2D topologies needed to investigate these codes.
This more efficient error correction is the long-term future of quantum computing.
https://arxiv.org/abs/2209.06841
Reference Links:
[1] https://journals.jps.jp/doi/full/10.7566/JPSJ.90.032001
[2] https://www.nature.com/articles/s41534-016-0004-0
[3] https://dl.acm.org/doi/full/10.1145/3505636