What determines how fast a fault-tolerant quantum computer actually runs?

Hey!

The quantum computing industry is obsessed with two numbers: qubit count and gate fidelity. Both matter for getting to fault tolerance. Neither tells you much about what happens after you get there.

Once you're running a fault-tolerant machine, the question becomes: how many logical operations per second can it execute?

That's your clock speed. That's what determines whether your quantum computer finishes a useful calculation in minutes, days, or years.

Four things set that clock:

1. QEC cycle time

Every fault-tolerant quantum computer runs error correction in loops. Each loop (one round of syndrome extraction) is one tick of your quantum clock. Everything else has to fit inside that tick, or wait for the next one.

On superconducting qubits, that tick is about 1 µs. Google's Willow processor runs at 1.1 µs per cycle [1]. On neutral atoms, the best experimental result so far is 4.45 ms [2]. That's a factor of ~4,000 slower. Not because neutral atoms are bad, but because the physics of moving and measuring atoms just takes longer than pulsing microwave signals on a chip.

This single number, the cycle time, sets the upper bound on everything.
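To make the gap concrete, here is a minimal sketch that turns the two cycle times quoted above into an upper bound on QEC rounds per second. The function and constant names are illustrative, and the bound ignores everything except the cycle time itself.

```python
# Cycle times quoted in the text, in seconds.
SUPERCONDUCTING_CYCLE_S = 1.1e-6  # Willow, 1.1 µs per cycle [1]
NEUTRAL_ATOM_CYCLE_S = 4.45e-3    # best neutral-atom result, 4.45 ms [2]


def rounds_per_second(cycle_time_s: float) -> float:
    """Upper bound on QEC rounds per second for a given cycle time."""
    if cycle_time_s <= 0:
        raise ValueError("cycle time must be positive")
    return 1.0 / cycle_time_s


# Ratio of the two cycle times: how many superconducting ticks fit in
# one neutral-atom tick.
slowdown = NEUTRAL_ATOM_CYCLE_S / SUPERCONDUCTING_CYCLE_S

print(f"superconducting: {rounds_per_second(SUPERCONDUCTING_CYCLE_S):,.0f} rounds/s")
print(f"neutral atoms:   {rounds_per_second(NEUTRAL_ATOM_CYCLE_S):,.0f} rounds/s")
print(f"slowdown factor: {slowdown:,.0f}x")
```

That works out to roughly 909,000 rounds per second against roughly 225.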

2. Decoder round-trip

Every QEC cycle produces syndrome data. A classical processor has to read that data, figure out what errors occurred, and tell the system what to correct. If that classical round-trip takes longer than your QEC cycle, it becomes the bottleneck. Your quantum processor sits idle, waiting for a classical computer to catch up.

Right now, this isn't the problem. The fastest reported result is 550 ns total closed-loop latency on an FPGA, measured from end-of-readout to start-of-feedback, using a neural network decoder at code distance 3 [3]. It breaks down as 20 ns for syndrome extraction, 124 ns for decoding, and the rest for signal routing. Comfortably under a 1 µs cycle time. Impressive!

But here's the catch: that's distance 3. Seventeen physical qubits, nine of them data qubits. A surface code at distance 15 has hundreds of data qubits, and syndrome volume scales as d². The decoding problem gets dramatically harder. Google's Willow decoder, running at distance 5, averages 63 µs latency [1]. It still works because you can pipeline, meaning decode round N while round N+1 runs. But the margin is thinner than people assume. Whether decoders can keep up at distance 15+ without becoming the bottleneck is one of the genuinely open questions in the field [4].
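A rough sketch of both points, under two assumptions that are mine rather than the cited papers': the code is a rotated surface code (d² data qubits, d² − 1 stabilizer measurements per round), and the decoder is a simple pipeline with a fixed average processing time per round. In that toy model, latency can exceed the cycle time without harm, but if the per-round processing time exceeds the cycle time, undecoded rounds pile up.

```python
def data_qubits(d: int) -> int:
    """Data qubits in a distance-d rotated surface code (assumed layout)."""
    return d * d


def syndrome_bits_per_round(d: int) -> int:
    """Stabilizer measurements per QEC round for the same code."""
    return d * d - 1


def backlog(rounds: int, cycle_time_s: float, decode_time_per_round_s: float) -> float:
    """Undecoded rounds left after `rounds` cycles in a toy pipeline model.

    The decoder clears cycle_time / decode_time rounds per cycle, so a
    backlog only builds when decoding one round takes longer than a cycle.
    """
    cleared_fraction = min(1.0, cycle_time_s / decode_time_per_round_s)
    return rounds * (1.0 - cleared_fraction)


for d in (3, 5, 15):
    print(f"d={d:2d}: {data_qubits(d):3d} data qubits, "
          f"{syndrome_bits_per_round(d):3d} syndrome bits per round")

# Hypothetical decoder speeds against a 1 µs cycle, over one million rounds.
print(backlog(1_000_000, 1e-6, 0.5e-6))  # decoder faster than the cycle
print(backlog(1_000_000, 1e-6, 2e-6))    # decoder slower than the cycle
```

The per-round decode times here are placeholders, not measured values. The point is only that throughput, not latency, decides whether the pipeline falls behind.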

3. Magic state production rate

Clifford gates, meaning things like CNOTs, Hadamards, S gates, and Paulis, are easy in the surface code. You can do them transversally or through lattice surgery. They're basically free.

Non-Clifford gates are the hard ones. T gates, Toffoli gates. These are the gates that actually make quantum computation powerful, and they require a consumable resource called a magic state. Every time you want to execute a Toffoli, you need to have a magic state ready to inject.

How you produce these magic states has been an active research area for years. Distillation (the traditional approach) takes many noisy magic states and purifies them into fewer clean ones. It works, but it's expensive in qubits and time. A newer approach called cultivation uses fault-tolerant measurements to grow a magic state in a single logical qubit. Google demonstrated this experimentally in December 2025: 99.99% fidelity, but only 8% of attempts succeed [5]. That's real, but it's early.

The production rate of magic states directly caps how many non-Clifford gates per second your machine can execute. And most useful algorithms are measured in Toffoli count. Google's ECDLP-256 circuits need 70–90 million Toffolis [6]. If your magic state factory can't feed the algorithm fast enough, your quantum processor stalls.
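A back-of-the-envelope sketch of that cap. The 8% success rate and the Toffoli counts come from the text; the factory output rate and the one-magic-state-per-Toffoli default are hypothetical placeholders (depending on the construction, a Toffoli may consume one CCZ state or several T states).

```python
def expected_attempts(success_probability: float) -> float:
    """Mean number of attempts per accepted state (geometric distribution)."""
    if not 0.0 < success_probability <= 1.0:
        raise ValueError("success probability must be in (0, 1]")
    return 1.0 / success_probability


def magic_limited_runtime_s(
    toffoli_count: float,
    states_per_second: float,
    states_per_toffoli: float = 1.0,  # assumption; varies by construction
) -> float:
    """Runtime if magic state supply is the only limit on the algorithm."""
    return toffoli_count * states_per_toffoli / states_per_second


# 8% of cultivation attempts succeed [5], so each good state costs
# about 12.5 attempts on average.
print(f"attempts per accepted state: {expected_attempts(0.08):.1f}")

# Hypothetical aggregate factory output, in accepted states per second.
HYPOTHETICAL_STATES_PER_SECOND = 100_000.0

for toffolis in (70e6, 90e6):  # ECDLP-256 Toffoli counts [6]
    runtime = magic_limited_runtime_s(toffolis, HYPOTHETICAL_STATES_PER_SECOND)
    print(f"{toffolis:,.0f} Toffolis -> {runtime / 60:.1f} minutes")
```

Halve the factory output and the runtime doubles, no matter how fast the rest of the machine is.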

4. Feedforward latency

Some quantum protocols require classical decisions in the middle of a circuit. You measure a qubit, decode the result, and conditionally apply a gate based on the outcome. Teleportation-based gates work this way. So does magic state injection.

Each step in that chain, from measurement through classical processing to the conditional gate, adds dead time. If the chain is slow, it eats into your cycle budget. On superconducting hardware today, feedforward latency is fast enough that it doesn't dominate [3]. But it's another term in the equation, and it won't automatically stay small at higher code distances.

The bottleneck equation

Your logical clock rate is limited by whichever of these four is slowest.

Today, on every platform, the QEC cycle time dominates. Decoders are becoming fast enough. Magic state production is in its infancy. Feedforward works.

But scale up and the picture shifts. Move from distance 3 to distance 15 and the decoder problem grows by up to two orders of magnitude: roughly 25× more syndrome data per round, and roughly 125× more over the d rounds a logical operation typically spans. Magic state production at useful fidelities requires dedicated qubit factories running in parallel, consuming real estate on your chip. The thing that dominates today may not be the thing that dominates at scale.
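One way to picture the bottleneck equation is as a max over per-stage times. The sketch below is a deliberately crude toy model: it treats each of the four terms as the time that stage needs per logical tick and takes the slowest one as the limit. Apart from the 1.1 µs cycle time [1], every number in it is an illustrative placeholder, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClockBudget:
    """Time each stage needs per logical tick, in seconds (toy model)."""
    qec_cycle_s: float
    decoder_s: float       # amortized decode time per round
    magic_state_s: float   # average interval between delivered magic states
    feedforward_s: float

    def bottleneck(self) -> tuple[str, float]:
        """Return the name and duration of the slowest stage."""
        terms = {
            "qec_cycle": self.qec_cycle_s,
            "decoder": self.decoder_s,
            "magic_state": self.magic_state_s,
            "feedforward": self.feedforward_s,
        }
        name = max(terms, key=terms.get)
        return name, terms[name]

    def logical_rate_hz(self) -> float:
        """Upper bound on ticks per second set by the slowest stage."""
        return 1.0 / self.bottleneck()[1]


# Placeholder scenarios: cycle-limited today, decoder-limited at scale.
today = ClockBudget(qec_cycle_s=1.1e-6, decoder_s=0.2e-6,
                    magic_state_s=0.9e-6, feedforward_s=0.5e-6)
at_scale = ClockBudget(qec_cycle_s=1.1e-6, decoder_s=3.0e-6,
                       magic_state_s=2.0e-6, feedforward_s=0.8e-6)

for label, budget in (("today", today), ("at scale", at_scale)):
    name, seconds = budget.bottleneck()
    print(f"{label}: limited by {name} at {seconds * 1e6:.1f} µs "
          f"({budget.logical_rate_hz():,.0f} ticks/s)")
```

A real machine overlaps these stages and spends many rounds per logical operation, so treat the output as a way to see which term is limiting, not as a performance prediction.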

What this means in practice

Google's ECDLP-256 resource estimate puts this in concrete terms [6]. Two circuit variants: one using 1,200 logical qubits and 90 million Toffolis, another using 1,450 logical qubits and 70 million Toffolis. Compiled onto a superconducting architecture with surface code error correction, standard assumptions (10⁻³ physical error rate, planar connectivity, 10 µs control reaction time), both run on fewer than 500,000 physical qubits in about 9 minutes.

Nine minutes. For a problem that would take a classical computer longer than the age of the universe.

That estimate assumes all four bottlenecks are resolved: fast cycle times, decoders that keep up, sufficient magic state throughput, and manageable feedforward. It's optimistic. But the framework is right, and the numbers aren't crazy. They follow from known physics and engineering trajectories.
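To put the nine-minute figure in rate terms, this sketch computes the average Toffoli throughput each circuit variant implies. It assumes the Toffolis are spread evenly over the whole runtime, which is a simplification of mine and not something the resource estimate states.

```python
def required_toffoli_rate(toffoli_count: float, runtime_minutes: float) -> float:
    """Average Toffolis per second needed to finish within the runtime."""
    if runtime_minutes <= 0:
        raise ValueError("runtime must be positive")
    return toffoli_count / (runtime_minutes * 60.0)


RUNTIME_MINUTES = 9.0  # approximate runtime quoted from [6]

# (logical qubits, Toffoli count) for the two circuit variants in [6].
variants = [(1_200, 90e6), (1_450, 70e6)]

for logical_qubits, toffolis in variants:
    rate = required_toffoli_rate(toffolis, RUNTIME_MINUTES)
    print(f"{logical_qubits:,} logical qubits: {rate:,.0f} Toffolis/s, "
          f"about {1e6 / rate:.1f} µs per Toffoli on average")
```

That is on the order of 130,000 to 170,000 Toffolis per second, or one every 6 to 8 µs.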

The metric I'd watch

The industry keeps announcing (logical) qubit counts. These numbers get press coverage because they're easy to compare.

But qubit count without context is like advertising a CPU by its transistor count. It tells you the chip exists. It tells you nothing about whether it computes.

Logical operations per second is the number that matters. It bakes many important metrics into one: cycle time, decoder performance, magic state throughput, feedforward overhead.

We're not there yet. But the path from here to there runs through these four bottlenecks, and every one of them is an exciting engineering problem to solve!

Until next time,

References

[1] Google Quantum AI, "Quantum error correction below the surface code threshold," Nature (2024). https://www.nature.com/articles/s41586-024-08449-y

[2] Bluvstein et al., "A fault-tolerant neutral-atom architecture for universal quantum computation," Nature (2025). https://www.nature.com/articles/s41586-025-09848-5

[3] "Real-time Surface-Code Error Correction Using an FPGA-based Neural-Network Decoder," arXiv:2605.04892 (2026). https://arxiv.org/abs/2605.04892

[4] "Local Clustering Decoder as a fast and adaptive hardware decoder for the surface code," Nature Communications (2025). https://www.nature.com/articles/s41467-025-66773-x

[5] Google Quantum AI, "Magic state cultivation on a superconducting quantum processor," arXiv:2512.13908 (2025). https://arxiv.org/abs/2512.13908

[6] Google Quantum AI, "Securing Elliptic Curve Cryptocurrencies against Quantum Vulnerabilities: Resource Estimates and Mitigations," arXiv:2603.28846 (2026). https://arxiv.org/abs/2603.28846

Thumbnail Source: Riverlane
