At last year's FOCS I participated in a workshop on quantum learning, which included an open problem session. Here are the open problems I contributed. Check out the complete list for more, including the more "classic" problems in the field (e.g. entanglement testing, shadow tomography).
The questions here are mathematically precise, but I don't have ideas for how to prove them. David Aldous would call these "Type 2" problems. For clarity, I won't explain the mathematical formalism of Hamiltonians, etc., but the glossary at the bottom of this page will contain precise versions of the terms I'm using.
Why work on quantum learning?
When I say quantum learning here, I mean questions in the following vein: we have an unknown quantum system; how can we run experiments on it to learn its behavior efficiently?
There are a bunch of standard motivations for these questions which I've explained in more detail in my papers: (1) it can be used in practice (even, potentially, in the near-term); (2) it's closely connected to basic questions about correlations in quantum systems; (3) it naturally generalizes various landmark results in classical learning theory. But beyond this, I think this field is natural and important, given the philosophical perspectives informing the development of quantum computing. These perspectives are famous in physics (see More is different and The theory of everything) but I personally only learned about them recently, and they strongly inform my thinking about the topic now.
The biggest successes in physics are in reducing the phenomena of everyday objects down to fundamental laws governing their constituent particles. To understand the behavior of a system, we can isolate a small piece, run experiments on the small scale, and extrapolate the findings out to the large-scale behavior. The example of interest to us is non-relativistic quantum mechanics, which is a theory of most things (notable exceptions being light and gravity). There are plenty of things that this theory explains, and by our current understanding, a few equations are enough to explain them fully.
But this "reductionist" perspective is insufficient: with quantum mechanics, we understand the fundamental particles, and yet, when you put many of these particles together to form a material, a chemical reaction, or a person, we lose understanding of their behavior. The issue here is that extrapolating the behavior of particles to systems of many particles can be very hard. This is especially true for quantum mechanics, because it has an inherent exponentiality to its mathematical formalism.
With quantum computers, we want to build a bridge between the two. One barrier (perhaps the barrier) to understanding large quantum systems is computational intractability: simulating them from the laws of quantum mechanics is intractable. Quantum computers make simulation efficient, letting us actually use our small-scale understanding for large systems.
Quantum learning poses the same philosophical question from another perspective. If we have a large system that we'd like to understand, experiments to characterize it can become intractable, even though we know how to run experiments to characterize small-scale systems. There are practical reasons why big quantum systems are hard to wrangle, but even supposing we had perfect quantum control over our systems, extracting information about these systems could conceivably be inherently intractable, for the same reason that simulation is hard. Quantum learning asks whether we can relate the large-scale and the small-scale for experiments, by making these experiments computationally efficient.
That some kind of quantum learning works is crucial to the success of quantum computing as a program: after all, what is the point of working with large quantum systems if we do not have ways to extract insights from them? I do think there's room to disagree on whether the current directions are the right ones towards the goal of, say, making scientific progress. Nevertheless, I hope the questions I'm suggesting to you now are natural enough to stay true to the spirit of the program, while still producing ideas that can be specialized to practically relevant settings.
1. Learning ground states in polynomial time
A striking phenomenon in learning theory is that, even for "hard" mathematical objects, learning can be a surprisingly easy task. Our dream is that quantum computers can help scientists who work with quantum systems, and the main computational task such scientists struggle with is simulating ground states. For a Hamiltonian \(H\), its ground state is the eigenvector \(\ket{\Psi}\) with the largest eigenvalue. Simulation is known to be hard in the worst case, even on a quantum computer, so if we want efficient algorithms, we must find a way to weave through these computational barriers to achieve practical advantage.
The dual task to simulating ground states is learning ground states. Unlike for simulation, for learning, there are no barriers: we can hope for algorithms which always just work.
Question. Given copies of a ground state \(\ket{\Psi}\) of a geometrically local, \(n\)-qubit Hamiltonian \(H\), can we output a geometrically local Hamiltonian \(\widehat{H}\) whose ground state is close to \(\ket{\Psi}\), in polynomial time? (For this question, you might need the additional assumption that \(H\) is gapped.)
Many Hamiltonians can have the same ground state, so the problem is phrased to allow for this non-uniqueness. The classical version of this problem is trivial (given a \(b \in \{0,1\}^n\) which is the best assignment for an unknown CSP, find a CSP for which it is the best solution), but in the quantum world, with non-commuting terms, things get much harder.
What's known? Getting polynomial sample complexity is straightforward: see this paper of Yu and Wei, for example, though it also follows from generic shadow tomography. These algorithms have exponential time complexity, though.
There are also heuristic algorithms in the physics literature, under the keyword of parent Hamiltonians. These work for simple classes of Hamiltonians, like commuting and (I think even) frustration-free, but not general gapped Hamiltonians. We (Ainesh Bakshi, Allen Liu, Ankur Moitra, and I) had ideas for this general case, but none gave a polynomial-time algorithm. This question seems connected to the question of area laws, but my bet is that it doesn't require making progress on that topic.
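To make the objects here concrete, below is a minimal numpy sketch of that brute-force baseline (the helper names and the choice of Hamiltonian are mine, purely for illustration): build a small geometrically local Hamiltonian on a line of three qubits, take its ground state, and grid-search for a parent Hamiltonian whose ground state matches. The search is exponential in the number of terms, so this is the trivial baseline, not the polynomial-time algorithm the question asks for.

```python
import itertools
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def pauli_string(ops):
    """Tensor product of single-qubit operators, e.g. [X, X, I2]."""
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Nearest-neighbor terms on a line of n = 3 qubits (geometrically local, 2-local).
terms = [pauli_string([X, X, I2]), pauli_string([I2, X, X]),
         pauli_string([Z, I2, I2]), pauli_string([I2, Z, I2]),
         pauli_string([I2, I2, Z])]

def ground_state(coeffs):
    """Top eigenvector of sum_a coeffs[a] * terms[a], per the convention above."""
    H = sum(c * E for c, E in zip(coeffs, terms))
    return np.linalg.eigh(H)[1][:, -1]

psi = ground_state([0.7, 0.7, 0.3, 0.3, 0.3])   # the "unknown" target state

# Brute force over a coarse coefficient grid: keep the candidate whose ground
# state overlaps best with |psi>. Exponential in the number of terms. Note the
# recovered Hamiltonian need not equal the true one (many Hamiltonians share a
# ground state, as discussed above); only its ground state should be close.
grid = np.linspace(-1, 1, 5)
best = max(itertools.product(grid, repeat=len(terms)),
           key=lambda c: abs(np.vdot(psi, ground_state(c))) ** 2)
print("candidate coefficients:", best)
print("overlap:", abs(np.vdot(psi, ground_state(best))) ** 2)
```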
2. Does amplitude amplification require inverses?
One of the first quantum algorithms you learn is Grover's algorithm, along with the closely-related algorithms of amplitude amplification and estimation. These give a quadratic improvement for various kinds of search and estimation tasks.
For example, to estimate the size of an entry of a unitary matrix, \(\left|\braket{0|U|0}\right|^2\), to \(\varepsilon\) error with probability \(0.99\), the naive approach of:
- Initialize to \(\ket{0}\);
- Apply \(U\);
- Measure in the computational basis;
- Repeat.
gives an \(\varepsilon\)-good estimate after \(O(1/\varepsilon^2)\) trials, applying \(U\) that many times. But with amplitude estimation, you only need \(O(1/\varepsilon)\) applications of \(U\) and \(U^\dagger\) to estimate the size of the entry.
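For concreteness, here is a minimal sketch of the naive estimator above (the Haar-ish random \(U\) and all names are illustrative):

```python
# Naive O(1/eps^2) estimation of |<0|U|0>|^2: prepare |0>, apply U, measure,
# repeat. Each trial is a Bernoulli sample with success probability p.
import numpy as np

rng = np.random.default_rng(0)

d = 8
G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(G)                       # a (roughly Haar) random unitary

p = abs(U[0, 0]) ** 2                        # |<0|U|0>|^2
eps = 0.01
trials = int(1 / eps**2)                     # O(1/eps^2) applications of U
outcomes = rng.random(trials) < p            # True when the measurement returns 0
estimate = outcomes.mean()
print(f"true {p:.4f}, estimate {estimate:.4f}")  # error ~ eps with high probability
```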
However, this improvement requires the ability to apply the inverse, \(U^\dagger\). This is a typical feature of these Grover-ish methods. It's often not a big deal, since in many applications \(U\) is a circuit which can be straightforwardly inverted. But what if \(U\) is a unitary that Nature applies, and we can't reverse the arrow of time? Do we need the ability to apply \(U^\dagger\)? Or can we get away with just applications of \(U\)?
Question. Can we prove that amplitude amplification/estimation cannot be done as efficiently when we only have access to \(U\)?
What's known? This question was previously posed in this paper of van Apeldoorn, Cornelissen, Gilyén, and Nannicini, but it seems natural enough that it's probably been posed even earlier. The main challenge here is that standard lower bound techniques are not able to distinguish \(U\) from \(U^\dagger\), and so cannot separate the inverse-ful and inverse-less access models.
Note that, by this paper of Quintino, Dong, Shimbo, Soeda, and Murao, it is possible to convert \(d^3\) uses of a \(d\)-dimensional unitary \(U\) to one use of \(U^\dagger\), so when \(d\) is small, amplitude amplification/estimation can be done without \(U^\dagger\). However, this conversion becomes extremely expensive as \(d\) grows. So, there may still be a separation between the two settings in the regime where \(d\) is sufficiently large.
3. Lower bounds for Hamiltonian learning
The next two open problems deal with Hamiltonian learning from real-time evolutions. The standard setup (see e.g. [HTFS23] and [HKT22]) is as follows. Let \(H = \sum_{a = 1}^m \lambda_a E_a\) be an \(n\)-qubit \(k\)-local (but not necessarily geometrically local) Hamiltonian, where we are told the \(E_a\)'s but do not know the \(\lambda_a\)'s.
We are given the ability to perform the channel \(\rho \mapsto e^{-iHt} \rho e^{iHt}\) for all \(t > 0\), and we want to learn a specified coefficient \(\lambda_a\) to \(\varepsilon\) error (with success probability \(\geq 0.99\), say). For such a learning algorithm we consider its total time evolution, i.e. the amount of evolution time of \(H\) used over the course of the algorithm.
We know how to solve this problem with \(m/\varepsilon\) total evolution time (note that \(m = \operatorname{poly}(n)\)), and when the Hamiltonian has some underlying geometric locality, we can exploit that to get an algorithm with mild-to-no dependence on the system size, something like \(\log(n)/\varepsilon\).
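To give a flavor of the estimation primitives underlying such algorithms, here is a toy single-qubit sketch with \(H = \lambda Z\): evolving \(\ket{+}\) for time \(t\) gives \(\langle Y \rangle = \sin(2\lambda t)\), which can be sampled to recover \(\lambda\). This is a stand-in of my own, not an algorithm from the papers cited; note how shot noise makes the total evolution time scale like \(1/\varepsilon^2\) at fixed \(t\), worse than the \(1/\varepsilon\) achieved by more careful methods.

```python
# Toy learning primitive for a single-qubit H = lam * Z. Evolving |+> for
# time t and measuring in the Y basis gives outcome +1 with probability
# (1 + sin(2*lam*t))/2, so lam can be read off from the sample mean.
import numpy as np

rng = np.random.default_rng(1)

lam = 0.37          # the unknown coefficient to be learned
t = 0.2             # a short evolution time
shots = 200_000

p_plus = (1 + np.sin(2 * lam * t)) / 2        # Pr[Y-measurement gives +1]
outcomes = np.where(rng.random(shots) < p_plus, 1, -1)
y_expectation = outcomes.mean()               # estimates sin(2*lam*t)
lam_hat = np.arcsin(np.clip(y_expectation, -1, 1)) / (2 * t)

print(f"true {lam:.4f}, estimate {lam_hat:.4f}")
# Shot noise ~ 1/sqrt(shots) means error eps costs ~ 1/eps^2 shots here,
# i.e. total evolution time t * shots ~ t/eps^2.
print(f"total evolution time used: {t * shots:.0f}")
```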
It makes physical sense that assuming geometric locality would effectively remove the dependence on system size. If we think about evolving with respect to a local Hamiltonian, at infinitesimal times, the particular \(\lambda_a E_a\) term we're interested in only affects the support of \(E_a\), so most of the \(n\) qubits play no role. But at infinitesimal times, we cannot learn the coefficient, since the effect size is similarly infinitesimal. On the other hand, at larger times, the effect of \(\lambda_a E_a\) is large, but we expect this effect to be "scrambled" across all \(n\) qubits. So, to extract this information we'd need to pay a system-size-type cost to "unscramble" it. If the Hamiltonian is geometrically local, then the "scrambling" only happens in a ball around the support of \(E_a\), and so to learn a coefficient you only have to understand that ball, which is independent of system size.
This intuitive argument is not easy to formalize. So, we can ask: is the cost of Hamiltonian learning actually intrinsically linked to geometric locality? I would find such a result interesting, because real-time evolutions are a very strong form of access to \(H\); it's not obvious to me that you really need geometric locality with such a strong form of access. The simplest version of this question is as follows:
Question. Does learning local (but not necessarily geometrically local) Hamiltonians require a total time evolution of \(\operatorname{poly}(n)\)?
What's known? This question has been posed previously here and here. We know very little (embarrassingly little, imo) about lower bounds for Hamiltonian learning. The best lower bound is \(1/\varepsilon\), which follows from a simple "hybrid" argument: perturbing a coefficient by \(\varepsilon\) only affects the algorithm by \(O(\varepsilon T)\), where \(T\) is the total evolution time, so \(T\) has to be large enough to detect this difference.
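To spell out that argument: if \(H' = H + \varepsilon E_a\), then for an evolution of time \(t\),
\[\left\| e^{-iHt} - e^{-iH't} \right\| \leq \| H - H' \| \, t = \varepsilon t,\]
so over the course of an algorithm whose evolution times sum to \(T\), the outputs under \(H\) and \(H'\) differ by at most \(O(\varepsilon T)\). Distinguishing the two (which learning \(\lambda_a\) to \(\varepsilon\) error requires, up to constants) forces \(\varepsilon T = \Omega(1)\), i.e. \(T = \Omega(1/\varepsilon)\).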
The task of proving stronger lower bounds has been tackled in papers of Huang, Tong, Fang, and Su and of Ma, Flammia, Preskill, and Tong. Neither gives improved lower bounds on the problem stated here.
4. Learning Hamiltonians from long time evolutions
This problem also has to do with Hamiltonian learning from real-time evolution; see the previous open problem for the setup.
The literature on this question has gone in two different directions: on the one hand, you can ask for algorithms giving stronger guarantees; on the other, you can ask for algorithms with weaker assumptions on, or weaker access to, the input. This open question is in the latter category: it's an interesting setting where we have less control in our access model and where, to my knowledge, no technique in the literature comes close to working.
Question. Let \(H\) be a geometrically local Hamiltonian, and suppose we are only given the ability to perform the channel \(\rho \mapsto e^{-iHt} \rho e^{iHt}\) for all \(t > 100\). Can we still do Hamiltonian learning with a total time evolution of \(\operatorname{poly}(n/\varepsilon)\)? How about \(O(1/\varepsilon)\)?
As discussed in the previous open problem, there's a sense in which long time evolutions "scramble" the information we wish to learn. Mathematically, this translates to the relevant objects becoming more difficult to handle: local observables become non-local, power series converge slower, and sometimes they don't converge at all. There's an additional technical challenge related to identifiability: for a sufficiently large constant \(t\), \(e^{-iHt} = e^{-iGt}\) does not imply that \(H = G\).
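The aliasing mechanism behind this identifiability issue can be checked numerically; here is a toy single-qubit example (it ignores the normalization \(|\lambda_a| \leq 1\) from the glossary, so it only illustrates the phenomenon):

```python
# Once t is fixed, eigenvalues of H only enter e^{-iHt} mod 2*pi/t, so
# distinct Hamiltonians can generate identical evolutions at time t.
import numpy as np
from scipy.linalg import expm

t = 1.0
H = np.diag([0.0, 2 * np.pi / t])   # eigenvalue shifted by exactly 2*pi/t
G = np.diag([0.0, 0.0])             # the zero Hamiltonian

UH = expm(-1j * H * t)
UG = expm(-1j * G * t)
print(np.allclose(UH, UG))          # True: e^{-iHt} = e^{-iGt} = I, yet H != G
```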
What's known? We (Ainesh Bakshi, Allen Liu, Ankur Moitra, and I) posed this question in our paper, though I don't think we were the first to do so. In that paper, we showed that Hamiltonian learning with total evolution time \(1/\varepsilon\) and time resolution \(\Theta(1)\) is possible, but the \(\Theta(1)\) is a small constant.
Glossary
A Hamiltonian on \(n\) qubits is a Hermitian matrix \(H \in \mathbb{C}^{2^n \times 2^n}\). We typically imagine this matrix being specified as a linear combination of terms \(E_a \in \mathbb{C}^{2^n \times 2^n}\) with coefficients \(\lambda_a \in \mathbb{R}\). In other words,
\[H = \sum_{a=1}^m \lambda_a E_a.\]
We call a Hamiltonian \(k\)-local if every \(E_a\) is supported on at most \(k\) qubits. Throughout, we additionally impose the constraint that \(E_a\) is a tensor product of Pauli matrices, like \(\sigma_X \otimes \sigma_Z \otimes I \otimes \dots \otimes I\). This can be done without loss of generality (though note that this blows up the number of terms for non-local \(H\)), and it's important to specify some kind of basis like this so that we can make a sensible definition for what "learning" the Hamiltonian means.
For normalization, we also assume that \(-1 \leq \lambda_a \leq 1\) for all \(a = 1,\dots,m\).
We call our Hamiltonian geometrically local if there is some additional "geometric" structure to the terms that appear. The strongest version of this assumption: we imagine the \(n\) qubits are nodes on a lattice (represented by a graph \(G = ([n], E)\)), and every term \(E_a\) has a support \(\operatorname{supp}(E_a)\) with diameter at most \(\Delta\).
A Hamiltonian is gapped if its largest eigenvalue and second largest eigenvalue differ by at least (say) a constant.
For these open problems, we imagine that the parameters \(k\) (and \(\Delta\), if we assume geometric locality) are constants, say 2.
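To make these definitions concrete, here is a short numpy sketch (the names and the example Hamiltonian are mine) that builds a \(2\)-local Pauli Hamiltonian on three qubits and computes its spectral gap in the sense above:

```python
# Build H = sum_a lambda_a E_a from Pauli tensor-product terms and check
# the gap: largest eigenvalue minus second-largest, per the definition above.
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def term(ops):
    return reduce(np.kron, ops)   # tensor product of single-qubit matrices

# Coefficients chosen in [-1, 1], matching the normalization above.
H = 0.5 * term([X, X, I2]) + 0.5 * term([I2, X, X]) - 0.8 * term([Z, I2, I2])

eigvals = np.sort(np.linalg.eigvalsh(H))
gap = eigvals[-1] - eigvals[-2]   # "gapped" if this is at least a constant
print(f"spectral gap: {gap:.4f}")
```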