Some settings supporting efficient state preparation

\[\gdef\BB#1{\mathbb{#1}} \gdef\eps\varepsilon \gdef\ket#1{|#1\rangle} \gdef\bra#1{\langle#1|}\]

I wrote most of this list to procrastinate on the flight back from TQC (which was great!). So, for my own reference: here’s some settings where efficient state preparation / data loading is possible, and classical versions of these protocols. Notes:

There might be errors, especially in details of the quantum protocols, and some of the algorithms may be suboptimal (note the streaming setting, in particular). Let me know if you notice either of these.
Some relevant complexity research here is in QSampling (Section 4).
All these runtimes should have an extra \(O(\log n)\) factor, since we assume that indices and entries take \(\log n\) bits/qubits to specify. However, I’m going to follow the convention from classical computing and ignore these factors, hopefully with little resulting confusion.

For all that follows, we are given \(v \in \mathbb{C}^n\) in some way and want to output

for the quantum case, a copy of the state \(\ket{v} = \sum_{i=1}^n \frac{v_i}{\|v\|} \ket{i}\), and
for the classical case, the pair \((i,v_i)\) output with probability \(\frac{\vert v_i\vert^2}{\|v\|^2}\).

You could think about this as strong quantum simulation of state preparation protocols.

type	sparse	uniform	integrable	QRAM	streamed
quantum	\(O(s)\)	\(O(C\log\frac1\delta)\)	\(O(I \log n)\)	\(O(\log n)\) depth	\(O(1)\) space with 2 passes
classical	\(O(s)\)	\(O(C^2\log\frac1\delta)\)	\(O(I\log n)\)	\(O(\log n)\)	\(O(1)\) space with 1 pass

Recall that if we want to prepare an arbitrary quantum state, we need at least \(\Omega(\sqrt{n})\) time by search lower bounds, so for some settings of the above constants, these protocols are exponentially faster than the naive strategy. Further recall that state preparation and sampling both have easy protocols running in \(O(n)\) time.

\(v\) is sparse

We assume that \(v\) has at most \(s\) nonzero entries and we can access a list of the nonzero entries \(((i_1,v_{i_1}),(i_2,v_{i_2}),\ldots,(i_s,v_{i_s}))\). Thus, we have the oracle \(a \to (i_a, v_{i_a})\).

We can prepare the quantum state and classical sample by preparing the vector \(v' \in \BB{C}^s\) where \(v_a' = v_{i_a}\), and then using the oracle to swap out the index \(a\) with \(i_a\). This gives \(O(s)\) classical and quantum time.

\(v\) is close-to-uniform

We assume that \(\max\vert v_i\vert \leq C\frac{\|v\|}{\sqrt{n}}\) and we know \(C, \|v\|\). Notice that we don’t give a lower bound on the size of entries, but we can’t have too many small entries, since this would lower the norm. Also notice that \(C \geq 1\).

Quantumly, given the typical oracle \(\ket{i}\ket{0} \to \ket{i}\ket{v_i}\) we can prepare the state

\[\frac{1}{\sqrt{n}}\sum_{i=1}^n \ket{i}\Big(\frac{v_i\sqrt{n}}{\|v\|C}\ket{0} + \sqrt{1-\frac{\vert v_i\vert ^2n}{\|v\|^2C^2}}\ket{1}\Big).\]

Measuring the ancilla and post-selecting on 0 gives \(\ket{v}\). This happens with probability \(\frac{1}{C^2}\), and with amplitude amplification this means we can get a copy of the state with probability \(\geq 1-\delta\) in \(O(C\log\frac1\delta)\) time.

Classically, we perform rejection sampling from the uniform distribution: pick an index uniformly at random, and keep it with probability \(\frac{v_i^2n}{\|v\|^2C^2}\); otherwise, restart. This outputs the correct distribution and gives a sample in \(O(C^2\log\frac1\delta)\) time.

\(v\) is efficiently integrable

We assume that, given \(1 \leq a \leq b \leq n\), I can compute \(\sqrt{\sum_{i=a}^b |v_i|^2}\) in \(O(I)\) time. This assumption and the resulting quantum preparation routine comes from Grover-Rudolph.

The quantum algorithm uses one core subroutine: adding an extra qubit, sending \(\ket{v^{(k)}} \to \ket{v^{(k+1)}}\), where

\[\ket{v^{(k)}} := \sum_{b \in \{0,1\}^k} \ket{b}\sqrt{\sum_{i=b\cdot 0^{n-k}}^{b\cdot 1^{n-k}} |v_i|^2}\]

All that’s necessary is to apply it \(O(\log n)\) times and add the phase at the end. I haven’t worked it out, but I think you can run the subroutine efficiently using three calls to the integration oracle, giving \(O(I\log n)\) time.

Classically, we can do essentially the same thing: the integration oracle means that we can compute marginal probabilities; that is,

\[\Pr_{s \sim v}[s\text{'s bit representation starts with } b] = \sum_{i=b\cdot 0^{n-k}}^{b\cdot 1^{n-k}} |v_i|^2\]

Thus, we can sample from the distribution on the first bit, then sample from the distribution on the second bit conditioned on our value of the first bit, and so on. This also gives \(O(I\log n)\) time.

\(v\) is stored in a dynamic data structure

We assume that our vector can be stored in a data structure that supports efficient updating of entries. Namely, we use the standard binary search tree data structure (see, for example, Section 2.2.2 of Prakash’s thesis). This is a simple data structure with many nice properties, including \(O(\log n)\) time updates. If you want to prepare many states corresponding to similar vectors, this is a good option.

There’s not much more to say, since the protocol is the same as the integrability protocol. The only difference is that, instead of assuming that we can compute interval sums efficiently, we instead precompute and store all of the integration oracle calls we need for the state preparation procedure in a data structure.

The classical runtime is \(O(\log n)\), and the quantum circuit takes \(O(n)\) gates but only \(O(\log n)\) depth. The quantum algorithm is larger because here, we need to query a linear number of memory cells, as opposed to the integrabilility assumption, where we only needed to run the integration oracle in superposition.

While it may seem that the classical algorithm wins definitively here, the small depth leaves potential for this protocol to run in \(O(\log n)\) time in practice, matching the classical algorithm.

\(v\) is streamed

We assume that we can receive a stream of the entries of \(v\) in order; we wish to produce a state/sample using as little space as possible.

Classically, we can do this with reservoir sampling. The idea is that we maintain a sample \((s, v_s)\) from all of the entries we’ve seen before, along with their squared norm \(\lambda = \sum_{i=1}^k \vert v_i\vert^2\). Then, when we receive a new entry \(v_{k+1}\), we swap our sample to \((k+1,v_{k+1})\) with probability \(\vert v_{k+1}\vert^2/(\lambda + \vert v_{k+1}\vert^2)\) and update our \(\lambda\) to \(\lambda + \vert v_{k+1}\vert^2\). After we go through all of \(v\)’s entries, we get a sample only using \(O(1)\) space. (This is a particularly nice algorithm for sampling from a vector, since it has good locality and can be generalized to get \(O(k)\) samples in \(O(k)\) space and one pass.)

Quantumly, I only know how to prepare a state in one pass with sublinear space if the norm is known. If you know \(\|v\|\), then you can prepare \(\ket{n}\), and as entries come in, rotate to get \(\frac{v_1}{\|v\|}\ket{1} + \sqrt{1-\frac{|v_1|^2}{\|v\|^2}}\ket{n}\), then \(\frac{v_1}{\|v\|}\ket{1} + \frac{v_2}{\|v\|}\ket{2} + \sqrt{1-\frac{|v_1|^2+|v_2|^2}{\|v\|^2}}\ket{n}\), and so on. This uses only \(O(\log n)\) qubits, which I notate here as \(O(1)\) space.

You can relax this assumption to just having an estimate \(\lambda\) of \(\|v\|\) such that \(\frac{1}{\text{poly}(n)} \leq \lambda/\|v\| \leq \text{poly}(n)\). Finally, if you like, you can remove the assumption that you know the norm just by requiring two passes instead of one; in the first pass, compute the norm, and in the second pass, prepare the state. But it’d be nice to remove the assumption entirely.

So, is it possible to prepare a quantum state corresponding to a generic \(v \in \BB{C}^n\), given only one pass through it? Thanks to Chunhao Wang and Nai-Hui Chia for telling me about this problem.