Ewin Tang
Ewin's website
https://www.ewintang.com/
<h1>Some settings supporting efficient state preparation</h1>
<script type="math/tex; mode=display">\gdef\BB#1{\mathbb{#1}}
\gdef\eps\varepsilon
\gdef\ket#1{|#1\rangle}
\gdef\bra#1{\langle#1|}</script>
<p>I wrote most of this list to procrastinate on the flight back from TQC (which was great!).
So, for my own reference: here’s some settings where efficient state preparation / data loading is possible, and classical versions of these protocols. Notes:</p>
<ul>
<li>There might be errors, especially in details of the quantum protocols, and some of the algorithms may be suboptimal (note the streaming setting, in particular). Let me know if you notice either of these.</li>
<li>Some relevant complexity research here is in <a href="https://arxiv.org/abs/1607.05256">QSampling</a> (Section 4).</li>
<li>All these runtimes should have an extra <script type="math/tex">O(\log n)</script> factor, since we assume that indices and entries take <script type="math/tex">\log n</script> bits/qubits to specify.
However, I’m going to follow the convention from classical computing and ignore these factors, hopefully with little resulting confusion.</li>
</ul>
<p>For all that follows, we are given <script type="math/tex">v \in \mathbb{C}^n</script> in some way and want to output</p>
<ol>
<li>for the quantum case, a copy of the state <script type="math/tex">\ket{v} = \sum_{i=1}^n \frac{v_i}{\|v\|} \ket{i}</script>, and</li>
<li>for the classical case, the pair <script type="math/tex">(i,v_i)</script> output with probability <script type="math/tex">\frac{\vert v_i\vert^2}{\|v\|^2}</script>.</li>
</ol>
<p>You could think about this as strong quantum simulation of state preparation protocols.</p>
<table>
<thead>
<tr>
<th style="text-align: center">type</th>
<th style="text-align: center"><a href="#v-is-sparse">sparse</a></th>
<th style="text-align: center"><a href="#v-is-close-to-uniform">uniform</a></th>
<th style="text-align: center"><a href="#v-is-efficiently-integrable">integrable</a></th>
<th style="text-align: center"><a href="#v-is-stored-in-a-dynamic-data-structure">QRAM</a></th>
<th style="text-align: center"><a href="#v-is-streamed">streamed</a></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center">quantum</td>
<td style="text-align: center"><script type="math/tex">O(s)</script></td>
<td style="text-align: center"><script type="math/tex">O(C\log\frac1\delta)</script></td>
<td style="text-align: center"><script type="math/tex">O(I \log n)</script></td>
<td style="text-align: center"><script type="math/tex">O(\log n)</script> depth</td>
<td style="text-align: center"><script type="math/tex">O(1)</script> space with 2 passes</td>
</tr>
<tr>
<td style="text-align: center">classical</td>
<td style="text-align: center"><script type="math/tex">O(s)</script></td>
<td style="text-align: center"><script type="math/tex">O(C^2\log\frac1\delta)</script></td>
<td style="text-align: center"><script type="math/tex">O(I\log n)</script></td>
<td style="text-align: center"><script type="math/tex">O(\log n)</script></td>
<td style="text-align: center"><script type="math/tex">O(1)</script> space with 1 pass</td>
</tr>
</tbody>
</table>
<p>Recall that if we want to prepare an arbitrary quantum state, we need at least <script type="math/tex">\Omega(\sqrt{n})</script> time by search lower bounds, so for some settings of the above constants, these protocols are exponentially faster than the naive strategy.
Further recall that state preparation and sampling both have easy protocols running in <script type="math/tex">O(n)</script> time.</p>
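<p>For reference, the easy <script type="math/tex">O(n)</script> classical protocol is a few lines (a minimal sketch; note it uses 0-based indices, unlike the 1-based convention above):</p>

```python
import random

def naive_sample(v):
    """O(n) classical sampling: return (i, v_i) with probability |v_i|^2 / ||v||^2."""
    weights = [abs(x) ** 2 for x in v]  # one O(n) pass over the entries
    i = random.choices(range(len(v)), weights=weights)[0]
    return i, v[i]
```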
<h2 id="v-is-sparse"><script type="math/tex">v</script> is sparse</h2>
<p>We assume that <script type="math/tex">v</script> has at most <script type="math/tex">s</script> nonzero entries and we can access a list of the nonzero entries <script type="math/tex">((i_1,v_{i_1}),(i_2,v_{i_2}),\ldots,(i_s,v_{i_s}))</script>.
Thus, we have the oracle <script type="math/tex">a \to (i_a, v_{i_a})</script>.</p>
<p>We can prepare the quantum state and classical sample by preparing the vector <script type="math/tex">v' \in \BB{C}^s</script> where <script type="math/tex">v_a' = v_{i_a}</script>, and then using the oracle to swap out the index <script type="math/tex">a</script> with <script type="math/tex">i_a</script>.
This gives <script type="math/tex">O(s)</script> classical and quantum time.</p>
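<p>Classically, this swap is a one-liner (a minimal sketch; the input is the list of nonzero entries as pairs):</p>

```python
import random

def sample_sparse(nonzeros):
    """Given the list ((i_1, v_{i_1}), ..., (i_s, v_{i_s})) of nonzero entries,
    return (i, v_i) with probability |v_i|^2 / ||v||^2 in O(s) time."""
    weights = [abs(val) ** 2 for _, val in nonzeros]
    a = random.choices(range(len(nonzeros)), weights=weights)[0]
    return nonzeros[a]  # swap the internal index a out for the true index i_a
```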
<h2 id="v-is-close-to-uniform"><script type="math/tex">v</script> is close-to-uniform</h2>
<p>We assume that <script type="math/tex">\max\vert v_i\vert \leq C\frac{\|v\|}{\sqrt{n}}</script> and we know <script type="math/tex">C, \|v\|</script>.
Notice that we don’t give a lower bound on the size of entries, but we can’t have too many small entries, since this would lower the norm.
Also notice that <script type="math/tex">C \geq 1</script>.</p>
<p>Quantumly, given the typical oracle <script type="math/tex">\ket{i}\ket{0} \to \ket{i}\ket{v_i}</script> we can prepare the state</p>
<script type="math/tex; mode=display">\frac{1}{\sqrt{n}}\sum_{i=1}^n \ket{i}\Big(\frac{v_i\sqrt{n}}{\|v\|C}\ket{0} + \sqrt{1-\frac{\vert v_i\vert ^2n}{\|v\|^2C^2}}\ket{1}\Big).</script>
<p>Measuring the ancilla and post-selecting on 0 gives <script type="math/tex">\ket{v}</script>.
This happens with probability <script type="math/tex">\frac{1}{C^2}</script>, and with amplitude amplification this means we can get a copy of the state with probability <script type="math/tex">\geq 1-\delta</script> in <script type="math/tex">O(C\log\frac1\delta)</script> time.</p>
<p>Classically, we perform rejection sampling from the uniform distribution: pick an index uniformly at random, and keep it with probability <script type="math/tex">\frac{v_i^2n}{\|v\|^2C^2}</script>; otherwise, restart.
This outputs the correct distribution and gives a sample in <script type="math/tex">O(C^2\log\frac1\delta)</script> time.</p>
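<p>A minimal sketch of this rejection sampler, assuming <code>query</code> gives entry access and <code>norm</code> is the known value of <script type="math/tex">\|v\|</script>:</p>

```python
import random

def rejection_sample(query, n, norm, C):
    """l2-sample from v given query access, the norm ||v||, and the promise
    max |v_i| <= C * norm / sqrt(n). Expected number of rounds: C^2."""
    while True:
        i = random.randrange(n)  # uniform proposal
        vi = query(i)
        if random.random() < abs(vi) ** 2 * n / (norm ** 2 * C ** 2):
            return i, vi  # accept; otherwise restart
```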
<h2 id="v-is-efficiently-integrable"><script type="math/tex">v</script> is efficiently integrable</h2>
<p>We assume that, given <script type="math/tex">1 \leq a \leq b \leq n</script>, I can compute <script type="math/tex">\sqrt{\sum_{i=a}^b |v_i|^2}</script> in <script type="math/tex">O(I)</script> time.
This assumption and the resulting quantum preparation routine comes from <a href="https://arxiv.org/abs/quant-ph/0208112">Grover-Rudolph</a>.</p>
<p>The quantum algorithm uses one core subroutine: adding an extra qubit, sending <script type="math/tex">\ket{v^{(k)}} \to \ket{v^{(k+1)}}</script>, where</p>
<script type="math/tex; mode=display">\ket{v^{(k)}} := \frac{1}{\|v\|}\sum_{b \in \{0,1\}^k} \ket{b}\sqrt{\sum_{i=b\cdot 0^{\log n-k}}^{b\cdot 1^{\log n-k}} |v_i|^2}</script>
<p>All that’s necessary is to apply it <script type="math/tex">O(\log n)</script> times and add the phase at the end.
I haven’t worked it out, but I think you can run the subroutine efficiently using three calls to the integration oracle, giving <script type="math/tex">O(I\log n)</script> time.</p>
<p>Classically, we can do essentially the same thing: the integration oracle means that we can compute marginal probabilities; that is,</p>
<script type="math/tex; mode=display">\Pr_{s \sim v}[s\text{'s bit representation starts with } b] = \frac{1}{\|v\|^2}\sum_{i=b\cdot 0^{\log n-k}}^{b\cdot 1^{\log n-k}} |v_i|^2</script>
<p>Thus, we can sample from the distribution on the first bit, then sample from the distribution on the second bit conditioned on our value of the first bit, and so on.
This also gives <script type="math/tex">O(I\log n)</script> time.</p>
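<p>This bit-by-bit classical sampler can be sketched as follows. For simplicity the sketch takes the <em>squared</em> version of the oracle (the interval sum of <script type="math/tex">|v_i|^2</script> rather than its square root), and assumes the dimension is a power of two:</p>

```python
import random

def sample_integrable(interval_sqnorm, nbits):
    """Sample an index i with probability |v_i|^2 / ||v||^2 for a vector of
    length 2**nbits, given an oracle interval_sqnorm(a, b) = sum_{i=a}^{b} |v_i|^2
    (0-indexed, inclusive). Fixes one bit of i per round: O(log n) oracle calls."""
    lo, hi = 0, 2 ** nbits - 1
    total = interval_sqnorm(lo, hi)
    for _ in range(nbits):
        mid = (lo + hi) // 2
        left = interval_sqnorm(lo, mid)  # marginal mass of "next bit is 0"
        if random.random() * total < left:
            hi, total = mid, left
        else:
            lo, total = mid + 1, total - left
    return lo
```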
<h2 id="v-is-stored-in-a-dynamic-data-structure"><script type="math/tex">v</script> is stored in a dynamic data structure</h2>
<p>We assume that our vector can be stored in a data structure that supports efficient updating of entries.
Namely, we use the standard binary search tree data structure (see, for example, <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-211.pdf">Section 2.2.2 of Prakash’s thesis</a>).
This is a simple data structure with many nice properties, including <script type="math/tex">O(\log n)</script> time updates.
If you want to prepare many states corresponding to similar vectors, this is a good option.</p>
<p>There’s not much more to say, since the protocol is the same as the integrability protocol.
The only difference is that, instead of assuming that we can compute interval sums efficiently, we instead precompute and store all of the integration oracle calls we need for the state preparation procedure in a data structure.</p>
<p>The classical runtime is <script type="math/tex">O(\log n)</script>, and the <a href="https://arxiv.org/abs/1812.00954">quantum circuit</a> takes <script type="math/tex">O(n)</script> gates but only <script type="math/tex">O(\log n)</script> depth.
The quantum circuit needs more gates because here we must query a linear number of memory cells, as opposed to the integrability assumption, where we only needed to run the integration oracle in superposition.</p>
<p>While it may seem that the classical algorithm wins definitively here, the small depth leaves potential for this protocol to run in <script type="math/tex">O(\log n)</script> time in practice, matching the classical algorithm.</p>
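<p>Concretely, here is a minimal array-backed sketch of such a data structure (assuming the dimension is a power of two; the tree stores the entries at the leaves and interval sums of squared magnitudes at internal nodes):</p>

```python
import random

class SampleTree:
    """Binary tree for a length-n vector (n a power of two): leaves hold |v_i|^2,
    internal nodes hold subtree sums. O(log n) updates and O(log n) l2 samples."""
    def __init__(self, n):
        self.n = n
        self.entry = [0.0] * n       # the entries v_i themselves
        self.sums = [0.0] * (2 * n)  # sums[1] = ||v||^2; leaf i sits at n + i

    def update(self, i, vi):
        self.entry[i] = vi
        j = self.n + i
        self.sums[j] = abs(vi) ** 2
        while j > 1:                 # propagate the change up to the root
            j //= 2
            self.sums[j] = self.sums[2 * j] + self.sums[2 * j + 1]

    def sample(self):
        j = 1
        r = random.random() * self.sums[1]
        while j < self.n:            # walk down, branching by subtree mass
            j *= 2
            if r >= self.sums[j]:
                r -= self.sums[j]
                j += 1
        i = j - self.n
        return i, self.entry[i]
```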
<h2 id="v-is-streamed"><script type="math/tex">v</script> is streamed</h2>
<p>We assume that we can receive a stream of the entries of <script type="math/tex">v</script> in order; we wish to produce a state/sample using as little space as possible.</p>
<p>Classically, we can do this with <a href="https://en.wikipedia.org/wiki/Reservoir_sampling">reservoir sampling</a>.
The idea is that we maintain a sample <script type="math/tex">(s, v_s)</script> from all of the entries we’ve seen before, along with their squared norm <script type="math/tex">\lambda = \sum_{i=1}^k \vert v_i\vert^2</script>.
Then, when we receive a new entry <script type="math/tex">v_{k+1}</script>, we swap our sample to <script type="math/tex">(k+1,v_{k+1})</script> with probability <script type="math/tex">\vert v_{k+1}\vert^2/(\lambda + \vert v_{k+1}\vert^2)</script> and update our <script type="math/tex">\lambda</script> to <script type="math/tex">\lambda + \vert v_{k+1}\vert^2</script>.
After we go through all of <script type="math/tex">v</script>’s entries, we get a sample only using <script type="math/tex">O(1)</script> space.
(This is a particularly nice algorithm for sampling from a vector, since it has good locality and can be generalized to get <script type="math/tex">O(k)</script> samples in <script type="math/tex">O(k)</script> space and one pass.)</p>
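<p>The classical one-pass sampler described above, as a short sketch:</p>

```python
import random

def reservoir_sample(stream):
    """One pass, O(1) space: return (k, v_k) with probability |v_k|^2 / ||v||^2."""
    sample, lam = None, 0.0  # lam tracks the squared norm of entries seen so far
    for k, vk in enumerate(stream):
        lam += abs(vk) ** 2
        # equivalent to swapping with probability |v_k|^2 / (old lam + |v_k|^2)
        if lam > 0 and random.random() * lam < abs(vk) ** 2:
            sample = (k, vk)
    return sample
```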
<p>Quantumly, I only know how to prepare a state in one pass with sublinear space if the norm is known.
If you know <script type="math/tex">\|v\|</script>, then you can prepare <script type="math/tex">\ket{n}</script>, and as entries come in, rotate to get <script type="math/tex">\frac{v_1}{\|v\|}\ket{1} + \sqrt{1-\frac{|v_1|^2}{\|v\|^2}}\ket{n}</script>, then <script type="math/tex">\frac{v_1}{\|v\|}\ket{1} + \frac{v_2}{\|v\|}\ket{2} + \sqrt{1-\frac{|v_1|^2+|v_2|^2}{\|v\|^2}}\ket{n}</script>, and so on.
This uses only <script type="math/tex">O(\log n)</script> qubits, which I notate here as <script type="math/tex">O(1)</script> space.</p>
<p>You can relax this assumption to just having an estimate <script type="math/tex">\lambda</script> of <script type="math/tex">\|v\|</script> such that <script type="math/tex">\frac{1}{\text{poly}(n)} \leq \lambda/\|v\| \leq \text{poly}(n)</script>.
Finally, if you like, you can remove the assumption that you know the norm just by requiring two passes instead of one; in the first pass, compute the norm, and in the second pass, prepare the state.
But it’d be nice to remove the assumption entirely.</p>
<p>So, <strong>is it possible to prepare a quantum state corresponding to a generic <script type="math/tex">v \in \BB{C}^n</script>, given only one pass through it?</strong> Thanks to <a href="https://www.chunhaowang.com/">Chunhao Wang</a> and <a href="https://www.cs.utexas.edu/~nai/">Nai-Hui Chia</a> for telling me about this problem.</p>
Thu, 13 Jun 2019 00:00:00 +0000
https://www.ewintang.com/blog/2019/06/13/some-settings-supporting-efficient-state-preparation/
<h1>An overview of quantum-inspired classical sampling</h1>
<script type="math/tex; mode=display">\gdef\SC#1{\mathcal{#1}} %katex
\gdef\BB#1{\mathbb{#1}}
\gdef\eps\varepsilon
\gdef\SQ{\operatorname{SQ}}
\gdef\Q{\operatorname{Q}}
\gdef\Tr{\operatorname{Tr}}
\gdef\ket#1{\left|#1\right\rangle}
\gdef\bra#1{\left\langle#1\right|}
\gdef\poly{\operatorname{poly}}
\gdef\polylog{\operatorname{polylog}}
%\newcommand{\SC}[1]{\mathcal{#1}} %mathjax
%\newcommand{\BB}[1]{\mathbb{#1}}
%\newcommand{\eps}{\varepsilon}
%\newcommand{\SQ}{\operatorname{SQ}}
%\newcommand{\Q}{\operatorname{Q}}
%\newcommand{\Tr}{\operatorname{Tr}}
%\newcommand{\ket}[1]{\left|#1\right\rangle}
%\newcommand{\bra}[1]{\left\langle#1\right|}
%\newcommand{\poly}{\operatorname{poly}}
%\newcommand{\polylog}{\operatorname{polylog}}</script>
<p>This is an adaptation of a talk I gave at Microsoft Research in November 2018.</p>
<p>I exposit the <script type="math/tex">\ell^2</script> sampling techniques used in my recommendation systems work and its follow-ups in dequantized machine learning:</p>
<ul>
<li>Tang – <a href="https://arxiv.org/abs/1807.04271"><em>A quantum-inspired algorithm for recommendation systems</em></a></li>
<li>Tang – <a href="https://arxiv.org/abs/1811.00414"><em>Quantum-inspired classical algorithms for principal component analysis and supervised clustering</em></a>;</li>
<li>Gilyén, Lloyd, Tang – <a href="https://arxiv.org/abs/1811.04909"><em>Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension</em></a>;</li>
<li>Chia, Lin, Wang – <a href="https://arxiv.org/abs/1811.04852"><em>Quantum-inspired sublinear classical algorithms for solving low-rank linear systems</em></a>.</li>
</ul>
<p>The core ideas used are super simple.
The goal of this blog post is to break down these ideas into intuition relevant for quantum researchers and to build more understanding of this machine learning paradigm.</p>
<p>Notation is defined in the <a href="#glossary">Glossary</a>.</p>
<p>The intended audience is researchers comfortable with probability and linear algebra (SVD, in particular).
Basic quantum knowledge helps with intuition, but is not essential: everything from <a href="#the-model">The model</a> onward is purely classical.
The appendix is optional and explains the dequantized techniques in more detail.</p>
<h2 id="an-introduction-to-dequantization">An introduction to dequantization</h2>
<h3 id="motivation">Motivation</h3>
<p>The best, most sought-after quantum algorithms are those that take in raw, classical input and give some classical output.
For example, Shor’s algorithm for factoring takes this form.
These <em>classical-to-classical</em> algorithms (a term I invented for this post) have the best chance to be efficiently implemented in practice: all you need is a scalable quantum computer. (It’s just that easy!)</p>
<p>Nevertheless, many quantum algorithms aren’t so nice.
Most well-known QML algorithms convert input quantum states to a desired output state or value.
To run them, we additionally need a routine that produces the necessary copies of these input states (a <em>state preparation</em> routine) and a strategy to extract information from an output state.
Both are essential to making the algorithm useful.</p>
<p>An example of an algorithm that is not classical-to-classical is the <em>swap test</em>.
If we have many copies of the quantum states <script type="math/tex">\ket{a},\ket{b} \in \BB{C}^n</script>, then the swap test <script type="math/tex">\SC{S}</script> estimates their inner product (strictly speaking, its squared magnitude) in time polylogarithmic in dimension.
While this routine seems much faster than naively computing <script type="math/tex">\sum_{i=1}^n \bar{a}_ib_i</script> classically, we can only run this algorithm if we know how to prepare the states <script type="math/tex">\ket{a}</script> and <script type="math/tex">\ket{b}</script>.
It may well be the case that state preparation is too expensive for input vectors, making the quantum algorithm as slow as the classical algorithm.
This illustrates the format and failings of most QML algorithms.</p>
<p>You might then ask: can we fill in the missing routines in QML algorithms to get a classical-to-classical algorithm that’s provably fast and useful?
This is an open research problem: see Scott Aaronson’s piece on QML<sup id="fnref:aaronson15"><a href="#fn:aaronson15" class="footnote">1</a></sup>.
We have a variety of partial results towards the affirmative, but as far as I know, they don’t answer the question unless you’re loose with your definitions of at least one of “classical”, “provably fast”, or “useful”.
So let’s settle for a simpler question.</p>
<p><strong>How can we compare the speed of quantum algorithms with quantum input and quantum output to classical algorithms with classical input and classical output?</strong>
Quantum machine learning algorithms can be exponentially faster than the best standard classical algorithms for similar tasks, but this comparison is unfair because the quantum algorithms get outside help through input state preparation.
We want a classical model that helps its algorithms stand a chance against quantum algorithms, while still ensuring that they can be run in nearly all circumstances one would run the quantum algorithm.
The answer I propose: <strong>compare quantum algorithms with quantum state preparation to classical algorithms with <em>sample and query access</em> to input.</strong></p>
<h3 id="the-model">The model</h3>
<p>Before we proceed with definitions, we’ll establish some conventions.
First, we generally consider our input as being some vector in <script type="math/tex">\BB{C}^n</script> or <script type="math/tex">\BB{R}^n</script>, subject to an access model to be described.
Second, we’ll only concern ourselves with an algorithm’s <em>query complexity</em>, the number of accesses to the input.
Our algorithms will have query complexity independent of input dimensions and polynomial in other parameters.
If we assume that each access costs (say) <script type="math/tex">O(1)</script> or <script type="math/tex">O(\log n)</script>, the time complexity is still polylogarithmic in input dimension and at most polynomially worse in other parameters.</p>
<p>Now, we define query access to input; we can get query access simply by having the input in RAM.</p>
<p><strong>Definition.</strong>
We have <em>query access</em> to <script type="math/tex">x \in \BB{C}^n</script> (denoted <script type="math/tex">\Q(x)</script>) if, given <script type="math/tex">i \in [n]</script>, we can efficiently compute <script type="math/tex">x_i</script>.</p>
<p>If we have <script type="math/tex">x</script> stored normally as an array in our classical computer’s memory, we have <script type="math/tex">\Q(x)</script> because finding the <script type="math/tex">i</script>th entry of <script type="math/tex">x</script> can be done with the code <code class="language-plaintext highlighter-rouge">x[i]</code>.
This notion of access can represent more than just memory: we can also have <script type="math/tex">\Q(x)</script> if <script type="math/tex">x</script> is <em>implicitly</em> described.
For example, consider <script type="math/tex">x</script> the vector of squares: <script type="math/tex">x_i = i^2</script> for all <script type="math/tex">i</script>.
We can have access to <script type="math/tex">x</script> without writing <script type="math/tex">x</script> in memory.
This will be important for the algorithms to come.</p>
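<p>The vector of squares gives a one-line illustration of implicit query access:</p>

```python
def query_squares(i):
    """Q(x) for the implicit vector of squares: x_i = i^2, nothing stored in memory."""
    return i * i
```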
<p><strong>Definition.</strong>
We have <em>sample and query access</em> to <script type="math/tex">x \in \BB{C}^n</script> (denoted <script type="math/tex">\SQ(x)</script>) if we have query access to <script type="math/tex">x</script>; can produce independent random samples <script type="math/tex">i \in [n]</script> where we sample <script type="math/tex">i</script> with probability <script type="math/tex">|x_i|^2/\|x\|^2</script>; and can query for <script type="math/tex">\|x\|</script>.</p>
<p>Sampling and query access to <script type="math/tex">x</script> will be our classical analogue to assuming quantum state preparation of copies of <script type="math/tex">\ket{x}</script>.
This should make some intuitive sense: our classical analogue <script type="math/tex">\SQ(x)</script> has the standard assumption of query access to input, along with samples, which are essentially measurements of <script type="math/tex">\ket{x}</script> in the computational basis.
Knowledge of <script type="math/tex">\|x\|</script> is for normalization issues, and is often assumed for quantum algorithms as well (though for both classical and quantum algorithms, often approximate knowledge suffices).</p>
<p><strong>Example.</strong>
Like query access, we can get efficient sample and query access from an explicit memory structure.
To get <script type="math/tex">\SQ(x)</script> for a bit vector <script type="math/tex">x \in \{0,1\}^n</script>, store the number of nonzero entries <script type="math/tex">z</script> and a sorted array of the 1-indices <script type="math/tex">D</script>.
For example, we could store <script type="math/tex">x = [1\;1\;0\;0\;1\;0\;0\;0]</script> as</p>
<script type="math/tex; mode=display">z, D = 3,\{1,2,5\}</script>
<p>Then we can find <script type="math/tex">x_i</script> by checking if <script type="math/tex">i \in D</script>, we can sample from <script type="math/tex">x</script> by picking an index from <script type="math/tex">D</script> uniformly at random, and we know <script type="math/tex">\|x\|</script>, since it’s just <script type="math/tex">\sqrt{z}</script>.
This generalizes to an efficient <script type="math/tex">O(\log n)</script> binary search tree data structure for <script type="math/tex">\SQ(x)</script> for any <script type="math/tex">x \in \BB{C}^n</script>.</p>
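<p>The bit-vector example above, as a short sketch (indices are 1-based, as in the text):</p>

```python
import bisect
import math
import random

# SQ(x) for the bit vector x = [1 1 0 0 1 0 0 0], stored as (z, D).
z, D = 3, [1, 2, 5]  # number of ones, and their (sorted, 1-based) positions

def query(i):
    """x_i, by binary search in the sorted array D."""
    j = bisect.bisect_left(D, i)
    return 1 if j < len(D) and D[j] == i else 0

def sample():
    """Each 1-index with probability |x_i|^2 / ||x||^2 = 1/z; others never."""
    return random.choice(D)

norm = math.sqrt(z)  # ||x|| = sqrt(z)
```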
<p>We can also define sample and query access to matrices as just sample and query access to vectors “in” the matrix.</p>
<p><strong>Definition.</strong>
For <script type="math/tex">A \in \BB{C}^{m\times n}</script>, <script type="math/tex">\SQ(A)</script> is defined as <script type="math/tex">\SQ(A_i)</script> for <script type="math/tex">A_i</script> the rows of <script type="math/tex">A</script>, along with <script type="math/tex">\SQ(\tilde{A})</script> for <script type="math/tex">\tilde{A}</script> the vector of row norms (so <script type="math/tex">\tilde{A}_i = \|A_i\|</script>).</p>
<p>By replacing quantum states with these classical analogues, we form a model based on sample and query access which we codify with the informal definition of “dequantization”.</p>
<p><strong>Definition.</strong>
Let <script type="math/tex">\SC{A}</script> be a quantum algorithm with input <script type="math/tex">\ket{\phi_1},\ldots,\ket{\phi_C}</script> and output either a state <script type="math/tex">\ket{\psi}</script> or a value <script type="math/tex">\lambda</script>.
We say we <em>dequantize</em> <script type="math/tex">\SC{A}</script> if we describe a classical algorithm that, given <script type="math/tex">\SQ(\phi_1),\ldots,\SQ(\phi_C)</script>, can evaluate queries to <script type="math/tex">\SQ(\psi)</script> or output <script type="math/tex">\lambda</script>, with similar guarantees to <script type="math/tex">\SC{A}</script> and query complexity <script type="math/tex">\poly(C)</script>.</p>
<p>That is, given sample and query access to the inputs, we can output sample and query access to a desired vector or a desired value, with at most polynomially larger query complexity.</p>
<p>We justify why this model is a reasonable point of comparison two sections from now, in <a href="#implications">Implications</a>.
Next, though, we will jump into how to build these dequantized protocols.</p>
<h2 id="quantum-for-the-quantum-less">Quantum for the quantum-less</h2>
<p>So far, all dequantized results revolve around three dequantized protocols that we piece together into more useful tasks.
In query complexity independent of <script type="math/tex">m</script> and <script type="math/tex">n</script>, we can perform the following:</p>
<ol>
<li>
<p>(<a href="#1-estimating-inner-products">Inner Product</a>)
For <script type="math/tex">x,y \in \BB{C}^n</script>, given <script type="math/tex">\SQ(x)</script> and <script type="math/tex">\Q(y)</script>, we can estimate <script type="math/tex">\langle x,y\rangle</script> to <script type="math/tex">\|x\|\|y\|\eps</script> error with probability <script type="math/tex">\geq 1-\delta</script> and <script type="math/tex">\text{poly}(\frac1\eps, \log\frac1\delta)</script> queries;</p>
</li>
<li>
<p>(<a href="#2-thin-matrix-vector-product-with-rejection-sampling">Thin Matrix-Vector</a>)
For <script type="math/tex">V \in \BB{C}^{n\times k}, w \in \BB{C}^k</script>, given <script type="math/tex">\SQ(V^\dagger)</script> and <script type="math/tex">\Q(w)</script>, we can simulate <script type="math/tex">\SQ(Vw)</script> with <script type="math/tex">\text{poly}(k)</script> queries;</p>
</li>
<li>
<p>(<a href="#3-low-rank-approximation-briefly">Low-rank Approximation</a>)
For <script type="math/tex">A \in \BB{C}^{m\times n}</script>, given <script type="math/tex">\SQ(A)</script>, a threshold <script type="math/tex">k</script>, and an error parameter <script type="math/tex">\eps</script>, we can output a description of a low-rank approximation of <script type="math/tex">A</script> with <script type="math/tex">\text{poly}(k, \frac{1}{\eps})</script> queries.</p>
<p>Specifically, our output is <script type="math/tex">\SQ(S,\hat{U},\hat{\Sigma})</script> for <script type="math/tex">S \in \BB{C}^{\ell \times n}</script>, <script type="math/tex">\hat{U} \in \BB{C}^{\ell \times k}</script>, and <script type="math/tex">\hat{\Sigma} \in \BB{C}^{k\times k}</script> (<script type="math/tex">\ell = \poly(k,\frac{1}{\eps})</script>), and this implicitly describes the low-rank approximation to <script type="math/tex">A</script>, <script type="math/tex">D := A(S^\dagger\hat{U}\hat{\Sigma}^{-1})(S^\dagger\hat{U}\hat{\Sigma}^{-1})^\dagger</script> (notice rank <script type="math/tex">D \leq k</script>).</p>
<p>This matrix satisfies the following low-rank guarantee with probability <script type="math/tex">\geq 1-\delta</script>: for <script type="math/tex">\sigma := \sqrt{2/k}\|A\|_F</script>, and <script type="math/tex">A_{\sigma} := \sum_{\sigma_i \geq \sigma} \sigma_iu_iv_i^\dagger</script> (using <script type="math/tex">A</script>’s SVD),</p>
<script type="math/tex; mode=display">\|A - D\|_F^2 \leq \|A - A_\sigma\|_F^2 + \eps^2\|A\|_F^2.</script>
<p>This guarantee is non-standard: instead of <script type="math/tex">A_k</script>, we use <script type="math/tex">A_\sigma</script>.
This makes our promise weaker, since it is useless if <script type="math/tex">A</script> has no large singular values.</p>
<p>For intuition, it’s helpful to think of <script type="math/tex">D</script> as <script type="math/tex">A</script> multiplied with a “projector” <script type="math/tex">(S^\dagger\hat{U}\hat{\Sigma}^{-1})(S^\dagger\hat{U}\hat{\Sigma}^{-1})^\dagger</script> that projects the rows of <script type="math/tex">A</script> onto the columns of <script type="math/tex">S^\dagger\hat{U}\hat{\Sigma}^{-1}</script>, where these columns are “singular vectors” with corresponding “singular values” <script type="math/tex">\hat{\sigma}_1,\ldots,\hat{\sigma}_k</script> that are encoded in the diagonal matrix <script type="math/tex">\hat{\Sigma}</script>.
(For those interested, <script type="math/tex">\hat{U}</script> and <script type="math/tex">\hat{\Sigma}</script> are from the SVD of a <em>submatrix</em> of <script type="math/tex">A</script>, hence the evocative notation; see the <a href="#3-low-rank-approximation-briefly">appendix</a> for more details.)</p>
</li>
</ol>
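<p>To make protocol (1) concrete, here is a minimal mean-then-median-of-means sketch for real vectors. The sample counts (<code>6/eps**2</code> per mean, <code>2*ceil(log(1/delta))+1</code> means) are illustrative constants, not the ones from the papers:</p>

```python
import math
import random
import statistics

def estimate_inner_product(sample_x, norm_x, query_y, eps, delta):
    """Estimate <x, y> = sum_i x_i y_i to additive error about ||x|| ||y|| eps
    with probability >= 1 - delta, for real x and y.
    sample_x() returns (i, x_i) with probability x_i^2 / ||x||^2; query_y(i) = y_i."""
    n_means = 2 * math.ceil(math.log(1 / delta)) + 1
    n_per = math.ceil(6 / eps ** 2)
    means = []
    for _ in range(n_means):
        total = 0.0
        for _ in range(n_per):
            i, xi = sample_x()  # xi != 0, since i is sampled with mass x_i^2
            total += norm_x ** 2 * query_y(i) / xi  # unbiased: E = sum_i x_i y_i
        means.append(total / n_per)
    return statistics.median(means)  # median boosts the success probability
```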
<p>The first two protocols are dequantized swap tests and the third is essentially a dequantized variant of phase estimation seen in quantum recommendation systems<sup id="fnref:kp17"><a href="#fn:kp17" class="footnote">2</a></sup>.</p>
<hr />
<p>Now, we describe how these techniques are used to get the results for recommendation systems, supervised clustering, and low-rank matrix inversion.
We defer the important details of models and error analyses to <a href="#implications">Implications</a>, instead focusing on the algorithms themselves and how they use dequantized protocols.</p>
<h3 id="supervised-clustering">Supervised clustering</h3>
<p>We want to find the distance from a point <script type="math/tex">p \in \BB{R}^n</script> to the centroid (average) of a cluster of points <script type="math/tex">q_1,\ldots,q_{m-1} \in \BB{R}^{n}</script>.
If we assume sample and query access to the data points, computing <script type="math/tex">\|p - \frac{1}{m-1}(q_1 + \cdots + q_{m-1})\|</script> reduces to computing <script type="math/tex">\|Mw\|</script> for</p>
<script type="math/tex; mode=display">% <![CDATA[
M = \begin{bmatrix}
\frac{p}{\|p\|} & \frac{q_1}{\|q_1\|} & \cdots & \frac{q_{m-1}}{\|q_{m-1}\|}
\end{bmatrix}
\qquad
w = \begin{bmatrix} \|p\| \\ \frac{\|q_1\|}{m-1} \\ \vdots \\ \frac{\|q_{m-1}\|}{m-1} \end{bmatrix}. %]]></script>
<p><script type="math/tex">\SQ</script> access to <script type="math/tex">p,q_1,\ldots,q_{m-1}</script> gives <script type="math/tex">\SQ</script> access to <script type="math/tex">M^T</script> and <script type="math/tex">w</script>, so the supervised clustering problem reduces to the following:</p>
<p><strong>Problem.</strong>
For <script type="math/tex">M \in \BB{R}^{m\times n}, w \in \BB{R}^n</script>, and <script type="math/tex">\SQ(M^T,w)</script>, approximate <script type="math/tex">(Mw)^T(Mw)</script> to additive <script type="math/tex">\eps</script> error.</p>
<p><strong>Algorithm.</strong>
We can write <script type="math/tex">(Mw)^TMw</script> as the inner product of an order three tensor; through basic tensor arithmetic, it is equal to <script type="math/tex">\langle u, v\rangle</script>, where <script type="math/tex">u,v \in \BB{R}^{m\times n\times n}</script> are</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{aligned} u &= \sum_{i=1}^m\sum_{j=1}^n\sum_{k=1}^n M_{ij}\|M^{(k)}\| e_{i,j,k} \text{ and} \\
v &= \sum_{i=1}^m\sum_{j=1}^n\sum_{k=1}^n \frac{w_jw_kM_{ik}}{\|M^{(k)}\|} e_{i,j,k}. \end{aligned} %]]></script>
<p>Applying the algorithm for inner product (1) gives the desired approximation with <script type="math/tex">O(\|w\|^2\|M\|_F^2\frac{1}{\eps^2} \log\frac{1}{\delta})</script> samples and queries.</p>
<h3 id="recommendation-systems">Recommendation systems</h3>
<p>We want to randomly sample a product <script type="math/tex">j \in [n]</script> that is a good recommendation for a particular user <script type="math/tex">i \in [m]</script>, given incomplete data on user-product preferences.
If we store this data in a matrix <script type="math/tex">A \in \BB{R}^{m\times n}</script> with sampling and query access, in the right model, finding good recommendations reduces to:</p>
<p><strong>Problem.</strong>
For a matrix <script type="math/tex">A \in \BB{R}^{m\times n}</script> along with a row <script type="math/tex">i \in [m]</script>, given <script type="math/tex">\SQ(A)</script>, approximately sample from <script type="math/tex">D_i</script> where <script type="math/tex">D</script> is a sufficiently good low-rank approximation of <script type="math/tex">A</script>.</p>
<p><em>Remark.</em> This task is essentially a variant of PCA, since a low-rank decomposition is dimensionality reduction of the matrix, viewed as a set of row vectors.
This is the “dequantized PCA” I refer to in other work<sup id="fnref:tang18b"><a href="#fn:tang18b" class="footnote">3</a></sup>.</p>
<p><strong>Algorithm.</strong>
Apply (3) to get <script type="math/tex">\SQ(S,\hat{U},\hat{\Sigma})</script> for a low-rank approximation <script type="math/tex">D = AS^T \hat{U}\hat{\Sigma}^{-1}(\hat{\Sigma}^{-1})^T\hat{U}^T S</script>.
It turns out that this low-rank approximation is good enough to get good recommendations.
So it suffices to sample from <script type="math/tex">D_i = A_iS^TMS</script>, where <script type="math/tex">A_i \in \BB{R}^{1 \times n}, S \in \BB{R}^{\ell \times n}, M = \hat{U}\hat{\Sigma}^{-1}(\hat{\Sigma}^{-1})^T\hat{U}^T \in \BB{R}^{\ell \times \ell}</script> with <script type="math/tex">\ell = \poly(k)</script>.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{bmatrix}
\; \cdots & A_i & \cdots \;
\end{bmatrix} \begin{bmatrix}
& \vdots & \\ & S^T & \\ & \vdots &
\end{bmatrix} \begin{bmatrix}
& & \\ & M & \\ & &
\end{bmatrix} \begin{bmatrix}
& & \\ \; \cdots & S & \cdots \; \\ & &
\end{bmatrix} %]]></script>
<p>Approximate <script type="math/tex">A_iS^T</script> in <script type="math/tex">\ell^2</script> norm using <script type="math/tex">\ell</script> inner product protocols (1).
Next, compute <script type="math/tex">(A_iS^T)M</script> with naive matrix-vector multiplication.
Finally, sample from <script type="math/tex">(A_iS^TM)S</script>, which is a thin matrix-vector product (2).</p>
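<p>Setting aside the sublinear sampling machinery, the algebra of this pipeline can be sanity-checked with a dense numpy sketch (all names are mine; everything is computed explicitly, so this is <em>not</em> sublinear, and <script type="math/tex">\hat{U},\hat{\Sigma}</script> are taken directly from an SVD of <script type="math/tex">S</script> rather than from a second sampling round):</p>

```python
import numpy as np

def recommend(A, i, ell, k, seed=0):
    """Dense sketch of the pipeline: form a rescaled row sample S of A,
    build M = U_hat Sigma_hat^-2 U_hat^T, and length-square sample from
    D_i = A_i S^T M S (approximating row i of a low-rank approximation)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample ell rows w.p. ||A_r||^2/||A||_F^2, rescaled so E[S^T S] = A^T A.
    p_rows = np.sum(A**2, axis=1) / np.sum(A**2)
    rows = rng.choice(m, size=ell, p=p_rows)
    S = A[rows] / np.sqrt(ell * p_rows[rows])[:, None]
    # Top-k left singular vectors/values of S (standing in for those of W).
    U_hat, s, _ = np.linalg.svd(S, full_matrices=False)
    U_hat, s = U_hat[:, :k], s[:k]
    M = U_hat @ np.diag(1.0 / s**2) @ U_hat.T
    D_i = A[i] @ S.T @ M @ S
    # Length-square sample: recommend product j w.p. (D_i)_j^2 / ||D_i||^2.
    j = rng.choice(n, p=D_i**2 / np.sum(D_i**2))
    return j, D_i
```

When <script type="math/tex">A</script> is exactly rank one, <script type="math/tex">D_i</script> recovers <script type="math/tex">A_i</script> exactly, which is a useful correctness check.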
<p><em>An aside.</em>
This gives an exponential speedup over previous classical results from 15-20 years ago<sup id="fnref:dkr02"><a href="#fn:dkr02" class="footnote">4</a></sup>.
The story here is quite odd.
From what I can tell, researchers at the time knew the important (read: hard) part of the algorithm, how to compute low-rank approximations fast, but didn’t notice that the resulting knowledge of <script type="math/tex">S</script> and <script type="math/tex">\hat{U}</script> could be used to sample the desired recommendations in sublinear time, which I think is much easier to understand.
This gave me anxiety during research, since I figured there was no way this would have been overlooked.
I’m glad these fears were unfounded; it’s cool that this quantum perspective made this step natural and obvious!</p>
<h3 id="low-rank-matrix-inversion">Low-rank matrix inversion</h3>
<p>The goal here is to mimic a quantum algorithm that can solve systems of equations <script type="math/tex">Ax = b</script> for <script type="math/tex">A</script> low-rank.
The dequantized version of this is:</p>
<p><strong>Problem.</strong>
For a low-rank matrix <script type="math/tex">A \in \BB{R}^{m\times n}</script> and a vector <script type="math/tex">b \in \BB{R}^m</script>, given <script type="math/tex">\SQ(A), \SQ(b)</script>, (approximately) respond to requests for <script type="math/tex">\SQ(A^+b)</script>, where <script type="math/tex">A^+</script> is the pseudoinverse of <script type="math/tex">A</script>.</p>
<p><strong>Algorithm.</strong>
Use the low-rank approximation protocol (3) to get <script type="math/tex">\SQ(S,\hat{U}, \hat{\Sigma})</script>.
From applying the matrix-vector protocol (2), we have <script type="math/tex">\SQ(\hat{V})</script>, where <script type="math/tex">\hat{V} := S^T\hat{U}\hat{\Sigma}^{-1}</script>; with some analysis we can show that the columns of <script type="math/tex">\hat{V}</script> behave like the right singular vectors of <script type="math/tex">A</script>.
Further, <script type="math/tex">\hat{\Sigma}_{ii}</script> behaves like their approximate singular values.
Using this information, we can approximate the vector we want to sample from:</p>
<script type="math/tex; mode=display">A^+b= (A^TA)^+A^Tb \approx \sum_{i=1}^k \frac{1}{\hat{\Sigma}_{ii}^2}\hat{v}_i\hat{v}_i^T A^Tb</script>
<p>We approximate <script type="math/tex">\hat{v}_i^TA^Tb</script> to additive error for all <script type="math/tex">i</script> by noticing that <script type="math/tex">\hat{v}_i^TA^Tb = \Tr(A^Tb\hat{v}_i^T)</script> is an inner product of the order two tensors <script type="math/tex">A</script> and <script type="math/tex">b\hat{v}_i^T</script>.
Thus, we can apply (1), since being given <script type="math/tex">\SQ(A)</script> implies <script type="math/tex">\SQ</script> access to <script type="math/tex">A</script> viewed as a long vector.</p>
Finally, using (2), sample from the linear combination of the <script type="math/tex">\hat{v}_i</script>’s with coefficients given by these estimates and the <script type="math/tex">\hat{\Sigma}_{ii}</script>’s.</p>
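<p>As a sanity check of the algebra (not of the sublinear algorithm), here’s a dense numpy sketch (names mine) that substitutes an exact truncated SVD for the sampled approximation from (3):</p>

```python
import numpy as np

def solve_lowrank(A, b, k):
    """Approximate A^+ b as sum_i (v_i^T A^T b / sigma_i^2) v_i over the
    top-k singular pairs; exact when A has rank <= k."""
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    V, s = Vt[:k].T, s[:k]              # top-k right singular vectors/values
    coeffs = V.T @ (A.T @ b) / s**2     # v_i^T A^T b / sigma_i^2
    return V @ coeffs
```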
<h2 id="implications">Implications</h2>
<p>We have just described examples of dequantized algorithms for the following problems:</p>
<ul>
<li>Recommendation systems<sup id="fnref:tang18a"><a href="#fn:tang18a" class="footnote">5</a></sup><sup id="fnref:kp17:1"><a href="#fn:kp17" class="footnote">2</a></sup> (this classical algorithm <em>exponentially improves</em> on the previous best!)</li>
<li>PCA<sup id="fnref:tang18b:1"><a href="#fn:tang18b" class="footnote">3</a></sup><sup id="fnref:lmr14"><a href="#fn:lmr14" class="footnote">6</a></sup></li>
<li>Supervised clustering<sup id="fnref:tang18b:2"><a href="#fn:tang18b" class="footnote">3</a></sup><sup id="fnref:lmr13"><a href="#fn:lmr13" class="footnote">7</a></sup></li>
<li>Low-rank matrix inversion<sup id="fnref:rsml16"><a href="#fn:rsml16" class="footnote">8</a></sup><sup id="fnref:glt18"><a href="#fn:glt18" class="footnote">9</a></sup><sup id="fnref:clw18"><a href="#fn:clw18" class="footnote">10</a></sup></li>
</ul>
<p>We address here what to take away from these results.</p>
<h3 id="for-quantum-computing">For quantum computing</h3>
<p>The most important conclusion, in my opinion, is a heuristic:</p>
<p><strong>Heuristic 1.</strong>
Linear algebra problems in low-dimensional spaces (constant, say, or polylogarithmic) likely can be dequantized.</p>
<p>The intuition for this heuristic is that, if your problem operates in a subspace of such low dimension, the main challenge is “finding” this subspace and rotating to it.
Then, we can think about our problem as lying in <script type="math/tex">\BB{C}^d</script> where <script type="math/tex">d</script> is small, and can solve it with a simple polynomial-time (in <script type="math/tex">d</script>) algorithm.
Finding the subspace is an unordered search problem if you squint, so it can’t be sped up much (at best quadratically, à la Grover) by a quantum computer.</p>
<p><em>Remark.</em> There are high-dimensional problems that cannot be dequantized; for example, given <script type="math/tex">\SQ(v)</script>, it takes <script type="math/tex">\Omega(n)</script> queries to approximately sample from <script type="math/tex">Hv</script>, where <script type="math/tex">H</script> is the Hadamard matrix (this is the Fourier Sampling problem<sup id="fnref:ac16"><a href="#fn:ac16" class="footnote">11</a></sup>).</p>
<p>Why do we care about dequantizing algorithms?
As the name suggests, I argue that this is a reasonable classical analogue to quantum machine learning algorithms.</p>
<p><strong>Heuristic 2.</strong>
For machine learning problems, SQ assumptions are more reasonable than state preparation assumptions.</p>
<p>That is, the practical task of preparing quantum states is probably always harder than the practical task of preparing sample and query access.
Practically, this makes sense, since for state preparation we need, well, quantum computers.</p>
<blockquote><p>
Quantum computing applications that are realizable with zero qubits!
</p><footer>– Scott Aaronson's "elevator pitch" of my work, paraphrased</footer>
</blockquote>
<p>Even assuming the existence of a practical quantum computer, there is evidence that state preparation assumptions are still harder to satisfy than sample and query access, up to polynomial slowdown.
For example, preparing a generic quantum state <script type="math/tex">\ket{v}</script> corresponding to an input vector <script type="math/tex">v</script> takes <script type="math/tex">\Omega(\sqrt{n})</script> quantum queries to <script type="math/tex">v</script> in general, while responding to <script type="math/tex">\SQ(v)</script> accesses takes <script type="math/tex">\Theta(n)</script> classical queries.
Because dequantized algorithms are polynomial in <script type="math/tex">\log n</script>, this means that getting SQ access to a generic vector is much more expensive than running the algorithm.</p>
<p>Of course, we can also consider special classes of vectors where quantum state preparation is easier, but generally SQ access gets proportionally faster as well.
For example, we can quickly prepare vectors where all entries have roughly equal magnitude (think vectors whose entries are either <script type="math/tex">+1</script> or <script type="math/tex">-1</script>), but correspondingly, we can compute SQ accesses to such vectors similarly quickly.</p>
<p>On the classical side, the assumption of SQ access is on par with other typical assumptions to make machine learning algorithms sublinear:</p>
<ul>
<li>There is a classical dynamic data structure that supports SQ accesses and entry updates in logarithmic time, using space linear in the number of nonzero entries.</li>
<li>Given an input vector as a list of nonzero entries, sampling from it takes time linear in sparsity.</li>
<li><script type="math/tex">k</script> independent samples can be prepared with one pass through the data in <script type="math/tex">O(k)</script> space.</li>
</ul>
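<p>The last point can be done with a weighted-reservoir trick; here’s a minimal numpy sketch (names mine) that draws <script type="math/tex">k</script> independent indices <script type="math/tex">i</script> with probability <script type="math/tex">\vert v_i\vert^2/\|v\|^2</script> in one pass over the entries:</p>

```python
import numpy as np

def one_pass_samples(stream_entries, k, seed=0):
    """One pass over (index, value) pairs, O(k) space: each of k slots
    independently replaces its index with probability val^2/(running ||v||^2);
    a telescoping product shows slot s ends at index t w.p. v_t^2/||v||^2."""
    rng = np.random.default_rng(seed)
    total = 0.0
    slots = [None] * k
    for idx, val in stream_entries:
        total += val * val
        replace = rng.random(k) < (val * val) / total
        for s in np.flatnonzero(replace):
            slots[s] = idx
    return slots
```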
<hr />
<p>To summarize these heuristics: quantum machine learning for <em>low-dimensional datasets</em> will probably never get speedups as significant as, say, Shor’s algorithm, even in best-case scenarios.
Unfortunately, QML algorithms for low-dimensional problems were the most practical ones in the literature, so with this research it’s unclear what the state of the field is today.</p>
<p>The story might not be over, though.
We know that quantum computers can “efficiently solve” high-dimensional linear algebra problems<sup id="fnref:hhl08"><a href="#fn:hhl08" class="footnote">12</a></sup>; however, this assumes that we have some way to evolve a quantum system precisely according to input data, a much harder problem than the linear algebra itself.
Nevertheless, I hold out hope that this result can be applied to achieve exponential speedups in machine learning or elsewhere.</p>
<h3 id="for-classical-computing">For classical computing</h3>
<p>I am cautiously optimistic about the implications of this work for classical computing.
The major advantage of dequantized algorithms is sheer speed (asymptotically, at least).
However, the issues listed below prevent dequantized algorithms from being strict improvements over current algorithms.</p>
<ul>
<li>Gaining SQ access to input typically requires preliminary data processing or the use of a data structure.
This means that dequantized algorithms can’t be plugged into existing systems without large amounts of computation.</li>
<li>SQ access to output might not always be useful or practical.</li>
<li>Current dequantized algorithms have large error compared to standard techniques.</li>
<li>Current algorithms have large theoretical exponents, so right now we don’t know whether they run quickly in practice.
I expect we can cut down these exponents greatly.</li>
</ul>
<p>If I had to guess, the best chance for success in dequantized techniques remains recommendation systems, since speed matters significantly in that context.
I view the other algorithms as significantly less likely to see use in practice, though probably more likely than their corresponding quantum algorithms.</p>
<p>Regardless, these works fit nicely into the classical literature: dequantized quantum machine learning is just a nicely modular, quantum-inspired form of randomized numerical linear algebra.</p>
<h2 id="appendix-more-details">Appendix: More details</h2>
<p>As a reminder, here are the three techniques:</p>
<ol>
<li>Inner Product</li>
<li>Thin Matrix-Vector</li>
<li>Low-rank Approximation</li>
</ol>
<p>Below, we explain (1) and (2) fully, and give a rough sketch of (3).</p>
<h3 id="1-estimating-inner-products">1. Estimating inner products</h3>
<p>First, we give a basic way of estimating the mean of an arbitrary distribution with finite variance.</p>
<p><strong>Fact.</strong>
For <script type="math/tex">\{X_{i,j}\}</script> i.i.d. random variables with mean <script type="math/tex">\mu</script> and variance <script type="math/tex">\sigma^2</script>, let</p>
<script type="math/tex; mode=display">Y := \underset{j \in [6\log 1/\delta]}{\operatorname{median}}\;\underset{i \in [6/\eps^2]}{\operatorname{mean}}\;X_{i,j}</script>
<p>Then <script type="math/tex">\vert Y - \mu\vert \leq \eps\sigma</script> with probability <script type="math/tex">\geq 1-\delta</script>, using only <script type="math/tex">O(\frac{1}{\eps^2}\log\frac{1}{\delta})</script> copies of <script type="math/tex">X</script>.</p>
<p><em>Proof sketch.</em>
The proof follows from two facts: first, the median of <script type="math/tex">C_1,\ldots,C_n</script> is at least <script type="math/tex">\lambda</script> precisely when at least half of the <script type="math/tex">C_i</script> are at least <script type="math/tex">\lambda</script>; second, <a href="https://en.wikipedia.org/wiki/Chebyshev%27s_inequality#Probabilistic_statement">Chebyshev’s inequality</a> (applied to the mean).</p>
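<p>A minimal numpy sketch of this median-of-means estimator (names mine):</p>

```python
import numpy as np

def median_of_means(samples, eps, delta):
    """Estimate the mean to within eps*sigma w.p. >= 1-delta: average
    batches of ceil(6/eps^2) samples, then take the median of
    ceil(6*log(1/delta)) batch means. `samples` must have at least
    batch * n_batches entries."""
    n_batches = int(np.ceil(6 * np.log(1 / delta)))
    batch = int(np.ceil(6 / eps**2))
    x = np.asarray(samples)[: n_batches * batch].reshape(n_batches, batch)
    return np.median(x.mean(axis=1))
```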
<p>Estimating the inner product is just a basic corollary of this estimator.</p>
<p><strong>Proposition.</strong>
For <script type="math/tex">x,y \in \BB{C}^n</script>, given <script type="math/tex">\SQ(x)</script> and <script type="math/tex">\Q(y)</script>, we can estimate <script type="math/tex">\langle x,y\rangle</script> to <script type="math/tex">\eps\|x\|\|y\|</script> error with probability <script type="math/tex">\geq 1-\delta</script> with query complexity <script type="math/tex">O(\frac{1}{\eps^2}\log\frac{1}{\delta})</script>.</p>
<p><em>Proof.</em>
Sample <script type="math/tex">s</script> from <script type="math/tex">x</script> and let <script type="math/tex">Z = \overline{x_s}y_s\frac{\|x\|^2}{|x_s|^2}</script>, so that <script type="math/tex">\BB{E}[Z] = \langle x,y\rangle</script> and <script type="math/tex">\operatorname{Var}[Z] \leq \|x\|^2\|y\|^2</script>.
Apply the Fact with <script type="math/tex">X_{i,j}</script> being independent copies of <script type="math/tex">Z</script>.</p>
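<p>For real vectors, this proof translates directly into code; here’s a numpy sketch (names mine; a dense array stands in for true <script type="math/tex">\SQ(x)</script> access):</p>

```python
import numpy as np

def estimate_inner(x, y, eps, delta, seed=0):
    """Estimate <x, y> to within eps*||x||*||y|| w.p. >= 1-delta, using
    length-square samples of x and entry queries to x and y."""
    rng = np.random.default_rng(seed)
    n_batches = int(np.ceil(6 * np.log(1 / delta)))
    batch = int(np.ceil(6 / eps**2))
    x2 = x**2
    norm2 = x2.sum()
    s = rng.choice(len(x), size=n_batches * batch, p=x2 / norm2)
    Z = y[s] * norm2 / x[s]   # Z = x_s y_s ||x||^2 / x_s^2, so E[Z] = <x, y>
    return np.median(Z.reshape(n_batches, batch).mean(axis=1))
```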
<h3 id="2-thin-matrix-vector-product-with-rejection-sampling">2. Thin matrix-vector product with rejection sampling</h3>
<p>We first go over rejection sampling, a standard way to generate samples from a target distribution, given the ability to sample from some other distribution.</p>
<p>Input: samples from distribution <script type="math/tex">P</script><br />
Output: samples from distribution <script type="math/tex">Q</script></p>
<ol>
<li>Pull a sample <script type="math/tex">s</script> from <script type="math/tex">P</script>;</li>
<li>Compute <script type="math/tex">r_s = \frac{Q(s)}{MP(s)}</script> for some constant <script type="math/tex">M</script>;</li>
<li>Output <script type="math/tex">s</script> with probability <script type="math/tex">r_s</script> and restart otherwise.</li>
</ol>
<p><strong>Fact.</strong>
If <script type="math/tex">r_i \leq 1</script> for all <script type="math/tex">i</script>, then the above procedure is well-defined and outputs a sample from <script type="math/tex">Q</script> in <script type="math/tex">M</script> iterations in expectation.</p>
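<p>The three-step loop above, as a tiny Python sketch (names mine):</p>

```python
import numpy as np

def rejection_sample(sample_p, ratio, rng):
    """Given a sampler for P and a function computing r(s) = Q(s)/(M*P(s))
    (assumed to lie in [0, 1]), output one sample distributed as Q."""
    while True:
        s = sample_p(rng)
        if rng.random() < ratio(s):   # accept w.p. r_s, else restart
            return s
```

For instance, with <script type="math/tex">P</script> uniform on two outcomes and <script type="math/tex">Q = (1/4, 3/4)</script>, taking <script type="math/tex">M = 3/2</script> gives <script type="math/tex">r_s = Q(s)/0.75 \leq 1</script>.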
<hr />
<p><strong>Proposition.</strong>
For <script type="math/tex">V \in \BB{R}^{n\times k}</script> and <script type="math/tex">w \in \BB{R}^k</script>, given <script type="math/tex">\SQ(V)</script> and <script type="math/tex">\Q(w)</script>, we can simulate <script type="math/tex">\SQ(Vw)</script> with expected query complexity <script type="math/tex">O(k^2C(V,w))</script>, where</p>
<script type="math/tex; mode=display">C(V,w) := \frac{\sum_{i=1}^k\|w_iV^{(i)}\|^2}{\|Vw\|^2}.</script>
<p>We can compute entries <script type="math/tex">(Vw)_i</script> with <script type="math/tex">O(k)</script> queries.<br />
We can sample using rejection sampling:</p>
<ul>
<li><script type="math/tex">P</script> is the distribution formed by sampling from <script type="math/tex">V^{(j)}</script> with probability proportional to <script type="math/tex">\|w_jV^{(j)}\|^2</script>;</li>
<li><script type="math/tex">Q</script> is the target <script type="math/tex">Vw</script>.</li>
</ul>
<script type="math/tex; mode=display">r_i = \frac{(Vw)_i^2}{k \sum_{j=1}^k (w_jV_{ij})^2} = \frac{Q(i)}{kC(V,w)P(i)}</script>
<p>Notice that we can compute these <script type="math/tex">r_i</script>’s (even though we cannot compute the target probabilities <script type="math/tex">Q(i)</script> themselves, since <script type="math/tex">\|Vw\|</script> is unknown), and that the rejection sampling guarantee <script type="math/tex">r_i \leq 1</script> is satisfied via Cauchy-Schwarz.</p>
<p>The probability of success is <script type="math/tex">\frac{\|Vw\|^2}{k\sum_{i=1}^k\|w_iV^{(i)}\|^2}</script>.
Thus, to estimate the norm of <script type="math/tex">Vw</script>, it suffices to estimate the probability of success of this rejection sampling process.
We can view this as estimating the heads probability of a biased coin, where the coin is heads if rejection sampling succeeds and tails otherwise.
Through a <a href="https://en.wikipedia.org/wiki/Chernoff_bound#Multiplicative_form_(relative_error)">Chernoff bound</a>, we see that the average of <script type="math/tex">O(kC(V,w)\frac{1}{\eps^2}\log\frac{1}{\delta})</script> “coin flips”, suitably rescaled, estimates <script type="math/tex">\|Vw\|^2</script> to within a <script type="math/tex">(1\pm\eps)</script> factor with probability <script type="math/tex">\geq 1-\delta</script>, where each coin flip costs <script type="math/tex">k</script> queries and samples.</p>
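<p>Putting the pieces together, here’s a numpy sketch of the sampling step (names mine; a dense matrix stands in for <script type="math/tex">\SQ(V)</script> access):</p>

```python
import numpy as np

def sample_Vw(V, w, rng):
    """Sample i w.p. (Vw)_i^2/||Vw||^2 by rejection sampling from the
    column mixture P. Returns (index, number of iterations used)."""
    n, k = V.shape
    col_w2 = w**2 * np.sum(V**2, axis=0)       # ||w_j V^(j)||^2
    iters = 0
    while True:
        iters += 1
        # P: pick column j w.p. prop. to ||w_j V^(j)||^2, then i from V^(j).
        j = rng.choice(k, p=col_w2 / col_w2.sum())
        col2 = V[:, j] ** 2
        i = rng.choice(n, p=col2 / col2.sum())
        # r_i = (Vw)_i^2 / (k * sum_j (w_j V_ij)^2), <= 1 by Cauchy-Schwarz.
        r = (V[i] @ w) ** 2 / (k * np.sum((w * V[i]) ** 2))
        if rng.random() < r:
            return i, iters
```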
<h3 id="3-low-rank-approximation-briefly">3. Low-rank approximation, briefly</h3>
<p><strong>Proposition.</strong>
For <script type="math/tex">A \in \BB{C}^{m\times n}</script>, given <script type="math/tex">\SQ(A)</script> and some threshold <script type="math/tex">k</script>, we can output a description of a low-rank approximation of <script type="math/tex">A</script>.</p>
<p>Specifically, our output is <script type="math/tex">\SQ(S,\hat{U}, \hat{\Sigma})</script> for <script type="math/tex">S \in \BB{C}^{\ell \times n}</script>, <script type="math/tex">\hat{U} \in \BB{C}^{\ell \times k}</script>, <script type="math/tex">\hat{\Sigma} \in \BB{C}^{k\times k}</script> (<script type="math/tex">\ell = \poly(k,\frac{1}{\eps})</script>), and this implicitly describes the low-rank approximation to <script type="math/tex">A</script>, <script type="math/tex">D := A(S^\dagger\hat{U}\hat{\Sigma}^{-1})(S^\dagger\hat{U}\hat{\Sigma}^{-1})^\dagger</script> (notice rank <script type="math/tex">D \leq k</script>).</p>
<p>This matrix satisfies the following low-rank guarantee with probability <script type="math/tex">\geq1-\delta</script>: for <script type="math/tex">\sigma := \sqrt{2/k}\|A\|_F</script>, and <script type="math/tex">A_{\sigma} := \sum_{\sigma_i \geq \sigma} \sigma_iu_iv_i^\dagger</script> (using SVD),</p>
<script type="math/tex; mode=display">\|A - D\|_F^2 \leq \|A - A_\sigma\|_F^2 + \eps^2\|A\|_F^2.</script>
<p>This algorithm comes from the 1998 paper of Frieze, Kannan, and Vempala<sup id="fnref:fkv04"><a href="#fn:fkv04" class="footnote">13</a></sup>.
See the recent survey by Kannan and Vempala<sup id="fnref:kv17"><a href="#fn:kv17" class="footnote">14</a></sup> for more on these techniques, and see Woodruff’s textbook<sup id="fnref:woodruff14"><a href="#fn:woodruff14" class="footnote">15</a></sup> for a discussion of more general techniques.
The form I state above is a simple variant that I discuss in my recommendation systems paper<sup id="fnref:tang18a:1"><a href="#fn:tang18a" class="footnote">5</a></sup>.</p>
<p>The core piece of analysis is the following theorem (sometimes called the <em>Approximate Matrix Product</em> property in the literature).</p>
<p><strong>Theorem.</strong>
Let <script type="math/tex">S^TS = \sum_{j=1}^\ell S_j^TS_j</script>, where each row <script type="math/tex">S_j</script> is <script type="math/tex">\frac{\|A\|_F}{\sqrt{\ell}\,\|A_i\|}A_i</script> with probability <script type="math/tex">\frac{\|A_i\|^2}{\|A\|_F^2}</script> (so <script type="math/tex">i</script> is sampled from <script type="math/tex">\tilde{A}</script>, and <script type="math/tex">\BB{E}[S^TS] = A^TA</script>). For sufficiently small <script type="math/tex">\eps</script> and <script type="math/tex">\ell = \Omega(\frac{1}{\eps^2}\log\frac1\delta)</script>, with probability <script type="math/tex">\geq 1-\delta</script>,</p>
<script type="math/tex; mode=display">\|S^TS - A^TA\|_F \leq \eps\|A\|_F^2.</script>
<p>This looks like a further generalization up the tensor-order ladder: the inner product protocol (1) handles two order one tensors, thin matrix-vector (2) handles an order two and an order one tensor, and this theorem looks like an inner product of two order two tensors; it’s possible that a clever rephrasing of this result in the <script type="math/tex">SQ</script> model could make the low-rank approximation result more quantum-ic.</p>
<p>We now sketch the algorithm along with intuition: it’s most useful to consider the low-rank approximation task as one of finding large approximate singular vectors.
First, sample <script type="math/tex">\ell</script> rows of <script type="math/tex">A</script> according to <script type="math/tex">\ell^2</script> norm, and consider the matrix <script type="math/tex">S \in \BB{C}^{\ell \times n}</script> of these rows, all renormalized to have the same length.
This is the <script type="math/tex">S</script> that we output.
By the above theorem, <script type="math/tex">\|S^TS - A^TA\|_F \leq \eps\|A\|_F^2</script> with good probability, which implies that the large right singular vectors of <script type="math/tex">S</script> (eigenvectors of <script type="math/tex">S^TS</script>) approximate the large right singular vectors of <script type="math/tex">A</script> (eigenvectors of <script type="math/tex">A^TA</script>).</p>
<p>Next, we can perform the same process to <script type="math/tex">S^T</script>: sample rows of <script type="math/tex">S^T</script> and get a normalized submatrix <script type="math/tex">W \in \BB{R}^{\ell \times \ell}</script> such that <script type="math/tex">\|WW^T-SS^T\|_F \leq \eps\|A\|_F^2</script>.
Since <script type="math/tex">W</script> is a constant-sized matrix, we can compute <script type="math/tex">\hat{U}</script> and <script type="math/tex">\hat{\Sigma}</script>, the large left singular vectors and values of <script type="math/tex">W</script>, which approximate the large left singular vectors and values of <script type="math/tex">S</script>.
Then, <script type="math/tex">S^T\hat{U}\hat{\Sigma}^{-1}</script> translates these large left singular vectors to their corresponding right singular vectors and rescales them accordingly, giving the approximate singular vectors of <script type="math/tex">A</script> as desired.</p>
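<p>Here’s a dense numpy sketch of this two-sided sampling pipeline (names mine; rows are rescaled by <script type="math/tex">\|A\|_F/(\sqrt{\ell}\|A_i\|)</script> so that <script type="math/tex">S^TS</script> equals <script type="math/tex">A^TA</script> in expectation, and everything is computed explicitly rather than via SQ access):</p>

```python
import numpy as np

def fkv_sketch(A, k, ell, seed=0):
    """Row-sample S from A, column-sample W from S, take W's top-k left
    singular pairs (U_hat, sigma); then V_hat = S^T U_hat diag(1/sigma)
    approximates A's top right singular vectors."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample ell rows of A w.p. ||A_i||^2/||A||_F^2, rescaled.
    p_rows = np.sum(A**2, axis=1) / np.sum(A**2)
    rows = rng.choice(m, size=ell, p=p_rows)
    S = A[rows] / np.sqrt(ell * p_rows[rows])[:, None]
    # Same process on S^T: sample ell columns of S w.p. ||S^(j)||^2/||S||_F^2.
    p_cols = np.sum(S**2, axis=0) / np.sum(S**2)
    cols = rng.choice(n, size=ell, p=p_cols)
    W = S[:, cols] / np.sqrt(ell * p_cols[cols])[None, :]
    # W is small (ell x ell), so compute its SVD directly.
    U_hat, sigma, _ = np.linalg.svd(W)
    U_hat, sigma = U_hat[:, :k], sigma[:k]
    V_hat = S.T @ U_hat / sigma[None, :]   # approximate right singular vectors
    return S, U_hat, sigma, V_hat
```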
<h2 id="glossary">Glossary</h2>
<p>For natural numbers <script type="math/tex">m, n</script>, vector <script type="math/tex">v \in \BB{C}^n</script> and <script type="math/tex">A \in \BB{C}^{m\times n}</script>:</p>
<p><script type="math/tex">[n]</script> denotes <script type="math/tex">\{1,2,\ldots,n\}</script>;
<script type="math/tex">O(\cdot)</script> and <script type="math/tex">\Omega(\cdot)</script> are <a href="https://en.wikipedia.org/wiki/Big_O_notation">big O notation</a>;
<script type="math/tex">A_i</script> and <script type="math/tex">A^{(j)}</script> denote the <script type="math/tex">i</script>th row and the <script type="math/tex">j</script>th column of <script type="math/tex">A</script>;
<script type="math/tex">\|v\|</script> denotes the <script type="math/tex">\ell^2</script> norm of <script type="math/tex">v</script>, <script type="math/tex">\sqrt{|v_1|^2 + \cdots + |v_n|^2}</script>;</p>
<p><script type="math/tex">\ket{\psi}</script> is <a href="https://en.wikipedia.org/wiki/Bra%E2%80%93ket_notation">bra-ket notation</a>: kets are column vectors <script type="math/tex">\ket{\psi} \in \BB{C}^{n \times 1}</script>, bras are row vectors <script type="math/tex">\bra{\psi} := (\ket{\psi})^\dagger</script>, standard basis vectors are denoted <script type="math/tex">\ket{1},\ldots,\ket{n}</script>, and the tensor product of <script type="math/tex">\ket{\alpha}</script> and <script type="math/tex">\ket{\beta}</script> is denoted <script type="math/tex">\ket{\alpha}\ket{\beta}</script>.
Of course, these are all really quantum states, but that’s only relevant for quantum algorithms: for my purposes, I use <script type="math/tex">\ket{\phi}</script> and <script type="math/tex">\phi</script> interchangeably to refer to vectors.
(I ignore normalization, but those issues can be dealt with.)</p>
<p>The <a href="https://en.wikipedia.org/wiki/Singular_value_decomposition">singular value decomposition</a> (SVD) of <script type="math/tex">A \in \BB{C}^{m\times n}</script> is a decomposition <script type="math/tex">A = U\Sigma V^\dagger</script>, where <script type="math/tex">U \in \BB{C}^{m\times m}</script> and <script type="math/tex">V \in \BB{C}^{n\times n}</script> are unitary and <script type="math/tex">\Sigma \in \BB{R}^{m\times n}</script> is diagonal.
In other words, for <script type="math/tex">u_i</script> and <script type="math/tex">v_i</script> the columns of <script type="math/tex">U</script> and <script type="math/tex">V</script>, respectively, and <script type="math/tex">\sigma_i</script> the diagonal entries of <script type="math/tex">\Sigma</script>, <script type="math/tex">A = \sum \sigma_iu_iv_i^\dagger</script>.
By convention, <script type="math/tex">\sigma_1 \geq \ldots \geq \sigma_{\min(m,n)} \geq 0</script>.</p>
<p>Using <script type="math/tex">A</script>’s SVD, we can define basic linear algebraic objects.
<script type="math/tex">\|A\|_2 = \max_{v \in \BB{C}^n} \|Av\|/\|v\| = \sigma_1</script> is the spectral norm of <script type="math/tex">A</script>.
<script type="math/tex">\|A\|_F = \sqrt{\sum_{i=1}^m\sum_{j=1}^n |A_{ij}|^2} = \sqrt{\sigma_1^2 + \cdots + \sigma_{\min(m,n)}^2}</script> is the Frobenius norm of <script type="math/tex">A</script>.
<script type="math/tex">A_k = \sum_{i=1}^k \sigma_iu_iv_i^\dagger</script> is an optimal rank <script type="math/tex">k</script> approximation to <script type="math/tex">A</script> in both spectral and Frobenius norm.
<script type="math/tex">A^+ = \sum_{\sigma_i > 0} \frac{1}{\sigma_i}v_iu_i^\dagger</script> is <script type="math/tex">A</script>’s <a href="https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse">pseudoinverse</a>.</p>
<p>I define <script type="math/tex">\SQ(v)</script>, <script type="math/tex">\SQ(A)</script>, and <script type="math/tex">\Q(v)</script> in <a href="#an-introduction-to-dequantization">An introduction to dequantization</a>.</p>
<h2 id="references">References</h2>
<div class="footnotes">
<ol>
<li id="fn:aaronson15">
<p>Scott Aaronson. <em>Read the fine print</em>. Nature Physics 11.4, 2015. <a href="https://www.scottaaronson.com/papers/qml.pdf">Link</a> <a href="#fnref:aaronson15" class="reversefootnote">↩</a></p>
</li>
<li id="fn:kp17">
<p>Iordanis Kerenidis, Anupam Prakash. <em>Quantum recommendation systems</em>. <a href="https://arxiv.org/abs/1603.08675">arXiv:1603.08675</a>, 2016. <a href="#fnref:kp17" class="reversefootnote">↩</a> <a href="#fnref:kp17:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:tang18b">
<p>Ewin Tang. <em>Quantum-inspired classical algorithms for principal component analysis and supervised clustering</em>. <a href="https://arxiv.org/abs/1811.00414">arXiv:1811.00414</a>, 2018. <a href="#fnref:tang18b" class="reversefootnote">↩</a> <a href="#fnref:tang18b:1" class="reversefootnote">↩<sup>2</sup></a> <a href="#fnref:tang18b:2" class="reversefootnote">↩<sup>3</sup></a></p>
</li>
<li id="fn:dkr02">
<p>Petros Drineas, Iordanis Kerenidis, Prabhakar Raghavan. <em>Competitive recommendation systems</em>. STOC, 2002. <a href="https://www.irif.fr/~jkeren/jkeren/CV_Pubs_files/DKR02.pdf">Link</a>. <a href="#fnref:dkr02" class="reversefootnote">↩</a></p>
</li>
<li id="fn:tang18a">
<p>Ewin Tang. <em>A quantum-inspired algorithm for recommendation systems</em>. <a href="https://arxiv.org/abs/1807.04271">arXiv:1807.04271</a>, 2018. <a href="#fnref:tang18a" class="reversefootnote">↩</a> <a href="#fnref:tang18a:1" class="reversefootnote">↩<sup>2</sup></a></p>
</li>
<li id="fn:lmr14">
<p>Seth Lloyd, Masoud Mohseni, Patrick Rebentrost. <em>Quantum principal component analysis</em>. <a href="https://arxiv.org/abs/1307.0401">arXiv:1307.0401</a>, 2013. <a href="#fnref:lmr14" class="reversefootnote">↩</a></p>
</li>
<li id="fn:lmr13">
<p>Seth Lloyd, Masoud Mohseni, Patrick Rebentrost. <em>Quantum algorithms for supervised and unsupervised machine learning</em>. <a href="https://arxiv.org/abs/1307.0411">arXiv:1307.0411</a>, 2013. <a href="#fnref:lmr13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:rsml16">
<p>Patrick Rebentrost, Adrian Steffens, Iman Marvian, Seth Lloyd. <em>Quantum singular-value decomposition of nonsparse low-rank matrices</em>. <a href="https://arxiv.org/abs/1607.05404">arXiv:1607.05404</a>, 2016. <a href="#fnref:rsml16" class="reversefootnote">↩</a></p>
</li>
<li id="fn:glt18">
<p>András Gilyén, Seth Lloyd, Ewin Tang. <em>Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension</em>. <a href="https://arxiv.org/abs/1811.04909">arXiv:1811.04909</a>, 2018. <a href="#fnref:glt18" class="reversefootnote">↩</a></p>
</li>
<li id="fn:clw18">
<p>Nai-Hui Chia, Han-Hsuan Lin, Chunhao Wang. <em>Quantum-inspired sublinear classical algorithms for solving low-rank linear systems</em>. <a href="https://arxiv.org/abs/1811.04852">arXiv:1811.04852</a>, 2018. <a href="#fnref:clw18" class="reversefootnote">↩</a></p>
</li>
<li id="fn:ac16">
<p>Scott Aaronson and Lijie Chen. <em>Complexity-theoretic foundations of quantum supremacy experiments</em>. <a href="https://arxiv.org/abs/1612.05903">arXiv:1612.05903</a>, 2016. <a href="#fnref:ac16" class="reversefootnote">↩</a></p>
</li>
<li id="fn:hhl08">
<p>Aram W. Harrow, Avinatan Hassidim, Seth Lloyd. <em>Quantum algorithm for solving linear systems of equations</em>. <a href="https://arxiv.org/abs/0811.3171">arXiv:0811.3171</a>, 2008. <a href="#fnref:hhl08" class="reversefootnote">↩</a></p>
</li>
<li id="fn:fkv04">
<p>Alan Frieze, Ravindran Kannan, Santosh Vempala. <em>Fast monte-carlo algorithms for finding low-rank approximations</em>. <em>Journal of the ACM</em>, vol. 51, no. 6, 2004. <a href="https://www.math.cmu.edu/~af1p/Texfiles/SVD.pdf">Link</a>. <a href="#fnref:fkv04" class="reversefootnote">↩</a></p>
</li>
<li id="fn:kv17">
<p>Ravindran Kannan and Santosh Vempala. <em>Randomized algorithms in numerical linear algebra</em>. Acta Numerica 26, 2017. <a href="https://www.cc.gatech.edu/~vempala/papers/acta_survey.pdf">Link</a>. <a href="#fnref:kv17" class="reversefootnote">↩</a></p>
</li>
<li id="fn:woodruff14">
<p>David P. Woodruff. <em>Sketching as a tool for numerical linear algebra</em>. Foundations and Trends in Theoretical Computer Science 10.1–2, 2014. <a href="https://researcher.watson.ibm.com/researcher/files/us-dpwoodru/wNow.pdf">Link</a>. <a href="#fnref:woodruff14" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
Mon, 28 Jan 2019 00:00:00 +0000
https://www.ewintang.com/blog/2019/01/28/an-overview-of-quantum-inspired-sampling/