
Quantization for Matrix Multiplication (2): Why Classical Methods Fail

Abstract

In the previous post, we formulated the rate-distortion problem for matrix multiplication. Now we show that three natural approaches (deterministic vector quantization, sketching, and naive dithering) provably fail to achieve the optimal distortion. These negative results motivate the development of fundamentally new techniques.


1. Recap: The Objective

Recall our goal: given matrices $A, B \in \mathbb{R}^n$ (treated as vectors, so the product of interest is the inner product), we want to compress each to $nR$ bits and reconstruct their inner product $A^T B$.

Distortion measure: $D = \frac{1}{n} \mathbb{E}\left[(A^T B - \hat{C})^2\right]$

For i.i.d. $\mathcal{N}(0,1)$ entries we have $\mathbb{E}[(A^T B)^2] = n$, so the trivial estimate $\hat{C} = 0$ already achieves $D = 1$; we want distortion $D = O(1)$, and ideally $D < 1$.

The optimal distortion-rate function $\Gamma(R)$ achieves MSE $\mathbb{E}[(A^T B - \hat{C})^2] = \Gamma(R) \cdot n$, i.e. $D = \Gamma(R)$, where $\Gamma(R) < 1$ for all $R > 0$.
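
To keep the scaling concrete, here is a quick Monte Carlo sanity check of this setup (a minimal sketch in Python/NumPy; the dimension and number of trials are arbitrary choices):

```python
# Check that E[(A^T B)^2] = n for i.i.d. N(0,1) entries, and that the trivial
# estimate C_hat = 0 achieves normalized distortion D = 1.
import numpy as np

rng = np.random.default_rng(0)
n, trials = 1024, 20_000

A = rng.standard_normal((trials, n))
B = rng.standard_normal((trials, n))
inner = np.einsum("ij,ij->i", A, B)     # A^T B for each trial

print(np.mean(inner**2) / n)            # ~1.0, i.e. E[(A^T B)^2] ~ n
print(np.mean((inner - 0.0)**2) / n)    # D for C_hat = 0: ~1.0
```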


2. Failure of Deterministic Vector Quantization

2.1 The Setup

Consider any deterministic encoding scheme:

  • Encoder 1: $f_1: \mathbb{R}^n \to \{1, \ldots, 2^{nR}\}$
  • Encoder 2: $f_2: \mathbb{R}^n \to \{1, \ldots, 2^{nR}\}$
  • Decoder: $g: \{1, \ldots, 2^{nR}\}^2 \to \mathbb{R}$

The decoder reconstructs $\hat{C} = g(f_1(A), f_2(B))$.

2.2 Main Negative Result

Proposition 1 (Deterministic VQ Fails). Fix rate $R > 0$. There exist constants $\delta(R) > 0$ and $n_0(R)$ such that for all $n \geq n_0$, and for any deterministic encoders $f_1, f_2$ and decoder $g$, there exist vectors $A, B$ with $\|A\|_2 = \|B\|_2 = \sqrt{n}$ such that:

$(A^T B - g(f_1(A), f_2(B)))^2 \geq \delta(R) \cdot n^2$

2.3 Proof Sketch

The proof uses a volume argument.

Step 1: Counting codewords

Each encoder has at most $2^{nR}$ codewords. The joint codebook has at most $2^{2nR}$ pairs.

Step 2: Covering the sphere

Consider vectors on the sphere $\mathcal{S}^{n-1}(\sqrt{n})$ of radius $\sqrt{n}$. The product of two spheres has "dimension" roughly $2n$.

Step 3: Inner product variation

For two vectors $A, A'$ in the same Voronoi cell of encoder 1, and $B, B'$ in the same cell of encoder 2:

  • $g(f_1(A), f_2(B)) = g(f_1(A'), f_2(B'))$
  • But $A^T B$ and $(A')^T B'$ can differ by $\Omega(n)$

Step 4: Quantitative bound

Using metric entropy arguments, for any partition of the sphere into $2^{nR}$ cells, there exist points in the same cell with inner products differing by $\Omega(n)$.

The reason: a deterministic VQ induces a fixed partition of the input space into Voronoi cells. For a fixed encoding of $B$, the decoder must output a single value for every $A$ in a cell, yet the true inner products across that cell vary by $\Theta(n)$, so some input in the cell incurs squared error $\Theta(n^2)$.
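
As a toy illustration of this mechanism (a sketch under specific assumptions: per-coordinate scalar quantization, which is just one particular deterministic VQ, and without normalizing the vectors to the sphere), the following constructs two inputs that encode to the same codeword yet whose inner products with a fixed $B$ differ by $\Theta(n)$:

```python
# Two vectors A and A2 that a per-coordinate scalar quantizer (step Delta,
# round-to-nearest) maps to the same codeword, while their inner products with
# a fixed B differ by roughly 0.49 * Delta * ||B||_1 = Theta(n).
import numpy as np

rng = np.random.default_rng(1)
n, Delta = 4096, 0.5

B = rng.standard_normal(n)
A = Delta * rng.integers(-4, 5, size=n).astype(float)   # cell centers
A2 = A + 0.49 * Delta * np.sign(B)                       # still inside the same cells

encode = lambda x: np.round(x / Delta).astype(int)       # per-coordinate encoder
assert np.array_equal(encode(A), encode(A2))             # identical codewords

gap = (A2 - A) @ B                                       # = 0.49 * Delta * ||B||_1
print(gap, gap / n)                                      # gap/n ~ 0.49*Delta*sqrt(2/pi) ~ 0.2
```

Since the two inputs share a codeword, any decoder output misses at least one of the two true inner products by half of this gap, giving squared error $\Omega(n^2)$.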

2.4 Implications

This result shows that no matter how cleverly we design the codebook, deterministic VQ has worst-case squared error $\Omega(n^2)$, not $O(n)$.

The core issue: deterministic quantizers approximate vectors, but inner products are quadratic in vectors. Small errors in vectors become large errors in products.


3. Failure of Sketching

3.1 Random Projection Sketches

A natural idea: use random sketches to compress vectors.

Simple sketch: Let $U \sim \mathcal{N}(0, I_n)$ be a random direction. Compute:

  • $U_A = U^T A$ (scalar sketch of $A$)
  • $U_B = U^T B$ (scalar sketch of $B$)

Estimator: Use $\hat{C} = U_A \cdot U_B$ as an estimate of $A^T B$.

3.2 Variance Analysis

Proposition 2 (Sketching Variance). For the simple sketch, with $\|A\| = \|B\| = \sqrt{n}$:

$\text{Var}[U_A U_B] = n^2(1 + \rho^2)$

where $\rho = \frac{A^T B}{\|A\|\|B\|}$ is the cosine similarity.

Proof:

Let $U \sim \mathcal{N}(0, I_n)$. Then:

  • $U_A = U^T A \sim \mathcal{N}(0, \|A\|^2)$
  • $U_B = U^T B \sim \mathcal{N}(0, \|B\|^2)$
  • $\text{Cov}(U_A, U_B) = A^T B$

For jointly Gaussian $(U_A, U_B)$ with zero means (Isserlis' theorem):

$\mathbb{E}[U_A^2 U_B^2] = \mathbb{E}[U_A^2]\,\mathbb{E}[U_B^2] + 2\,(\text{Cov}(U_A, U_B))^2$

This gives: $\mathbb{E}[U_A^2 U_B^2] = \|A\|^2 \|B\|^2 + 2(A^T B)^2$

Since $\mathbb{E}[U_A U_B] = A^T B$ (the estimator is unbiased):

$\text{Var}[U_A U_B] = \mathbb{E}[U_A^2 U_B^2] - (\mathbb{E}[U_A U_B])^2 = \|A\|^2 \|B\|^2 + (A^T B)^2$

For $\|A\| = \|B\| = \sqrt{n}$: $\text{Var}[U_A U_B] = n^2 + (A^T B)^2 = n^2(1 + \rho^2)$
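
A quick Monte Carlo check of this formula (a minimal sketch; the choices of $n$, $\rho$, and the number of samples are arbitrary):

```python
# For fixed A, B with ||A|| = ||B|| = sqrt(n) and cosine similarity rho, the
# sample variance of U_A * U_B over random Gaussian directions U should match
# n^2 * (1 + rho^2), even though the estimator is unbiased.
import numpy as np

rng = np.random.default_rng(2)
n, rho, samples = 256, 0.6, 200_000

A = np.sqrt(n) * np.eye(n)[0]                                        # sqrt(n) * e_1
B = np.sqrt(n) * (rho * np.eye(n)[0] + np.sqrt(1 - rho**2) * np.eye(n)[1])

U = rng.standard_normal((samples, n))
est = (U @ A) * (U @ B)                                              # U_A * U_B per sample

print(np.mean(est), A @ B)                                           # both ~ rho * n (unbiased)
print(np.var(est), n**2 * (1 + rho**2))                              # ~89,000 for these values
```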

3.3 Implications

Even with perfect (infinite-precision) access to $(U_A, U_B)$, the MSE is $\Theta(n^2)$!

To reduce the variance to $O(n)$ we would need to average $\Omega(n)$ independent sketches, i.e. store $\Omega(n)$ real-valued measurements; at that point the sketch is no longer compressing anything relative to the $nR$-bit budget.

What about Johnson-Lindenstrauss? The JL lemma preserves inner products only up to additive error of order $\varepsilon \|A\|\|B\| = \varepsilon n$ using $O(1/\varepsilon^2)$ random projections; hitting the target error $O(\sqrt{n})$ requires $\varepsilon \approx n^{-1/2}$, hence $\Omega(n)$ projections, so again there is no compression benefit for a single pair.


4. Failure of Naive Dithered Quantization

4.1 Subtractive Dithering

A more sophisticated approach uses dithered quantization:

  1. Generate dither sequences $Z_A, Z_B \sim \text{Uniform}[0, \Delta)^n$, independent of the data and shared with the decoder
  2. Quantize and subtract the dither: $\hat{A}_i = \Delta \cdot \lfloor (A_i + Z_{A,i})/\Delta \rfloor + \Delta/2 - Z_{A,i}$, so that the error $\hat{A}_i - A_i$ is $\text{Uniform}(-\Delta/2, \Delta/2]$, zero-mean and independent of $A_i$
  3. Similarly for $B$, using $Z_B$
  4. Reconstruct: $\hat{C} = \hat{A}^T \hat{B}$

4.2 The Problem

Proposition 3 (Dithered Quantization Error). For the subtractive dithered scheme above with step size $\Delta$, writing $\sigma_q^2 = \Delta^2/12$ for the per-coordinate quantization-noise variance:

$\mathbb{E}[(A^T B - \hat{A}^T \hat{B})^2] = n\,\sigma_q^2\,(\sigma_A^2 + \sigma_B^2) + n\,\sigma_q^4$

where $\sigma_A^2, \sigma_B^2$ are the variances of the entries $A_i, B_i$.

The issue: with a budget of $R$ bits per coordinate, the step size can shrink only exponentially in $R$ (roughly $\Delta \propto 2^{-R}$ for a uniform quantizer with $2^R$ levels over a fixed range). The MSE is therefore of the form $c(R) \cdot n$ for a constant $c(R)$ fixed by the per-coordinate quantizer, and at low rates this constant is strictly larger than the optimal $\Gamma(R)$: spreading the bit budget uniformly over all $n$ coordinates is wasteful when $R$ is small.
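
The following sketch implements the scheme of Section 4.1 and checks the proposition empirically (the sizes and the step $\Delta$ are arbitrary choices; entries are i.i.d. standard normal, so $\sigma_A^2 = \sigma_B^2 = 1$):

```python
# Centered subtractive dithered quantization of A and B, followed by a Monte
# Carlo estimate of E[(A^T B - A_hat^T B_hat)^2], compared with the formula
# n * sq * (sigma_A^2 + sigma_B^2) + n * sq^2, where sq = Delta^2 / 12.
import numpy as np

rng = np.random.default_rng(3)
n, Delta, trials = 512, 0.5, 5_000

def dithered_quantize(X, Z, Delta):
    # Q(X + Z) with the dither subtracted back; the error is Uniform(-Delta/2, Delta/2],
    # zero-mean and independent of X.
    return Delta * np.floor((X + Z) / Delta) + Delta / 2 - Z

A = rng.standard_normal((trials, n))
B = rng.standard_normal((trials, n))
ZA = rng.uniform(0, Delta, size=(trials, n))
ZB = rng.uniform(0, Delta, size=(trials, n))

A_hat = dithered_quantize(A, ZA, Delta)
B_hat = dithered_quantize(B, ZB, Delta)

err = np.einsum("ij,ij->i", A, B) - np.einsum("ij,ij->i", A_hat, B_hat)
sq = Delta**2 / 12
print(np.mean(err**2))              # empirical MSE
print(n * sq * 2 + n * sq**2)       # predicted MSE with sigma_A^2 = sigma_B^2 = 1
```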

4.3 Why Dithering Alone Isn't Enough

The fundamental issue: dithered quantization spends the same number of bits on every coordinate and treats each coordinate independently. Writing the reconstruction error of the inner product as

$\hat{A}^T \hat{B} - A^T B = (\hat{A} - A)^T B + A^T(\hat{B} - B) + (\hat{A} - A)^T(\hat{B} - B)$

we see that dithering makes each error term zero-mean, but the variances of these terms add up across all $n$ coordinates, so the MSE still grows linearly in $n$ with a constant that is suboptimal at low rates.


5. The Root Cause: Quadratic Distortion

5.1 Why All These Methods Fail

All three approaches share a common flaw: they treat the quantization problem linearly.

  • Deterministic VQ: Approximates AA^A \approx \hat{A}, ignores that ATBA^T B is quadratic
  • Sketching: Linear projections don't capture quadratic structure
  • Dithering: Per-coordinate treatment ignores interaction terms

5.2 Requirements for Optimal Schemes

The optimal scheme must:

  1. Be randomized: To escape the deterministic lower bound
  2. Jointly consider AA and BB: Even though they're encoded separately
  3. Exploit correlations: Use the structure of Gaussian distributions

6. A Glimpse of the Solution

The optimal approach uses time-sharing (or sparsification):

The idea: at low rates, it's better to:

  1. Select a random subset $S \subset [n]$ of coordinates
  2. Quantize those coordinates with high precision
  3. Ignore the rest

This achieves an MSE of $(1-\kappa) \cdot n + \kappa \cdot f_{\text{naive}}(R/\kappa) \cdot n$, i.e.

$D = (1-\kappa) + \kappa \cdot f_{\text{naive}}(R/\kappa)$

where $\kappa = |S|/n$ is the fraction of coordinates kept and $f_{\text{naive}}(r)$ is the per-coordinate distortion of the naive full-coverage scheme at rate $r$.

Why this works: The ignored coordinates contribute variance $\propto (1-\kappa)n$, but the kept coordinates are quantized at the much higher effective rate $R/\kappa$. The optimal $\kappa$ balances these two terms.
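
A small numerical illustration of this trade-off (a sketch under an idealized model, not the construction from the paper: we take $f_{\text{naive}}(r) = 2 \cdot 2^{-2r} + 2^{-4r}$, which is what one gets by quantizing each factor at the Gaussian distortion-rate level $2^{-2r}$ and plugging into the error decomposition of Section 4.3):

```python
# Sweep the kept fraction kappa at a fixed low rate R and compare the resulting
# normalized distortion D(kappa) = (1 - kappa) + kappa * f_naive(R / kappa)
# against full coverage (kappa = 1). At low rate the optimum has kappa < 1.
import numpy as np

R = 0.2                                      # bits per coordinate (arbitrary low rate)
kappas = np.linspace(0.01, 1.0, 1000)

def f_naive(r):
    D = 2.0 ** (-2 * r)                      # idealized per-factor quantization MSE
    return 2 * D + D**2                      # resulting inner-product error per kept coordinate

D_kappa = (1 - kappas) + kappas * f_naive(R / kappas)
best = np.argmin(D_kappa)

print("kappa = 1:", D_kappa[-1])             # ~2.09: worse than the trivial D = 1
print("best kappa:", kappas[best], "D:", D_kappa[best])   # ~0.15, D ~ 0.90
```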


7. Conclusion

We have proven that three natural approaches fail:

| Method | MSE | Why It Fails |
|---|---|---|
| Deterministic VQ | $\Omega(n^2)$ worst case | Voronoi cells have widely varying inner products |
| Sketching | $\Theta(n^2)$ variance | Single projection lacks concentration |
| Naive dithering | Suboptimal $O(n)$ | Doesn't exploit rate allocation |

Quantizing for matrix multiplication differs from classical VQ. The quadratic nature of the distortion requires new techniques.

In the next post, we will derive the optimal distortion-rate function Γ(R)\Gamma(R) for Gaussian inner products and prove it achieves the information-theoretic limit.


This post is based on Professor Yuri Polyanskiy's lecture at the 2025 Princeton ML Theory Summer School and the paper "Optimal Quantization for Matrix Multiplication" by Ordentlich and Polyanskiy (2024).