The Lanczos Method

math, linear algebra, matrix decompositions

Matt Piekenbrock
Posted: 2023-10-15, ~15 min read, 5783 words

Central to the theory of matrix factorization is the spectral theorem, which provides conditions under which a linear operator $A : \mathbb{R}^n \to \mathbb{R}^n$ can be diagonalized in terms of its eigenvectors:

$$A = U \Lambda U^{-1}$$

When $A$ is symmetric, the eigen-decomposition is not only guaranteed to exist, but its canonical form may be obtained via orthogonal diagonalization. Such matrices are among the most commonly encountered matrices in applications.

In 1950, Cornelius Lanczos proposed an algorithm for obtaining a different kind of factorization of $A \in \mathrm{Sym}_n$ by means of tri-diagonalization:

$$AQ = QT \quad \Leftrightarrow \quad Q^T A Q = T$$

Like the eigen-decomposition, the “outer” term(s) $Q$ orthogonally project $A$ to a highly structured “inner” matrix $T$. Unlike the eigen-decomposition, this factorization has no canonical form and does not readily yield the eigensets of $A$, prompting questions of its utility.

Nonetheless, this decomposition ended up proving to be one of the most revealing spectral decompositions of all time. Despite its age, Lanczos’ method of tridiagonalization remains the standard algorithm¹ for both computing eigensets and solving linear systems in the large-scale regime. An IEEE guest editorial places it among the top 10 most influential algorithms of the 20th century.

Figure: Lanczos’ “method of minimized iterations” on a 100×100 matrix with eigenvalues (0, 0.01, 0.02, …, 2, 2.5, 3)

Lanczos on a bumper sticker #

Given any non-zero $v \in \mathbb{R}^n$, Lanczos generates a Krylov subspace via successive powers of $A$:

$$\mathcal{K}(A, v) \triangleq \{ \, A^{0} v, A^{1}v, A^{2}v, \dots, A^{n-1}v \, \}$$

For a generic choice of $v$, these vectors are linearly independent, so orthogonalizing them yields not only an orthonormal basis for $\mathcal{K}(A, v)$ but also a change-of-basis $Q$, re-parameterizing $A$ by a new matrix $T$:

$$\begin{align*} K &= [\, v \mid Av \mid A^2 v \mid \dots \mid A^{n-1}v \,] \\ Q &= [\, q_1, q_2, \dots, q_n \,] \gets \mathrm{qr}(K) \\ T &= Q^T A Q \end{align*}$$

Since $A$ is symmetric, $T$ is guaranteed to have a symmetric tridiagonal structure:

$$T = \begin{bmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_n \\ & & \beta_n & \alpha_n \end{bmatrix}$$
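To make this concrete, here is a minimal NumPy sketch of the explicit construction above (the size `n = 8` and the random seed are arbitrary choices of mine). It deliberately stays small: the columns of the Krylov matrix $K$ become numerically dependent quickly as $n$ grows, so this explicit route is not how one would compute $T$ in practice.

```python
import numpy as np

rng = np.random.default_rng(1234)
n = 8

# A small random symmetric matrix and an arbitrary starting vector
A = rng.normal(size=(n, n))
A = (A + A.T) / 2
v = rng.normal(size=n)

# Krylov matrix K = [v | Av | A^2 v | ... | A^(n-1) v]
cols = [v]
for _ in range(n - 1):
    cols.append(A @ cols[-1])
K = np.column_stack(cols)

# Orthogonalize the Krylov basis and change basis: T = Q^T A Q
Q, R = np.linalg.qr(K)
T = Q.T @ A @ Q

# Entries of T beyond the first off-diagonals vanish up to round-off
mask = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :]) > 1
print(np.max(np.abs(T[mask])))  # tiny (round-off level) for this small example
```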

Since $Q$ spans $\mathcal{K}(A, v) = \mathrm{range}(A)$, the change of basis $A \mapsto Q^{-1} A Q$ is a similarity transform (an equivalence relation on $\mathrm{Sym}_n$), thus we can obtain $\Lambda$ by diagonalizing $T$:

$$T = Y \Lambda Y^T$$

As $T$ is quite structured, it is easy to diagonalize, and in doing so we have effectively solved the eigenvalue problem for $A$. To quote Parlett’s introduction to the Lanczos method: could anything be more simple?
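Continuing the sketch above, diagonalizing $T$ needs only its two diagonals; `scipy.linalg.eigh_tridiagonal` is one solver that exploits this. Since $T$ and $A$ are similar, their spectra agree:

```python
from scipy.linalg import eigh_tridiagonal

# Diagonalize T = Y Λ Y^T from its diagonals alone (A, T from the sketch above)
alpha = np.diag(T).copy()       # α_1, ..., α_n
beta = np.diag(T, k=1).copy()   # β_2, ..., β_n
Lam, Y = eigh_tridiagonal(alpha, beta)

# Similar matrices share eigenvalues: Λ(T) = Λ(A), up to round-off
print(np.allclose(Lam, np.linalg.eigvalsh(A)))  # True
```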

The “iteration” part #

Lanczos originally referred to his algorithm as the method of minimized iterations, and indeed nowadays it is often considered an iterative method. Where’s the iterative component?

If you squint hard enough, you can deduce that for every $j \in [1, n)$:

$$A Q_j = Q_j T_j + \beta_{j+1} \, q_{j+1} e_j^T$$

where $Q_j$ denotes the first $j$ columns of $Q$ and $T_j$ the leading $j \times j$ principal submatrix of $T$.

Equating the $j$-th columns on both sides and rearranging yields a three-term recurrence:

$$\begin{align*} A q_j &= \beta_j \, q_{j-1} + \alpha_j \, q_j + \beta_{j+1} \, q_{j+1} \\ \Leftrightarrow \quad \beta_{j+1} \, q_{j+1} &= A q_j - \alpha_j \, q_j - \beta_j \, q_{j-1} \end{align*}$$

The equation above is a variable-coefficient second-order linear difference equation, and such equations are known to have unique solutions; they are given below:

$$\alpha_j = q_j^T A q_j, \;\; \beta_{j+1} = \lVert r_j \rVert_2, \;\; q_{j+1} = r_j / \beta_{j+1}$$

$$\text{where } r_j = (A - \alpha_j I)\, q_j - \beta_j \, q_{j-1}$$

In other words, if $(q_{j-1}, \beta_j, q_j)$ are known, then $(\alpha_j, \beta_{j+1}, q_{j+1})$ are completely determined. In theory, this means we can iteratively generate both $Q$ and $T$ using just a couple of vectors at a time, no explicit QR call needed. Pretty nifty, eh!
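Here is a minimal sketch of this recurrence in NumPy; the function name `lanczos` is mine, not a library API, and for simplicity it does no re-orthogonalization, so in floating point the $q_j$ slowly lose orthogonality as the iteration proceeds:

```python
import numpy as np

def lanczos(A, v, k):
    """Run k steps of the three-term Lanczos recurrence on symmetric A.

    Returns the diagonals (alpha, beta) of T_k and the Lanczos vectors Q_k.
    A bare-bones sketch: no re-orthogonalization or breakdown handling.
    """
    n = A.shape[0]
    Q = np.zeros((n, k))
    alpha, beta = np.zeros(k), np.zeros(k - 1)
    q_prev = np.zeros(n)
    q = v / np.linalg.norm(v)          # q_1 = v / ||v||
    for j in range(k):
        Q[:, j] = q
        r = A @ q                      # A q_j
        alpha[j] = q @ r               # α_j = q_j^T A q_j
        r -= alpha[j] * q              # r_j = (A - α_j I) q_j ...
        if j > 0:
            r -= beta[j - 1] * q_prev  #     ... - β_j q_{j-1}
        if j < k - 1:
            beta[j] = np.linalg.norm(r)    # β_{j+1} = ||r_j||
            q_prev, q = q, r / beta[j]     # q_{j+1} = r_j / β_{j+1}
    return alpha, beta, Q
```

With $k = n$ and the same starting vector, this reproduces the $T$ from the explicit construction, while only ever holding a couple of vectors at a time.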

Wait, isn’t $T$ arbitrary? #

Unlike the spectral decomposition², there is no canonical choice of $T$. Indeed, since $v \in \mathbb{R}^n$ was chosen arbitrarily and each unit-norm choice of $v$ yields its own $T$, the possible decompositions form a family with $n - 1$ degrees of freedom: there are infinitely many essentially distinct such decompositions.

Not all hope is lost though. It turns out that $T$ is actually fully characterized by the pair $(A, v)$. To see this, notice that since $Q$ is an orthogonal matrix, we have:

$$Q Q^T = I_n = [\, e_1, e_2, \dots, e_n \,]$$

By extension, given an initial pair $(A, q_1)$ satisfying $\lVert q_1 \rVert = 1$, the following holds:

$$K_n(A, q_1) = Q Q^T K_n(A, q_1) = Q \, [\, e_1 \mid T e_1 \mid T^2 e_1 \mid \dots \mid T^{n-1} e_1 \,]$$

since $Q^T A^j q_1 = (Q^T A Q)^j \, Q^T q_1 = T^j e_1$ for every $j \geq 0$.

…this is a QR factorization, which is essentially unique! Indeed, the Implicit Q Theorem asserts that if an upper Hessenberg matrix $T \in \mathbb{R}^{n \times n}$ has only positive elements on its first subdiagonal and there exists an orthogonal matrix $Q$ such that $Q^T A Q = T$, then $Q$ and $T$ are uniquely determined by $(A, q_1)$.
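A numerical way to see this uniqueness (a sketch reusing the `lanczos` function from the previous section, which is my own helper, not a library routine): starting from the same pair $(A, q_1)$, the explicit QR route and the three-term recurrence produce the same $T$, provided we fix signs so the subdiagonal is positive, as the theorem requires.

```python
import numpy as np

rng = np.random.default_rng(1234)
n = 8
A = rng.normal(size=(n, n)); A = (A + A.T) / 2
q1 = rng.normal(size=n)
q1 /= np.linalg.norm(q1)

# Route 1: explicit QR of the Krylov matrix K_n(A, q1). Flipping signs so
# R has a positive diagonal makes the subdiagonal of T positive as well.
cols = [q1]
for _ in range(n - 1):
    cols.append(A @ cols[-1])
Q, R = np.linalg.qr(np.column_stack(cols))
Q *= np.sign(np.diag(R))
T_qr = Q.T @ A @ Q

# Route 2: the three-term recurrence (lanczos from the earlier sketch)
alpha, beta, _ = lanczos(A, q1, n)
T_rec = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

# Same (A, q1) => same T, up to round-off and the conditioning of K
print(np.max(np.abs(T_qr - T_rec)))
```

The sign-fixing step matters: `np.linalg.qr` makes no promise about the signs on the diagonal of $R$, and the theorem’s uniqueness only holds once those signs are pinned down.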

The end of the beginning #

Nowadays, the Lanczos method lies at the core of a host of methods aimed at approximating various spectral quantities, and modern implementations of the Lanczos method deviate wildly from the original conception. Whether for computing eigenvalues (Parlett 1993), solving linear systems (Saad 1987), or approximating matrix functions (Musco, Musco, and Sidford 2018), applications of the Lanczos method are quite diverse.

In fact, the theory underlying the Lanczos method is intrinsically connected to many other algorithms and mathematical constructs, popular examples including the Rayleigh–Ritz procedure, the conjugate gradient method, and notions of Gaussian quadrature. As each of these deserves its own post, I’ll defer them to later postings.

In the next post, I’ll cover more of the algorithmic components of the Lanczos method, including the pseudo-code, actual code, the time and space complexities, and additional nuances regarding the stability of the computation.

References #

Musco, Cameron, Christopher Musco, and Aaron Sidford. 2018. “Stability of the Lanczos Method for Matrix Function Approximation.” In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, 1605–24. SIAM.

Parlett, Beresford N. 1993. “Do We Fully Understand the Symmetric Lanczos Algorithm Yet?” Brown et al. 3: 93–107.

Saad, Youcef. 1987. “On the Lanczos Method for Solving Symmetric Linear Systems with Several Right-Hand Sides.” Mathematics of Computation 48 (178): 651–62.

Footnotes

  1. A variant of the Lanczos method is actually at the heart of scipy.sparse.linalg’s default eigsh solver (which is a port of ARPACK).
  2. The spectral decomposition $A = U \Lambda U^T$ is canonical in the sense that it identifies a diagonalizable $A$ with its spectrum $\Lambda(A)$ up to a change of basis $A \mapsto M^{-1} A M$.