
C1: Fast Parallel Matrix Inversion Algorithms

2436 words · 7 min read · by Xianbin

This post introduces the very first paper giving an NC algorithm for matrix inversion [1].

The Model

For parallel/distributed algorithms, the model we consider is very important. Here, the model is as follows.

The model of parallel computation has an arbitrary number of identical processors with independent control and an arbitrarily large shared memory with unrestricted access; essentially, every processor can read any part of the input. We only count steps, i.e., the unit time of doing one operation. The parallel arithmetic complexity of a computation is the least number of steps required to produce the result.

Easy Problems

Matrix Multiplication. Let $A$ be an $m\times n$ matrix and $B$ an $n\times s$ matrix. The product $A \times B$ can be computed with $O(mns)$ processors in $O(\log n)$ time; the $\log n$ factor comes from summing the $n$ products for each entry in a binary tree.

Power of a matrix $A$. We just do matrix multiplications in parallel for $O(\log n)$ rounds (repeated squaring), so we need $O(\log^2 n)$ time; computing all the powers $A, A^2, \ldots, A^n$ simultaneously uses $O(n^4)$ processors.
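As a concrete illustration of repeated squaring, here is a minimal sequential Python sketch (my own code, not the paper's); in the parallel model, each of the $O(\log n)$ multiplications itself takes $O(\log n)$ time:

```python
def mat_mul(A, B):
    """Multiply an m*n matrix by an n*s matrix (plain Python lists)."""
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_pow(A, p):
    """Compute A^p with O(log p) matrix multiplications by repeated squaring."""
    n = len(A)
    result = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while p:
        if p & 1:
            result = mat_mul(result, A)
        A = mat_mul(A, A)
        p >>= 1
    return result

# powers of the Fibonacci matrix: entries are Fibonacci numbers
print(mat_pow([[1, 1], [1, 0]], 10))  # [[89, 55], [55, 34]]
```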

Linear Recurrences

Let $A_{x+1} = A_x + A_{x-1}$. How can we calculate $A_n$ fast in parallel? At first glance it seems we can only do it sequentially.

But if we write down the details, we will see something different.

$$x_1 = c_1\\ x_2 = a_{21} x_1 + c_2\\ \vdots \\ x_n = a_{n1}x_1+\cdots+a_{n,n-1}x_{n-1}+c_n$$

What is this?

It can be expressed as a matrix equation!

$$A\vec{x} + \vec{c} = \vec{x}$$

Then,

$$\vec{x} = (I - A)^{-1}\vec{c}$$
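To make the reduction concrete, here is a small sketch (my own illustration, using the made-up recurrence $x_k = 2x_{k-1} + 1$) showing that solving $(I - A)\vec{x} = \vec{c}$ reproduces the sequential recurrence:

```python
n = 6
# strictly lower triangular A encoding x_i = 2*x_{i-1} + c_i
A = [[0] * n for _ in range(n)]
for i in range(1, n):
    A[i][i - 1] = 2
c = [1] * n  # constants c_i = 1

# solve (I - A) x = c by forward substitution
x = [0] * n
for i in range(n):
    x[i] = c[i] + sum(A[i][j] * x[j] for j in range(i))

# sequential evaluation of the same recurrence agrees
seq = [1]
for _ in range(n - 1):
    seq.append(2 * seq[-1] + 1)
assert x == seq  # [1, 3, 7, 15, 31, 63]
```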

Notice that $A$ is strictly lower triangular, so $I - A$ is lower triangular.

The inversion of a lower triangular matrix can be done in $O(\log^2 n)$ time: split the matrix into four blocks $\begin{pmatrix} B & 0 \\ C & D \end{pmatrix}$, whose inverse is $\begin{pmatrix} B^{-1} & 0 \\ -D^{-1}CB^{-1} & D^{-1} \end{pmatrix}$, and recurse; $B$ and $D$ can be inverted in parallel.
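A sequential Python sketch of this recursive block inversion (exact `Fraction` arithmetic to avoid rounding; the parallel speedup comes from inverting the two diagonal blocks simultaneously):

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def tri_inv(L):
    """Invert a lower triangular matrix by splitting into (B, 0; C, D):
    the inverse is (B^{-1}, 0; -D^{-1} C B^{-1}, D^{-1})."""
    n = len(L)
    if n == 1:
        return [[Fraction(1) / L[0][0]]]
    h = n // 2
    B = [row[:h] for row in L[:h]]
    C = [row[:h] for row in L[h:]]
    D = [row[h:] for row in L[h:]]
    Bi, Di = tri_inv(B), tri_inv(D)   # independent: could run in parallel
    corner = [[-v for v in row] for row in mat_mul(Di, mat_mul(C, Bi))]
    top = [Bi[i] + [Fraction(0)] * (n - h) for i in range(h)]
    return top + [corner[i] + Di[i] for i in range(n - h)]

L = [[Fraction(v) for v in row] for row in [[1, 0, 0], [2, 3, 0], [4, 5, 6]]]
Linv = tri_inv(L)
assert mat_mul(L, Linv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```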

Matrix Inversion

In the following, we introduce the Cayley–Hamilton theorem.

Definition. The characteristic polynomial of an $n\times n$ matrix $A$ is defined as $p(\lambda) = \det(\lambda I_n - A)$. If we replace the scalar variable $\lambda$ by $A$, we obtain $p(A)$.

The Cayley–Hamilton theorem states that every matrix satisfies its own characteristic polynomial:

$$p(A) = 0$$

(the zero matrix).

Then, we have

$$A^n - s_1 A^{n-1} + s_2 A^{n-2} - \cdots + (-1)^n s_n I = 0,$$

where the $s_i$ are the coefficients of the characteristic polynomial.

So,

Multiplying $p(A) = 0$ by $A^{-1}$ and rearranging (note $s_n = \det A$, so $A$ is invertible iff $s_n \neq 0$):

$$A^{-1} = \frac{(-1)^{n-1}}{s_n}\left(A^{n-1} - s_1 A^{n-2} + \cdots + (-1)^{n-1} s_{n-1} I\right)$$

Since matrix multiplication takes $O(\log n)$ time and matrix powers $O(\log^2 n)$ time, we only need to consider the parallel complexity of computing the $s_i$.

The characteristic polynomial of a matrix $A$ is defined to be

$$\det(xI - A) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - s_3 x^{n-3} + \cdots + (-1)^n s_n = \prod_{i=1}^n (x-\lambda_i),$$

where the $\lambda_i$ are the eigenvalues of $A$.

Newton's identities relate the $s_k$ to the traces of the powers of $A$ (with $s_0 = 1$):

$$s_k = \frac{1}{k}\left(s_{k-1}\,\mathrm{tr}(A) - s_{k-2}\,\mathrm{tr}(A^2) + \cdots + (-1)^{k-1}\,\mathrm{tr}(A^k)\right)$$

Viewed as equations in the unknowns $s_k$, these identities form a lower triangular linear system, so by the triangular inversion technique above, the $s_i$ can be computed in $O(\log^2 n)$ time.

Putting it all together, matrix inversion can be done in $O(\log^2 n)$ parallel time, so it is in NC.
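The whole scheme can be sketched in sequential Python (names and structure are my own, not the paper's; exact `Fraction` arithmetic avoids rounding): traces of powers, then Newton's identities for the $s_k$, then the Cayley–Hamilton inverse.

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(B)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse_via_char_poly(A):
    """Csanky-style inversion sketch: compute tr(A^k), recover the
    characteristic-polynomial coefficients s_k via Newton's identities,
    then apply Cayley-Hamilton.  Each stage is parallelizable."""
    n = len(A)
    powers = [A]                                  # A^1, ..., A^n
    for _ in range(n - 1):
        powers.append(mat_mul(powers[-1], A))
    tr = [sum(P[i][i] for i in range(n)) for P in powers]

    # Newton's identities: k*s_k = sum_{i=1}^{k} (-1)^{i-1} s_{k-i} tr(A^i), s_0 = 1
    s = [Fraction(1)]
    for k in range(1, n + 1):
        s.append(sum((-1) ** (i - 1) * s[k - i] * tr[i - 1]
                     for i in range(1, k + 1)) / k)

    # Cayley-Hamilton: A^{-1} = ((-1)^{n-1}/s_n) * sum_{j=0}^{n-1} (-1)^j s_j A^{n-1-j}
    identity = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    terms = [identity] + powers[:n - 1]           # A^0, ..., A^{n-1}
    acc = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        coeff = (-1) ** j * s[j]
        for r in range(n):
            for col in range(n):
                acc[r][col] += coeff * terms[n - 1 - j][r][col]
    sign = (-1) ** (n - 1)
    return [[sign * v / s[n] for v in row] for row in acc]

A = [[Fraction(v) for v in row] for row in [[2, 1], [1, 1]]]
Ainv = inverse_via_char_poly(A)
assert Ainv == [[1, -1], [-1, 2]]
assert mat_mul(A, Ainv) == [[1, 0], [0, 1]]
```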

Reference

[1] Csanky, L. (1975, October). Fast parallel matrix inversion algorithms. In 16th Annual Symposium on Foundations of Computer Science (SFCS 1975) (pp. 11-12). IEEE.