Toolbox

C1: Fast Parallel Matrix Inversion Algorithms

Reminder: This post contains 2436 words · 7 min read · by Xianbin

This post introduce the very first paper giving an NC algorithm [1].

The Model

For parallel/distributed algorithms, the model we consider is very important. Here, the model is as follows.

The model of parallel computation is such a model in which there are arbitrary number of identical processors with independent control and arbitrarily huge memory with unrestricted access. Basically, you can consider that each memory can access anything from the whole input. We only care about \(steps\), i.e., the unit time of doing one operation. The parallel arithmetic complexity of the computation is the least number required to procedure the result.

Easy Problems

Matrix Multiplication. Let \(A\) be an \(m\times n\) matrix, and \(B\) be an \(n\times s\) matrix. The product \(A \times B\) can be done in \(O(mps)\) processors in \(O(\log n)\) time that comes from adding.

Power of a matrix \(A\). We just do matrix multiplication in parallel for \(O(\log n)\) times. So, we need \(O(\log^2 n)\) time and \(O(n^4)\) processors.

Linear Recurrences

Let \(A_{x+1} = A_x + A_{x-1}\). How to calculate \(A_n\) fast in parallel? It is easy to think that we can only do it sequentially.

But if we write down the details, we will see something different.

\[x_1 = c_1\\ x_2 = a_{21} x_1 + c_2\\ ... \\ x_n = a_{n1}x_1+\cdots+c_n\]

What is this?

It can be expressed by matrix!!!

\[A\vec{x} + \vec{c} = \vec{x}\]

Then,

\[\vec{x} = (I - A)^{-1}c\]

Notice that \(A\) is a lower triangular matrix.

The inversion of a lower triangular matrix can be done in \(O(\log^2 n)\) because we can can split \(A\) into four parts \((B,0 ; C, D)\) and recursively continue to execute.

Matrix Inversion

In the following, we introduce Cayley-Hamilton theorem.

Definition. The characteristic polynomial of an \(n\times n\) matrix \(A\) is defined as \(p(\lambda) = \text{det}(\lambda I_n - A)\). If we replace the scalar variable \(\lambda\) by \(A\), we obtain \(p(A)\).

Let \(p(\lambda) = 0\), we get

\[p(A) = \left( 0,0;0,0\right)\]

Then, we have

\(A^n - s_1A^{n-1} \ldots \pm s_nI = 0.\) where \(s_i\) is the coefficients of the characteristic polynomial.

So,

\[A^{-1}= \frac{1}{s_n}(s_{n-1}A \ldots)\]

Since power of matrix and matrix multiplication can be done in \(O(\log n)\), we only need to consider the parallel complexity of computing \(s_i\).

The characteristic polynomial of a matrix \(A\) is defined to be

\[\text{det}(xI - A) = x^n - s_1 x^{n-1} + s_2 x^{n-2} - s_3 x^{n-3}\ldots\pm s_n \\ = \Pi_{i=1}^n (x-\lambda_i)\]

where \(\lambda_i\) are the eigenvalues of \(A\).

\[s_k = \frac{1}{k}(s_{k-1}\cdot \text{tr}(A) \cdots \pm \text{tr}A^k)\]

The above can be computed in \(O(\log^2 n)\) time.

Put it together, we solve this problem.

Reference

[1]. Csanky, L. (1975, October). Fast parallel matrix inversion algorithms. In 16th Annual Symposium on Foundations of Computer Science (sfcs 1975) (pp. 11-12). IEEE.