I finally finished the Painless Conjugate Gradient paper by Jonathan Richard Shewchuk [1] today.

## Quadratic form and Steepest Descent

The author spends almost half of the article on this part. To me, the most useful takeaways are:

• Solving a linear system $\mathbf{A}\vec{x} = \vec{b}$ (with $\mathbf{A}$ symmetric positive-definite) can be regarded as minimizing the quadratic form $f(\vec{x}) = \frac{1}{2}\vec{x}^T \mathbf{A} \vec{x} - \vec{b}^T\vec{x} + c$ (see the quick numerical check after this list).

• Equation (8) in the paper, $f(\vec{p}) = f(\vec{x}) + \frac{1}{2}(\vec{p} - \vec{x})^T\mathbf{A}(\vec{p} - \vec{x})$ (where $\vec{x}$ is the exact solution), suggests that minimizing the quadratic form is the same as minimizing the “conjugated error” $||\vec{e}||_{\mathbf{A}}$, because with $\vec{e} = \vec{p} - \vec{x}$ the excess over the minimum is exactly $\frac{1}{2}\vec{e}^T\mathbf{A}\vec{e} = \frac{1}{2}||\vec{e}||_{\mathbf{A}}^2$.

• Representing an iterative algorithm as repeated matrix multiplication (matrix powers) and analyzing its convergence from the distribution of the eigenvalues.
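To convince myself of the first point, here is a tiny NumPy check (my own snippet, not code from the paper): for a random symmetric positive-definite $\mathbf{A}$, the solution of $\mathbf{A}\vec{x} = \vec{b}$ is exactly where the gradient of the quadratic form vanishes, and $f$ is larger at nearby points.

```python
import numpy as np

# Build a random symmetric positive-definite A and a right-hand side b.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)
b = rng.standard_normal(4)

def f(x, c=0.0):
    """The quadratic form 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)          # solution of the linear system
assert np.allclose(A @ x_star - b, 0)   # the gradient f'(x) = A x - b vanishes there
for _ in range(5):                      # and f is larger at nearby points
    p = x_star + 0.1 * rng.standard_normal(4)
    assert f(p) > f(x_star)
```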

## Conjugate Directions

Section 7.1 introduces the concept of Conjugate Directions. Conjugate Directions is just a special kind of descent method, in which a set of A-orthogonal (i.e. $\vec{d}_i^T\mathbf{A}\vec{d}_j = 0$ for $i \neq j$) vectors $\vec{d}_i$ is used for line search and exactly one step is made along each vector.

### Representing error in search directions

We can derive the step size $\alpha_i$ by a procedure similar to the one used for Steepest Descent (equation 31 in the paper).
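Reconstructing it here (writing $\vec{e}_i$ for the error and $\vec{r}_i = -\mathbf{A}\vec{e}_i$ for the residual, so this should match the paper up to notation): requiring the next error to be A-orthogonal to the current direction, $\vec{d}_i^T\mathbf{A}\vec{e}_{i+1} = 0$ with $\vec{e}_{i+1} = \vec{e}_i + \alpha_i\vec{d}_i$, gives

$$\alpha_i = -\frac{\vec{d}_i^T\mathbf{A}\vec{e}_i}{\vec{d}_i^T\mathbf{A}\vec{d}_i} = \frac{\vec{d}_i^T\vec{r}_i}{\vec{d}_i^T\mathbf{A}\vec{d}_i}.$$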

Since we require that exactly one step is made along each search direction, we should be able to express the initial error in terms of these $n$ search directions (equation 33).
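As far as I can tell, the claimed expansion is simply

$$\vec{e}_0 = \sum_{j=0}^{n-1}\delta_j\vec{d}_j$$

for some coefficients $\delta_j$ (my rendering of the equation; the $n$ directions span the whole space because they are linearly independent).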

The paper uses a constructive proof, deriving an explicit formula for $\delta_k$, to show that such a representation does exist (equation 34).
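My reconstruction of the trick: premultiply the expansion by $\vec{d}_k^T\mathbf{A}$; A-orthogonality kills every term except the $k$-th, so

$$\delta_k = \frac{\vec{d}_k^T\mathbf{A}\vec{e}_0}{\vec{d}_k^T\mathbf{A}\vec{d}_k} = \frac{\vec{d}_k^T\mathbf{A}\vec{e}_k}{\vec{d}_k^T\mathbf{A}\vec{d}_k},$$

where the second equality holds because the steps already taken along the other directions are A-orthogonal to $\vec{d}_k$.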

Surprisingly, we find that $\alpha_i = -\delta_i$. This hints that each step completely eliminates the error component along its search direction. In fact this is true, and it is proved on page 27 of the paper.

### A-orthogonality

The other requirement on the search directions is A-orthogonality. This is achieved by performing Gram-Schmidt conjugation on a set of $n$ linearly independent vectors $\vec{u}_i$.

Similar to the traditional Gram-Schmidt process, we can compute a set of A-orthogonal vectors $\vec{d}_i$ as linear combinations with coefficients $\beta_{ij}$ (equation 37).
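In my notation (which should agree with the paper's up to indexing), each new direction keeps $\vec{u}_i$ and subtracts off its A-projections onto the earlier directions:

$$\vec{d}_i = \vec{u}_i + \sum_{j=0}^{i-1}\beta_{ij}\vec{d}_j, \qquad \beta_{ij} = -\frac{\vec{u}_i^T\mathbf{A}\vec{d}_j}{\vec{d}_j^T\mathbf{A}\vec{d}_j}.$$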

Note that we have to store all previous search directions to compute a new one, $\vec{d}_{i+1}$. This will be resolved in Conjugate Gradients by exploiting properties of the search space $\mathcal{D}_i$ (described below).
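To make the cost concrete, here is a naive NumPy version of the conjugation step (my own sketch, not the paper's code; it assumes $\mathbf{A}$ is symmetric positive-definite and that the columns of `U` are the linearly independent vectors $\vec{u}_i$). Note the inner loop over every previously computed direction:

```python
import numpy as np

def gram_schmidt_conjugation(A, U):
    """Turn the linearly independent columns of U into A-orthogonal directions.

    Naive version: every new direction needs *all* previously computed
    directions, which is exactly the storage/work problem mentioned above.
    """
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for j in range(i):  # loop over every previous direction
            beta_ij = -(U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d = d + beta_ij * D[:, j]
        D[:, i] = d
    return D

# Quick check on a random SPD matrix: D^T A D should come out diagonal.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)
D = gram_schmidt_conjugation(A, np.eye(4))
G = D.T @ A @ D
assert np.allclose(G - np.diag(np.diag(G)), 0)
```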

### Space of search directions: $\mathcal{D}_i$

In section 7.3 the author introduces the space $\mathcal{D}_i$ spanned by $\vec{d}_0$, $\vec{d}_1$, …, $\vec{d}_{i-1}$, and proves in equations 38 and 39 that the current residual $\vec{r}_j$ is orthogonal to every earlier search space $\mathcal{D}_i$ (i.e. $\vec{d}_i^T\vec{r}_j = 0$ for $i < j$).

This property will be used to simplify the Gram-Schmidt coefficients.

## Conjugate Gradients

We finally reach the Conjugate Gradient method in chapter 8.

In section 7.2 we said that Gram-Schmidt conjugation is used to construct a set of A-orthogonal vectors $\vec{d}_i$ from linearly independent vectors $\vec{u}_i$. But the $\vec{u}_i$ have not been chosen yet.

In Conjugate Gradient we just let $\vec{u}_i = \vec{r}_i$.

To see why this works, we write down the iteration step for the residual (equation 43).
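Since $\vec{x}_{i+1} = \vec{x}_i + \alpha_i\vec{d}_i$ and $\vec{r}_i = \vec{b} - \mathbf{A}\vec{x}_i$, the update should be

$$\vec{r}_{i+1} = \vec{r}_i - \alpha_i\mathbf{A}\vec{d}_i$$

(my reconstruction of the equation, derived directly from the definitions above).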

We can see that the new residual is a linear combination of the previous residual $\vec{r}_i$ and $\mathbf{A}\vec{d}_i$.

By letting $\vec{u}_i = \vec{r}_i$, the search space turns out to be a Krylov subspace:
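$$\mathcal{D}_i = \mathrm{span}\{\vec{r}_0, \mathbf{A}\vec{r}_0, \mathbf{A}^2\vec{r}_0, \ldots, \mathbf{A}^{i-1}\vec{r}_0\}$$

(my paraphrase of how the paper builds it: each new residual adds at most one more power of $\mathbf{A}$, so $\mathcal{D}_{i+1}$ is spanned by $\mathcal{D}_i$ together with $\mathbf{A}\mathcal{D}_i$).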

Because $\mathbf{A}\mathcal{D}_i$ is included in $\mathcal{D}_{i+1}$, the fact that the next residual $\vec{r}_{i+1}$ is orthogonal to $\mathcal{D}_{i+1}$ (equation 39) implies that $\vec{r}_{i+1}$ is A-orthogonal to $\mathcal{D}_i$. … So $\vec{r}_{i+1}$ is already A-orthogonal to all of the previous search directions except $\vec{d}_i$!

Therefore, for each $i$ the $\beta_{ij}$ coefficients collapse to a single non-zero scalar, and we only need to keep the most recent search direction at each iteration step.
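Putting the pieces together, here is a short NumPy sketch of the resulting method as I understand it (a minimal implementation for my own reference, not the paper's code; it assumes $\mathbf{A}$ is symmetric positive-definite and uses the common $\beta = \vec{r}_{i+1}^T\vec{r}_{i+1} / \vec{r}_i^T\vec{r}_i$ form of the single surviving coefficient):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                 # initial residual
    d = r.copy()                  # first search direction: u_0 = r_0
    rr = r @ r
    for _ in range(n if max_iter is None else max_iter):
        Ad = A @ d
        alpha = rr / (d @ Ad)     # step size (equation 31, using d_i^T r_i = r_i^T r_i)
        x += alpha * d
        r -= alpha * Ad           # residual update (equation 43)
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr        # the single surviving Gram-Schmidt coefficient
        d = r + beta * d          # only the last search direction is needed
        rr = rr_new
    return x

# Tiny usage check on a random SPD system:
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)
b = rng.standard_normal(5)
assert np.allclose(conjugate_gradient(A, b), np.linalg.solve(A, b))
```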

## References

1. Shewchuk, Jonathan Richard. “An introduction to the conjugate gradient method without the agonizing pain.” (1994).