I finally finish the Painless Conjugate Gradient by Jonathan Richard Shewchuk1 today.

Quadratic form and Steepest Descent

The author spends almost half of the article on this part. To me the most useful take-aways are:

  • Solving a linear system can be regarded as optimizing a quadratic form .

  • The equation (8) in the paper suggests that optimizing a quadratic form is same to optimizing its “conjugated error” because

  • Represent iterative algorithm as matrix power and analyze the convergence based on the distribution of eigenvalues.

Conjugate Directions

Section 7.1 introduces the concept of Conjugate Directions. The conjugate directions is just a special kind of descent method, in which a a set of A-orthogonal(i.e. ) vectors are used for line search and exactly one step is made along each vector.

Representing error in search directions

We can derive the step size by a similar procedure which we use for Steepest Descent (equation 31 in the paper):

Since we require that exactly one step is made along each search direction, we should be able to represent the initial error by these search directions(equation 33):

The paper uses a constructive proof which derives a formula for to show such representation does exist(equation 34):

Surprisingly we find that . This hints us we do eliminate the error along one direction completely for each step. In fact this is true and proved in page 27.


Another requirement on search directions is A-orthogonality. This is done by performing a Gram-Schmidt conjugation on a set of linearly independent vectors .

Similar to traditional Gram-Schmidt process, we can compute a set of A-orthogonal vectors by linear combination with coefficients (equation 37):

Note that we have to store all previous search directions to compute a new one . This will be resolved in Conjugate Gradients by exploiting properties of search space (described below).

Space of search directions:

In section 7.3 the author introduces a space which is spanned by , , …, , and proves that current residual is orthogonal to all previous search space () in equation 38 and 39.

This property will be used to simplify Gram-Schmidt coefficients.

Conjugate Gradients

We finally reach the Conjugate Gradient method in chapter 8.

In section 7.2 we said that Gram-Schmidt conjugation is used to construct a set of A-orthogonal vectors from linearly independent vectors . But has not determined yet.

In Conjugate Gradient we just let .

To reason this we write down the iteration step for residual(equation 43):

We can see the new residual is a linear combination of and .

By letting , we have

because is included in , the fact that next residual is orthogonal to (Equation 39) implies that is A-orthogonal to . … is already A-orthogonal to all of the previous search directions except !

Therefore the coefficients for each are simplified to a single scalar, and we can just store the last search direction for each iteration step.


  1. Shewchuk, Jonathan Richard. “An introduction to the conjugate gradient method without the agonizing pain.” (1994).