# Acceleration methods

The theory and techniques for increasing the speed of convergence in the iterative solution of linear systems $Ax = b$. See Acceleration of convergence and the references given there.

The fundamental problem of linear algebra is the solution of finite-dimensional linear algebraic systems

$$Ax = b, \tag{a1}$$

in which $A$ is a given matrix. For simplicity, let $A$ be an $(n \times n)$-matrix with real entries $a_{ij}$. The numerical solution of very large systems (e.g., $n = 10^5$) is not uncommon today (1996). Commonly, the saving grace of such large systems is their sparseness: most $a_{ij}$ are zero (cf. Sparse matrix). The classical direct method of solution, Gauss elimination, has at least three disadvantages: the pivoting operation fills in the zeros, the operation count is $O(n^3)$, and the storage requirements when building a classical triangular factorization of $A$ increase superlinearly with $n$.

For these reasons, indirect iterative methods for the solution of (a1) have recently come to the fore. Iterative methods may be roughly broken into two classes: splitting methods and gradient methods. Splitting methods decompose $A$ according to

$$A = M - N, \tag{a2}$$

where $M$ is non-singular (cf. Non-singular matrix). The solution to (a1) is then sought as the fixed point of the equation

$$x = M^{-1}Nx + M^{-1}b. \tag{a3}$$

Accordingly, one picks an initial guess $x_0$ and forms the iterates

$$x_{k+1} = M^{-1}Nx_k + M^{-1}b, \qquad k = 0, 1, \ldots \tag{a4}$$

Provided that the spectral radius $\rho(M^{-1}N)$ is less than one, the iteration (a4) converges to the solution of (a1) for any initial guess $x_0$.
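A minimal sketch of the splitting iteration (a4), written in the equivalent residual form $x_{k+1} = x_k + M^{-1}(b - Ax_k)$; the $2\times 2$ system and the choice $M = \operatorname{diag}(A)$ (the Jacobi splitting) are illustrative assumptions, not part of the general method:

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def splitting_iteration(A, b, M_solve, num_iters=60):
    """Iterate x_{k+1} = x_k + M^{-1}(b - A x_k), equivalent to (a4)."""
    x = [0.0] * len(b)  # initial guess x_0 = 0
    for _ in range(num_iters):
        r = [bi - Axi for bi, Axi in zip(b, matvec(A, x))]
        z = M_solve(r)  # solve M z = r
        x = [xi + zi for xi, zi in zip(x, z)]
    return x

# Illustrative 2x2 system; M = diag(A) gives the Jacobi splitting, for
# which rho(M^{-1}N) = sqrt(1/12) < 1, so the iteration converges.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = splitting_iteration(A, b, lambda r: [r[0] / 4.0, r[1] / 3.0])
# exact solution is (1/11, 7/11)
```

Each sweep multiplies the error by roughly $\rho(M^{-1}N) \approx 0.29$ here, so 60 sweeps reduce it to machine precision.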

Gradient methods, on the other hand, form a variational problem from (a1). For example, minimizing the quadratic form

$$Q(x) = \tfrac{1}{2}\langle x, Ax\rangle - \langle b, x\rangle \tag{a5}$$

for $A$ a symmetric positive-definite matrix, is equivalent to the problem (a1).
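The equivalence can be made explicit with a short, standard derivation using only the symmetry of $A$:

```latex
% Gradient of the quadratic form (a5), with A symmetric:
\nabla Q(x) = \nabla\left( \tfrac{1}{2}\langle x, Ax\rangle - \langle b, x\rangle \right)
            = \tfrac{1}{2}\,(A + A^{\mathrm{T}})\,x - b
            = Ax - b .
```

Thus $\nabla Q(x) = 0$ exactly when $Ax = b$, and positive-definiteness of $A$ guarantees that this stationary point is the unique minimum; note also that $-\nabla Q(x) = b - Ax$ is precisely the residual used by the descent methods.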

Just as for the classical direct methods of solution of (a1), the indirect iterative methods in their original formulations are too slow for ever larger matrices $A$. Therefore the original Jacobi iterative method (cf. Jacobi method)

$$x_{k+1} = D^{-1}\bigl(b - (L+U)x_k\bigr), \tag{a6}$$

where $A = L + D + U$ has been written in terms of its lower-triangular, diagonal, and upper-triangular parts, has been accelerated by the Gauss–Seidel method

$$x_{k+1} = (D+L)^{-1}\bigl(b - Ux_k\bigr), \tag{a7}$$

which in turn has been accelerated by the successive overrelaxation method

$$x_{k+1} = (D+\omega L)^{-1}\Bigl(\omega b + \bigl((1-\omega)D - \omega U\bigr)x_k\Bigr). \tag{a8}$$

(Cf. also Hyperrelaxation method.) In (a8), $\omega$ is a free relaxation parameter which may be sought optimally. Generally, the error of a splitting iterative method is reduced at each iteration by a factor bounded by the spectral radius $\rho(M^{-1}N)$.
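As an illustration, a pure-Python SOR sweep in the componentwise form of (a8), which reduces to Gauss–Seidel (a7) at $\omega = 1$; the tridiagonal test system and the value $\omega = 1.03$ (close to optimal for this particular matrix by the classical SOR theory) are assumptions made for the example:

```python
def residual_norm(A, b, x):
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))

def sor(A, b, omega, tol=1e-10, max_sweeps=1000):
    """SOR sweeps as in (a8); omega = 1 gives Gauss-Seidel (a7)."""
    n = len(b)
    x = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        for i in range(n):
            # uses already-updated components x[j], j < i (Gauss-Seidel order)
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s) / A[i][i]
        if residual_norm(A, b, x) < tol:
            return x, sweep
    return x, max_sweeps

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x_gs, sweeps_gs = sor(A, b, omega=1.0)     # Gauss-Seidel
x_sor, sweeps_sor = sor(A, b, omega=1.03)  # over-relaxed
```

For this small system the over-relaxed sweep needs no more sweeps than Gauss–Seidel to reach the same tolerance; for the large matrices arising from discretized partial differential equations, the gain from a well-chosen $\omega$ is far greater.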

In like vein, the original gradient iterative method, steepest descent, in which the $k$-th residual error vector

$$r_k = b - Ax_k, \tag{a9}$$

being the negative of the gradient of $Q$ at $x_k$, hence defines the direction of steepest descent, converges more slowly as one nears the bottom of the quadratic surface (cf. also Steepest descent, method of). The convergence also becomes slow when the condition number

$$\kappa(A) = \frac{\lambda_{\max}}{\lambda_{\min}} \tag{a10}$$

of the matrix $A$ is large. Here $\lambda_{\max}$ and $\lambda_{\min}$ denote the largest and smallest eigenvalues of $A$ for $A$ symmetric positive-definite, and may be replaced in (a10) by their absolute values or, more generally, by the singular values $\sigma_{\max}$ and $\sigma_{\min}$ for general matrices $A$. Therefore, the steepest descent method has been accelerated by the conjugate-gradient method. Both steepest descent and conjugate gradient may be written as iterations

$$x_{k+1} = x_k + \alpha_k p_k, \tag{a11}$$

where $\alpha_k$ denotes an optimal magnitude of change in the direction $p_k$. In steepest descent, $p_k$ is the residual $r_k$ of (a9) and $\alpha_k$ is then chosen to minimize $Q(x_k + \alpha_k p_k)$. In conjugate gradient one takes $p_{k+1} = r_{k+1} + \beta_k p_k$, where $\beta_k$ is chosen so that $p_{k+1}$ is conjugate to $p_k$, i.e.,

$$\langle p_{k+1}, Ap_k\rangle = 0. \tag{a12}$$

Then $\alpha_k$ is chosen to minimize $Q$ as before. Because the conjugate-gradient method may be seen to choose a new conjugate dimension at each iteration, in theory (i.e., in the absence of round-off errors) it must converge in at most $n$ steps. In practice it usually converges even faster than that.
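The recurrences (a11)–(a12) assemble into the standard conjugate-gradient loop. The following pure-Python sketch uses the usual equivalent formulas for $\alpha_k$ and $\beta_k$ in terms of residual inner products; the symmetric positive-definite test matrix is an illustrative assumption:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b):
    """Conjugate gradient for symmetric positive-definite A, following
    (a11)-(a12); converges in at most n steps in exact arithmetic."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                  # r_0 = b - A x_0 with x_0 = 0
    p = r[:]                  # first direction: steepest descent
    rs = dot(r, r)
    for _ in range(n):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)             # minimizes Q along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < 1e-24:                  # residual at machine precision
            break
        beta = rs_new / rs                  # makes p_{k+1} A-conjugate to p_k
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = conjugate_gradient(A, b)   # exact after at most 3 steps
```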

The above acceleration methods, i.e., those based upon splittings and those based upon gradients, respectively, can be referred to as the classical acceleration methods: relaxation methods and direction methods, respectively. Recently (1996) these methods have been sharpened in a number of ways. Two of the most important sharpenings are Chebyshev acceleration and preconditioning acceleration. Both methods often involve polynomial iterations

$$x_k = x_0 + p_{k-1}(A)r_0, \tag{a13}$$

where $p_{k-1}$ is some polynomial chosen from some knowledge or estimation of the eigenvalues of $A$. These methods therefore make use of results and techniques from both approximation theory and spectral theory.

Chebyshev methods depend upon the property of the Chebyshev polynomials of minimizing the maximum deviation from zero over an interval $[a, b]$. For example, for the conjugate-gradient method it may be shown that (see [a5])

$$\|x - x_k\|_A \;\leq\; \frac{\|x - x_0\|_A}{T_k\!\left(\frac{\kappa+1}{\kappa-1}\right)} \;\leq\; 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k} \|x - x_0\|_A, \tag{a14}$$

where $T_k$ is the Chebyshev polynomial of the first kind of degree $k$, where $\kappa = \kappa(A)$ is the condition number of (a10), and where $\|y\|_A = \langle y, Ay\rangle^{1/2}$. A disadvantage of Chebyshev methods in practice is their dependence upon knowledge or good estimates of the eigenvalues of $A$.
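A quick numerical check of the two error bounds, evaluating $T_k$ by its three-term recurrence; the condition number $\kappa = 100$ is an arbitrary illustrative choice:

```python
import math

def cheb_T(k, t):
    """Chebyshev polynomial T_k(t) of the first kind, computed with the
    three-term recurrence T_{k+1}(t) = 2 t T_k(t) - T_{k-1}(t)."""
    if k == 0:
        return 1.0
    t_prev, t_cur = 1.0, t
    for _ in range(k - 1):
        t_prev, t_cur = t_cur, 2.0 * t * t_cur - t_prev
    return t_cur

kappa = 100.0                  # assumed condition number
root = math.sqrt(kappa)
bounds = {
    k: (1.0 / cheb_T(k, (kappa + 1.0) / (kappa - 1.0)),  # Chebyshev bound
        2.0 * ((root - 1.0) / (root + 1.0)) ** k)        # simplified bound
    for k in (1, 5, 10, 20)
}
```

The Chebyshev expression is the sharper of the two, since $T_k\bigl(\frac{\kappa+1}{\kappa-1}\bigr) = \frac{1}{2}\,(c^k + c^{-k})$ with $c = \frac{\sqrt{\kappa}+1}{\sqrt{\kappa}-1} > 1$.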

Preconditioning has recently become the most important acceleration method. There are several preconditioning strategies, but the basic idea is to left or right multiply $A$ in (a1) by some preconditioning matrix $B$ so that the transformed system is easier to solve. For example, if $B$ approximates $A^{-1}$ in some correct sense of representing $A$'s main invertibility features, i.e., so that $AB$ approximates the identity in some good way, then (a1) is transformed by right preconditioning to the equivalent system

$$ABy = b, \qquad x = By. \tag{a15}$$

Left preconditioning yields

$$BAx = Bb. \tag{a16}$$

Two-sided preconditioning, by multiplicatively splitting $B$ into $B = B_1 B_2$, becomes

$$B_1 A B_2 y = B_1 b, \qquad x = B_2 y. \tag{a17}$$

Because the convergence rate of iterative methods generally depends on the condition number $\kappa$ of (a10), a rule of thumb has been to seek preconditioners $B$ such that the transformed system has a lower, hopefully much lower, condition number $\kappa$.
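A small illustration of this rule of thumb: symmetric two-sided diagonal scaling $D^{-1/2} A D^{-1/2}$ of a badly scaled $2\times 2$ symmetric positive-definite matrix, with eigenvalues computed in closed form; the matrix entries are assumptions made for the example:

```python
import math

def eig2_sym(a, b, d):
    """Eigenvalues (min, max) of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    disc = math.hypot((a - d) / 2.0, b)
    return mean - disc, mean + disc

# Badly scaled symmetric positive-definite matrix (illustrative entries)
a11, a12, a22 = 100.0, 1.0, 1.0
lo, hi = eig2_sym(a11, a12, a22)
kappa = hi / lo                    # roughly 101

# Two-sided Jacobi scaling D^{-1/2} A D^{-1/2}, a simple symmetric
# instance of split preconditioning as in (a17)
s1, s2 = 1.0 / math.sqrt(a11), 1.0 / math.sqrt(a22)
p11, p12, p22 = s1 * a11 * s1, s1 * a12 * s2, s2 * a22 * s2
lo_p, hi_p = eig2_sym(p11, p12, p22)
kappa_pre = hi_p / lo_p            # 11/9, about 1.22
```

Here simple diagonal scaling drops the condition number from about $101$ to about $1.22$, so any of the iterations above would converge far faster on the transformed system.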

There are roughly three generally successful theories for finding preconditioners: incomplete factorizations, Frobenius-norm methods and relaxation methods. The incomplete factorizations approximate $A$ by an approximate factorization $\tilde{L}\tilde{U}$ into lower- and upper-triangular factors, but insisting on sparse factors by dropping small elements. Then a conjugate-gradient method or a Krylov-subspace variant of it (e.g., schemes such as ORTHOMIN or GMRES) may converge well. The Frobenius-norm methods, roughly, seek a preconditioner $B$ such that the trace $\operatorname{tr}\bigl((I - AB)(I - AB)^{\mathrm{T}}\bigr) = \|I - AB\|_F^2$ (possibly weighted) is minimized. Often, an incomplete Cholesky factorization enters into applications of this method. Relaxation methods have been described above. Often, these are combined (see [a1]) with block Schur-complement decompositions of $A$. Relaxation methods such as (a6), (a7), (a8) are used today chiefly as preconditioners. Even the lowly Jacobi method (a6) can be a practical preconditioner on highly parallel computers when appropriately implemented.
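As a sketch of relaxation-as-preconditioner, the following pure-Python preconditioned conjugate-gradient loop takes the Jacobi (diagonal) preconditioner $B = D^{-1}$; the badly scaled $2\times 2$ test system is an illustrative assumption:

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def pcg(A, b, M_solve, max_iters=50):
    """Preconditioned conjugate gradient; M_solve applies B ~ A^{-1}."""
    x = [0.0] * len(b)
    r = b[:]                        # r_0 = b - A x_0 with x_0 = 0
    z = M_solve(r)                  # preconditioned residual
    p = z[:]
    rz = dot(r, z)
    for _ in range(max_iters):
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * Api for ri, Api in zip(r, Ap)]
        if dot(r, r) < 1e-20:       # converged
            break
        z = M_solve(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# Badly scaled illustrative system; B = D^{-1} rescales it.
A = [[100.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0]
x = pcg(A, b, lambda r: [r[0] / 100.0, r[1] / 1.0])
```

Only the application of $B$ changes between preconditioners, which is why incomplete factorizations and relaxation sweeps slot into the same loop.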

Acceleration methods are also used for the iterative solution of the eigenvalue-eigenvector problem

$$Ax = \lambda x. \tag{a18}$$

(a18) may be regarded as a special case of (a1) in which $b = \lambda x$ and the eigenvalue $\lambda$ is not known in advance. However, one must approximate the eigenvalue $\lambda$ as one is approximating the eigenvector $x$. Both Chebyshev and preconditioning acceleration methods enter into these (e.g., Arnoldi and Krylov) methods for eigenvalue-eigenvector computation. An important preconditioning method somewhat special to these problems is the so-called shift-and-invert method, using inverted shifts $(A - \sigma I)^{-1}$ to more widely disperse the eigenvalues (see [a4]).
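A minimal shift-and-invert sketch: power iteration applied to $(A - \sigma I)^{-1}$ converges to the eigenvector whose eigenvalue lies nearest the shift $\sigma$, because that eigenvalue of the inverted, shifted matrix is the most widely separated. The $2\times 2$ matrix and the shift below are illustrative assumptions:

```python
import math

def solve2(M, v):
    """Solve the 2x2 system M y = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - v[1] * M[0][1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

A = [[2.0, 1.0],
     [1.0, 2.0]]                   # eigenvalues 1 and 3
sigma = 0.9                        # shift close to the eigenvalue 1
S = [[A[0][0] - sigma, A[0][1]],
     [A[1][0], A[1][1] - sigma]]   # S = A - sigma*I

x = [1.0, 0.0]                     # arbitrary starting vector
for _ in range(30):
    y = solve2(S, x)               # y = (A - sigma*I)^{-1} x
    norm = math.hypot(y[0], y[1])
    x = [y[0] / norm, y[1] / norm]

# Rayleigh quotient <x, Ax> recovers the eigenvalue of A nearest sigma
Ax = [A[0][0] * x[0] + A[0][1] * x[1],
      A[1][0] * x[0] + A[1][1] * x[1]]
lam = Ax[0] * x[0] + Ax[1] * x[1]
```

Each step amplifies the wanted eigendirection by $1/|1 - \sigma| = 10$ versus $1/|3 - \sigma| \approx 0.48$ for the unwanted one, so convergence is rapid.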

There are other useful variations on preconditioning for accelerating convergence. Conjugate gradient can be applied as a preconditioner for any of the relaxation schemes. Other methods, such as multigrid schemes, have been successfully used as preconditioners (see [a3]). Preconditioning strategies can be cast in terms of a new theory of anti-eigenvalues rather than the eigenvalue condition number (see [a2] and Anti-eigenvalue). Finding optimal preconditioners for an important specific application often usefully involves specific physical details of that application. For this reason, in practice preconditioning is sometimes labeled "black magic".