6th June 2023, 5 min read

Theorem of Stein and Rosenberg

This post recaps content from Homer Walker in his handout, and Richard Varga's classic book "Matrix Iterative Analysis".

Below tables show the three standard iterations for solving the linear system $Ax=b$, with $A=(a_{ij})\in\mathbb{R}^{n\times n}$, and $x,b\in\mathbb{R}^n$; index $i$ is running $i=1(1)n$. Split $A$ into $A=D-L-U$, where $D$ is a diagonal matrix, $L$ is strictly lowertriangular, $U$ is strictly uppertriangular. $x^\nu$ is the $\nu$-th iterate.

Jacobi iteration	Gauss-Seidel iteration	SOR
$x_i^{\nu+1} = \left(b_i - \sum_{j\ne i}{a_{ij}x_j^\nu}\right)/a_{ii}$	$x_i^{\nu+1}=\left(b_i - \sum_{j<i}{a_{ij}x_j^{\nu+1}-\sum_{j>i}{a_{ij}x_j^\nu}}\right)/a_{ii}$	$x_i^{\nu+1}=(1-\omega)x_i+{\omega\over a_{ii}}\left(b_i - \sum_{j<i}{a_{ij}x_j^{\nu+1} - \sum_{j>i}{a_{ij}x_j^\nu}}\right)$
$x^{\nu+1}=D^{-1}[(L+U)x^\nu+b]$	$x^{\nu+1}=(D-L)^{-1}\left(Ux^\nu+b\right)$	$x^{\nu+1}=(D-\omega L)^{-1}\left\lbrace[(1-\omega)D+\omega U]x^\nu+\omega b\right\rbrace$
$T_J=D^{-1}(L+U)$	$T_1=(D-L)^{-1}U$	$T_\omega=(D-\omega L)^{-1}[(1-\omega)D+\omega U]$

The difference between Jacobi and Gauss-Seidel is that the Gauss-Seidel iteration already uses the newly computed elements $x_j$, for $j<i$. The SOR (Successive Over-Relaxation) does likewise but employs a "damping" or overshooting factor $\omega$ in addition to the Gauss-Seidel method. The case $\omega=1$ is the Gauss-Seidel method. $\omega>1$ is called overrelaxation, $\omega<1$ is called underrelaxation.

The iterations from above are called point iterations. If, instead, the $a_{ii}$ are taken as matrices, i.e., solve a linear equation in each iteration step, then the corresponding method is called a block-method.

Above iteration methods can be written as $x^{\nu+1}=Tx^{\nu}+c$, with $T\in\mathbb{R}^{n\times n}$. Let $\rho(T)$ be the spectral radius of $T$.

Theorem. The iterates $x^\nu$ converge if and only if $\rho(T)<1$.

Proof: See "Matrix Iterative Analysis", theorem 1.4 using Jordan normal form, or theorem 5.1 in Auzinger/Melenk (2017) using the equivalence of norms in finite dimensional normed vectorspaces. A copy is here Iterative Solution of Large Linear Systems.

Theorem (Stein-Rosenberg (1948)): If $a_{ij}\le0$ for $i\ne j$ and $a_{ii}>0$ for $i=1,\ldots,n$. Then, one and only one of the following mutually exclusive relations is valid:

$\rho(T_J) = \rho(T_1) = 0$.
$0<\rho(T_1)<\rho(T_J)<1$.
$1=\rho(T_J)=\rho(T_1)$.
$1<\rho(T_J)<\rho(T_1)$.

Thus, the Jacobi method and the Gauss-Seidel method are either both convergent, or both divergent. If they are convergent then the Gauss-Seidel is faster than the Jacobi method.

Proof: See §3.3 in "Matrix Iterative Analysis".

From R. Varga:

the Stein-Rosenberg theorem gives us our first comparison theorem for two different iterative methods. Interpreted in a more practical way, not only is the point Gauss-Seidel iterative method computationally more convenient to use (because of storage requirements) than the point Jacobi iterative matrix, but it is also asymptotically faster when the Jacobi matrix $T_J$ is non-negative

Philip Stein born 1890 in Lithuania, graduated in South Africa, died 1974 in London. R.L. Rosenberg was a coworker at the University Technical College of Natal, South Africa.

Theorem. If $A$ is strictly or irreducibly diagonally dominant, then, both the point Jacobi and the point Gauss-Seidel converge for any starting value.

Proof: See "Matrix Iterative Analysis" theorem 3.4 cleverly using the Stein-Rosenberg theorem and the Perron-Frobenius theorem. Alternatively, see theorem 5.5 in Auzinger/Melenk (2017). and just using computation.

The Stein-Rosenberg theorem compares Jacobi and Gauss-Seidel iteration under mild conditions. If those conditions are sharpened then quantitative comparisons between Jacobi and Gauss-Seidel iteration can be made. In particular, a relation between the eigenvalues of $T_J$ and $T_\omega$ can be given. Unrelated to Stein-Rosenberg but of general interest for the overrelaxation method is the theorem of Kahan (1958), which says that overrelaxation only converges for $0<\omega<2$, if at all. It is a necessary condition not a sufficient one.

Theorem (Kahan (1958)): $\rho(T_\omega)\ge\left|\omega-1\right|$. Equality holds only if all eigenvalues of $T_\omega$ are of modulus $|\omega-1|$.

Proof: See "Matrix Iterative Analysis" theorem 3.5.

Theorem (Ostrowski (1954)): Let $A=D-L-L^*$ be a Hermitian matrix and $D-\omega L$ is nonsingular for $0\le\omega\le2$. Then, $\rho(T_\omega)<1$ if and only if $A$ is positive definite and $0<\omega<2$.

Proof: See theorem 3.6 in "Matrix Iterative Analysis". The case $\omega=1$ was first proved by Reich (1949).

Corollary: If $A$ is positive definite then any splitting will converge provided $0<\omega<2$.

Proof: See Corollary 2 in §3.4 in "Matrix Iterative Analysis".

Theorem. Let $A=I-L-U$, where $L+U\ge0$ irreducible and $\rho(L+U)<1$, where $L$ and $U$ are strictly lower and upper triangular matrices. Then $\rho(T_\omega)<1$ and underrelaxation with $\omega\le1$ is not beneficial, i.e., $0<\omega_1<\omega_2\le1$ then

$$ 0\lt \rho(T_{\omega_2})\lt \rho(T_{\omega_1})\lt 1. $$

Proof: Theorem 3.16 in "Matrix Iterative Analysis" using the Stein-Rosenberg theorem.

The relation between the eigenvalues $\lambda$ of the overrelaxation method and the Jacobi iteration $\mu$ for consistently ordered $p$-cyclic matrices is

$$ (\lambda+\omega-1)^p = \lambda^{p-1}\omega^p\mu^p. \tag V $$

Theorem. Let $A$ be consistently ordered $p$-cyclic matrix with nonsingular submatrices. If $\omega\ne0$, $\lambda$ is a nonzero eigenvalue of $T_\omega$ and if $\mu$ satisfies (V), then $\mu$ is an eigenvalue of the block Jacobi matrix. Converseley, if $\mu$ is an eigenvalue of $T_J$ and $\lambda$ satisfies (V), then $\lambda$ is an eigenvalue of $T_\omega$.

Proof: See "Matrix Iterative Analysis" theorem 4.3, or Varga's paper "p-Cyclic Matrices: A Generalization Of The Young-Frankel Successive Overrelaxation Scheme", or see local copy.

From R. Varga:

the main result of this section is a functional relationship (V) between the eigenvalues $\mu$ of the block Jacobi matrix and the eigenvalues $\lambda$ of the block successive overrelaxation matrix. That such a functional relationship actually exists is itself interesting, but the importance of this result lies in the fact that it is the basis for the precise determination of the value of $\omega$ which minimizes $\rho(T_\omega)$.

For the special case $p=2$ we have:

Corollary: The Gauss-Seidel iteration is twice as fast as the Jacobi iteration: $\rho(T_1)=\rho(T_J)^2<1$. The relaxation factor $\omega$ that minimizes $\rho(T_\omega)$ is

$$ \omega = {2\over 1+\sqrt{1-\rho(T_J)^2} }. $$

For this $\omega$, the spectral radius $\rho(T_\omega)=\omega-1$.

So called M-matrices are relevant in the study of iteration methods. See M-matrix.