1st February 2021

For comparing different machines I invert the Hilbert matrix

$$
H = \left(\begin{array}{ccccc} 1 & {1\over2} & {1\over3} & \cdots & {1\over n} \\
{1\over2} & {1\over3} & {1\over4} & \cdots & {1\over n+1} \\
{1\over3} & {1\over4} & {1\over5} & \cdots & {1\over n+2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
{1\over n} & {1\over n+1} & {1\over n+2} & \cdots & {1\over2n-1}
\end{array} \right)
= \left( {\displaystyle {1\over i+j-1} } \right)_{ij}
$$

This matrix is known to have very high condition numbers. Program `xlu5.c`

stores four double precision matrices of dimension \(n\). Matrix `H`

and `A`

store the Hilbert matrix, `X`

is the identity matrix, `Y`

is the inverse of `H`

. Finally the maximum norm of \(I-H\cdot H^{-1}\) is printed, which should be zero. These four double precision matrices occupy roughly 1.6 MB for \(n=230\).

**1.** Runtime on Ryzen, AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics, max 3.7 GHz, as given by `lscpu`

.

```
$ time xlu5o3b 230 > /dev/null
real 0.79s
user 0.79s
sys 0
swapped 0
total space 0
```

Cache sizes within CPU are:

```
L1d cache: 128 KiB
L1i cache: 256 KiB
L2 cache: 2 MiB
L3 cache: 4 MiB
```

Required storage for above program is 4 matrices, each having 230x230 entries with double (8 bytes), giving 1692800 bytes, roughly 1.6 MB.

**2.** Runtime on AMD FX-8120, Bulldozer, max 3.1 GHz, as given by `lscpu`

.

```
$ time xlu5o3b 230 >/dev/null
real 1.75s
user 1.74s
sys 0
swapped 0
total space 0
```

Cache sizes within CPU are:

```
L1d cache: 64 KiB
L1i cache: 256 KiB
L2 cache: 8 MiB
L3 cache: 8 MiB
```

**3.** Runtime on Intel, Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz, max 2.6 GHz, as given by `lscpu`

.

```
$ time xlu5o3b 230 > /dev/null
real 1.68s
user 1.67s
sys 0
swapped 0
total space 0
```

Cache sizes within CPU are:

```
L1d cache: 64 KiB
L1i cache: 64 KiB
L2 cache: 512 KiB
L3 cache: 3 MiB
```

Apparently the Ryzen processor can outperform the Intel processor on cache, higher clock frequency. But even for smaller matrix sizes, e.g., 120, the Ryzen is two times faster.

Interestingly, the error in computations are different!

AMD and Intel machines run Arch Linux with kernel version 5.9.13, gcc was 10.2.0.

**4.** Runtime on Raspberry Pi 4, ARM Cortex-A72, max 1.5 GHz, as given by `lscpu`

.

```
$ time xlu5 230 > /dev/null
real 4.37s
user 4.36s
sys 0
swapped 0
total space 0
```

Linux 5.4.83 and GCC 10.2.0.

**5.** Runtime on Odroid XU4, Cortex-A7, max 2 GHz, as given by `lscpu`

.

```
$ time xlu5 230 > /dev/null
real 17.75s
user 17.60s
sys 0
swapped 0
total space 0
```

So the Raspberry Pi 4 is clearly way faster than the Odroid XU4.

**Categories: **C / C++, programming
**Tags: **, , ,
**Author: **Elmar Klausmeier