5th May 2020, 2 min read

Performance Comparison in Computing Exponential Function

If your computation is dominated by exponential function evaluations, then it makes a significant difference whether you evaluate the exponential function exp() in single precision or in double precision. In the measurements below, moving from double precision (double) to single precision (float) cut computing time by roughly 25% to 40%, depending on the CPU. Evaluation in quadruple precision (long double) was more than six times as expensive as evaluation in double precision.

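Changing from double precision to single precision also halves the amount of storage needed. On x86_64 Linux, float usually occupies 4 bytes, double occupies 8 bytes, and long double needs 16 bytes. Note that long double on x86_64 Linux is the 80-bit x87 extended format padded to 16 bytes, not IEEE 754 binary128; this post nevertheless calls it quadruple precision.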

1. Result. Here are the runtime numbers of the test program on the AMD FX-8120 Bulldozer.

Single precision (float): 2.44s
Double precision (double): 3.32s
Quadruple precision (long double): 22.88s

Precision	FX-8120 Bulldozer	Ryzen 5 PRO 3400G	Ryzen 7 5700G
`float`	2.44s	0.94s	0.68s
`double`	3.32s	1.55s	1.04s
`long double`	22.88s	10.70s	11.76s

It is quite surprising that the Ryzen 5 is slightly faster for quadruple precision than the Ryzen 7!

These numbers depend on CPU-internal scheduling; see CPU Usage Time Is Dependant on Load.

2. Test program. The test program is essentially as below:

long i, rep=1024, n=65000;
int c, precision='d';
float sf = 0;
double sd = 0;
long double sq = 0;
...
switch(precision) {
case 'd':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sd += exp(i % 53) - exp((i+1) % 43) - exp((i+2) % 47) - exp((i+3) % 37);
        printf("sd = %f\n",sd);
        break;
case 'f':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sf += expf(i % 53) - expf((i+1) % 43) - expf((i+2) % 47) - expf((i+3) % 37);
        printf("sf = %f\n",sf);
        break;
case 'q':
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sq += expl(i % 53) - expl((i+1) % 43) - expl((i+2) % 47) - expl((i+3) % 37);
        printf("sq = %Lf\n",sq);
        break;
}

The full source code is on GitHub; the file in question is called exptst.c.

3. Environment. The following hardware was used. On all machines the code was compiled with -O3 -march=native.

AMD Bulldozer FX-8120, 3.1 GHz, Arch Linux 5.6.8, gcc version 9.3.0.
AMD Ryzen 5 PRO 3400G, Arch Linux 6.1.5-arch2-1, gcc version 12.2.0.
AMD Ryzen 7 5700G, Arch Linux 6.1.7-arch1-1, gcc version 12.2.1 20230111.

Added 26-Jan-2023: Ryzen 5 and Ryzen 7 benchmarks.