
Performance Comparison in Computing Exponential Function

If your computation is dominated by evaluations of the exponential function, it makes a significant difference whether you evaluate exp() in single precision or in double precision. In the measurements below, moving from double precision (double) to single precision (float) reduced computing time by roughly 25% to 40%, depending on the CPU. Evaluation in long double, loosely called quadruple precision here, is more than six times as expensive as evaluation in double precision. (On x86_64 Linux, long double is the 80-bit x87 extended format, not true IEEE quadruple precision.)

Changing from double precision to single precision also halves the amount of storage needed. On x86_64 Linux float usually occupies 4 bytes, double occupies 8 bytes, and long double needs 16 bytes.

1. Result. Here are the runtime numbers of a test program.

Precision     FX-8120 Bulldozer   Ryzen 5 PRO 3400G   Ryzen 7 5700G
float         2.44s               0.94s               0.68s
double        3.32s               1.55s               1.04s
long double   22.88s              10.70s              11.76s

It is quite surprising that the Ryzen 5 is slightly faster than the Ryzen 7 for long double!

These numbers depend on the CPU's internal scheduling; see "CPU Usage Time Is Dependant on Load".

2. Test program. The test program is essentially as below:

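long i, rep=1024, n=65000;	// rep outer repetitions, n terms per repetition
int c, precision='d';	// precision selects the branch: 'f', 'd', or 'q'
float sf = 0;	// running sum in single precision
double sd = 0;	// running sum in double precision
long double sq = 0;	// running sum in long double
...
switch(precision) {
case 'd':	// double: exp()
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sd += exp(i % 53) - exp((i+1) % 43) - exp((i+2) % 47) - exp((i+3) % 37);
        printf("sd = %f\n",sd);	// print the sum so the loop cannot be optimized away
        break;
case 'f':	// float: expf()
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sf += expf(i % 53) - expf((i+1) % 43) - expf((i+2) % 47) - expf((i+3) % 37);
        printf("sf = %f\n",sf);
        break;
case 'q':	// long double: expl()
        while (rep-- > 0)
                for (i=0; i<n; ++i)
                        sq += expl(i % 53) - expl((i+1) % 43) - expl((i+2) % 47) - expl((i+3) % 37);
        printf("sq = %Lf\n",sq);
        break;
}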

The full source code is on GitHub; the file in question is called exptst.c.

3. Environment. The following hardware was used. On all machines the code was compiled with -O3 -march=native.

  1. AMD Bulldozer FX-8120, 3.1 GHz, Arch Linux 5.6.8, gcc version 9.3.0.
  2. AMD Ryzen 5 PRO 3400G, Arch Linux 6.1.5-arch2-1, gcc version 12.2.0.
  3. AMD Ryzen 7 5700G, Arch Linux 6.1.7-arch1-1, gcc version 12.2.1 20230111.

Added 26-Jan-2023: Added Ryzen 5 and 7 benchmarks.