Performance Comparison in Computing Exponential Function
If your computation is dominated by exponential function evaluations, then it makes a significant difference whether you evaluate the exponential function exp() in single precision or in double precision. In the measurements below, moving from double precision (double) to single precision (float) reduces computing time by roughly 25% to 40%, depending on the CPU. Evaluation in quadruple precision (long double) is more than six times more expensive than evaluation in double precision.
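Changing from double precision to single precision also halves the amount of storage needed. On x86_64 Linux float usually occupies 4 bytes, double occupies 8 bytes, and long double needs 16 bytes. Strictly speaking, long double on x86_64 Linux with gcc is normally the 80-bit x87 extended format padded to 16 bytes, not IEEE binary128, so "quadruple precision" in this post refers to the long double type.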
1. Result. Here are the runtime numbers of a test program on the FX-8120.
- Single precision (float): 2.44s
- Double precision (double): 3.32s
- Quadruple precision (long double): 22.88s
Precision | FX-8120 Bulldozer | Ryzen 5 PRO 3400G | Ryzen 7 5700G |
---|---|---|---|
float | 2.44s | 0.94s | 0.68s |
double | 3.32s | 1.55s | 1.04s |
long double | 22.88s | 10.70s | 11.76s |
It is quite surprising that the Ryzen 5 is slightly faster for quadruple precision than the Ryzen 7!
These numbers depend on the CPU's internal scheduling; see CPU Usage Time Is Dependant on Load.
2. Test program. The test program is essentially as below:
    long i, rep=1024, n=65000;
    int c, precision='d';
    float sf = 0;           /* accumulators, one per precision */
    double sd = 0;
    long double sq = 0;
    ...
    switch(precision) {
    case 'd':   /* double precision: exp() */
        while (rep-- > 0)
            for (i=0; i<n; ++i)
                sd += exp(i % 53) - exp((i+1) % 43) - exp((i+2) % 47) - exp((i+3) % 37);
        printf("sd = %f\n",sd);
        break;
    case 'f':   /* single precision: expf() */
        while (rep-- > 0)
            for (i=0; i<n; ++i)
                sf += expf(i % 53) - expf((i+1) % 43) - expf((i+2) % 47) - expf((i+3) % 37);
        printf("sf = %f\n",sf);
        break;
    case 'q':   /* long double: expl() */
        while (rep-- > 0)
            for (i=0; i<n; ++i)
                sq += expl(i % 53) - expl((i+1) % 43) - expl((i+2) % 47) - expl((i+3) % 37);
        printf("sq = %Lf\n",sq);
        break;
    }
The full source code is on GitHub; the file in question is called exptst.c.
3. Environment. The following hardware was used. On all machines the code was compiled with -O3 -march=native.
- AMD Bulldozer FX-8120, 3.1 GHz, Arch Linux 5.6.8, gcc version 9.3.0.
- Ryzen 5 PRO 3400G, Arch Linux 6.1.5-arch2-1, gcc version 12.2.0.
- Ryzen 7 5700G, Arch Linux 6.1.7-arch1-1, gcc version 12.2.1 20230111.
Added 26-Jan-2023: Added Ryzen 5 and 7 benchmarks.