CUDA Performance

I ran the commands below under different loads on my Gigabyte GTX 560 graphics card.

export LD_LIBRARY_PATH=$CUDA_PATH/lib64
time /usr/local/cuda/samples/sdk/0_Simple/matrixMul/matrixMul
time /usr/local/cuda/samples/sdk/0_Simple/matrixMulCUBLAS/matrixMulCUBLAS

I was interested in the GFlop/s value that each sample reports.
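To keep only that number from each run, the sample output can be filtered. This is a minimal sketch that assumes the samples print a line of the form "Performance= 127.03 GFlop/s, Time= ..." (check your own output, since the format may differ between CUDA versions). The function name gflops is hypothetical.

```shell
# Hypothetical helper: run the given command and print only the GFlop/s figure.
# Assumes the program prints a line like "Performance= 127.03 GFlop/s, Time= ...";
# lines that do not match are dropped (sed -n plus the p flag).
gflops() {
  "$@" | sed -n 's/.*Performance= *\([0-9.]*\) GFlop\/s.*/\1/p'
}

# Usage with the sample paths from above:
# gflops /usr/local/cuda/samples/sdk/0_Simple/matrixMul/matrixMul
# gflops /usr/local/cuda/samples/sdk/0_Simple/matrixMulCUBLAS/matrixMulCUBLAS
```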

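Reported performance in GFlop/s (n/a = not measured):

Test case                           Console   X11
matrixMul, no other load            127.03    n/a
matrixMul, GPUGrid                  82.67     127.05
matrixMul, GPUGrid + Chrome         n/a       83.52
matrixMulCUBLAS, no other load      451.85    442.50
matrixMulCUBLAS, GPUGrid            186.41    n/a
matrixMulCUBLAS, GPUGrid + Chrome   n/a       212.73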

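With no other load, matrix multiplication using CUBLAS is about 3.5 times faster than the plain matrixMul sample (451.85 vs. 127.03 GFlop/s on the console). With GPUGrid running, the advantage shrinks to about 2.3 to 2.5 times.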
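The speedup ratios can be recomputed directly from the table. This small awk sketch uses only the three conditions where both samples were measured; the function name speedups is made up for illustration.

```shell
# Speedup of matrixMulCUBLAS over plain matrixMul, from the GFlop/s table.
# Each data line: condition, matrixMul GFlop/s, matrixMulCUBLAS GFlop/s.
speedups() {
  awk '{ printf "%s: %.2fx\n", $1, $3 / $2 }' <<'EOF'
console/Nothing 127.03 451.85
console/GPUGrid 82.67 186.41
X11/GPUGrid+Chrome 83.52 212.73
EOF
}

speedups
```

This prints 3.56x for the idle console run, 2.25x with GPUGrid on the console, and 2.55x with GPUGrid and Chrome under X11.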

Furthermore, the more load you put on the graphics card, the slower the matrix multiplication: on the console, running GPUGrid alongside drops matrixMulCUBLAS from 451.85 to 186.41 GFlop/s.
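The size of this slowdown can be expressed as a percentage. This sketch (hypothetical function name slowdown) compares the console runs without and with GPUGrid, using the values from the table.

```shell
# Relative drop in GFlop/s when GPUGrid runs, compared with the idle console run.
# Each data line: sample/condition, idle GFlop/s, GFlop/s with GPUGrid running.
slowdown() {
  awk '{ printf "%s: %.1f%% slower\n", $1, 100 * (1 - $3 / $2) }' <<'EOF'
matrixMul/console 127.03 82.67
matrixMulCUBLAS/console 451.85 186.41
EOF
}

slowdown
```

The plain sample loses about 35% of its throughput, while the CUBLAS version loses nearly 59%.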

While GPUGrid is running, nvidia-smi returns the following:

Wed Jun 19 23:26:13 2013       
+------------------------------------------------------+                       
| NVIDIA-SMI 4.310.44   Driver Version: 310.44         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name                     | Bus-Id        Disp.  | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap| Memory-Usage         | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 560          | 0000:01:00.0     N/A |                  N/A |
| 54%   52C  N/A     N/A /  N/A |  30%  303MB / 1023MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    0            Not Supported                                               |
+-----------------------------------------------------------------------------+

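The temperatures and other readings reported by sensors were as follows (k10temp and fam15h_power are the AMD CPU's temperature and power sensors, it8721 is the motherboard's monitoring chip):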

fam15h_power-pci-00c4
Adapter: PCI adapter
power1:       39.34 W  (crit = 124.95 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1:        +18.5°C  (high = +70.0°C)
                       (crit = +90.0°C, hyst = +87.0°C)

it8721-isa-0290
Adapter: ISA adapter
in0:          +2.80 V  (min =  +2.41 V, max =  +2.32 V)  ALARM
in1:          +2.78 V  (min =  +0.19 V, max =  +2.09 V)  ALARM
in2:          +0.83 V  (min =  +0.01 V, max =  +1.03 V)
+3.3V:        +3.31 V  (min =  +2.88 V, max =  +4.63 V)
in4:          +0.34 V  (min =  +1.10 V, max =  +1.34 V)  ALARM
in5:          +2.52 V  (min =  +1.24 V, max =  +0.60 V)  ALARM
in6:          +2.35 V  (min =  +0.14 V, max =  +1.16 V)  ALARM
3VSB:         +4.82 V  (min =  +0.00 V, max =  +4.78 V)  ALARM
Vbat:         +3.34 V  
fan1:         437 RPM  (min =   39 RPM)
fan2:           0 RPM  (min =   22 RPM)  ALARM
fan3:        1421 RPM  (min =   17 RPM)
temp1:        +38.0°C  (low  = -47.0°C, high = +51.0°C)  sensor = thermistor
temp2:        +38.0°C  (low  = -101.0°C, high = -61.0°C)  ALARM  sensor = thermistor
temp3:       -128.0°C  (low  = -22.0°C, high = -11.0°C)  sensor = disabled
intrusion0:  OK

The outside temperature was 27 °C.