Tudor Tudor - 3 months ago 16
C Question

Timing CUDA operations

I need to time a CUDA kernel execution. The Best Practices Guide says that we can use either events or standard timing functions like

clock()
in Windows. My problem is that using these two functions gives me a totally different result.
In fact, the result given by events seems to be huge compared to the actual speed in practice.

What I actually need all this for is to be able to predict the running time of a computation by first running a reduced version of it on a smaller data set. Unfortunately, the results of this benchmark are totally unrealistic, being either too optimistic (
clock()
) or waaaay too pessimistic (events).

Answer

You could do something along the lines of :

#include<sys/time.h>

struct timeval t1, t2;

gettimeofday(&t1, 0);

kernel_call<<<dimGrid, dimBlock, 0>>>();

HANDLE_ERROR(cudaThreadSynchronize();)

gettimeofday(&t2, 0);

double time = (1000000.0*(t2.tv_sec-t1.tv_sec) + t2.tv_usec-t1.tv_usec)/1000000.0;

printf("Time to generate:  %3.1f ms \n", time);

or:

float time;
cudaEvent_t start, stop;

HANDLE_ERROR( cudaEventCreate(&start) );
HANDLE_ERROR( cudaEventCreate(&stop) );
HANDLE_ERROR( cudaEventRecord(start, 0) );

kernel_call<<<dimGrid, dimBlock, 0>>>();

HANDLE_ERROR( cudaEventRecord(stop, 0) );
HANDLE_ERROR( cudaEventSynchronize(stop) );
HANDLE_ERROR( cudaEventElapsedTime(&time, start, stop) );

printf("Time to generate:  %3.1f ms \n", time);