I have some code that tries to determine the execution time of a code block.
clock_t start_t, end_t, total_t;
start_t = clock(); //clock start
printf("Starting of the program, start_t = %ld\n", start_t);
printf("Going to scan a big loop, start_t = %ld\n", start_t);
for(i=0; i< 10000000; i++) { } //trying to determine execution time of this block
end_t = clock(); //clock stopped
printf("End of the big loop, end_t = %ld\n", end_t);
total_t = (long int)(end_t - start_t);
printf("Total time taken by CPU: %ld\n", (long)total_t );
Starting of the program, start_t = 8965
Going to scan a big loop, start_t = 8965
End of the big loop, end_t = 27259
Total time taken by CPU: 18294
The unit of time used by the clock function is arbitrary. On most platforms, it is unrelated to the processor speed. It's more commonly related to the frequency of an external timer interrupt (which may be configured in software) or to a historical value that's been kept for compatibility through years of processor evolution. You need to divide by the macro CLOCKS_PER_SEC to convert the result to seconds.
printf("Total time taken by CPU: %fs\n", (double)total_t / CLOCKS_PER_SEC);
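As a minimal sketch of that conversion wrapped around a piece of work: the helper names ticks_to_seconds, sample_work and cpu_seconds_for below are made up for illustration and are not part of the C library; only clock, clock_t and CLOCKS_PER_SEC are standard.

```c
#include <time.h>

/* Hypothetical helper: convert a number of clock() ticks to seconds. */
double ticks_to_seconds(clock_t ticks)
{
    return (double)ticks / CLOCKS_PER_SEC;
}

/* Hypothetical workload. The accumulator is volatile so that the
   compiler has to perform every iteration of the loop. */
void sample_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Hypothetical helper: CPU time, in seconds, spent in one call of work(). */
double cpu_seconds_for(void (*work)(void))
{
    clock_t start = clock();
    work();
    clock_t end = clock();
    return ticks_to_seconds(end - start);
}
```

Calling cpu_seconds_for(sample_work) from main and printing the result with %f gives a figure in seconds that does not depend on the platform's tick unit.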
The C standard library was designed to be implementable on a wide range of hardware, including processors that don't have an internal timer and rely on an external peripheral to tell the time. Many platforms have more precise ways to measure wall clock time than time and more precise ways to measure CPU consumption than clock. For example, on POSIX systems (e.g. Linux and other Unix-like systems), you can use getrusage, which has microsecond precision.
struct timeval start, end;
struct rusage usage;
getrusage(RUSAGE_SELF, &usage);
start = usage.ru_utime;
…
getrusage(RUSAGE_SELF, &usage);
end = usage.ru_utime;
printf("Total time taken by CPU: %fs\n", (double)(end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6);
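The same idea can be packaged into small functions. This is only a sketch, assuming a POSIX system; user_cpu_seconds and timeval_diff_seconds are hypothetical names chosen here, while getrusage, struct rusage and struct timeval are the real POSIX interfaces.

```c
#include <sys/resource.h>
#include <sys/time.h>

/* Hypothetical helper: user-mode CPU time consumed by this process so far,
   in seconds. Returns -1.0 if getrusage fails. */
double user_cpu_seconds(void)
{
    struct rusage usage;
    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1.0;
    return (double)usage.ru_utime.tv_sec
         + (double)usage.ru_utime.tv_usec / 1e6;
}

/* Hypothetical helper: difference between two timeval readings, in seconds.
   The microsecond difference may be negative; the sum is still correct. */
double timeval_diff_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}
```

Reading user_cpu_seconds() before and after the code under test and subtracting the two values gives the user CPU time of that code. Note that ru_utime covers user time only; system time is reported separately in ru_stime.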
clock_gettime with the clock ID CLOCK_PROCESS_CPUTIME_ID may give better precision; it reports time in nanoseconds.
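A possible way to use it, sketched under the assumption that the platform supports CLOCK_PROCESS_CPUTIME_ID (the helper names process_cpu_seconds and timespec_diff_seconds are invented for this example):

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* request the POSIX clock_gettime declaration */
#endif
#include <time.h>

/* Hypothetical helper: CPU time consumed by this process so far, in seconds.
   Returns -1.0 if the clock is not available. */
double process_cpu_seconds(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: difference between two timespec readings, in seconds. */
double timespec_diff_seconds(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) * 1e-9;
}
```

On some older systems the program may need to be linked with -lrt for clock_gettime to resolve.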
Note the difference between precision and accuracy: precision is the unit in which the values are reported, while accuracy is how close the reported values are to the real values. Unless you are working on a real-time system, there are no hard guarantees as to how long a piece of code takes, including the invocation of the measurement functions themselves.
Some processors have cycle clocks that count processor cycles rather than wall clock time, but this gets very system-specific.
Whenever making benchmarks, beware that what you are measuring is the execution of this particular executable on this particular CPU in these particular circumstances, and the results may or may not generalize to other situations. For example, the empty loop in your question will be optimized away by most compilers unless you turn optimizations off. Measuring the speed of unoptimized code is usually pointless. Even if you add real work in the loop, beware of toy benchmarks: they often don't have the same performance characteristics as real-world code. On modern high-end CPUs such as those found in PCs and smartphones, benchmarks of CPU-intensive code are often very sensitive to cache effects, and the results can depend on what else is running on the system, on the exact CPU model (due to different cache sizes and layouts), on the address at which the code happens to be loaded, etc.
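To get a feel for how much a measurement varies, one option is to repeat it and look at the spread. The following is a rough sketch, not a rigorous benchmarking method; sum_to and measure_spread are hypothetical names, and the volatile accumulator is just one simple way to keep the compiler from discarding the loop.

```c
#include <time.h>

/* Hypothetical workload: sums 0..n-1. The accumulator is volatile, so each
   addition must actually be performed even with optimizations enabled. */
unsigned long sum_to(unsigned long n)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < n; i++)
        sink += i;
    return sink;
}

/* Hypothetical helper: run the workload `runs` times and store the fastest
   and slowest CPU time, in seconds, through the two output pointers. */
void measure_spread(unsigned long n, int runs, double *fastest, double *slowest)
{
    *fastest = 0.0;
    *slowest = 0.0;
    for (int r = 0; r < runs; r++) {
        clock_t start = clock();
        sum_to(n);
        clock_t end = clock();
        double seconds = (double)(end - start) / CLOCKS_PER_SEC;
        if (r == 0 || seconds < *fastest)
            *fastest = seconds;
        if (r == 0 || seconds > *slowest)
            *slowest = seconds;
    }
}
```

A large gap between the fastest and slowest run is a hint that something other than the code itself (other processes, cache state, frequency scaling) is influencing the numbers.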