stefan - 1 month ago
Python Question

TensorFlow matmul calculations on GPU are slower than on CPU

I'm experimenting with GPU computations for the first time and was hoping for a big speed-up. However, with a basic example in TensorFlow, the GPU was actually slower:

On cpu:0, each of the ten runs takes 2 seconds on average; gpu:0 takes 2.7 seconds, and gpu:1 takes 3 seconds, which is 50% slower than cpu:0.

Here's the code:

import tensorflow as tf
import numpy as np
import time
import random

for _ in range(10):
    with tf.Session() as sess:
        start = time.time()
        with tf.device('/gpu:0'):  # swap for 'cpu:0' or whatever
            # Build two 1000x1000 constants from Python-generated random numbers
            a = tf.constant([random.random() for _ in xrange(1000 * 1000)], shape=[1000, 1000], name='a')
            b = tf.constant([random.random() for _ in xrange(1000 * 1000)], shape=[1000, 1000], name='b')
            c = tf.matmul(a, b)
            d = tf.matmul(a, c)
            e = tf.matmul(a, d)
            f = tf.matmul(a, e)
            for _ in range(1000):
                sess.run(f)  # evaluate the chained product
        end = time.time()
        print(end - start)

What am I observing here? Is run time maybe mainly dominated by copying data between RAM and GPU?


Your data is generated on the CPU: random.random() is a regular Python function, not a TensorFlow op. Calling it 10^6 times is also slower than requesting 10^6 random numbers in a single call. Change the code to:

a = tf.random_uniform([1000, 1000], name='a')
b = tf.random_uniform([1000, 1000], name='b')

so that the data is generated on the GPU in parallel and no time is spent transferring it from RAM to the GPU.
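Note that the timed region in the question starts before the constants are built, so the reported seconds include data generation as well as the matmul runs. One way to check which part dominates is to time the two phases separately. The following is only a minimal sketch using the Python 3 standard library, not TensorFlow: `time_phases`, `make_data` and `cheap_step` are hypothetical names introduced for illustration, and `cheap_step` merely stands in for the real work (`sess.run(f)` in the question).

```python
import random
import time


def time_phases(setup, step, repeats):
    """Time setup() once and step(data) `repeats` times, separately.

    Hypothetical helper for illustration.
    Returns (setup_seconds, run_seconds).
    """
    t0 = time.perf_counter()
    data = setup()
    t1 = time.perf_counter()
    for _ in range(repeats):
        step(data)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


def make_data(n=200 * 200):
    # Same pattern as the question: one Python-level call per element.
    return [random.random() for _ in range(n)]


def cheap_step(data):
    # Placeholder for the real computation (sess.run(f) in the question).
    return sum(data[:100])


setup_s, run_s = time_phases(make_data, cheap_step, repeats=10)
print("setup: %.4f s, run: %.4f s" % (setup_s, run_s))
```

Applied to the question's code, `setup` would correspond to building `a` and `b`, and `step` to one `sess.run(f)` call. If the setup time is the larger of the two, the measurement mostly reflects data generation rather than the device doing the multiplication.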