Ashwini Narayana Murthy - 4 months ago
C++ Question

CUDA: matrix multiplication using shared and global memory

I'm trying to multiply a 3x3 matrix with a 360x360 matrix. The smaller (3x3) matrix is multiplied with the first 3x3 block of the big matrix, then with the next block, and so forth. Hence I want to keep the smaller matrix constant and slide it over the bigger matrix.

Is it possible to store the smaller matrix in shared memory and keep the bigger matrix, divided into 3x3 blocks, in global memory?

I can't find a way to copy the smaller matrix into shared memory directly from the host. Please correct me if my understanding of CUDA is wrong.



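It is not possible to populate shared memory from the host. Shared memory is allocated per thread block and exists only while that block is running, so the kernel itself has to fill it (for example, from global or constant memory).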
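If you still want the 3x3 matrix in shared memory, each thread block can load it at the start of the kernel. Below is a minimal sketch, assuming both matrices are row-major float arrays and that "sliding" the small matrix means left-multiplying every 3x3 block of the big matrix by it. The names blockwiseMultiplyShared, runBlockwiseShared and SMALL are hypothetical.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int SMALL = 3;  // size of the small (constant) matrix

// Hypothetical kernel: every 3x3 block of the n x n matrix B is left-multiplied
// by A and written to the matching block of C. One thread per element of C;
// all matrices are row-major.
__global__ void blockwiseMultiplyShared(const float* A, const float* B,
                                        float* C, int n) {
    // Shared memory cannot be written from the host, so each thread block
    // copies the nine values of A from global memory into it first.
    __shared__ float sA[SMALL * SMALL];
    int tid = threadIdx.y * blockDim.x + threadIdx.x;
    if (tid < SMALL * SMALL) sA[tid] = A[tid];  // needs >= 9 threads per block
    __syncthreads();  // make sA visible to every thread in the block

    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;  // safe: after the only __syncthreads

    int i = row % SMALL;     // row index inside the 3x3 block
    int blockRow = row - i;  // first row of that block in B
    float sum = 0.0f;
    for (int k = 0; k < SMALL; ++k)
        sum += sA[i * SMALL + k] * B[(blockRow + k) * n + col];  // B read from global memory
    C[row * n + col] = sum;
}

// Hypothetical host helper: copies the inputs to the device, launches the
// kernel and copies C back. Error checking is omitted for brevity.
void runBlockwiseShared(const float* hA, const float* hB, float* hC, int n) {
    assert(n % SMALL == 0);  // the big matrix must tile into 3x3 blocks
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, SMALL * SMALL * sizeof(float));
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, SMALL * SMALL * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((n + threads.x - 1) / threads.x, (n + threads.y - 1) / threads.y);
    blockwiseMultiplyShared<<<blocks, threads>>>(dA, dB, dC, n);
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel

    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}
```

The 360x360 matrix stays in global memory and is indexed directly, so it does not have to be split into 3x3 pieces beforehand. Only the nine values of the small matrix are staged in shared memory, once per thread block.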

However, the best way to handle constants shared by all threads, such as the 3x3 matrix in your example, is to put them in constant memory (64 KB in size). There are two ways of using constant memory:

  • The easiest way is to use kernel arguments: define a struct containing your kernel arguments, including the 3x3 matrix, and pass it to your kernel by value. Kernel parameters are passed to the device via constant memory.
  • Declare the matrix with the __constant__ type qualifier and populate it from the host with cudaMemcpyToSymbol.
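Here is a sketch of both options. It assumes row-major float matrices and that each 3x3 block of the big matrix is left-multiplied by the small matrix. The names SmallMatrix, cA, the two kernels and runBlockwiseConstant are hypothetical. The __constant__ qualifier, cudaMemcpyToSymbol and by-value kernel arguments are standard CUDA.

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int SMALL = 3;

// Option 1: a struct passed by value as a kernel argument.
// Kernel parameters are passed to the device via constant memory.
struct SmallMatrix {
    float m[SMALL * SMALL];
};

// Option 2: a __constant__ array, written from the host with cudaMemcpyToSymbol.
__constant__ float cA[SMALL * SMALL];

// Both kernels: every 3x3 block of the n x n matrix B is left-multiplied by the
// small matrix. One thread per element of C, all matrices row-major.
__global__ void blockwiseMultiplyParam(SmallMatrix a, const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;
    int i = row % SMALL, blockRow = row - i;  // row in block, first row of block
    float sum = 0.0f;
    for (int k = 0; k < SMALL; ++k)
        sum += a.m[i * SMALL + k] * B[(blockRow + k) * n + col];
    C[row * n + col] = sum;
}

__global__ void blockwiseMultiplySymbol(const float* B, float* C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n) return;
    int i = row % SMALL, blockRow = row - i;
    float sum = 0.0f;
    for (int k = 0; k < SMALL; ++k)
        sum += cA[i * SMALL + k] * B[(blockRow + k) * n + col];
    C[row * n + col] = sum;
}

// Hypothetical host helper running either option; error checking omitted.
void runBlockwiseConstant(const float* hA, const float* hB, float* hC, int n,
                          bool useSymbol) {
    assert(n % SMALL == 0);  // the big matrix must tile into 3x3 blocks
    size_t bytes = size_t(n) * n * sizeof(float);
    float *dB, *dC;
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    dim3 threads(32, 8);  // 32-wide: each warp covers exactly one row of C
    dim3 blocks((n + threads.x - 1) / threads.x, (n + threads.y - 1) / threads.y);
    if (useSymbol) {
        cudaMemcpyToSymbol(cA, hA, sizeof(cA));  // host -> __constant__ memory
        blockwiseMultiplySymbol<<<blocks, threads>>>(dB, dC, n);
    } else {
        SmallMatrix a;
        for (int t = 0; t < SMALL * SMALL; ++t) a.m[t] = hA[t];
        blockwiseMultiplyParam<<<blocks, threads>>>(a, dB, dC, n);  // copied at launch
    }
    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(dB);
    cudaFree(dC);
}
```

With 32-wide thread blocks, all threads of a warp work on the same row of C. They therefore read the same element of the small matrix at the same time. Constant memory serves such uniform reads as a single broadcast, while reads of different addresses within a warp are serialized.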