Asked by S.G., 13 days ago
C++ Question

2D arrays with contiguous rows in heap memory for cudaMemcpy2D()

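CUDA documentation recommends allocating 2D arrays with cudaMallocPitch() and copying them with cudaMemcpy2D() (and similarly cudaMalloc3D() and cudaMemcpy3D() for 3D arrays) instead of using cudaMalloc() and cudaMemcpy(), for better performance, because the pitched allocation pads each row to meet the device's alignment requirements. On the other hand, all of the cudaMemcpy functions, just like memcpy(), require the host data to be allocated contiguously: cudaMemcpy2D() locates every row at a fixed byte offset (the pitch) from a single base pointer.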
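To make the contiguity requirement concrete, here is a host-only sketch of the row-by-row addressing that cudaMemcpy2D(dst, dpitch, src, spitch, width, height, kind) uses, where width and both pitches are in bytes and height is a row count. The helper copy2DPitched is hypothetical and only illustrates the addressing; it is not how CUDA implements the copy and it never touches a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical host-only helper mimicking the copy pattern of cudaMemcpy2D():
// copy `height` rows of `widthBytes` bytes each, where consecutive rows start
// `spitch` bytes apart in the source and `dpitch` bytes apart in the destination.
void copy2DPitched(void* dst, std::size_t dpitch,
                   const void* src, std::size_t spitch,
                   std::size_t widthBytes, std::size_t height) {
    // A row cannot be wider than the stride between rows.
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        // Every source row is found from ONE base pointer plus a fixed stride,
        // so all rows must live in the same allocation.
        std::memcpy(d + row * dpitch, s + row * spitch, widthBytes);
    }
}
```

Because each row is computed as base + row * spitch, an array whose rows come from separate new[] calls cannot be described by a single (src, spitch) pair.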

This is all fine if I create my (host) array as, for example, float myArray[h][w]. However, it will not work if I allocate each row separately, because the rows then live in unrelated blocks of memory:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];
}


This is not a big problem except when one is trying to integrate CUDA into an existing project, which is the problem I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, and use cudaMemcpy(); after the kernel launch I repeat the whole process in reverse to get the results back. This does not seem an elegant or efficient way.
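For reference, the temporary-1D-array workaround described above might look like the following sketch. The names packRows and unpackRows are hypothetical, and the sketch assumes the float** layout with rows allocated separately.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copy each separately allocated row into one flat
// h * w buffer (row i lands at offset i * w) before the host-to-device copy.
void packRows(float* const* rows, std::size_t h, std::size_t w, float* flat) {
    assert(rows != nullptr && flat != nullptr);
    for (std::size_t i = 0; i < h; ++i)
        std::memcpy(flat + i * w, rows[i], w * sizeof(float));
}

// Hypothetical helper: the reverse step after the device-to-host copy,
// scattering the flat buffer back into the row-pointer array.
void unpackRows(const float* flat, std::size_t h, std::size_t w, float** rows) {
    assert(rows != nullptr && flat != nullptr);
    for (std::size_t i = 0; i < h; ++i)
        std::memcpy(rows[i], flat + i * w, w * sizeof(float));
}
```

Each round trip therefore costs two extra host copies of h * w floats on top of the transfers themselves, which is the overhead a contiguous layout avoids.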

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows so that I can use cudaMemcpy2D()?

P.S.: I couldn't find an answer to this question in previous similar posts.


Answer

Allocate one big array of h * w elements, then use pointer arithmetic to find where each row begins.

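float* bigArray = new float[h * w];    // one contiguous block for all rows
float** myArray2 = new float*[h];      // row pointers into bigArray
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];     // row i starts i * w elements in
}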

The myArray2 array of pointers gives you C/C++-style two-dimensional indexing (myArray2[i][j]), while bigArray gives you the contiguous block of memory that CUDA needs.
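A self-contained version of this layout, including the matching cleanup, might look like the sketch below. The function names allocContiguous2D and freeContiguous2D are hypothetical; the points worth noting are that the block is released through myArray2[0] and that myArray2[0] is the pointer to hand to CUDA.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: one contiguous block of h * w floats (bigArray)
// plus h row pointers into it (myArray2). myArray2[0] == bigArray.
float** allocContiguous2D(std::size_t h, std::size_t w) {
    assert(h > 0 && w > 0);
    float* bigArray = new float[h * w];    // contiguous storage for CUDA
    float** myArray2 = new float*[h];      // C/C++-style row access
    for (std::size_t i = 0; i < h; ++i)
        myArray2[i] = &bigArray[i * w];    // row i begins i * w elements in
    return myArray2;
}

// Hypothetical helper: free the block (reachable through row 0),
// then the array of row pointers.
void freeContiguous2D(float** myArray2) {
    if (myArray2 == nullptr) return;
    delete[] myArray2[0];
    delete[] myArray2;
}
```

With a = allocContiguous2D(h, w), existing code can keep using a[i][j]. Assuming a pitched device buffer devPtr with pitch devPitch obtained from cudaMallocPitch() (both names hypothetical), the upload would then be cudaMemcpy2D(devPtr, devPitch, a[0], w * sizeof(float), w * sizeof(float), h, cudaMemcpyHostToDevice), with the source pitch equal to the row width because the host rows have no padding.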
