S.G. S.G. - 4 months ago 29
C++ Question

2D arrays with contiguous rows on the heap for cudaMemcpy2D()

CUDA documentation recommends the use of cudaMallocPitch() for 2D arrays (and similarly cudaMalloc3D() for 3D arrays) instead of cudaMalloc() for better performance, as the former allocates device memory more appropriately. On the other hand, all cudaMemcpy functions, just like memcpy(), require contiguous allocation of memory.

This is all fine if I create my (host) array as, for example, float myArray[h][w]. However, it most likely will not work if I use something like:

float** myArray2 = new float*[h];
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = new float[w];   // each row is a separate heap allocation
}

This is not a big problem except when one is trying to introduce CUDA into an existing project, which is the problem I am facing. Right now, I create a temporary 1D array, copy the contents of my 2D array into it, and use cudaMemcpy2D(). I then repeat the whole process in reverse to get the results back after the kernel launch, but this seems neither elegant nor efficient.
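For reference, the staging workaround described above might look roughly like the sketch below. The helper names flatten and unflatten are hypothetical and only illustrate the extra copies; the actual CUDA transfer would happen between the two calls.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: copy h separately allocated rows of w floats
// into one contiguous row-major buffer.
std::vector<float> flatten(float* const* rows, std::size_t h, std::size_t w) {
    assert(rows != nullptr);
    std::vector<float> flat(h * w);
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < w; ++j)
            flat[i * w + j] = rows[i][j];
    return flat;
}

// Hypothetical helper: copy the contiguous buffer back into the rows,
// for example after the results have been fetched from the device.
void unflatten(const std::vector<float>& flat, float* const* rows,
               std::size_t h, std::size_t w) {
    assert(rows != nullptr && flat.size() == h * w);
    for (std::size_t i = 0; i < h; ++i)
        for (std::size_t j = 0; j < w; ++j)
            rows[i][j] = flat[i * w + j];
}
```

Each transfer therefore costs an extra pass over all h * w elements on the host, which is the overhead I would like to avoid.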

Is there a better way to handle this situation? Specifically, is there a way to create a genuine 2D array on the heap with contiguously allocated rows so that I can use cudaMemcpy2D()?

P.S.: I couldn't find the answer to this question in the following previous similar posts:


Allocate one big contiguous array, then use pointer arithmetic to find the beginning of each row.

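float* bigArray = new float[h * w];    // one contiguous block for all rows
float** myArray2 = new float*[h];      // table of row pointers
for( int i = 0 ; i < h ; i++ ){
   myArray2[i] = &bigArray[i * w];     // row i starts i * w elements in
}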

The myArray2 array of pointers gives you C/C++-style two-dimensional indexing (myArray2[i][j]), while bigArray gives you the contiguous block of memory needed by CUDA.
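
To keep the two allocations together and release them reliably, the same idea could be wrapped in a small class. This is only a sketch: Contiguous2D and pitchBytes are hypothetical names, not part of CUDA or of the code above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper: one contiguous h x w block of floats plus a
// table of row pointers into it, so that arr[i][j] indexing still works.
struct Contiguous2D {
    std::size_t h, w;
    float*  data;  // contiguous storage, h * w floats, zero-initialized
    float** rows;  // rows[i] points at the start of row i inside data

    Contiguous2D(std::size_t h_, std::size_t w_)
        : h(h_), w(w_), data(new float[h_ * w_]()), rows(new float*[h_]) {
        for (std::size_t i = 0; i < h; ++i)
            rows[i] = data + i * w;
    }
    ~Contiguous2D() { delete[] rows; delete[] data; }

    // Copying is disabled so the two buffers are never freed twice.
    Contiguous2D(const Contiguous2D&) = delete;
    Contiguous2D& operator=(const Contiguous2D&) = delete;

    float* operator[](std::size_t i) { assert(i < h); return rows[i]; }

    // Distance in bytes between the starts of consecutive rows on the host.
    std::size_t pitchBytes() const { return w * sizeof(float); }
};
```

With such a wrapper, arr.data and arr.pitchBytes() would be the host pointer and host pitch passed to cudaMemcpy2D(), with the copy width given in bytes (w * sizeof(float)) and the height given as h rows. Existing code that expects a float** can keep using arr.rows.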