Thebeginner - 1 month ago
C++ Question

Passing variables between kernels in OpenCL 1.2 / Communication between kernels

I am relatively new to OpenCL. I am using the OpenCL 1.2 C++ wrapper. Say I have the following problem: I have three integer values a, b, and c all declared on the host

int a = 1;
int b = 2;
int c = 3;
int help;
int d;


with d being my result and help being a help variable.

I want to calculate d = (a + b)*c. To do this, I now have two kernels called 'add' and 'multiply'.

Currently, I do this as follows (please don't be confused by my pointer-oriented style of programming). First, I create my buffers:

bufferA = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferB = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferC = new cl::Buffer(*context, CL_MEM_READ_ONLY, buffer_length);
bufferHelp = new cl::Buffer(*context, CL_MEM_READ_WRITE, buffer_length);
bufferD = new cl::Buffer(*context, CL_MEM_WRITE_ONLY, buffer_length);


Then, I set my kernel arguments for the addition kernel

add->setArg(0, *bufferA);
add->setArg(1, *bufferB);
add->setArg(2, *bufferHelp);


and for the multiplication kernel

multiply->setArg(0, *bufferC);
multiply->setArg(1, *bufferHelp);
multiply->setArg(2, *bufferD);


Then I enqueue my data for the addition

queueAdd->enqueueWriteBuffer(*bufferA, CL_TRUE, 0, datasize, &a);
queueAdd->enqueueWriteBuffer(*bufferB, CL_TRUE, 0, datasize, &b);
queueAdd->enqueueNDRangeKernel(*add, cl::NullRange, global[0], local[0]);
queueAdd->enqueueReadBuffer(*bufferHelp, CL_TRUE, 0, datasize, &help);


and for the multiplication

queueMult->enqueueWriteBuffer(*bufferC, CL_TRUE, 0, datasize, &c);
queueMult->enqueueWriteBuffer(*bufferHelp, CL_TRUE, 0, datasize, &help);
queueMult->enqueueNDRangeKernel(*multiply, cl::NullRange, global[0], local[0]);
queueMult->enqueueReadBuffer(*bufferD, CL_TRUE, 0, datasize, &d);


This works fine. However, I do not want to copy the value of help back to the host and then onto the device again. To avoid this, I thought of three possibilities:


  1. a global variable for help on the device side. Doing this, both kernels could access the value of help at any time.

  2. kernel add calling kernel multiply at runtime. We then would insert the value for c into the add kernel and pass both help and c over to the multiply kernel as soon as the addition has finished.

  3. Simply pass the value of help over to the multiplication kernel. What I am looking for here is something like the pipe object available in OpenCL 2.0. Does anybody know of something similar for OpenCL 1.2?



I would be very thankful if somebody could propose the smoothest way to solve my problem!

Thanks in advance!

Answer

There is no need to read bufferHelp back to the host and write it again. Just leave it in device memory. Option 1 of your proposed solutions is how cl::Buffer objects already behave: they live in global device memory.

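The following is equivalent to your code and produces the same result, provided the multiply kernel only starts after the add kernel has finished. On one in-order queue that ordering is automatic; across two separate queues it must be enforced, for example with an event from the add kernel in the wait list of the multiply kernel. Without the round trip, the enqueue calls are: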

queueAdd->enqueueWriteBuffer(*bufferA, CL_FALSE, 0, datasize, &a);
queueAdd->enqueueWriteBuffer(*bufferB, CL_FALSE, 0, datasize, &b);
queueAdd->enqueueNDRangeKernel(*add, cl::NullRange, global[0], local[0]);
//queueAdd->enqueueReadBuffer(*bufferHelp, CL_FALSE, 0, datasize, &help);

queueMult->enqueueWriteBuffer(*bufferC, CL_FALSE, 0, datasize, &c);
//queueMult->enqueueWriteBuffer(*bufferHelp, CL_FALSE, 0, datasize, &help);
queueMult->enqueueNDRangeKernel(*multiply, cl::NullRange, global[0], local[0]);
queueMult->enqueueReadBuffer(*bufferD, CL_TRUE, 0, datasize, &d);

NOTE: I also changed the blocking write calls to non-blocking ones. This can improve speed, because the copy of buffer C and the execution of kernel "add" can overlap. The host variables a, b and c must stay valid until those writes have completed.
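
To make the point about device-resident buffers concrete, here is a minimal sketch of what the two kernels might look like. The original kernels are not shown in the question, so the bodies, the parameter names, and the file layout are assumptions; only the argument order is taken from the setArg calls above. The small shim at the top is also an assumption, added only so the same file compiles as plain C for a quick host-side check.

```c
/* Hypothetical kernel file; the question does not show the real kernels. */
#ifndef __OPENCL_VERSION__
/* Host-side shim (assumption): lets this file compile as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#include <stddef.h>
#define __kernel
#define __global
static size_t get_global_id(unsigned int dim) { (void)dim; return 0; }
#endif

/* help[i] = a[i] + b[i]
   Argument order follows add->setArg(0, A), (1, B), (2, Help). */
__kernel void add(__global const int *a,
                  __global const int *b,
                  __global int *help)
{
    size_t i = get_global_id(0);
    help[i] = a[i] + b[i];
}

/* d[i] = c[i] * help[i]
   Argument order follows multiply->setArg(0, C), (1, Help), (2, D).
   help is read straight from global memory, where add left it. */
__kernel void multiply(__global const int *c,
                       __global const int *help,
                       __global int *d)
{
    size_t i = get_global_id(0);
    d[i] = c[i] * help[i];
}
```

Both kernels take help as a pointer into global memory, so whatever add writes into bufferHelp is still there when multiply runs, with no copy through the host. With a = 1, b = 2, c = 3 and a single work-item, help ends up as 3 and d as 9.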
