user3195492 - 4 years ago
C++ Question

How to implement a compression table in CUDA?

I'm trying to optimize my C++ code and would like to know whether there is a way to store a lookup table on the GPU with CUDA C. The current C++ code that builds the table is:

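#include <cmath>

double m_alpha = 0.5;
unsigned char* compressionTable = new unsigned char[65536];
// Scale so that the largest input (65535) maps to 255.
double denom = exp(m_alpha * log(65535.0)) / 255.0;
compressionTable[0] = 0;  // log(0) is -inf, so set the first entry explicitly
for (unsigned int i = 1; i < 65536; ++i)
    compressionTable[i] = static_cast<unsigned char>(exp(m_alpha * log(static_cast<double>(i))) / denom);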

I then access this table in a loop:

bmode[i][j] = compressionTable[static_cast<unsigned int>(round(abs(sH[i][j])))];

sH is the Hilbert transform (a complex array) of an array of short int data, which is why the compression table has 2^16 entries. The loop that accesses the table is not a trivial problem either, but my main question is how to implement the compressionTable so that it is fast. I would appreciate any help.

Answer Source

If you really need to use a lookup table, on a GPU with compute capability 2.0 (SM 2.0) or higher you should just put it in device memory and let the caches handle the memory traffic. For lookup tables, the other memory spaces don't work any better than L1/L2.

But this looks like a case where an optimization that works well on CPUs is not needed at all on GPUs. CUDA hardware can compute single-precision logarithms and exponentials with a latency of just 4 clock cycles. Rewrite your algorithm to do the computation inline instead of using a lookup table. The resulting code will have less data-dependent performance, and the memory subsystem will be freed up to service the memory traffic that the kernel actually needs.
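
To illustrate the inline approach, here is a minimal host-side C++ sketch that computes the same mapping in single precision and compares it against the table from the question. The helper names (buildCompressionTable, compressInline, maxTableMismatch) are hypothetical and not from the original post, and the clamp to 255 is an added assumption to keep out-of-range magnitudes safe. Inside a CUDA kernel the same expression could presumably be marked __device__ and use the fast intrinsics __expf and __logf, but that part is not shown or tested here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

// Hypothetical helper: rebuilds the 65536-entry table from the question.
std::vector<unsigned char> buildCompressionTable(double alpha) {
    std::vector<unsigned char> table(65536);
    const double denom = std::exp(alpha * std::log(65535.0)) / 255.0;
    table[0] = 0;  // log(0) is -inf, so the first entry is set explicitly
    for (unsigned int i = 1; i < 65536; ++i) {
        table[i] = static_cast<unsigned char>(
            std::exp(alpha * std::log(static_cast<double>(i))) / denom);
    }
    return table;
}

// Hypothetical helper: the same mapping computed inline in single precision.
// scale is 255 / 65535^alpha and only needs to be computed once.
// A magnitude of 0 gives log(0) = -inf and exp(-inf) = 0, so the result is 0.
inline unsigned char compressInline(float magnitude, float alpha, float scale) {
    const float v = scale * std::exp(alpha * std::log(magnitude));
    // Assumption: clamp so that magnitudes above 65535 still map to 255.
    return static_cast<unsigned char>(std::min(v, 255.0f));
}

// Largest absolute difference between the table and the inline computation
// over all 65536 possible indices.
int maxTableMismatch(double alpha) {
    const std::vector<unsigned char> table = buildCompressionTable(alpha);
    const float alphaF = static_cast<float>(alpha);
    const float scale = 255.0f / std::exp(alphaF * std::log(65535.0f));
    int worst = 0;
    for (unsigned int i = 0; i < 65536; ++i) {
        const int fromTable = table[i];
        const int fromInline = compressInline(static_cast<float>(i), alphaF, scale);
        worst = std::max(worst, std::abs(fromTable - fromInline));
    }
    return worst;
}
```

Calling maxTableMismatch(0.5) reports how far the single-precision result ever strays from the table. Because both versions truncate to an integer, rounding differences can shift a value across a truncation boundary, so a difference of one grey level is possible at a few indices.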
