I'm trying to optimize my C++ code, I don't know if there is a way to store a table in GPU with CUDA-C. The current code in C++ of the table is:
double m_alpha = 0.5;
unsigned char* compressionTable = new unsigned char;
double denom = exp(m_alpha * log(65535.0)) / 255.0;
for (unsigned int i = 0; i < 65536; ++i)
compressionTable[i] = exp(m_alpha * log(i)) / denom;
bmode[i][j] = compressionTable[round(abs(sH[i][j]))];
If you really need to use a lookup table, on a GPU with SM 2.0 or higher, you should just put it in device memory and let the caches handle the memory traffic. For lookup tables, the other memory spaces don't work any better than L1/L2.
But this looks like a case where an optimization that works well on CPUs, is not needed at all on GPUs. CUDA hardware can compute single precision logarithms and exponentials with a latency of just 4 clock cycles. Rewrite your algorithm to do the computation in-line instead of using a lookup table. The resulting code will have less data-dependent performance, and the memory subsystem will be freed up to service memory traffic that's actually needed to run the kernel.