Serge Rogatch Serge Rogatch - 2 months ago 22
C++ Question

How to define CUDA device constant like a C++ const/constexpr?

In a .cu file I've tried the following in the global scope (i.e. not in a function):

__device__ static const double cdInf = HUGE_VAL / 4;


And got nvcc error:

error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.


How to define a C++ const/constexpr on the device, if that's possible?

NOTE1:
#define
is out of question not only for aesthetic reasons, but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.

NOTE2: I doubt the performance of
__constant__
because it's not a compile-time constant, but rather like a variable written with
cudaMemcpyToSymbol
.

Answer

Use a constexpr __device__ function:

#include <stdio.h>
__device__ constexpr double cdInf() { return HUGE_VAL / 4; }
__global__ void print_cdinf() { printf("in kernel, cdInf() is %lf\n", cdInf()); }
int main() { print_cdinf<<<1, 1>>>(); return 0; }

The PTX should be something like:

.visible .entry print_cdinf()(

)
{
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .b32       %r<2>;
        .reg .b64       %rd<7>;


        mov.u64         %rd6, __local_depot0;
        cvta.local.u64  %SP, %rd6;
        add.u64         %rd1, %SP, 0;
        cvta.to.local.u64       %rd2, %rd1;
        mov.u64         %rd3, 9218868437227405312;
        st.local.u64    [%rd2], %rd3;
        mov.u64         %rd4, $str;
        cvta.global.u64         %rd5, %rd4;
        // Callseq Start 0
        {
        .reg .b32 temp_param_reg;
        // <end>}
        .param .b64 param0;
        st.param.b64    [param0+0], %rd5;
        .param .b64 param1;
        st.param.b64    [param1+0], %rd1;
        .param .b32 retval0;
        call.uni (retval0), 
        vprintf, 
        (
        param0, 
        param1
        );
        ld.param.b32    %r1, [retval0+0];

        //{
        }// Callseq End 0
        ret;
}

With no code for the constexpr function. You could also use a constexpr __host__ function, but that's experimental in CUDA 7: use the nvcc command-line options seems to be --expt-relaxed-constexpr and see here for more details (thanks @harrism).

Comments