cukier9a7b5 - 1 month ago
C++ Question

C/C++ "inline" keyword in CUDA device-side code

I am a total "newbie" when it comes to CUDA, so pardon me if my question is trivial.

Does nvcc understand the meaning of the C inline keyword?

I know about __forceinline__ and similar nvcc "macros", therefore I am not asking how to write inline CUDA device-side code.

I also know that my code is "split" between nvcc and the C/C++ compiler (I am using the Visual Studio IDE). Does that mean the inline keyword is ignored by nvcc when it "stands next to" __device__ functions or __global__ kernels?



Edit:

P.S. I searched the CUDA programming guide. I could not find anything useful under the inline entry, and similar "tags" did not help either.

Answer

CUDA is a programming language in the C++ family. The CUDA documentation therefore generally does not duplicate standard C++ documentation; it merely points out differences and extensions. If you can't find a description of the inline specifier for functions in the CUDA documentation, that is a good indication that it is processed in the standard C++ fashion.

Reading between the lines of your question, it seems you are mostly concerned with how the use of inline affects the actual inlining of functions in the generated code.

The ISO C++11 standard specifies inline as a function specifier in section 7.1.2. Besides provisions about linkage and duplicate definitions, it states the following about the actual inlining of functions declared with the inline specifier:

The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call;

So inline is merely a suggestion to the compiler, which it is free to ignore. Since the CUDA compiler inlines functions aggressively in device code by default (for performance reasons), the use of inline seems quite redundant for device code, but programmers are free to use it.
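As a minimal sketch of the point above, the standard inline specifier can simply sit next to the CUDA function qualifiers. The names HOST_DEVICE, squared, sum_of_squares, and sum_of_squares_kernel are made up for illustration, and the fallback macro is only there so the same file also builds with a plain host compiler:

```cpp
// Under nvcc, __CUDACC__ is defined and the real CUDA qualifiers are used.
// The empty fallback lets a plain C++ compiler build the same file.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper. 'inline' is the standard C++ specifier: a hint for
// inline substitution that also permits the definition to live in a header
// included from several translation units.
HOST_DEVICE inline float squared(float x)
{
    return x * x;
}

// Hypothetical caller. Whether squared() is actually inlined here is up to
// the compiler's own heuristics.
HOST_DEVICE inline float sum_of_squares(float a, float b)
{
    return squared(a) + squared(b);
}

#ifdef __CUDACC__
// Hypothetical kernel that calls the helper from device code.
__global__ void sum_of_squares_kernel(const float* a, const float* b,
                                      float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = sum_of_squares(a[i], b[i]);
    }
}
#endif
```

Dropping inline from the two helpers would most likely not change the generated device code, given the aggressive default inlining described above. What the specifier still does is allow the definitions to appear in more than one translation unit.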

The inlining heuristics used by the CUDA compiler may prevent inlining of a particular function that a programmer would like to have inlined under all circumstances. For this purpose, CUDA provides the non-standard __forceinline__ function qualifier. This qualifier affects both device code and host code, as nvcc translates it into the equivalent host-compiler-specific attribute for host code, such as __forceinline for MSVC. This can be verified by dumping and inspecting the intermediate C++ files that nvcc sends to the host compiler.
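A rough sketch of how __forceinline__ and its counterpart __noinline__ might be used follows. The names FORCE_INLINE, NO_INLINE, clamp_to_byte, and brighten are hypothetical, and the fallback definitions for a plain host compiler are an assumption made so the example builds without nvcc:

```cpp
// With nvcc, use the CUDA qualifiers directly. The fallbacks are an
// illustrative assumption for building the file with a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE  __host__ __device__
#define FORCE_INLINE __forceinline__
#define NO_INLINE    __noinline__
#else
#define HOST_DEVICE
#define FORCE_INLINE inline
#define NO_INLINE
#endif

// Hypothetical helper that should be inlined regardless of the compiler's
// heuristics.
HOST_DEVICE FORCE_INLINE int clamp_to_byte(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

// Hypothetical function that asks the compiler not to inline it, for
// example to keep code size down at a call-heavy site.
HOST_DEVICE NO_INLINE int brighten(int pixel, int delta)
{
    return clamp_to_byte(pixel + delta);
}
```

To inspect what nvcc hands to the host compiler, the intermediate files can be kept with nvcc's --keep option and then searched for the host-side spelling of the attribute.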