I am a total newbie when it comes to CUDA, so please pardon me if my question is trivial.
Does nvcc understand the meaning of
CUDA is a programming language in the C++ family. Therefore, the CUDA documentation generally does not duplicate standard C++ documentation; it merely points out differences and extensions. If you can't find a description of the use of the inline specifier with functions in the CUDA documentation, that is a good indication that it is processed in the standard C++ fashion.
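A minimal sketch of what "processed in the standard C++ fashion" means in practice. The names square and squareKernel are hypothetical. The inline specifier is written exactly as in ordinary C++, and because the function is also marked __host__ __device__, the same definition is usable from host code and from a kernel. Putting such a function in a header shared by several .cu files would be legal for the usual C++ reason: an inline function may be defined in more than one translation unit.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper. 'inline' carries its ordinary C++ meaning:
// a hint to prefer inline substitution, plus permission to define
// the function in every translation unit that includes it.
inline __host__ __device__ int square(int x) { return x * x; }

// Hypothetical kernel that squares each element in place.
__global__ void squareKernel(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = square(data[i]);
}

int main() {
    // Host-side call, compiled by the host compiler like any inline function.
    printf("host: square(7) = %d\n", square(7));

    const int n = 4;
    int* data = nullptr;
    if (cudaMallocManaged(&data, n * sizeof(int)) != cudaSuccess) {
        printf("no usable CUDA device; skipping kernel\n");
        return 0;
    }
    for (int i = 0; i < n; ++i) data[i] = i + 1;

    squareKernel<<<1, n>>>(data, n);
    // Catch launch errors first, then errors raised while the kernel ran.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) printf("device: %d\n", data[i]);
    } else {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    }
    cudaFree(data);
    return 0;
}
```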
Piecing together the various parts of your question, it seems you are mostly concerned with how the use of inline affects the actual inlining of functions in the generated code.
The ISO C++11 standard specifies inline as a function specifier in section 7.1.2. Besides provisions about linkage and duplicate definitions, it states the following about the actual inlining of functions:
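"The inline specifier indicates to the implementation that inline substitution of the function body at the point of call is to be preferred to the usual function call mechanism. An implementation is not required to perform this inline substitution at the point of call; however, even if this inline substitution is omitted, the other rules for inline functions defined by 7.1.2 shall still be respected."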
In other words, inline is merely a suggestion to the compiler, which it is free to ignore. Since the CUDA compiler already inlines functions aggressively in device code by default (for performance reasons), using inline in device code seems largely redundant, but programmers are free to use it.
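As a rough illustration of that redundancy, here is a sketch with two hypothetical helpers, scaleInline and scalePlain, that differ only in the inline keyword. Under nvcc's default optimization both are typically inlined into the kernel, so the keyword usually makes no difference to the generated device code; this depends on the compiler's heuristics and is not guaranteed. They are marked __host__ __device__ only so they can also be checked from host code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helpers that differ only in the 'inline' keyword.
// In device code, nvcc normally inlines both of them anyway.
__host__ __device__ inline float scaleInline(float x) { return 2.0f * x; }
__host__ __device__ float scalePlain(float x) { return 2.0f * x; }

__global__ void compareKernel(float x) {
    printf("inline: %f  plain: %f\n", scaleInline(x), scalePlain(x));
}

int main() {
    compareKernel<<<1, 1>>>(1.5f);
    // Report launch errors and execution errors (e.g. no GPU present).
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

One way to see whether the calls survived is to disassemble the device code, for example with cuobjdump -sass on the resulting executable, and look for CALL instructions inside compareKernel.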
The inlining heuristics used by the CUDA compiler may prevent inlining of a particular function that a programmer would like to have inlined under all circumstances. For this purpose, CUDA provides the non-standard __forceinline__ function qualifier. It affects both device code and host code: for host code, nvcc translates it into the equivalent host-compiler-specific attribute, such as __forceinline for MSVC. This can be verified by dumping and inspecting the intermediate C++ files that nvcc sends to the host compiler.
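A minimal sketch of __forceinline__ applied to a function used on both sides; the names blend and blendKernel are hypothetical. In device code the qualifier asks nvcc to inline blend regardless of its own heuristics, while the host-side copy is handed to the host compiler with that compiler's force-inline attribute (for example __forceinline with MSVC).

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: linear interpolation between a and b.
// __forceinline__ overrides nvcc's inlining heuristics in device code;
// for host code nvcc maps it to the host compiler's equivalent attribute.
__host__ __device__ __forceinline__ float blend(float a, float b, float t) {
    return a + t * (b - a);
}

__global__ void blendKernel(float* out, float a, float b, float t) {
    *out = blend(a, b, t);
}

int main() {
    printf("host:   %f\n", blend(0.0f, 10.0f, 0.25f));

    float* d_out = nullptr;
    cudaError_t err = cudaMalloc(&d_out, sizeof(float));
    if (err != cudaSuccess) {
        printf("no usable CUDA device: %s\n", cudaGetErrorString(err));
        return 0;
    }

    blendKernel<<<1, 1>>>(d_out, 0.0f, 10.0f, 0.25f);
    float h_out = 0.0f;
    err = cudaGetLastError();
    // cudaMemcpy waits for the kernel and surfaces any execution error.
    if (err == cudaSuccess)
        err = cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);

    if (err == cudaSuccess) printf("device: %f\n", h_out);
    else printf("CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return 0;
}
```

To inspect the host side, one option is to compile with nvcc -keep, which leaves the intermediate files in place, and search them for the declaration of blend; the exact intermediate file names vary between CUDA versions.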