einpoklum - 9 months ago

C++ Question

When I target 32-wide warp CUDA architectures, should I use warpSize?

This is a follow-up question to this one.

Suppose I have a CUDA kernel

template<unsigned ThreadsPerWarp>
__global__ void foo(bar_t* a, const baz_t* b);


and I'm implementing a specialization of it for the case of ThreadsPerWarp being 32 (this circumvents the valid criticism in Talonmies' answer to my previous question).

In the body of this function (or of other __device__ functions called from it), should I prefer using the constant value of ThreadsPerWarp? Or is it better to use warpSize? Or will it be all the same to the compiler in terms of the PTX it generates?

Answer Source

No, don't use warpSize.

It seems that, other than potential future-proofness (which in practice is questionable), there is no advantage to using it. warpSize is a runtime value on the device, not a compile-time constant, so the compiler cannot fold it, unroll loops by it, or use it as an array bound. Instead, you can very well use something like:

enum : unsigned { warp_size = 32 };