einpoklum - 1 year ago
C++ Question

When I target 32-wide warp CUDA architectures, should I use warpSize?

This is a follow-up question to this one.

Suppose I have a CUDA kernel

template<unsigned ThreadsPerWarp>
__global__ void foo(bar_t* a, const baz_t* b);

and I'm implementing a specialization of it for the case of ThreadsPerWarp
being 32 (this circumvents the valid criticism in Talonmies' answer to my previous question).

In the body of this function (or of other functions called from it), should I prefer using the constant value of ThreadsPerWarp? Or is it better to use warpSize? Or will it all be the same to the compiler in terms of the PTX it generates?
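To make the setup concrete, here is a minimal sketch of a helper "called from" such a kernel, with a generic template on ThreadsPerWarp and an explicit specialization for 32. The helper name lane_id is hypothetical, and it is written as plain host C++ (constexpr, no __device__ qualifier) so it compiles without nvcc; in a real kernel it would be a __device__ function. The point it illustrates is that ThreadsPerWarp, being a template parameter, is a compile-time constant inside the body.

```cpp
// Hypothetical helper: lane index of a thread within its warp.
// Primary template: the warp width is a template parameter, so it is a
// compile-time constant everywhere inside the function body.
template <unsigned ThreadsPerWarp>
constexpr unsigned lane_id(unsigned thread_index) {
    return thread_index % ThreadsPerWarp;
}

// Explicit specialization for 32-wide warps, as in the question.
// Here the width is a literal, so it can appear in static_assert,
// array bounds, or further template arguments.
template <>
constexpr unsigned lane_id<32>(unsigned thread_index) {
    constexpr unsigned threads_per_warp = 32;
    static_assert((threads_per_warp & (threads_per_warp - 1)) == 0,
                  "32 is a power of two, so % can be written as a mask");
    return thread_index & (threads_per_warp - 1);
}
```

For example, lane_id<32>(50) yields 18 and lane_id<16>(50) yields 2, both computable at compile time.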

Answer

No, don't use warpSize.

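Other than potential future-proofness (which is questionable in practice), there seems to be no advantage to using it. Moreover, warpSize is a built-in variable rather than a compile-time constant, so it cannot be used where a constant expression is required, such as template arguments, static_assert conditions or array bounds. Instead, you can just as well use something like: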

enum : unsigned { warp_size = 32 };
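As a sketch of what the compile-time constant buys you, the snippet below uses warp_size in places where the built-in warpSize could not appear: a static_assert, a template-dependent array bound, and constexpr index arithmetic. The names warp_scratch, warp_of and lane_of are hypothetical, and the code is plain host C++ so it compiles without nvcc; in a kernel the scratch array would typically be __shared__ and the helpers __device__.

```cpp
// The answer's suggestion: a compile-time warp width.
enum : unsigned { warp_size = 32 };

// Usable in constant expressions, unlike CUDA's built-in warpSize.
static_assert(warp_size == 32, "this code path assumes 32-wide warps");

// Hypothetical per-block scratch layout: one slot per lane of each warp.
template <unsigned ThreadsPerBlock>
struct warp_scratch {
    static_assert(ThreadsPerBlock % warp_size == 0,
                  "block size must be a whole number of warps");
    static constexpr unsigned warps_per_block = ThreadsPerBlock / warp_size;
    unsigned data[warps_per_block][warp_size];  // constant bounds
};

// Split a linear thread index into (warp, lane). With a constant
// power-of-two divisor, the compiler can lower / and % to shift and mask.
constexpr unsigned warp_of(unsigned thread_index) { return thread_index / warp_size; }
constexpr unsigned lane_of(unsigned thread_index) { return thread_index % warp_size; }
```

For instance, thread 70 maps to warp 2, lane 6, and warp_scratch<128> holds 4 rows of 32 slots.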