user3667089 - 11 months ago
C++ Question

How to check which CUDA compute capability a library was compiled for?

I am on Ubuntu 16.04. Suppose I am given a random library file. Is there any way I can check which CUDA compute capability the library was compiled for?

I have tried


It doesn't show much.

I want to know this because if I compile my code with

-gencode arch=compute_30,code=sm_30;

It compiles and runs fine on a small CUDA program I wrote, but when I run deviceQuery on my GPU it actually shows CUDA compute capability 3.5. So I am curious to know whether this code will be executed in the 3.0 or 3.5 architecture.

If I compile and run it with

-gencode arch=compute_20,code=sm_20;

or

-gencode arch=compute_50,code=sm_50;

it fails as expected.

If I compile and run it with

-gencode arch=compute_35,code=sm_35;

it runs fine as expected.

Answer

For general background on the use of flags to tell nvcc which architectures to compile for, I would suggest this question and this question, as well as the nvcc documentation.

After discussion in the comments, there appear to be two questions. (Although these questions have libraries in view, most of the comments apply equally to executable objects as well.)

How can I discover which architectures (PTX, SASS) a particular library has been compiled for?

This can be discovered using the CUDA binary utilities e.g. cuobjdump. In particular, the -ptx switch will list all PTX objects contained, and the -sass switch will list all SASS objects contained. A library that is compiled for the "real architecture" of sm_30 for example will contain sm_30 SASS code, and this will be evident in the cuobjdump output. A library that is compiled for the "virtual architecture" compute_50 for example will contain compute_50 PTX code, and this will be evident in the cuobjdump output. Note that a library (or any CUDA fatbin object) may contain code for multiple architectures, both PTX and SASS, or multiple SASS versions.
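If the cuobjdump output has been captured to text, the architectures can be pulled out mechanically. The following is a minimal sketch, assuming each embedded entry is introduced by a header line such as `Fatbin elf code:` (SASS) or `Fatbin ptx code:` (PTX) and is followed by a line of the form `arch = sm_XX`, with PTX entries using the `sm_XX` spelling as well. That layout is an assumption and may differ between toolkit versions, so check it against real output first. `EmbeddedArchs` and `parse_cuobjdump` are hypothetical names used only for illustration.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>

// Architectures found in a cuobjdump text dump, split by entry kind.
struct EmbeddedArchs {
    std::set<std::string> sass;  // entries under a "Fatbin elf code:" header
    std::set<std::string> ptx;   // entries under a "Fatbin ptx code:" header
};

// Scan the dump line by line.
// ASSUMED format (verify against your toolkit): a header line containing
// "Fatbin elf code" or "Fatbin ptx code", later followed by "arch = sm_XX".
EmbeddedArchs parse_cuobjdump(const std::string& dump) {
    enum class Kind { None, Sass, Ptx };
    Kind kind = Kind::None;
    EmbeddedArchs found;
    const std::string key = "arch = ";

    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        if (line.find("Fatbin elf code") != std::string::npos) {
            kind = Kind::Sass;
        } else if (line.find("Fatbin ptx code") != std::string::npos) {
            kind = Kind::Ptx;
        } else if (line.compare(0, key.size(), key) == 0) {
            std::string arch = line.substr(key.size());
            // Drop trailing whitespace such as a stray '\r'.
            while (!arch.empty() &&
                   std::isspace(static_cast<unsigned char>(arch.back()))) {
                arch.pop_back();
            }
            if (kind == Kind::Sass) found.sass.insert(arch);
            if (kind == Kind::Ptx) found.ptx.insert(arch);
        }
    }
    return found;
}
```

Feeding it the text produced by running cuobjdump on the library leaves the SASS architectures in `sass` and the PTX architectures in `ptx`. An empty `ptx` set would mean there is nothing for the driver to JIT-compile.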

If a library contains multiple architectures, how do I know what will actually execute on the device?

At application launch, the CUDA runtime inspects the binary object for the application, and will use, roughly speaking, the following heuristic to determine what will run on the GPU:

  1. If an exact SASS match exists in the binary object, then the runtime will use that for the GPU. This means for example that if your object (executable, or library) contains an entry for sm_35 SASS code, and you are running on a sm_35 (i.e. a compute capability 3.5) GPU, then the CUDA runtime will select that.

  2. If item 1 is not satisfied, the CUDA runtime will next choose a "compatible" SASS entry, if one exists. This is not well defined/specified AFAIK, but in general sm_30 SASS should be usable on any sm_3x device, and likewise sm_20 SASS on any sm_2x device, or sm_50 SASS on any sm_5x device. For other cases (e.g. whether sm_32 SASS is usable directly on an sm_35 device) I don't have a complete table that specifies compatibility. Specific cases can be tested using the methodology described in the question: build an object containing only a particular SASS type, and see if it will run on the intended GPU.

  3. If items 1 and 2 are not satisfied, the CUDA runtime will search for a compatible PTX entry. For a given GPU type of compute capability x.y, a compatible PTX entry is defined to be PTX for architecture z.w, where z.w is less than or equal to x.y. cc2.0 PTX is compatible with a cc3.5 device, for example. cc5.0 PTX is not compatible with a cc3.5 device. Once the highest numbered PTX entry is found that meets this criterion, it will be JIT-compiled by the GPU driver to produce a necessary SASS object, on-the-fly, at runtime.

If none of items 1, 2, or 3 is satisfied, the GPU code will return a runtime error at any and all calls into the CUDA runtime library (NO BINARY FOR GPU, or similar).
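The selection order above can be sketched as a small C++ model. This is a toy illustration, not the runtime's actual implementation: `Arch`, `Source`, `Choice`, and `select_code` are hypothetical names. The rule used for item 2 (same major version, SASS minor version no higher than the device's) is an assumption, chosen to match the binary compatibility statement in the CUDA programming guide.

```cpp
#include <cassert>
#include <vector>

// Compute capability x.y, e.g. {3, 5} for sm_35 / compute_35.
struct Arch {
    int major_ver;
    int minor_ver;
};

// True when a is the same as, or older than, b.
bool not_newer(const Arch& a, const Arch& b) {
    return a.major_ver < b.major_ver ||
           (a.major_ver == b.major_ver && a.minor_ver <= b.minor_ver);
}

enum class Source { ExactSass, CompatibleSass, JitPtx, NoBinary };

struct Choice {
    Source source;
    Arch arch;  // meaningful only when source != NoBinary
};

Choice select_code(const Arch& device,
                   const std::vector<Arch>& sass,
                   const std::vector<Arch>& ptx) {
    assert(device.major_ver > 0 && device.minor_ver >= 0);

    // Item 1: exact SASS match.
    for (const Arch& s : sass) {
        if (s.major_ver == device.major_ver && s.minor_ver == device.minor_ver) {
            return {Source::ExactSass, s};
        }
    }

    // Item 2 (ASSUMED rule): same major version, minor not above the device's.
    // Among the candidates, prefer the highest minor version.
    const Arch* best = nullptr;
    for (const Arch& s : sass) {
        if (s.major_ver == device.major_ver && s.minor_ver <= device.minor_ver &&
            (best == nullptr || s.minor_ver > best->minor_ver)) {
            best = &s;
        }
    }
    if (best != nullptr) return {Source::CompatibleSass, *best};

    // Item 3: highest PTX entry z.w with z.w <= x.y, to be JIT-compiled.
    for (const Arch& p : ptx) {
        if (not_newer(p, device) && (best == nullptr || not_newer(*best, p))) {
            best = &p;
        }
    }
    if (best != nullptr) return {Source::JitPtx, *best};

    // Nothing usable: corresponds to the "no binary for GPU" failure.
    return {Source::NoBinary, Arch{0, 0}};
}
```

Applied to the question's case, a cc 3.5 device and an object holding only sm_30 SASS (which is what `-gencode arch=compute_30,code=sm_30` embeds) yields `CompatibleSass` with 3.0. Under this model the sm_30 code runs as-is on the cc 3.5 GPU, while objects holding only sm_20 or only sm_50 SASS yield `NoBinary`.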

I've glossed over a number of concepts related to "real" and "virtual" architectures. This is a complex topic and I recommend reading the nvcc documentation linked above for background. For example, it is not correct that any given compute capability has the same numerical architectures available for both real (SASS) and virtual (PTX). For cc 2.0, both the real (sm_20) and virtual (compute_20) architectures exist. For cc 2.1, only the real architecture (sm_21) exists; the virtual architecture (compute_21) does not exist, and the compute_20 architecture should be specified instead. This will be readily evident if you attempt to compile for compute_21.

One might also ask: given all this, which architectures should I compile for?

This question has been answered in many previous SO questions, and is somewhat a matter of opinion. As a useful reference point, I suggest following the strategy used by the project files for the CUDA sample codes.
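A common pattern, which the CUDA samples' makefiles appear to follow, is to embed SASS for every architecture you want to target and, in addition, PTX for the highest of them, so that newer GPUs can fall back to JIT compilation. The helper below is a minimal sketch that builds such a flag list. `gencode_flags` is a hypothetical name, and the sketch assumes every listed number exists as both a real and a virtual architecture, which does not hold for cases like 21.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Build nvcc -gencode flags: SASS for each listed architecture, plus PTX for
// the highest one. The list itself is the caller's choice.
// ASSUMPTION: each number is valid as both sm_NN and compute_NN.
std::vector<std::string> gencode_flags(std::vector<int> sms) {
    std::vector<std::string> flags;
    if (sms.empty()) return flags;

    // Sort ascending and drop duplicates so each target appears once.
    std::sort(sms.begin(), sms.end());
    sms.erase(std::unique(sms.begin(), sms.end()), sms.end());

    // One SASS entry per target architecture.
    for (int sm : sms) {
        const std::string n = std::to_string(sm);
        flags.push_back("-gencode arch=compute_" + n + ",code=sm_" + n);
    }

    // One PTX entry for the highest architecture, for forward compatibility.
    const std::string top = std::to_string(sms.back());
    flags.push_back("-gencode arch=compute_" + top + ",code=compute_" + top);
    return flags;
}
```

For targets 30, 35, and 50 this produces three SASS entries and a final `-gencode arch=compute_50,code=compute_50` entry carrying the PTX.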