I've got an NVIDIA GT 650M with the following properties:
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
Maximum number of threads per multiprocessor: 2048
This seems to stem from confusion over terminology.
"SM" (SM = Streaming Multiprocessor) and "multiprocessor" refer to the same thing, a hardware unit that is the principal execution unit on the GPU. These terms refer to specific HW resources. Different GPUs may have differing numbers of SMs. The number of SMs can be found for a particular GPU using the CUDA
deviceQuery sample code.
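If you don't want to run the full deviceQuery sample, a minimal sketch of the relevant query (using the standard `cudaGetDeviceProperties` runtime API) looks like this:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return 1;
    }
    // multiProcessorCount is the number of SMs on this device
    printf("Device %d: %s\n", dev, prop.name);
    printf("  SMs (multiprocessors):          %d\n", prop.multiProcessorCount);
    printf("  Max threads per multiprocessor: %d\n",
           prop.maxThreadsPerMultiProcessor);
    return 0;
}
```

On your GT 650M this should report 2 SMs and 2048 max threads per multiprocessor, matching the deviceQuery output you quoted.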
The elements of a CUDA program that are in the "launch" are threadblocks. A grid is the collection of all threadblocks associated with a kernel launch. Individual threadblocks execute on individual SMs. You can launch a large number of threadblocks in a kernel, more or less independently of which GPU you are running on. The threadblocks will then be processed at whatever rate the particular GPU and its SMs afford.
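To illustrate the GPU-independence point, here is a hypothetical kernel written with a grid-stride loop, so the same launch configuration works regardless of how many SMs the device has (the kernel name and sizes are just for illustration):

```cuda
#include <cuda_runtime.h>

// Each thread strides over the data, so the number of threadblocks
// launched can be chosen independently of the GPU's SM count.
__global__ void scale(float *data, int n, float factor) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        data[i] *= factor;
    }
}

int main() {
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // Launch far more threadblocks than there are SMs; the hardware
    // scheduler distributes them across whatever SMs the GPU has.
    scale<<<1024, 256>>>(d_data, n, 2.0f);
    cudaDeviceSynchronize();
    cudaFree(d_data);
    return 0;
}
```

On your 2-SM GT 650M the 1024 blocks are drained a few at a time per SM; on a larger GPU the identical launch simply finishes faster.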
There is no API function that gives direct control over the scheduling of threadblocks onto SMs. Some level of indirect control over the scheduling of threadblocks from different kernels running concurrently can be obtained through the use of CUDA stream priorities.
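A sketch of how stream priorities are set up, using the standard `cudaDeviceGetStreamPriorityRange` and `cudaStreamCreateWithPriority` runtime calls (the commented-out kernels are hypothetical placeholders):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int leastPriority, greatestPriority;
    // Numerically lower values mean higher priority.
    cudaDeviceGetStreamPriorityRange(&leastPriority, &greatestPriority);
    printf("Priority range: %d (least) to %d (greatest)\n",
           leastPriority, greatestPriority);

    cudaStream_t highPrio, lowPrio;
    cudaStreamCreateWithPriority(&highPrio, cudaStreamNonBlocking,
                                 greatestPriority);
    cudaStreamCreateWithPriority(&lowPrio, cudaStreamNonBlocking,
                                 leastPriority);

    // Pending threadblocks from kernels in highPrio tend to be
    // scheduled onto SMs ahead of those from kernels in lowPrio:
    // kernelA<<<grid, block, 0, highPrio>>>(...);  // hypothetical
    // kernelB<<<grid, block, 0, lowPrio>>>(...);   // hypothetical

    cudaStreamDestroy(highPrio);
    cudaStreamDestroy(lowPrio);
    return 0;
}
```

Note this only biases which blocks the hardware scheduler picks next; it is not a guarantee of SM placement, and it requires a device that supports stream priorities.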