Nick X Tsui Nick X Tsui - 1 year ago 127
C++ Question

CUDA resample from non-even sampling

I'd like to resample (interpolate) a sequence from a non-uniformed samplings. I don't think tex works because it basically does interpolation assuming your sample is uniform? Doing search would be too slow?

Should I do thrust? Any pointer is appreciated it. Any examples would be greatly helpful.


Say the line with circle mark is my sample. I know the value at each circle point. Obviously, the sample is evenly distributed on the horizontal axis.

Now, I would like to know the value at each x mark on the line underneath the sampling line. The x mark are uniformly distributed along the line.

---o--------o----o------o------o------o------ (sampling)

--X-----X-----X-----X-----X-----X-----X--- (known to interpolate)

So I am wondering how to get the values at each x mark position using CUDA? Obviously, the most basic algorithm using C/C++ would be for each x mark position, search for the two nearest circle position, then do linear interpolation. But in this case, you need to first sort two sequence, then loop over x mark, and for each x mark, you do the search. This just sounds expansive.

I am wondering how we should do it in CUDA? Thanks.

Answer Source

There are probably a number of approaches. For example you could use a basic cuda binary search in a thread-parallel fashion. I'll demonstrate a thrust implementation.

For the purpose of this discussion, I'll assume that both data sets (known sample points, and desired sample points) are arbitrarily placed (i.e. I'm not assuming either sample set is evenly spaced). However I will stipulate or require that the desired sample points are fully contained within the known sample points. I believe this is sensible as usual linear interpolation requires a known sample point on either side of the desired sample point.

Therefore we'll use a data set like this:

   o:  1,3,7
f(o):  3,7,15
   x:  1.5, 2.5, 4.5, 5.0, 6.0, 6.5
f(x):    ?,   ?,   ?,   ?,   ?,   ?

We see that f is the known functional values, that correspond to f(o) = 2o+1, a straight line in this case (although this method does not require the known sample points to fit any particular function). x represent the indices at which we desire to interpolate the functional value, based on the known values (f(o)). Our desire then is to compute f(x) via interpolation from the nearest f(o) points. Note that our data set is such that all values of x lie between the minimum (1) and maximum (7) o values. This is the stipulation/requirement I stated earlier.

Our thrust method will be to use a vectorized binary search, using thrust::upper_bound, to locate the "insertion point" where each desired x value fits within the o sequence. This gives us our right neighbor, and left neighbor (right-1) for interpolation. Once we know the insertion point, it would be a trivial extension of this algorithm to choose e.g. the two left and two right neighbors (or more) if we wanted to use something other than linear interpolation.

The insertion point then gives us our left and right neighbors, and we use this information to pass to an appropriately crafted thrust::transform operation the x vector (desired interpolation points) along with a thrust::tuple (via thrust::zip_iterator) that provides:

  • the right neighbor index
  • the right neighbor functional value
  • the left neighbor index
  • the left neighbor functional value

With these quantities, plus the desired index (x), interpolation is straightforward.

Here's a fully worked example:

$ cat
#include <thrust/device_vector.h>
#include <thrust/binary_search.h>
#include <thrust/transform.h>
#include <thrust/copy.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/iterator/permutation_iterator.h>
#include <thrust/iterator/transform_iterator.h>
#include <iostream>

struct interp_func
  template <typename T>
  __host__ __device__
  float operator()(float t1, T t2){  // m = (y1-y0)/(x1-x0)  y = m(x-x0) + y0
    return ((thrust::get<1>(t2) - thrust::get<3>(t2))/(thrust::get<0>(t2) - thrust::get<2>(t2)))*(t1 - thrust::get<2>(t2)) + thrust::get<3>(t2);

using namespace thrust::placeholders;

int main(){

  // sample data
  float o[] = {1.0f, 3.0f, 7.0f}; // unevenly spaced sample points for function f
  float f[] = {3.0f, 7.0f, 15.0f}; // f(o) = 2o+1
  float x[] = {1.5f, 2.5f, 4.5f, 5.0f, 6.0f, 6.5f}; // additional desired sample points for f
  int so = sizeof(o)/sizeof(o[0]);
  int sx = sizeof(x)/sizeof(x[0]);

  // setup data on device
  thrust::device_vector<float> d_o(o, o+so);
  thrust::device_vector<float> d_f(f, f+so);
  thrust::device_vector<float> d_x(x, x+sx);
  thrust::device_vector<float> d_i(sx); // insertion indices
  thrust::device_vector<float> d_r(sx); // results

  // perform search for insertion indices
  thrust::upper_bound(d_o.begin(), d_o.end(), d_x.begin(), d_x.end(), d_i.begin());
  // then perform linear interpolation based on left and right neighbors
  thrust::transform(d_x.begin(), d_x.end(), thrust::make_zip_iterator(thrust::make_tuple(thrust::make_permutation_iterator(d_o.begin(), d_i.begin()), thrust::make_permutation_iterator(d_f.begin(), d_i.begin()), thrust::make_permutation_iterator(d_o.begin(), thrust::make_transform_iterator(d_i.begin(), _1-1)), thrust::make_permutation_iterator(d_f.begin(), thrust::make_transform_iterator(d_i.begin(), _1-1)))), d_r.begin(), interp_func());

  // output results
  std::cout << "Interpolation points:" << std::endl;
  thrust::copy(d_x.begin(), d_x.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "Interpolated values:" << std::endl;
  thrust::copy(d_r.begin(), d_r.end(), std::ostream_iterator<float>(std::cout, ","));
  std::cout << std::endl << "Expected values:" << std::endl;
  for (int i = 0; i < sx; i++) std::cout << 2*x[i]+1 <<  ",";
  std::cout << std::endl;
  return 0;
$ nvcc -o t1224
$ ./t1224
Interpolation points:
Interpolated values:
Expected values:

Again, once we know the insertion point, choosing 2 right and 2 left neighbors for more involved interpolation would be a trivial extension. We would just modify the zip iterator being passed to the transform (interpolation) functor, and modify the functor itself to implement the desired arithmetic.

Also note that this method assumes the input o sequence is already sorted. If it is not, then it would be necessary to add a sort-by-key of o (keys) with f (values). The x sequence need not be sorted.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download