I used Metal for an interpolation task. I wrote the kernel function as follows:
kernel void kf_interpolation( device short *dst, device uchar *src, uint id [[ thread_position_in_grid ]] )
{
    dst[id] = src[id-1] + src[id] + src[id+1];
}
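The most efficient way to handle edge cases like this is usually to grow your source array at each end and offset the indices. So for N calculations, allocate your src array with N+2 elements, fill elements 1 through N (inclusive) with the source data, and set elements 0 and N+1 to whatever you want the edge condition to be. The kernel then reads src[id], src[id+1], and src[id+2] instead of src[id-1], src[id], and src[id+1], so no thread indexes outside the buffer and no branch is needed at the edges.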
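The padding scheme can be sketched on the CPU in plain C++ to check the indexing before moving it into the Metal kernel. This is only an illustration: make_padded and interpolate_padded are hypothetical names that do not appear in the original post, and the edge values are passed in by the caller because the choice of edge condition is up to you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: copy the N source samples into a buffer of N + 2
// elements, leaving one guard element at each end for the edge condition.
std::vector<std::uint8_t> make_padded(const std::vector<std::uint8_t>& src,
                                      std::uint8_t left_edge,
                                      std::uint8_t right_edge) {
    std::vector<std::uint8_t> padded(src.size() + 2);
    padded.front() = left_edge;            // element 0
    for (std::size_t i = 0; i < src.size(); ++i) {
        padded[i + 1] = src[i];            // elements 1..N hold the real data
    }
    padded.back() = right_edge;            // element N + 1
    return padded;
}

// CPU stand-in for the kernel body (an assumption, not the Metal code itself):
// each "thread" id in [0, N) reads padded[id], padded[id + 1], padded[id + 2].
std::vector<std::int16_t> interpolate_padded(const std::vector<std::uint8_t>& padded) {
    assert(padded.size() >= 2);            // must contain both guard elements
    const std::size_t n = padded.size() - 2;
    std::vector<std::int16_t> dst(n);
    for (std::size_t id = 0; id < n; ++id) {
        // Three uchar values sum to at most 765, which fits in a short.
        dst[id] = static_cast<std::int16_t>(padded[id] + padded[id + 1] + padded[id + 2]);
    }
    return dst;
}
```

For example, with source data {10, 20, 30} and both edge values set to 0, the padded buffer is {0, 10, 20, 30, 0} and the result is {30, 60, 50}. In the Metal version, the same padded buffer would be bound as src and the grid dispatched with N threads.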