user1794469 user1794469 - 4 months ago 10
C++ Question

How do I write a branchless std::vector scan?

I want to write a simple scan over an array. I have a

std::vector<int> data;
and I want to find all the array indexes that are less than 9 and add them to a result vector. I can write this using a branch:

for (int i = 0; i < data.size(); ++i)
if (data[i] < 9)
r.push_back(i);


This gives the correct answer but I would like to compare it to a branchless version.

Using raw arrays, (assuming,
data
is an int array,
length
is the length of data and
r
is a results array with plenty of room) I can write something like:

int current_write_point = 0;
for (int i = 0; i < length; ++i){
r[current_write_point] = i;
current_write_point += (data[i] < 9);
}


How would I get similar behavior using a vector for data?

Answer

Let's see with the actual compiler output:

auto scan_branch(const std::vector<int>& v)
{
  std::vector<int> res;
  int insert_index = 0;
  for(int i = 0; i < v.size(); ++i)
  {
    if (v[i] < 9)
    {
       res.push_back(i);
    } 
  }
  return res;
}

This code clearly has a branch at 26th line of disassembly. If it's greater than or equal to 9, it just continues with the next element, however in the event of lesser than 9, some horrible amount of code executes for the push_back and we continue. Nothing unexpected.

auto scan_nobranch(const std::vector<int>& v)
{
  std::vector<int> res;
  res.resize(v.size());

  int insert_index = 0;
  for(int i = 0; i < v.size(); ++i)
  {
    res[insert_index] = i;
    insert_index += v[i] < 9;
  }

  res.resize(insert_index);
  return res;
}

This one, however, only has a conditional move, which you can see in the 190th line of the disassembly. It looks like we have a winner. Since conditional move cannot result in pipeline stalls, there are no branches in this one (except the for condition check).