Thomas Thomas - 23 days ago 10
C++ Question

How to use right shift to avoid operator division

I have a cpp project which works but has a bad performance.


int currentPos = getPos();
int length = getLength();
if (1.0 * currentPos / length < 0.5)
{
// do something
}
else
{
// do something
}


The problem is:
1.0 * currentPos / length
takes too much time.

Google told me that division always took much time and we could avoid it with the help of right shift.


For example,
a=a/4
can be replaced by
b=b>>2
.

I can understand this example but I don't know how to use right shift to optimize my code as above.


If it is impossible, is there other ways to avoid division?







EDIT

1) The condition in
if
is not always
0.5
, it could be any rational between (0, 1).


2) The code above is executed
10 * 56 * 181 * 56 * 181
times per second.

Answer

Let's get honest for a moment. On a even remotely modern CPU, division of a floating point number will be pipelined away and take roughly as much time as most other FPU or even Integer operations.

Instead, you should use a profiler on your code to see exactly where your bottlenecks are actually occurring. As your code is written, unless it is sitting in a 1,000,000,000,000 time type for/loop, it isn't going to matter at all.

If your code IS sitting in such a loop, please let us know because there are ways of strength reducing, pre-calculating, etc. that can help in those instances outside of a simple divide hack that has been somewhat useless for a decade.


Update for the fact that this is indeed sitting in a 1 billion time loop.

Now then, Let's start with your two functions GetPos() and GetLength() If you can organize your data in such a way as to make those values constant for parts of the loop, you can completely eliminate a number of memory accesses. You can also then do the multiplication by 2 outside the loop as well.

Next, if you can organize your data such that it is sorted by length or by position before the loop is run, then you can do a binary search through your data and reduce your compares down to around a maximum of 20 or so instead of billions (the power of O(log n) vs O(n) ) and then your code goes extremely fast.

If not possible, but the data is constant per loop and the "do something" doesn't change the conditions, then this becomes embarrassingly parallel and might be able to be threaded across many CPUs - this is not as easy as it sounds though so beware.

This is just a start, but I wanted to let you see that more information allows better solutions to be offered to you.