user2036256 user2036256 - 21 days ago 6
C++ Question

Fastest way to find counts of integral values (C++)

I need an occurrence count for each occurring value from a list of unsigned integrals. I.e. if passed the sequence [ 3, 6, 9, 3, 9 ] I would want [ { 3, 2}, {6, 1}, {9, 2} ].

The values are random 32 bit unsigned ints (Range of 1 to 1,000,000,000). The result can be stored in any data structure (as long as they can be iterated over linearly) and whilst value ordered would be ideal this is a secondary concern after speed.

Currently I have -

T UniqueCount(std::vector<unsigned> &A)
{
std::unordered_map<unsigned,unsigned> value_counts;

for(unsigned val : A) {
value_counts[val]++;
}

A.clear();

...
}


Profiling has shown std::unordered_map to be faster than std::map.

Is there a better approach for this? / Faster way? It is also worth noting because of the use case (count > 4) can be recorded as 4.

This is currently a bottleneck so whilst standard containers are preferred something custom can be considered if the performance boost is worth the additional maintenance cost.

Answer

On my system (Win10 x64, MSVC daily package x64 release build), testing with 100,000 random unsorted values in the input vector, the following using std::sort + std::adjacent_find performs in ~10ms vs. ~27ms using std::unordered_map and the code in @krzaq's answer (and now in the OP):

std::vector<std::pair<unsigned, unsigned>> unique_count(std::vector<unsigned>& a) {
    auto it = begin(a);
    auto const last = end(a);

    std::vector<std::pair<unsigned, unsigned>> value_counts;
    std::sort(it, last);
    while (it != last) {
        auto const prev = it;
        it = std::adjacent_find(it, last, std::not_equal_to<unsigned>{});
        if (it != last) {
            ++it;
        }
        value_counts.emplace_back(*prev, static_cast<unsigned>(it - prev));
    }
    return value_counts;
}

Online Demo

The lesson: often, cache coherency beats algorithmic complexity.