Nawaz Nawaz - 1 month ago 7
C++ Question

What is wrong with `std::set`?

In the other topic I was trying to solve this problem. The problem was to remove duplicate characters from a

std::string
.

std::string s= "saaangeetha";


Since the order was not important, so I sorted
s
first, and then used
std::unique
and finally resized it to get the desired result:

aeghnst


That is correct!




Now I want to do the same, but at the same time I want the order of characters intact. Means, I want this output:

sangeth


So I wrote this:

template<typename T>
struct is_repeated
{
std::set<T> unique;
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end());
std::cout << s ;
}


Which gives this output:

saangeth


That is,
a
is repeated, though other repetitions gone. What is wrong with the code?

Anyway I change my code a bit: (see the comment)

template<typename T>
struct is_repeated
{
std::set<T> & unique; //made reference!
is_repeated(std::set<T> &s) : unique(s) {} //added line!
bool operator()(T c) { return !unique.insert(c).second; }
};
int main() {
std::string s= "saaangeetha";
std::set<char> set; //added line!
s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end());
std::cout << s ;
}


Output:

sangeth


Problem gone!

So what is wrong with the first solution?

Also, if I don't make the member variable
unique
reference type, then the problem doesn't go.

What is wrong with
std::set
or
is_repeated
functor? Where exactly is the problem?

I also note that if the
is_repeated
functor is copied somewhere, then every member of it is also copied. I don't see the problem here!

Answer

In GCC (libstdc++), remove_if is implemented essentially as

    template<typename It, typename Pred>
    It remove_if(It first, It last, Pred predicate) {
      first = std::find_if(first, last, predicate);
    //                                  ^^^^^^^^^
      if (first == last)
         return first;
      else {
         It result = first;
         ++ result;
         for (; first != last; ++ first) {
           if (!predicate(*first)) {
    //          ^^^^^^^^^
              *result = std::move(*first);
              ++ result;
           }
         }
      }
    }

Note that your predicate is passed by-value to find_if, so the struct, and therefore the set, modified inside find_if will not be propagated back to caller.

Since the first duplicate appears at:

  saaangeetha
//  ^

The initial "sa" will be kept after the find_if call. Meanwhile, the predicate's set is empty (the insertions within find_if are local). Therefore the loop afterwards will keep the 3rd a.

   sa | angeth
// ^^   ^^^^^^
// ||   kept by the loop in remove_if
// ||
// kept by find_if
Comments