32bitsx86 - 1 month ago
C++ Question

C++ incomplete output when splitting string

I'm trying to extract certain data from a string that looks like this:

2833ae7~2be;2833ae8~2272


What I want to do is first split it by the semicolon, then check whether each record contains 2be, then split the records that contain it by ~2be and keep just the value before ~2be.


I made a few attempts, and this code almost does it, but the problem is that it doesn't give me the full output:

#include <string>
#include <sstream>
#include <vector>
#include <iostream>
using namespace std;

vector<string> split(string str, string sep){
    char* cstr = const_cast<char*>(str.c_str());
    char* current;
    vector<string> arr;
    current = strtok(cstr, sep.c_str());
    while(current != NULL){
        arr.push_back(current);
        current = strtok(NULL, sep.c_str());
    }
    return arr;
}

int main(){
    string items = "2833ae7~2be;2833ae8~2272";
    vector<string> food = split(items, ";");
    for(unsigned int i = 0; i < food.size(); i++)
    {
        if(food[i].find("2be") != string::npos)
        {
            vector<string> arr = split(food[i], "~2be");

            cout << "Output ("<< i << ") = " << arr[0] << endl;

        }// end if

    }// end for

    return 0;
}// end main


The output I get is:

Output <0> = 833a


When it should be:

Output <0> = 2833ae7


What am I doing wrong?

Answer

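As others have mentioned in the comments, modifying the character array returned by std::string::c_str() is undefined behavior. strtok() writes into its argument to mark the tokens, so you can't use it here. It also treats its second argument as a set of single-character delimiters rather than as one multi-character separator, so "~2be" splits on any of ~, 2, b, or e, which is why you see 833a.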
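To make the 833a result reproducible without undefined behavior, here is a small sketch that runs strtok() on a private, writable copy of the string. The helper name first_token is made up for this illustration and is not part of the question's code. It assumes C++11 for std::vector::data().

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns the first token strtok() produces for `text`,
// or an empty string if there is none.
std::string first_token(const std::string &text, const char *delims)
{
    // strtok() writes '\0' bytes into its input, so give it a writable,
    // null-terminated copy instead of the buffer owned by the std::string.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    // `delims` is a SET of characters: each one is a delimiter on its own.
    const char *token = std::strtok(buffer.data(), delims);
    return token ? std::string(token) : std::string();
}
```

Calling first_token("2833ae7~2be", "~2be") yields "833a": the leading 2 is skipped because it is in the delimiter set, and the token ends at the first e. With ";" as the delimiter set, the first record comes back whole.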

Splitting an std::string by a single-char delimiter is not hard. One way you could do it is like this:

std::vector<std::string> split(const std::string &input, std::string::value_type delim)
{
    std::stringstream ss(input);
    std::vector<std::string> tokens;
    std::string token;
    while (std::getline(ss, token, delim))
    {
        tokens.push_back(token);
    }

    return tokens;
}

You can then split your string like this:

std::vector<std::string> tokens = split("2833ae7~2be;2833ae8~2272", ';');

This will give you a vector containing two elements: "2833ae7~2be" and "2833ae8~2272".

For the second part, you can't reuse this split() function because it only handles single-character delimiters. You could do something like this instead:

for (std::size_t index = 0; index < tokens.size(); ++index)
{
    // Search for the full "~2be" marker so pos is always valid when we print.
    std::string::size_type pos = tokens[index].find("~2be");
    if (pos != std::string::npos)
    {
        // Everything before the marker is the value we want.
        std::cout << "Output (" << index << ") = " << tokens[index].substr(0, pos) << "\n";
    }
}

Using the same idea, you could also rewrite split() so that it works with string delimiters, not just single-character ones like my version.
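One possible way to do that is sketched below, using std::string::find and substr. The name split_on is hypothetical, and the choices to keep empty tokens and to return the whole input for an empty delimiter are assumptions of this sketch, not requirements from the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on every occurrence of the whole
// string `delim`. Empty tokens (including a trailing one) are kept.
std::vector<std::string> split_on(const std::string &input, const std::string &delim)
{
    std::vector<std::string> tokens;

    // An empty delimiter would match everywhere; return the input unchanged.
    if (delim.empty())
    {
        tokens.push_back(input);
        return tokens;
    }

    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = input.find(delim, start)) != std::string::npos)
    {
        tokens.push_back(input.substr(start, pos - start));
        start = pos + delim.size(); // continue after the delimiter
    }

    // Whatever follows the last delimiter (possibly an empty string).
    tokens.push_back(input.substr(start));
    return tokens;
}
```

With this version, split_on(record, "~2be").front() gives the value before the marker. Note that it behaves slightly differently from the getline-based split(): a record that ends in the delimiter produces a trailing empty token here.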