Viking Viking - 1 year ago 150
C++ Question

Split string into key-value pairs (map) C++

I have a string like this:

"CA: ABCD\nCB: ABFG\nCC: AFBV\nCD: 4567"

": " splits the key from the value, while "\n" separates the pairs. I want to add the key-value pairs to a map in C++.

Is there an efficient way of doing this, with performance in mind?

Answer Source

This format is called "Tag-Value".

The most performance-critical place where such an encoding is used in industry is probably the financial FIX protocol ('=' as the key-value separator and '\001' as the entry delimiter). So if you are on x86 hardware, your best bet is to google 'SSE4 FIX protocol parser github' and reuse the open-sourced findings of HFT shops.
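
For orientation, here is a plain scalar sketch of what splitting such a tag=value stream looks like. It is not the SSE4 approach and not taken from any HFT code base; it assumes C++17 std::string_view, and split_tag_values and field_t are hypothetical names chosen for this illustration. Fields go into a vector rather than a map so that their order is preserved.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string_view>
#include <utility>
#include <vector>

using field_t = std::pair<std::string_view, std::string_view>;

// Hypothetical helper: scalar split of a FIX-style message into (tag, value)
// views. The views point into `msg`, so the underlying buffer must outlive
// the returned vector.
std::vector<field_t> split_tag_values(std::string_view msg)
{
    constexpr char sep = '=';     // key-value separator
    constexpr char dlm = '\001';  // entry delimiter (SOH)

    std::vector<field_t> fields;
    while (!msg.empty()) {
        auto const sep_pos = msg.find(sep);
        if (sep_pos == std::string_view::npos)
            throw std::runtime_error("cannot parse");

        // A missing trailing delimiter is treated as end of input.
        auto dlm_pos = msg.find(dlm, sep_pos + 1);
        if (dlm_pos == std::string_view::npos)
            dlm_pos = msg.size();

        fields.emplace_back(msg.substr(0, sep_pos),
                            msg.substr(sep_pos + 1, dlm_pos - sep_pos - 1));

        // Skip the consumed field and its delimiter, if there was one.
        msg.remove_prefix(std::min(dlm_pos + 1, msg.size()));
    }
    return fields;
}
```

Calling split_tag_values("8=FIX.4.2\00135=D\00155=MSFT\001") on this made-up sample yields three fields in input order. A vectorized parser would replace the two find calls with SIMD scans for the separator and delimiter bytes.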

If you would rather delegate the vectorization to the compiler and can spare a few nanoseconds for readability, the most elegant solution is to keep the data in a std::string and the parsed result as a view over it in a boost::container::flat_map<boost::string_ref, boost::string_ref>. How you parse is a matter of taste: a while-loop or strtok would be easiest for the compiler to optimize, whereas a Boost.Spirit-based parser would be easiest for a human (familiar with Boost.Spirit) to read. The example below uses boost::iterator_range as the view type.

#include <boost/container/flat_map.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/range/iterator_range_io.hpp>

#include <algorithm>  // std::find
#include <iostream>
#include <stdexcept>  // std::runtime_error
#include <string>

// g++ -std=c++1z ~/
int main()
{
    // Keys and values are non-owning views into the input string.
    using range_t = boost::iterator_range<std::string::const_iterator>;
    using map_t = boost::container::flat_map<range_t, range_t>;

    char const sep = ':';
    char const dlm = '\n';

    // this part can be reused for parsing multiple records
    map_t result;

    // input must outlive result, because result only refers to its characters
    std::string const input {"hello:world\n bye: world"};

    // this part is per-line/per-record
    for (auto _beg = begin(input), _end = end(input), it = _beg; it != _end;)
    {
        auto sep_it = std::find(it, _end, sep);
        if (sep_it != _end)
        {
            auto dlm_it = std::find(sep_it + 1, _end, dlm);
            result.emplace(range_t {it, sep_it}, range_t {sep_it + 1, dlm_it});
            // step over the delimiter unless this was the last record
            it = dlm_it + (dlm_it != _end);
        }
        else throw std::runtime_error("cannot parse");
    }

    for (auto& x: result)
        std::cout << x.first << " => " << x.second << '\n';

    return 0;
}
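
The pairs in the question use a two-character separator (": ") rather than a single ':'. Here is a minimal standard-library sketch of the same loop for that case, assuming C++17 std::string_view in place of the Boost ranges and std::map in place of flat_map. parse_pairs and its parameters are hypothetical names, not part of the answer above.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical helper: splits `input` into key/value views.
// The views point into the caller's buffer, so that buffer must outlive the
// returned map (do not pass a temporary std::string).
std::map<std::string_view, std::string_view>
parse_pairs(std::string_view input,
            std::string_view sep = ": ",
            std::string_view dlm = "\n")
{
    if (sep.empty() || dlm.empty())
        throw std::invalid_argument("separator and delimiter must be non-empty");

    std::map<std::string_view, std::string_view> result;
    while (!input.empty()) {
        // Cut one record off the front of the input.
        auto const dlm_pos = input.find(dlm);
        auto const record = input.substr(0, dlm_pos);
        input.remove_prefix(dlm_pos == std::string_view::npos
                                ? input.size()
                                : dlm_pos + dlm.size());
        if (record.empty())
            continue;  // tolerate blank lines and a trailing delimiter

        auto const sep_pos = record.find(sep);
        if (sep_pos == std::string_view::npos)
            throw std::runtime_error("cannot parse");

        // On a duplicate key, emplace keeps the first value seen.
        result.emplace(record.substr(0, sep_pos),
                       record.substr(sep_pos + sep.size()));
    }
    return result;
}
```

With the string from the question held in a std::string named text, parse_pairs(text) gives a map of four entries keyed by "CA", "CB", "CC" and "CD". If insertion cost matters more than simplicity, the same loop can fill a boost::container::flat_map instead.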