bfriz_96 bfriz_96 - 1 month ago 21
C++ Question

Bigrams as map keys in C++

Alright, so I'm wondering how I would go about creating a map key from a two word string (a bigram). For example, the line "This is only a test." would contain the bigrams "this is", "is only", "only a", and "a test."

I'm thinking of using make_pair, but something tells me this could cause these bigrams to be created out of order. Would this be the case? If not, would this pairing approach be on the right track?

Answer

I'd recommend a std::pair because it's inherently binary. If there's a chance you need trigrams or 4-grams, you might want to stick to a std::tuple, which is a generalization of std::pair (informally speaking).

So long as your bigram algorithm is correct, the order will be preserved. There are some obvious improvements which could be made, but I wrote a quick implementation here:

#include <iostream>
#include <vector>
#include <string>
#include <iterator>
#include <map>
#include <sstream>
#include <utility>

std::vector<std::string> tokenize(const std::string& s) {
    std::istringstream iss(s);

    std::vector<std::string> v{std::istream_iterator<std::string>(iss),
                               std::istream_iterator<std::string>()};
    return v;
}

std::vector<std::pair<std::string, std::string>> make_bigrams(const std::vector<std::string>& tokens) {
    std::vector<std::pair<std::string, std::string>> bigrams;
    for(auto it = std::cbegin(tokens); it != std::prev(std::cend(tokens)); ++it) {
        bigrams.push_back(std::make_pair(*it, *std::next(it)));
    }
    return bigrams;
}

std::vector<std::pair<std::string, std::string>> sentence_bigram(const std::string& s) {
    const auto toks = tokenize(s);
    return make_bigrams(toks);
}

int main() {
    const auto& bigrams = sentence_bigram("hello, world. my name is erip");
    std::map<std::pair<std::string, std::string>, int> m;
    for(const auto& e: bigrams) {
       std::cout << "Adding (" << e.first << "), (" << e.second << ") to the map.\n";
       m[e] = 0;
    }
} 

and you can see it in action here.