illusionist illusionist - 2 months ago 7
C++ Question

Using STL Map & Set for reocurring words

I am currently working on a program that will take in files of text and organize every words into it's own value paired with how many times it occurs. I have been playing with this idea for quite some time and cannot get past the basic implementation. I am very new to using MAP and SET, I understand SET will only hold one occurrence of each word, and MAP can use the word itself as the key and it's data type can be how many times it repeats. However, to accomplish this, I am very lost. My code is incomplete and I am stuck, what I attempted to do was figure out a way to store each word within SET and instantly map the word to an original value of one into a map, however, if a word repeats then the set would catch it and increase the MAP KEY - VALUE pair by one.

Example:

#include <iostream>
#include <vector>
#include <map>
#include <set>
using namespace std;

int main()
{
map<string, int> testMap;
vector<string> text;
set<string> words;
int datVal = 1;

text.push_back("Hi");
text.push_back("Hi");
text.push_back("Bye");
text.push_back("test");
text.push_back("ice");
text.push_back("pie");
text.push_back("pie");
text.push_back("cheese");
text.push_back("wampum");

for(int x = 0; x < text.size(); x++)
{
words.insert(text[x]);
if(
testMap.insert(make_pair(text[x], datVal)).second;

}


If anyone could help me I would greatly appreciate it! I do not even understand how to check the set and increase a value linked to map, I have a lot to learn. Thank you for your time!

Answer

Say we have a std::vector<std::string> containing your words data. If we want to create a histogram of the words in this vector then using a single std::map<std::string, unsigned int> is simple and efficient via probing the map instance using the std::map::operator[] which accesses a reference to the value corresponding to they key passed to this method or inserts the data if it is not already contained within the map:

int main() {
    std::vector<std::string> words; // populated somewhere
    std::map<std::string, unsigned int> word_histogram; // store word with associated count
    for (const auto& word : words) ++word_histogram[word]; // find word and increment count
}

This results in a histogram of the words contained in the std::vector words. Note, if you do not care about the ordering of the items in the word_histogram then use a std::unordered_map<std::string, unsigned int> instead as the average case complexity of std::unordered_map::operator[] is constant whilst it is logarithmic in the container size for std::map::operator[].

There is no need to use an extra std::set for your requirement that you mentioned at the start of your question, this can simply be achieved using the above code.

Comments