I wrote a program that reads numbers out of a file (about 500,000 of them, all distinct) and inserts them into a data structure.
I'm inserting the numbers into an std::unordered_map, pairing each key with a dummy value, since all I need later is to check whether a number exists.
Since you never use the values and only test for existence, go with std::unordered_set. It gives you directly what you were simulating by pairing every key in the map with a dummy value.
First, I want to reiterate what everyone said: inserting 500,000 items only to search a few hundred times means the build dominates your running time, and you can't really avoid that, unless you can turn the problem around: build a set of the few hundred values you are searching for, then test each of the 500,000 numbers from the file against that set.
All that said, I was able to get some improvement on the insertion of 500,000 items in a test app, by taking into account the nature of hash tables:
Reviewing http://en.cppreference.com/w/cpp/container/unordered_map, I found these:
[Insert] Complexity: Average case: O(1), worst case O(size())
By default, unordered_map containers have a max_load_factor of 1.0.
When you reserve space for 500,000 items, you get only about 500,000 buckets: reserve(n) guarantees just enough buckets for n elements at the current max_load_factor. Putting 500,000 pieces of data in 500,000 buckets means the table is essentially full, so you get a lot of collisions. I reserved extra space, and it was faster.
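A sketch of what "reserving extra space" can look like; the factor of 2 is an illustrative tuning choice, not a measured optimum:

```cpp
#include <cstddef>
#include <unordered_set>

// Over-reserve so the table stays roughly half empty, which shortens
// collision chains. Equivalently, you could lower max_load_factor
// before reserving.
std::unordered_set<int> make_set(std::size_t expected) {
    std::unordered_set<int> s;
    s.reserve(expected * 2);  // twice the element count is a tunable guess
    // alternative: s.max_load_factor(0.5f); s.reserve(expected);
    return s;
}
```

The standard guarantees bucket_count() ends up at least n / max_load_factor after reserve(n), so asking for 2x the element count at the default load factor of 1.0 yields at least twice as many buckets as elements.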
If you really need speed, and can tolerate occasional false positives, look into Bloom filters.
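For illustration, a minimal Bloom filter might look like the following. The bit-array size, the number of hash functions, and the double-hashing scheme are all my own illustrative choices; a real filter would size them from the expected element count and target false-positive rate. The key property: maybe_contains can return a false positive, but never a false negative.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <functional>

class BloomFilter {
    static constexpr std::size_t kBits = 1 << 20;  // ~1M bits (128 KiB)
    static constexpr int kHashes = 4;
    std::bitset<kBits> bits_;

    // Double hashing (h1 + i*h2) is a common way to derive k hash
    // functions from two; the xor constant is an arbitrary mixer.
    static std::size_t hash(std::uint64_t x, int i) {
        std::size_t h1 = std::hash<std::uint64_t>{}(x);
        std::size_t h2 = std::hash<std::uint64_t>{}(x ^ 0x9e3779b97f4a7c15ULL);
        return (h1 + static_cast<std::size_t>(i) * h2) % kBits;
    }

public:
    void insert(std::uint64_t x) {
        for (int i = 0; i < kHashes; ++i)
            bits_.set(hash(x, i));
    }
    bool maybe_contains(std::uint64_t x) const {
        for (int i = 0; i < kHashes; ++i)
            if (!bits_.test(hash(x, i)))
                return false;  // definitely absent
        return true;  // probably present (may be a false positive)
    }
};
```

You would insert all 500,000 numbers, then check the filter first and only fall back to an exact structure (or accept the small error rate) when it says "maybe".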