Recently we embarked on a project that required storing a relatively big dictionary with 10M+ key-value pairs. Unsurprisingly, Python took over two hours to build such a dictionary, taking into account all the time spent extending, accessing and writing to it, and it eventually crashed. So I turned to C++ for help.
In C++, map is one of the containers that can store a string key with an integer value. Since we are concerned about data storage and access, I compared map and unordered_map.
unordered_map stores its keys and mapped values in a hash table, while map keeps its keys in sorted order (typically in a balanced binary tree). The important considerations here include:
- Memory: map does not have the hash table and is therefore smaller than an unordered_map.
- Access: accessing an unordered_map takes O(1) on average, while accessing a map takes O(log n).
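To make the comparison concrete, here is a minimal sketch of how the two containers can sit behind the same code. The helper names (build_dict, lookup, keys_in_iteration_order) and the "key_<i>" naming scheme are made up for illustration and are not from the actual project.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Fill any map-like container with n hypothetical "key_<i>" -> i entries.
template <class Dict>
Dict build_dict(std::size_t n) {
    Dict dict;
    for (std::size_t i = 0; i < n; ++i) {
        dict["key_" + std::to_string(i)] = static_cast<int>(i);
    }
    assert(dict.size() == n);  // every generated key is distinct
    return dict;
}

// Look up one key; returns -1 when the key is absent.
// find() is O(log n) for std::map and O(1) on average for std::unordered_map.
template <class Dict>
int lookup(const Dict& dict, const std::string& key) {
    auto it = dict.find(key);
    return it == dict.end() ? -1 : it->second;
}

// Collect keys in iteration order: sorted for std::map,
// unspecified for std::unordered_map.
template <class Dict>
std::vector<std::string> keys_in_iteration_order(const Dict& dict) {
    std::vector<std::string> keys;
    keys.reserve(dict.size());
    for (const auto& kv : dict) {
        keys.push_back(kv.first);
    }
    return keys;
}
```

Calling build_dict<std::map<std::string, int>>(n) or build_dict<std::unordered_map<std::string, int>>(n) switches the container without touching the rest of the code, which makes it easy to try both on the same data.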
I chose map, because it is more memory efficient, which matters given the small amount of RAM that I have access to. However, it still takes up about 8GB of RAM per object during its runtime (and I have 1800 objects to run through, each building a different dictionary). Saving these seems to open another can of worms.
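Since the per-entry overhead of both containers depends on the standard library implementation, it may be worth measuring the memory trade-off rather than assuming it. Below is a rough sketch that does so with a counting allocator. CountingAllocator, g_live_bytes and bytes_for are hypothetical names of my own, and the count only covers what the container itself allocates (tree nodes, hash nodes, bucket array); heap memory that long string keys allocate on their own is not included.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <new>
#include <string>
#include <unordered_map>
#include <utility>

// Bytes currently allocated through CountingAllocator (not thread-safe).
static std::size_t g_live_bytes = 0;

// Minimal allocator that forwards to operator new/delete and tracks bytes.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        g_live_bytes += n * sizeof(T);
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t n) noexcept {
        g_live_bytes -= n * sizeof(T);
        ::operator delete(p);
    }
};

// All instances are interchangeable, so they always compare equal.
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

using Pair = std::pair<const std::string, int>;
using CountedMap =
    std::map<std::string, int, std::less<std::string>, CountingAllocator<Pair>>;
using CountedUMap =
    std::unordered_map<std::string, int, std::hash<std::string>,
                       std::equal_to<std::string>, CountingAllocator<Pair>>;

// Bytes the container itself holds after inserting n "key_<i>" -> i entries.
template <class Container>
std::size_t bytes_for(std::size_t n) {
    const std::size_t before = g_live_bytes;
    Container c;
    for (std::size_t i = 0; i < n; ++i) {
        c["key_" + std::to_string(i)] = static_cast<int>(i);
    }
    assert(c.size() == n);
    return g_live_bytes - before;  // evaluated while c is still alive
}
```

Comparing bytes_for<CountedMap>(n) with bytes_for<CountedUMap>(n) for a realistic n and realistic key lengths gives a per-entry figure for the compiler and standard library actually in use.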