I saw this question on one of the .NET newsgroups today:
I am looping through a datareader, putting the value of a string column as both the key and value of a hashtable item using the add method. The data was ordered in the sql statement. When I traverse the hashtable the data is in random order. Can anyone tell me why this happens?
This isn't a surprising question, since hash tables in .NET serve much the same function that maps (and multimaps) did in standard C++. But their implementation is radically different.
The map in C++ is a balanced tree. That means that its average access for any data element is O(1/2 log n), and the worst case is O(log n). Binary trees always keep the data is sorted order, and the balanced nature of the trees is what gives you the O(log n) access rate. A side effect of this is that when you traverse the data, it's always in sorted order.
The hash table in .NET is ... a hash table. That means there's a lot of buckets, and some algorithm for turning the key into some bucket number. Given a key, you can find the appropriate bucket in O(1), and if you have enough buckets, your retrieval is also O(1). The reason you might end up worse than O(1) (worse case is O(n)) is if you don't have enough buckets, such that multiple things fall into one bucket. If there's more than one item in the bucket, then it's a linear search to find the right item.
Clearly, when you traverse the hash table, the items are going to be in what appears to be a random order (which is based on the values that result from the hashing algorithm). If you need sorted data, a hash table isn't for you.
Unfortunately, Microsoft didn't see fit to include a balanced tree container like the C++ map into the .NET framework. Good luck!
[The .NET Guy]
- Hmm do you suppose an array might have done a better job here?
9:22:35 PM
|