While glancing through the decompiled SortedList code I noticed there is a serious bug waiting to happen. When a key/value pair is added, a reference to the key object, not a copy or clone of it, is stored in the key array. This means that if a reference to the key object is also held outside the SortedList (or obtained via the GetKey method), the object can be modified, thereby corrupting the index.
...
HashTable suffers from the same problem. In this case the documentation states: Key objects must be immutable as long as they are used as keys in the Hashtable.
An example of an immutable object would be an instance of String. Alternatively if a value type is passed in as the first parameter of Add, an implicitly boxed value is created which cannot be modified outside the SortedList instance. [Cook Computing]
Great stuff Charles! This is sure to lead to a lot of people staring at what looks like perfectly valid code for hours, tearing their hair out and wondering why their experiencing such strange sideffects. I believe there was a discussion about this very same topic on the DOTNET list late last year. While I don't think I'd call it a bug, it certainly is an oversight in the documentation of that class. The only hint the documentation gives you is:
A SortedList is a hybrid between a Hashtable and an Array.
The problem lies in the fact that the internals of a HashTable uses Object::GetHashCode, by default, to efficiently sort the keys into hash buckets. You can also optionally supply an IHashCodeProvider which it will use to generate hash codes for the keys instead. The three basic properties of a hash function imply immutability:
A hash function must have the following properties:
- If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
- For the best performance, a hash function should generate a random distribution for all input.
- A hash function should be based on an immutable data member. The hash function should return exactly the same value regardless of any changes that are made to the object. Basing the hash function on a mutable data member can cause serious problems, including never being able to access that object in a hash table.
The last bullet-point is where it's at. As Charles also points out, if you use a value type you'll be safe because the data is copied as it boxed to/unboxed from the heap ensuring that you can never actually change the key instance.
Bottom line is: make sure that the type used as a key to a HashTable is immutable or that the instance is simply not mutated while it's still a key in the HashTable. [Drew's Blog]
6:08:15 AM
|