The Wagner Blog
Development Notes, News and Trivia

Thursday, June 13, 2002

The Register: MS Security Hole Extravaganza, including SQLXML3, and VeriSign is laying the groundwork for foreign domain names.

7:49:36 PM

Found: Solar System Like Our Own. Researchers discover, for the first time ever, a planetary system similar to our own. And it's right next door, only 41 light years away. By Noah Shachtman. [Wired News]
5:45:06 PM

Follow up on for vs. foreach on arrays

So I posted this quick blurb on foreach'ing vs. for'ing arrays earlier today. When I got home I opened an email from Fumiaki Yoshimatsu pointing out that the C# compiler was actually smart enough to detect that the variable myStructs was of type Array and emitted IL (almost) equivalent to that of a handwritten for loop automagically on the developer's behalf. First off, thanks for pointing this out, Fumiaki! Next, I don't know when this happened, but I remember Eric Gunnerson discussing this compiler optimization and coulda sworn it wasn't going in until after v1. Obviously they must have snuck it in somewhere between beta2 and v1. I guess I just haven't looked at the scenario for a while.

I checked out the VB.NET compiler and it uses this optimization as well. However, both languages seem to do something very strange. They create another local array variable of the same type, assign the array reference to that, then loop over that reference as opposed to your own. This happens in both debug and release builds. I don't really see a reason why they needed to do this and I'm not sure if the JIT will optimize that stack allocation and assignment away, but it's not the end of the world and it sure beats IEnumerable/IEnumerator.

So in the end, using either a manual for or foreach in C#, or For or For Each in VB.NET, will emit IL that allows the JIT to optimize away bounds checking. For the sake of being thorough, click here to see the different approaches coded in C# and the IL that is emitted in a release build.
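The linked listing is not reproduced on this page, so here is a minimal sketch of the approaches being compared. The element type MyStruct and the method names are made up for illustration; only the variable name myStructs comes from the post. The third method is a hand-written approximation of what the post says the compiler emits for foreach over an array (an extra local holding the array reference), not actual decompiled output.

```csharp
using System;

// Hypothetical element type, invented for this sketch.
struct MyStruct
{
    public int Value;
}

class LoopDemo
{
    // Manual for loop over the array.
    static int SumFor(MyStruct[] myStructs)
    {
        int sum = 0;
        for (int i = 0; i < myStructs.Length; i++)
        {
            sum += myStructs[i].Value;
        }
        return sum;
    }

    // foreach over the same array; per the post, the compiler turns this
    // into an indexed loop rather than going through IEnumerator.
    static int SumForEach(MyStruct[] myStructs)
    {
        int sum = 0;
        foreach (MyStruct s in myStructs)
        {
            sum += s.Value;
        }
        return sum;
    }

    // Approximation of the foreach expansion described above: the array
    // reference is copied to a second local, and the loop runs over that copy.
    static int SumForEachExpanded(MyStruct[] myStructs)
    {
        int sum = 0;
        MyStruct[] copy = myStructs;
        for (int i = 0; i < copy.Length; i++)
        {
            MyStruct s = copy[i];
            sum += s.Value;
        }
        return sum;
    }

    static void Main()
    {
        MyStruct[] myStructs = new MyStruct[3];
        for (int i = 0; i < myStructs.Length; i++)
        {
            myStructs[i].Value = i + 1;
        }

        // All three loops visit the same elements in the same order.
        Console.WriteLine(SumFor(myStructs) == SumForEach(myStructs));
        Console.WriteLine(SumFor(myStructs) == SumForEachExpanded(myStructs));
    }
}
```

Comparing the IL for the three methods (for example with ildasm) is the way to confirm what a given compiler version really emits.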
[Drew's Blog]
6:09:44 AM

While glancing through the decompiled SortedList code I noticed there is a serious bug waiting to happen. When a key/value pair is added, a reference to the key object, not a copy or clone of it, is stored in the key array. This means that if a reference to the key object is also held outside the SortedList (or obtained via the GetKey method), the object can be modified, thereby corrupting the index.

...

Hashtable suffers from the same problem. In this case the documentation states: Key objects must be immutable as long as they are used as keys in the Hashtable.

An example of an immutable object would be an instance of String. Alternatively if a value type is passed in as the first parameter of Add, an implicitly boxed value is created which cannot be modified outside the SortedList instance. [Cook Computing]
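To make the SortedList hazard concrete, here is a small sketch, assuming the non-generic System.Collections.SortedList and a made-up MutableKey class (not from the original post). It mutates a key through a reference held outside the list and then shows that the stored keys are no longer in sorted order.

```csharp
using System;
using System.Collections;

// Hypothetical mutable key type, invented for this illustration.
class MutableKey : IComparable
{
    public int Id;

    public MutableKey(int id) { Id = id; }

    public int CompareTo(object obj)
    {
        return Id.CompareTo(((MutableKey)obj).Id);
    }
}

class SortedListKeyDemo
{
    static void Main()
    {
        SortedList list = new SortedList();
        MutableKey first = new MutableKey(1);
        list.Add(first, "one");
        list.Add(new MutableKey(2), "two");
        list.Add(new MutableKey(3), "three");

        // The list stores a reference to 'first', not a copy,
        // so this also changes the key held inside the list.
        first.Id = 10;

        // The internal key order is now 10, 2, 3: no longer sorted.
        MutableKey k0 = (MutableKey)list.GetKey(0);
        MutableKey k1 = (MutableKey)list.GetKey(1);
        Console.WriteLine(k0.Id > k1.Id);   // True

        // Lookups assume sorted keys, so they may now miss
        // entries that are still physically in the list.
        Console.WriteLine(list.ContainsKey(first));
    }
}
```

The same mutation could be made through the object returned by GetKey, since it is the same reference.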

Great stuff, Charles! This is sure to lead to a lot of people staring at what looks like perfectly valid code for hours, tearing their hair out and wondering why they're experiencing such strange side effects. I believe there was a discussion about this very same topic on the DOTNET list late last year. While I don't think I'd call it a bug, it certainly is an oversight in the documentation of that class. The only hint the documentation gives you is:

A SortedList is a hybrid between a Hashtable and an Array.

The problem lies in the fact that the internals of a Hashtable use Object::GetHashCode, by default, to efficiently sort the keys into hash buckets. You can also optionally supply an IHashCodeProvider, which it will use to generate hash codes for the keys instead. The three basic properties of a hash function imply immutability:

A hash function must have the following properties:

- If two objects of the same type represent the same value, the hash function must return the same constant value for either object.
- For the best performance, a hash function should generate a random distribution for all input.
- A hash function should be based on an immutable data member. The hash function should return exactly the same value regardless of any changes that are made to the object. Basing the hash function on a mutable data member can cause serious problems, including never being able to access that object in a hash table.

The last bullet-point is where it's at. As Charles also points out, if you use a value type you'll be safe because the data is copied as it is boxed to/unboxed from the heap, ensuring that you can never actually change the key instance.

Bottom line is: make sure that the type used as a key to a Hashtable is immutable or that the instance is simply not mutated while it's still a key in the Hashtable.
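Here is a rough sketch of both cases, assuming the non-generic System.Collections.Hashtable. PointKey and IdKey are hypothetical types invented for the example. The first part bases the hash code on a mutable field and then changes it while the object is a key; the second part uses a value type, where Add boxes a copy that the caller cannot reach.

```csharp
using System;
using System.Collections;

// Hypothetical reference-type key whose hash code depends on a mutable field.
class PointKey
{
    public int X;

    public PointKey(int x) { X = x; }

    public override int GetHashCode() { return X; }

    public override bool Equals(object obj)
    {
        PointKey other = obj as PointKey;
        return other != null && other.X == X;
    }
}

// Hypothetical mutable value-type key.
struct IdKey
{
    public int Id;

    public IdKey(int id) { Id = id; }
}

class HashtableKeyDemo
{
    static void Main()
    {
        Hashtable table = new Hashtable();
        PointKey key = new PointKey(1);
        table.Add(key, "value");
        Console.WriteLine(table.ContainsKey(key));               // True

        // Mutate the key while it is still in the table.
        key.X = 2;

        // The new hash code no longer matches the one recorded at Add time.
        Console.WriteLine(table.ContainsKey(key));               // False

        // The old hash code still matches, but Equals now fails
        // against the mutated stored key.
        Console.WriteLine(table.ContainsKey(new PointKey(1)));   // False

        // Value type: Add boxes a copy, so changing the local
        // afterwards does not touch the key stored in the table.
        Hashtable safe = new Hashtable();
        IdKey id = new IdKey(1);
        safe.Add(id, "value");
        id.Id = 99;
        Console.WriteLine(safe.ContainsKey(new IdKey(1)));       // True
    }
}
```

The entry for the mutated PointKey is still in the table and still counted, it just can't be found by key any more, which is exactly the "never being able to access that object" problem from the quoted documentation.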
[Drew's Blog]
6:08:15 AM

Hosting ASP.NET in a WinForms app [IUnknown.com: John Lam's Weblog on Software Development] [Sam Gentile's Radio Weblog]
6:06:22 AM

Transistors Reach Molecular Level. Researchers from two different teams publish their technique to wire up individual molecules into electronic circuits. By Mark K. Anderson. [Wired News]
6:04:01 AM

Behind Linux's Struggle in Gov't. It's free, it's becoming more secure, and it's even the dirty little secret among some computer geeks who work in the U.S. government. Then why isn't Linux more prevalent? One word: Microsoft. Another: Oracle. By Declan McCullagh and Robert Zarate. [Wired News]

- As opposed to the German Government which has decided to standardize on Open Source software, including Linux!

6:03:18 AM