Pete Wright's Radio Weblog

Musings on anything and everything, but mainly code!


16 November 2004

The vagaries of string
I'm working on a chapter for the new book at the moment that looks in some detail at C# types, variables and all that other good stuff. I've come across something, though, that I'm having trouble getting my head around. Either I'm blind or stupid and missing something obvious, or Microsoft are pulling a fast one on us here. It's quite possible all of the above are true as well, I suppose.

Take a look at this code.
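using System;

class Program
{
    static void Main(string[] args)
    {
        Foo myName = new Foo();
        myName.name = "Pete";

        // Copies the reference, not the object: both variables now point at one Foo.
        Foo hisName = myName;
        myName.name = "Paul";

        // Prints "Paul, Paul" because there is only one object to change.
        Console.WriteLine("{0}, {1}", myName, hisName);
        Console.ReadLine();
    }
}

class Foo
{
    public string name;

    // Lets Console.WriteLine print the name rather than the type name.
    public override string ToString()
    {
        return name;
    }
}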

It's simple enough. Foo is a class type that contains a single public field (name) and a method (ToString, overridden so the object prints as its name). The main code just creates an instance of Foo, sets its name field, then creates another Foo reference that points at the first object. The result, obviously, is that when I change the field through the original reference, the change shows up through the second reference too: by virtue of being reference-type variables, both point to the same object. The program prints "Paul, Paul".

You should all be able to easily see why this code works - it's not magic or anything. But, just in case you've never dived into the murky depths of the C# reference, let me explain how it works. Class types are reference types. When you create an instance of a class type, what you are actually doing is creating an object on the heap and storing a reference to it in your variable. If you then assign that variable to another one, you are really just copying the object reference around. This is different from value types (like int, char, byte and so on). Value types store their data directly in the variable; the methods belong to the type, not to each variable. So, when you assign one value-type variable to another, C# creates a copy of the entire variable contents (just as it does with reference types, where the contents happen to be a reference) and assigns that copy to the new variable. The net result is that with value types you have two variables with identical data, but independent of each other; if you change the data in one of them, the change doesn't also happen in the other.
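Here's a minimal sketch of that difference. PointStruct, PointClass and CopyDemo are made-up names for the example; the two types are identical apart from the struct/class keyword.

```csharp
using System;

// Hypothetical types: same shape, one value type and one reference type.
struct PointStruct { public int X; }
class PointClass { public int X; }

static class CopyDemo
{
    // Assigning a struct copies its data; the copy is independent.
    public static int StructCopyAfterChange()
    {
        var a = new PointStruct { X = 1 };
        var b = a;      // copies the field values
        a.X = 2;        // only a changes
        return b.X;     // still 1
    }

    // Assigning a class variable copies the reference; both see one object.
    public static int ClassCopyAfterChange()
    {
        var a = new PointClass { X = 1 };
        var b = a;      // copies the reference only
        a.X = 2;        // the shared object changes
        return b.X;     // now 2
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine("struct copy: {0}", CopyDemo.StructCopyAfterChange()); // struct copy: 1
        Console.WriteLine("class copy: {0}", CopyDemo.ClassCopyAfterChange());   // class copy: 2
    }
}
```

The struct copy keeps its old value, while the class copy follows the change, exactly like hisName did with Foo.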

So, what's the problem? Well, String, to be quite frank. I like String; we've spent a lot of time together and I like to think of us as good friends by now. String and I have shared pizza and Dr. Pepper during late-night coding sessions. String has even been there when I've sat coding in my underwear (sorry, try not to picture that too hard). We're close, or at least we were until I had to write about String.

You see, in C# 'int' is an alias for System.Int32. System.Int32 is a struct, so it derives from System.ValueType, and I know that at runtime my variable holds the integer's data itself rather than a reference to it. It's a value type after all. String, on the other hand, is a reference type. Check it out in the documentation. Check it out in Reflector if you really want to be sure. 'string' in C# is an alias for System.String, a class that derives directly from System.Object, and is thus a reference type. What this means, when you think long and hard about it, is that if you assign a string to variable a, and then assign variable a to variable b, both a and b should be pointing at the same object, so changing a ought to change b as well. It's a reference type. However, it doesn't. Try this out.
using System;

class Program
{
    static void Main(string[] args)
    {
        string myName = "pete";
        string hisName = myName;
        myName = "Paul";

        // Prints "Paul, pete".
        Console.WriteLine("{0}, {1}", myName, hisName);
        Console.ReadLine();
    }
}

When you run this you'll see that myName and hisName end up referring to two different objects: the program prints "Paul, pete". But how? String is a reference type. Surely if I change myName, hisName will change as well.
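If you'd rather not take the documentation's word for it, you can ask the runtime directly. This is just a quick reflection check using System.Type's IsValueType and BaseType properties:

```csharp
using System;

class Program
{
    static void Main()
    {
        // 'int' and System.Int32 are the same type.
        Console.WriteLine(typeof(int) == typeof(Int32));  // True
        // Int32 is a struct, so its base type is System.ValueType.
        Console.WriteLine(typeof(int).BaseType);           // System.ValueType
        Console.WriteLine(typeof(int).IsValueType);        // True
        // String is a class deriving directly from System.Object.
        Console.WriteLine(typeof(string).BaseType);        // System.Object
        Console.WriteLine(typeof(string).IsValueType);     // False
    }
}
```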

You see, this is where String and I fell out. I don't like to write stuff in books that I don't fully understand, so I had to get to the bottom of this. If you just read over the code without trying too hard to understand what it's doing behind the scenes, it's really quite obvious. myName is "pete", and hisName is set to the same string. I then changed the string (you can't actually change a string, since it's immutable, but hang in there), hence the two strings that come out at the end are just that: two different strings. The code works as it reads, and that's a credit to the C# team. It's really quite an intuitive AND logical language (my wife is intuitive, but living proof that intuitive things frequently defy all semblance of logic).
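The immutability point is easy to demonstrate. A minimal sketch, using ToUpperInvariant as a stand-in for any String method that looks like it ought to change the string: it hands back a new object and leaves the original alone. ImmutabilityDemo and Shout are made-up names.

```csharp
using System;

static class ImmutabilityDemo
{
    // Returns an upper-cased copy; 'original' itself is never modified.
    public static string Shout(string original)
    {
        // original[0] = 'P'; would not compile: the string indexer is read-only.
        return original.ToUpperInvariant();
    }
}

class Program
{
    static void Main()
    {
        string name = "pete";
        string shouted = ImmutabilityDemo.Shout(name);
        Console.WriteLine(name);                                  // pete
        Console.WriteLine(shouted);                               // PETE
        Console.WriteLine(object.ReferenceEquals(name, shouted)); // False
    }
}
```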

What's actually happening here, when you dive down really deep, is quite neat. When you put a string literal in your code, you get an instance of the String class loaded up with the text in question. So, let's look at the code again.

First I create a string that contains 'pete' and store a reference to it in myName. Then I create another string reference variable, hisName, that also points to that first string object ('pete'). I then create a second String instance, initialized with the text 'Paul'. A reference to this is stored in myName, overwriting its reference to 'pete', while hisName still points at 'pete'. There's the answer: I actually have two strings in memory, 'pete' and 'Paul', and the assignment changed which object myName refers to, not the string itself.
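You can watch the reference being overwritten with object.ReferenceEquals, which compares references rather than text. A minimal sketch; ReassignDemo and its Run method are hypothetical names:

```csharp
using System;

static class ReassignDemo
{
    // Records whether myName and hisName refer to the same object
    // before and after myName is reassigned.
    public static bool[] Run(out string myName, out string hisName)
    {
        myName = "pete";
        hisName = myName;                                   // copies the reference
        bool before = object.ReferenceEquals(myName, hisName);
        myName = "Paul";                                    // myName now refers to another object
        bool after = object.ReferenceEquals(myName, hisName);
        return new[] { before, after };
    }
}

class Program
{
    static void Main()
    {
        string myName, hisName;
        bool[] shared = ReassignDemo.Run(out myName, out hisName);
        Console.WriteLine("shared before: {0}, after: {1}", shared[0], shared[1]); // shared before: True, after: False
        Console.WriteLine("{0}, {1}", myName, hisName);                            // Paul, pete
    }
}
```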

It's neat because that's really just what you'd expect, and having anything different would just be damn odd. It's only when you ask a centipede how he walks, though, that things get all confused. By dropping a string literal into your code you are creating an object on the heap and then storing a reference to it somewhere. In effect, String breeds if you aren't careful. (To be fair, the runtime interns literals, so the same literal used twice refers to one shared object; it's strings built at run time, by concatenation and the like, that really multiply.)
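Here's a rough sketch of how interned literals and strings built at run time differ, again using object.ReferenceEquals. InternDemo and BuildPete are made-up names; string.Intern and StringBuilder are the real framework APIs.

```csharp
using System;
using System.Text;

static class InternDemo
{
    // Two identical literals: both variables get the same interned object.
    public static bool LiteralsShare()
    {
        string a = "pete";
        string b = "pete";
        return object.ReferenceEquals(a, b);
    }

    // Hypothetical helper: builds "pete" at run time, producing a fresh object.
    static string BuildPete()
    {
        return new StringBuilder("pe").Append("te").ToString();
    }

    public static bool BuiltShares()
    {
        return object.ReferenceEquals("pete", BuildPete());
    }

    // string.Intern returns the pooled instance, i.e. the literal's object.
    public static bool InternedShares()
    {
        return object.ReferenceEquals("pete", string.Intern(BuildPete()));
    }
}

class Program
{
    static void Main()
    {
        Console.WriteLine(InternDemo.LiteralsShare());  // True
        Console.WriteLine(InternDemo.BuiltShares());    // False
        Console.WriteLine(InternDemo.InternedShares()); // True
    }
}
```

Note that == on strings compares the text, so all three pairs would compare equal with ==; only ReferenceEquals shows which ones are actually the same object.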

5:05:51 PM