Jon Box's Weblog

 



 

 

  Tuesday, July 29, 2003


The Scenario

I’m working on a project that reads data from a standard text file, that is, a file whose lines are delimited by CR/LF.  This is an extremely easy task in .NET using the StreamReader class and its ReadLine method.

 

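' Requires Imports System.IO at the top of the file.

Dim line As String

Dim sr As StreamReader = New StreamReader("c:\SomeFile.txt")

' ReadLine returns Nothing at end of file, so test before processing.

line = sr.ReadLine()

Do While Not line Is Nothing

    ProcessRow(line)

    line = sr.ReadLine()

Loop

sr.Close()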

 

Next (and the real point of this text), I needed to write some logic to parse the data from the current row of text (for example, working with the “line” variable in the above snippet).  In my case, the file is a comma-delimited file (CDF), and each line in the file looks like the following: 1, 2, 3, 4, 5[CR][LF].

 

String.Split

Just as VB6 has the Split function for parsing comma-delimited text, .NET provides several ways to accomplish the same.  The most obvious option to me was to use the String class’s Split method, which has the following syntax:

 

[Visual Basic] Overloads Public Function Split(ParamArray Char()) As String()

[C#] public string[] Split(params char[]);

 

While this might initially appear to do the trick, notice the type of the delimiter parameter.  Instead of accepting a string as in VB6, the String class expects an array of Char values, and each Char is treated as a separate delimiter.  To see the potential dilemma, consider the following scenarios using the “line” variable from above.

 

Dim Line As String = "1, 2, 3, 4, 5"

Dim pos As Integer

Dim arrString() As String

Dim splitParams As Char() = New Char() {","c}

arrString = Line.Split(splitParams)

 

For pos = 0 To arrString.Length - 1

    Console.WriteLine("{0}: [{1}]", pos, arrString(pos))

Next

 

The Results are:

0: [1]

1: [ 2]

2: [ 3]

3: [ 4]

4: [ 5]

 

While this basically does what I need, the second and subsequent strings each carry a leading space character.  I know that this is not a great programming challenge, but I wish that .NET did more of the work for me.  So, my next attempt was to change the delimiter array by adding a space, like this:

 

Dim splitParams As Char() = New Char() {","c, " "c}

 

The Results are:

0: [1]

1: []

2: [2]

3: []

4: [3]

5: []

6: [4]

7: []

8: [5]

 

 

The space padded data issue is now gone, but there are additional empty elements in the array due to the space delimiter.  There’s got to be a solution which ends with a cleaner result.
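One workaround, which is not among the options I tried here, is to split on the comma alone and then trim each element. The following is only a minimal sketch; the module name and the SplitAndTrim helper are hypothetical names made up for illustration.

```vb
Imports System

Module SplitAndTrimDemo

    ' Hypothetical helper: split on commas only, then strip the
    ' leading and trailing whitespace from every element.
    Function SplitAndTrim(ByVal text As String) As String()
        Dim parts() As String = text.Split(","c)
        Dim i As Integer
        For i = 0 To parts.Length - 1
            parts(i) = parts(i).Trim()
        Next
        Return parts
    End Function

    Sub Main()
        Dim parts() As String = SplitAndTrim("1, 2, 3, 4, 5")
        Dim pos As Integer
        For pos = 0 To parts.Length - 1
            Console.WriteLine("{0}: [{1}]", pos, parts(pos))
        Next
    End Sub

End Module
```

This still costs a loop after the Split call, so it does not remove the extra coding, but it leaves neither padded nor empty elements for this input.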

 

Microsoft.VisualBasic.Strings

 

Another option that I found is to use the Split function provided in the VB runtime library.  It is a good way to go because it allows String delimiters.  Consider the following snippet:

Dim pos As Integer

Dim arrString() As String

arrString = Microsoft.VisualBasic.Strings.Split(Line, ", ")

 

For pos = 0 To arrString.Length - 1

    Console.WriteLine("{0}: [{1}]", pos, arrString(pos))

Next

 

The Results are:

0: [1]

1: [2]

2: [3]

3: [4]

4: [5]

 

This is a perfect result.  But I still have a gripe about this solution: I think that splitting on a String delimiter should be a standard part of the String class so that everyone can use it easily and cleanly.  In my opinion, strong-willed C# developers (i.e. those with an anti-VB mindset) will have a hard time using this solution.

 

Regex.Split

It turns out that another option exists in the System.Text.RegularExpressions namespace.  Consider the sample below where a Regex object is instantiated with a string delimiter and then the Split method is called:

 

Dim pos As Integer

Dim r As Regex = New Regex("(, )") ' Split on a comma followed by a space; the parentheses form a capturing group.

Dim arrString() As String

arrString = r.Split(Line)

 

For pos = 0 To arrString.Length - 1

    Console.WriteLine("{0}: [{1}]", pos, arrString(pos))

Next

 

The Results are:

0: [1]

1: [, ]

2: [2]

3: [, ]

4: [3]

5: [, ]

6: [4]

7: [, ]

8: [5]

 

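Again, this is not the clean result that I am looking for, although it did get me a decent result.  The delimiters show up in the array because the parentheses in the pattern form a capturing group, and Regex.Split returns captured text as extra elements; a pattern without the parentheses would not return them.  Still, the search continues.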
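For comparison, here is a minimal sketch of the same Regex approach with a pattern that has no parentheses around the delimiter. Without a capturing group, Regex.Split has no captured text to return. The module name and the SplitOnCommaSpace helper are hypothetical names used only for this illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexSplitDemo

    ' Hypothetical helper: split on a comma followed by a space.
    ' The pattern has no capturing group, so only the text between
    ' the delimiters is returned.
    Function SplitOnCommaSpace(ByVal text As String) As String()
        Dim r As Regex = New Regex(", ")
        Return r.Split(text)
    End Function

    Sub Main()
        Dim parts() As String = SplitOnCommaSpace("1, 2, 3, 4, 5")
        Dim pos As Integer
        For pos = 0 To parts.Length - 1
            Console.WriteLine("{0}: [{1}]", pos, parts(pos))
        Next
    End Sub

End Module
```

With the sample line this yields five elements, 1 through 5, and no delimiter entries.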

 

String.IndexOf

Another option that I discovered was right there the whole time in the String class.  By using the IndexOf method, I can work my way down the string looking for a string delimiter.  The syntax of the overload used below is as follows:

 

[Visual Basic] Overloads Public Function IndexOf(String, Integer) As Integer

[C#] public int IndexOf(string, int);

 

To utilize this functionality, it will take more than a single function call.  A sample follows:

 

Dim pos As Integer

Dim at, start As Integer

Const constSearch As String = ", "

 

Do While at >= 0

    at = Line.IndexOf(constSearch, start)

    If at < 0 Then ' -1 means no further delimiter; 0 is a valid match position

        Console.WriteLine("{0}: [{1}]", pos, Line.Substring(start))

        Exit Do

    End If

 

    Console.WriteLine("{0}: [{1}]", pos, _

          Line.Substring(start, at - start))

    pos += 1

    start = at + constSearch.Length

Loop

 

The Results are:

0: [1]

1: [2]

2: [3]

3: [4]

4: [5]

 

Based on the previous examples, you’ll probably guess that I like the results.  I also like the code, apart from the extra coding it requires, which could be error prone at first.  I have not dug deeper into this one to see how well it performs.
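To keep that extra coding in one place, the IndexOf loop could be wrapped in a reusable function that returns a String array, as Split does. This is a rough sketch, not tested for performance; the module name and the SplitOnString helper are hypothetical names, and the guard against an empty delimiter is my own assumption about what a caller would want.

```vb
Imports System
Imports System.Collections

Module IndexOfSplitDemo

    ' Hypothetical helper: splits text on a whole-string delimiter
    ' by walking the string with IndexOf.
    Function SplitOnString(ByVal text As String, ByVal delimiter As String) As String()
        ' An empty delimiter would match at every position and never advance.
        If delimiter.Length = 0 Then
            Throw New ArgumentException("delimiter must not be empty")
        End If

        Dim pieces As New ArrayList()
        Dim start As Integer = 0
        Dim at As Integer = text.IndexOf(delimiter, start)

        Do While at >= 0
            ' Keep the text between the previous delimiter and this one.
            pieces.Add(text.Substring(start, at - start))
            start = at + delimiter.Length
            at = text.IndexOf(delimiter, start)
        Loop

        ' Whatever follows the last delimiter is the final element.
        pieces.Add(text.Substring(start))

        Return CType(pieces.ToArray(GetType(String)), String())
    End Function

    Sub Main()
        Dim parts() As String = SplitOnString("1, 2, 3, 4, 5", ", ")
        Dim pos As Integer
        For pos = 0 To parts.Length - 1
            Console.WriteLine("{0}: [{1}]", pos, parts(pos))
        Next
    End Sub

End Module
```

The caller is back to a single function call, and the delimiter can be any non-empty string.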

 

Conclusion

This is not a big deal because .NET provides several alternatives including allowing me to code my own.  However, .NET does so much for me that I am spoiled.  I want life to be as simple as possible.

 

My Split wish list is the following:

  • Array of String Delimiters
  • Part of String class
  • Called by a simple method call, no extra coding required
  • Delimiters or Empty strings optionally not returned
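Most of this list is covered by a String.Split overload that takes an array of String delimiters and a StringSplitOptions value. The sketch below assumes .NET Framework 2.0 or later, where that overload is available; the module name and the SplitClean helper are hypothetical names for illustration.

```vb
Imports System

Module SplitOverloadDemo

    ' Hypothetical helper: split on the string ", " and drop empty
    ' elements. Assumes the String.Split(String(), StringSplitOptions)
    ' overload is available (.NET Framework 2.0 or later).
    Function SplitClean(ByVal text As String) As String()
        Dim delimiters() As String = New String() {", "}
        Return text.Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        ' The doubled delimiter would normally produce an empty element.
        Dim parts() As String = SplitClean("1, 2, , 3")
        Dim pos As Integer
        For pos = 0 To parts.Length - 1
            Console.WriteLine("{0}: [{1}]", pos, parts(pos))
        Next
    End Sub

End Module
```

Passing StringSplitOptions.None instead keeps the empty elements, so the choice stays with the caller.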

 


9:05:19 AM


© Copyright 2004 Jon Box.
Last update: 8/31/2004; 11:54:34 PM.
