REALbasic
Maybe a diary of REALbasic app development. Maybe just stuff about REALbasic. Maybe nothing :)


Thursday, March 14, 2002
 



Sorry for all of the REALbasic posts. I'm really into getting ready to write my REALbasic Developer column. I guess I'm afraid of making a mistake and so I'm trying to imagine every possible detail I'll need to cover. But I'm having a blast thinking about it.

11:28:29 PM



I should point out that XML Toolkit is a very complete and well-written XML parser. My parser could never replace it. One reason my parser is faster (in a compiled app it turns out to be about 10 times faster than XML Toolkit) is that it does far less than XML Toolkit does. XML Toolkit is a real solution if you need an XML parser, and it should be cross-platform. If you need something really fast, use Doug Holton's expat XML plugin. It wraps expat, which was written by James Clark and is the de facto standard XML parser. It is used by pretty much everyone and has been very thoroughly tested.

11:25:39 PM



So I've pretty much finished my REALbasic XML parser, and it works well. I tested its speed against XML Toolkit 2.0.1 by Amar Sagoo, which is also an XML parser written entirely in REALbasic.

At first my parser didn't perform very well. It was actually slower, which surprised me. I've done all my parsing with a MemoryBlock, and I figured that would make it much faster. But it turns out that wasn't the problem. The problem was my function for normalizing whitespace in attributes and text nodes. That was also written with a MemoryBlock, and for some reason it turned out to be quite slow. When I replaced that code with a handful of ReplaceAll calls, things got quite a bit faster.

That really surprises me. I would have thought all of those ReplaceAll calls would create a new string every time, and that this would be slower than making one MemoryBlock and walking through the text one character at a time. Maybe ReplaceAll is just really fast. I got concerned that MemoryBlocks in general might be slow, so I tried replacing all of my MemoryBlocks with string manipulation. That turned out to be a bit slower (though not much). So I'm sticking with MemoryBlocks except where I replace CR and CRLF with LF.
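The CR/CRLF-to-LF step that stayed on ReplaceAll is the end-of-line handling XML 1.0 requires (section 2.11). Below is a minimal Python sketch of that step, not the author's REALbasic code. It assumes the whole document is already in memory as one string, and the function name is hypothetical. The sketch also shows why the order of the two replacements matters.

```python
def normalize_line_endings(text: str) -> str:
    """Translate CRLF and lone CR to LF (XML 1.0, section 2.11).

    Two whole-string replacements, analogous to two ReplaceAll calls.
    CRLF must be replaced first: replacing CR first would turn each
    CRLF into LF LF.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


if __name__ == "__main__":
    sample = "line one\r\nline two\rline three\n"
    print(repr(normalize_line_endings(sample)))
    # 'line one\nline two\nline three\n'
```

Each call to `str.replace` builds a new string, just as each ReplaceAll would. Even so, two passes done inside the runtime can easily beat a character-by-character loop written in the host language.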

11:20:41 PM



One other wrinkle with XML and whitespace. There's an attribute you can put on a tag that goes like this: xml:space="preserve". You might think that if that weren't present or weren't set to preserve then the parser wouldn't preserve whitespace. Wrong again. That's really just a hint to the calling application, not a directive to the parser. The parser ignores it, but it does get passed up to the calling application. And it is up to that caller to decide what to do with the value, I guess.
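Here is a small Python check of that division of labor. It uses the standard library's xml.etree.ElementTree, which is built on expat. The document is made up, and `app_text` is a hypothetical application-side policy, not part of any parser. The parser hands back the whitespace whether or not xml:space is present. It reports the attribute under the XML namespace URI and leaves the decision to the caller.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predeclared, so ElementTree reports xml:space
# under the XML namespace URI, in {uri}localname form.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

doc = ET.fromstring(
    "<root>"
    "<kept xml:space='preserve'>  two  spaces  </kept>"
    "<plain>  two  spaces  </plain>"
    "</root>"
)
kept, plain = doc[0], doc[1]

# The parser delivers the whitespace either way...
print(repr(kept.text))        # '  two  spaces  '
print(repr(plain.text))       # '  two  spaces  '

# ...and simply passes the attribute up to the application.
print(kept.get(XML_SPACE))    # preserve
print(plain.get(XML_SPACE))   # None


def app_text(elem):
    """Hypothetical application policy: collapse whitespace unless
    xml:space='preserve'. Only the element itself is checked; inheritance
    of xml:space from ancestors is ignored in this sketch."""
    text = elem.text or ""
    if elem.get(XML_SPACE) == "preserve":
        return text
    return " ".join(text.split())


print(repr(app_text(kept)))   # '  two  spaces  '
print(repr(app_text(plain)))  # 'two spaces'
```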

11:14:05 PM



It turns out that handling whitespace in an XML document is trickier than you think. Well, trickier than I thought, anyway. If you look at the spec, you'll find this, for handling whitespace in attributes:

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

  2. Begin with a normalized value consisting of the empty string.

  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

    • For a character reference, append the referenced character to the normalized value.

    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.

    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.

    • For another character, append the character to the normalized value.

So far, so good. But the next line can really throw you:

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.

"Ah," you might think, "I know what a CDATA section is, and there aren't any of those in my document, so I need to collapse whitespace in my attributes." And that's where you'd be wrong. An attribute value has a type, and it can be one of three kinds: StringType, TokenizedType, or EnumeratedType. And here's the kicker: StringType is CDATA. So basically all the regular attribute values we know and love in an XML document are CDATA. They aren't a CDATA section, which is enclosed in those groovy <![CDATA[ ... ]]> double-bracket thingies. They're just regular old CDATA. That means runs of whitespace are not collapsed in attribute values unless a DTD declares the attribute with some other type (ID or NMTOKENS, say), which isn't the case in my document. A non-validating parser treats an attribute with no declaration as if it were declared CDATA.
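For comparison, here is the extra step the spec reserves for non-CDATA attributes, as a minimal Python sketch. The function name is hypothetical. The input is assumed to have already been through the CDATA steps listed in the spec. Only the space character (#x20) is trimmed and collapsed, so a tab that arrived through a character reference stays put.

```python
def normalize_non_cdata_attr(value: str) -> str:
    """Extra normalization for attributes whose declared type is NOT CDATA
    (tokenized types such as ID or NMTOKENS, or enumerated types).

    Drops leading/trailing spaces and collapses each run of spaces (#x20)
    into one. Other characters, including tabs, are left untouched.
    """
    return " ".join(part for part in value.split(" ") if part)


# An undeclared attribute is CDATA, so this value would be passed through
# exactly as written. Only a non-CDATA declaration would trigger the step.
cdata_value = "  red   green  blue "
print(repr(cdata_value))                            # '  red   green  blue '
print(repr(normalize_non_cdata_attr(cdata_value)))  # 'red green blue'
```

Splitting on `" "` rather than calling `split()` with no argument is deliberate. The bare form would also eat tabs and newlines, which the spec does not ask for at this step.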

Whitespace between tags is supposed to be passed to the application in all its glory, except that line endings are normalized to UNIX line endings (LF). Otherwise all whitespace, including tabs and runs of spaces, is passed to the calling application.
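To see both rules in a real parser, here is a small Python check using the standard library's xml.etree.ElementTree, which sits on top of expat. The sample document is made up. It has no DTD, so its attribute is undeclared and therefore treated as CDATA.

```python
import xml.etree.ElementTree as ET  # built on the expat parser

# No DTD: the title attribute is undeclared, so it is treated as CDATA.
raw = b'<note title="  two   spaces\there">\r\n\tbody  text\r\n</note>'
note = ET.fromstring(raw)

# Attribute: the literal tab becomes a space, but runs of spaces are
# NOT collapsed and leading spaces are kept.
print(repr(note.get("title")))  # '  two   spaces here'

# Text between tags: CRLF arrives as LF; tabs and runs of spaces are kept.
print(repr(note.text))          # '\n\tbody  text\n'
```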

Whew.

11:10:35 PM




XML-RPC Client for REALbasic is very cool, and you can look at its source to get some idea of how to do XML-RPC in REALbasic. It uses Dan Vanderkam's most excellent httpSocket 2.0, which may very well be the best HTTP socket class for REALbasic. It even handles authentication, which I'm going to need for my Radio Poster app. I used httpSocket in StockittoMe.
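Whatever the client language, an XML-RPC call boils down to an XML `methodCall` document sent in an HTTP POST. The sketch below uses Python's standard `xmlrpc.client` module (not the REALbasic classes mentioned above) to build and decode such a body without touching the network. `examples.getStateName` is the sample method from the XML-RPC spec, used here only for illustration.

```python
import xmlrpc.client

# Encode a call the way an XML-RPC client would before POSTing it over HTTP.
body = xmlrpc.client.dumps((41,), methodname="examples.getStateName")
print(body)

# Decoding the same body recovers the parameters and the method name.
params, method = xmlrpc.client.loads(body)
print(params, method)  # (41,) examples.getStateName
```

An HTTP socket class then only has to POST that body with a `text/xml` content type and hand the response to a decoder. That is why a solid socket class does most of the heavy lifting.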

1:00:57 PM



I can't wait to have access to the MetaWeblog API. That, together with Matt Neuburg's excellent A Gentle Introduction to XML-RPC, should be all I need to write Radio Poster, an app that will help you compose and post a Radio weblog item. I was playing around with something that would be the composer part. It would present itself as an edit window and a list of links. You'd drop the links on the list to collect them in one place, and you'd give each link a name so that you could refer to it easily in your post. Then, in the edit field, you would compose your post, using a notation something like #item# to mark text to be substituted with text from the list. My idea was that once you were finished with your post, you would copy it, which would automatically expand the substitution text, and paste it into your web browser to post it to your weblog. But with the MetaWeblog API, I should be able to skip that step and let you post directly.
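A rough Python sketch of that composer idea follows, under loudly stated assumptions. The link names and URLs are made up. The choice to expand each #name# into an HTML link whose text is the name is only a guess at what the substitution would produce, and `expand` is a hypothetical helper. The last step builds, but does not send, a `metaWeblog.newPost` call, whose parameters are (blogid, username, password, struct, publish). The blog id and credentials are placeholders.

```python
import re
import xmlrpc.client

# Hypothetical link list: name -> URL, as dropped onto the list.
links = {
    "rb": "http://example.com/realbasic",
    "xmlrpc": "http://example.com/xmlrpc",
}

PLACEHOLDER = re.compile(r"#(\w+)#")


def expand(post: str, links: dict) -> str:
    """Replace each #name# with an HTML link (assumed format).
    Names not in the list are left untouched."""
    def repl(m):
        url = links.get(m.group(1))
        if url is None:
            return m.group(0)
        return '<a href="%s">%s</a>' % (url, m.group(1))
    return PLACEHOLDER.sub(repl, post)


post = expand("I write in #rb# and talk over #xmlrpc#.", links)
print(post)

# metaWeblog.newPost(blogid, username, password, struct, publish).
# Blog id, username, and password below are placeholders.
body = xmlrpc.client.dumps(
    ("hypothetical-blog-id", "user", "secret", {"description": post}, True),
    methodname="metaWeblog.newPost",
)
```

Expanding the placeholders at send time instead of at copy time keeps the draft readable in the edit field.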

10:33:00 AM


© Copyright 2002 Will Leshner.
Last update: 3/14/02; 10:33:04 AM.