transcendental petroglyphs

will leshner's cave wall scratchings

@ Thursday, March 14, 2002

Day 18 of TheBeard™:

11:55:39 PM

Sorry for all of the REALbasic posts. I'm really into getting ready to write my REALbasic Developer column. I guess I'm afraid of making a mistake and so I'm trying to imagine every possible detail I'll need to cover. But I'm having a blast thinking about it.

11:28:29 PM

I should point out that XML Toolkit is a very complete and well-written XML parser, and my parser could never replace it. One reason my parser is faster (in a compiled app it turns out to be about 10 times faster than XML Toolkit) is that it does far less. XML Toolkit is a real solution if you need an XML parser, and it should be cross-platform. If you need something really fast, use Doug Holton's expat XML plugin. It wraps expat, which was written by James Clark and is the de facto standard XML parser. It is used by pretty much everyone and has been very thoroughly tested.
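If you've never used expat, here is a rough idea of its event-callback style. This is a sketch in Python, whose standard library also wraps expat (as `xml.parsers.expat`); it is not Doug Holton's REALbasic plugin, whose API isn't shown here. The parser calls you back for each start tag, end tag, and run of text. The `collect_events` helper is hypothetical.

```python
import xml.parsers.expat


def collect_events(xml_text):
    """Parse xml_text with expat and return the callbacks it fired, in order."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Merge adjacent character-data chunks into a single callback.
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return events


for event in collect_events('<post id="18">Day 18</post>'):
    print(event)
```

Because expat only reports events and builds no tree, it does very little work per tag, which is a big part of why it's fast.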

11:25:39 PM

So I've pretty much finished my REALbasic XML parser, and it works well. I tested its speed against XML Toolkit 2.0.1 by Amar Sagoo, another XML parser written entirely in REALbasic. At first my parser didn't perform very well; it was actually slower, which surprised me. I did all my parsing with a MemoryBlock and figured that would make it much faster. But it turns out that wasn't the problem. The problem was my function for normalizing whitespace in attributes and text nodes. That was also written with a MemoryBlock, and for some reason it turned out to be quite slow. When I replaced that code with a bunch of ReplaceAll calls, things got quite a bit faster, which really surprised me. I would think all of those ReplaceAll calls would create a new string every time, and that should be slower than making one MemoryBlock and running through the text one character at a time. Maybe ReplaceAll is just really fast. I got concerned that MemoryBlocks in general are slow, so I tried replacing all of my MemoryBlocks with string manipulation, but that turned out to be a bit slower (though not much). So I'm sticking with MemoryBlocks, except where I'm replacing CR and CRLF with LF.
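The CR/CRLF-to-LF step can be written either way. Here is a Python analog (not REALbasic) of the two approaches: `str.replace` stands in for ReplaceAll, and an index loop stands in for walking a MemoryBlock one character at a time. The function names are hypothetical, and this sketch says nothing about which one is faster in REALbasic.

```python
def normalize_eol_replace(text: str) -> str:
    """ReplaceAll-style: two whole-string passes."""
    # CRLF must go first; replacing lone CR first would turn CRLF into LF LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def normalize_eol_scan(text: str) -> str:
    """MemoryBlock-style: one pass, one character at a time."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            out.append("\n")
            # A CR immediately followed by LF is one line break, not two.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
        else:
            out.append(ch)
        i += 1
    return "".join(out)


sample = "mac\rwindows\r\nunix\n"
print(repr(normalize_eol_replace(sample)))
print(repr(normalize_eol_scan(sample)))
```

If you want to compare speeds yourself, the standard library's `timeit` will time both, but Python's numbers won't tell you anything about REALbasic's.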

11:20:41 PM

One other wrinkle with XML and whitespace: there's an attribute you can put on a tag that looks like this: xml:space="preserve". You might think that if it weren't present, or weren't set to preserve, the parser wouldn't preserve whitespace. Wrong again. It's really just a hint to the calling application, not a directive to the parser. The parser doesn't act on it; it just passes it up to the calling application like any other attribute, and it's up to the caller to decide what to do with the value, I guess.
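You can see this with Python's `xml.etree.ElementTree`, which sits on top of expat: whatever xml:space says, the text comes through untouched, and the attribute is simply handed to the application. The `xml` prefix is predefined, so ElementTree reports the attribute under the namespace URI `http://www.w3.org/XML/1998/namespace`. This is a sketch with a hypothetical `parse_paragraph` helper, not a claim about how any REALbasic parser behaves.

```python
import xml.etree.ElementTree as ET

# ElementTree spells namespaced names as {uri}localname.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def parse_paragraph(mode):
    """Return (xml:space value, element text) for a <p> carrying that mode."""
    root = ET.fromstring(f'<p xml:space="{mode}">  two   spaces\t</p>')
    return root.get(XML_NS + "space"), root.text


for mode in ("preserve", "default"):
    # Same text either way; only the reported attribute value differs.
    print(mode, parse_paragraph(mode))
```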

11:14:05 PM

It turns out that handling whitespace in an XML document is trickier than you think. Well, trickier than I thought, anyway. If you look at the spec, you'll find this, for handling whitespace in attributes:

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.

2. Begin with a normalized value consisting of the empty string.

3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

   - For a character reference, append the referenced character to the normalized value.
   - For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
   - For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
   - For another character, append the character to the normalized value.
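Those steps are short enough to sketch in Python. This assumes a document with no DTD: only character references and the five predefined entities (amp, lt, gt, quot, apos) are recognized, and any other entity name raises KeyError. The optional `cdata=False` pass is the extra trim-and-collapse rule the spec applies to non-CDATA attributes. All names here are hypothetical.

```python
import re

# Predefined entities only; their replacement text contains no whitespace,
# so appending the character directly matches "recursively apply step 3".
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}
WHITESPACE = {"\x20", "\x0d", "\x0a", "\x09"}
REF = re.compile(r"&(#x[0-9A-Fa-f]+|#[0-9]+|[A-Za-z_][\w.-]*);")


def normalize_attribute(raw: str, cdata: bool = True) -> str:
    # Step 1: end-of-line handling, CRLF and lone CR both become LF.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    out = []  # step 2: start with an empty normalized value
    pos = 0
    while pos < len(text):  # step 3: walk characters and references
        m = REF.match(text, pos)
        if m:
            name = m.group(1)
            if name.startswith("#x"):
                out.append(chr(int(name[2:], 16)))  # char ref: appended as-is
            elif name.startswith("#"):
                out.append(chr(int(name[1:])))
            else:
                out.append(PREDEFINED[name])
            pos = m.end()
            continue
        ch = text[pos]
        out.append(" " if ch in WHITESPACE else ch)  # whitespace -> one space
        pos += 1
    value = "".join(out)
    if not cdata:
        # Non-CDATA only: trim and collapse runs of #x20 (not tabs from refs).
        value = " ".join(part for part in value.split(" ") if part)
    return value


print(repr(normalize_attribute("a\tb\r\nc")))
print(repr(normalize_attribute("  a   b  ")))
print(repr(normalize_attribute("  a   b  ", cdata=False)))
```

Note that a tab written as `&#9;` survives as a tab: a character reference appends the referenced character itself, and only literal whitespace characters are turned into spaces.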

So far, so good. But the next line can really throw you:

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
"Ah," you might think, "I know what a CDATA section is, and there aren't any of those in my document, so I need to collapse whitespace in my attributes." And that is where you'd be wrong. Every attribute has a type, and it can be one of three kinds: StringType, TokenizedType, or EnumeratedType. And here's the kicker: StringType is CDATA. So basically all the regular attribute values we know and love in an XML document are CDATA. Not a CDATA section, the kind enclosed in those groovy <![CDATA[ ... ]]> double-bracket thingies. Just regular old CDATA. That means runs of whitespace are not collapsed in attribute values, unless a DTD declares an attribute with one of the non-CDATA types, which isn't the case in my document. (An attribute with no declaration at all should be treated as CDATA.)
Whitespace between tags, on the other hand, is supposed to be passed to the application in all its glory, except that line endings are normalized to UNIX line endings (LF). Everything else, including tabs and runs of spaces, is passed through to the calling application.
Whew.
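Here's how a conforming parser treats both cases, using Python's ElementTree (expat underneath) as a stand-in. There's no DTD, so the attribute is treated as CDATA: each tab, LF, or CRLF in it becomes one space, but runs of spaces stay. The text between the tags keeps everything except that CRLF becomes LF. A sketch only; the sample document is made up.

```python
import xml.etree.ElementTree as ET

doc = '<item title="two   spaces,\ta tab,\r\nand a CRLF">\r\n\t  indented text  \r\n</item>'
root = ET.fromstring(doc)

# Attribute value (CDATA): whitespace characters -> spaces, runs kept.
print(repr(root.get("title")))
# Text between tags: passed through, with only line endings normalized.
print(repr(root.text))
```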

11:10:35 PM

Garth's having trouble finding a good cup of coffee :) I'm not sure what he likes in coffee. I'm partial to Peet's, but I spend a lot of time in Starbucks, since there isn't a Peet's close by (or I'd be there instead). Every time I go to Starbucks they have the weakest coffee as their coffee of the day. There are usually two regulars and one decaf, and the decaf is frequently better than the regulars, so I get that instead.

6:14:31 PM

As my last post proves, you can't use Radio to do Japanese posts. Or, if you can, then I don't have something configured correctly. I told my wife that she should keep a weblog. She would want to write it in Japanese, and I told her she could. But maybe not yet.

1:08:05 PM

[base "]ú[^]{'ê[not equal]Ì[infinity]|[infinity]X[infinity]g[trademark]B[not equal]±[not equal]ê[not equal]Å[base "]ú[^]{'ê[not equal]Ì[infinity]|[infinity]X[infinity]g[not equal]º[not equal]Å[not equal]«[not equal]é[not equal][sgl dagger][not equal][product][not equal]¢[trademark]B

1:06:08 PM

XML-RPC Client for REALbasic is very cool, and you can look at the source to get some idea of how to do XML-RPC in REALbasic. It uses Dan Vanderkam's most excellent httpSocket 2.0, which may very well be the best HTTP socket class for REALbasic. It even handles authentication, which I'm going to need for my Radio Poster app. I used httpSocket in StockittoMe.

1:00:57 PM

Darn! I wondered what happened to those [^] they were my only pair. But what I really want to know is why we call them "pairs." [jenett.radio]
That's jenett.radio's response to my post about seeing a pair of men's underwear on a recent bike ride. I worried about the "pair" thing myself. I actually tried to word my post so as not to use "pair" because it seemed kinda weird.
I guess I should have picked them up and returned them. Sorry about that :)

11:30:15 AM

How to use fetchmail and procmail to filter spam on Mac OS X. This is exactly what I thought I wanted to do, until I realized that it's a POP thing. I use IMAP. Dang.

10:59:32 AM

Nice clean use of CSS in a Radio weblog. Easy to read. Fast. [Scripting News]
I like that site a lot too. One of the things I like the most is the fact that I'm on the blogrolls! Thanks!

10:36:51 AM

I can't wait to have access to the MetaWeblog API. That, together with Matt Neuburg's excellent A Gentle Introduction to XML-RPC, should be all I need to write Radio Poster, an app that helps you compose and post a Radio weblog item. I was playing around with something that would be the composer part. It would present itself as an edit window and a list of links. You'd drop the links on the list to collect them all in one place, and you'd give each link a name so that you could refer to it easily in your post. Then, in the edit field, you would compose your post, using a notation something like #item# to mark text to be substituted with text from the list. My idea was that once you were finished with your post you would copy it, which would automatically expand the substitution text, and paste it into your web browser to post it to your weblog. But with the MetaWeblog API, I should be able to skip that step and let you post directly.
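The two halves of that idea, expanding #name# markers and then posting through the MetaWeblog API, could be sketched in Python like this. Assumptions, all hypothetical: each name in the link list maps to a (link text, URL) pair, a marker expands to an HTML link, and unknown names are left alone. `metaWeblog.newPost(blogid, username, password, struct, publish)` is the MetaWeblog API's posting method, but the blog id, credentials, and example.com URL are placeholders, and the request is only built with `xmlrpc.client.dumps`, never sent.

```python
import re
import xmlrpc.client

# Hypothetical link list: name typed between # marks -> (link text, URL).
LINKS = {
    "toolkit": ("XML Toolkit", "http://example.com/xml-toolkit"),
}

PLACEHOLDER = re.compile(r"#(\w+)#")


def expand(post: str, links: dict) -> str:
    """Replace each #name# with an HTML link; leave unknown names untouched."""
    def link_for(match):
        name = match.group(1)
        if name not in links:
            return match.group(0)
        text, url = links[name]
        return f'<a href="{url}">{text}</a>'
    return PLACEHOLDER.sub(link_for, post)


def new_post_request(blog_id, user, password, title, body, publish=True):
    """Build (but don't send) the XML body of a metaWeblog.newPost call."""
    content = {"title": title, "description": body}
    return xmlrpc.client.dumps(
        (blog_id, user, password, content, publish),
        methodname="metaWeblog.newPost",
    )


body = expand("Worth a look: #toolkit#.", LINKS)
request_xml = new_post_request("home", "will", "secret", "A link", body)
print(body)
print(request_xml)
```

Actually sending it would mean handing the same arguments to an `xmlrpc.client.ServerProxy` pointed at the weblog's XML-RPC endpoint, which isn't shown here.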

10:33:00 AM

Firstly, John ignores the time needed to find items that might be worth linking to. [Jonathon Delacour]
Jonathon's entire post is definitely worth reading, so if you haven't read the whole thing, get over there and read it. This quote is also very interesting. John is John Hiler, who wrote The Tipping blog: How Weblogs Can Turn an Idea into an Epidemic. In that article, John Hiler claims that it's easier to put a quick link in your weblog than to write your own content. Jonathon rightly points out that a fair amount of time goes into finding the right link and preparing it for inclusion in a weblog. This little post didn't take me hours to prepare, but I did have to make sure I had the quotes right and gather the various links I needed to complete it. Doing this stuff is more time-consuming than John Hiler seems to think.

10:12:13 AM

The brutality of the time-economics lies in the calculation that the reader is ultimately engaged for thirty times longer via a link that took one-thirtieth of the time to create. [Jonathon Delacour]
That is brutal. I never thought about it that way before. I have thought about the two blogging styles, which most people usually mix: link-comment and original content. The blogs I admire most are the ones that can mix those two styles. Jonathon's weblog is a perfect example of a mix of personal content and links to other content. True scholars like Jonathon produce original research that also references a good deal of other people's research.

9:58:53 AM