Rod Waldhoff's Weblog  

 Tuesday, 10 February 2004 
INTP

I'm not normally a follower of those "link to me" memes, but my "bloginality" of INTP seems to get this part:

"Because routines aren't your strong point, you might be more likely to work on the concept of how to do a blog, but not be as excited to keep it up."

right. On that note, as I've previously alluded to, I hope to get a new blog framework up and running here in the next month or two.

Actually, I've been terribly busy lately with personal and professional commitments, and my online presence (here and elsewhere) has suffered as a result.

 Thursday, 22 January 2004 
Hello Again

Sigh. I've been ignoring my blog for too long again, as I've been rather busy both before and since the holidays. I'll see what I can do to rectify that.

 Monday, 8 December 2003 
An SQL Annoyance

SQL isn't consistent under row/column transposition. For example:

select 3 + NULL

yields NULL. Yet,

create table TEMP ( NUM number );
insert into TEMP values ( 3 );
insert into TEMP values ( NULL );
select sum(NUM) from TEMP

yields 3 (since NULL valued rows are ignored by aggregate functions).

This inconsistency is all the more annoying since both:

select sum(NUM) from TEMP where NUM is not NULL

and

select sum(coalesce(NUM,0)) from TEMP

would still yield 3, today's result, under an "aggregation of NULL is NULL" rule. Yet under the "aggregate functions ignore NULL" rule, writing a single, efficient, cross-database query that yields NULL if there's a NULL row and the SUM otherwise is awkward at best.
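To make the two rules concrete, here is a small Java model of them, with a Java null standing in for SQL NULL. It is a sketch of the semantics only, not how any database implements SUM, and the behavior for an empty table under the hypothetical rule (NULL, as with SQL's real SUM over zero rows) is an assumption.

```java
import java.util.Arrays;
import java.util.List;

class NullAggregation {
    // Models SQL's actual SUM: NULL rows are skipped, and the result is
    // NULL only when there are no non-NULL values at all (including zero rows).
    static Integer sumIgnoringNulls(List<Integer> values) {
        Integer total = null;
        for (Integer v : values) {
            if (v != null) {
                total = (total == null) ? v : total + v;
            }
        }
        return total;
    }

    // Models the hypothetical "aggregation of NULL is NULL" rule, which would
    // match the row-wise behavior of 3 + NULL.
    // Assumption: zero rows still yields NULL.
    static Integer sumPropagatingNulls(List<Integer> values) {
        if (values.isEmpty()) {
            return null;
        }
        int total = 0;
        for (Integer v : values) {
            if (v == null) {
                return null; // a single NULL row makes the whole sum NULL
            }
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> temp = Arrays.asList(3, null); // the TEMP table above
        System.out.println(sumIgnoringNulls(temp));    // 3
        System.out.println(sumPropagatingNulls(temp)); // null
    }
}
```

Under the second rule, filtering out NULLs (or coalescing them to 0) before summing recovers the first rule's answer, which is exactly why the asymmetry is annoying.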

 Friday, 21 November 2003 
Ruby Userland

As I previously mentioned, I've been toying with custom clients to the XmlStorageSystem XML-RPC protocol used by Radio Userland and several open source blog servers, with the ultimate goal of hosting a custom blog on the radio.weblogs.com host.

Over the past couple of evenings I've hacked out xmlStorageSystem.rb, a reasonably functional Ruby-based client for the XmlStorageSystem protocol. It works like this:

To create a new instance of the client, use:

XmlStorageSystem.new(<usernumber>,<md5-hash-of-password>,<root-directory>)

For instance, I use:

XmlStorageSystem.new('122027','8c8034f5c9d68564e155c67a6d4e4612','/0122027/')

although that's not my actual password.

Actually, my local copy of xmlStorageSystem.rb has these values specified as the defaults, so I just use XmlStorageSystem.new, and I'll use that form in the rest of these examples. The constructor also accepts a number of arguments that should allow one to connect to the Python Community Server and others, although Radio is the only server I've tried.
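The second constructor argument is the password's MD5 hash rather than the password itself. Judging from the example value above (32 lowercase hex digits), a minimal Java sketch of producing that string with the standard java.security.MessageDigest API looks like this; the exact encoding the server expects, and hashing the password as UTF-8 bytes, are assumptions rather than something confirmed against the protocol.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class PasswordHash {
    // Returns the MD5 digest of the password as 32 lowercase hex digits.
    // Assumption: the password is encoded as UTF-8 before hashing.
    static String md5Hex(String password) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(password.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("abc")); // 900150983cd24fb0d6963f7d28e17f72
    }
}
```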

To get a listing of the files currently stored on the server, use:

XmlStorageSystem.new.getMyDirectory

To download all those files to a local directory, use:

XmlStorageSystem.new.backupMyDirectory 'backupdir'

To upload a file (or files) to the server, use:

XmlStorageSystem.new.saveMultipleFiles( 'local-base-dir', [ 'file1', 'file2', 'etc' ])

To delete a file (or files) from the server, use:

XmlStorageSystem.new.deleteMultipleFiles( [ 'file1', 'file2', 'etc' ])

Finally, the really handy one:

XmlStorageSystem.new.updateFromLocalDirectory 'localdir'

This compares the list of files in the local directory with those on the server, deletes the ones that don't appear locally, and uploads (or updates) the rest.
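The comparison behind updateFromLocalDirectory boils down to two set operations. Here is a sketch in Java for illustration only: the class and method names are hypothetical, and the actual Ruby client's internals may differ. Every remote file with no local counterpart is scheduled for deletion, and every local file is scheduled for upload.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SyncPlan {
    final List<String> toDelete = new ArrayList<>();
    final List<String> toUpload = new ArrayList<>();

    // Hypothetical planner: paths are relative to the local directory / server root.
    static SyncPlan plan(Set<String> localFiles, Set<String> remoteFiles) {
        SyncPlan p = new SyncPlan();
        // Remote files that no longer exist locally are deleted from the server.
        for (String remote : new TreeSet<>(remoteFiles)) {
            if (!localFiles.contains(remote)) {
                p.toDelete.add(remote);
            }
        }
        // Everything present locally is uploaded, creating or replacing the remote copy.
        p.toUpload.addAll(new TreeSet<>(localFiles));
        return p;
    }

    public static void main(String[] args) {
        Set<String> local = new TreeSet<>(Arrays.asList("index.html", "rss.xml"));
        Set<String> remote = new TreeSet<>(Arrays.asList("index.html", "old.html"));
        SyncPlan p = plan(local, remote);
        System.out.println("delete: " + p.toDelete); // delete: [old.html]
        System.out.println("upload: " + p.toUpload); // upload: [index.html, rss.xml]
    }
}
```

Sorting through a TreeSet is only there to make the plan deterministic; the server doesn't care about order.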

Since this is Ruby, it's easy to set up little shell scripts that invoke those commands in ways useful to your personal work style.

If one wanted to be clever, there is metadata available via XmlStorageSystem.getMyDirectory that would allow one to determine whether or not a file has changed since it was last uploaded, but I haven't gotten around to that yet.
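A sketch of that cleverer check, again in Java with hypothetical names, assuming (not verified against the protocol) that the directory listing reports a last-modified time per file that can be compared with local file times: upload a local file only when the server has no copy or the local copy is newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ChangeDetector {
    // localTimes / remoteTimes map relative paths to last-modified times
    // (milliseconds since the epoch). Assumption: the two clocks are comparable;
    // clock skew between client and server would make this check unreliable.
    static List<String> filesNeedingUpload(Map<String, Long> localTimes,
                                           Map<String, Long> remoteTimes) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(localTimes).entrySet()) {
            Long remote = remoteTimes.get(e.getKey());
            // Upload if the server lacks the file or holds an older copy.
            if (remote == null || e.getValue() > remote) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> local = new TreeMap<>();
        local.put("a.html", 200L); // edited since last upload
        local.put("b.html", 100L); // unchanged
        local.put("c.html", 50L);  // never uploaded
        Map<String, Long> remote = new TreeMap<>();
        remote.put("a.html", 100L);
        remote.put("b.html", 100L);
        System.out.println(filesNeedingUpload(local, remote)); // [a.html, c.html]
    }
}
```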

I'm still pretty much a Ruby neophyte, so there's probably substantial room for improvement there. In particular, (1) there's no error handling present just yet and (2) the current implementation supports hackability (changing the script itself) more than extensibility. Nevertheless, it's neat that a Ruby neophyte can write a basic XmlStorageSystem client in 150 lines of readable code.

 Wednesday, 19 November 2003 
some bookmarks

 Monday, 17 November 2003 
What's this sticky green fluid?

Oh, I see, I've been biled.

While I think Hani understands commons-primitives better than he lets on, I'm not sure the same is true of the peanut gallery that regularly fills up his comment threads. Since Hani was nice enough to address me as "dear", I guess I'll go ahead and feed the trolls.

Hani's rants are most amusing when there's some content beyond vulgarity and argumentum ad hominem. This post is thinner on that point than many, but let's see if we can find some actual, specific complaints to consider.

First, there's one point on which Hani and I are in agreement:

[F]or most applications, the performance gain is so trivial and insignificant that it really isn't worth the extra jar and complexity of using non-standard collection classes.

Agreed. Moreover, the space savings (which in the case of an ArrayList varies from 50% to 94%, depending upon the primitive type being used) is also "trivial and insignificant" in light of the size and number of collections of primitives used by most applications.

So there you go. Commons-primitives isn't universally applicable. A damning critique indeed.

The rest of the post is less insightful.

Hani writes:

How on earth could they have missed that age old adage, 'premature optimisation is the root of all evil'?

Is this meant to suggest that commons-primitives was developed before there was a demonstrated need for it? Hani, your omniscience has failed you, as a bit of research would have revealed.

Commons-primitives was initially developed in support of the Axion database project. In Axion, we need to store a significant number of collections of primitives, and at times those collections grow rather large. Consider, for instance, a table with an integer-valued key field. In Axion, depending upon the index type and configuration, there may be three lists of primitive values created for this table: a list of positive long values representing file offsets by row identifier, and a pair of lists of integers, one containing key values and the other the associated row identifiers. As initially developed using the java.util collections, this setup used 48 bytes per row in memory. The current, commons-primitives based implementation uses only 12 bytes per row in memory, saving 75% of the space. In my mind, increasing the number of rows that can be efficiently accessed by a factor of 4 (and getting a little performance boost to boot) is neither a "trivial" nor "insignificant" improvement.

Alternatively, perhaps this comment is meant to suggest that some clients might use commons-primitives without a demonstrated need for trying to reduce the size of their collections of primitives. I'm not sure how this reflects upon commons-primitives itself. As above, commons-primitives isn't universally applicable. Perhaps optimistically, I'll continue working from the assumption that most folks have the critical analysis skills necessary to determine if a given library is applicable to their particular situation.

Hani's final group of complaints is concerned with object naming. He writes:

Now maybe I'm old fashioned, but in my crazy world [List]Iterator is a [...] lot easier to work with [than] DoubleListIteratorListIterator.

Really? That's odd, given that they have literally the same interface. Perhaps this is meant to suggest that the name is verbose? Sure, I'll concede that. But it's also the conventional name, and a type that's rarely used. Allow me to break it down for you. <Type>ListIteratorListIterator is an adapter which makes a <Type>ListIterator look like a ListIterator. That's why you find it in the adapter package. That's why it follows the naming convention used by other Java adapters, like ByteArrayInputStream, StringReader and OutputStreamWriter. That's also why you'll use it maybe a handful of times in a complete application.
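To illustrate the naming convention, here is a deliberately simplified Java sketch of the same adapter idea one level down. It is not the commons-primitives source: the IntIterator interface and both classes are defined here for illustration only. The adapter name reads as "an IntIterator, viewed as an Iterator", just as a ByteArrayInputStream is a byte array viewed as an InputStream.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// A minimal primitive iterator, defined here for illustration only.
interface IntIterator {
    boolean hasNext();
    int next();
}

// Adapter: makes an IntIterator look like a java.util.Iterator<Integer>.
class IntIteratorIterator implements Iterator<Integer> {
    private final IntIterator wrapped;

    IntIteratorIterator(IntIterator wrapped) {
        this.wrapped = wrapped;
    }

    @Override
    public boolean hasNext() {
        return wrapped.hasNext();
    }

    @Override
    public Integer next() {
        return wrapped.next(); // boxes the primitive value on the way out
    }
}

// A simple IntIterator over an int[] so the adapter can be exercised.
class ArrayIntIterator implements IntIterator {
    private final int[] values;
    private int index = 0;

    ArrayIntIterator(int... values) {
        this.values = values;
    }

    public boolean hasNext() {
        return index < values.length;
    }

    public int next() {
        if (index >= values.length) {
            throw new NoSuchElementException();
        }
        return values[index++];
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        Iterator<Integer> it = new IntIteratorIterator(new ArrayIntIterator(1, 2, 3));
        int sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        System.out.println(sum); // 6
    }
}
```

Code that works with primitives directly never touches the adapter; it exists for the occasional hand-off to an API that only accepts java.util types.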

Hani continues:

If your brain hasn't automatically shut down by now to protect itself from these vile names, then contemplate RandomAccessDoubleList.RandomAccessDoubleListIterator if you will.

Ah, yes. A protected-scope, inner class of an abstract base class goes right to the heart of the component's usability. I think if you poke around a bit, you might be able to find an oddly named private variable as well.

Meanwhile, for the methods one actually uses on a regular basis, such as List.add or Iterator.next, the primitive collections allow a more concise and readable implementation. Consider, for example, taking the pairwise sum of two lists. Here's an Object-based implementation:

List pairwiseSum(List lista, List listb) {
  List result = new ArrayList();
  for(Iterator a = lista.iterator(), b = listb.iterator(); a.hasNext(); ) {
    result.add(
      new Integer(
        ((Integer)(a.next())).intValue() +
        ((Integer)(b.next())).intValue() ) );
  }
  return result;
}

Here's a primitive version:

IntList pairwiseSum(IntList lista, IntList listb) {
  IntList result = new ArrayIntList();
  for(IntIterator a = lista.iterator(), b = listb.iterator(); a.hasNext(); ) {
    result.add( a.next() + b.next() );
  }
  return result;
}

Hani, I enjoy your rants as much as the next geek, but if the best you can do is troll the announcements@jakarta list waiting for a chance to say "but there are times when that library isn't helpful" (which, so far, seems to have been the point of every jakarta-commons rant you've posted to date), I may find another use for that slot in my aggregator. Also, I've noticed an increase in the number of logical fallacies in your rants. Being a jerk for dramatic effect might be entertaining, but being a misleading jerk is not.

 Friday, 14 November 2003 
On Programming Idioms

Two things I'm always keen to learn when picking up a new programming language are:

  1. How does one organize large projects? In other words, how does one partition responsibilities and types across namespaces, modules and files?
  2. What are the common idioms in the language?

I've been doing some string processing work with Ruby recently, and it's got me thinking about examples of the latter.

For example, in Java, the String class doesn't have a direct, boolean-valued method that will tell you whether or not a String contains another String, i.e., there's nothing like:

if(someString.contains(anotherString)) { ... }

Instead, most Java developers will write:

if(someString.indexOf(anotherString) != -1) { ... }

where String.indexOf(String) returns the index of the first occurrence of the argument String, or -1 if the given String isn't found. Most Java developers will immediately recognize that as the "String.contains" idiom, and won't miss a beat.

This idiom is so strong in the Java community that it's almost counter-productive to write a custom utility method:

public class StringUtils {
  public static boolean contains(String a, String b) {
    return (a.indexOf(b) != -1);
  }
}

since many developers who see

if(StringUtils.contains(someString,anotherString)) { ... }

are likely to wonder whether the StringUtils.contains method really does what its name implies: Is this equivalent to the String.indexOf idiom? Is that someString.contains(anotherString) or vice versa? How are nulls handled? etc. Unless the developer is already comfortable and familiar with the StringUtils class being used, this code is probably less readable to an experienced developer than the "indexOf != -1" formulation.
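To make those questions concrete, here is one hypothetical utility that answers them explicitly in its documentation comment. The class name and its null policy are illustrative choices, not taken from any particular library; another author could just as reasonably throw NullPointerException, which is exactly why a reader can't tell from the call site alone.

```java
class NullSafeStrings {
    /**
     * Returns true if {@code haystack} contains {@code needle}.
     * This version's (arbitrary) choices: arguments are (haystack, needle),
     * a null on either side yields false rather than throwing,
     * and the empty string is contained in every non-null string.
     */
    static boolean contains(String haystack, String needle) {
        if (haystack == null || needle == null) {
            return false;
        }
        return haystack.indexOf(needle) != -1; // the familiar idiom underneath
    }

    public static void main(String[] args) {
        System.out.println(contains("idiomatic", "diom")); // true
        System.out.println(contains(null, "x"));           // false
        System.out.println(contains("abc", ""));           // true
    }
}
```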

(This is not to say that "String.indexOf(x) != -1" is actually preferable to "String.contains(x)", but rather that in the absence of String.contains, the idiom is more widely recognized than a custom utility method. Why Sun can't at some point introduce a String.contains method, say in JDK 1.5, isn't entirely clear to me.)

Now, in the Ruby scripting I've been doing recently, I keep needing to determine whether a String begins with a given prefix. In Java, that's the String.startsWith method, of course. The Ruby String class does not have a startsWith method, but one of the neat things about Ruby is that it's possible to literally add such a method to the String class, as follows:

class String
  def startsWith str
    return self[0...str.length] == str
  end
end

after which everything behaves exactly as if the built-in String class contained that definition. For example, one can then write:

if someString.startsWith(anotherString) ...

or even

if "a literal string".startsWith(anotherString) ...

etc.

Of course, the implementation I used for String.startsWith (self[0...str.length] == str) is just one of several possible implementations. Regular expressions provide one way of implementing such a check. The Java-like indexOf approach provides another; in Ruby the corresponding method is String#index (e.g., self.index(str) == 0).
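For comparison, here are the same three strategies written in Java, checked against the built-in String.startsWith. The method names are made up for this sketch. Two details separate the strategies: the indexOf variant may scan the entire string when the prefix is absent, and the regular-expression variant must quote the prefix so that characters like "." are matched literally.

```java
import java.util.regex.Pattern;

class PrefixChecks {
    // Mirrors the Ruby slice approach: compare the first prefix.length() characters.
    static boolean bySlice(String s, String prefix) {
        return s.length() >= prefix.length()
            && s.substring(0, prefix.length()).equals(prefix);
    }

    // The indexOf approach: correct, but may scan the whole string on a miss.
    static boolean byIndexOf(String s, String prefix) {
        return s.indexOf(prefix) == 0;
    }

    // The regex approach: quote the prefix, then require a match at position 0.
    static boolean byRegex(String s, String prefix) {
        return Pattern.compile(Pattern.quote(prefix)).matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        String[][] cases = {
            {"a literal string", "a lit"}, // ordinary prefix
            {"abc", "abcd"},               // prefix longer than the string
            {"axb", "a.b"},                // regex metacharacter in the prefix
            {"abc", ""}                    // empty prefix
        };
        for (String[] c : cases) {
            boolean expected = c[0].startsWith(c[1]);
            boolean agree = bySlice(c[0], c[1]) == expected
                && byIndexOf(c[0], c[1]) == expected
                && byRegex(c[0], c[1]) == expected;
            System.out.println("\"" + c[0] + "\" starts with \"" + c[1] + "\": "
                + expected + " (all strategies agree: " + agree + ")");
        }
    }
}
```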

Since there is no built-in String.startsWith (or for that matter String.contains) in Ruby, I wonder if there is some common idiom that experienced Ruby developers find more readable than adding a custom method to String? If not for String.startsWith, how about String.contains?