Pushing the envelope

Darren's take on Java, agile methods, cool open source stuff, interesting technologies and other random wanderings through the land of blog.
Updated: 26/01/2003; 11:48:59.

  16 October 2002

Why I'm not a corporate investor

Party like it's 1993. -Russ [Russell Beattie Notebook]

1993 was also when I first heard the word 'Linux', which, because we were British, was pronounced 'Line-ux'. One of my Uni acquaintances showed it to me. "Look, free Unix on a PC". "That won't go anywhere", I thought... This rates alongside my other great predictions such as, 'Netscape are having an IPO, should I invest? Naa, they probably won't be worth much'. And Yahoo, and Red Hat... I console myself that I was a poor student and didn't have much to invest at the time anyway. Wouldn't have had enough money to get rich, just enough to have had one whale of a time at Uni.


11:30:53 PM

Note to Russ

I couldn't unplug.... But I'm going to be brief tonight. It's 11:19 p.m. I want to be off the computer by 11:30 (maybe 11:45's more realistic).

-Russ [Russell Beattie Notebook]

Russ, if you're reading this and it's NOT just after you got up on Thursday morning, switch off. Now. :)


11:19:44 PM

BEEP

Fed my book-buying habit today with the acquisition of the O'Reilly BEEP book. Build your own network protocol. Cool. BEEP looks very interesting as an alternative to all the contortions distributed application developers have to go through to make their applications work over HTTP. It provides a framework where most of the complex low-level stuff is done for you, and you just have to build your application-specific stuff on top of it. So the developer gets to decide whether the connection should be pull, push or both, stateless or stateful, pipelined or multiplexed, etc. And security appears to be pluggable too.
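
To give a feel for the "low-level stuff", here is a minimal sketch of how a single BEEP message frame is laid out on the wire, going by my reading of RFC 3080: a header line of `MSG channel msgno more seqno size`, then the payload, then an `END` trailer. The class and method names are made up for illustration, the payload is treated as an opaque string (a real payload would normally carry MIME headers), and a real BEEP library does far more than this (channel management, replies, flow control).

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: builds the text of one BEEP "MSG" frame (names are hypothetical). */
class BeepFrameSketch {

    /**
     * @param channel   channel number the message travels on (several channels share one connection)
     * @param msgno     message number within that channel
     * @param lastFrame true if this frame completes the message
     * @param seqno     octet sequence number of the payload's first byte on this channel
     * @param payload   payload text, treated as opaque here
     */
    static String msgFrame(int channel, int msgno, boolean lastFrame, long seqno, String payload) {
        // The size field counts payload octets, not characters.
        int size = payload.getBytes(StandardCharsets.UTF_8).length;
        String more = lastFrame ? "." : "*"; // "*" means more frames follow for this message
        return "MSG " + channel + " " + msgno + " " + more + " " + seqno + " " + size + "\r\n"
                + payload
                + "END\r\n";
    }

    public static void main(String[] args) {
        // Two frames on different channels could be interleaved on the same connection.
        System.out.print(msgFrame(1, 0, true, 0, "hello"));
        System.out.print(msgFrame(3, 0, true, 0, "world"));
    }
}
```

Because every frame names its channel, several independent conversations can share one connection, which is where the pipelined-or-multiplexed choice comes from.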

I seem to remember Paul Hammant mentioning something about writing a BEEP module for AltRMI, which sounds like a great idea, especially for doing asynchronous callbacks. Must read more in case I'm totally wrong...


8:29:57 PM

You're not a *nix geek unless...

...you've replaced sendmail as your default MTA (in my case with postfix). Oh my word. Talk about stressful. At one point I thought I'd just obliterated a whole day's worth of incoming mail because I kicked off fetchmail (thinking I was ready when I wasn't), and postfix threw a wobbly. Thankfully it kept all the undelivered messages so after a few frantic minutes skimming the docs, hacking the config and one 'postfix flush' later, all my email reappeared. Phew.

I flatter myself that I can usually puzzle my way through most techie things, but email delivery systems are way more complex than I ever imagined. I had no idea what I was getting into when I started. It's still not working as I expected but I appear to be able to send email, so I think I'll leave it until my palms stop sweating.


8:21:39 PM

Distributed Lucene

Interesting article by Mark Harwood on distributed Lucene indexes. Using distributed indexes is how Google achieves its scalability, I believe, but they are a fairly special case.

If scalability in the sense of concurrent users is the issue, I tend to favour multiple identical boxes with a load balancer and an RPC frontend. This can be as simple as a servlet, or you can use SOAP or XML-RPC etc. (Possibly RMI, although I've never tried that across a load balancer.) Doing things this way is probably a lot simpler to manage than splitting your indexes across boxes, and means that even if your queries are asymmetric (i.e. 85% of the queries are for the same thing), the load can be fairly balanced. Reliability comes for free as well: if a box dies, just stop sending requests there. Given Lucene's performance (it has been used to index collections of more than 10 million documents) it's pretty unlikely that your dataset will get so large that sheer size starts to affect your query times. Unless, of course, you are Google :)
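
A rough sketch of the "identical boxes" idea, done client-side in plain Java rather than in a real load balancer. `SearchBackend` and `RoundRobinSearcher` are hypothetical names; each backend stands in for whatever RPC frontend (servlet, XML-RPC, etc.) sits in front of a full copy of the index. Requests rotate across the boxes, and a box that fails is simply skipped for that request.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one search box holding a complete copy of the index. */
interface SearchBackend {
    String search(String query) throws IOException;
}

/** Round-robin dispatcher that moves on to the next box when a request fails. */
class RoundRobinSearcher {
    private final List<SearchBackend> backends;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSearcher(List<SearchBackend> backends) {
        this.backends = backends;
    }

    String search(String query) throws IOException {
        IOException lastFailure = null;
        // Try each box at most once, starting from the next one in the rotation.
        for (int attempt = 0; attempt < backends.size(); attempt++) {
            int index = Math.floorMod(next.getAndIncrement(), backends.size());
            try {
                return backends.get(index).search(query);
            } catch (IOException e) {
                lastFailure = e; // this box looks dead, so try the next one
            }
        }
        throw new IOException("no search box answered", lastFailure);
    }

    public static void main(String[] args) throws IOException {
        SearchBackend boxA = q -> "boxA:" + q;
        SearchBackend dead = q -> { throw new IOException("connection refused"); };
        SearchBackend boxC = q -> "boxC:" + q;
        RoundRobinSearcher searcher = new RoundRobinSearcher(Arrays.asList(boxA, dead, boxC));
        for (int i = 0; i < 4; i++) {
            System.out.println(searcher.search("lucene"));
        }
    }
}
```

Since every box holds the whole index, it doesn't matter which one answers, so a lopsided query mix still spreads evenly. A production setup would also want health checks so a dead box isn't retried on every rotation.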


10:16:02 AM

Lucene hints

Lucene is great, but some of the default settings are heavily biased towards interactive indexing and searching. If you're building an index in a batch process style, set the IndexWriter.mergeFactor value to something big. I use 10,000, which makes it burn about 500 meg of RAM while indexing, but speeds it up a lot over the default value of 10. YMMV as ever.
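
To see why a big merge factor helps batch indexing, here is a toy model (not Lucene's actual code, and `MergeFactorModel` is a made-up name). It assumes a simplified merge policy: every added document starts as its own one-document segment, and whenever `mergeFactor` segments of the same size have piled up they are merged into one. The model counts how many document copies those merges cost.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of merge-factor-driven segment merging; a simplification, not Lucene itself. */
class MergeFactorModel {

    /**
     * Adds docCount one-document segments, merging the newest mergeFactor segments
     * whenever that many segments of equal size have accumulated.
     * Returns the total number of document copies performed by those merges.
     */
    static long docsCopiedByMerges(int docCount, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        Deque<Long> segments = new ArrayDeque<>(); // newest segment at the head
        long copied = 0;
        for (int i = 0; i < docCount; i++) {
            segments.push(1L);
            long size = 1;
            // A merge can produce a segment that completes a group at the next level up.
            while (countNewest(segments, size) >= mergeFactor) {
                long merged = 0;
                for (int j = 0; j < mergeFactor; j++) {
                    merged += segments.pop();
                }
                copied += merged; // every document in the merged segments gets rewritten
                segments.push(merged);
                size = merged;
            }
        }
        return copied;
    }

    /** Counts how many of the newest segments have exactly the given size. */
    private static int countNewest(Deque<Long> segments, long size) {
        int n = 0;
        for (long s : segments) { // iterates from the newest segment to the oldest
            if (s != size) {
                break;
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("mergeFactor 10:    " + docsCopiedByMerges(1000, 10));
        System.out.println("mergeFactor 10000: " + docsCopiedByMerges(1000, 10000));
    }
}
```

In this model, 1,000 documents with a merge factor of 10 get copied 3,000 times along the way, while a merge factor of 10,000 does no intermediate merging at all. The flip side is that all those unmerged segments have to sit somewhere until the end, which fits with the RAM usage mentioned above.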

7:16:14 AM

© Copyright 2003 Darren Hobbs