Cameron's written a nice piece on caching - it's well worth a read. I'll just make a couple of quotes...
On Pre-Loading.
James has been busy on the cache pre-loading topic and the issues around Prevayler. I think he's the architect of the SpiritCache product (caching over SpiritSoft JMS) that competes with our Coherence software.
Minor correction: SpiritCache implements distributed caching using any JMS provider. Though obviously it performs better if you choose a good provider :-)
Incidentally, the biggest cost of accessing Java objects over the network is typically serialization/deserialization -- not network time! This is a great reason to try to move up to JDK 1.4.1, which has improved the performance of serialization rather dramatically. In a lot of cases, it used to actually be faster to serialize as XML and deserialize by parsing it! Now, JDK 1.4.1 runs within about 25% of hand-coded custom serialization speed, which is pretty impressive.
Totally agree. Using XML to serialize objects is often much faster (and, spookily, often smaller too, even without gzipping it) than Java serialization. Java serialization is a pretty heavyweight process, since each object is explicitly versioned and its methods hash-coded. It's also very easy to accidentally send around unnecessary data by missing a transient here or there. Using XML allows the whole schema to be versioned, rather than versioning each little piece of the schema (i.e. each object), and gives much more control over what goes on the wire without having to be intrusive in your code.
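If you want to try that comparison yourself, here's a quick throwaway sketch (the Quote bean and its fields are invented purely for illustration): it writes the same bean once with standard Java serialization and once as XML via java.beans.XMLEncoder and prints the byte counts. The actual numbers will vary with your JDK and the shape of your objects, but the transient trap is easy to see in the comments.

import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSizeDemo {

    // A made-up bean, purely for illustration.
    public static class Quote implements Serializable {
        private String symbol;
        private double price;
        // Forget 'transient' here and standard Java serialization will
        // silently drag whatever this references over the wire with
        // every Quote; XMLEncoder only writes bean properties.
        private transient Object cachedView;

        public Quote() {}   // XMLEncoder needs a public no-arg constructor
        public String getSymbol() { return symbol; }
        public void setSymbol(String symbol) { this.symbol = symbol; }
        public double getPrice() { return price; }
        public void setPrice(double price) { this.price = price; }
    }

    public static void main(String[] args) throws Exception {
        Quote quote = new Quote();
        quote.setSymbol("SPIR");
        quote.setPrice(1.23);

        // Standard Java serialization: class descriptor, serialVersionUID
        // hash, field metadata and all.
        ByteArrayOutputStream javaBytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(javaBytes);
        out.writeObject(quote);
        out.close();

        // The same bean written as XML via java.beans.XMLEncoder.
        ByteArrayOutputStream xmlBytes = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(xmlBytes);
        encoder.writeObject(quote);
        encoder.close();

        System.out.println("Java serialization: " + javaBytes.size() + " bytes");
        System.out.println("XML serialization:  " + xmlBytes.size() + " bytes");
    }
}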
James adds: "if a database is slow to update, rather than blocking a Servlet making your application appear slow and using up scarce threads on your heavily loaded web server, try just post a command object to a JMS Queue and perform the database write asynchronously." He's being a little humble at this point, because caching products pay for themselves in spades precisely because of the database latency and throughput issues. Think about it this way: The application servers scale out just by adding pizza boxes, but the database will basically be limited to vertical scale. Without careful architecting, sufficiently high-scale applications will always bottleneck on the data source, be it a database or a mainframe service.
[/dev/null [Cameron]]
:-). Well said Cameron. Indeed, most tiers in your application can scale by just adding boxes, whereas there's usually just one database instance. If your database is slow you can't just drop in another box; you need to buy a new, bigger box (and probably pay more licences for your database software). So the more load you can take off the database, the more cost-effectively you can scale. Or to put that another way, taking load off the database usually results in a direct cost saving. This is in addition to the reduced latency and higher throughput you get in your application tier through caching.
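For anyone who hasn't tried the write-behind trick, a minimal sketch of the producing side might look like this (the AsyncDatabaseWriter and SaveOrderCommand names are made up, and the ConnectionFactory and Queue would normally come from a JNDI lookup against whichever JMS provider you use). The servlet just posts the command and returns straight away; a consumer elsewhere, such as a message-driven bean, drains the queue and does the actual JDBC write.

import java.io.Serializable;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;

public class AsyncDatabaseWriter {

    // Hypothetical command object carrying just the data needed for the write.
    public static class SaveOrderCommand implements Serializable {
        public final String orderId;
        public final double amount;
        public SaveOrderCommand(String orderId, double amount) {
            this.orderId = orderId;
            this.amount = amount;
        }
    }

    private final ConnectionFactory factory;
    private final Queue writeQueue;

    // factory and writeQueue would typically be looked up via JNDI from
    // whatever JMS provider you happen to be running.
    public AsyncDatabaseWriter(ConnectionFactory factory, Queue writeQueue) {
        this.factory = factory;
        this.writeQueue = writeQueue;
    }

    // Called from the servlet: posts the command to a queue and returns
    // immediately, so the request thread never blocks on the database.
    // A separate consumer (e.g. a message-driven bean) reads the queue
    // and performs the real JDBC write.
    public void postWrite(SaveOrderCommand command) throws Exception {
        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(writeQueue);
            ObjectMessage message = session.createObjectMessage(command);
            producer.send(message);
        } finally {
            connection.close(); // closing the connection closes its sessions and producers
        }
    }
}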
BTW Happy Birthday Cameron!