I spent the past week trying to emulate certain aspects of a hardware load
balancer using Apache. It wasn't actually load balancing I was
interested in, but the ability of load balancers to pick URLs apart,
and redirect the client to different servers depending on the contents
of the URL.
At Georgia Tech, we're working with a large Java-based dynamic web
application (Campus Pipeline's Luminis portal software). From our
point of view, one problem with this application is that it's
monolithic: we can't scale it except by buying larger boxes; that
single server is also a single point of failure. To alleviate this
problem, we'd like to be able to front the Java server with static web
servers, which could handle the bits that don't have to be generated
dynamically. We're doing this without the cooperation of the central
application, so the easiest way to do this is by URL inspection by a
device sitting between the client and that server. URLs that indicate
dynamic requests get passed on to the Java-backed web server, while
URLs that can be served statically are redirected to static web
servers. It's something hardware load balancers do very well, but I'm
working at a state-funded university, and state budgets aren't too
good now. Perhaps we'll get what we need in the new year, but right
now, I needed to prototype something that would show that the
application would even work under these circumstances.
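To make the URL inspection idea concrete, here is a minimal Python sketch of the routing decision. It is only an illustration, not what I actually ran: the host names and the list of static file suffixes are hypothetical, and a real setup would make this decision inside the web server or load balancer rather than in a script.

```python
from urllib.parse import urlsplit

# Hypothetical values for illustration only; the real host names and
# the real set of static file types are not given in this write-up.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".html")
STATIC_HOST = "static.example.edu"
DYNAMIC_HOST = "portal.example.edu"


def pick_backend(url: str) -> str:
    """Return the host that should serve this URL, judged by its path."""
    # Only the path is inspected; the query string is ignored.
    path = urlsplit(url).path.lower()
    if path.endswith(STATIC_SUFFIXES):
        return STATIC_HOST
    return DYNAMIC_HOST


def redirect_target(url: str) -> str:
    """Rewrite the URL so that it points at the chosen backend host."""
    parts = urlsplit(url)
    return parts._replace(netloc=pick_backend(url)).geturl()
```

With these assumed names, a request for an image would be sent to the static host, while anything else would fall through to the Java-backed server.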
In any case, I was able to do the URL redirection tricks I needed using
the mod_rewrite module. (I'll write that up
another time.) But I had another problem:
the central application serves some things via HTTPS/SSL. Since those
are a small percentage of the bits served by the application, I didn't need to pick those apart, but I did need to be
able to redirect those on to the Java-based server.
I had a devil of a time figuring out how to redirect HTTPS/SSL connections. The
mod_rewrite approach doesn't work, because
mod_rewrite works by examining each HTTP request and
changing it or forwarding it. Once an HTTPS connection is set up,
you can't examine the requests: they're encrypted inside an
SSL connection, which is the whole point of HTTPS.
In the end, what I needed was a port-forwarder. A
port-forwarder takes requests on a TCP port on one machine, and
passes them off to a TCP port on another machine. In this case,
I needed to forward all packets coming into port 443 on my pseudo load
balancer, and pass them on to the same port on the application server.
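For the curious, the core of a port-forwarder is small enough to sketch in Python. This is not how portfwd or any other real tool is implemented, just a minimal illustration using the standard asyncio library. The backend host name is a made-up placeholder, and binding to port 443 normally requires root privileges.

```python
import asyncio

# Hypothetical addresses for illustration; substitute real ones.
LISTEN_HOST, LISTEN_PORT = "0.0.0.0", 443
TARGET_HOST, TARGET_PORT = "appserver.example.edu", 443


async def pipe(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while True:
            data = await reader.read(65536)
            if not data:
                break
            writer.write(data)
            await writer.drain()
    except OSError:
        pass  # The peer went away; just stop copying.
    finally:
        writer.close()


async def start_forwarder(listen_host, listen_port, target_host, target_port):
    """Start a server that relays every connection to the target."""

    async def handle_client(client_reader, client_writer):
        try:
            backend_reader, backend_writer = await asyncio.open_connection(
                target_host, target_port
            )
        except OSError:
            client_writer.close()  # Backend unreachable; drop the client.
            return
        # Relay both directions at once. The bytes are never inspected,
        # so encrypted SSL traffic passes through untouched.
        await asyncio.gather(
            pipe(client_reader, backend_writer),
            pipe(backend_reader, client_writer),
        )

    return await asyncio.start_server(handle_client, listen_host, listen_port)


async def main():
    server = await start_forwarder(
        LISTEN_HOST, LISTEN_PORT, TARGET_HOST, TARGET_PORT
    )
    async with server:
        await server.serve_forever()

# To run the forwarder: asyncio.run(main())
```

Because the forwarder only copies bytes, the SSL handshake still happens end to end between the browser and the application server, which is why this works where request-level rewriting cannot.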
I was building this all under Linux, and Linux has very strong
facilities for routing and forwarding TCP/IP, so I thought that would
be the way to go. I spent many hours chasing that mirage. In the
end, I was convinced that it would be easier to get a Ph.D. in physics than it
would be to figure out all the details of Linux routing.
For a time, I was convinced that I could forward HTTPS connections
using Apache's proxying facilities. I found a
note that suggested it should be possible. That
might well work, but I wasn't able to figure it out.
Eventually, I settled on an open source package called
portfwd. It works on Linux, and claims to work on
Solaris as well.
portfwd does exactly what it says it does:
given a simple config file, it forwards all packets arriving at one
port on to another port, much like a wormhole out of a Star Trek
episode. With portfwd in place, my SSL connections were quickly
sped on to the application server, and all was well.
In the end, through a lot of dead ends, I was able to get what I
wanted done. I have a feeling that it would have been a lot easier
with a piece of hardware. If we can't get a new hardware
load balancer in the budget, perhaps we'll take up a collection
and get one on eBay.