Steve's No Direction Home Page

If he needs a third eye, he just grows it.

Updated: 10/23/2004; 12:14:35 PM.


Tuesday, June 10, 2003

Historic Sheet Music

Yesterday, below, I mentioned the Historic Sheet Music project I've been working on. Here's some more info on it.

I've been working with Byron Hoyt Sheet Music (http://www.byronhoyt.com) for the past year and a half or so, doing all sorts of things related to their database and the web site. A while back, we were able to acquire rights to a library of Historic American Sheet Music, in digital format. The library consisted of over 16,000 individual JPG files, which comprised about 3,000 songs, along with an associated SGML database describing the songs. We wanted to make this material available, inexpensively, on the web site.

We've worked before with ebrary (http://www.ebrary.com), a company which specializes in selling PDF files, or portions of them, inexpensively. So the challenge was to take all those JPGs we acquired, create PDFs out of them, make the database easily sortable, and add some value to the PDF, too.

There were a lot of steps in the process. First, just downloading the 16,000 JPGs was a challenge: they took up about 15GB of hard disk, and no small amount of time to download, especially since I only downloaded at night, so we wouldn't put too much of a burden on the source servers or my DSL line. While this was going on, I parsed the SGML file into a set of tables in MySQL, making the song data more easily searchable.

To combine the JPGs into PDFs, I used FOP, an Apache project, which parses XSL-FO files and renders them as PDFs. I wrote most of the scripts in PHP, with a few in Perl, and also used a lot of XML and XSL to guide the process. Lots of fun! But with the additional challenge of limited disk space on my laptop (PDFs are no smaller than the JPGs they contain, of course, so they took up another 15GB), I had to build the PDFs in manageable batches.

I had to deliver the PDFs to ebrary on CD, so I picked up a fast (48x) CD burner; I would rather have used DVD, which would have made that step easier. The CD burning process took a couple of days, because I made backup CDs as I went along, in case one batch got lost or one disc in the batch was bad, and 30 CDs take a long time to burn, even at 48x.
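
For anyone curious what the JPG-to-PDF step might look like, here is a minimal sketch in Python. It is not the PHP and Perl actually used on the project. It builds one XSL-FO document per song, with one scanned page image per PDF page, ready to hand to FOP. The function name `build_song_fo`, the master name `scan`, the US Letter page size, and the example file names are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return a tag name qualified with the XSL-FO namespace."""
    return "{%s}%s" % (FO_NS, tag)


def build_song_fo(image_paths, page_width="8.5in", page_height="11in"):
    """Build an XSL-FO document that places each image on its own page.

    image_paths: the JPG files for one song, in page order (hypothetical names).
    Returns the XSL-FO document as a string.
    """
    root = ET.Element(fo("root"))

    # One simple page layout, reused for every page of the song.
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "scan",  # assumed name, any label works
        "page-width": page_width,
        "page-height": page_height,
    })
    ET.SubElement(page, fo("region-body"))

    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "scan"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})

    for index, path in enumerate(image_paths):
        # Every image after the first starts a new page.
        attrs = {"break-before": "page"} if index > 0 else {}
        block = ET.SubElement(flow, fo("block"), attrs)
        ET.SubElement(block, fo("external-graphic"), {
            "src": "url('%s')" % path,
            "width": "100%",
            "content-width": "scale-to-fit",  # shrink or grow the scan to the page width
        })

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names for a two-page song.
    print(build_song_fo(["song0001_p1.jpg", "song0001_p2.jpg"]))
```

Saved to a file, the output could then be rendered with FOP's command line, along the lines of `fop -fo song.fo -pdf song.pdf`. Working one song (or one batch of songs) at a time is also what keeps the disk usage manageable.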

Anyway, the whole thing is finally up. You can see it at http://www.byronhoyt.com/historic/about.php. We're hosting the database, the searching, etc., but ebrary hosts the PDFs themselves. At ebrary's site, you can download a small, benign plug-in that lets you view the files. Best of all, looking at the files on screen is completely free. If you want to print them out, it's only $0.25 (yes, that's one thin quarter) per page. Poke around Byron Hoyt's search site: I'm especially fond of my "found poetry" page at http://www.byronhoyt.com/historic/found_poetry.php, which generates found sonnets made up of 14 lines picked randomly from the database of first lines and refrains. Every time you reload the page, you get a different sonnet, and you can click on any line to see details about the song it came from. Note that every song's detail page lists the subjects applied to that song; clicking on any subject shows you other songs with the same subject.
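
The found-sonnet idea is simple enough to sketch in a few lines. The version below is in Python rather than the PHP behind the real page, and it makes some assumptions: the first lines and refrains arrive as a list of `(song_id, text)` pairs (a hypothetical shape, standing in for a database query), and no line is used twice in the same sonnet. The name `found_sonnet` is made up for this example.

```python
import random


def found_sonnet(lines, length=14, rng=None):
    """Pick `length` distinct entries at random from `lines`.

    lines: list of (song_id, text) pairs, so each chosen line can
           link back to the song it came from (assumed data shape).
    rng:   optional random.Random instance, useful for repeatable tests.
    """
    if len(lines) < length:
        raise ValueError("need at least %d lines to build a sonnet" % length)
    rng = rng or random.Random()
    # sample() draws without replacement, so no line repeats (an assumption).
    return rng.sample(lines, length)


if __name__ == "__main__":
    # Placeholder data standing in for the real first lines and refrains.
    pool = [(n, "placeholder line %d" % n) for n in range(1, 101)]
    for song_id, text in found_sonnet(pool):
        print("%s  [song %d]" % (text, song_id))
```

Because nothing is cached between calls, each run (like each page reload) produces a different sonnet.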

Pretty cool. So if you're at all interested, have a poke at the database. It's lots of fun. I haven't looked through all of it myself, so if you see something really interesting, let me know.

7:54:57 PM
