Quest for Massage, Part 2
Time to roll up our sleeves...
The first thing I want to do is read in the web page holding the masseuse's appointment schedule. That's easy to do with Python's 'batteries included' standard library. Both the 'urllib' and 'urllib2' modules include a 'urlopen' function. Here, I'm using 'urllib2':

    from urllib2 import urlopen
    urlObj = urlopen('http://www.iwannamassage.com')
This gives me a file-like object I can use to get the web page data (there's no way I'm gonna give you the real URL of the massage schedule page... I'd never get an appointment if I did). Next, I'll read in all of the data using the 'readlines' method:

    data = urlObj.readlines()
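The fetch-and-read steps can be pulled together into one small function. This is only a sketch: the name 'fetch_lines' is made up for illustration, and the import fallback assumes you might be running a newer Python, where 'urlopen' lives in 'urllib.request' instead of 'urllib2'.

```python
try:
    from urllib2 import urlopen          # Python 2, as used in this post
except ImportError:
    from urllib.request import urlopen   # Python 3 location of urlopen


def fetch_lines(url):
    """Hypothetical helper: return the resource at `url` as a list of lines."""
    url_obj = urlopen(url)
    try:
        # readlines() keeps the trailing newline on each line
        return url_obj.readlines()
    finally:
        # always release the connection, even if reading fails
        url_obj.close()


# Usage (placeholder URL from the post, not a real schedule page):
# data = fetch_lines('http://www.iwannamassage.com')
```

Note that the URL needs its scheme ('http://'), and that on Python 3 the lines come back as bytes rather than str.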
Logically, the next thing I want to do is parse the web page data. Lately, I've been doing a lot of XML parsing for work-related Python projects. Initially, I used the low-level 'expat' XML parser available in the module 'xml.parsers.expat'. More recently, I've been using the excellent 'ElementTree' lightweight DOM-like parser by the effbot.
ElementTree is DOM-like in that it parses XML into a tree of elements (hence the name). Having the data in a tree makes it easy to reach any part of it. But rather than attempting to do this just like DOM, ElementTree does it in a more Pythonic way; for example, you can use a Python iterator to iterate over the sub-elements of any element in the tree. Go Pythonic!
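To make the iterator point concrete, here's a minimal sketch. The XML snippet and its 'schedule'/'slot' names are invented for illustration, and the import assumes the copy of ElementTree bundled with Python 2.5 and later as 'xml.etree.ElementTree' (the standalone effbot package is imported as 'elementtree.ElementTree' instead).

```python
import xml.etree.ElementTree as ET

# Made-up sample data, not the real schedule format
SAMPLE_XML = (
    "<schedule>"
    "<slot time='9:00'>open</slot>"
    "<slot time='10:00'>booked</slot>"
    "</schedule>"
)

root = ET.fromstring(SAMPLE_XML)

# An element is directly iterable: looping over it yields its sub-elements
slots = []
for slot in root:
    slots.append((slot.tag, slot.get('time'), slot.text))

# Any element type that supports iteration also works in a comprehension
open_times = [slot.get('time') for slot in root if slot.text == 'open']
```

No DOM-style 'childNodes' or index juggling needed: the element behaves like a plain Python sequence of its children.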
Since it has become my XML parser of choice, I was happy to discover that ElementTree can be told to use an HTML parser rather than the default XML one.
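How ElementTree gets pointed at an HTML parser isn't shown here. As a rough sketch of the general idea only (and not necessarily the approach this series ends up taking), an HTML parser can feed its events into ElementTree's 'TreeBuilder'. This version assumes a newer Python with 'html.parser' in the standard library; the names 'HTMLToTree', 'parse_html', and 'VOID_TAGS', plus the sample markup, are all made up for illustration. It only copes with reasonably well-formed HTML.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML tags that never get a closing tag (partial list, for the sketch)
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HTMLToTree(HTMLParser):
    """Hypothetical helper: forward HTML parse events to a TreeBuilder."""

    def __init__(self):
        super().__init__()
        self._builder = ET.TreeBuilder()

    def handle_starttag(self, tag, attrs):
        self._builder.start(tag, dict(attrs))
        if tag in VOID_TAGS:
            # close void elements right away so the tree stays balanced
            self._builder.end(tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self._builder.end(tag)

    def handle_data(self, data):
        self._builder.data(data)

    def close(self):
        super().close()                 # flush any buffered text
        return self._builder.close()    # returns the root element


def parse_html(text):
    """Hypothetical helper: parse an HTML string into an element tree."""
    parser = HTMLToTree()
    parser.feed(text)
    return parser.close()


# Made-up sample page, not the real schedule
SAMPLE_HTML = (
    "<html><body><table>\n"
    "<tr><td>9:00 AM</td><td>open</td></tr>\n"
    "<tr><td>10:00 AM</td><td>booked</td></tr>\n"
    "</table></body></html>\n"
)

root = parse_html(SAMPLE_HTML)
rows = [[td.text for td in tr.iter("td")] for tr in root.iter("tr")]
```

Once the page is in a tree, the usual ElementTree iteration works on it just as it does for XML. Real-world pages with unclosed or misnested tags would need a more forgiving parser than this.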
Hmmm... Getting pretty late. More in Part 3.
11:34:59 PM