Java HTML parsers
The LinkbackExtractor that I posted yesterday uses the Swing HTML parser, which is built into Java, but there are other Java-based HTML parsers available. Erik Hatcher suggested the JTidy HTML parser and there is also the HTMLParser project on SourceForge. Know of any others?
[Blogging Roller]
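For anyone who hasn't used the Swing HTML parser mentioned above, here is a minimal sketch of pulling link targets out of a page with it. This is not the LinkbackExtractor code itself; the class name `SwingLinkExtractor` and the method `extractLinks` are made up for illustration, and only the `javax.swing.text.html` classes come from the JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not part of LinkbackExtractor): collects href values
// from <a> tags using the HTML parser that ships with Swing.
final class SwingLinkExtractor {

    static List<String> extractLinks(String html) throws IOException {
        final List<String> links = new ArrayList<>();

        // The parser reports each start tag to this callback as it reads the input.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        try (Reader reader = new StringReader(html)) {
            // The third argument tells the parser to ignore any charset
            // declared in the document, since we already have decoded text.
            new ParserDelegator().parse(reader, callback, true);
        }
        return links;
    }
}
```

Calling `SwingLinkExtractor.extractLinks(pageSource)` returns the href values in document order. Relative URLs come back as written, so resolving them against the page's base URL would be a separate step.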
Jelly uses NekoHTML to parse HTML as if it were XML, which automatically fixes up any missing tags and can convert element and attribute names to a consistent case. NekoHTML exposes HTML parsing as a normal SAX parser, which is cool. I can highly recommend it.
9:00:01 AM