James Strachan's Weblog

Updated: 17/4/07; 10:53:25.

Ramblings on Open Source, Java, Groovy, XML and other geeky malarkey

Monday, January 13, 2003

Java HTML parsers

Java HTML parsers..

The LinkbackExtractor that I posted yesterday uses the Swing HTML parser, which is built into Java, but there are other Java-based HTML parsers available. Erik Hatcher suggested the JTidy HTML parser and there is also the HTMLParser project on SourceForge. Know of any others?

[Blogging Roller]
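For anyone who hasn't used the Swing HTML parser mentioned above, here is a rough sketch of pulling the links out of a page with it. This is not the LinkbackExtractor code itself; the class name `SwingLinkExtractor` and the method `extractLinks` are made up for illustration, and only the JDK's `javax.swing.text.html` classes are used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper for illustration; not the LinkbackExtractor from the quoted post.
class SwingLinkExtractor {

    /** Collects the href value of every <a> start tag the parser reports. */
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();

        // The Swing parser is callback driven: it calls handleStartTag
        // for each opening tag it recognises.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        // The third argument tells the parser to ignore any charset declaration
        // in the document, since we already hand it a Reader.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.org/a\">A</a> "
                + "<A HREF=\"http://example.org/b\">B</A></body></html>";
        for (String link : extractLinks(new StringReader(page))) {
            System.out.println(link);
        }
    }
}
```

The callback style is the main thing to notice: you get events rather than a document tree, so collecting anything more structured than a flat list of links means tracking state yourself.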

Jelly uses NekoHTML to parse HTML as if it were XML. NekoHTML automatically fixes up any missing tags and can convert the case of element and attribute names. It exposes HTML parsing as a normal SAX parser, which is cool. I can highly recommend it.

9:00:01 AM

© Copyright 2007 James Strachan.
