Tuesday, February 04, 2003
Ran some tests with the code at textmining.org. The package has two Extractor classes, one for Microsoft Word and the other for PDF. They are basically wrappers around the POIFS and PDFBox packages respectively. The Word document extractor does quite a bit of processing of its own. The PDF extractor, on the other hand, just calls the PDF parser and text stripper from PDFBox. Will study the PDFBox API further to decide whether to use PDFBox directly.
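To make the "thin wrapper" point concrete, here is a rough, self-contained sketch of the pattern. None of the names below come from textmining.org or PDFBox: `ParsedDocument`, `parse`, `stripText` and `extractText` are hypothetical stand-ins, and the "document" is just a string with form feeds between pages. It only shows that a wrapper which merely chains a parse step and a text-stripping step gives the same result as calling those two steps directly.

```java
import java.util.Arrays;
import java.util.List;

public class ExtractorSketch {

    /** Hypothetical stand-in for a parsed document: a list of page texts. */
    public static final class ParsedDocument {
        public final List<String> pages;

        public ParsedDocument(List<String> pages) {
            this.pages = pages;
        }
    }

    /** Hypothetical stand-in for the library's parser (assumption: form feed separates pages). */
    public static ParsedDocument parse(String raw) {
        return new ParsedDocument(Arrays.asList(raw.split("\f")));
    }

    /** Hypothetical stand-in for the library's text stripper: joins pages with newlines. */
    public static String stripText(ParsedDocument doc) {
        return String.join("\n", doc.pages);
    }

    /** The thin wrapper: it adds nothing beyond chaining the two library steps. */
    public static String extractText(String raw) {
        return stripText(parse(raw));
    }

    public static void main(String[] args) {
        String raw = "page one\fpage two";

        String viaWrapper = extractText(raw);
        String direct = stripText(parse(raw));

        // Both routes produce the same text, so the wrapper buys little here.
        System.out.println(viaWrapper.equals(direct));
    }
}
```

If the real extractor turns out to be this thin, calling the library directly saves a dependency. A wrapper would only earn its keep if it added something, such as a common interface shared with the Word extractor.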
11:54:59 AM
© Copyright 2003 Choong Yong, Koh.