Note: this assumes you have Python, win32all (the pywin32 extensions), and MS Word all installed properly.
Driving Word from Python through COM takes only a few lines.
import win32com.client
# Start the application.
app = win32com.client.Dispatch('Word.Application')
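# Open a document. Use a raw string so the backslashes (e.g. \t) are not treated as escapes.
doc = app.Documents.Open(r'c:\path\to\document\test.doc')
# Or set the default folder first (a trailing backslash must be doubled):
app.ChangeFileOpenDirectory('c:\\path\\to\\document\\')
doc = app.Documents.Open('test.doc')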
# To save as HTML (filtered; i.e. without most of the MS cruft).
# Note: 10 is the value of the WdSaveFormat constant wdFormatFilteredHTML in Word's object model.
doc.SaveAs('test.html', 10)  # 10 == wdFormatFilteredHTML
doc.Close(0)  # 0 == wdDoNotSaveChanges
app.Quit()
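If any step fails partway through, Word can be left running in the background. A minimal sketch of the same steps with try/finally cleanup follows. The names save_as_filtered_html, convert, and the two constants are made up for illustration; the sketch assumes Python 3 and that app is the object returned by Dispatch('Word.Application').

```python
# Numeric values of the Word constants used in the recipe.
WD_FORMAT_FILTERED_HTML = 10   # WdSaveFormat.wdFormatFilteredHTML
WD_DO_NOT_SAVE_CHANGES = 0     # WdSaveOptions.wdDoNotSaveChanges


def save_as_filtered_html(app, source_path, target_path):
    """Open source_path in the given Word application and save it as filtered HTML.

    The document is closed without saving changes even if SaveAs raises.
    """
    doc = app.Documents.Open(source_path)
    try:
        doc.SaveAs(target_path, WD_FORMAT_FILTERED_HTML)
    finally:
        doc.Close(WD_DO_NOT_SAVE_CHANGES)


def convert(source_path, target_path):
    """Start Word, convert one document, and always quit Word afterwards."""
    # Imported here so the helper above can be exercised without Word installed.
    import win32com.client

    app = win32com.client.Dispatch('Word.Application')
    try:
        save_as_filtered_html(app, source_path, target_path)
    finally:
        app.Quit()
```

Calling convert(r'c:\path\to\document\test.doc', r'c:\path\to\document\test.html') would then perform the whole round trip; passing full paths avoids depending on Word's current folder.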
This will all run "invisibly". If you want to display the app while doing all of this, set app.Visible = 1 somewhere near the top. If you simply want to fetch the text from the document, use doc.Content.Text. It returns a Unicode string. Note that Word's "smart quotes" have no representation in ASCII, so be careful when encoding the text to ASCII or another narrow encoding.
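One way to deal with the smart quotes is to map them to their plain equivalents before encoding. The following is only a sketch, assuming Python 3; the SMART_QUOTES table and the to_ascii helper are hypothetical names, and the table covers just the four curly quote characters, not every character Word may produce.

```python
# Map the four Unicode "smart" quote characters to plain ASCII quotes.
SMART_QUOTES = {
    0x2018: "'",   # left single quotation mark
    0x2019: "'",   # right single quotation mark (also used as apostrophe)
    0x201C: '"',   # left double quotation mark
    0x201D: '"',   # right double quotation mark
}


def to_ascii(text):
    """Return text with smart quotes straightened and other non-ASCII replaced by '?'."""
    straightened = text.translate(SMART_QUOTES)
    # 'replace' substitutes '?' for anything that still cannot be encoded.
    return straightened.encode('ascii', 'replace').decode('ascii')
```

With this in place, to_ascii(doc.Content.Text) gives a string that is safe to write to an ASCII-only destination, at the cost of losing any other non-ASCII characters.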