> For other formats, such as .doc, there's nothing to scale down;
> something (not necessarily the Microsoft Word application) must
> produce, from the .doc format, a view of the file. 

Microsoft Office documents, and some other file formats on Windows,
include a thumbnail view within the file properties, so you don't
actually need code which can read and understand the document data
itself; you only need code which can extract the properties from the
file. (Or at least this is true for older versions of MS Office; I'm not
sure what the latest version does.)

There could be something to be said for Plone understanding the
summary information in uploaded Office documents: the title, keywords,
etc. could all be extracted automatically. On the other hand (as we
found to our cost), most users never bother to set the title and author
information in Word documents (but they do copy existing documents, so
stale values carry over). Our Google Search Appliance extracts the
titles from any documents it finds, and the users then wonder why it
ignores Plone's title and displays something irrelevant instead.

Anyway, if you can find some Python code to read the OLE2 property sets
(the job that Apache POI's HPSF component does in Java), you could
extract the thumbnails. You can certainly do it with Java, so another
option is to run an external command to extract the needed information.
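
As a rough illustration of the external-command route, here is a minimal
Python sketch. It assumes you already have some extractor (for example a
small Java program built on POI) that takes the document path as its last
argument and writes the thumbnail bytes to stdout. The function name
extract_thumbnail and that calling convention are my own assumptions, not
anything Plone or POI provides.

```python
import subprocess
from pathlib import Path


def extract_thumbnail(doc_path, command, timeout=30):
    """Run an external extractor and return whatever it writes to stdout.

    `command` is the argument list for the external tool (hypothetical:
    it must accept the document path as its final argument and print the
    thumbnail bytes). Returns None if the tool is missing, fails, times
    out, or prints nothing.
    """
    try:
        result = subprocess.run(
            [*command, str(Path(doc_path))],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        return None
    return result.stdout or None
```

You would call it along the lines of
extract_thumbnail("report.doc", ["java", "-jar", "thumbnail-extractor.jar"]),
where thumbnail-extractor.jar is a made-up name for whatever tool you end
up writing. Returning None on any failure lets the caller fall back to a
generic icon rather than breaking the upload.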


