[Setup] Indexing and searching on Word.

alan at enfoldsystems.com
Tue Feb 28 20:35:06 UTC 2006

> Hey guys I have some questions if someone has any time.
> I have a basic plone site, and it works great. But I need to be able to
> search on the content of Word docs. What's the easiest way to go about
> doing this? I read a bit up on Archetypes, but it sounds like that's a
> mechanism for converting information, which I don't necessarily need to
> do.

I think you want the converters in TextIndexNG2.  I believe Ingeniweb also
has some mechanism to index Word documents.  Enfold Server (our commercial
product) does it with Microsoft's native IFilter functionality.  there are
quite a few ways to do this.

> Looking at this document:
> http://plone.org/documentation/how-to/integrating-office-files

not familiar with it.

> It says install Archetypes, but isn't that already included with Plone
> (btw it would be great if it linked to the things it's asking you to
> install).

how about logging into plone and adding a comment?

  - I know a lot of the technologies such as pdf2html work, but there are a
myriad of problems getting them to work out of the box.

> I downloaded Archetypes 1.3.7 bundle, and the readme.txt mentions to
> run the "quickinstaller_tool", there's no such file in the Archetypes
> directory structure... ??

Archetypes comes with modern Plone releases, I believe since Plone 2.0.0.

> Any guidance would be most appreciated.

well.. programmatically you want to influence the SearchableText index to
index Word documents.  this is a Zope/Python issue, not something
particular to Plone/Archetypes.  Although you could make it so the
SearchableText of an Archetype returns a string that the catalog index
can consume.
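
To make that idea concrete, here is a rough sketch in plain Python. It is
not real Plone or Archetypes code: the class name WordFileContent, the
to_text callable and fake_converter are all made up for illustration. The
only point is that SearchableText hands back one plain string (title,
description and whatever text a converter could pull out of the file), and
that a failed conversion should not stop the metadata from being indexed.

```python
# Sketch only: a plain-Python stand-in for a content class, not real
# Plone/Archetypes code. Every name here is hypothetical.

class WordFileContent:
    def __init__(self, title, description, data, to_text):
        self.title = title
        self.description = description
        self.data = data          # raw bytes of the uploaded Word file
        self.to_text = to_text    # callable taking bytes, returning str

    def SearchableText(self):
        """Return one plain string that a full-text index could consume."""
        try:
            body = self.to_text(self.data)
        except Exception:
            # If conversion fails, still index the title and description.
            body = ""
        parts = [self.title, self.description, body]
        return " ".join(p.strip() for p in parts if p and p.strip())


def fake_converter(data):
    # Stand-in for a real Word-to-text converter (assumption for the demo).
    return data.decode("utf-8", errors="ignore")


doc = WordFileContent("Budget", "Q1 numbers", b"travel costs went up",
                      fake_converter)
print(doc.SearchableText())
```

In a real site the converter would be whatever tool you settle on; the
content class does not need to care which one it is.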

> What are the odds that the next version of plone will index and search
> office docs out of the box?

limited if any.  it's a REAL PITA.  what we should do is focus on
documentation and the best way to accomplish this indexing.  then if
everyone can agree on a strategy - it can end up in Plone.

also some tools work on linux but not on windows.  we have a
portal_transforms Word COM transformation we have contributed - several
people have gotten that to work.
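
As a rough illustration of the platform problem (this is not the
portal_transforms code we contributed, just a sketch): one way is to pick
the converter by platform. The function names below are hypothetical. The
Windows branch assumes Microsoft Word and the pywin32 package are
installed; the other branch assumes the external antiword command is on
the PATH and that its output decodes as UTF-8.

```python
import os
import subprocess
import sys


def word_text_via_com(path):
    # Assumes Windows with Microsoft Word and pywin32 installed.
    import win32com.client  # imported lazily so other platforms can load this
    word = win32com.client.Dispatch("Word.Application")
    try:
        # Word wants an absolute path.
        doc = word.Documents.Open(os.path.abspath(path))
        try:
            return doc.Content.Text
        finally:
            doc.Close(False)  # close without saving changes
    finally:
        word.Quit()


def word_text_via_antiword(path):
    # Assumes the external "antiword" command is installed and on PATH.
    result = subprocess.run(["antiword", path],
                            capture_output=True, check=True)
    return result.stdout.decode("utf-8", errors="replace")


def pick_converter(platform=sys.platform):
    # Choose a converter function based on the platform string.
    if platform.startswith("win"):
        return word_text_via_com
    return word_text_via_antiword
```

Calling pick_converter()(path) would then return the document text on
either platform, provided the matching tool is actually installed.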

