[Setup] Indexing and searching on Word.
alan at enfoldsystems.com
Tue Feb 28 20:35:06 UTC 2006
> Hey guys I have some questions if someone has any time.
>
> I have a basic plone site, and it works great. But I need to be able to
> search on the content of Word docs. What's the easiest way to go about
> doing this? I read up a bit on Archetypes, but it sounds like that's a
> mechanism for converting information, which I don't necessarily need to
> do.
I think you want the converters in TextIndexNG2. I believe Ingeniweb also
has some mechanism to index Word documents. Enfold Server (our commercial
product) does it with Microsoft's native IFilter functionality. There are
quite a few ways to do this.
> Looking at this document:
> http://plone.org/documentation/how-to/integrating-office-files
Not familiar with it.
> It says install Archetypes, but isn't that already included with Plone
> (btw it would be great if it linked to the things it's asking you to
> install).
How about logging into Plone and adding a comment?
I know a lot of the technologies, such as pdf2html, work, but there are a
myriad of problems getting them to work out of the box.
> I downloaded the Archetypes 1.3.7 bundle, and the readme.txt says to
> run the "quickinstaller_tool", there's no such file in the Archetypes
> directory structure... ??
Archetypes comes with recent Plone releases, I believe since Plone 2.0.0.
> Any guidance would be most appreciated.
Well, programmatically you want to influence the SearchableText index so
that it indexes Word documents. This is a Zope/Python issue, not something
particular to Plone/Archetypes, although you could make the
SearchableText of an Archetype return a string that the catalog index
can consume.
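
A rough sketch of that idea in plain Python, with no Zope imports so it
runs anywhere. The class WordDocument, the to_text argument and
fake_converter are made-up names for illustration, not part of Plone or
Archetypes. The assumption is that the catalog calls SearchableText on
the object and indexes whatever string comes back:

```python
class WordDocument:
    """Plain-Python stand-in for a content object that holds a Word file."""

    def __init__(self, title, description, data, to_text):
        self.title = title
        self.description = description
        self.data = data          # raw bytes of the uploaded file
        self._to_text = to_text   # hypothetical converter: bytes -> str

    def SearchableText(self):
        """Return one plain string for the full-text index to consume."""
        try:
            body = self._to_text(self.data)
        except Exception:
            # A failed conversion should not make the object unindexable;
            # fall back to indexing title and description only.
            body = ""
        parts = [self.title, self.description, body]
        return " ".join(p.strip() for p in parts if p and p.strip())


def fake_converter(data):
    """Stand-in for a real Word-to-text converter (assumption for the demo)."""
    return data.decode("utf-8")


doc = WordDocument("Q1 report", "Quarterly numbers", b"revenue grew", fake_converter)
print(doc.SearchableText())
```

The converter is passed in because it is the part that differs between
setups (an external tool, IFilter, COM); the indexing side stays the same.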
> What are the odds that the next version of plone will index and search
> office docs out of the box?
Limited, if any. It's a REAL PITA. What we should do is focus on
documentation and on the best way to accomplish this indexing. Then, if
everyone can agree on a strategy, it can end up in Plone.
Also, some tools work on Linux but not on Windows. We have contributed a
portal_transforms Word COM transformation, and several people have
gotten that to work.
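
For anyone wondering what going through COM looks like, here is a
minimal sketch. It is not the contributed transform itself, only the
general technique, and it assumes Windows with Word and pywin32
installed. The function name word_to_text and the app_factory argument
are made up for illustration; app_factory exists only so the function
can be exercised without Word:

```python
def word_to_text(path, app_factory=None):
    """Extract plain text from a Word file via Word's COM automation."""
    if app_factory is None:
        # Imported lazily because pywin32 is only available on Windows.
        from win32com.client import Dispatch
        app_factory = lambda: Dispatch("Word.Application")

    word = app_factory()
    try:
        doc = word.Documents.Open(path)
        try:
            # Content is the Range covering the whole document body.
            return doc.Content.Text
        finally:
            doc.Close(False)  # False: close without saving changes
    finally:
        # Always shut Word down, or stray WINWORD processes pile up.
        word.Quit()
```

On a Windows box, word_to_text(r"C:\docs\report.doc") should return the
document body as one string. Pass an absolute path, since Word resolves
relative paths against its own working directory.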
alan