[Framework-Team] Plip : indexing files

Martin Aspeli optilude at gmx.net
Mon Jan 29 12:43:35 UTC 2007


Hi Thierry,

I think this sounds quite interesting. Certainly, a better "document"
story (which includes full-text indexing and a strategy to avoid ZODB
bloat, e.g. blobfile) is pretty high on my wishlist for 3.5 (and
limi's as well, fwiw).

I would like to see a proposal that is somewhat less AT-centric,
though. It may be wishful thinking that we can achieve this, but
ideally we'd decouple portal_transforms entirely, replacing it with a
lighter framework based on Zope 3 adapters and utilities (a transform
is a utility; adapters take care of extracting the data to transform
and of consuming the transformed text). This should also allow an
asynchronous option (register a consumer for the transform that is
called when the transform is complete).
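The split described above (a transform as a utility, with adapters on either side of it) can be sketched in plain Python. This is a minimal, stdlib-only sketch under stated assumptions: plain dicts and classes stand in for the Zope 3 component registry (where `zope.component.getUtility` and `getAdapter` would do the lookups), and every name in it (`ITransform`, `TRANSFORMS`, `FileDataSource`, `SearchableTextConsumer`, `run_transform`) is hypothetical, not an existing Plone API.

```python
from typing import Callable, Dict, Protocol, Tuple


class ITransform(Protocol):
    """A transform converts raw data of one mimetype into plain text."""

    def convert(self, data: bytes) -> str: ...


class LatinTextTransform:
    """Hypothetical stand-in; a real transform would call an external converter."""

    def convert(self, data: bytes) -> str:
        return data.decode("latin-1")


# Plays the role of named utilities: one transform registered per mimetype.
TRANSFORMS: Dict[str, ITransform] = {"text/plain": LatinTextTransform()}


class File:
    def __init__(self, mimetype: str, data: bytes) -> None:
        self.mimetype = mimetype
        self.data = data
        self.searchable_text = ""


class FileDataSource:
    """Adapter: knows how to pull the data to transform out of a File."""

    def __init__(self, context: File) -> None:
        self.context = context

    def get(self) -> Tuple[str, bytes]:
        return self.context.mimetype, self.context.data


class SearchableTextConsumer:
    """Adapter: knows what to do with the transformed text."""

    def __init__(self, context: File) -> None:
        self.context = context

    def __call__(self, text: str) -> None:
        self.context.searchable_text = text


def run_transform(obj: File, consumer: Callable[[str], None]) -> None:
    mimetype, data = FileDataSource(obj).get()
    text = TRANSFORMS[mimetype].convert(data)
    # Called synchronously here; an async variant would call it from a worker.
    consumer(text)
```

Calling `run_transform(f, SearchableTextConsumer(f))` fills `f.searchable_text`. An asynchronous variant would only change when the consumer is invoked, not the adapters themselves, which is what makes the consumer-registration idea attractive.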

Once such a framework is in place, we could extend ATFile relatively
easily to use it. I don't think we'd want a new content type; rather,
we should extend ATFile as necessary.

I think BLOB storage and transform should be two separate proposals
and two separate implementations.

Martin

On 1/29/07, tbenita at atreal.net <tbenita at atreal.net> wrote:
> Hi,
>
> I'd like to make a proposal that extends PLIP #177
> http://plone.org/products/plone/roadmap/177
>
> We developed a Plone component that stores a file together with its HTML
> preview: ATFilePreview.
>
> It does the following:
>
> - makes the file available for download
>
> - creates an HTML preview of the file
>
> - indexes the file's content in full text
>
>
> It has the following advantages:
>
> - it uses the mimetypes registry to detect mimetypes
>
> - it uses portal_transforms to create the preview, and then uses this
> preview to extract the text that has to be indexed
>
> - it stores both the HTML preview and all sub-objects inside the object,
> as persistent sub-objects
>
> - it's totally generic: it obviously does previews and indexing for
> OpenDocument, MS Office documents, PDF, RTF, HTML, Python, etc. It may
> also show a preview for zip files, video files, audio files or whatever
> you can imagine. Take the example of a video file: you may decide that
> all uploaded videos will be transcoded to MKV format and streamed in the
> page via a Java applet that displays the video. You only need a
> video_to_html transform that does this. The result will be stored
> together with the original file, and the HTML preview will be displayed.
>
> - the trunk (it's in the collective) stores everything inside the object
> in the ZODB, so it has no dependencies and can take the place of normal
> File objects
>
> - there is another version that stores the file, the HTML preview and
> the sub-objects on the filesystem. It currently uses FSS, but we'd like
> to move to BlobFile, as FSS is a bit too complex for our use case.
>
> - we don't need all the TING machinery to get full-text indexing: we
> only need the UnicodeLexicon, as long as portal_transforms returns
> unicode results (tested in France; you can imagine ;-) )
>
> - we already have the transforms for all office files in
> AROfficesTransform, which we are currently integrating into
> Archetypes.
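The preview-then-index pipeline from the list above might look roughly like this. It is a sketch, not ATFilePreview's actual code: Python's stdlib `mimetypes` module stands in for Plone's mimetypes registry, a dict of functions stands in for portal_transforms, and `text_to_html`, `HTML_TRANSFORMS`, `make_preview` and `words_to_index` are hypothetical names. It shows the indexable text being extracted from the HTML preview rather than from the original file, plus a Unicode-aware word split in the spirit of the UnicodeLexicon point.

```python
import mimetypes
import re
from html import escape
from html.parser import HTMLParser
from typing import Callable, Dict, List


def text_to_html(data: bytes) -> str:
    """Hypothetical text/plain -> text/html preview transform."""
    return "<pre>%s</pre>" % escape(data.decode("utf-8"))


# Hypothetical stand-in for portal_transforms: mimetype -> HTML transform.
HTML_TRANSFORMS: Dict[str, Callable[[bytes], str]] = {"text/plain": text_to_html}


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML preview."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def make_preview(filename: str, data: bytes) -> str:
    # Detect the mimetype, then pick the transform that produces the preview.
    mimetype, _encoding = mimetypes.guess_type(filename)
    if mimetype not in HTML_TRANSFORMS:
        raise LookupError("no HTML transform for %r" % mimetype)
    return HTML_TRANSFORMS[mimetype](data)


def words_to_index(preview_html: str) -> List[str]:
    # Reuse the preview as the source of the indexable text.
    parser = TextExtractor()
    parser.feed(preview_html)
    parser.close()
    text = " ".join(parser.chunks)
    # For str patterns, \w matches Unicode letters, so "Été" stays one word.
    return [word.lower() for word in re.findall(r"\w+", text)]
```

With this design a single HTML transform per mimetype serves both the preview and the index, so adding support for a new format means registering one transform.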
>
>
>
> At this time there are two new things to consider:
>
> - portal_transforms may overload the Zope server
>
> - there may be decorators that should be applied to files in order to
> properly handle specific extra fields (especially for multimedia files:
> metadata etc.)
>
> * Concerning the overload of the Zope server: I think that we should have
> an asynchronous portal transform that may run as a separate Twisted
> daemon. This could live alongside portal_transforms and could be called
> asynchronous_portal_transform (APT). The only difference from
> portal_transforms is that we need to give APT a callback method so that
> it can send back the result of the transform once it is done.
> Therefore, if a content type is APT-aware and APT is activated, APT is
> used instead of portal_transforms. This would allow moving the load to
> one or more dedicated servers, for example. We may also take a look at
> BlueDCS (I have only just heard of it and haven't tried it).
>
> * Concerning the decorators: there should be a kind of
> decorators_registry that would allow adding decorators based on mimetypes
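The callback contract for APT and the mimetype-keyed decorators registry could be sketched as below. Assumptions, stated loudly: the proposal suggests a separate Twisted daemon, whereas this sketch swaps in a stdlib thread pool (`concurrent.futures`) purely to show the callback shape, so it does not move load off the Zope server; `AsyncTransformService`, `slow_transform`, `DECORATORS`, `register_decorator` and `decorate` are all hypothetical names.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, List


# --- Asynchronous transform with a callback (hypothetical APT stand-in) ---

def slow_transform(data: bytes) -> str:
    """Placeholder for an expensive conversion, e.g. office document to HTML."""
    return data.decode("utf-8").upper()


class AsyncTransformService:
    """Runs transforms off the calling thread and calls back with the result."""

    def __init__(self, workers: int = 2) -> None:
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, data: bytes, callback: Callable[[str], None]) -> Future:
        future = self._pool.submit(slow_transform, data)
        # Invoked once the worker finishes; f.result() re-raises any
        # exception the transform raised instead of passing it to callback.
        future.add_done_callback(lambda f: callback(f.result()))
        return future

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


# --- Decorators registry keyed on mimetype (hypothetical) ---

Decorator = Callable[[Dict[str, Any]], None]
DECORATORS: Dict[str, List[Decorator]] = {}


def register_decorator(key: str, decorator: Decorator) -> None:
    """key is a full mimetype ("video/x-matroska") or a major type ("video")."""
    DECORATORS.setdefault(key, []).append(decorator)


def decorate(mimetype: str, fields: Dict[str, Any]) -> None:
    major = mimetype.split("/")[0]
    keys = [mimetype] if major == mimetype else [mimetype, major]
    for key in keys:
        for decorator in DECORATORS.get(key, []):
            decorator(fields)
```

Falling back from a full mimetype to its major type lets one decorator cover every video format, while a more specific decorator can still be registered for a single format.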
>
> What do you think of all these points?
>
> Best regards,
>
> Thierry.
>
> --
> atReal
> http://www.atreal.net



