[Product-Developers] ANN: collective.simserver finding semantically similar content

Christian Ledermann christian.ledermann at gmail.com
Thu Jan 26 14:15:03 UTC 2012


These products connect Plone to a document similarity server:

http://radimrehurek.com/gensim/simserver.html

What is a document similarity service?
Conceptually, a service that lets you:

* train a semantic model from a corpus of plain texts
  (no manual annotation and mark-up needed)
* index arbitrary documents using this semantic model
* query the index for similar documents (the query can be
  either the uid of a document already in the index or an
  arbitrary text)
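
To make the train / index / query workflow concrete, here is a minimal
sketch in plain Python. It is NOT simserver's API, and it uses simple
TF-IDF cosine similarity as a stand-in for the semantic model that
simserver trains via gensim. The class name, method names and sample
documents are all made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a real service does far more."""
    return text.lower().split()


class ToySimilarityIndex:
    """Illustrative stand-in for train / index / query (not simserver)."""

    def __init__(self):
        self.idf = {}      # term -> inverse document frequency, set by train()
        self.vectors = {}  # uid -> {term: normalized weight}, set by index()

    def train(self, texts):
        """Learn term weights from a corpus of plain texts."""
        doc_freq = Counter()
        for text in texts:
            doc_freq.update(set(tokenize(text)))
        n = len(texts)
        self.idf = {
            term: math.log((1 + n) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }

    def _vectorize(self, text):
        """Turn a text into a unit-length TF-IDF vector (unknown terms dropped)."""
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def index(self, docs):
        """Index documents given as a {uid: text} mapping."""
        for uid, text in docs.items():
            self.vectors[uid] = self._vectorize(text)

    def find_similar(self, query, top=5):
        """Query by the uid of an indexed document, or by arbitrary text."""
        if query in self.vectors:
            qvec, exclude = self.vectors[query], query
        else:
            qvec, exclude = self._vectorize(query), None
        scores = []
        for uid, vec in self.vectors.items():
            if uid == exclude:
                continue  # do not report a document as similar to itself
            score = sum(w * vec.get(t, 0.0) for t, w in qvec.items())
            scores.append((uid, score))
        # Highest score first; ties broken by uid for a stable order.
        scores.sort(key=lambda item: (-item[1], item[0]))
        return scores[:top]


# Made-up sample documents.
docs = {
    "doc1": "plone content management system",
    "doc2": "plone related items for content",
    "doc3": "kenya wildlife and biodiversity",
}
index = ToySimilarityIndex()
index.train(list(docs.values()))
index.index(docs)
print(index.find_similar("doc1"))                   # query by uid
print(index.find_similar("biodiversity in kenya"))  # query by arbitrary text
```

The point is only the shape of the workflow: train once on a corpus,
index documents under their uids, then query by uid or by free text.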

What is it good for?
Digital libraries of (mostly) text documents. More generally,
it helps you annotate, organize and navigate documents in
a more abstract way, compared to plain keyword search.


Why integrate it into Plone?
Related items are a powerful feature of Plone, but content
managers mostly fail to set them. Here simserver comes to the rescue and
does it automatically (well, not yet; it still has to be invoked manually ;)

The Plone product actually consists of two products:

1) collective.simserver.core
https://github.com/cleder/collective.simserver.core
provides the common core functionality, such as an abstracted
call interface, training of the corpus and indexing.

2) collective.simserver.related
https://github.com/cleder/collective.simserver.related
provides:
* a form to query the simserver for similar items
  and set them as related items
* a simserver collection that queries the simserver
  for all documents related to this collection
  (useful for batch tagging with e.g.
  collective.smartkeywordmanager)

Plone communicates with the simserver via HTTP.
For the Plone products to work you will also need
restsims https://github.com/cleder/restsims
which is a small Pyramid wrapper around simserver itself.
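
As a rough idea of what such an HTTP call from a client could look like,
here is a sketch that only builds a request using the Python standard
library. The base URL, the "/find_similar" route and the payload keys are
hypothetical and not taken from restsims; check its source for the real
routes and parameters.

```python
import json
from urllib.request import Request

# Hypothetical values for illustration only. The address points at
# localhost, since the service should not be exposed publicly.
BASE_URL = "http://127.0.0.1:6543"


def build_query_request(text, max_results=10, base_url=BASE_URL):
    """Build (but do not send) a JSON POST asking for similar documents."""
    payload = json.dumps({"text": text, "max_results": max_results})
    return Request(
        base_url + "/find_similar",  # hypothetical route name
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request("plone related items")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen() would actually send it,
assuming a server is listening at that address.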

WARNING!
restsims does not yet do authentication, so you
do NOT want to USE it on a PUBLIC network.

BEWARE:
There is no documentation yet; hopefully it is coming soon.
But you can always ask ;)

Please give me some feedback, or maybe someone
wants to contribute :)




-- 
Best Regards,

Christian Ledermann

Nairobi - Kenya
Mobile : +254 702978914

<*)))>{

If you save the living environment, the biodiversity that we have left,
you will also automatically save the physical environment, too. But if
you only save the physical environment, you will ultimately lose both.

1) Don’t drive species to extinction

2) Don’t destroy a habitat that species rely on.

3) Don’t change the climate in ways that will result in the above.

}<(((*>

