[Product-Developers] Re: Looking for A word filter for plone

Reinout van Rees reinout at vanrees.org
Thu Dec 6 15:14:28 UTC 2007


Mark Phillips wrote:
> I agree with you. Since I found the links to the Bayesian filter in
> python, I think it might not be that hard to implement in workflow.
> 
> 1. User submits document for review
> 2. Document is scanned, and is sent to either the spam or the review
> state (ham). The spam state holds spam that is close to the threshold.
> All 100% spam is automatically rejected.
> 3. Reviewer has 2 worklists - spam and ham
> 3a. Reviewer can reject, publish, or spam the ham - spam goes to the
> filter to train it, published material goes to the filter to train it.
> 3b. Reviewer can reject or publish the spam - rejected spam goes to the
> filter to be trained. Published items go to the filter to train it. 
> 
> A rough cut off the top of my head. Any suggestions?

You're missing a state: ham can be sent to the spam state, where it gets 
used to train the spam filter. But that same spam state is also the 
state where the suspected spam ends up.

I'd keep the regular review queue in place (and the spam state) but add 
a spam-to-review state.
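
A minimal sketch of that state layout, just to make the idea concrete. 
This is plain Python, not the DCWorkflow API, and every state and 
transition name in it is made up for illustration ("possible_spam" is 
the spam-to-review state):

```python
# Hypothetical state and transition names; plain Python,
# not the DCWorkflow API.
TRANSITIONS = {
    "pending": {
        "publish": "published",
        "reject": "rejected",
        "mark_spam": "spam",          # reviewer spots spam by hand
        "suspect": "possible_spam",   # the filter flags the item
    },
    "possible_spam": {
        "publish": "published",       # false positive, it was ham
        "confirm_spam": "spam",
    },
}

# States whose arrival should train the filter, and with which label.
TRAINING_LABEL = {"spam": "spam", "published": "ham"}


def fire(state, transition):
    """Return (new_state, training_label_or_None) for a transition."""
    try:
        new_state = TRANSITIONS[state][transition]
    except KeyError:
        raise ValueError(
            "no transition %r from state %r" % (transition, state))
    return new_state, TRAINING_LABEL.get(new_state)
```

The point of the separate state is that only confirmed spam reaches 
"spam" and trains the filter; suspected spam waits in "possible_spam" 
until a reviewer decides.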

Option: submitted stuff just ends up in the regular review queue. No 
messing with two possible state destinations for one transition (though 
probably doable). No possibly expensive processing during the user's 
request (which might time out or hose the server). Instead have a script 
or view (triggered from a cronjob?) go through the review queue once 
every few minutes to check whether it should transition a few items to 
the possible-spam (spam-to-review) state.
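
Roughly what such a periodic script could look like, as a sketch only. 
The threshold, the batch size and all function and parameter names are 
assumptions; the scoring function and the workflow transition are 
passed in as callables, so nothing here depends on a particular filter 
library or on Plone itself:

```python
SPAM_THRESHOLD = 0.9   # assumed cut-off, tune for your site
BATCH_SIZE = 20        # limit the work done per run


def scan_review_queue(queue, score, move_to_possible_spam, seen,
                      threshold=SPAM_THRESHOLD, batch_size=BATCH_SIZE):
    """Check up to batch_size unseen items; flag the likely spam.

    queue: iterable of (hashable) ids of items awaiting review
    score: callable returning a spam probability between 0 and 1
    move_to_possible_spam: callable performing the workflow transition
    seen: set of ids already checked in earlier runs (updated in place)
    Returns the list of ids that were flagged in this run.
    """
    flagged = []
    checked = 0
    for item in queue:
        if item in seen:
            continue              # already scored in an earlier run
        if checked >= batch_size:
            break                 # leave the rest for the next run
        seen.add(item)
        checked += 1
        if score(item) >= threshold:
            move_to_possible_spam(item)
            flagged.append(item)
    return flagged
```

The "seen" set is there so that ham sitting in the queue doesn't get 
rescored on every run; in a real setup it would have to be stored 
somewhere persistent between runs.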

> BTW, what the heck does "overgehaalde dekzwabber" mean in English? I
> couldn't find a Dutch web translation service that would translate it.
> :-)

:-) Hard to translate. Something like "idiotic 
broom-used-to-sweep-a-ship's-deck". It loses a bit of expressiveness when 
translated :-)

Reinout



-- 
Reinout van Rees  - Programmer at http://zestsoftware.nl/
http://vanrees.org/weblog/          reinout @ vanrees.org
"Information overload isn't the problem. If it was, you'd
walk into a library and die." (David Allen)




