[Product-Developers] Re: Looking for A word filter for plone
Reinout van Rees
reinout at vanrees.org
Thu Dec 6 15:14:28 UTC 2007
Mark Phillips wrote:
> I agree with you. Since I found the links to the Bayesian filter in
> python, I think it might not be that hard to implement in workflow.
>
> 1. User submits document for review
> 2. Document is scanned, and is sent to either the spam or the review
> state (ham). The spam state holds spam that is close to the threshold.
> All 100% spam is automatically rejected.
> 3. Reviewer has 2 worklists - spam and ham
> 3a. Reviewer can reject, publish, or spam the ham - spam goes to the
> filter to train it, published material goes to the filter to train it.
> 3b. Reviewer can reject or publish the spam - rejected spam goes to the
> filter to be trained. Published items go to the filter to train it.
>
> A rough cut off the top of my head. Any suggestions?
You're missing a state: ham can be sent to the spam state, where it gets
used to train the spam filter. But that same spam state is also where
the suspected spam ends up.
I'd keep the regular review queue in place (and the spam state) but add
a spam-to-review state.
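
To make the states concrete, here is a rough sketch of that layout as a
plain transition table. This is not Plone workflow code; the state and
action names ("pending", "mark_spam", "suspect" and so on) are just
assumptions for illustration, and only "spam" and "spam-to-review" come
from this thread.

```python
# Hypothetical transition table: (current state, action) -> next state.
# "pending" stands in for the regular review queue.
TRANSITIONS = {
    ("pending", "publish"): "published",
    ("pending", "reject"): "rejected",
    ("pending", "mark_spam"): "spam",          # reviewer flags ham as spam
    ("pending", "suspect"): "spam-to-review",  # filter thinks it may be spam
    ("spam-to-review", "mark_spam"): "spam",   # reviewer confirms
    ("spam-to-review", "publish"): "published",
}

# End states whose content would be fed back to the filter for training.
TRAINING_LABEL = {"spam": "spam", "published": "ham"}


def next_state(state, action):
    """Return the state reached from `state` via `action`, or raise."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError("no transition %r from state %r" % (action, state))
```

The point of the separate state is that everything landing in "spam" is
reviewer-confirmed, so it is safe to train on, while "spam-to-review"
only holds the filter's guesses.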
Option: submitted stuff just ends up in the regular review queue. No
messing with two possible state destinations for one transition (though
probably doable). No possibly expensive processing during the user's
request (which might time out or hose the server). Instead have a script
or view (triggered from a cronjob?) go through the review queue once
every few minutes to check whether it should transition a few items to
the possible-spam state.
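
A minimal sketch of what such a periodic script could look like, in
plain Python rather than real Plone API calls. Everything named here is
an assumption: the Item class, the "pending" state name, the threshold
value and the `score` callable (which stands in for whatever Bayesian
filter you pick and is expected to return a spam probability between
0 and 1).

```python
from dataclasses import dataclass

# Hypothetical cut-off; a real filter would need tuning.
SUSPECT_THRESHOLD = 0.5


@dataclass
class Item:
    text: str
    state: str = "pending"   # "pending" = regular review queue
    scanned: bool = False    # avoid re-scoring the same item every run


def scan_review_queue(items, score, batch_size=10):
    """Score at most `batch_size` unscanned pending items per run.

    Items scoring at or above the threshold move to "possible-spam".
    Returns the list of items that were moved.
    """
    moved = []
    checked = 0
    for item in items:
        if checked >= batch_size:
            break
        if item.state != "pending" or item.scanned:
            continue
        item.scanned = True
        checked += 1
        if score(item.text) >= SUSPECT_THRESHOLD:
            item.state = "possible-spam"
            moved.append(item)
    return moved
```

Capping the number of items per run keeps each invocation short, so a
large backlog gets worked off over several cron runs instead of in one
long request.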
> BTW, what the heck does "overgehaalde dekzwabber" mean in English? I
> couldn't find a Dutch web translation service that would translate it.
> :-)
:-) hard to translate. Something like "idiotic
broom-used-to-sweep-a-ship's-deck". It loses a bit of expressiveness when
translated :-)
Reinout
--
Reinout van Rees - Programmer at http://zestsoftware.nl/
http://vanrees.org/weblog/ reinout @ vanrees.org
"Information overload isn't the problem. If it was, you'd
walk into a library and die." (David Allen)