Bogofilter

Description

Bogofilter is a statistical, Bayesian mail filter. It has to be pre-trained with SPAM and HAM mails and can later be tuned by retraining false positives as HAM and false negatives as SPAM.

Configuration

Please read first:

default_user

Default: -
Allowed values: string (path)
Required: no

Should contain the path to a global bogofilter.cf file if a shared SPAM database is to be used. Typically, that would be /etc/bogofilter.cf.

cmd_check

Default: "/usr/bin/bogofilter -c %user% -U -I %file% -v"
Allowed values: string (path to bogofilter and cmd args)
Required: yes

The command line for the bogofilter check, including all command line arguments. The variables %user% (the user) and %file% (path to the temporary mail file) can be used.
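To illustrate how the placeholders end up in the final command, here is a minimal sketch of the substitution. The helper name expand_cmd and the temporary file path are hypothetical and only used for illustration; how Decency performs the substitution internally is not shown here.

```shell
# Hypothetical helper: fill %user% and %file% in a command template.
# $1 = template, $2 = value for %user%, $3 = value for %file%
# Assumes neither value contains '|', '&' or a backslash, which sed
# would treat specially in this expression.
expand_cmd() {
    printf '%s\n' "$1" | sed -e "s|%user%|$2|g" -e "s|%file%|$3|g"
}

# Prints the default check command with both placeholders filled in.
expand_cmd '/usr/bin/bogofilter -c %user% -U -I %file% -v' \
    /etc/bogofilter.cf /tmp/mail-1234.eml
```

With a global config file, %user% would be replaced by the value of default_user, so the -c argument points at /etc/bogofilter.cf.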

cmd_learn_spam

Default: "/usr/bin/bogofilter -c %user% -s -I %file%"
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for learning SPAM with bogofilter.

cmd_unlearn_spam

Default: "/usr/bin/bogofilter -c %user% -N -I %file%"
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for unlearning SPAM for mails which have been marked as SPAM beforehand.

cmd_learn_ham

Default: "/usr/bin/bogofilter -c %user% -n -I %file%"
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for learning new HAM.

cmd_unlearn_ham

Default: "/usr/bin/bogofilter -c %user% -S -I %file%"
Allowed values: string (path to bogofilter and cmd args)
Required: yes

Command line for unlearning a mail which has been falsely recognized as SPAM.

apply_spamicity

Default: 0
Allowed values: Bool

Whether bogofilter's spamicity value should be factored into the result score. Bogofilter returns a value between 0 and 1; thus, if you score SPAM with -50 and bogofilter's spamicity is 0.5, the effective score will be -25.
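The weighting described above is a plain multiplication of the configured score by the spamicity. A small sketch of that arithmetic follows; the helper name effective_score is hypothetical and not part of Decency.

```shell
# Hypothetical helper: scale a configured SPAM score by a spamicity in [0, 1].
# $1 = configured score, $2 = spamicity reported by bogofilter
effective_score() {
    awk -v score="$1" -v spamicity="$2" 'BEGIN { print score * spamicity }'
}

effective_score -50 0.5   # prints -25
```

A spamicity of 1 leaves the configured score unchanged, while a spamicity close to 0 reduces it to almost nothing.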

Example

---

disable: 0

apply_spamicity: 0

cmd_check: '/usr/bin/bogofilter -c %user% -U -I %file% -v'
cmd_learn_spam: '/usr/bin/bogofilter -c %user% -s -I %file%'
cmd_unlearn_spam: '/usr/bin/bogofilter -c %user% -N -I %file%'
cmd_learn_ham: '/usr/bin/bogofilter -c %user% -n -I %file%'
cmd_unlearn_ham: '/usr/bin/bogofilter -c %user% -S -I %file%'

default_user: '/etc/bogofilter.cf'

Bogofilter hints

This is not an in-depth guide to configuring or running bogofilter, just some hints that might come in handy. No warranty that this is the best or even the correct way to do it, though.

Global SPAM directory

If you want to use one SPAM database rather than one per (unix) user, you can set the bogofilter_dir in /etc/bogofilter.cf:

bogofilter_dir = /var/spool/bogofilter

In the bogofilter configuration in Decency you should then set "default_user" to the global config file:

default_user: '/etc/bogofilter.cf'
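To double-check which database directory a given config file points to, a small lookup like the following can help. This is only a sketch: the helper name get_bogofilter_dir is hypothetical, and it assumes a simple "bogofilter_dir = /path" line without a trailing comment.

```shell
# Hypothetical helper: print the bogofilter_dir value from a bogofilter config file.
# $1 = path to the config file
# Assumes one plain "bogofilter_dir = /path" line, optionally indented.
get_bogofilter_dir() {
    sed -n 's/^[[:space:]]*bogofilter_dir[[:space:]]*=[[:space:]]*//p' "$1"
}

# Usage (assuming the global config file exists):
#   get_bogofilter_dir /etc/bogofilter.cf
```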

Initial training of bogofilter

Like all statistical analysis filters, bogofilter has to be trained before it can do anything useful. If you have a large SPAM database of, let's say, at least 10,000 mails, use it (HAM you probably have: your inbox). If you don't, you can find an initial SPAM corpus online or collect one via the HoneyPot / HoneyCollector modules.

Assuming you have your SPAM and HAM files in two directories as eml files, you can train bogofilter like this:

cd spam-ham
find ham/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -n -I {} \;
find spam/ -type f -exec bogofilter --user-config /etc/bogofilter.cf -s  -I {} \;

Or if you have mbox files:

cd spam-ham
bogofilter --user-config /etc/bogofilter.cf -n < ham.mbox
bogofilter --user-config /etc/bogofilter.cf -s < spam.mbox
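Before training on a large corpus, it can be useful to see which commands would be run. The following sketch wraps the directory-based find call from above in a function; the function name train_dir and the BOGOFILTER override variable are hypothetical conveniences, not part of bogofilter or Decency.

```shell
# Hypothetical wrapper around the directory-based training shown above.
# $1 = directory with mail files, $2 = -n (HAM) or -s (SPAM), $3 = config file
# Set BOGOFILTER=echo for a dry run that only prints the arguments.
train_dir() {
    find "$1" -type f \
        -exec "${BOGOFILTER:-bogofilter}" --user-config "$3" "$2" -I {} \;
}

# Dry run, then the real training:
#   BOGOFILTER=echo train_dir ham/ -n /etc/bogofilter.cf
#   train_dir ham/  -n /etc/bogofilter.cf
#   train_dir spam/ -s /etc/bogofilter.cf
```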

More detailed information can be found in the Bogofilter FAQ.

Performance

Bogofilter has to analyze the whole mail, which can take up to several seconds but should mostly stay under one second. This depends on your SPAM/HAM dataset and the size of the mail.