CRM114
Description
Implementation of the CRM114 Discriminator. CRM114 describes itself as a programmable, fast learning data examiner for various purposes. It can be easily trained to classify mails as SPAM or HAM.
Configuration
Please read first:
- default configuration
  - disable
  - max_size
- anti-SPAM module configuration
  - weight_innocent
  - weight_spam
  - weight_translate
- user configuration
  - default_user
  - user_cmd
default_user
Allowed values: String (path to crm114 directory)
Required: no
Default: -
Can be set to the global crm114 directory, which is probably /etc/crm114/ or /var/spool/crm114/ (see below).
In the CRM114 context, the user is a directory in which the CRM114 filter files (*.mfp, *.css, the reaver_cache directory and so on) reside.
Example:
CRM114 scores lower than -10 will be translated to -100, scores between -3 and -10 will be translated to -50, and so on.
weight_translate:
    5: 20
    1: 10
    0: 0
    -2: 0
    -3: -50
    -10: -100
cmd_check
Default: "/usr/share/crm114/mailreaver.crm --fileprefix=%user% -u %user% --report_only"
Allowed values: String (path to the crm114 script and command line arguments)
Required: yes
The command line for the CRM114 check command, including all command line arguments. The variables %user% (the user) and %file% (the path to the temporary mail file) can be used.
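To make the placeholders concrete, here is a minimal shell sketch of how %user% and %file% could be expanded before a command is run. The helper name expand_cmd and the temporary file path are hypothetical; this only illustrates the idea and is not meant to describe how Decency performs the replacement internally.

```shell
# Hypothetical helper: expand %user% and %file% in a command template.
# Assumes the values contain none of "|", "&" or "\" (all special to sed here).
expand_cmd() {
    template=$1
    user=$2
    file=$3
    printf '%s\n' "$template" | sed -e "s|%user%|$user|g" -e "s|%file%|$file|g"
}

# Example: the default cmd_check with the global directory as "user".
expand_cmd '/usr/share/crm114/mailreaver.crm --fileprefix=%user% -u %user% --report_only' \
    /var/spool/crm114/ /tmp/mail.eml
```

With default_user set to /var/spool/crm114/, both occurrences of %user% become that directory, so the --fileprefix and -u arguments point at the same place.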
cmd_learn_spam, cmd_unlearn_ham
Default: "/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learnspam"
Allowed values: String (path to the crm114 script and command line arguments)
Required: yes
Command line used for training new SPAM / unlearning HAM mails.
cmd_learn_ham, cmd_unlearn_spam
Default: "/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learngood"
Allowed values: String (path to the crm114 script and command line arguments)
Required: yes
Command line used for training new HAM / unlearning SPAM mails.
Example
---
disable: 0
default_user: /var/spool/crm114/
# > 5: 20
# 1 -> 5: 20
# 0 -> 1: 10
# -2 -> 0: 0
# -3 -> -2: -50
# -10 -> -3: -100
# <-10: -100
weight_translate:
    5: 20
    1: 10
    0: 0
    -2: 0
    -3: -50
    -10: -100
# cmd_check: '/usr/share/crm114/mailreaver.crm --fileprefix=%user% -u %user% --report_only'
# cmd_learn_spam: '/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learnspam'
# cmd_unlearn_spam: '/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learngood'
# cmd_learn_ham: '/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learngood'
# cmd_unlearn_ham: '/usr/share/crm114/mailfilter.crm --fileprefix=%user% -u %user% --learnspam'
CRM114 hints
This is a very simplified installation guide. For more in-depth information, search the web. I assume you want a single global CSS database rather than one per user.
Install CRM114
First of all, get crm114. On a Debian system that would be:
aptitude install crm114
For anybody else: go to the download page and follow the instructions.
Setup
First you need a directory that will contain the configuration files and the "reaver_cache". The cache can grow as large as the accumulated size of all the emails you feed in.
mkdir /var/spool/crm114
cd /var/spool/crm114
Now copy the basic configuration file from your crm114 base installation into your directory.
cp /usr/share/crm114/mailfilter.cf .
Create empty required files
touch rewrites.mfp priolist.mfp whitelist.mfp blacklist.mfp
Create empty css files for SPAM and HAM
cssutil -b -r spam.css
cssutil -b -r nonspam.css
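The steps so far can be combined into one function that is safe to run more than once. This is only a sketch: setup_crm114_dir is a hypothetical helper, the second argument (where the stock mailfilter.cf lives) defaults to the path used above, and the cssutil step is skipped when cssutil is not installed.

```shell
# Sketch: prepare a CRM114 directory; safe to run repeatedly.
# setup_crm114_dir is a hypothetical helper, not part of CRM114 or Decency.
setup_crm114_dir() {
    dir=$1
    share=${2:-/usr/share/crm114}   # assumed location of the stock mailfilter.cf

    mkdir -p "$dir" || return 1

    # Copy the stock configuration only if none exists yet.
    if [ ! -e "$dir/mailfilter.cf" ] && [ -r "$share/mailfilter.cf" ]; then
        cp "$share/mailfilter.cf" "$dir/" || return 1
    fi

    # Empty list files; touch leaves existing content alone.
    for f in rewrites.mfp priolist.mfp whitelist.mfp blacklist.mfp; do
        touch "$dir/$f" || return 1
    done

    # Create the css files only when missing and cssutil is available.
    if command -v cssutil >/dev/null 2>&1; then
        for css in spam.css nonspam.css; do
            [ -e "$dir/$css" ] || (cd "$dir" && cssutil -b -r "$css") || return 1
        done
    fi
}
```

It would be called as setup_crm114_dir /var/spool/crm114. Existing files are never overwritten, so a mailfilter.cf you have already edited stays as it is.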
Adjust mailfilter.cf to your needs. In particular, have a look at the following keys:
:spw: /mypassword/
:add_verbose_stats: /no/
:add_extra_stuff: /no/
:rewrites_enabled: /no/
:spam_flag_subject_string: //
:unsure_flag_subject_string: //
:log_to_allmail.txt: /no/
Also be aware of the thresholds if you do not use weight_translate but the weight_spam and weight_innocent directives:
:good_threshold: /10.0/
:spam_threshold: /-5.0/
Remember to change the ownership of the directory to your Decency user:
chown mailuser:mailgroup -R /var/spool/crm114
Initial training
Train your first mails into crm114. You require a directory containing SPAM and one containing HAM, then this will work:
/usr/share/crm114/mailtrainer.crm --spam=/path/to/spamdir --good=/path/to/hamdir \
--fileprefix=/var/spool/crm114/
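Before starting a long training run it can help to verify that both directories exist and are not empty. The following is a minimal sketch with hypothetical helper names (count_mails, check_corpora); it assumes each mail is stored as one regular file directly inside the directory.

```shell
# Hypothetical pre-flight check: count regular files directly inside a directory.
count_mails() {
    find "$1" -maxdepth 1 -type f 2>/dev/null | wc -l | tr -d ' '
}

# Refuse to train unless both corpora exist and contain at least one file.
check_corpora() {
    spamdir=$1
    hamdir=$2
    for d in "$spamdir" "$hamdir"; do
        if [ ! -d "$d" ]; then
            echo "missing directory: $d" >&2
            return 1
        fi
        if [ "$(count_mails "$d")" -eq 0 ]; then
            echo "no mails in: $d" >&2
            return 1
        fi
    done
}
```

A possible use is to chain it in front of the trainer: check_corpora /path/to/spamdir /path/to/hamdir && /usr/share/crm114/mailtrainer.crm followed by the arguments shown above.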
That's all.
Performance
CRM114 has to analyze the whole mail, which can take up to several seconds but should stay under two seconds most of the time. The actual time depends on your SPAM/HAM dataset and the size of the mail.