URL filtering

Introduction

URLBL filtering consists of extracting every URL from the body of the message and checking the domain part of each one against a blacklist.
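
The idea can be illustrated with a minimal Python sketch. It is not j-chkmail's implementation: the regular expression, the function names and the in-memory BLACKLIST set are assumptions made for illustration, and checking parent domains as well as the full host name is also an assumption.

```python
import re
from urllib.parse import urlsplit

# Hypothetical in-memory blacklist; the real filter uses a Berkeley DB file or DNS.
BLACKLIST = {"zzzyf.com", "2288.org"}

# Simplified pattern: http(s) URLs up to the next space, quote or angle bracket.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def extract_hosts(body):
    """Return the lower-cased host name of every http(s) URL found in body."""
    hosts = []
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if host:
            hosts.append(host.rstrip(".").lower())
    return hosts


def blacklisted_hosts(body, blacklist=BLACKLIST):
    """Return the hosts whose name, or one of its parent domains, is listed."""
    hits = []
    for host in extract_hosts(body):
        labels = host.split(".")
        # www.example.com -> {"www.example.com", "example.com"}
        candidates = {".".join(labels[i:]) for i in range(len(labels) - 1)}
        if candidates & set(blacklist):
            hits.append(host)
    return hits
```

Calling blacklisted_hosts() on a message body returns the list of offending hosts, which is empty when nothing matches.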

Although you can create your own URL blacklist, j-chkmail distributes a modified version of the SURBL blacklist. SURBL is particularly useful because it does not list domain names that also appear in legitimate messages, which keeps the false positive rate very low.

You can integrate the URL blacklist in two ways:

  • Local Berkeley DB database - this is the preferred way, as Berkeley DB queries are much faster than DNS queries. You can rsync the blacklist database once a day.
  • DNS blacklist - the filter queries a public DNS server. If you run a high-volume mail server (e.g. a hundred thousand messages a day), you can serve a copy of the SURBL zone from your own DNS server. See http://www.surbl.org/faq.html#high-volume.

The list distributed by j-chkmail (Berkeley DB format) contains the SURBL data, augmented with a few domains added by the j-chkmail maintainer.

Configuration

  • Enable URL filtering:

j-chkmail.cf

# SPAM_URLBL
#     Do pattern matching
#  Syntax : -----
#     VALUES :  NO  YES 
SPAM_URLBL                         YES

  • If you want to use the Berkeley DB format, enable this line:

j-chkmail.cf

# DB_URLBL
#     Database Real-Time URL Blacklist (used for content checking)
#  Syntax : -----
DB_URLBL                  j-urlbl.db     

  • If you prefer the DNS format, enable this one:

j-chkmail.cf

# DNS_URLBL
#     DNS Real-Time URL Blacklist (used for content checking)
#  Syntax : RBL[/CODE[/SCORE]] - multi.surbl.org/127.0.0.1/10
DNS_URLBL                 multi.surbl.org         

Enable only one of the two formats, DNS or Berkeley DB, never both.
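
As an illustration of the RBL[/CODE[/SCORE]] syntax shown in the DNS_URLBL comment, here is a minimal Python sketch that splits such a value into its parts and builds the name to look up. The names UrlblSpec, parse_dns_urlbl and query_name are hypothetical, and the sketch does not show how j-chkmail itself parses the option or interprets CODE and SCORE. Prefixing the domain to the zone is the usual convention for DNS blacklists such as SURBL.

```python
from typing import NamedTuple, Optional


class UrlblSpec(NamedTuple):
    zone: str               # e.g. "multi.surbl.org"
    code: Optional[str]     # e.g. "127.0.0.1", None when omitted
    score: Optional[float]  # e.g. 10.0, None when omitted


def parse_dns_urlbl(value):
    """Split a 'RBL[/CODE[/SCORE]]' value into zone, code and score."""
    parts = value.strip().split("/")
    if not parts[0] or len(parts) > 3:
        raise ValueError(f"bad DNS_URLBL value: {value!r}")
    code = parts[1] if len(parts) > 1 else None
    score = float(parts[2]) if len(parts) > 2 else None
    return UrlblSpec(parts[0], code, score)


def query_name(domain, zone):
    """Build the DNS name queried for domain in a blacklist zone."""
    return f"{domain.strip('.').lower()}.{zone.strip('.')}"
```

For example, checking example.com against multi.surbl.org means looking up the name example.com.multi.surbl.org.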

When to choose the DNS format or the Berkeley DB format?

  • Berkeley DB queries are much faster than DNS queries.
  • Berkeley DB removes your server's dependency on external resources.
  • Berkeley DB needs a full copy of the data on your computer. Plan for three times the size of the database, roughly 300 MB.
  • With the DNS format, all you need to do is configure the DNS_URLBL line.

URL blacklist database

The URL blacklist database is stored in the /var/jchkmail/cdb directory. You'll find two files there:

  • j-urlbl.txt - text version of the blacklist.
  • j-urlbl.db - Berkeley DB version of the text file. This is the format the filter actually reads.

The URL blacklist database requires disk space equal to three times its size. It is currently almost 100 MB, so allow almost 300 MB.

Don't remove the j-urlbl.txt file. It MUST stay there because the database update uses it to save bandwidth: when the file is present, only the differences are transferred. Network bandwidth may not be a problem for you, but it may be for the rsync server.

The content of the text file looks like this:

/var/jchkmail/cdb/j-urlbl.txt

URLBL:zzzxzaasdx.com                           20:0:127.1.0.7:multi.surbl
URLBL:zzzyf.com                                20:0:127.1.0.7:multi.surbl
URLBL:zzzzzzzzzzzzzzzzzzzz.org.uk              20:0:127.1.0.7:multi.surbl
URLBL:130kg.com                                20:0:127.2.0.1:j-chkmail
URLBL:20fr.com                                 20:0:127.2.0.1:j-chkmail
URLBL:2288.org                                 20:0:127.2.0.1:j-chkmail

Run make in that directory (/var/jchkmail/cdb) whenever you edit the text database by hand.
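
If you need to process the text file in your own tools, the sample lines can be read with a short Python sketch like the one below. The field names are assumptions inferred from the sample and from the mk_dbin options (score, code, source); the meaning of the second field is not documented on this page, so it is kept as an opaque string.

```python
from typing import NamedTuple


class UrlblEntry(NamedTuple):
    domain: str
    score: int
    flag: str    # second field of the value; meaning not documented here
    code: str    # e.g. "127.1.0.7"
    source: str  # e.g. "multi.surbl" or "j-chkmail"


def parse_urlbl_line(line):
    """Parse one 'URLBL:domain   score:flag:code:source' line."""
    key, value = line.split()
    prefix, domain = key.split(":", 1)
    if prefix != "URLBL":
        raise ValueError(f"unexpected prefix in line: {line!r}")
    score, flag, code, source = value.split(":")
    return UrlblEntry(domain.lower(), int(score), flag, code, source)
```

A line that does not have exactly two whitespace-separated columns, or four colon-separated value fields, raises ValueError, which makes hand-editing mistakes easy to spot before running make.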

Getting and Syncing database

The source distribution includes a script, etc/get-urlbl.org. Rename that file, put it wherever you like, and run it once a day from crontab. The resulting file is a large text file of about 1.2 million lines.

URLBL database hacking

  • Modifying the original database - You can change the scores of the original database, or apply a whitelist, with the cvt-urlbldb script. The whitelist file is a text file with one domain per line.

Syntax and Example

Syntax :
cvt-urlbldb [-s newscore] [-w whitelist] [-o source] inputfile > outputfile

Example :
cvt-urlbldb -s 30 -w urlwl.txt -o multi j-urlbl.txt > j-urlbl-local.txt

  • Creating your own database of URLs - To create your own database, you can use the mk_dbin script. The input file is a text file with one domain per line.

Syntax and Example

Syntax :
mk_dbin [-s score] [-c code] -o source inputfile > outputfile

Example :
mk_dbin -s 25 -c 127.1.0.1 -o local localbl.txt > j-urlbl-local.txt

  • You can concatenate multiple URLBL text files. If an entry appears in more than one file, the first occurrence is the one used, so the order matters.
  • Look at the /var/jchkmail/cdb/Makefile and /var/jchkmail/cdb/get-urlbl files. You will probably need to modify the Makefile.
  • You may also need to edit /etc/mail/jchkmail/j-chkmail.cf so that the DB_URLBL configuration option points to the new database file.
  • Take care not to move or modify the j-urlbl.txt file used by rsync.
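
The "first entry wins" rule for concatenated files can be previewed with a small Python sketch. This is only an illustration of the rule, not the code used by the Makefile or the filter; the function name merge_urlbl is hypothetical, and comparing keys case-insensitively is an assumption.

```python
def merge_urlbl(*sources):
    """Merge iterables of URLBL text lines, keeping the first entry per key."""
    seen = set()
    merged = []
    for lines in sources:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            key = line.split()[0].lower()  # e.g. "urlbl:2288.org"
            if key in seen:
                continue  # an earlier file already defined this domain
            seen.add(key)
            merged.append(line)
    return merged
```

Passing the local file first, for example merge_urlbl(open("j-urlbl-local.txt"), open("j-urlbl.txt")), shows which entries would take precedence if the files were concatenated in that order.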

Logging

Look for the DBURLBL entry, which shows that a URL in this message matched the blacklist before the message was rejected:

/var/log/j-chkmail

Mar  4 17:08:18 mx0 j-chkmail[7771]: [ID 000000 local5.info] 47CD73F2.001 Connect from emailer99-151.emv1.net
Mar  4 17:08:21 mx0 j-chkmail[7771]: [ID 000000 local5.info] 47CD73F2.001 Bayes filter score :  0.685
Mar  4 17:08:21 mx0 j-chkmail[7771]: [ID 000000 local5.notice] 47CD73F2.001 DBURLBL : trc1.emv2.com :  20 BLACKLISTED in DBURLBL:j-chkmail
Mar  4 17:08:21 mx0 j-chkmail[7771]: [ID 000000 local5.notice] 47CD73F2.001 SPAM CHECK - M02 NB HTML > PLAIN : 1 0
Mar  4 17:08:21 mx0 j-chkmail[7771]: [ID 000000 local5.info] 47CD73F2.001 ORACLE - M02 text/html without text/plain (   0.2)
Mar  4 17:08:21 mx0 j-chkmail[7771]: [ID 000000 local5.notice] 47CD73F2.001 : SMQID=(NOID), Callback=(eom), Why=(Content Check : B=0.685 U=20 R=0
           O=0 -> G=1.082), PeerAddr=(84.14.99.151), PeerName=(emailer99-151.emv1.net), MAIL=(<email@club-prive.emv1.net>), NbRCPT=(1/1), RCPT=(<l
           XXX@univ.fr>), HeaderFrom=('Club-prive.fr' <email@club-prive.emv1.net>), Scores=(R=0 U=20 O=0 B=0.685 ->  1.082), Size=(6437), Reply
           =(550 5.7.1 Sorry, this message is being rejected as it seems to be a spam !)

The final log entry lists the individual scores: B=0.685 U=20 R=0 O=0 → G=1.082

Here U=20 means that the URLBL check contributed a score of 20.
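
To pull these scores out of the log for reporting, a minimal Python sketch such as the following may help. The regular expression and the function name parse_scores are assumptions; the sketch only collects the single-letter NAME=number pairs and makes no claim about what each letter means beyond what is stated above.

```python
import re

# One upper-case letter, "=", then an integer or decimal number.
SCORES_RE = re.compile(r"\b([A-Z])=(-?\d+(?:\.\d+)?)")


def parse_scores(text):
    """Return a dict such as {"B": 0.685, "U": 20.0} from a score summary."""
    return {name: float(value) for name, value in SCORES_RE.findall(text)}


def urlbl_score(text):
    """Return the U (URLBL) score, or None if the text has no U= field."""
    return parse_scores(text).get("U")
```

Because long log entries are wrapped over several lines, join the continuation lines before passing the text to parse_scores().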

doc/spam/url_filtering.txt · Last modified: 2008/03/07 20:57 by martins