How to set up IMAP on the cheap, Part 4

Details on how to configure Procmail to block spam

(LinuxWorld) -- This is the fourth in a series of articles on how to set up Cyrus IMAP, Postfix, and Procmail to create a powerful mail system with spam filtering. If you really want powerful spam filtering, I recommend that you install two more programs, Spam Assassin and Vipul's Razor, in addition to the aforementioned ones. These two programs work together with procmail to eliminate nearly all incoming spam. If you use the default packages for any reasonably well-designed distribution, Spam Assassin and Vipul's Razor should be easy to install and configure (in most cases, such as with Debian, no manual configuration is necessary at all). Regardless, it is beyond the scope of this series to cover these two additions in detail, so we'll only address one "gotcha" that you may encounter when you combine Spam Assassin with Postfix and/or Cyrus IMAP. More on that later.

Last week we touched briefly on the power of procmail and how to create a basic recipe file. This week we'll delve more deeply into procmail and set up a more useful set of recipes. However, if you really want to plumb the depths of procmail in a big way, the best resource I've found so far is the Procmail Documentation Project (see resources for link). In particular, check out the tips section. This is a link you'll want to browse to and keep open whenever you edit your procmailrc recipe files.

Let's get to work. Last week we created a working procmail recipe file, but it wasted effort we could have saved and wasn't terribly useful. Now that you've had a taste of procmail, let's do this up right. We're going to structure our procmail recipe files in the following way:

  • Assign global variables
  • Create a backup for every message that comes through
  • Run the message through spamassassin (optional)
  • Apply simple, generic spam traps
  • Include the user's own recipes for sorting incoming mail
    • Assign "local" variables
    • Sort mail that you don't want in the inbox into various folders
    • Send anything "not to me" to the spam folder
  • Apply porn spam traps
  • Apply any other spam traps that may cause false positives
  • If message gets this far, drop into the inbox

Here's how it works. First we assign some variables we can use throughout the rest of the procmail recipe file.

Then we create a backup of every message that comes into the system. This ensures that we'll never lose a message no matter how badly we design our first recipe file. Once you're confident your configuration is working, you may remove this step. The backup recipe is one of the few that let the recipe file continue to execute after it is done. Normally, when a recipe finds a match, it delivers the message and there is no further processing.

The next command is the only other place in our file where processing continues after a recipe. We run the incoming message through an excellent spam filter called spamassassin, which optionally makes use of another spam filter called razor. This is a two-step process. The first step adds information to a message if it is likely to be spam. The second step checks to see if the added information has flagged the message as spam and drops it into the spam folder if so, after which all further processing stops.

If any of the rest of our recipes find a match, they will deliver mail and stop any further processing.

Next, we apply some very simple spam traps. For example, any message that lacks a "To" field is almost certainly bulk mail, which we can confidently say is spam. So we can safely trap it early, drop it in the spam folder, and save procmail the trouble of examining the message any further. If you're paranoid about such things, you can move this step closer to the bottom of the order of steps.

Then we include the user's own custom procmail recipes for sorting mail into special folders. For example, this is where a user can define a set of recipes to discover whether mail comes from a subscribed mailing list, and then automatically redirect that mail to special folders devoted to saving mail from those lists. The next thing the user's custom recipe set can do is discover if the mail is intended for him or her by checking any "To", "Cc", or similar field to see if it contains the user's name, any mail groups to which the user belongs, or any of the user's aliases. If none of the conditions is true, the mail is likely to be spam, so we can drop it into the spam folder. This step is tricky, and you don't want to include it until you know your way around procmail. I'm including it here as a tip if you want to use it.

As a side note, whenever you include a user's set of recipes it opens a potential security hole. However, we can take procedural and administrative steps to close that hole. More on that later.

Once the user's personal recipes are done, we can apply the pornography recipes. I put this step below the user's personal filters because I happen to use very strict obscenity filtering. Since I filter incoming mail for my young children, I consider any messages that contain certain four-letter-words in the body to be a high risk. The problem with filtering on such strict grounds is that I can lose valid messages intended for me. Many people use these four-letter-words in normal messages, such as when they discuss issues on mailing lists. But since I put this check after the user filters for sorting mailing list messages, those messages will have been deposited in the proper folder and no further processing of the message will occur, so it will never get to this porn filter.

This is also why I put any recipes that can create false positives for spam here, after the user's personal recipes. That way procmail will not drop them in the spam folder before the user recipes get a chance to identify the message as useful.

If the message has survived all the above tests, then it's probably a valid message that belongs in the INBOX. So the last thing we do is simply drop the message in the INBOX without even checking it.

This particular order works well for me, but you can shuffle the order any way that works best for your home or organization.

IMPORTANT NOTE: This approach to server-side spam mail filtering works extremely well with an IMAP mail system. It assumes that you create the spam folder for every account, and it drops spam messages in this folder. However, it isn't perfect, which is why you put the messages in a spam folder instead of discarding them. Every user should check his or her spam folder occasionally (very frequently at first) to see if valid messages are accidentally dropped there. If so, the users should report the errors to the administrator, who can adjust the spam filters or the user's personal recipes to make sure similar valid messages make it into the INBOX in the future.

Here's the whole master /etc/procmailrc file we'll be using:

LOGFILE="/var/log/mail/procmail.log"
DELIVERMAIL="/usr/cyrus/bin/deliver"
IMAP="$DELIVERMAIL -a $USER -q -m user.$USER"
SPAMIT="$IMAP.SPAM"

###########################################################
### Backup
###########################################################

:0 c
| $IMAP.Backup

###########################################################
### Spam Assassin
###########################################################

:0fw
| /usr/bin/spamassassin -P -F0 -a

:0
* ^X-Spam-Status: Yes
| $SPAMIT

###########################################################
### Simple spam traps
###########################################################

# Mass mailing, no "To:"
:0
* !^To:
| $SPAMIT

:0
* !^From:
| $SPAMIT

:0
* !^Subject:
| $SPAMIT

###########################################################
### Include individual procmailrc
###########################################################

INCLUDERC=/home/$USER/.procmailrc

###########################################################
### Porn Spam
###########################################################

:0
* ^Subject.*(\<porn\>|\<pornography\>)
| $SPAMIT

:0 B
* ^.*(\<porn\>|\<pornography\>)
| $SPAMIT

###########################################################
### From spam traps
###########################################################

:0
* ^FROM_advertising
| $SPAMIT

:0
* ^From:.*(advertising|sales|offers|promotion|reply|request|theuseful)
| $SPAMIT

###########################################################
### Subject spam traps
###########################################################

:0
* ^Subject:.*\[ADV\]
| $SPAMIT

:0
* ^Subject:\ ADV
| $SPAMIT

###########################################################
### If we get this far, just deliver it to the user inbox
###########################################################

:0w
| $IMAP

:0w
{
  EXITCODE=$?
  HOST
}

Global variables

Now let's address the details. First, we assign certain necessary variables and add a few extras to make it easy to design shortcuts for delivering mail. For example, assuming your Cyrus IMAP deliver program is at /usr/cyrus/bin/deliver and the log file you want to use is /var/log/mail/procmail.log, then here's a decent set of variables to put at the head of your master procmailrc file (usually /etc/procmailrc).

LOGFILE="/var/log/mail/procmail.log"
DELIVERMAIL="/usr/cyrus/bin/deliver"
IMAP="$DELIVERMAIL -a $USER -q -m user.$USER"
SPAMIT="$IMAP.SPAM"

IMPORTANT NOTE: The "-q" switch within the IMAP definition tells Cyrus to deliver mail even if it puts the user's mailbox over quota. This is a good idea for test purposes, but if you are enforcing quotas on your system you'll want to remove the "-q" switch when you're satisfied with your setup.
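Since LOGFILE is already set, it may also help to turn up procmail's logging while you test. The two settings below are standard procmail variables, but adding them here is my suggestion rather than part of the file above, and you would probably want to remove them once things work, since the log grows quickly.

```procmail
# Optional debugging aids for the test period only (an assumption, not part
# of the master file above).

# Log each condition and action as procmail evaluates it.
VERBOSE=on

# Log a summary line for every delivering recipe, including the backup copy.
LOGABSTRACT=all
```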

These definitions create a generic Cyrus IMAP delivery command we call IMAP, which we can use later as shorthand for dropping mail into specific mail folders. For example, if we want to deliver a message to the user's sub-folder Lists/LinuxKernel, we can do so simply by specifying the command $IMAP.Lists.LinuxKernel in our recipe. Assuming the user's name is daggett, the expansion of that command would look like this: /usr/cyrus/bin/deliver -a daggett -q -m user.daggett.Lists.LinuxKernel.
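As a rough sketch, a recipe that uses this shorthand might look like the following. The header test is hypothetical; check the headers your list actually adds before relying on it.

```procmail
# Hypothetical condition: the header names and the "linux-kernel" string are
# assumptions about how the list identifies itself.
:0
* ^(List-Id|X-Mailing-List):.*linux-kernel
| $IMAP.Lists.LinuxKernel
```

Procmail stops reading the variable name at the first period, so $IMAP expands first and .Lists.LinuxKernel is appended to the mailbox name.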

Likewise, we can define one or more variables for frequently used commands. There are a lot of cases where we'll deliver a message to the SPAM folder, so we can create a shortcut called SPAMIT that delivers a message to daggett's SPAM folder (user.daggett.SPAM in Cyrus IMAP parlance) with the definition SPAMIT="$IMAP.SPAM". From here on out, we simply use $SPAMIT to drop the message into the user's SPAM folder.

Backup

The next stage in our /etc/procmailrc file is to create a carbon copy of the incoming message and drop it into the user's "Backup" folder. The letter "c" in the ":0 c" portion of the recipe tells procmail to make the carbon copy and keep processing. We can use the IMAP variable here.

:0 c
| $IMAP.Backup

Spam Assassin

Spam Assassin comes next. The "f" portion of the ":0fw" recipe tells procmail to consider the delivery portion of the recipe to be a message filter and then keep processing the message after that. The "w" portion tells procmail to wait until the recipe is finished before continuing to process the message. The "w" also returns an error code, but we're not checking that here.

If you read the Spam Assassin documentation, it implies you can simply use the command spamassassin -P as your filter. Either postfix or the Cyrus deliver program choked on that, however. (It was difficult to trace to one or the other, but I suspect it was postfix that was confused.) That's because Spam Assassin guesses whether your mailer wants to find a "From" field in a certain place in your header, and I suspect it was guessing wrong. The "-F0" switch fixed the problem.

The "-a" switch is a handy option. It tells Spam Assassin to create a "white list" of email sender addresses. Here's how it works. Suppose your best friend sends you many messages, none of which are tagged as spam. As you get more and more messages from your friend, Spam Assassin will eventually conclude that this person is a "trusted" sender of email and add your friend to the "white list". After that, it no longer matters if your friend accidentally sends you a message that would be misinterpreted as spam. It will get through the spam filter and arrive in your INBOX anyway.

Finally, if Spam Assassin identifies a message as spam, it modifies the message in several ways, one of which is to add a header field called X-Spam-Status and set it to "Yes". That's why we have a second recipe, which checks this field. If it is set to "Yes", then it's probably spam, and it goes into the spam folder.

:0fw
| /usr/bin/spamassassin -P -F0 -a

:0
* ^X-Spam-Status: Yes
| $SPAMIT
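One variation you might consider is to skip the filter for very large messages, which are rarely spam and are slow to scan. The sketch below adds a size condition to the filter recipe. The 256000-byte threshold is an assumption you should tune for your own system.

```procmail
# Only pass messages smaller than 256000 bytes through the filter.
# The threshold is an assumption; larger messages skip Spam Assassin entirely.
:0fw
* < 256000
| /usr/bin/spamassassin -P -F0 -a

# Unchanged: anything the filter tagged goes to the spam folder.
:0
* ^X-Spam-Status: Yes
| $SPAMIT
```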

Simple spam traps

The next step is to eliminate the simplest forms of spam. It is likely that Spam Assassin will take care of this for you, but I include this here for those of you who do not intend to use Spam Assassin.

These recipes simply check to see if the message lacks any of the following header fields: "To", "From", or "Subject". If any of these fields are missing, it's probably spam.

###########################################################
### Simple spam traps
###########################################################

# Mass mailing, no "To:"
:0
* !^To:
| $SPAMIT

:0
* !^From:
| $SPAMIT

:0
* !^Subject:
| $SPAMIT

User custom recipes

This is probably the most controversial portion of the recipe file. If any user has his or her own .procmailrc file in the home directory, it is included here. This is problematic in several respects. First, if you are sloppy enough as an administrator, you may set up your system to run procmail as root. This means every user can now usurp root privileges simply by creating a recipe that executes any given program based on any given condition. Even if you don't make that mistake, users can still wreak havoc on your system either deliberately or by mistake. If you don't use mail quotas, for example, any user can fill up your mail partition with a bogus procmail recipe. Finally, this practice assumes that you have a valid UNIX user for every mail user on the system (or at least a valid home directory for every mail user).

If you want to avoid all these problems, then simply create a sub-directory somewhere safe (such as /etc/username, where username is the mail user's name, and assuming you've locked down the files and directories under /etc), and place the custom recipes there. Your users cannot edit this file themselves, but they can make requests to you (the administrator) and you can add their custom recipes.

Here's the home directory method.

###########################################################
### Include individual procmailrc
###########################################################

INCLUDERC=/home/$USER/.procmailrc
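For comparison, here is a minimal sketch of the administrator-managed method described earlier. The directory and file name are hypothetical, and the readability test is my own addition so that users without a custom file are simply skipped.

```procmail
# Hypothetical location for administrator-maintained per-user recipes.
USERRC=/etc/procmail/users/$USER.rc

# Include the file only if it exists and is readable.
:0
* ? test -r $USERRC
{
  INCLUDERC=$USERRC
}
```

The "?" condition runs the command and treats a zero exit status as a match, so the include happens only for users who have a file.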

Here's a small excerpt from my personal .procmailrc file to give you an idea of how I sort incoming mail. Note that I can continue to use the global variables I set in the /etc/procmailrc file, such as IMAP.

LIST="$IMAP.Lists"
NEWSALERT="$IMAP.News-Alerts"
PRESSRELEASE="$IMAP.Press-Releases"
YESHUA="$IMAP.Yeshua"
HUMOR="$IMAP.Humor"

#########################################################
### Lists
#########################################################

:0
* ^Subject.*(\[IPG\]|\[SMZ\])
| $LIST.NetPress
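The "not to me" check mentioned in the overview could be sketched like this, placed after all of the list-sorting recipes so that list mail has already been delivered. The names are hypothetical stand-ins for the user's login, aliases, and mail groups.

```procmail
# Hypothetical names: replace with the user's own login, aliases, and groups.
# ^TO_ is procmail's built-in macro for matching an address in the
# destination header fields (To, Cc, and similar).
:0
* !^TO_(daggett|webmaster|staff)@
| $SPAMIT
```

Treat this as a starting point only. Mail sent to you by blind carbon copy usually names you nowhere in the headers, so expect to find some valid messages in the spam folder at first.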

The remainder

The rest of the recipes in the /etc/procmailrc file are self-explanatory, with one exception. I deliberately rewrote the porn recipe to include only two words in order to avoid using profanity in this article. Note, however, that there's also a new concept here. The words "porn" and "pornography" are surrounded by two different bracket definitions. The words are preceded by a backslash followed by a less-than symbol, and the words are followed by a backslash then a greater-than symbol. These marks tell procmail that we're looking for whole words and not substrings.

The other new item is the parenthetical, along with the pipe symbol that separates the two words within the parentheses. This tells procmail to look for either of the two words. It is a logical "OR" condition. More on that in a moment.

###########################################################
### Porn spam
###########################################################

:0
* ^Subject.*(\<porn\>|\<pornography\>)
| $SPAMIT

:0 B
* ^.*(\<porn\>|\<pornography\>)
| $SPAMIT
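To see why the whole-word marks matter, consider a short word that often appears inside longer ones. The word "ad" below is a stand-in I chose for illustration.

```procmail
# "ad" is a hypothetical trigger word.
# Matches "Subject: ad for you", but not "Subject: Read this" or
# "Subject: address change", because there "ad" is part of a longer word.
:0
* ^Subject:.*\<ad\>
| $SPAMIT
```

Without the marks, the same recipe would catch all three subjects.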

The final step

A little more spam processing, and we're finally ready to drop the message in the INBOX if it got past all the aforementioned conditions. We wait for a return code (which is why we add the "w" to the recipe) and then put a bit of stuff for the log (the final recipe, which also contains a "w" to wait for completion).

###########################################################
### If we get this far, just deliver it to the user inbox
###########################################################

:0w
| $IMAP

:0w
{
  EXITCODE=$?
  HOST
}
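A variation you may prefer is to report a temporary failure whenever delivery fails, which normally asks the mail server to keep the message queued and try again later. This is a sketch under that assumption, so check how your own mail server treats the exit code before adopting it.

```procmail
:0w
| $IMAP

# Reached only when the delivery above failed.
# 75 is EX_TEMPFAIL from sysexits.h; that the mail server retries on this
# code is an assumption about your setup.
EXITCODE=75
# Unsetting HOST makes procmail stop processing the recipe file.
HOST
```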

ANDs and ORs

No doubt, you're going to want to create your own recipes. Others have created much better tutorials on procmail (see the documentation project in the resources section), but here are two indispensable tips: how to create AND and OR conditions.

AND conditions are extremely easy. You simply list them one after the other. For example:

:0
* ^From:.*daggett
* ^Subject:.*descrambler
| $SPAMIT

If the message is from anyone with "daggett" in the address AND the subject line includes the word "descrambler", the condition is true and the message will go to the spam folder.

OR is easy if you check a single field for multiple values. For example:

:0
* ^From:.*(bill_gates|steve_ballmer)
| $SPAMIT

In this case, if the mail is from "bill_gates" OR "steve_ballmer", it will go to the spam folder.

What if you want to use a complex OR condition on different header fields? That gets tricky. There are several ways to do it, but here's the formula I always follow:

:0
* ! ^From:.*bill_gates
* ! ^Subject:.*dotNet
{ }
:0E
| $SPAMIT

IMPORTANT NOTE: Make sure the braces "{ }" contain a space between them!

Believe it or not, this recipe says that if the message is from "bill_gates" OR the subject includes the string "dotNet", then drop the message in the spam folder. Just substitute the conditions you want in place of those in my example, and substitute the action you want in place of my $SPAMIT delivery command.

IMPORTANT NOTE: As counterintuitive as it may seem, you must negate each condition in your OR combination, so that the condition you are looking for is actually FALSE! In other words, although the OR condition we want to define looks for a match of "bill_gates", the actual recipe condition is negated by the exclamation mark "!". It looks like it's saying "match any message where From does NOT contain bill_gates", but this is exactly what we want in order to make the OR work.

Why do we need to negate these conditions and add things like the braces and space? It's actually quite logical if you know all the details, but you don't need to know the details to use this technique. Here's how to determine if you should care how the OR condition works. Answer this puzzle:

You come to a fork in the road. You know that one route will take you to the "Truth Teller" village, and the other will take you to the "Liar" village. "Truth Tellers" can never lie. "Liars" can never tell the truth. There is a man standing at the fork in the road. You don't know if that man is a "Truth Teller" or a "Liar." What single question can you ask this man in order to find out which way goes to which village? (No compound questions are allowed. In other words, you cannot ask "Are you an elephant and which way is it to the "Truth Teller" village?") I'll give you an answer (the spoiler below) in a moment.

If it takes you more than 5 minutes to figure out the answer to the above riddle, then just take my word for it that the technique works. If you figured out the puzzle quickly, then visit the following URL for an explanation of this procedure, which is called a "DeMorgan Rule."
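In brief, the reasoning runs like this. The first recipe matches only when the message is NOT from "bill_gates" AND the subject does NOT contain "dotNet", which is the same as saying NOT ("bill_gates" OR "dotNet"). Its empty action does nothing. The second recipe carries the "E" flag, so it runs only when the first recipe did not match, which leaves exactly the messages where at least one of the two conditions is true. If that feels too roundabout, here are two other ways to write the same OR, offered as a sketch and not as a replacement for the formula above.

```procmail
# Alternative 1: a single regular expression covering both header fields.
:0
* ^(From:.*bill_gates|Subject:.*dotNet)
| $SPAMIT

# Alternative 2: scoring. Each condition that matches adds 1 to the score,
# and the recipe fires when the final score is greater than zero.
:0
* 1^0 ^From:.*bill_gates
* 1^0 ^Subject:.*dotNet
| $SPAMIT
```

Use one form or the other, not both, since either one delivers the message and ends processing.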

Warning: Spoiler

Here's the answer to the riddle of the "Truth Teller" and "Liar" villages. You ask, "Which way is it to your village?" Whether the person is a "Liar" or a "Truth Teller" is irrelevant now. Either one must point to the "Truth Teller" village, which is all you need to know. This is as much of a "think outside the box" question as a test of Boolean logic, since the key to getting the simplest answer is to bypass the need to find out whether the person at the fork is a "Truth Teller" or a "Liar". You don't need that information. All you need to know is which village is which.

More Stories By Nicholas Petreley

Nicholas Petreley is a computer consultant and author in Asheville, NC.
