How to set up IMAP on the cheap, Part 4

Details on how to configure Procmail to block spam

(LinuxWorld) -- This is the fourth in a series of articles on how to set up Cyrus IMAP, Postfix, and Procmail to create a powerful mail system with spam filtering. If you really want powerful spam filtering, I recommend that you install two additional programs, Spam Assassin and Vipul's Razor. These two programs work together with procmail to eliminate nearly all incoming spam. If you use the default packages for any reasonably well-designed distribution, Spam Assassin and Vipul's Razor should be easy to install and configure (in most cases, such as with Debian, no manual configuration is necessary at all). Regardless, it is beyond the scope of this series to cover these two additions in detail, so we'll only address one "gotcha" that you may encounter when you combine Spam Assassin with Postfix and/or Cyrus IMAP. More on that later.

Last week we touched briefly on the power of procmail and how to create a basic recipe file. This week we'll delve more deeply into procmail and set up a more useful set of recipes. However, if you really want to plumb the depths of procmail in a big way, the best resource I've found so far is the Procmail Documentation Project (see resources for link). In particular, check out the tips section. This is a link you'll want to browse to and keep open whenever you edit your procmailrc recipe files.

Let's get to work. Last week we created a working procmail recipe file, but it wasted effort we could have saved and wasn't terribly useful. Now that you've had a taste of procmail, let's do this up right. We're going to structure our procmail recipe files in the following way:

  • Assign global variables
  • Create a backup for every message that comes through
  • Run the message through spamassassin (optional)
  • Apply simple, generic spam traps
  • Include the user's own recipes for sorting incoming mail
    • Assign "local" variables
    • Sort mail that you don't want in the inbox into various folders
    • Send anything "not to me" to the spam folder
  • Apply porn spam traps
  • Apply any other spam traps that may cause false positives
  • If message gets this far, drop into the inbox

Here's how it works. First we assign some variables we can use throughout the rest of the procmail recipe file.

Then we create a backup of every message that comes into the system. This ensures that we'll never lose a message, no matter how badly we design our first recipe file. Once you're confident your configuration is working, you may remove this step. The backup recipe is one of the few in our file that lets processing continue after it runs. Normally, when a recipe finds a match, it delivers the message and there is no further processing.

The next command is the only other place in our file where processing continues after a recipe. We run the incoming message through an excellent spam filter called spamassassin, which optionally makes use of another spam filter called razor. This is a two-step process. The first step adds information to a message if it is likely to be spam. The second step checks to see if the added information has flagged the message as spam and drops it into the spam folder if so, after which all further processing stops.

If any of the rest of our recipes find a match, they will deliver mail and stop any further processing.

Next, we apply some very simple spam traps. For example, any message that lacks a "To" field is almost certainly bulk mail, which we can confidently say is spam. So we can safely trap it early, drop it in the spam folder, and save procmail the trouble of examining the message any further. If you're paranoid about such things, you can move this step closer to the bottom of the order of steps.

Then we include the user's own custom procmail recipes for sorting mail into special folders. For example, this is where a user can define a set of recipes to discover whether mail comes from a subscribed mailing list, and then automatically redirect that mail to special folders devoted to saving mail from those lists. The next thing the user's custom recipe set can do is discover if the mail is intended for him or her by checking any "To", "Cc", or similar field to see if it contains the user's name, any mail groups to which the user belongs, or any of the user's aliases. If none of the conditions is true, the mail is likely to be spam, so we can drop it into the spam folder. This step is tricky, and you don't want to include it until you know your way around procmail. I'm including it here as a tip if you want to use it.
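As a rough sketch of that tip, such a recipe can use procmail's special ^TO_ macro, which matches the common destination headers such as "To" and "Cc". The names in the alternation below are hypothetical; substitute your own address, aliases, and mail groups:

```
# Hypothetical "not to me" trap: anything not addressed to one of
# these names goes to the spam folder. Use with care.
:0
* !^TO_.*(daggett|my\.alias|mygroup)
| $SPAMIT
```

Again, don't enable something like this until you have verified that your backup recipe works, or false positives will be hard to recover.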

As a side note, whenever you include a user's set of recipes it opens a potential security hole. However, we can take procedural and administrative steps to close that hole. More on that later.

Once the user's personal recipes are done, we can apply the pornography recipes. I put this step below the user's personal filters because I happen to use very strict obscenity filtering. Since I filter incoming mail for my young children, I consider any message that contains certain four-letter words in the body to be a high risk. The problem with filtering on such strict grounds is that I can lose valid messages intended for me. Many people use these four-letter words in normal messages, such as when they discuss issues on mailing lists. But since I put this check after the user filters that sort mailing list messages, those messages will already have been deposited in the proper folder, no further processing will occur, and they will never reach this porn filter.

This is also why I put any recipes that can create false positives for spam here, after the user's personal recipes. That way procmail will not drop them in the spam folder before the user recipes get a chance to identify the message as useful.

If the message has survived all the above tests, then it's probably a valid message that belongs in the INBOX. So the last thing we do is simply drop the message in the INBOX without even checking it.

This particular order works well for me, but you can shuffle the order any way that works best for your home or organization.

IMPORTANT NOTE: This approach to server-side spam mail filtering works extremely well with an IMAP mail system. It assumes that you create the spam folder for every account, and it drops spam messages in this folder. However, it isn't perfect, which is why you put the messages in a spam folder instead of discarding them. Every user should check his or her spam folder occasionally (very frequently at first) to see if valid messages are accidentally dropped there. If so, the users should report the errors to the administrator, who can adjust the spam filters or the user's personal recipes to make sure similar valid messages make it into the INBOX in the future.

Here's the whole master /etc/procmailrc file we'll be using:

LOGFILE="/var/log/mail/procmail.log"
DELIVERMAIL="/usr/cyrus/bin/deliver"
IMAP="$DELIVERMAIL -a $USER -q -m user.$USER"
SPAMIT="$IMAP.SPAM"

##########################################################
### Backup
##########################################################

:0 c
| $IMAP.Backup

###########################################################
### Spam Assassin
###########################################################

:0fw
| /usr/bin/spamassassin -P -F0 -a

:0
* ^X-Spam-Status: Yes
| $SPAMIT

###########################################################
### Simple spam traps
###########################################################

# Mass mailing, no "To:"
:0
* !^To:
| $SPAMIT

:0
* !^From:
| $SPAMIT

:0
* !^Subject:
| $SPAMIT

###########################################################
### Include individual procmailrc
###########################################################

INCLUDERC=/home/$USER/.procmailrc

###########################################################
### Porn Spam
###########################################################

:0
* ^Subject:.*(\<porn\>|\<pornography\>)
| $SPAMIT

:0 B
* ^.*(\<porn\>|\<pornography\>)
| $SPAMIT

###########################################################
### From spam traps
###########################################################

:0
* ^FROM_advertising
| $SPAMIT

:0
* ^From:.*(advertising|sales|offers|promotion|reply|request|theuseful)
| $SPAMIT

###########################################################
### Subject spam traps
###########################################################

:0
* ^Subject:.*\[ADV\]
| $SPAMIT

:0
* ^Subject:\ ADV
| $SPAMIT

###########################################################
### If we get this far, just deliver it to the user inbox
###########################################################

:0w
| $IMAP

:0w
{
  EXITCODE=$?
  HOST
}

Global variables

Now let's address the details. First, we assign certain necessary variables and add a few extras to make it easy to design shortcuts for delivering mail. For example, assuming your Cyrus IMAP deliver program is at /usr/cyrus/bin/deliver and the log file you want to use is /var/log/mail/procmail.log, then here's a decent set of variables to put at the head of your master procmailrc file (usually /etc/procmailrc).

LOGFILE="/var/log/mail/procmail.log"
DELIVERMAIL="/usr/cyrus/bin/deliver"
IMAP="$DELIVERMAIL -a $USER -q -m user.$USER"
SPAMIT="$IMAP.SPAM"

IMPORTANT NOTE: The "-q" switch within the IMAP definition tells Cyrus to deliver mail even if it puts the user's mailbox over quota. This is a good idea for test purposes, but if you are enforcing quotas on your system you'll want to remove the "-q" switch when you're satisfied with your setup.

These definitions create a generic Cyrus IMAP delivery command we call IMAP, which we can use later as shorthand for dropping mail into specific mail folders. For example, if we want to deliver a message to the user's sub-folder Lists/LinuxKernel, we can do so simply by specifying the command $IMAP.Lists.LinuxKernel in our recipe. Assuming the user's name is daggett, the expansion of that command would look like this: /usr/cyrus/bin/deliver -a daggett -q -m user.daggett.Lists.LinuxKernel.
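The expansion works like ordinary variable substitution. Here's a small Python sketch of it (not procmail; the paths and username are just the examples from the text):

```python
# Sketch of how procmail expands the shortcut variables for user "daggett".
# The paths and the username are the article's examples, not requirements.
USER = "daggett"
DELIVERMAIL = "/usr/cyrus/bin/deliver"
IMAP = f"{DELIVERMAIL} -a {USER} -q -m user.{USER}"

# Writing $IMAP.Lists.LinuxKernel in a recipe expands to:
command = f"{IMAP}.Lists.LinuxKernel"
print(command)
# → /usr/cyrus/bin/deliver -a daggett -q -m user.daggett.Lists.LinuxKernel
```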

Likewise, we can define one or more variables for frequently used commands. There are a lot of cases where we'll deliver a message to the SPAM folder, so we create a shortcut called SPAMIT that delivers a message to daggett's SPAM folder (user.daggett.SPAM in Cyrus IMAP parlance) with the definition SPAMIT="$IMAP.SPAM". From here on out, we simply use $SPAMIT to drop the message into the user's SPAM folder.

Backup

The next stage in our /etc/procmailrc file is to create a carbon copy of the incoming message and drop it into the user's "Backup" folder. The "c" flag in the ":0 c" portion of the recipe tells procmail to make the carbon copy and keep processing. We can use the IMAP variable here.

:0 c
| $IMAP.Backup

Spam Assassin

Spam Assassin comes next. The "f" portion of the ":0fw" recipe tells procmail to consider the delivery portion of the recipe to be a message filter and then keep processing the message after that. The "w" portion tells procmail to wait until the recipe is finished before continuing to process the message. The "w" also returns an error code, but we're not checking that here.

If you read the Spam Assassin documentation, it implies you can simply use the command spamassassin -P as your filter. Either postfix or the Cyrus deliver program choked on that, however. (It was difficult to trace to one or the other, but I suspect it was postfix that was confused.) That's because Spam Assassin guesses whether your mailer wants to find a "From" field in a certain place in your header, and I suspect it was guessing wrong. The "-F0" switch fixed the problem.

The "-a" switch is a handy option. It tells Spam Assassin to maintain a "white list" of email sender addresses. Here's how it works. Suppose your best friend sends you many messages, none of which are tagged as spam. As you get more and more messages from your friend, Spam Assassin will eventually conclude that this person is a trusted sender of email and add your friend to the white list. After that, it no longer matters if your friend accidentally sends you a message that would otherwise be misinterpreted as spam. It will get through the spam filter and arrive in your INBOX anyway.

Finally, if Spam Assassin identifies a message as spam, it modifies the message in several ways, one of which is to add a field called X-Spam-Status set to "Yes". That's why we have a second recipe, which checks this field. If it is set to "Yes", the message is probably spam, and it goes into the spam folder.

:0fw
| /usr/bin/spamassassin -P -F0 -a

:0
* ^X-Spam-Status: Yes
| $SPAMIT

Simple spam traps

The next step is to eliminate the simplest forms of spam. It is likely that Spam Assassin will take care of this for you, but I include this here for those of you who do not intend to use Spam Assassin.

These recipes simply check to see if the message lacks any of the following header fields: "To", "From", or "Subject". If any of these fields are missing, it's probably spam.

###########################################################
### Simple spam traps
###########################################################

# Mass mailing, no "To:"
:0
* !^To:
| $SPAMIT

:0
* !^From:
| $SPAMIT

:0
* !^Subject:
| $SPAMIT
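If you want to sanity-check the logic of these header traps outside procmail, here is a rough Python equivalent (an illustration only; the addresses are made up):

```python
import re

def lacks_header(message, field):
    # Mirrors procmail's negated condition (e.g. "* !^To:"): true when no
    # header line begins with the field name. Procmail header matching is
    # case-insensitive by default, so we match that behavior here.
    headers = message.split("\n\n", 1)[0]
    return re.search(rf"^{field}:", headers, re.MULTILINE | re.IGNORECASE) is None

spam = "From: bulk@example.com\nSubject: act now\n\nbody"
legit = "From: alice@example.com\nTo: daggett@example.com\nSubject: hi\n\nbody"
print(lacks_header(spam, "To"))    # True: the !^To: trap fires
print(lacks_header(legit, "To"))   # False: the message passes this trap
```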

User custom recipes

This is probably the most controversial portion of the recipe file. If any user has his or her own .procmailrc file in the home directory, it is included here. This is problematic in several respects. First, if you are sloppy enough as an administrator, you may set up your system to run procmail as root. This means every user can now usurp root privileges simply by creating a recipe that executes any given program based on any given condition. Even if you don't make that mistake, users can still wreak havoc on your system either deliberately or by mistake. If you don't use mail quotas, for example, any user can fill up your mail partition with a bogus procmail recipe. Finally, this practice assumes that you have a valid UNIX user for every mail user on the system (or at least a valid home directory for every mail user).

If you want to avoid all these problems, then simply create a sub-directory somewhere safe (such as /etc/username, where username is the mail user's name, and assuming you've locked down the files and directories under /etc), and place the custom recipes there. Your users cannot edit this file themselves, but they can make requests to you (the administrator) and you can add their custom recipes.
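As a sketch, the admin-controlled variant changes only the include line. The /etc/username path scheme below is just the example from the text; adjust it to wherever you've locked the files down:

```
### Include admin-maintained per-user recipes instead of
### a file the user can edit directly
INCLUDERC=/etc/$USER/.procmailrc
```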

Here's the home directory method.

###########################################################
### Include individual procmailrc
###########################################################

INCLUDERC=/home/$USER/.procmailrc

Here's a small excerpt from my personal .procmailrc file to give you an idea of how I sort incoming mail. Note that I can continue to use the global variables I set in the /etc/procmailrc file, such as IMAP.

LIST="$IMAP.Lists"
NEWSALERT="$IMAP.News-Alerts"
PRESSRELEASE="$IMAP.Press-Releases"
YESHUA="$IMAP.Yeshua"
HUMOR="$IMAP.Humor"

#########################################################
### Lists
#########################################################

:0
* ^Subject:.*(\[IPG\]|\[SMZ\])
| $LIST.NetPress

The remainder

The rest of the recipes in the /etc/procmailrc file are self-explanatory, with one exception. I deliberately rewrote the porn recipe to include only two words in order to avoid using profanity in this article. Note, however, that there's also a new concept here. The words "porn" and "pornography" are surrounded by two different bracket definitions. The words are preceded by a backslash followed by a less-than symbol, and the words are followed by a backslash then a greater-than symbol. These marks tell procmail that we're looking for whole words and not substrings.

The other new item is the parenthetical, along with the pipe symbol that separates the two words within the parentheses. This tells procmail to look for either of the two words. It is a logical "OR" condition. More on that in a moment.

###########################################################
### Porn spam
###########################################################

:0
* ^Subject:.*(\<porn\>|\<pornography\>)
| $SPAMIT

:0 B
* ^.*(\<porn\>|\<pornography\>)
| $SPAMIT
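Python's closest equivalent to procmail's \< and \> word anchors is \b, so you can experiment with the whole-word behavior outside procmail. A quick sketch (the subject lines are made up):

```python
import re

# Procmail's \<word\> markers match whole words only; Python's nearest
# equivalent is \b. Procmail conditions are case-insensitive by default.
trap = re.compile(r"\b(porn|pornography)\b", re.IGNORECASE)

print(bool(trap.search("Subject: Pornography for sale")))   # True: whole word
print(bool(trap.search("Subject: visit Pornville today")))  # False: substring only
```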

The final step

A little more spam processing, and we're finally ready to drop the message in the INBOX if it got past all the aforementioned conditions. We wait for a return code (which is why we add the "w" to the recipe) and then put a bit of stuff for the log (the final recipe, which also contains a "w" to wait for completion).

###########################################################
### If we get this far, just deliver it to the user inbox
###########################################################

:0w
| $IMAP

:0w
{
  EXITCODE=$?
  HOST
}

ANDs and ORs

No doubt, you're going to want to create your own recipes. Others have created much better tutorials on procmail (see the documentation project in the resources section), but here are two indispensable tips: how to create AND and OR conditions.

AND conditions are extremely easy. You simply list them one after the other. For example:

:0
* ^From:.*daggett
* ^Subject:.*descrambler
| $SPAMIT

If the message is from anyone with "daggett" in the address AND the subject line includes the word "descrambler", the condition is true and the message will go to the spam folder.

OR is easy if you check a single field for multiple values. For example:

:0
* ^From:.*(bill_gates|steve_ballmer)
| $SPAMIT

In this case, if the mail is from "bill_gates" OR "steve_ballmer", it will go to the spam folder.

What if you want to use a complex OR condition on different header fields? That gets tricky. There are several ways to do it, but here's the formula I always follow:

:0
* ! ^From:.*bill_gates
* ! ^Subject:.*dotNet
{ }
:0 E
| $SPAMIT

IMPORTANT NOTE: Make sure the braces "{ }" contain a space between them!

Believe it or not, this recipe says that if the message is from "bill_gates" OR the subject includes the string "dotNet", then drop the message in the spam folder. Just substitute the conditions you want in place of those in my example, and substitute the action you want in place of my $SPAMIT delivery command.

IMPORTANT NOTE: As counterintuitive as it may seem, you must negate each condition you expect to be TRUE in your OR combination, so that what the recipe actually tests for is FALSE! In other words, although the OR condition we want looks for a match on "bill_gates", the recipe condition is negated by the exclamation mark "!". It looks like it's saying match any message where the From field does NOT contain "bill_gates", but this is exactly what we need to make the OR work.
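The trick is easier to see outside procmail. Here is a small Python sketch of the same logic (illustration only; the header values are made up):

```python
def from_bill_or_dotnet(from_hdr, subject):
    # The recipe's two negated conditions implement "not A and not B".
    not_a = "bill_gates" not in from_hdr
    not_b = "dotNet" not in subject
    # The E flag fires the action only when that combination does NOT
    # match, and by De Morgan's law not(not A and not B) == (A or B).
    return not (not_a and not_b)

print(from_bill_or_dotnet("From: bill_gates@example.com", "Subject: hi"))      # True
print(from_bill_or_dotnet("From: alice@example.com", "Subject: dotNet tips"))  # True
print(from_bill_or_dotnet("From: alice@example.com", "Subject: lunch"))        # False
```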

Why do we need to negate these conditions and add things like the parentheses and space? It's actually quite logical if you know all the details, but you don't need to know the details to use this technique. Here's how to determine if you should care how the OR condition works. Answer this puzzle:

You come to a fork in the road. You know that one route will take you to the "Truth Teller" village, and the other will take you to the "Liar" village. "Truth Tellers" can never lie. "Liars" can never tell the truth. There is a man standing at the fork in the road. You don't know if that man is a "Truth Teller" or a "Liar." What single question can you ask this man in order to find out which way goes to which village? (No compound questions are allowed. In other words, you cannot ask "Are you an elephant and which way is it to the "Truth Teller" village?") I'll give you an answer (the spoiler below) in a moment.

If it takes you more than five minutes to figure out the answer to the above riddle, then just take my word for it that the technique works. If you figured out the puzzle quickly, look up the explanation of this procedure, which is an application of De Morgan's laws.

Warning: Spoiler

Here's the answer to the riddle of the "Truth Teller" and "Liar" villages. You ask, "Which way is it to your village?" Whether the person is a "Liar" or a "Truth Teller" is irrelevant now. Either one must point to the "Truth Teller" village, which is all you need to know. This is as much of a "think outside the box" question as a test of Boolean logic, since the key to getting the simplest answer is to bypass the need to find out whether the person at the fork is a "Truth Teller" or a "Liar". You don't need that information. All you need to know is which village is which.

More Stories By Nicholas Petreley

Nicholas Petreley is a computer consultant and author in Asheville, NC.
