Using Rules to block SPAM in Drupal 7

There is a plethora of spam-prevention modules for Drupal (e.g. Honeypot, reCAPTCHA, Mollom, Spamicide, MotherMayI, Spambot). Generally speaking they work well, but not all websites are the same. I run a website that has a forum with content created by authenticated users. Account registration is open but protected. The node forms are protected too, but I still get very specific spam: mostly posts advertising either some fake technical support service or Ugg boots.

[Image: example spam post on Drupal 7]
Nice try, "AndrewBrown".

After a few years I thought about using Rules to look for certain common words and phrases. A single-word comparison isn't too difficult, but using regular expressions in Rules isn't as easy as I had hoped. For starters, the Text Comparison option for RegEx doesn't support flags, so a case-insensitive match is a bit trickier: every letter has to be written as a two-character class such as [Cc].
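
Typing those bracket pairs by hand gets tedious and error-prone. As a rough sketch (not part of Rules or Drupal), a few lines of Python can expand a plain phrase into that flag-free form. The helper name `bracket_pattern` and its `space_class` parameter are my own invention for illustration, and it only handles ASCII letters:

```python
import re


def bracket_pattern(phrase: str, space_class: str = "[[:space:]]*") -> str:
    """Expand a phrase into a regex string that matches it in any letter case.

    Hypothetical helper: each ASCII letter becomes an upper/lower character
    class, and each run of whitespace becomes one optional-whitespace token.
    """
    parts = []
    for ch in phrase:
        if ch.isspace():
            # Collapse consecutive whitespace into a single token.
            if not parts or parts[-1] != space_class:
                parts.append(space_class)
        elif ch.isascii() and ch.isalpha():
            parts.append(f"[{ch.upper()}{ch.lower()}]")
        else:
            # Digits pass through; punctuation is escaped to match literally.
            parts.append(re.escape(ch))
    return "".join(parts)


if __name__ == "__main__":
    for phrase in ("customer support", "help line", "gmail help"):
        print(bracket_pattern(phrase))
```

The printed strings can be pasted into the "match" value of a Text Comparison condition. If the RegEx comparison is handed to PHP's PCRE functions (an assumption I have not verified here), an inline `(?i)` at the start of the pattern might do the same job with less typing, so test it before relying on it.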

Here's the example that I came up with, including both case-insensitive phrase matching and a search for a phone number. Several posts I observed had phone numbers, and I really don't want users posting phone numbers in this website's forum. The rule fires before a forum node is saved and skips the admin account (uid 1). When any of the patterns match, the node is unpublished, the user is blocked, an email is sent to a content moderator, and a message is displayed to the user. Replace the example domain and email values, and adjust the patterns as needed, in this Rules export:

{ "rules_forum_spam_filter" : {
   "LABEL" : "Forum SPAM Filter",
   "PLUGIN" : "reaction rule",
   "OWNER" : "rules",
   "TAGS" : [ "spam" ],
   "REQUIRES" : [ "rules" ],
   "ON" : { "node_presave--forum" : { "bundle" : "forum" } },
   "IF" : [
     { "OR" : [
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "[Cc][Uu][Ss][Tt][Oo][Mm][Ee][Rr][[:space:]]*[Ss][Uu][Pp][Pp][Oo][Rr][Tt]",
             "operation" : "regex"
           }
         },
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "[Cc][Uu][Ss][Tt][Oo][Mm][Ee][Rr][[:space:]]*[Ss][Ee][Rr][Vv][Ii][Cc][Ee]",
             "operation" : "regex"
           }
         },
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "[Hh][Ee][Ll][Pp][[:space:]]*[Ll][Ii][Nn][Ee]",
             "operation" : "regex"
           }
         },
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "[Gg][Mm][Aa][Ii][Ll][[:space:]]*[Hh][Ee][Ll][Pp]",
             "operation" : "regex"
           }
         },
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "(\\+0?1\\s)?\\(?\\d{3}\\)?[\\s.-]\\d{3}[\\s.-]\\d{4}",
             "operation" : "regex"
           }
         },
         { "text_matches" : {
             "text" : [ "node:body:value" ],
             "match" : "[Gg][Mm][Aa][Ii][Ll][[:space:]]*[Cc][Uu][Ss][Tt][Oo][Mm][Ee][Rr]",
             "operation" : "regex"
           }
         }
       ]
     },
     { "NOT data_is" : { "data" : [ "site:current-user:uid" ], "value" : "1" } }
   ],
   "DO" : [
     { "node_unpublish" : { "node" : [ "node" ] } },
     { "user_block" : { "account" : [ "site:current-user" ] } },
     { "mail" : {
         "to" : "to@example.com",
         "subject" : "Possible spam at example.com",
         "message" : "Please see node [node:nid] by author [site:current-user:uid].",
         "from" : "from@example.com",
         "language" : [ "" ]
       }
     },
     { "drupal_message" : {
         "message" : "The forum post that you submitted appears to be spam.  It will be evaluated over the next few days to confirm.  Your account has temporarily been suspended.",
         "type" : "error"
       }
     }
   ]
 }
}
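
Before importing a rule like this, it can help to try the patterns against a few sample posts outside of Drupal. The sketch below is only an approximation of the OR condition above, not how Rules evaluates it: it uses Python's `re` module, swaps the bracket pairs for `re.IGNORECASE`, and uses `\s*` in place of `[[:space:]]*` because Python does not support POSIX character classes. The function name `looks_like_spam` and the sample strings are made up for illustration.

```python
import re

# Phone-number pattern from the export; the doubled backslashes in the
# JSON string become single backslashes in a Python raw string.
PHONE = re.compile(r"(\+0?1\s)?\(?\d{3}\)?[\s.-]\d{3}[\s.-]\d{4}")

# Phrase patterns from the export, rewritten with a flag instead of
# bracket pairs (an assumption-level equivalent, see the note above).
PHRASES = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"customer\s*support",
        r"customer\s*service",
        r"help\s*line",
        r"gmail\s*help",
        r"gmail\s*customer",
    )
]


def looks_like_spam(body: str) -> bool:
    """Return True if any phrase or the phone pattern appears in body."""
    if PHONE.search(body):
        return True
    return any(p.search(body) for p in PHRASES)


if __name__ == "__main__":
    samples = [
        "Call our HELP LINE today",
        "Dial (800) 555 0199 for Gmail Customer care",
        "My build fails on line 42",
        "Digits without separators: 8005550199",
    ]
    for text in samples:
        print(looks_like_spam(text), "-", text)
```

Note that the phone pattern requires a space, dot, or hyphen between the digit groups, so a number written as ten digits in a row slips through, while any unrelated 3-3-4 digit string with separators would be flagged. Adjust the pattern to suit your forum.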