January 2014

Gavin Pickin

Server Admin - From Mail Logs to DB Stats in a CFML Dashboard - Part 1

CFML Language, Server Admin

Lately, I have had a lot more Server Admin / Dev Ops duties than in the past… I am really enjoying a different type of problem solving as I am trying to leverage Dev Ops to increase automation and efficiency. I thought it might be interesting to map out my latest little project, one for my own learning, two for the purpose of sharing, and three for feedback on what I am doing wrong and what I could improve. This is Part 1 of a short series as I explore my problem, and how I find a solution.

As a Web Hosting Company, we obviously host websites, and a lot of Email Accounts for our Customers. These days, with Spam, you have to really be on top of your game, otherwise in a blink of an eye, your servers can get blacklisted, and you're constantly fighting to get your servers off blacklists, because it hurts each and every email customer on that server. Bad passwords, accounts being hijacked, viruses, and users sending out too many messages, or possibly bad messages, are just some of the things you get blacklisted for.

Our mail servers are setup on CentOS, its the OS of choice for our Linux servers. Our mail servers run qmail, with Dovecot for Imap and Vpopmail and Squirrel Mail / Round Cube for our Webmail. We run Nagios for monitoring, and although Nagios is a great product, and can be highly configured, sometimes its really hard to identify potentially dangerous activities, until its too late.

With my experience using CFML and Databases, I thought, if I can run filters on the logs, looking for key situations, we can create a dashboard for email, to quickly identify when things start to go off track, and use trends and more detailed reports to identify problems, and be able to gauge out mail situation better.

I know a few of you are probably thinking, Gavin, what is wrong with you. There are the right tools for the job, and CFML is not the right tool for reading and interpreting log files. You are right, and I do not intend to do all the heaving lifting with CFML, with these log files being hundreds and hundreds of megabytes each, there needs to be a process performed, before CFML can do its piece… so we need to identify those steps, and how to get the data to a point CFML can use it.

Remember, just because you are good with ColdFusion, Railo, CFML, MySql MSSQL, JavaScript, jQuery, NodeJS, Angular JS, EmberJS etc, it doesn't mean its always the best tool for the job. I did look around for other solutions out there, but nothing fit well, so this is a journey on how to work with my situation, using tools I know, and tools I don't, to get to a final solution that works, and works well. 

Step 1 - What Do I have to Work With

First task on my list, to look at the log files, how they are stored, where they are stored, how are they archived etc, and how the logs identified each of the use cases I wanted to monitor.

Our system is setup so qmail dumps all its logs into /var/logs/maillog.

Every 7 days, the file is archived, with a datestamp, like maillog-20131112 for example and the maillog file is reset and fills up again over the next week. For long term stats, getting the weekly log file would work, but if we're a week, or even a day late on picking up on a potentially bad activity, we're going to be blacklisted, and have to play the recovery game… so we'll need to look at a more continual process. Also, with separation of duties, we do not want to be running CFML and Log Processing on the same machine, so we have to think about moving the files across the back network too.

Step 2 - What are the Cases I want to Identify in the Logs

Before I can decipher the log files, to look for our use cases we want to monitor, lets decide what they are.

  • Message Sent - No Authentication - Our servers require authentication unless they are using Webmail, which of course requires a login.
  • Message Sent - Authenticated - Same User - When authenticated as the same user as the "sender".
  • Message Sent - Authenticated - Different User - When the authenticated account was used to send messages from a different sender.
  • Message Not Sent - Authentication Not Provided - Relaying was not allowed, only providing a sender
  • Message Not Sent - Authentication Failed - Relaying was not allowed, with A sender and an authenticated user

Step 3 - Identify the Patterns in the Logs - Find Relevant Data

Looking at the log files, I look for trends and patterns, to help me identify what each entry is actually about. Looking through the logs, and there is instantly a lot of noise to filter out. Here is 10 lines at the time of writing this entry.

This is not very useful, so I have to do some grepping to find useful information. Using my own email address, I was able to find a bunch of emails sent to me, and from me, and this helped me quickly identify a couple of useful statements.

Jan 28 22:00:01 independence qmail: 1390975201.373003 status: local 0/10 remote 0/20
Jan 28 22:00:01 independence qmail: 1390975201.373089 triple bounce: discarding bounce/788060
Jan 28 22:00:01 independence qmail: 1390975201.373101 end msg 788060
Jan 28 22:00:01 independence smtp-mx: 1390975201.476590 rblsmtpd: 
pid 11238: 451
Jan 28 22:00:01 independence smtp-mx: 1390975201.532184 tcpserver: end 11236 status 0
Jan 28 22:00:01 independence smtp-mx: 1390975201.532194 tcpserver: status: 2/100
Jan 28 22:00:01 independence smtp-mx: 1390975201.634301 tcpserver: end 11238 status 0
Jan 28 22:00:01 independence smtp-mx: 1390975201.634310 tcpserver: status: 1/100
Jan 28 22:00:01 independence dovecot: pop3-login: Login: user=<>, 
method=PLAIN, rip=, lip=, mpid=11277, session=</fXLphXxGQBMT2MD>


If i grep 'client not allowed' maillog I see hundreds of lines like the following

Jan 28 21:32:14 independence smtp-mx: 1390973534.709575 CHKUSER rejected relaying: 
from <> remote <> 
rcpt <> : client not allowed to relay

So lets break it down, so we can understand it.

Jan 28 21:32:14
First piece… is the date… missing the year, but we can still parse that into a date time object for graphing over time.

independence smtp-mx
Next, we have our server name, and the service. In this case, i'm looking at my independence server, and we're looking at the smtp-mx service. 

Next is our message id, so we can identify this message, and we can look for other log lines referencing this. This can be very important, some mail loggers actually have 5 or 6 lines for each message id, and you have to do some serious merging of the lines to get real information. Luckily, with qmail, this is not so, as you can see, there is a lot of information here.

CHKUSER rejected relaying
Next, we see the actual status of the log line, CHKUSER is the method, and the status is rejected relaying. This means that the user is not local (i.e. webmail which is on a trusted ip and allowed to send) and the user did not pass the authentication.

from <>
This is the sender information. If you notice, there are some colons in the angle brackets. It took me a while to work out what these were all referring to, but you'll see soon enough.

remote <>
This is the remote address of the sender, and it includes the domain name, the reverse dns lookup and then the IP address of the user trying to relay the message. 

rcpt <>
This is where the message is supposed to be sent. This domain name has been changed, for a little privacy for our customers. But this is an internal email address… and that is why the service is smtp-mx. I will show you another example of a relay being rejected to an external address, and the service is in fact different.

client not allowed to relay
Finally, the end of the log has a little more information, which clearly states in english, the client is not allowed to relay.

Now, for comparison of the local versus remote recipient, here is another log snippet.

Jan 28 21:19:10 independence smtp: 1390972750.741663 CHKUSER rejected relaying: 
from <> remote <> 
rcpt <> : client not allowed to relay


Now, here you see the rcpt or the recipient is a hotmail account. This is actually something we see popping up a lot, and without the start of this tool, we wouldn't have noticed. We see several attempts on random accounts on random domains, all to and etc. Basically, this robot goes and send out millions of messages, and if any of them make it to this account, they know that email address is vulnerable, and then use that to hammer the world with spam. Obviously, we do not have our servers setup as open relays, but if we did, this is how they fish for accounts.

I am going to cut this post short, I do not want to get too deep into logs on the first post. I hope you find it interesting enough to come back and see the next step, identifying our other use cases, and then how we go from knowing what to look for, to actually using it.

Thanks for joining me on my adventure,


Blog Search