Monday, November 06, 2006

Anton Security Tip of the Day #5: Sendmail Log Adventures

Following the new "tradition" of posting a security tip of the day (mentioned here and here; SANS jumped in as well), I decided to join the initiative. One of the bloggers called it "paying it forward" to the community.

So, Anton Security Tip of the Day #5: Sendmail Log Adventures

Today we will dig thru some Sendmail logs. Why? Heh, why not :-) On a more serious note, Sendmail (as well as Exchange, QMail, Postfix, etc) MTA logs hide a plethora of insights, useful for email security as well as email performance management and troubleshooting.

For our analysis we are using our Linux server running an open source version of Sendmail, but the process would be much the same on any other Unix OS. Sendmail logs a dizzying array of messages to syslog. Typically, on a healthy server, 90% or more of Sendmail log records describe email messages as they are being sent and received by the server. For example:

Jul 3 18:20:41 ns1 sendmail[25425]: i63MKWZ25425: from=<>, size=0, class=0, nrcpts=0, proto=ESMTP, daemon=MTA, relay=[]

Everybody knows that such messages are useful for email accounting. What is more interesting is that they can also be used for higher-level pattern discovery or anomaly detection (which I sometimes call "log mining") to look for things such as an unusual count of email recipients, unusual message size, new high-volume email streams, etc. However, most such log records are not too interesting individually (note that to trace an email, looking at just one record is not sufficient, since the useful info is smeared over multiple sendmail log records). But what else is there? How do we dig our teeth into that elusive 10% which are different and potentially ominous? Well, one common approach is to look at rare messages first; if that pool is still too large - after all, 10% of a billion is still 100,000,000 - one can then post-filter it with some "interesting" keywords, such as "error", "failure", "abort", "crash", and then look at the related and nearby (by time) messages (it is left as an exercise for the reader why one should not do it in the opposite order...)
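This "rare messages first, keywords second" workflow can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not a standard tool: the normalization regexes (stripping timestamps, PIDs, queue IDs, and numbers to group lines into rough templates), the rarity cutoff, and the keyword list are all my own assumptions.

```python
import re
from collections import Counter

def normalize(line):
    """Reduce a syslog line to a rough message template (illustrative regexes)."""
    line = line[16:]                                     # drop "Nov 11 08:54:49 " timestamp
    line = re.sub(r'\[\d+\]', '[PID]', line)             # sendmail[29820] -> sendmail[PID]
    line = re.sub(r'\b[a-zA-Z0-9]{14}\b', 'QID', line)   # queue IDs like jABDsGrH029820
    line = re.sub(r'\d+', 'N', line)                     # collapse remaining numbers
    return line.strip()

def rare_messages(lines, max_count=2,
                  keywords=('error', 'failure', 'abort', 'crash', 'reject')):
    """Keep lines whose template is rare, then post-filter by keywords."""
    counts = Counter(normalize(l) for l in lines)
    rare = [l for l in lines if counts[normalize(l)] <= max_count]
    return [l for l in rare if any(k in l.lower() for k in keywords)]
```

Running `rare_messages()` over a day of mail logs surfaces a short list of candidates; the nearby-in-time records can then be pulled manually for each hit.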

So, what does such an approach yield in one particular case?

Nov 20 03:39:22 combo sendmail[2022]: rejecting connections on daemon MTA: load average: 14

Nov 11 08:56:20 combo sendmail[23863]: runqueue: Aborting queue run: load average too high

Oooh, our mail server was overwhelmed at one point and mail might have been lost. Is that an attack, a spam wave, or did we just become popular? More digging is definitely required... Check your own mail server for the presence of such log records and, upon finding them, be ready to handle complaints about missing emails.
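A quick way to check for these records is to scan for the telltale substrings from the two samples above. This is a minimal sketch; the exact substrings are assumptions lifted from the sample records, and the wording may vary across sendmail versions.

```python
# Substrings taken from the sample log records above; treat them as
# assumptions, since exact wording differs between sendmail versions.
LOAD_SIGNS = ("rejecting connections on daemon",
              "Aborting queue run: load average too high")

def find_load_rejections(lines):
    """Return log lines indicating sendmail refused work due to high load."""
    return [l for l in lines if any(sign in l for sign in LOAD_SIGNS)]
```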

Nov 11 08:54:49 combo sendmail[29820]: jABDsGrH029820: collect: premature EOM: Connection reset by

Hmm, what does that mean? Well, let's take a position - and it won't be a big stretch :-) - that we have no idea whatsoever. But the next message (notice the timestamp!) also shows up in our "rare messages" list:

Nov 11 08:54:50 combo sendmail[29820]: jABDsGrH029820: SYSERR(root): collect: I/O error on connection from,

Now it is a bit clearer. The email server experienced a critical failure mid-message, and the email was corrupted and thus not received. Further analysis reveals a gap in the logs after the above message that lasted for 20 minutes. A network or hardware failure has likely occurred.

So, today's tip was about sendmail logs: now start looking for rare, unusual messages using the approach we outlined. Specifically, watch for errors and failures in the set of rare messages and then investigate by reviewing the log records in the same timeframe. Also, look for gaps in logging, especially gaps that occur right after such rare messages.
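Gap-hunting is easy to automate as well. Here is a minimal sketch under stated assumptions: the 15-minute threshold is an arbitrary choice, and a year must be supplied by the caller because classic syslog timestamps do not carry one.

```python
from datetime import datetime

def log_gaps(lines, max_gap_minutes=15, year=2005):
    """Flag silent periods between consecutive syslog records.
    Assumes classic syslog timestamps ("Nov 11 08:54:50 ...") and a
    caller-supplied year, since the timestamp format omits it."""
    gaps = []
    prev = None
    for line in lines:
        ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if prev is not None and (ts - prev).total_seconds() > max_gap_minutes * 60:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

Feeding it the records around our `collect: I/O error` line would flag the 20-minute silence that followed.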

Also, here is a link to my previous tip of the appropriate-time-interval (#1). I am tagging all the tips on my feed as well. Here is the link: All Security Tips of the Day.


1 comment:

Anonymous said...

Thanks to this post, I solved a critical problem with our mail server, where the load had shot up to 129. After stopping iptables and restarting it again, the load went back to normal.


Dr Anton Chuvakin