Tuesday, October 02, 2007

More on 'root' FTP or "0wned? Again?" (and Simple User Profiling)

A follow-up to this. So, in search of some excitement, I loaded about 800MB of logs from a compromised Linux box (no, not this box - another one ...) into LogLogic and decided to get to the bottom of it. The box was deeply 0wned (like, exploit scanner and 'john' installed and running, etc.).

So I ran my totally-experimental super-secret log mining tools :-) on the whole set of logs and discovered that there was nothing that screamed "0wned!" anywhere in the log set. OK, so maybe it was owned before the log started, like the other one? Nope, the dates on most of the attacker software and other artifacts are newer than the beginning of the log set (which in fact goes all the way back into 2006) - and they didn't bear any signs of date manipulation.


Are you getting it? Of course, it is about stolen access credentials! One of the accounts on the server was used by the attacker, who simply logged in to the server and started using it as his 0wn. Now, how do you detect that? No NIDS will trigger, NIPS will let it pass, and no unusual types of log records might ever be produced (especially if only limited logging is enabled). And what raises the stakes is that this type of activity is not only about "hacked" accounts, but also about insider abuse of accounts.

However, there will likely be changes in how normal log records are produced.

Let's summarize some known methods for using a simple user "profile" to detect account theft aka account sharing aka user impersonation aka access with stolen/shared credentials. It implies that we've been collecting the logs before the incident and have a solid trail of normal users and legitimate account owners.

  1. Unusual login source IP (e.g. normal user always comes from 10.0.X.Y or 10.1.X.Y, but now we see a login from 172.16.0.Z) - will work in some cases, but not for the free-for-all servers such as at a University
  2. Unusual login time (e.g. normal user always comes from 9AM to 5PM, now we see 3AM) - will work in most cases, but will fail if the attacker happens to be in the same time zone
  3. Unusual login session length (e.g. normal user always stays logged in for 5-20 minutes, now we see a 10-hour-long session) - works only if logout is logged; might not catch a lot of realistic but malicious connections
  4. Unusual login frequency (e.g. legitimate user logs in once a day in the morning, now we see dozens of connections) - will work for some cases, but others will be missed
  5. Unusual login failure/success ratio before a successful login (e.g. normal user always types the password right the first time, now we see more failures than successes) - not too reliable, obviously
  6. Unusual list of user actions performed (normal user only reads these files, but now we see writes to a very different set of files) - will work most frequently, but needs more granular logging of file access, object changes, etc (audit logging) [more on this in the near future!]
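To make methods #1 and #2 a bit more concrete, here is a minimal Python sketch, not my actual tooling. It assumes login records have already been parsed into (username, source IP, timestamp) tuples; the function names, the /16 grouping of source addresses, and the sample data are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
import ipaddress


def build_profile(events, prefix_len=16):
    """Build a per-user profile from historical (user, ip, timestamp) logins.

    prefix_len=16 is an illustrative assumption: sources are grouped by /16
    so that 10.0.X.Y and 10.1.X.Y each count as one "usual" network.
    """
    profile = defaultdict(lambda: {"networks": set(), "hours": set()})
    for user, ip, ts in events:
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        profile[user]["networks"].add(net)
        profile[user]["hours"].add(ts.hour)
    return profile


def check_login(profile, user, ip, ts):
    """Return the list of reasons why this login deviates from the profile."""
    known = profile.get(user)
    if known is None:
        return ["no history for user"]
    reasons = []
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in known["networks"]):
        reasons.append("unusual source IP")
    if ts.hour not in known["hours"]:
        reasons.append("unusual login time")
    return reasons


# Hypothetical history for one user, then two new logins to check.
history = [
    ("alice", "10.0.3.7", datetime(2007, 9, 10, 9, 15)),
    ("alice", "10.1.8.20", datetime(2007, 9, 11, 14, 0)),
    ("alice", "10.0.3.7", datetime(2007, 9, 12, 16, 40)),
]
profile = build_profile(history)
suspicious = check_login(profile, "alice", "172.16.0.9", datetime(2007, 10, 1, 3, 5))
normal = check_login(profile, "alice", "10.0.9.9", datetime(2007, 10, 1, 9, 30))
print(suspicious, normal)
```

The same profile dictionary could be extended with session lengths or failure counts for the other methods; exact-hour matching is deliberately crude and a real profile would want some tolerance.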

So, if you have logs of user activities for the last few weeks or months (at the very least, logins and logouts, but having records of more user activities is always better!), you can compute the above profiles using historical data and then compare them with current numbers (very similar to some of the methods from my classic log mining presentation).

The final missing bit is how long to collect your normal user behaviors: I discovered that 1 week to 1 month works pretty well. Less time yields unstable results and more time necessitates much more data crunching without much gain.
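As a rough illustration of a fixed baseline window, here is a Python sketch of method #4 (login frequency) that only looks at the last N days before "today". The 14-day default sits inside the 1 week to 1 month range; the function names and the "3x the busiest baseline day" threshold are arbitrary assumptions of mine, not tested values.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_logins_in_window(timestamps, end, days=14):
    """Count logins per calendar day in the window [end - days, end)."""
    start = end - timedelta(days=days)
    return Counter(ts.date() for ts in timestamps if start <= ts < end)


def is_unusual_frequency(timestamps, today_start, days=14, factor=3.0):
    """Flag today if it has more than `factor` times the busiest baseline day.

    `factor` is an illustrative threshold, not a recommended value.
    """
    baseline = daily_logins_in_window(timestamps, today_start, days)
    if not baseline:
        return False  # no history in the window, nothing to compare against
    busiest = max(baseline.values())
    today_end = today_start + timedelta(days=1)
    today = sum(1 for ts in timestamps if today_start <= ts < today_end)
    return today > factor * busiest


# Hypothetical data: one login every morning for two weeks, then a burst.
today_start = datetime(2007, 10, 2)
history = [today_start - timedelta(days=d) + timedelta(hours=9) for d in range(1, 15)]
burst = [today_start + timedelta(minutes=30 * i) for i in range(24)]
one_login = [today_start + timedelta(hours=9)]

flag_burst = is_unusual_frequency(history + burst, today_start)
flag_quiet = is_unusual_frequency(history + one_login, today_start)
print(flag_burst, flag_quiet)
```

Changing `days` is the knob to play with here: a shorter window makes the "busiest day" number jumpier, a longer one means more records to scan for each check.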

In the above case, it turned out that method #1, "Unusual login source IP", did the trick. .ro anybody? The question of how they came upon the valid account credentials, however, remains... No obvious password guessing was seen in the logs.

However, there is this little tiny issue here: the above implies nicely "parsed" logs stored in a database, where one can always run a query (SELECT timestamp, username, source FROM logs WHERE <whatever> ORDER BY <whatever>) and then mangle the output (again, as I do here in many different ways).
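For the "parsed logs in a database" case, a query-based version of method #1 might look like the Python/SQLite sketch below. The `logs` table with `timestamp`, `username` and `source` columns follows the query above, but the schema details, the in-memory database, the cutoff date and the sample rows are all assumptions for illustration.

```python
import sqlite3

# Hypothetical parsed-login table; timestamps are ISO-style strings so that
# plain string comparison orders them correctly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (timestamp TEXT, username TEXT, source TEXT)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?)",
    [
        ("2007-09-10 09:02:11", "alice", "10.0.3.7"),
        ("2007-09-11 09:05:40", "alice", "10.0.3.7"),
        ("2007-09-12 13:30:02", "bob", "10.1.8.20"),
        ("2007-10-01 03:12:55", "alice", "172.16.0.9"),
    ],
)


def new_sources(conn, cutoff):
    """Return (username, source, first_seen) pairs first seen on or after cutoff."""
    query = """
        SELECT username, source, MIN(timestamp) AS first_seen
        FROM logs
        GROUP BY username, source
        HAVING MIN(timestamp) >= ?
        ORDER BY first_seen
    """
    return conn.execute(query, (cutoff,)).fetchall()


# Everything before the cutoff is treated as the "normal" baseline.
recent_new = new_sources(conn, "2007-09-25 00:00:00")
print(recent_new)
```

Any user/source pair that shows up here has no history before the cutoff, which is exactly the kind of record worth a second look.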

Now, what if you want the above algorithms to run over all logs that contain the usernames and indicate user login/logout activities, not just the "known", parsed logs, neatly stored in a database? Yes, this is where it gets really, REALLY, R-E-A-L-L-Y fun! :-) But I will leave it for future discussion ...

As a final note, all this implies that your logging is on. Otherwise, this ancient quote still rings mighty true: "In an incident, if you don't have good logs, you'd better have good luck."

Dr Anton Chuvakin