Wednesday, August 11, 2010

Pathetic Analytics Epiphany!

As some of you know, I have been doing SIEM and log management for more than 9 years already. Nine years of looking at them logs is a long time, lemme tell you :-)
And that 86% of cases where intrusion evidence was present in logs (see Verizon 2010 Breach Report) just sent me into a cold rage. This is freakin’ year twenty-ten! Why are people STILL not looking at their logs?!! Not even monthly, even where daily review is mandated (see PCI DSS)?! Are they that mind-blowingly stupid?! Do they love to live on the bloody edge, perhaps? Do they enjoy being violently penetrated and not even enjoy it, purely for masochistic purposes? I read some blog posts which basically expressed the same rage (example), and my rage just became The Epic Log Rage.
And while consumed by this rage, I had an epiphany! End-users are not really the ones to blame - not that much. Nobody can be blamed for not wanting to ‘grep’ a 245GB log file… I think :-)
Our log analysis tools are simply too pathetic. Think about it - they are!!!
Why do “empty search window” and “overly complicated correlation rule builder” represent the state of the art of log analysis after nearly 20 years of development in this field?! Why do we have to dig for log insights like fucking truffle hunting pigs?
Further, yesterday I was trying to explain the state of the art of log analysis to a client (who looks to use his cool new technology for log analysis and SIEM), and I felt embarrassed to admit that, yes, “search” and “rules” are indeed the state of the art.
In other words, most of the analysis burden is on the tool USER’S BRAIN, not on the TOOL. They looked at me like I just wasted 10 years of my life, writing regexes and otherwise being a stupid monkey. Even things like profiling/baselining (example) or simple – and I mean SIMPLE – data mining (example, details) mostly stay on research drawing boards for ages.
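To show just how simple "SIMPLE" can be, here is a minimal sketch of count-based baselining in Python. It is an illustration under loud assumptions, not what any SIEM product actually does: logs are assumed to be already parsed into (source, event_type) pairs, and the names `build_baseline`, `find_anomalies` and the `threshold` parameter are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def build_baseline(history):
    """Profile "normal" from past days.

    history: list of days, each day a list of (source, event_type) tuples
    (an assumed, already-parsed format).
    Returns {(source, event_type): (mean_daily_count, stdev_daily_count)}.
    """
    day_counts = [Counter(day) for day in history]
    keys = set()
    for counts in day_counts:
        keys.update(counts)
    baseline = {}
    for key in keys:
        # A day with no such event counts as zero for that key.
        series = [counts.get(key, 0) for counts in day_counts]
        baseline[key] = (mean(series), pstdev(series))
    return baseline


def find_anomalies(baseline, today, threshold=3.0):
    """Flag event types never seen before, or far above their usual volume."""
    findings = []
    for key, count in Counter(today).items():
        if key not in baseline:
            findings.append((key, count, "never seen before"))
            continue
        avg, sd = baseline[key]
        # Floor the deviation at 1.0 so very steady sources do not
        # alert on a single extra event (an arbitrary choice).
        if count > avg + threshold * max(sd, 1.0):
            findings.append((key, count, "volume spike"))
    return findings


# Made-up sample data, purely for illustration.
history = [
    [("fw1", "deny")] * 10,
    [("fw1", "deny")] * 12,
    [("fw1", "deny")] * 11,
]
today = [("fw1", "deny")] * 50 + [("db1", "login_failure")]

for key, count, reason in find_anomalies(build_baseline(history), today):
    print(key, count, reason)
```

The point is where the work happens: the tool builds the profile and surfaces "new" and "way more than usual" on its own, instead of waiting for the user to type the right query into an empty search window.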
So, I can talk about unsupervised learning, associative rule discovery and natural language processing (the other NLP) for logs as well as the next guy (and maybe better), but the tools you can buy just don’t have that shit. They have “compliance reports” [deep insight alert… NOT :-], “empty search window” and “learn what CustomInteger17 means and then you can write your very own correlation rule…that will function…maybe” while a simple Netflix movie selection triggers more brainpower on the backend than is available in all SIEM products combined…
To conclude, I suspect that in the near future all SIEM tools will magically turn into electric typewriters.
P.S. My dear vendor friends and colleagues, don’t take offense! I still love you. We all just need to work and think a little harder – that’s all :-)
P.P.S. My dear friends in academia, please DO take offense! Most log analysis research I’ve seen over the last 10 years is …mmm… not very practical. Get some real logs and get thinking!

Dr Anton Chuvakin