Saturday, June 20, 2009

Why No Open Source SIEM, EVER?

Here is a perfect weekend post – on SIEM :-) Ok, all this Google web traffic of people searching for “open source SIEM” (sometimes “open source SIM”, almost never “open source SEM” {Is SEM .. dead? :-)}) continues to fill my web server logs and it finally prompted me to write this post, rather than simply whine about it, like I was doing for 3 years :-)

It all started here when Matasano folks (sockpuppet.org at the time), in a rare bout of punditry proclaimed back in 2005 (!):

“A Credible Open-Source SIM

I predicted that, just as SourceFire commoditized and co-opted the IDS market, a nascent open source project would challenge SIM products like ArcSight and Cisco MARS.

Result: No Credit [A.C. – this is a later addition to their post when they scored their 2006 predictions]

What’s taking you guys so long? Getting spooked that all the money seems to be going to log management? That’s exactly the dynamic Snort charged in to! Get with the program!”

or in another version posted on DailyDave:

“A Credible Open-Source SIM

There's about $100MM spent annually on products that manage and correlate logs. Guess what? None of it is hard to do. The underlying tools are there. Customers know how to do this better than the vendors do. Expect a mainstream open-source combination of Argus <http://www.qosient.com/argus/> and Sguil <http://sguil.sourceforge.net/> to own the security management conversation next year.”

When I saw it, I got upset that people otherwise so amazingly intelligent (example: Thomas Ptacek) can make claims so incorrect :-) A fun discussion of this prediction emerged in multiple places, also back in 2005-2006: the post comments, DailyDave (Dave’s post, my post, David Bianco’s post, my next post, the whole thread), my blog (On Open Source in SIEM and Log Management).

Among all the discussion, this piece by Dave stood out:

“My prediction: No credible open source SIM (aka, log aggregator).

Boring work gets done by corporations, and that's that. Not to mention the impossibly high barrier to market of having to purchase and maintain all the random devices that generate logs.”

This is basically the essence of my argument which I also made here in my approaches to log management presentation, slide 10 (even though I was arguing against building one’s homegrown log management or SIEM). To summarize:

  1. Building a SIEM is fun (perfect for open source), BUT SIEM is inherently “high-maintenance” via a lot of boring, manual tasks (one example: check Cisco.com weekly for changes to log messages of their hundreds of devices THEN pull your hair out anyway when logs change without any documentation). Maintenance is NOT an open-source forte, and for SIEM, “no meticulous maintenance –> no value.” The open source community is not so great with eternal commitments.
  2. To analyze logs, you need to have logs. Either you get the logging devices (expensive –> not for open source) or you get the logs. Many people said “oh, open source community will collaborate on that.” Guess what? It didn’t (attempts here, here, here (now redirects)). When log standards (CEE) emerge, it will change; today it is impossible.
  3. Can the task of log analysis be pushed to end users of the open source tool (after all, they are getting it for free, they can do some work…)? Yes, it can, provided there are tools to drastically simplify the logs->intelligence path (at one point, I hoped Splunk’s “Event type discovery” would do it, but it didn’t); such tools do not exist. And, sadly, normal people don’t write regexes (good joke about it). To top it off, writing parsing rules is nowhere near as much fun as writing IDS sigs or vulnerability checks – and then packet headers don’t change on you, while log headers do.
  4. A log analysis or SIEM system needs to be able to handle volume, not only live flow, but also storage. A lot of tools work well on 10MB of logs, but then again, so does the human brain. When you move to TB volumes, a lot of simple things start to require engineering marvels. Is it as hard as getting the Linux kernel (the pinnacle of open source engineering) to perform? Probably not as hard, but the OSS SIEM project creator needs to be BOTH a log expert and a performance expert.
  5. SIEM is also a lot about integration and not just hard-core coding. I believe in an open-source correlation engine (SEC, OSSEC, general-purpose Esper), maybe in an open source parser generator, possibly in an open source data presentation UI, but definitely not in all pieces working together and pulling log data and context data from all the required sources and then making sense of it. There are way too many moving pieces – as we all know, many SIEM deployments fail not because of crappy technology, but because of politics.
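
To make the maintenance pain from items 1 and 3 concrete, here is a minimal sketch of a regex parsing rule and what happens when a vendor quietly changes the message layout. Everything in it is an assumption made up for illustration: the `fw: DENY` message, both log formats, and the names `DENY_RULE` and `parse` do not come from any real device or product.

```python
import re

# Hypothetical parsing rule for a made-up firewall "deny" message.
# The message layout and field names are illustrative assumptions only.
DENY_RULE = re.compile(
    r"^(?P<host>\S+) fw: DENY (?P<proto>\w+) "
    r"(?P<src>[\d.]+):(?P<sport>\d+) -> (?P<dst>[\d.]+):(?P<dport>\d+)$"
)


def parse(line):
    """Return a dict of named fields, or None if the rule does not match."""
    match = DENY_RULE.match(line)
    return match.groupdict() if match else None


# The format the rule was written for.
old_format = "gw1 fw: DENY tcp 10.0.0.5:4312 -> 192.0.2.9:22"
# The same event after an undocumented (hypothetical) format change.
new_format = "gw1 fw: action=DENY proto=tcp src=10.0.0.5:4312 dst=192.0.2.9:22"

parsed_old = parse(old_format)  # fields are extracted
parsed_new = parse(new_format)  # rule silently stops matching
```

Note that nothing crashes: the rule just stops matching and the events fall through unparsed. Somebody has to notice that and rewrite the rule, and then repeat the exercise for every device type and every release.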

There are other related grand problems too, but I digress.

Some people (in the same DD thread) even suggested that the reason the open source community didn’t get to tackle the above problems is simple: SIEM products aren’t really needed (Richard doesn’t have much love for them, for example) and that the community will find some other way of solving it (“a small, useful, standalone tool will almost always be more functional and more reliable than a merit badge feature equivalent in a commercial product”). I agree with that in principle, but if part of SIEM’s value-add is "tying stuff together" then having analysts watching 10 "small, useful, standalone tools" is actually a step back, not forward.

Maybe an open source SIEM project can only support a few “right” log messages? This was a very popular view in the 90s: just filter the logs and see the important ones. But do you know why Marcus created “artificial ignorance”? ‘Cause the “filter the logs” approach doesn’t really work: you never know which are the right ones until you look at them all.
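
For those who have not seen it, the idea behind “artificial ignorance” is to invert the filter: instead of listing the messages you want, you list the ones you already know are boring and look at whatever is left. A minimal sketch, assuming made-up log lines and ignore patterns (the names `IGNORE_PATTERNS` and `artificial_ignorance` are mine, not from any tool):

```python
import re

# Patterns for messages already known to be uninteresting.
# Both the patterns and the sample lines below are illustrative assumptions.
IGNORE_PATTERNS = [
    re.compile(pattern)
    for pattern in (
        r"sshd\[\d+\]: Accepted publickey for \w+",
        r"CRON\[\d+\]: \(root\) CMD",
    )
]


def artificial_ignorance(lines, ignore=IGNORE_PATTERNS):
    """Drop every line matching a known-boring pattern and keep the rest."""
    return [
        line for line in lines
        if not any(pattern.search(line) for pattern in ignore)
    ]


sample = [
    "sshd[812]: Accepted publickey for alice from 10.0.0.7",
    "CRON[901]: (root) CMD (run-parts /etc/cron.hourly)",
    "kernel: EXT3-fs error (device sda1): read failure",
]

# Only the line nobody wrote a pattern for survives.
leftover = artificial_ignorance(sample)
```

The kernel error gets through precisely because nobody anticipated it, which is exactly what a “show me only the important messages” filter would have dropped.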

What about the existing products?

  • Prelude is not a SIEM and hardly anyone uses it.
  • Sguil is not a SIEM. It is based on a different model, assumptions (=intelligent user) and use cases.
  • OSSEC is awesome, but also not a SIEM. It has correlation now and “wide-ish” log source support, but doesn’t measure up to SIEM in many dimensions.
  • OSSIM is indeed an open-source SIEM. Now that it has a full-blown corporate parent, it has potential. In fact, when I first saw it in 2005 (maybe before, not sure), it had potential too. It is just that now it has more of it!

Now, more on OSSIM: Dominic and the crew are awesome, but I think that the above considerations will prevent OSSIM from becoming widely adopted. Here is why: how many open source NIDS do you know? 94% [source: srand() :-)] of folks in security will say: one (Snort), another 3% will say two (Snort, Bro), another 2% will say 3 (Snort, Bro, Prelude), another 1% will say something else. Now, try that with open source SIEM: there is no “snort of SIEM” and the result will be different. IMHO this is inherent (=not a question of time) due to the incompatibility of SIEM and the open source model, shown in items 1.-5. above.

BTW, somehow the recent Twitter SIEM madness (eh… #SIEM madness) caused other people to think about this too.

Conclusions:

  • So, at the risk of eating major crow later, I insist: no credible open source SIEM will emerge until 2020 (niche projects will continue, just as open source NIDS existed before Snort)
  • Taking it to an extreme, I think commercial SIEM may die before the open source one is born…
  • Topics of industry discussions on SIEM from 2005 are still relevant => SIEM equals stagnation.

BTW, did I mention cloud/SaaS SIEM? Oops, I did now :-)

Have fun with it!

Dr Anton Chuvakin