NOTICE: after Gartner killed ALL blogs in late 2023, I am trying to salvage (via archive.org) some of the most critical blogs I've written while working there, and repost them with backdates here, for posterity. This one reminds the world that I popularized the term "output-driven SIEM"
Here is a great term I picked from another SIEM literati: “output-driven SIEM.” This simply means deploying your security information and event management tool in such a way that NOTHING comes into your SIEM unless and until you know how it would be utilized and/or presented. Thus, only existing/planned reports, visuals, alerts, dashboards, profiling algorithms, context fusion or whatever other means of using the data can make a SIEM implementer to “open the floodgates” and admit a particular log type into a tool. If a process exists outside of a SIEM tool that will make use of the SIEM data, that qualifies as well. In this model, goals drive security requirements, requirements drive use cases, use cases drive functionality and collection scope. By the way, this model is as well-known and effective … as it is, sadly, uncommon among the organizations deploying SIEM tools today. “Now that we have all this data [and now that our SIEM is very slow], how do we use it?” is much more common….
For example, if your goal is to make it possible to detect when your users abuse access credentials (or when somebody steals their credentials), requirements will call for login-counting correlation rules, user activity profiling as well as associated reporting on user access data. Thus, various types of authentication records (Unix syslog and Windows event logs, access control and remote access server logs, VPN, etc) need to be collected.
Now, this is dramatically different from an approach one should take with broad scope log management, aimed at general system troubleshooting or incident response support. This is where being “input-driven” and getting every possible bit of data in would be admirable. Collect “100% of all logs,” pile them in Hadoop, have them ready for use, etc works brilliantly there – pick the data now and sort it out later, don’t dwell on choosing collection-time filters. However, doing the same with a SIEM is a great way to turning your deployment into a quivering, jumbled mess of barely performing components and oodles of “crap-ta” (a hybrid of “crap” and “data”, as you can guess). “Big” or “small”, unused data just does not help the SIEM perform its security mission well.
How does such difference matter in real-world deployments?
Every log line going into a SIEM tool “costs” (and sometimes actually costs – i.e. in dollar and not just in computing resource terms) much more than a log line dropped into a log aggregator. $50,000 for an appliance system that does 100,000 EPS sounds like a great log management price, while SIEM deployments where 100,000 log messages are actually analyzed by a SIEM every second are both rare and really expensive (likely well into 7 digits territory).
Admittedly, “output-driven SIEM” is hard work. It makes soooooo much sense to “just collect it for now” and then “figure out how to use it later.” In many cases, however, this means that your deployment will be stuck. Sometimes it may work for you – but please be aware that for many people who thought that “it would work for them," it actually did not. At this point, it should be obvious to most readers that combining “input-driven” log aggregation and “output-driven” SIEM analysis is still the best way to go for most organizations. And, yes, as with every great useful rule, it has great useful exceptions …
On the architecture side, if your SIEM includes log management components (like most do today), the same logic applies: that aggregator component will see all of the data, while core SIEM analysis components and dashboards will only see the data that needs to be there. For two distinct tools, this “magic” is achieved via filters that are deployed between a log management system and a SIEM.
So, think about using the data before you admit it into a SIEM!
Related SIEM posts: