Back in 2002, when I was at a SIEM vendor that shall remain nameless (at least until they finally die), I fell in love with algorithmic "correlation." Yes, I can write correlation rules like there is no tomorrow (and have fun doing it!), but that’s just me – I am funny that way. A lot of organizations today will rely on default correlation rules (hoping that SIEM is some kind of “improved host IDS” of yesteryear … remember those?) and then quickly realize that logs are too darn diverse across environments to make such naïve pattern matching useful in many situations. Other organizations will just start hating SIEM in general for all the false default rule alerts and fall back into the rathole of log search, aka the “we can figure out what happened in days, not months” mindset.
That problem becomes even more dramatic when organizations rely mostly on simple filtering rules (IF username=root AND (ToD>10:00PM OR ToD<7:00AM) AND Source_Country=China, THEN ALERT “Root Login During Non-Biz Hours from Foreign IP”) and not on stateful correlation rules written with their own applications in mind. As a result, you'd be stuck with ill-fitting default rules and no ability to create custom, site-specific rules or even intelligently modify the default rules to fit your use cases better. Not a good situation - well, unless you are a consultant offering correlation rule tuning services ;-)
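To make the distinction concrete, here is a minimal sketch (in Python pseudocode, not any vendor's rule language) of a stateless filter rule versus a stateful correlation rule; the event field names ("user", "src_country", "outcome", etc.) and thresholds are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime

def simple_filter_rule(event):
    """Stateless filter: fires on a single event matching fixed conditions."""
    hour = datetime.fromtimestamp(event["timestamp"]).hour
    return (event["user"] == "root"
            and (hour >= 22 or hour < 7)            # non-business hours
            and event["src_country"] == "CN")

# Stateful correlation: remember recent failures per user, then alert
# when a success follows a burst of failures within a sliding window.
WINDOW_SECONDS = 300
FAILURE_THRESHOLD = 5
failures = defaultdict(list)  # user -> timestamps of recent login failures

def stateful_bruteforce_rule(event):
    ts, user = event["timestamp"], event["user"]
    if event["outcome"] == "failure":
        failures[user] = [t for t in failures[user] if ts - t < WINDOW_SECONDS]
        failures[user].append(ts)
        return False
    # a success right after N recent failures is the interesting case
    recent = [t for t in failures[user] if ts - t < WINDOW_SECONDS]
    return len(recent) >= FAILURE_THRESHOLD
```

The stateless version can be written once and shipped as a default; the stateful version only earns its keep when the window, threshold and event fields are tuned to your own applications - which is exactly the tuning most organizations never do.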
One of the ways out, in my opinion, lies in the wide use of event scoring algorithms and other ruleless methods. These methods, while not without known limitations, can be extremely useful in environments where correlation rule tuning is not likely to happen, no matter how many times we say it should happen. By the way, algorithmic or "statistical" correlation typically has little to do with either correlation or statistics. A more useful way to think about it is weighted event scoring or weighted object scoring (where the object is an IP address, port, username, asset or a combination of these).
So, in many cases back then, people used a naïve risk scoring scheme where:
risk (for each destination IP inside) = threat (a derivative of event severity) x value (user-entered for each targeted “asset” - obviously a major Achilles heel in real-world implementations) x vulnerability (derived from vulnerability scan results)
It mostly failed to work when used for real-time visualization (as opposed to historical profiling) and was also really noisy for alerting. Even such a simplistic algorithm, however, still presents a very useful starting point: you can develop better methods, post-process and baseline the data, add dimensions, etc. It was also commonly not integrated with rules, extended asset data, user identity, etc.
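For illustration, here is a minimal sketch of that threat x value x vulnerability scoring; the lookup tables, field names and thresholds are hypothetical stand-ins for what a real SIEM would pull from an asset database and scan results.

```python
# Hypothetical lookups; in practice these come from an asset DB and vuln scans.
ASSET_VALUE = {"10.0.0.5": 5, "10.0.0.9": 2}        # user-entered value, 1-5
VULNERABILITY = {"10.0.0.5": 0.8, "10.0.0.9": 0.2}  # derived from scan results, 0-1

def naive_risk(event):
    """risk = threat x value x vulnerability, computed per destination IP."""
    dst = event["dst_ip"]
    threat = event["severity"] / 10.0        # event severity, normalized to 0-1
    value = ASSET_VALUE.get(dst, 1)          # default when the asset was never entered
    vuln = VULNERABILITY.get(dst, 0.5)       # default when the asset was never scanned
    return threat * value * vuln

# Usage: score each event as it arrives, alert above a tuned threshold.
event = {"dst_ip": "10.0.0.5", "severity": 7}
if naive_risk(event) > 2.0:
    print("ALERT: high-risk event against", event["dst_ip"])
```

Note how much of the output quality hinges on the two lookup tables - which is exactly why the user-entered asset value was the Achilles heel in practice.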
Let’s now fast forward to 2011. People still hate the rules AND rules still remain a mainstay of SIEM technology. However, it seems like the algorithmic ruleless methods are making a comeback, with better analysis, profiling, baselining and better rule integration. For example, this recent whitepaper from NitroSecurity (here, with registration) covers the technology they acquired when LogMatrix/OpenService crashed, now integrated into NitroESM. The paper covers some methods of event scoring that I personally know to work well. For example, a trick I used to call “2D baselining”: not just tracking user actions over time and activity on destination assets over time, but tracking each user<->asset pair over time. So, “jsmith” might be a frequent user on “server1” but only rarely goes to “server2”, and such pair scoring will occasionally show some fun things from the “OMG, he is really doing it!” category.
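Here is a minimal sketch of that 2D baselining idea, assuming we simply count how often each user<->asset pair has been seen and score new events by the rarity of the pair; the field names and the rarity formula are my own illustrative choices, not anything from the NitroSecurity paper.

```python
import math
from collections import Counter

pair_counts = Counter()   # (user, asset) -> historical event count
user_counts = Counter()   # user -> total historical events

def update_baseline(event):
    """Feed historical events in to build the 2D (user x asset) baseline."""
    pair_counts[(event["user"], event["asset"])] += 1
    user_counts[event["user"]] += 1

def pair_rarity_score(event):
    """Higher score = this user rarely (or never) touches this asset."""
    user, asset = event["user"], event["asset"]
    total = user_counts[user]
    if total == 0:
        return 1.0                      # never-seen user: maximally unusual
    seen = pair_counts[(user, asset)]
    return 1.0 - math.log1p(seen) / math.log1p(total)

# "jsmith" on "server1" (a frequent pair) scores near 0;
# "jsmith" on "server2" (a rare pair) scores near 1 and deserves a look.
```

The point is not this particular formula but the extra dimension: per-user and per-asset baselines each look normal, while the pair is what gives the game away.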
So, when you think SIEM, don’t just think “how many rules?” – think “what other methods for real-time and historical event analysis do they use?”
Possibly related posts:
- How Do I Get The Best SIEM?
- Log Management->SIEM Graduation Criteria: Violate at Your Own Peril!
- How to Replace a SIEM?
- SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?
- How to Write an OK SIEM RFP?
- On Choosing SIEM
- "So, What Should I Want?" or How NOT to Pick a SIEM-III?
- The Myth of SIEM as "An Analyst-in-the-box" or How NOT to Pick a SIEM-II?
- “I Want to Buy Correlation” or How NOT to Pick a SIEM?
- Log Management + SIEM = ?
- On SIEM Complexity
- SIEM Bloggables: SIEM Use Cases and Whitepaper with detailed SIEM use cases
- Log Management / SIEM Users: "Minimalist" vs "Analyst"
- All posts labeled SIEM