Tuesday, October 24, 2023

How to Banish Heroes from Your SOC? [Medium Backup 10/12/2023]

This blog was born from two parents: my never-finished blog on why relying on heroism in a Security Operations Center (SOC) is bad and Phil Venables’ “superb+” blog titled “Delivering Security at Scale: From Artisanal to Industrial.”

BTW, what is heroism? Isn’t that a good thing? Well, an ancient SRE deck defines “IT heroism” as relying on “individuals taking upon themselves to make up for a systemic problem.” As those who have seen the inside of a SOC can attest, this is, ahem, not entirely uncommon in many Security Operations Centers.

If you recall our Autonomic Security Operations (ASO) vision, we advocate for automation, consistent processes, and a systematic, engineering-led approach to problems. Yet in real life, heroes are very much needed at many SOCs for their routine operation. This is the essence of our conundrum: human heroism is usually good, but a system that relies on heroes for routine operation is bad.

Here is a great quote from another domain that explains this even better:

“The need for heroism is revealing the fact that you haven’t scaled your organization’s processes to effectively withstand the brunt of the unexpected, leaving it on individuals to bear.” (source)

Is your SOC such a system? If yes, how to change it?

First, where might this show up in your SOC?

  • Heroic alert triage where analysts stay late, extend their shifts, accept escalations at all hours, etc. (likely the most common example, frankly)
  • Heroic rule writing where rules and content get created on demand; instead of a detection engineering practice, you have a detection firefighting crew…
  • Heroic remediation is the classic “wait, wait, I can fix it” syndrome that, statistically speaking, very rarely leads to a good solution.
  • Another classic: working long hours to resolve an incident alone.
  • Frequently coming up with creative one-off solutions to wide-ranging systemic problems.

What do you want instead? Well, you want an industrial system! What is it? Here, Phil explains it better than I can:

source: Phil’s blog https://www.philvenables.com/post/delivering-security-at-scale-from-artisanal-to-industrial

Now, let’s see if we can quickly contextualize it for the SOC:

source: I just made it :-)

Notice that heroism makes many appearances on Phil’s “artisanal” side of the table. “Dependent on individual artisans [read: heroes] to sustain work”, “Organization success is like spinning plates, if the people don’t show up there’s immediate and catastrophic failure”, and “Hard to replicate” all carry the unmistakable mark of an IT hero…

OK, gimme some good news! How to fix it?

Trigger warning: this is going to be scary.

Ready?

source: privately shared

Now for the painful, painful truth: “It’s better to let a process break and uncover a systemic issue (like the need for better tooling or an adjustment of priorities), than to have individuals try to make up for the problem.”

You want more? Sorry, all I got is this ;-) Definitely more thinking and learning is required.

Now a question: have you successfully industrialized or “de-hero-ized” your SOC? Have you used our ASO ideas? What are the lessons? Insights? Key hurdles?

Dr Anton Chuvakin