Friday, May 16, 2008

In Passing on DLP

Now, I am not some world-famous DLP analyst, but that doesn't mean that I cannot have an opinion on this "searing-warm" :-) security concept: "data leak 'prevention'" or DLP (notice the double quotes around prevention...)

I admit that in the past I poked fun at DLP for being "ADLP", with "A" standing for "accidental." Indeed, most of the technology approaches I've seen were "good enough" for preventing accidental leaks (e.g. an Excel sheet with SSNs being emailed to an external party by mistake) and for preventing truly idiotic "insider" attacks of the same nature. Whether they sniffed the network or used desktop agents, the tools were good enough to do the above, but not much more (or, they allowed you to do more, but via a truly ginormous effort by your security team). And then any kindergarten kid can bypass them in his sleep without working up a sweat ...

In other words, DLP was for keeping honest (but sloppy) people honest and keeping idiots idiotic (but a bit safer). Which is, don't get me wrong, pretty darn useful: after all, overall, employee mistakes still cause more damage than hackers (!)
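
To make the "accidental" point concrete, here is a minimal sketch in Python of the kind of pattern matching that catches an SSN in an outbound message. The pattern and the `contains_ssn` helper are my own illustrative assumptions, not any vendor's implementation. The same check sees nothing once the sender re-encodes the data.

```python
import base64
import re

# Hypothetical pattern for a US SSN written as 123-45-6789.
# Real products validate far more carefully; this is only an illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn(text: str) -> bool:
    """Return True if the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None


# The honest-but-sloppy case: the SSN goes out in the clear.
accidental = "Payroll attached. John Doe, SSN 078-05-1120."

# The deliberate case: same data, trivially re-encoded before sending.
deliberate = base64.b64encode(accidental.encode()).decode()

print(contains_ssn(accidental))  # True
print(contains_ssn(deliberate))  # False
```

The second result is the whole problem in one line: standard base64 output never contains a hyphen, so the pattern cannot match, and no skill was needed to get there.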

However, whenever I heard about DLP, I always felt some deeper longing for more - maybe for a technology that CAN actually stop some clearly defined classes of malicious data theft perpetrated by non-idiots.

What might such a technology be? Well, IMHO, it should have three things:

  1. Easy on the end user (=information owner) - thus no manual information tagging needed (don't you know, it's dead!)
  2. Easy on the tool operator (=security team) - thus no super-granular policy-writing needed (and please - spare me the regexes!)
  3. Effective enough to stop a malicious insider of reasonable skill over specific information channels - thus, some new technology for accurate detection of possibly modified documents across channels (e.g. common network channels)

Tough to match? Yup, it sure is. But that's not all: I'd like it to defend against theft of structured, unstructured and structured->unstructured (e.g. database contents pasted into email!) information over just about any network channel (not device theft and not USB/portable device download - these are a different story). What's more, I think that to enable #3 above the DLP "box" needs to actually understand what the document is about and to do it in a human-like fashion (Yes, including rephrased (!) content. Yes, I am picky :-)).

The above clearly does NOT mean that the technology is not bypassable - there is always an encrypted zip file and gpg, custom encrypted network protocols, or even an emailed screenshot, etc. (not even going into device theft, USB xfers or camera phone + screenshot + MMS). It just means that it takes DLP a few big notches up from "anti-idiot defense" to blocking a malicious and dedicated non-IT employee from stealing the crown jewels.

And, if one is trying to be honest about DLP, one needs to define what is out of scope (after all, only narrowly defined problems are actually solvable in this space, not "our MagicBox 6.1 will block ALL data theft," which is absurd - if you believe that, you need your head examined).

I was pretty shocked to learn that something like this actually exists today: the next wave of DLP start-ups is about to emerge. For example, NexTierNetworks can detect information traces even in modified and heavily edited documents (I would like to try rephrasing as well; I suspect it will work!). When I saw a demo I was pretty impressed that you can take a financial document, change a few things here and there, paste it into an email - and the system will still stop it by saying "uh-uh, this is sensitive info, no can do" :-) Mind you, this is not what current DLP vendors call "fingerprinting," since it actually uses what the document is about, i.e. it works on a - hate the word! - semantic or meaning level. So, DLP + a bit of NLP (the other NLP) = magic :-)

As a disclosure, I have to say that I just joined their Advisory Board, but, as you can guess, I joined because I am impressed (not "impressed because I joined!" :-))
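
For contrast with the meaning-level matching described above, here is a minimal sketch of one common way to do partial-match "fingerprinting": overlapping word shingles compared by containment. This is my own illustration under stated assumptions; the function names, the sample sentences and the choice of 3-word shingles are all hypothetical, and it is NOT a description of how NexTierNetworks or any other vendor works.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word windows)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def containment(protected: str, outbound: str, k: int = 3) -> float:
    """Fraction of the protected text's shingles that appear in the outbound text."""
    source = shingles(protected, k)
    if not source:
        return 0.0
    return len(source & shingles(outbound, k)) / len(source)


# Made-up sample texts, for illustration only.
protected = "Q3 revenue fell twelve percent and the Austin office will close in March"
edited = "FYI: Q3 revenue fell twelve percent, and the Austin office will close soon"
rephrased = "Sales dropped last quarter; we are shutting the Texas site next spring"

print(containment(protected, edited))     # high: most shingles survive light editing
print(containment(protected, rephrased))  # 0.0: no shared word sequences at all
```

A lightly edited copy still shares most of its word sequences with the original, so a threshold on the score would flag it. The rephrased version shares none, which is exactly the gap that working on what the document is about is supposed to close.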



Anonymous said...

This "straw man" argument about DLP vendors' overblown claims to treat every conceivable insider threat is disappointing. To be clear: the leading vendors focus on the problem of well-meaning insiders as the primary risk that can be (and should be) treated by their solutions.

Good to see that you too recognize the threat of well-meaning insiders to be such a significant issue. In terms of both breach rates and in terms of compliance failures, well-meaning insiders represent one of the biggest untreated information security risks.

Having said that, DLP solutions are still often deployed to treat some insider threats from malicious behavior as well. Although you are correct about the key features needed to address a range of these malicious insider behaviors, you don't seem to realize that these features are already a part of the leading solution suites. Resilience to document modification, ability to detect structured data copied to unstructured documents, and (most importantly) cost-of-ownership control are all key elements of leading DLP solution sets. In fact, these features have been in shipping product for some years now.

Frankly, there's already enough DLP technology out there to treat the top-most risks from insiders. Residual untreated risk from insider threats will never be down to zero, but the problems we can treat effectively right now cover the most likely modes of egress or exposure of data.

The faster we practitioners can get this news out to the rest of the security community (that the most important aspects of insider threat are treatable) the quicker we can get these alarming breach rates under control.

Anonymous said...

Augusto Paes de Barr: It is clearly a step forward, but because of changes in media type (e.g., a screenshot of a Word window) the best DLP is still to avoid giving the determined and skilled malicious user access to sensitive information in the first place.

Anonymous said...

Actually, there are plenty of cases of screen capture that can be stopped by DLP solutions as well.

It's not that this is an untreatable risk. It's that this is a low-impact threat model. It's less convenient than the obvious means of exfiltrating the data (thumb drive, email etc...) and frankly it's just not that common a risk.

DLP vendors aren't "avoiding" malicious insider risk; it's more to do with a rational prioritization of effort. We end up focusing on the biggest risks (well-meaning insiders) because...well...that's what is happening in the enterprise.

Anton Chuvakin said...

>there's already enough DLP
>technology out there to treat the
>top-most risks from insiders.

Hmmm, I am not so sure about this; at least the ones I've seen are fairly easy to deceive (even if you are not in IT security, that is!)

Indeed, "well-meaning insiders" probably lead to more losses, but what annoys me is that the DLP boxes are MORE often sold as "anti-insider-threat systems."

So, I think that more innovation is needed to bring the "state of the art" of technology closer to the "state of the art" of marketing, which is - IMHO - too far ahead in the case of DLP.

Do let me know how a DLP system (current OR FUTURE!!!) will detect a screenshot of sensitive info. It seems to me that such a technology doesn't exist (well, you can OCR the shot, I guess, but this will take a long time and will suffer from most of the same challenges as CAPTCHA-breaking ...)

Anonymous said...

I second what Kevin said about data modifications, cut and pastes of fragments, etc. Of course, it also depends on the matching mechanisms the products use. Often many mechanisms can be used / configured. For example, the company I work for (which will remain nameless because I'm not trying to promote anything here) has exact data match technology (using various algorithms), partial data match technology (also using various algorithms), and linguistic analysis technology, which is probably the closest to what this new company is doing (obviously, the implementation is different... with a slightly different focus). The linguistic analysis engine can also allow you to catch rephrased sensitive data. I don't know this company's implementation, but it looks like the rephrasing handling is done in a different place there... Still, the results are similar.

Dr Anton Chuvakin