Dr Anton Chuvakin Blog (Original): June 2011

Thursday, June 16, 2011

PCI DSS in the Cloud … By the Council

The long-awaited PCI Council guidance on virtualization has been released [PDF]. Congrats to the Virtualization SIG for the mammoth effort! I rather liked the document, but let the virtualization crowd (and press!) analyze it ad infinitum – I’d concentrate elsewhere: on the cloud! This guidance does not focus on cloud computing, but contains more than a few mentions, all of them pretty generic.

Here are some of the highlights and my thoughts on them.

Section 2.2.6 “Cloud Computing” does contain some potentially usable (if obvious) scope guidance:

“Entities planning to use cloud computing for their PCI DSS environments should first ensure that they thoroughly understand the details of the services being offered, and perform a detailed assessment of the unique risks associated with each service. Additionally, as with any managed service, it is crucial that the hosted entity and provider clearly define and document the responsibilities assigned to each party for maintaining PCI DSS requirements and any other controls that could impact the security of cardholder data.“ [emphasis by A.C.]

Now, after spending the last few months working on a training class on PCI DSS in the cloud for Cloud Security Alliance (in fact, I am still finishing the exercises for our July 8 beta run), the above sounds like a total no-brainer. However, I know A LOT of merchants “plan” to make the mistake of “we use PCI-OK cloud provider –> then we are compliant”, which is obviously completely insane (just as PA-DSS payment app does not make you PCI DSS compliant…and never did).

Further, the council guidance clarifies the above with:

”The cloud provider should clearly identify which PCI DSS requirements, system components, and services are covered by the cloud provider’s PCI DSS compliance program. Any aspects of the service not covered by the cloud provider should be identified, and it should be clearly documented in the service agreement that these aspects, system components, and PCI DSS requirements are the responsibility of the hosted entity [aka “merchant” – A.C.] to manage and assess. The cloud provider
should provide sufficient evidence and assurance that all processes and components under their control are PCI DSS compliant.” [emphasis in bold by A.C.]

The above is actually a gem, a nicely condensed version of a pile of challenges and hard problems, all nicely summarized. Indeed, “PCI in the cloud” is largely about the above paragraph, but … there is A LOT OF DEVIL in the details Smile I’d like to draw your attention to the fact that providers have to “provide sufficient evidence and assurance” as opposed to just saying “we got PCI Level 1.” There is a big lesson for cloud providers in it…

In further sections (section 4.3, mostly), there is some additional useful guidance, such as:

“In a public cloud, some part of the underlying systems and infrastructure is always controlled by the cloud service provider. The specific components remaining under the control of the cloud provider will vary according to the type of service—for example, Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS). […] Physical separation between tenants is not practical in a public cloud environment because, by its very nature, all resources are shared by everyone.” [emphasis by A.C. again; this reminds us that PCI does NOT in fact require such ‘physical’ separation of assets]

On top of this, the Council folks also highlight some of the additional cloud security challenges, affecting PCI DSS, such as (page 24, section 4.3):

“The hosted entity has limited or no visibility into the underlying infrastructure and related security controls.
“The hosted entity has no knowledge of ―who‖ they are sharing resources with, or the potential risks their hosted neighbors may be introducing to the host system, data stores, or other resources shared across a multi-tenant environment.” [the section in bold is kind of a hidden big deal! think about it – your payment environment may blow up since your cloud neighbor just annoyed LulzSec by something they said on Twitter…]

The guidance counters these and other challenges with additional controls:

“In a public cloud environment, additional controls must be implemented to compensate for the inherent risks and lack of visibility into the public cloud architecture. A public cloud environment could, for example, host hostile out-of-scope workloads on the same virtualization infrastructure as a cardholder data environment. More stringent preventive, detective, and corrective controls are required to offset the additional risk that a public cloud, or similar environment, could introduce to an entity’s CDE.” [notice MUST, not “may” or “should”; also notice REQUIRED and not “suggested” or “oh wow, would it be nice if…” ]

And if you don’t have such additional controls, then: “These challenges may make it impossible for some cloud-based services to operate in a PCI DSS compliant manner.”

In any case, it was definitely a fun and useful read; hopefully future detailed guidance on PCI in the cloud is coming (given that virtualization SIG took a few years, I am looking forward to 2017 or later here)

BTW, my PCI DSS in the cloud training class will happen on July 8 in Bay Area and you can still sign up.

Tuesday, June 14, 2011

Algorithmic SIEM “Correlation” Is Back?

Back in 2002 when I was at a SIEM vendor that shall remain nameless (at least until they finally die), I fell in love with algorithmic "correlation." Yes, I can write correlation rules like there is no tomorrow (and have fun doing it!), but that’s just me – I am funny that way. A lot of organizations today will rely on default correlation rules (hoping that SIEM is some kinda “improved host IDS” of yesteryear … remember those?) and then quickly realize that logs are too darn diverse across environments to make such naïve pattern matching useful for many situations. Other organizations will just start hating SIEM in general for all the false default rule alerts and fall back in the rathole of log search aka “we can figure out what happened in days , not months” mindset.

That problem becomes even more dramatic especially when they try to use mostly simple filtering rules (IF username=root AND ToD>10:00PM AND ToD<7:00AM AND Source_Country=China, THEN ALERT “Root Login During Non-Biz Hours from Foreign IP”) and not stateful correlation rules, written with their own applications in mind. As a result, you'd be stuck with ill-fitting default rules and no ability to create custom, site-specific rules or even intelligently modify the default rules to fit your use cases better. Not a good situation - well, unless you are a consultant offering rule correlation tuning services ;-)

One of the ways out, in my opinion, is in wide use of event scoring algorithms and other ruleless methods. These methods, while not without known limitations, can be extremely useful in environment where correlation rule tuning is not likely to happen, no matter how many times we say it should happen. By the way, algorithmic or "statistical" correlation has typically little to do with correlation or statistics. A more useful way to think about is weighted event scoring or weighted object (such as IP address, port, username, asset or a combination of these) scoring

So, in many cases back then people used a naïve risk scoring where:
risk (for each destination IP inside) = threat (=event severity derivative) x value (=user-entered for each targeted “asset” - obviously a major Achilles heel in the real world implementations) x vulnerability (=derived from vulnerability scan results)

It mostly failed to work when used for real-time visualization (not historical profiling) and was also really noisy for alerting. But even such simplistic algorithm, however, still presents a very useful starting point to develop better methods, post-process and baseline the data, add dimensions, etc. It was also commonly not integrated with rules, extended asset data, user identity, etc.

Let’s now fast forward to 2011. People still hate the rules AND rules still remain a mainstay of SIEM technology. However, it seems like the algorithmic ruleless methods are making a comeback, with better analysis, profiling, baselining and with better rule integration. For example, this recent whitepaper from NitroSecurity (here, with registration) covers the technology they acquired when LogMatrix/OpenService crashed and now integrated into NitroESM. The paper covers some methods of event scoring that I personally know to work well. For example, a trick I used to call “2D baselining”: not just tracking the user actions over time and activities on destination assets over time, but tracking pair of user<->asset over time. So, “jsmith” might be a frequent user on “server1”, but only rarely goes to “server2”, and such pair scoring will occasionally show some fun things from the “OMG, he is really doing it!” category Smile

So, when you think SIEM, don’t just think “how many rules?” – think “what other methods for real-time and historical event analysis do they use?”

Possibly related posts:

Wednesday, June 08, 2011

NIST EMAP Out

As those in the know already know, NIST has officially released some EMAP materials the other day (see scap.nist.gov/emap/). EMAP stands for “Event Management Automation Protocol” and has the goal of “standardizing the communication of digital event data.” You can think of it as future “SCAP for logs/events” (the SCAP itself is for configurations and vulnerabilities). Obviously, both twin standards will be “Siamese twins” and will have multiple connection points (such as through CVE, CPE and others).

In reality, SCAP and EMAP are more like “standard umbrellas” and cover multiple constituent security data standards – such as CPE, CVE, CVSS, XCCDF, etc for SCAP and CEE for EMAP. As the new EMAP site states:

The Event Management Automation Protocol (EMAP) is a suite of interoperable specifications designed to standardize the communication of event management data. EMAP is an emerging protocol within the NIST Security Automation Program, and is a peer to similar automation protocols such as the Security Content Automation Protocol (SCAP). Where SCAP standardizes the data models of configuration and vulnerability management domains, EMAP will focus on standardizing the data models relating to event and audit management. At a high-level, the goal of EMAP is to enable standardized content, representation, exchange, correlation, searching, storing, prioritization, and auditing of event records within an organizational IT environment.

[emphasis by me]

While CEE team is continuing its work on the log formats, taxonomy, profiles and other fun details of logging events, the broader EMAP effort creates a framework around it as well as proposes a set of additional standards related to correlation, parsing rules, event log filtering, event log storage, etc. The released deck [PDF] has these details as well as some use cases for EMAP such as Audit Management, Regulatory Compliance, Incident Handling, Filtered Event Record Sharing, Forensics, etc.

In the future, I expect EMAP to include event log signing, maybe its own event transport (run under CEE component standard) as well as a bunch of standardized representation for correlation (via CERE component standard) and parsing rules (via OEEL) to simplify SIEM interoperability as well as migration.

Everything public to read on EMAP is linked here (2009), here (2010), here, etc [links are PDFs], if you are into that sort of reading. SIEM/log management vendors, please pay attention Smile - some of you have already started implementation of this stuff….

Wednesday, June 01, 2011

Monthly Blog Round-Up – May 2011

Blogs are "stateless" and people often pay attention only to what they see today. Thus a lot of useful security reading material gets lost. These monthly round-ups is my way of reminding people about interesting and useful blog content. If you are “too busy to read the blogs,” at least read these.

So, here is my next monthly "Security Warrior" blog round-up of top 5 popular posts/topics this month.

“On Choosing SIEM” tops the charts this month. The post is about the least wrong way of choosing a SIEM tool – as well as why the right way is so unpopular. A related read is “SIEM Resourcing or How Much the Friggin’ Thing Would REALLY Cost Me?”, check it out as well. While reading this, also check this presentation
My commentary on the latest SIEM Magic Quadrant 2011 (“On SIEM MQ 2011”) is next – I not only share my insights, but also point some unintentional hilarity in the reports
“What To Do When Logs Don’t Help: New Whitepaper” announces my new whitepaper (written under contract for Observe-IT) about using other means for activity review and monitoring when logs are either not available or hopelessly broken
Also, “How to Replace a SIEM?” is on the list – it talks about a messy situation when you have to replace one SIEM/log management too with another
“Simple Log Review Checklist Released!” is still one of the most popular posts on my blog. Grab the log review checklist here, if you have not done so already. It is perfect to hand out to junior sysadmins who are just starting up with logs. A related “UPDATED Free Log Management Tools” is also still on top - it is a repost of my free log tools list to the blog.

Also, as a tradition, I am thanking my top 3 referrers this month (those who are people, not organizations). So, thanks a lot to the following people whose blogs sent the most visitors to my blog:

Also see my past annual “Top Posts” - 2007, 2008, 2009, 2010). Next, see you in May for the next monthly top list.

Possibly related posts / past monthly popular blog round-ups:

Dr Anton Chuvakin Blog (Original)