Wednesday, April 23, 2008

Some Burning Logging Questions - Answered!

I was wandering down a street and somebody came out and confronted me with these logging questions :-) So I answered them - now I am posting them here since they might be useful for my readers.

Q1: For those companies that have successfully implemented enterprise-wide logging, what were the big nasty surprises that they encountered?

A1: Here are a few:

  • political boundaries within the organization: "these are our logs, and you are not getting them"
  • privacy laws: some logs cannot be collected in some countries; some cannot cross the border, some cannot be seen by some people, etc. This is true mostly in the EU, less so in the US.
  • legal blocks: work with legal before deploying any org-wide log management; legal might try to prevent certain data from ever being created (for fear of being legally discovered later)
  • log volume: underestimating log volume is common and pretty nasty
  • related to the last one: vendors being "optimistic" about their tool scalability
  • time synchronization (of course!), specifically, lack thereof.

Q2: For those companies that have successfully implemented enterprise-wide logging, what was their implementation approach?

A2: Typically, 2-3 vendor PoC or pilot first. Then with the chosen vendor: phased approach based on location + type of log source (e.g. firewalls, then routers, then OS, then proxies, etc) + network topology (e.g. DMZ, then internal) + log source criticality (e.g. critical servers first; the rest next). This might be handy to look at.
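
To make the phasing idea concrete, here is a minimal sketch of how one might turn a log source inventory into a rollout order. Everything in it is hypothetical: the `LogSource` fields, the sample inventory, and above all the precedence (network zone first, then source type, then criticality) are my assumptions for illustration; your own order of priorities may well differ.

```python
from dataclasses import dataclass

# Assumed phase orders, taken from the examples in the answer above.
ZONE_ORDER = ["dmz", "internal"]
TYPE_ORDER = ["firewall", "router", "os", "proxy"]


@dataclass(frozen=True)
class LogSource:
    name: str
    kind: str       # one of TYPE_ORDER
    zone: str       # one of ZONE_ORDER
    critical: bool


def rollout_order(sources):
    """Sort sources by zone, then type, then criticality (critical first)."""
    def key(s):
        return (ZONE_ORDER.index(s.zone), TYPE_ORDER.index(s.kind),
                not s.critical, s.name)
    return sorted(sources, key=key)


# Hypothetical inventory, for illustration only.
inventory = [
    LogSource("int-fw-1", "firewall", "internal", True),
    LogSource("dmz-proxy-1", "proxy", "dmz", False),
    LogSource("dmz-fw-1", "firewall", "dmz", True),
    LogSource("int-os-2", "os", "internal", False),
    LogSource("int-os-1", "os", "internal", True),
]

plan = [s.name for s in rollout_order(inventory)]
print(plan)
```

Changing the tuple returned by `key` is all it takes to try a different precedence, e.g. putting criticality ahead of source type.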

Q3: What kind of storage requirements have been experienced by those organizations who have successfully implemented enterprise-wide logging?

A3: Massive? :-)

Here is a simple example: PCI DSS is a bit more aggressive than NERC since it mandates 1 year of log retention vs. NERC's 90 days. So, 1 year's worth of logs = 365 days x 24 hours x 3600 seconds x 1 (one!!!) busy firewall with 100 log messages each second x 200 bytes per message on average (e.g. valid for PIX and ASA devices) = about 630 billion bytes, or roughly 587 gigabytes (counting 2^30 bytes per gigabyte), per year of raw uncompressed log data (assuming 10x compression you'd get about 60GB of compressed log data per year).

Store it in RDBMS? Multiply it by 2-3. Have an index? Add about 30%.
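
If you want to redo this arithmetic for your own message rates, a back-of-the-envelope calculator is easy to write. The sketch below uses only the rules of thumb from this answer; the function name, the 2.5x RDBMS midpoint, and the choice to apply the 30% index overhead on top of the RDBMS size are my assumptions, not measured values.

```python
def estimate_log_storage(msgs_per_sec, avg_bytes, days=365,
                         compression=10.0, rdbms_factor=2.5,
                         index_overhead=0.30):
    """Rough yearly log storage estimate; all factors are rules of thumb."""
    gib = 2 ** 30
    raw = days * 24 * 3600 * msgs_per_sec * avg_bytes  # bytes, uncompressed
    rdbms = raw * rdbms_factor                         # assumed 2-3x blow-up
    return {
        "raw_bytes": raw,
        "raw_gib": raw / gib,
        "compressed_gib": raw / compression / gib,
        "rdbms_gib": rdbms / gib,
        # Assumption: the index overhead is applied to the RDBMS size.
        "rdbms_indexed_gib": rdbms * (1 + index_overhead) / gib,
    }


# One busy firewall: 100 messages per second, 200 bytes each, kept 1 year.
est = estimate_log_storage(msgs_per_sec=100, avg_bytes=200)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

For the single firewall this gives 630,720,000,000 raw bytes, which is close to the figure quoted above once expressed in gigabytes of 2^30 bytes. Multiply by the number of devices you actually have and the "terabyte" point makes itself.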

The bottom line is: terabyte is the unit to measure logs.

Q4: At the organizations that have successfully implemented enterprise-wide logging, how has logging impacted network and system performance?

A4: Too broad a question, so here are a few pointers:

  • logging affects performance much more on some types of systems compared to other types: most painful examples are databases where some people (can't find a link...sorry) report performance loss of up to 40% if logging all SELECT statements and other data retrieval commands (you need to log selectively on these); in other cases (e.g. web servers) there is no performance loss and logging is "always on"
  • log collection: agents impact system performance (long post on this subject): a little when they run (everybody knows this) and A LOT when they crash (few people think about it - agent software memory leaks are not uncommon); unlike agents, remote agentless log collection barely affects system performance (unless you have one of the few esoteric cases)
  • log transfer and network performance: look for compressed (logs compress really well), TCP-based transfers; syslogging over UDP uncompressed has a chance of doing a pipe saturation DoS on your network. Yes, people say "use a dedicated LAN," but this is definitely wishful thinking for many. Also, raw UDP syslog in large quantities over WAN = insanity :-)
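
To see why compressed transfers matter, you can measure how well a batch of log lines shrinks before it goes on the wire. This is a minimal sketch using Python's standard `zlib` module; the log lines are synthetic and made up for illustration (not real device output), so the ratio you get on your own logs will differ.

```python
import zlib


def compress_batch(lines, level=6):
    """Join log lines into one payload and return (raw, compressed) bytes."""
    raw = "\n".join(lines).encode("utf-8")
    return raw, zlib.compress(raw, level)


# Synthetic firewall-like lines; the format is hypothetical.
lines = [
    f"Apr 23 12:{i // 60 % 60:02d}:{i % 60:02d} fw01 ACCEPT "
    f"src=10.0.{i % 256}.{(i * 7) % 256} dst=192.0.2.10 dpt=443"
    for i in range(1000)
]

raw, packed = compress_batch(lines)
ratio = len(raw) / len(packed)
print(f"raw={len(raw)} bytes, compressed={len(packed)} bytes, "
      f"ratio={ratio:.1f}x")
```

Batching matters here: compressing one message at a time gives the compressor little repetition to work with, while a batch of similar lines shares most of its text.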

Q5: What were some successful strategies for obtaining buy-in from system owners and operators in regards to turning logging on?

A5: OK, also too broad a question, but here are some pointers:

  • provide them a useful service based on their logs (e.g. performance measurement, availability monitoring, compromise detection :-), or other security metrics, etc)
  • help them with their compliance mandates (e.g. create reports that they can show to the auditors that "bug" them)
  • give them tools to better solve their problems (e.g. allow access to a log management tool so that they can investigate issues better, search the logs, check on their users, etc)

Q6: How have the organizations that have successfully implemented enterprise-wide logging dealt with unusual devices (=log sources) that have no log management vendor support?

A6: They were in massive pain - if they chose the wrong log management vendor. You need to look for vendors that have "universal log source support" with NO requirement for custom rules or custom collector/connector/agent development. Some vendors have generic text log collectors that can grab and analyze unknown logs. Typically this is done via some form of text indexing that works across all logs, including those from unknown, vertical, esoteric or custom-developed log sources.
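
To illustrate the text indexing idea, here is a toy inverted index that makes arbitrary log lines searchable without knowing their format. It is only a sketch of the general technique, not how any particular vendor does it; the function names and the sample lines from two imaginary devices are hypothetical.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"\w+")


def build_index(lines):
    """Map every lowercase token to the set of line numbers containing it."""
    index = defaultdict(set)
    for n, line in enumerate(lines):
        for token in TOKEN.findall(line.lower()):
            index[token].add(n)
    return index


def search(index, lines, *terms):
    """Return the lines that contain ALL of the given terms."""
    hits = None
    for term in terms:
        ids = index.get(term.lower(), set())
        hits = ids if hits is None else hits & ids
    return [lines[i] for i in sorted(hits or [])]


# Made-up lines from two unsupported devices with different formats.
lines = [
    "2008-04-23 10:01:02 badge-reader-7 door=lab4 user=alice result=denied",
    "2008-04-23 10:01:09 badge-reader-7 door=lab4 user=bob result=granted",
    "hvac01|TEMP_ALARM|zone=3|value=41C",
]

index = build_index(lines)
print(search(index, lines, "alice", "denied"))
print(search(index, lines, "temp_alarm"))
```

No parser or per-device rule was written for either format, which is the whole point: the unknown logs are still collected and searchable from day one.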

Hope it was useful!

Dr Anton Chuvakin