The Wal-Mart story inspired me to summarize the most egregious, reckless, painful, negligent, sad, idiotic examples of “Log FAIL.” Here they are:
- Logging disabled: if you got a system that had operational logging enabled by default and then turned it off before deploying to production – congratulations! You have truly earned your title of Log Idiot! :-)
- Logging not enabled: this is more sad than anything else … and the person who will suffer the most from this is likely the one who caused it. After all, you’d need those logs at some point yourself. There is nothing sadder than seeing a person having to explain to management, police, FBI, press, QSA, SEC, whoever: “Well, logging … was … never … enabled!” (check out this motivational horror story)
- No log centralization: Windows admins, read this one – logs on a machine that crashed, was 0wned or even stolen will do you absolutely no good. It used to be that only Unix administrators could do this (via the magic of the /etc/syslog.conf line “*.* @loghost.example.com”), but you, on the Windows side, could not. Please notice that the world is different now! (check out this deck on benefits and tips related to log centralization)
- Log retention period too short: the picture on the right should make this item (as well as the one above) painfully clear: doing “the right thing” and building a centralized logging infrastructure, and then limiting retention to 30 days, is still “log FAIL.” Many, many scenarios today require logs from the past – for the juiciest examples check all the recent “compromised in 2006 – discovered in 2008” stories (see some here in this deck).
- No logging of “Granted”, “Accepted”, “Allowed”, etc.: I don’t even know where to start on this one – maybe thus: logging firewall “connection blocked” events simply means that the firewall was doing its job; logging “connection allowed” shows that somebody is now in your network… The same idea applies to logging “login failed” while missing “login successful” – please make really sure to always log both (read this tip for more examples, instructions and ideas)
- Bad logs: if you are in operations, this is truly not your fault. But if you are in development – it probably is. Such logging classics as “failed successfully” and “login failed” [with no actual user name recorded] are fine examples of this “log FAIL.” Be aware that our work on CEE will fix it eventually, but more hilarity will have to transpire before that happens (see this deck for some ideas on how not to engineer logging and how to do it – and for some examples of hilarity, of course)
- No log review or nobody is looking at logs: I am saving this “log FAIL” for last; logs are created to be reviewed, monitored, searched, investigated, etc. and NOT – I assure you! – to simply use up disk space (check my famous “Top 11 Reasons to Look at Logs” as well as the classic “Top Logging Mistakes” for more info on this one)
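For the developers in the audience, here is a rough Python sketch of how three of the items above (centralization, logging “accepted” as well as “failed”, and recording the actual user name) might look in application code. It is only an illustration under stated assumptions: the function names `build_auth_logger`, `log_login` and `central_handler` are made up for this example, the host name is borrowed from the syslog.conf line above, and UDP port 514 is assumed to be where your log host listens.

```python
import logging
import logging.handlers


def build_auth_logger(handler: logging.Handler) -> logging.Logger:
    """Attach the given handler to a dedicated logger (hypothetical helper)."""
    logger = logging.getLogger("auth_example")
    logger.setLevel(logging.INFO)  # INFO so that successes are kept, not just failures
    logger.propagate = False
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def log_login(logger: logging.Logger, user: str, source_ip: str, success: bool) -> None:
    """Log BOTH outcomes, and always say who and from where (hypothetical helper)."""
    outcome = "accepted" if success else "failed"
    level = logging.INFO if success else logging.WARNING
    logger.log(level, "login %s user=%s src=%s", outcome, user, source_ip)


def central_handler(host: str = "loghost.example.com", port: int = 514) -> logging.Handler:
    """Forward records to a central syslog host over UDP.

    Assumption: the host resolves and accepts syslog on this port.
    """
    return logging.handlers.SysLogHandler(address=(host, port))
```

In production you would call `build_auth_logger(central_handler())` once at startup and then `log_login(...)` on every authentication attempt; for a quick local check, pass a `logging.StreamHandler()` instead so nothing has to resolve over the network. Retention and review are still on you: this only makes sure there is something worth keeping and reading.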
BTW, check out Branden’s musings about the same subject here.