Monday, March 24, 2008
So, you are a CSO for a major org (say a government agency, a bank or an Internet provider); you walk down the street and pass a typical street vendor selling books, software, etc. Suddenly you see "a database on DVD" for sale. You look closely and - oops! - it is your customer database with names, passport numbers, addresses, etc. Fun! :-)
Sunday, March 23, 2008
Am I a leading visionary in the field of log management? :-) Who cares - I will now pontificate as if I am :-) It is about time: specifically, timing logs. As I said in my Log Trust and Protecting Logs from Admins posts, the issue of trust is critical in the logging world. After all, logs = accountability; and the latter is unthinkable without trust. If we are to at least pretend that logs objectively record events and user actions, we need to unambiguously establish WHAT happened and WHEN. This post deals with the 'WHEN' issue.
So, can we trust that the time stamp in the log file or the one added by the log management system correctly describes when the event actually happened?
We will start by locating the timestamps in logs. Most log formats, including file-based logs (web, application, some security gear, etc), syslog, Windows event logs, database audit tables and proprietary ones, contain a timestamp. In fact, once I saw somebody use a timestamp to define logs as "timed records of IT activity." So, time is critical for logs being, well, logs :-) At this point it is worthwhile to note that file-based logs will contain a timestamp IN the file, while syslog records arriving over the UDP or TCP port 514 connection are usually timestamped upon arrival BY the syslog daemon (using its own "knowledge" of time) - and only then show up in the syslog files in /var/log.
Let's assess whether this "in-log timestamp" provides an adequate way of timing the actual event that is being logged. Answering this question is important for investigations and troubleshooting, but becomes nearly a matter of life and death in case of log forensics.
Here are some fun cases and issues to consider:
First, what are the chances of a completely false timestamp in logs (BTW, today is Jan 1, 1970!)? When might that happen? Typically when a logging system's own clock is reset or not set correctly. Such a timestamp clearly should NOT be trusted.
Second, we can say that it’s always 5PM somewhere: in other words, what time zone are your logs in? EST? PDT? GMT? UTC? Or any of more than 24 other possibilities. If you have no idea, you should not trust the timestamp.
Third, are you in drift? Is your system clock? Those pesky drift seconds turn into minutes, which then work to undermine the accuracy of timing the records (and thus your certainty and trust in evidence quality).
Fourth, syslog forwarder mysteries are plenty: some syslog messages will be delayed in transit and then be timestamped by the final recipient daemon, thus completely losing track of when the event was originally logged. Admittedly, this delayed syslog is rare, but as more people employ buffering syslog daemons (e.g. syslog-ng), it might happen more often.
Fifth, more esoteric, but still real (and really annoying): some system logs will contain two timestamps. If you don't possess in-depth knowledge of the specific log, the confusion can cut into the trust as well (so, which timestamp should I use?)
Sixth, most people will not think that they will fall for something that stupid: 24- vs 12-hour time. However, when facing an unknown (and poorly designed!) log format, beware that 5:17 might well be 17:17...
Finally, if you know that something got logged at 5:17AM, then when did it happen? Beware of "log lag!" This issue is actually too tricky to do it justice here... The simplest example is a process that leaves a log record when it exits, not when it starts, possibly days earlier (thus creating a log lag).
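The first few checks above can even be sketched in code. Here is a minimal Python sanity-checker for in-log timestamps; note that the 5-minute future-skew threshold is my own illustrative assumption, not an established forensic rule:

```python
from datetime import datetime, timedelta, timezone

def suspicious(ts: datetime, now: datetime) -> list:
    """Flag timestamps that should not be trusted at face value.

    A minimal sketch of the sanity checks discussed above; the
    thresholds are illustrative assumptions only.
    """
    flags = []
    if ts.year <= 1971:                      # clock reset to the Unix epoch
        flags.append("epoch-reset")
    if ts.tzinfo is None:                    # no time zone recorded at all
        flags.append("naive-no-timezone")
    elif ts > now + timedelta(minutes=5):    # a "log from the future" = drift
        flags.append("future-drift")
    return flags

now = datetime(2008, 3, 24, tzinfo=timezone.utc)
print(suspicious(datetime(1970, 1, 1, tzinfo=timezone.utc), now))  # ['epoch-reset']
print(suspicious(datetime(2008, 3, 24, 1, 0), now))                # ['naive-no-timezone']
```

This obviously does not make a timestamp trustworthy - it only catches the most blatantly untrustworthy ones.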
As we dive into more issues with timing logs, we also need to think about sequence timing and absolute timing. Sequence of logged events is a critical fact! Miss the sequence and the whole “house of cards” goes … But! Absolute time is also important! Can we be assured of both all the time? (hint: no)
So, when you look at logs next time and you see a timestamp there - start thinking about all this :-)
Thursday, March 20, 2008
So, here is one more notable piece, which has a bizarre quote: "And then there’s the fact that not many companies are aware of the need for log management as an element of compliance."
Really? Is anybody really that ... you know ... dim? I really want to get a copy of a "PCI Compliance" book and slap them with it :-)
Included are the old faves "Copying confidential information onto a USB memory stick", "Accessing web-based e-mail accounts from a workplace computer", etc.
But there is a truly bizarre one: "Sending workplace documents as an attachment in e-mail." WTH? Is sending them in the body of the email message better? Do they really mean "... to personal email"?
In any case, read it.
If you want to say that I should have written a paper on using text clustering for log analysis or on tricks for analyzing sendmail logs, please say it :-)
Now, it is a sad thing to see a security company "go poof" and I am sure this one had good people, but I think certain market common sense should apply... For example, I know some people who want to launch a DLP vendor. Now, if their data loss "prevention" technology is better than anybody else's, they will probably fail. However, if they looked at the problem from a different angle and solved some of the challenges that nobody can touch (and which are real), now we are talking....
Wednesday, March 19, 2008
"Welcome to the 4th Carnival of the Security Catalyst Community. Each week, a different member of the Security Catalyst Community takes a turn pointing out three to five posts from the community and three links to blog articles by members of the security catalyst community. If you are not a member, but would like to join us, information on the community is included at the bottom of the post."
Here are some forum posts that I enjoyed in the last few days that may also benefit you.
- A day in the life of a lost USB Drive... a scientific approach? covers just that - what risks stem from a found USB drive. I am sure my readers won't pick up a stray USB drive and stick it in their PC ... NOW ... will they? :-)
- Compliance Measurement and Verification Solutions covers tools and some fun discussion on whether such tools to "verify compliance" [with what?] are even possible. If you are looking to automate the WHOLE compliance .... keep looking (and see ya in year 3000 :-))
- Do you trust small vendors? made my blood curdle at first (darn it, why do you trust LARGE vendors? :-)), but there is some discussion on why "small is better" (sometimes)
Here are some recent blog posts from the members of the Security Catalyst Community
Thinking of joining the Security Catalyst Community - here is how:
To create your account, point your browser to: http://www.securitycatalyst.org/forums/ and register an account. Please register using your real full name in the following format: firstname.lastname (we generally use all lower case and separate the names with a period). This is important for our community of professionals. Accounts are reviewed quickly and activated. Your currency of the community is your participation. We look forward to learning from you!
Past carnivals are:
- Carnival of the Security Catalyst Community for Tuesday, February 26, 2008
- Carnival of the Security Catalysts Community for 03/04/08
- Carnival of the Security Catalysts Community for 03/11/08
Tuesday, March 18, 2008
"Is it safe to continue shopping in your stores?
We have continually devoted significant round-the-clock resources to ensure Hannaford has comprehensive data security systems in place. For example, our security measures meet industry compliance standards and many go above and beyond what is required by industry standards."
Are they alluding to PCI here? I think so ... So, is this a PCI failure? Or is this simply a reflection of the fact that you CAN be 0wned, no matter how many compliance hurdles you overcame...?
Friday, March 14, 2008
As I mentioned before, I received a lot of fun questions from the audience during our "Log Management Thought Leadership Roundtable Webcast" (recording, some comments). Since they would be useful to my readers, I am answering some of them here (questions are anonymous and slightly rewritten for clarity):
Q1: When you mention "forensics", are you speaking in term of legal forensic terminology - or in terms of incident investigation?
A1: When I say "forensics", I usually mean it in the legal sense. I call other investigations simply "incident investigations;" forensics carries an extra burden of proof and seeks to establish facts, not just "good hunches."
Q2: Are there solutions that can handle 2-3 Terabytes of log data per minute?
A2: No. Easy, huh? :-) See this for a specific example. Well, let me take this back: theoretically, you can always use a vendor that can handle a lot of data (like LogLogic) AND that has an ability to run a distributed operation across many appliances. The catch? You will need a lot of appliances, since 2-3 TB/minute works out to roughly 170-250 million log messages/second (assuming an optimistic 200 bytes/message)
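For the curious, here is the back-of-the-envelope arithmetic behind that figure as a quick Python sketch (200 bytes/message is the same optimistic assumption as in the answer):

```python
# Convert a log volume in TB/minute to messages/second,
# assuming an (optimistic) average message size of 200 bytes.
BYTES_PER_MSG = 200

def msgs_per_sec(tb_per_minute: float) -> float:
    # 1 TB taken as 10^12 bytes; divide by message size, then by 60 seconds
    return tb_per_minute * 10**12 / BYTES_PER_MSG / 60

for tb in (2, 3):
    print(f"{tb} TB/min ~ {msgs_per_sec(tb) / 1e6:.0f} million messages/second")
```

Larger average message sizes would shrink the number proportionally, but the order of magnitude stays daunting either way.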
Q3: I have terabytes of log data, but how can all this data be analyzed? Are there products that can process all of it and extract valuable information?
A3: Yes, but you need to ask one question first: analyze why (example reasons here)? To discover something "interesting" (my favorite reason)? To find some specific artifact that you need in the logs? Or for some other reason? Before anybody can answer a question about "are there tools to 'analyze this'?", you'd need to answer that dreaded "why" question.
Q4: We were told to log every access to every SQL database in our environment. Is this even feasible with the best products on the market?
A4: Yes, it is. However, one needs to be extra careful with this. Look at this post for options and ideas. It may turn out that logging every SELECT statement and then collecting those native database logs will not be the best approach (mostly for database performance reasons) and a dedicated tool will need to be used. Database built-in auditing is better used for selective auditing.
Q5: Once logs are captured, and centrally stored, who should be responsible for the management and review of those logs?
A5: Good question! Really, this is a very good question that a) is important to have answered and b) does not have an "accepted," standard answer. It also depends upon what logs are those; let's assume the most complex scenario of a diverse set of logs from networks, systems and applications. So, the choices are: security team (sometimes: CIRT i.e. incident response team), some dedicated team in IT that provides "log services" (uncommon option, but growing in popularity) or some unit in IT that is responsible for regulatory projects (if compliance driven). If your answer is nobody, then you will be in trouble :-) If you answer wrong, you might have to fight to access your own logs (example)
Q6: Most of the discussion so far is about how to get started. What about after the system is deployed? Products tend to focus on collection and not on action or response. Where are the tools heading in terms of usability, incident tracking, collaboration?
A6: That's a long story, really, and it is hard to provide a short answer to this. Yes, collection has been a focus of products in the last few years, but now we are at a point where analysis and various uses of the data will come to the forefront. At the very least, you should be able to run reports and searches on the logs that you collected.
Q7: Do vendors typically offer a template of which logs to collect based on the desired use cases?
A7: They should, yes :-) In some cases what you have is a bit of a push-pull between a vendor and a customer: "Tell us what to do?" - "First, you tell us what you would like to accomplish?" - "No, really, you tell me what I should be looking to accomplish." - .... sometimes ad infinitum. Also, for some use cases it is hard to come up with a credible list (see this discussion about PCI DSS here)
Q8: What are the biggest difficulties when the log management solution is going to be integrated and deployed in an organization with a lot of different log sources?
A8: Political boundaries and "log ownership issues" (see some discussion here) If you need to submit a paper form in triplicate to add a line to /etc/syslog.conf and then send more forms when something doesn't work right and you need to troubleshoot it (a real story), everything becomes painfully slow and inefficient.
Wednesday, March 12, 2008
In this post called "A silent explosion", they say: "At first sight, logging infrastructure might seem simple, and log management trivial. This might have been true in the past, but nowadays it is unarguably a process of strategic importance, and not only because of the standards or regulations. Information is power, and you cannot guarantee the security of a large IT system without logs. The idea is simple: Collect the logs to a central place, preferably using an encrypted channel. Get proper filtering and archiving. Finally, add some intelligence and analyzing capabilities, and you will know what is happening on your network."
I am so looking forward to what they will come up with ...
If you need additional motivation (why?), then learn that Mike R called this " The Mogull just laid out your work for the next 10 years." :-)
Following the tradition of posting a tip of the week (mentioned here, here ; SANS jumped in as well), I decided to follow along and join the initiative. One of the bloggers called it "pay it forward" to the community.
So, Anton Security Tip of the Day #14: More access_log Fun: What Are You Not GETting?
In this tip, we will look at some bizarre artifacts that show up in web server access logs today. Here we have a production log from an Apache web server that is full of interesting (and sometimes ominous!) little mysteries that we will investigate in order to determine their impact on security and operational health of the site.
Logs contain more mysteries than we have time for, so we will focus on a few of them: specifically, unusual web request methods. Let's see who is trying to POST or use some other method (OPTIONS, HEAD, PUT or something else - see a list here) on our site, instead of just GET'ting the content (the GET command is used by web browsers to retrieve pages, while POST is used to upload content, press buttons, etc - at least in "web 1.0" land - see earlier tip #12, where a POST request was found in proxy logs)
Here is one little artifact that attracted my attention due to a POST request against a web forum as well as a battery of slashes (which actually increases in subsequent requests - of which there were many)
10.10.102.250 - - [12/Feb/2008:16:10:50 -0500] "POST /phpBB3////ucp.php?mode=register&sid=e5efaa77a777066c61f71808e9e57b19 HTTP/1.0" 200 14397 "http://www.example.com/phpBB3///ucp.php?mode=confirm&id=7640df05c7e24b7acf7a68800fe6dc59&type=1&sid=e5efaa77a777066c61f71808e9e57b19" "Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.2) Gecko/20021126"
10.10.102.250 - - [12/Feb/2008:16:12:29 -0500] "POST /phpBB3///////////////ucp.php?mode=login&sid=e5efaa77a777066c61f71808e9e57b19 HTTP/1.0" 200 9355 "http://www.example.com/phpBB3//////////////ucp.php?mode=login&sid=e5efaa77a777066c61f71808e9e57b19" "Mozilla/5.0 (Windows; U; WinNT4.0; en-US; rv:1.2) Gecko/20021126"
This one really is a mystery; what do we know about it? The server responded to the request OK (code 200), so the POST actually happened. The first request was a request to register with a web discussion board and the second was a request to login. Multiple slashes are actually ignored by the web server, so why put them in the request (no answer)? Also, I think that the User-Agent is spoofed ... do you know why? Finally, if I see something like that in my logs, I will definitely investigate it, primarily due to the fact that Apache responded with 200 OK code.
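Since the web server collapses those repeated slashes before serving the request, it helps to normalize paths the same way before matching or counting them during log review; a tiny Python sketch:

```python
import re

def canon(path: str) -> str:
    # Collapse runs of '/' the way the web server effectively does,
    # so '/phpBB3////ucp.php' and '/phpBB3/ucp.php' count as one resource
    return re.sub(r'/{2,}', '/', path)

print(canon('/phpBB3////ucp.php'))                # -> /phpBB3/ucp.php
print(canon('/phpBB3///////////////ucp.php'))     # -> /phpBB3/ucp.php
```

Without this, the slash-padded requests would evade any exact-match rule or per-URL counter you point at the logs.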
The next one is so classic it's dumb (and so dumb, it's a classic :-))
10.10.123.226 - - [12/Feb/2008:03:46:54 -0800] "POST /_vti_bin/shtml.exe/_vti_rpc HTTP/1.1" 404 - "-" "MSFrontPage/6.0"
10.10.123.226 - - [12/Feb/2008:03:46:55 -0800] "OPTIONS / HTTP/1.1" 200 20210 "-" "Microsoft Data Access Internet Publishing Provider Protocol Discovery"
It is probably one of the ancient IIS attacks (check out this fun BlackHat preso on that, circa 2003) - why someone would probe for it now is beyond me. In any case, Apache on Linux and "*.exe" don't mix :-)
The final log record is also fun:
10.10.101.222 - - [12/Feb/2008:15:33:22 -0800] "PUT /zk.txt HTTP/1.0" 405 223 "-" "Microsoft Data Access Internet Publishing Provider DAV 1.1"
The above uses a PUT request, which is rarely enabled on production web servers these days; the purpose of the above is clearly malicious. In fact, a sanely configured Apache shouldn't even allow it, thus it responds with code 405 "Method Not Allowed." Nothing to worry about (even though some poor critter got owned with that! BTW, if you follow that link, check out HTTP response code 201 - if you see it in your logs, run! :-))
Overall, this tip teaches us to look for unusual request methods (POSTs to strange pages, all PUT, DELETE, OPTIONS requests, etc) and then check the response codes to assess the impact. If your web server happily executed such a strange request (code 200), then you need to dig further. And if you are "lucky" :-) and see the response code 201 "Run for the Hills" (in reality, it stands for "Created"), then you can go straight into incident response mode.
Another lesson to learn is that if you see too many POSTs or too many "GET then POST" sequences from the same IP in rapid succession, investigate it since no legitimate access should produce such a pattern...
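Both lessons can be sketched in a few lines of Python: flag non-GET requests that drew a "success" response, and count POSTs per source IP. This is a rough illustration only; the regex assumes Apache combined log format, and which codes count as "interesting" simply follows the tip's logic:

```python
import re
from collections import Counter

# Matches combined-log-format lines: IP ... [time] "METHOD path HTTP/x" code ...
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3})')

def review(lines):
    """Flag non-GET requests with success codes; count POSTs per source IP."""
    findings, posts_per_ip = [], Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # malformed or non-combined-format line
        ip, method, path, code = m.groups()
        if method != "GET" and code in ("200", "201"):
            findings.append((ip, method, path, code))
        if method == "POST":
            posts_per_ip[ip] += 1
    return findings, posts_per_ip

sample = [
    '10.10.102.250 - - [12/Feb/2008:16:12:29 -0500] "POST /phpBB3/ucp.php HTTP/1.0" 200 9355 "-" "Mozilla/5.0"',
    '10.10.101.222 - - [12/Feb/2008:15:33:22 -0800] "PUT /zk.txt HTTP/1.0" 405 223 "-" "MS DAV 1.1"',
]
findings, posts = review(sample)
print(findings)  # only the successful POST is flagged; the 405 PUT is not
```

From there, sorting `posts_per_ip` and eyeballing any IP with a rapid burst of POSTs gives you the "GET then POST" pattern check from the paragraph above.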
As further reading, I heartily recommend this paper: "Detecting Attacks on Web Applications from Log Files"
A few representative quotes: "Honestly, no one wants to buy IT security. People want to buy whatever they want -- connectivity, a Web presence, email, networked applications, whatever -- and they want it to be secure." (do they really?)
"And sooner or later the need to buy security will disappear. " (bullshit, I say! :-) - analogous 'some day the need to have police will disappear...')
"It will disappear because IT vendors are starting to realize they have to provide security as part of whatever they're selling. " (year 3000?)
"IT is infrastructure. Infrastructure is always outsourced. And the details of how the infrastructure works are left to the companies that provide it." (hmmmm... is your information infrastructure? no!)
Mike R comments on that (here): "But the idea that the answer is neither and that outsourcing will be the death knell in the security business is interesting, but ultimately wrong. [...] Trying to wait for Big Security to die would give new meaning to the long and slow goodbye."
Enraged? Think he is pushing it too far? Being illogical? Me too :-) I don't think the TJX example just goes and "disproves" it; we don't really know how it works with breaches and stock prices (some say 4-8% down, some say none, some say 'major impact', whatever...)
He then clarifies: "But let me point out that TJX has attributed $200 million in direct costs to this breach. It is easy to surmise this is bigger than just about anyone’s security budget. In TJX’s case some well known security practices and a little security spending would have avoided this whole incident."
Overall, a fun read. Still, I think breach impact assessment and breach's impact on anything (much less the stock price...) is not really well-defined or understood yet ...
Tuesday, March 11, 2008
Many of the "usual suspects" were there; some of the "die-will-you-die-already ... please" vendors made a showing (probably by selling those newly unneeded chairs to pay for the booth space).
I love to talk to people in the same or adjacent markets as LogLogic (euphemism for "competitors" :-)), some are friendly and you can have a fun and insightful conversation with them (with neither of us disclosing any deep and dark secrets about our solutions ...), others are obnoxious and think you are "out to steal their brochures."
However, the most fun part will definitely happen on Thursday - a Log Management Summit. MISTI folks planned a few very fun panels; will there be a vendor fight? A mud-slinging match? We'll see ...
As you know, I have long been on a quest to save the world from having to write long and ugly regular expressions (regexes) for log analysis. Back in 2005 (post, big discussion that ensued) and later in 2007 (post, another big discussion that again ensued), I have tried to poll people for approaches that convert logs into useful information without messing with massive quantities of regular expressions as well as performed some research on my own. In all honesty, I didn't notice a major breakthrough.
Until now? Here ("prequel" here and follow-up here) is what looks like an interesting and major development along that line. Indeed, one can automate the processing of some "self-describing" log formats (name=value pairs, comma/tab delimited with descriptive header, sequential names and values [yuck!], XML, etc) to obtain a semblance of structured data (not just a flow of text logs) from logs without any human involvement.
But is that the endgame, that "holy grail" of log analysis, or yet another step towards it? First, bad logs break it (e.g. names with spaces, or values with spaces and without quotes) and thus call for the return of a human logging expert to write an even fancier regex that can deal with it (then again, bad logs often break human-written rules as well). Second, there is a more important issue that I will bring up. So, if logs contain "user=jsmith" we can certainly learn a new piece of info (that the "user" was probably "jsmith"). But what if they contain "bla_bla=huh_huh" - and we don't know what "bla_bla" and "huh_huh" mean? Do we really have more information at hand if we tokenize it as "object called 'bla_bla' has the value of 'huh_huh'" compared to just having a single blurb of text "bla_bla=huh_huh"? I personally don't think so - but I've been known to be wrong before :-)
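To make the "bad logs break it" point concrete, here is a minimal Python name=value tokenizer and the kind of line that defeats it (the log lines are made up for illustration):

```python
import re

def tokenize(line: str) -> dict:
    """Naively extract name=value pairs from a 'self-describing' log line.

    A minimal sketch: pairs are either quoted values or runs of
    non-whitespace, which is exactly what unquoted spaces break.
    """
    return dict(re.findall(r'(\w+)=("[^"]*"|\S+)', line))

good = 'user=jsmith action=login src=10.0.0.1'
bad = 'user=John Smith action=login'   # unquoted space in the value

print(tokenize(good))  # all three pairs recovered
print(tokenize(bad))   # 'user' becomes 'John'; 'Smith' is silently lost
```

This is the point where the human logging expert gets called back in to write that fancier, format-specific regex.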
So, let's review what we have: I decided to organize the current approaches to logs in the form of this table (hoping to start a discussion!)
| | Text Indexing | Field Extraction (Algorithmic) | Rule-based Parsing (Manual) |
|---|---|---|---|
| Effort | Easy: no human effort needed, just collect the logs and go | Easy: no per-log effort on behalf of the log analyst (but some creative code needs to be written) | Hard: an expensive logging expert must first understand the logs and then write the rules; normalization across devices implies having a uniform data store for logs |
| Output | Low-quality information; rather, a flow of raw data (needs more analysis) | Mixed: some new information emerges, but not in all cases (and you can't predict when); in general, no cross-device analysis is enabled ('user' is not the same as 'usr' in another log) | High-quality output: tables, graphics, summaries and easy correlation across diverse log sources (highly useful information!) |
So, what can we conclude? It is too early to retire the human-written rules (so people will still have '\s' and '\w' coming up in bad dreams... :-)), but this automated approach should definitely be used on the logs that will "allow you to do it to them." :-) Personally, I am also very happy that somebody is thinking about such matters ...
The opening salvo is the paper called “The Fallacy of Information Security ROI” by Jos Pols ("ISSA Journal", February 2008), where he argues against ROI for security (since there is no money earned by security, just savings, which are NOT the same thing); he proposes a "security as insurance" model which, in all honesty, I am not too comfortable with (since security doesn't "pay you back" after the breach).
ROI proponents "hit hard" in return: 'One is Jos Pols who, in his recent article “The Fallacy of Information Security ROI” in the February 2008 issue of the ISSA Journal (membership required to access link resource), claims that one cannot have a return where there is no income.' They next bring back the "return in the form of savings" (which many disagree with ...): 'this is an overly restrictive view of the meaning of the word “income.” The avoidance of potential losses redounds to the bottom line, as does revenue, so that a cost saving is a return on an investment.' Read the whole pro-ROI counter-point here.
Previous "ROI War" is cataloged here. A new one is upon us? Unholster your handguns, charge the lasers, enrage your attack hamsters - hurraaaaaaaah!!!!! :-)
Monday, March 10, 2008
If any of my readers are at the conference and would like to meet, drop me an email or something :-)
Friday, March 07, 2008
My next fun logging poll is here - please vote! It is about tools for centralized collection of Windows Event Log from servers and other systems. One of the somewhat surprising discoveries from my previous poll was that few people look at Windows logs; this poll drills down into it.
UPDATE: just looked at the results collected so far, and I would like to say this: why - oh - why some people want to turn an honest research effort into a vendor war? Ye bastards, :-) you know who you are ...
Past logging polls and their analysis:
Thursday, March 06, 2008
This poll on looking at logs was relatively popular; let's see what we can learn (live results are also here).
First, what are the top 3 log types that people look at? They are:
- Unix/Linux server syslog
- Web server logs
- Firewall logs
How does that compare with the top 3 log types that people collect (see picture showing results from my previous poll below)?
- Unix/Linux server syslog
- Firewall logs
- Web server logs
Huh? They are the same - doesn't it just make sense? What are the possibilities here?
a. People only collect the logs they plan to look at, OR
b. People look at logs they collect (duh!).
Strangely, I find a) unlikely; I think most people collect more than they can review and that the incident/issue response and compliance needs drive collection more than review or analysis.
Another observation is that all of the "big 3" log types are useful for security, operations and compliance and not just for security (like NIDS/NIPS logs). Is that why they are so popular?
Second, I was fearful that "I only look at whatever logs needed for the incident/issue investigation" will win. It didn't!!! This to me indicates that proactive log review is not as unpopular as I feared. Good! It is working.
Third, obviously, nobody (well, 4%...) looks at all logs they collect.
Fourth, many more people look at Unix/Linux logs than Windows server logs (a factor of 3x); this is not entirely unexpected, and my next poll will drill down into this.
Finally, I am SHOCKED that people don't look at NIDS/NIPS logs (only 11% do). People, what's wrong with you? :-) Why have you deployed those beasts if you don't look at what they produce? Then again, maybe you haven't :-(
Next poll coming up!
On an unrelated note, Hoff's comments on McGovern's "Ten Mistakes That CIOs Consistently Make That Weaken Enterprise Security" are very fun too. Example quote: "Mistake 3: Putting network engineers in charge of security: When will you learn that folks with a network background can't possibly make your enterprise secure." Read on!
Monday, March 03, 2008
I saw this idea of a monthly blog round-up and I liked it. In general, blogs are a bit "stateless" and a lot of good content gets lost since many people, sadly, only pay attention to what they see today.
So, here is my next monthly "Security Warrior" blog round-up of top 5 popular posts and topics.
- Finally, one post I wrote this month bumped the "anti-virus saga" from the #1 popular spot: Welcome to the Platform Club! :-) post discusses requirements for a log management platform (and makes fun of some folks in the process ...)
- Now pushed to the #2 spot, next is the topic of anti-virus efficiency. Here are the posts: Answer to My Antivirus Mystery Question and a "Fun" Story, More on Anti-virus and Anti-malware, Let's Play a Fun Game Here ... A Scary Game, The Original Anti-Virus Test Paper is Here!, Protected but Owned: My Little Investigation, A Bit More on AV and Closure (Kind of) to the Anti-Virus Efficiency/Effectiveness Saga
- Next are again my Top11 logging lists: Top 11 Reasons to Collect and Preserve Computer Logs and Top 11 Reasons to Look at Your Logs (the third list, Top 11 Reasons to Secure and Protect Your Logs, was not quite that popular - I long argued that, sadly, few people care about log security yet). A new one was also added to the list: Top 11 Reasons to Analyze Your Logs. Check it out!
- PCI compliance is still all the rage! So, MUST-DO Logging for PCI? post was propelled to a place in my Top5 popular posts list. It discusses the fact that there is no "easy list" of what you MUST do to comply.
- My logging polls are hot as well. Specifically, the analysis of my newest poll (Logging Poll #5 "Top Logging Challenges" Analysis) is popular.
See you in March - I will continue to make logs popular, research new log analysis methods and make fun of some people (of course!) :-)
Possibly related posts / past monthly popular blog round-ups:
- Monthly Blog Round-Up - January 2008
- Monthly Blog Round-Up - December 2007
- Monthly Blog Round-Up - November 2007
- Monthly Blog Round-Up - October 2007
- Monthly Blog Round-Up - September 2007
- Monthly Blog Round-Up - August 2007