Monday, June 29, 2009

Free Log Data For Research - Update

This WASL 2009 workshop reminded me that I always used to bitch that academic researchers use some antediluvian data set (the Lincoln Labs 1998 set used in 2008 “security research” makes me want to just curse and kick people in the balls, then laugh, then cry, then cry more…).

However, why are they doing it? Are they stupid? Don’t they realize that testing their “innovative intrusion detection” or “neural network-based log analysis” on such prehistoric data will not render it relevant to today’s threats? And will only ensure ensuing hilarity :-)

Well, maybe the explanation is simpler: there is no public, real-world source of logs that allows comparison between different security research efforts.

Correction! There wasn’t. And now there is!

I am hereby acting on my promise to share my collection of real-world logs, mostly collected from systems in the honeynets I ran in 2004-2006. As of now, if you need logs for research, please contact me or get them directly here.

Here is the description of the collection currently shared (more to come!):


Size: 100MB compressed; about 1GB uncompressed

Date collected: 2006

Type: Linux logs /var/log/messages, /var/log/secure, process accounting records /var/log/pacct, other Linux logs, Apache web server logs /var/log/httpd/access_log, /var/log/httpd/error-log, /var/log/httpd/referer-log and /var/log/httpd/audit_log, Sendmail /var/log/maillog, Squid /var/log/squid/access_log, /var/log/squid/store_log, /var/log/squid/cache_log, etc.

License: public; use for whatever you want. Acknowledging the source is nice; Beerware license is even better.

Sanitization: None. The logs are shared exactly as collected, with no modification, and you do not need to sanitize them before using them for research.


So, for now, if your research requires real-world logs with normal operation data, suspicious data, anomalous data and attack data – grab it here.
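If you want a quick start on the syslog-style files (such as /var/log/messages and /var/log/secure), here is a minimal Python sketch. It assumes the classic "Mon DD HH:MM:SS host program[pid]: message" line layout, which has no year in the timestamp, so the year is passed in (2006 by default, matching the collection date above). The function names are made up for illustration and are not part of the log collection; the Apache, Squid and pacct files use other formats and are not handled here.

```python
import re
from collections import Counter
from datetime import datetime

# Classic BSD-style syslog line: "Mon DD HH:MM:SS host program[pid]: message".
# The "[pid]" part is optional (kernel messages, for example, have none).
SYSLOG_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog_line(line, year=2006):
    """Return a dict for one syslog line, or None if it does not match.

    `year` is an assumption supplied by the caller, because the
    timestamp in the line itself carries no year.
    """
    match = SYSLOG_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    # Single-digit days are space-padded ("Mar  5"); collapse the padding.
    ts_text = " ".join(match.group("ts").split())
    timestamp = datetime.strptime(f"{year} {ts_text}", "%Y %b %d %H:%M:%S")
    pid = match.group("pid")
    return {
        "time": timestamp,
        "host": match.group("host"),
        "program": match.group("program"),
        "pid": int(pid) if pid is not None else None,
        "message": match.group("message"),
    }


def count_by_program(lines, year=2006):
    """Count parsed lines per program name; unparsed lines are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_syslog_line(line, year)
        if record is not None:
            counts[record["program"]] += 1
    return counts
```

Passing an open file to count_by_program gives a rough per-program line count, which is a cheap first look before any real analysis. Lines that do not fit the layout (for example "last message repeated N times") come back as None and are skipped rather than guessed at.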

UPDATE: I have created a Google Group, log-sharing, to notify those interested about the shared logs. Please sign up here. The purpose of the group is to announce newly shared logs, discuss the shared logs, collect references to research that uses the logs, post requests for more logs, discuss the events observed in logs, etc.

UPDATE2: the logs are now hosted here, courtesy of one of my readers who prefers to remain anonymous. Thanks A LOT for hosting the logs! Despite the fact that the logs are fully public now, I suggest you still sign up for the Google group as I will announce new log sharing there.


Dr Anton Chuvakin