"Log analysis is something that all systems administrators know they need to do. Many of us come to this point, either because there is a problem, there is a security requirement from the organization, or it keeps you up all night wanting to know what is going on in all of that data. Looking for best practices for log analysis on The Internet is difficult at best. Many years ago, I discovered a script that hashed log files by removing all of their numbers and replacing them with "#" characters. The results of this simple algorithm were phenomenal, logs could be reduced by a factor of ten. This was much more readable, yet left much of the quality data that I needed to determine if there was a problem. In the years since I discovered that simple algorithm, I have come to discover many techniques on text analysis which are commonly used in linguistics and anthropology to analyze natural languages. This has led me to develop very simple best practices for analyzing logs."
Download links broken, no release since 2011, and lots of unresolved/ignored issues reported.