Statistical Mirages

I have to admit that I’ve always been suspicious of statistics as they are used in computer security.  It turns out I should have been just as suspicious of statistics in other sciences.  When you run lots of tests against a large dataset, random chance alone makes it likely that at least one of those tests will come up with something “significant.”  Hence you get results that suggest breathing in bus exhaust is good for you.  In other words, you get false positives.
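To put rough numbers on that claim, here is a minimal sketch in Python.  It assumes the tests are independent and that each uses the conventional 0.05 significance threshold; neither assumption comes from any particular study, and the function names are made up for illustration.  It computes the chance of at least one false positive among many tests on pure noise, then checks that figure with a small simulation.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests.

    Assumes there is no real effect anywhere, so each test has
    probability alpha of coming up "significant" by chance alone.
    """
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positive_rate(num_tests, alpha=0.05, trials=10000, seed=0):
    """Estimate the same quantity by simulation.

    With no real effect, a p-value is uniformly distributed on [0, 1],
    so each test is modeled as a uniform draw compared against alpha.
    """
    rng = random.Random(seed)
    studies_with_a_hit = 0
    for _ in range(trials):
        if any(rng.random() < alpha for _ in range(num_tests)):
            studies_with_a_hit += 1
    return studies_with_a_hit / trials


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        exact = familywise_error_rate(n)
        estimate = simulate_false_positive_rate(n)
        print(f"{n:3d} tests: exact {exact:.3f}, simulated {estimate:.3f}")
```

Under these assumptions, 20 tests on data with nothing in it give roughly a 64% chance of at least one "significant" result.  Real tests on the same dataset are usually correlated, so the exact figure will differ, but the direction of the effect is the same.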

If anomaly detection is ever to become a fundamental defense technology, it will have to move beyond statistics to being grounded in the mechanisms of computers and the real behaviors of users.  That is going to take a while, because it is a lot harder than just running a bunch of tests on datasets.  Of course, given the current disrepute of anomaly detection in security circles, perhaps the door is wide open for better approaches.