Antimalware Testing is Hard, Disputing a Flawed Test is Even Harder
First things first: I’d like to introduce myself. I’m senior security analyst Randy Abrams, and I’m delighted to be part of the Webroot team and our online community.
Prior to joining the team, I was a research director responsible for analyzing and reporting the test results of antimalware products. I also helped create test methodologies and looked for anomalies in the testing process. Before that, I worked as a director of technical education, where my role was very similar to my mission here at Webroot: to help all users stay more secure on the internet. Everything else I do is a means to that end.
Earlier in my career, I spent 12 years at Microsoft working closely with researchers from major antimalware companies, as well as from several smaller antivirus companies, to ensure we did not release infected software. As a result, I bring a unique perspective to my role here at Webroot. I am a consumer (I use Webroot on my laptop). In the past, I have been an enterprise customer, worked at a test lab, and served on the vendor side of the industry.
Testers are Human. They Do Not Always Get it Right.
One of the most contentious parts of working in the security industry is antimalware testing. I used to joke that the reason antimalware testers are so arrogant is that antimalware researchers created them in their own image. Relationships between testers and vendors have improved quite a bit in the past few years, but there is still a lot of friction. Scoring poorly on a test affects sales, and when a poor score is the result of testing mistakes, users do not get the quality of information they need to properly compare products.
Unfortunately, antimalware testing is really, really hard to get right. And it is not because of incompetence. You will never hear me say, or even imply, that testers are incompetent. The reason that testing is so hard to get right is because antimalware products have become so complex. There are interdependencies on cloud-based protections, reputation systems, whitelisting and blacklisting, scanning and remediation, and in some cases, like ours, system rollback. Additionally, sample acquisition and selection are problematic.
In the past, I have seen serious mistakes that made products offering high-quality protection appear mediocre or worse. For vendors, a deeper problem arises when trying to dispute the test results. The public tends to think that testers always get it right and that vendors are just whining because they didn’t receive the score they thought their product deserved. No vendor was ever acquitted of a bad test in the court of public opinion.
In upcoming blog posts, I will discuss some of the challenges testers face, and the impact these have on accurately presenting the information needed to make informed choices when selecting security software. As a former research director for a test lab, I dealt with issues, like the selection of samples, that could seriously impact reported results. Sample selection and acquisition are much more difficult than they would seem. When you see several vendors score 100 percent, the sample set was too small.
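To see why a perfect score can signal a sample set that is too small rather than a flawless product, consider the arithmetic. The sketch below is my own illustration with invented numbers, not data from any real test: if a product truly misses some fraction of threats, the chance it still detects every sample in a set shrinks as the set grows.

```python
# Illustrative sketch (invented numbers): a product that misses 1% of
# threats will still often post a "perfect" 100% score on a small set,
# because the probability of drawing zero missed samples stays high.

def prob_perfect_score(true_detection_rate: float, sample_count: int) -> float:
    """Probability that a product detects every sample in a randomly
    drawn set, assuming independent samples."""
    return true_detection_rate ** sample_count

# A product with a 99% true detection rate:
print(round(prob_perfect_score(0.99, 50), 2))   # likely "perfect" on 50 samples
print(round(prob_perfect_score(0.99, 500), 2))  # very unlikely on 500 samples
```

With only 50 samples, a 99%-effective product scores 100% more often than not; with 500 samples, a perfect score would be rare. So when every vendor posts 100%, the test set was probably too small to separate them.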
When measuring performance, the ratio of file types (.exe, .bmp, .mp3, .docx, etc.) and the compression methods used in the file set make a huge difference in real-world performance results. Even the files selected for false positive testing can affect the perceived quality of a product. Which is more important: detection of a file we see attacking tens of thousands of users, or a file we saw three times six months ago and never again? Given equal protection scores, would you rather have a product with one false positive or one with three false positives? These are a few of the issues that affect the quality of testing and the understanding of what was actually tested. Believe it or not, one of the biggest problems with testing is not the results; it is the lack of meaningful analysis.
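The question about prevalence can be made concrete with a small sketch. Everything here is hypothetical (the sample names, prevalence counts, and detection outcomes are invented for illustration): two products that each detect half of the test samples can deliver wildly different real-world protection once you weight each sample by how many users it actually attacked.

```python
# Hypothetical data: product A catches the widespread threat, product B
# catches the rare one. Raw sample counts make them look identical.
samples = [
    {"name": "mass_worm.exe",   "prevalence": 50000,
     "detected_by_a": True,  "detected_by_b": False},
    {"name": "rare_trojan.exe", "prevalence": 3,
     "detected_by_a": False, "detected_by_b": True},
]

def weighted_score(samples, key):
    """Fraction of observed attacks (not samples) that were detected."""
    total = sum(s["prevalence"] for s in samples)
    hit = sum(s["prevalence"] for s in samples if s[key])
    return hit / total

# Unweighted, each product detects 1 of 2 samples (50%). Weighted by
# prevalence, product A protects almost every attacked user; B almost none.
print(f"A: {weighted_score(samples, 'detected_by_a'):.4f}")
print(f"B: {weighted_score(samples, 'detected_by_b'):.4f}")
```

The design point is simply that a detection score is only as meaningful as the weighting behind it, which is one reason the analysis accompanying test results matters as much as the numbers themselves.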