disLEXia: Sat, 27 Oct 2001

Saturday, October 27, 2001

Risk of monoculture and exponential false AV positives
I'd like to point out two related risks: the risk of monoculture and the risk of a potential exponential increase in spurious collisions between legitimate software and anti-virus.
First, I'll summarize a complaint (on a mailing list) from a consultant: a popular AV (anti-virus) software package may be disallowing operation of normal software as being possibly viral. Of course, the "safe" solution the AV chooses is to disallow some file access by the offending software.
This simplistic, inflexible default is exacerbated by similar inflexibility on the part of the IT group, which tends toward monoculture, admittedly in the face of overwhelming complexity. By monoculture, I mean restricted support of or interest in any software outside a narrow list of approved vendors.
The consultant uses a niche product with which the IT department is unfamiliar. Because they lack the competence to check his claim of innocence, he must assume the burden of proof. Furthermore, he has no authority to conduct a simple test: switching the anti-virus off and on again to show that it, not his software, is the problem.
The risk of monoculture is further raised by the speculation that the AV conflict may be caused by his software directly writing files with binary data instead of using a more standard, and increasingly common, access method such as ODBC.
This leads us to the 2nd risk: (possibly) exponentially increasing AV false positives.
I once had a similar problem with an AV: an optimization I was running triggered a virus warning and stopped the run. I suspected that the bit pattern of an intermediate file was matching that of a "known virus", so I shortened the inputs to the optimization by the least significant digit, thus slightly changing these intermediate values, and it ran without a problem after that. Fortunately I knew my results were not sensitive to such a small change.
As in the case above, I was using specialized, niche software. However, the other risk this illustrates is that the number of false positives from AV scales with the product of 2 numbers: how many different signatures (indicators of known viruses) are being checked and the number of different intermediate results any software may produce.
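To make the product argument concrete, here is a minimal sketch in Python. It assumes every (signature, intermediate result) pair has the same small, independent chance of a spurious match. That per-pair probability, the function name, and the numbers are all hypothetical, chosen only for illustration and not taken from any real AV product.

```python
def expected_false_positives(num_signatures, num_outputs, p_match):
    """Expected number of spurious matches.

    Assumption (not from any real scanner): each signature/output pair
    matches by accident with the same independent probability p_match.
    """
    return num_signatures * num_outputs * p_match


# Hypothetical figures for illustration only.
base = expected_false_positives(1_000, 1_000, 1e-9)
doubled = expected_false_positives(2_000, 2_000, 1e-9)

# Doubling both factors quadruples the expected number of false hits.
print(base, doubled, doubled / base)
```

Under these assumptions, growth in either factor alone raises the expected count linearly, while growth in both at once compounds.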
Both of these factors are increasing over time. This increase may be exponential (in the loose sense) because, at first glance, this likelihood of collision resembles the Birthday Problem. This is the well-known, non-intuitive result that there's about a 50% chance that 2 people, out of a random group of 23, will share a common birthday.
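The Birthday Problem figure can be checked directly. The sketch below assumes 365 equally likely birthdays and ignores leap years. The function names are illustrative only.

```python
def shared_birthday_probability(group_size, days=365):
    """Chance that at least two people in the group share a birthday,
    assuming all days are equally likely."""
    p_all_distinct = 1.0
    for i in range(group_size):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def smallest_group_for(threshold, days=365):
    """Smallest group size whose shared-birthday chance reaches threshold."""
    n = 0
    while shared_birthday_probability(n, days) < threshold:
        n += 1
    return n


# The chance first passes 50% at a group of 23 people.
print(smallest_group_for(0.5))
print(round(shared_birthday_probability(23), 3))
```

The number of pairs in a group grows roughly with the square of its size, which is why so few people are needed.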
Similarly, the chance of a spurious AV hit depends on the product of the linear increase of the 2 factors mentioned. [Devon McCormick via risks-digest Volume 21, Issue 73]