In the current Advanced-In-The-Wild Malware Test methodology, the EXCELLENT certificate has been awarded to products that achieved at least 99% effectiveness in detecting and blocking threats in a given test edition. However, practice and analysis of data from multiple editions show that this threshold does not fully reflect the real differences between the most effective solutions and those that only appear to be at the same level. A difference between 99% and values close to 100% can be statistically significant only with very large samples, because only then does the margin of statistical error become small enough for such differences to be measurable.
In the case of the unique in-the-wild threats that we use in our research, the number of samples that can be tested each month is limited by both the activity of cybercriminals and the physical time required to analyze each sample. Assume that the tests run continuously, 24 hours a day for 30 days, and that one sample requires a maximum of about 9 minutes of analysis in a Windows environment. Once the additional delays for task automation, system restarts, log parsing, and screenshot generation are added, the actual processing time for a single sample is approximately 10-11 minutes. This means that the throughput of the test is inherently limited and it is not possible to achieve large sample numbers.
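The time budget described above can be turned into a rough ceiling on the number of samples per edition. The sketch below is only an illustration: it assumes a single test pipeline that processes samples strictly one after another, which the text does not state, and the function name is hypothetical.

```python
# Rough upper bound on samples per 30-day edition.
# Assumption (not from the methodology): one pipeline, samples processed sequentially.

MINUTES_PER_EDITION = 30 * 24 * 60  # 30 days of continuous, 24-hour operation


def max_samples_per_edition(minutes_per_sample: int) -> int:
    """Return how many whole samples fit into one edition at the given pace."""
    return MINUTES_PER_EDITION // minutes_per_sample


for pace in (10, 11):
    print(f"{pace} min/sample -> at most {max_samples_per_edition(pace)} samples")
```

Under these assumptions the ceiling is 3927 to 4320 samples per month. In practice the number is lower, because it also depends on how many unique in-the-wild samples are available in that period.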
We are raising the threshold from 99% to 99.6% for the EXCELLENT certificate
To better separate the most effective products and avoid overinterpreting minimal differences resulting from chance, we are raising the EXCELLENT threshold to 99.6%.
The new value better reflects the statistical precision that the test can actually deliver, and it minimizes the risk of misclassifying products when discrepancies of 0.1-0.3 percentage points may result from the natural variability of test samples. This will make the EXCELLENT certificate more accurate and the classification fairer for both manufacturers and recipients of the results.
IMPORTANT! Proposal for a new threshold:
Raising the EXCELLENT threshold to 99.6% is statistically and practically justified. The 100% threshold should not be treated as an absolute criterion, as it would only be reliable if the tester had the ability to:
- test every existing and historical variant of malware,
- guarantee the absence of any errors on the part of the laboratory.
In reality, 100% effectiveness in a limited sample set does not guarantee 100% effectiveness across the entire threat population. The result is always subject to randomness, as the product may simply be “lucky” with the selected sample set. All it takes is for the next 10 random samples to be different and the result could fall below 100%.
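The "luck" effect can be illustrated with a simple probability model. The sketch below assumes that every sample is blocked independently with the same probability, which is a simplification (real samples are not independent draws), and the function names are hypothetical.

```python
# Chance of a perfect score for a product whose true effectiveness is below 100%.
# Assumption: each sample is blocked independently with probability true_rate.


def prob_perfect_score(true_rate: float, n_samples: int) -> float:
    """Probability of blocking every one of n_samples."""
    return true_rate ** n_samples


def prob_at_least_one_miss(true_rate: float, n_samples: int) -> float:
    """Probability that at least one of n_samples gets through."""
    return 1.0 - prob_perfect_score(true_rate, n_samples)


# Hypothetical product with a true effectiveness of 99.6%, tested on 500 samples.
print(f"P(100% on 500 samples) = {prob_perfect_score(0.996, 500):.3f}")
# The same product facing 10 further samples.
print(f"P(miss within next 10) = {prob_at_least_one_miss(0.996, 10):.3f}")
```

Under this model, a product with a true effectiveness of 99.6% would still score 100% on a 500-sample edition in roughly 13% of cases, so a single perfect result says little about the whole threat population.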
For this reason, one of the most reliable approaches is clustering, in which products achieving similar levels of effectiveness are placed in one cluster and receive the same rating, without artificially creating a ranking where the differences are not statistically significant.
Grouping results – examples
- Product A scores 100% in the test (which may be a sign of luck in this particular edition of the study).
- Product B scores 99.9%.
- Product C scores 99.7%.
The differences are minimal and may not be statistically significant. Based on a single study, we cannot say with certainty that product A is actually better than B and C, or that product B is clearly better than product C.
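One way to check such a comparison is a two-sided Fisher exact test on the number of missed samples. The sketch below is not the laboratory's procedure. It assumes a hypothetical edition of 1000 samples, so that 99.7% corresponds to 3 misses, and it treats the two products' results as independent, although in a real edition both products face the same samples.

```python
from math import comb


def fisher_exact_two_sided(miss_a: int, n_a: int, miss_b: int, n_b: int) -> float:
    """Two-sided Fisher exact p-value for the difference in miss counts."""
    total_miss = miss_a + miss_b
    denom = comb(n_a + n_b, total_miss)

    def prob(k: int) -> float:
        # Hypergeometric probability that k of all misses belong to product A.
        return comb(n_a, k) * comb(n_b, total_miss - k) / denom

    k_min = max(0, total_miss - n_b)
    k_max = min(n_a, total_miss)
    p_observed = prob(miss_a)
    # Sum every outcome that is at least as unlikely as the observed one.
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# Product A: 0 misses out of 1000 (100%); Product C: 3 misses out of 1000 (99.7%).
p_value = fisher_exact_two_sided(0, 1000, 3, 1000)
print(f"A vs C: p = {p_value:.3f}")
```

With these assumed counts the p-value is about 0.25, far above the usual 0.05 level, so this single edition would not support the claim that product A is better than product C.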
Therefore, after analyzing many examples, we propose the following threshold:
99.6–100% → highest cluster with EXCELLENT certification
Maintaining a single threshold avoids a situation where products with very similar results are artificially placed in different quality categories. The difference between 99.6% and 99.5% is only 0.1 percentage points, and with such values, it is impossible to reliably conclude that one product is actually better than another. Such a small difference is most often due to the natural randomness of malware samples, rather than the actual technological advantage of one of the manufacturers.
Therefore, there is no second ranking within the group of the best products in the certification. The classification is binary:
- If you meet the 99.6% threshold → you receive an EXCELLENT certificate
- If you do not meet the threshold → you are not eligible for certification.
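The binary rule can be written as a single comparison. The sketch below is an assumption about how the check might be implemented: it compares raw counts with integer arithmetic and applies no rounding, since the text does not say whether the percentage is rounded before comparison. The function and constant names are hypothetical.

```python
# Binary EXCELLENT decision from raw counts.
# Assumption: no rounding is applied before the comparison.

THRESHOLD_PER_MILLE = 996  # 99.6% expressed in tenths of a percent


def is_excellent(blocked: int, total: int) -> bool:
    """Return True when blocked / total is at least 99.6%."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Cross-multiplication avoids floating-point error at the exact threshold.
    return blocked * 1000 >= total * THRESHOLD_PER_MILLE


print(is_excellent(498, 500))  # exactly 99.6%
print(is_excellent(497, 500))  # 99.4%
```

Comparing counts rather than a computed percentage means a result that sits exactly on the threshold is never lost to floating-point representation.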
Introduction to the CI (Confidence Interval) table
To justify the choice of 99.6% as the minimum score for the EXCELLENT certificate, the confidence intervals (95% CI) for different sample sizes ranging from 200 to 1000 samples are presented below.
The table shows how the margin of error changes with the number of samples passed through the tested product, at the required effectiveness level of 99.6%. This allows us to assess whether the result falls within a given range and whether differences of 0.1-0.3 percentage points can be attributed solely to the natural variability of the samples.
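Intervals of this kind can be reproduced with a few lines of code. The sketch below uses the Wilson score interval, which is an assumption: the text does not say which interval method was used for the table, and other methods (for example Clopper-Pearson) give somewhat different bounds. The sample sizes are illustrative steps within the stated 200 to 1000 range.

```python
from math import sqrt

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_interval(p_hat: float, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for an observed proportion p_hat over n samples."""
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Observed effectiveness fixed at the 99.6% threshold.
for n in (200, 400, 600, 800, 1000):
    low, high = wilson_interval(0.996, n)
    print(f"n={n:4d}  95% CI: {low * 100:.2f}% to {high * 100:.2f}%")
```

Under this assumption, even at 1000 samples the interval around 99.6% spans roughly 99.0% to 99.8%, which is wider than the 0.1-0.3 percentage point differences discussed here, and it widens further as the sample count drops.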
The confidence interval provides additional justification for choosing the 99.6% threshold and, in our opinion, confirms that this is a stable value that is resistant to random fluctuations depending on the number of samples in a given edition of the study.
Taking these confidence intervals into account confirms that differences of 0.1-0.3 percentage points fall within the natural variability of test results and should not lead to differentiation between products with similar effectiveness.
From 2026, the 99.6% threshold will be the minimum level of effectiveness required to obtain EXCELLENT certification in the Advanced-In-The-Wild Malware Test. This is a stable and statistically justified value that allows for a reliable and fair assessment of products in the highest effectiveness class.

