AV-Comparatives APT Detection Coverage Test 2026

Disclaimer
  1. This test shows how an antivirus behaves with certain threats, in a specific environment and under certain conditions.
    We encourage you to compare these results with others and take informed decisions on what security products to use.
    Before buying an antivirus you should consider factors such as price, ease of use, compatibility, and support. Installing a free trial version allows an antivirus to be tested in everyday use before purchase.

Just finished reading the report.
There's an important caveat that readers of these reports should keep in mind.

This test mainly measures how well antivirus products detect already known and extensively analyzed APT tools rather than how they would perform against new or unknown threats. Since the dataset is based on publicly documented APT groups, the samples have already been reverse engineered, studied, and integrated into threat intelligence feeds, which means vendors have likely had significant time to build signatures and behavioral detections for them. This creates a bias where detection rates appear very high, not necessarily because the products are highly effective against real APT activity, but because they are being tested against threats that are already well understood. As a result, the methodology ends up reflecting how well products recognize historical malware rather than how resilient they are against novel intrusion techniques or evolving attacker behavior in real-world scenarios.

Take it with a grain of salt, as always.
But this does not explain how Panda scored better than big names against those "old" samples!
 
Major antiviruses are responsible for maintaining enormous signature databases. Old samples, like the ancient ones used in this particular APT detection test, no longer circulate in the wild, so the cost of preserving exact signatures for them easily outweighs the benefits.

It only makes sense that they periodically merge specific old signatures into broader generic or heuristic ones, disable legacy scanning routines, or retire signatures entirely for malware that's considered extinct or extremely rare.

Apparently, Panda does something that just so happened to give them an advantage in this test against unrealistically old samples.
 
This test mainly measures how well antivirus products detect already known and extensively analyzed APT tools rather than how they would perform against new or unknown threats.

Phase 4 of the test includes modified samples. Although those samples are created by well-known tools, the modified samples can still bypass the AV signature detection. Furthermore, this part of the test can partially show if the AV training malware base is complete (for example, if samples/methods related to old APT tools are included in training).

It seems that some AVs, like Trend Micro, are focused on modern threats. So, Trend Micro can still be effective as a protection, but could be bypassed more easily if the attackers started reusing older threats (modified or not).
 
you know who of this forum :) people use it but do not mention it.

You can live without LiveGuard, but Folder Guard is a really great addition to Premium. I don't know who at ESET decides which features go into which version, but Folder Guard should be available in the Essential version as well. Until ESET get their heads together, I won't be renewing, because other AVs like Norton and AVG provide ransomware protection even in their basic and free versions.
I do not know what ESET did to their latest release, but I have seen A LOT more LiveGuard involvement in the last few weeks than in years of using it. I had about 15 LiveGuard events in 2 weeks. So it does work: out of those 15, 5 were real and 10 were PUA.
 
Regarding old samples and other considerations: I checked the report again, and it seems we might be wrong about the purpose of this test.

The "About this test" section of the report states:
"The motivation behind this study was the broader question of how consistently consumer security products detect law-enforcement or state-developed surveillance tools and whether detection gaps may arise from technical limitations rather than intentional exclusions."
And later:
"Beyond overall detection performance, the evaluation examined whether APT groups from certain regions were more or less likely to be detected and whether any correlation existed between the origin of a security vendor and that of the APT group. This question is of particular interest as public reporting on specific region-aligned threat actors has decreased in recent years, a development influenced by shifting geopolitical and national-security considerations."
So the objective of this test was not how well a specific vendor performs, but rather whether some vendors deliberately fail to detect certain samples, and whether such misses correlate with APT groups from the vendor's own country. As the Phase 5 description puts it:
The purpose was to identify potential factors contributing to persistent misses, such as possible correlations between the geographical origin of the security vendor and that of the APT group.
And findings:
These findings suggest that more recent campaigns might be less extensively represented in public threat-intelligence sources or training datasets. As a result, behavioural or heuristic detection might be less effective when such samples are modified.

Across both original and modified samples, no meaningful connection could be identified between the country of origin of an APT group and that of a security vendor within the limits of this dataset and test design. Missed detections were distributed across a wide range of regions, indicating that the remaining detection gaps are likely driven by technical factors, such as sample diversity, behavioural visibility, and detection-engine design, rather than geographical alignment.

No clear indication emerged that any specific APT group was systematically avoided or consistently undetected by particular products.
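
To make the idea of "geographical alignment" concrete, here is a minimal sketch of one way such a check could be done: compare the miss rate when vendor and APT group share a region against the miss rate when they do not. This is not the report's actual method. The record layout, the region labels, and the function name `miss_rates_by_alignment` are all hypothetical, and the sample records are made-up placeholders, not test data.

```python
# Hypothetical records: (vendor_region, apt_region, detected).
# Placeholder values only; they are NOT taken from the report.
records = [
    ("RegionA", "RegionA", True),
    ("RegionA", "RegionA", False),
    ("RegionA", "RegionB", True),
    ("RegionA", "RegionB", False),
    ("RegionB", "RegionB", True),
    ("RegionB", "RegionB", True),
    ("RegionB", "RegionA", True),
    ("RegionB", "RegionA", True),
]


def miss_rates_by_alignment(rows):
    """Return the miss rate for same-region and different-region pairs."""
    counts = {"same": [0, 0], "different": [0, 0]}  # [misses, total]
    for vendor_region, apt_region, detected in rows:
        key = "same" if vendor_region == apt_region else "different"
        counts[key][1] += 1
        if not detected:
            counts[key][0] += 1
    # None marks a bucket with no samples, so it is not read as a 0% miss rate.
    return {
        key: (misses / total if total else None)
        for key, (misses, total) in counts.items()
    }


rates = miss_rates_by_alignment(records)
print(rates)
```

If the two rates come out roughly equal, as the report says they did within its dataset, there is no sign that a vendor goes easy on groups from its own region. A real analysis would also need enough samples per bucket for the comparison to mean anything.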
 
The discussion in this thread was focused on the overall detection performance, which was only a part of the test.
However, the test results cannot be easily translated to show the overall detection performance because the samples were of a special kind.
That is why Trend Micro scored poorly in this test despite its decent scores in the AV-Comparatives Real-World tests.
It is worth noting that Trend Micro shows a similar pattern in the AV-Comparatives Malware Protection tests.
The results of all those tests can confirm that in the case of some AVs (like Trend Micro), the authors' conclusion is true: "the remaining detection gaps are likely driven by technical factors, such as sample diversity, behavioural visibility, and detection-engine design, rather than geographical alignment."
 
The test results for Phase 4 (file execution) are consistent with a simple scenario.
All AVs have very similar (or identical) detection for current known threats (original or modified samples).
Some AVs have significantly worse detection on old samples (years 2010-2021), especially when modified.

This test does not show detections of current unknown threats. Such threats are tested in the Real-World tests and make a difference.

Edit.
The modified known samples have different signatures, but are known to AVs via Machine Learning (behavior-based detections).
It seems that some AVs do not use old samples to train behavior-based detections.
 
And which av or avs do you consider had the best results, according to your criteria?
 
Meaning those vendors with lower score are "deliberately" not detecting?
Not necessarily. A low or high detection rate is not the main point here.
A vendor could achieve a high detection rate and still fail to detect samples from a specific APT group, or from groups in a specific country. That is what they were looking for. But according to the report, they did not find any such correlation.
 
I did not introduce any new criteria. The results are included in the report. I only explained the differences presented for Phase 4 of the test.
Are you interested in the concrete results (Phase 1, 2, 3, 4)?
 
No, I wanted your opinion.

Except for phase 4, the results do not require comment.
After Phase II, the vendors improved databases, so the current detections are presented in Phases III and IV.
In my opinion, the winners should be those AVs that consistently scored > 99.5% (Avira and Kaspersky). Close behind those two is Microsoft.

 
I don't mean to be difficult, but what was the basis for this result: Avira > Kaspersky > Microsoft?
 
It is noted in my post:
"those AVs that consistently scored > 99.5%" (Phase III and IV).
I did not post: Avira > Kaspersky > Microsoft, but: Avira, Kaspersky > Microsoft.
 
I left out a line because I didn't want to set up an order of magnitude. But, thank you for your answer.
 
Actually, I'm quite impressed by K7, especially offline. It seems they have a solid BB (yes, I'm fully aware of its limitations).
I'm guessing the tests were done in the default config. Tweaking might further improve detection, but might also lead to FPs; the BB was quite aggressive, as I recall.
 
The outcome would be similar with a threshold of > 99%.
Differences of less than 1% between AV results come down to random factors, because the test uses a small set of samples chosen from everything in the wild.
It is similar to the margin of error in an election poll.
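
The poll analogy can be put into numbers with a rough sketch. The code below computes an approximate 95% Wilson score interval for a detection rate and checks whether the intervals of two products overlap. The sample count (1000) and the two scores (99.6% and 99.0%) are hypothetical values picked for illustration, not figures from the report, and the function names are my own.

```python
import math


def wilson_interval(detected, total, z=1.96):
    """Approximate 95% Wilson score interval for a detection rate."""
    p = detected / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical test set and scores, for illustration only.
n_samples = 1000
product_a = wilson_interval(996, n_samples)  # 99.6% detected
product_b = wilson_interval(990, n_samples)  # 99.0% detected

print("Product A:", product_a)
print("Product B:", product_b)
print("Intervals overlap:", intervals_overlap(product_a, product_b))
```

With a test set of this assumed size the two intervals overlap, so a gap of about half a percent would not be enough to say one product is really better than the other. A much larger sample set would narrow the intervals.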
 