AVLab.pl Analysis of system protection against active online malware – July 2025

Status
Not open for further replies.
Disclaimer
  1. This test shows how an antivirus behaves with certain threats, in a specific environment and under certain conditions.
    We encourage you to compare these results with others and take informed decisions on what security products to use.
    Before buying an antivirus you should consider factors such as price, ease of use, compatibility, and support. Installing a free trial version allows an antivirus to be tested in everyday use before purchase.

Will there be more AV solutions in these tests?
I don't see F-Secure, Avira, AVG, Bitdefender, or McAfee.
And if not, why not?

You've already mentioned Kaspersky, so it's clear why there aren't any.

We usually test the solutions you have indicated at least a few times a year, except for AVG, which is the same as Avast. Avira does not want to participate in our tests, but we will see in 2026.

Check out previous editions:
results archive.png
 
If they have the revenue and a strong position (which they do), they can subscribe to threat intelligence or sign two-way partnership deals for threat intelligence exchange. But the strength of an AV, and of Kaspersky, is not about how much it blocks when a threat is already known; it is about how much of the unknown it can block and how quickly. In this respect, Kaspersky has always done very well.
The problem I see, and I have been a K user, is the question: 'At which point does losing a bunch of files to a "legitimate" vendor become as malicious as data exfiltration by a bad actor?' What if you are working on some intellectual property code that K doesn't know, so it uploads it to their servers? How much do you trust a foreign entity? Especially a foreign entity that doesn't give a crap about IP? And then what? Are you going to sue the Kremlin?
 
Eugene is a Russian billionaire oligarch. If he wanted to, he could move the entire company outside of Russia. However, he's never going to do that because his primary recruitment pool for personnel is ex-KGB and FSB personnel, which makes the Kaspersky companies problematic.

Norton and McAfee rely heavily upon Windows PC OEM agreements, at least for the consumer sector. Both are well represented out there in userland with the one offering the OEM the best deal for derived profit being the most popular.

Norton/Symantec is still a thing on enterprise and government systems.

AV popularity and market penetration are driven largely by regional "proclivities" and political ideologies. For example, it comes as no surprise that one of the most popular AVs in Germany is Avira.

The reasons for antivirus popularity by country or global region are complex. Many times cost is the primary determinant. After that come political beliefs and other criteria that consumers use to pick and choose products.

The thing about Norton is that if it does not have good market traction in one region, it is compensated for by the market share of another one of GenDigital's products - such as Avast.
Eugene could always move the company from Russia, but then he would have to live in a windowless room.

McAfee is not going anywhere. Trellix has been on 3L agencies' systems since, if I recall, 2008; prior to that it was Symantec.
 
How do you think we can improve the tests to make them more interesting? What data would you like to see from the tests in the files that are available for download to the community? (apart from LOLBINs, per-sample protection comparisons in CSVs)
Here are some thoughts to consider

Exploit Protection Metrics

While you're testing against in-the-wild malware, it would be awesome to see more specific data on how solutions handle exploits and vulnerability-based attacks. For example, a metric that shows how well a product prevents a zero-day exploit from dropping its payload, or how it responds to common attack vectors like malicious Office macros or a compromised browser. This goes beyond just file-based detection and gets into the core of how a security suite's behavioral engine works.


Resource Impact (CPU, RAM, Disk I/O)

For many users, especially those with older hardware or who are on a budget, system performance is a big deal. A security suite that offers great protection but slows down their computer to a crawl isn't a practical option. Including a metric in the downloadable data that shows the average CPU, RAM, and disk I/O usage during the test, both at idle and during active scanning, would be super helpful. This would let people balance protection with performance.

False Positive Rates (with context)

This is a huge one. A product can have a perfect 100% detection rate, but if it's constantly flagging legitimate files or applications as malicious, it's a nightmare to use. Providing a list of false positives, along with a brief explanation of why the product flagged the file, would be invaluable. This transparency helps users understand the "cost" of a more aggressive protection stance. You could even create a separate metric for this, similar to what AV-Comparatives does with their False Positive Test.

Network-Level Protection Data

You mentioned C&C connections in your initial post, which is fantastic. To build on that, it would be cool to see a breakdown of a product's network-level protection. For example, a metric that shows how many malicious URLs or IPs were blocked before any malware was even downloaded. This highlights the effectiveness of a product's web and network shields, which are a critical first line of defense.

Offline Protection Scenarios

What happens when the system isn't connected to the internet? Many threats are still able to execute or spread on an air-gapped network. Testing how a product's signature-based and heuristic engines perform without a cloud connection would provide a more complete picture of its capabilities.
 
Most classified systems are air-gapped. The route of infection would typically be external storage, assuming that rigorous standard security protocols are adhered to habitually. For the few that are isolated classified LANs there's sometimes connected backup and network storage, but this varies by a nation's security requirements for such machines and networks. The security requirements can be extremely onerous and expensive (as in 5 million Euros for a small classified LAN) and require regular, routine inspections and audits. Other classified networks are logically separated by KVM switches or hardened switch-router-hardware I&A combos.

AV-Comparatives does occasional offline testing, and the results reveal the dependence on the cloud. Detection rates plummet from the high 90%s online down to the 80%s offline.

I get the infection route testing, but when it comes to these systems executable code virtually never comes directly from the internet. So the only effective test is execution from the desktop or executable storage.

As far as emanation attacks on air-gapped systems go, they are more or less eavesdropping/electronic warfare types of attacks, as opposed to attempts to get malware onto the systems. There were POCs back in the day such as TEMPEST and Van Eck, plus others.

The required proximity (effective distance) from the target system has grown substantially over the decades as sensor technology has been refined and made more effective.
 
I think the test should be tailored to the audience.

When the audience is businesses, the only thing that matters is the capabilities against zero-days (which will typically be exploit hardening, emulation, application control, and similar generic methods). Everything else provides very little value. There is no use in seeing how much known malware they block through signatures and fuzzy hashes.

When the audience is home users, zero-day handling is not of extreme importance; home users get infected largely through browsing the web (potentially through an email). This is why downloads are usually screened more aggressively (vendors try to balance convenience with detection). In this case, testing in non-realistic scenarios (against scripts derived from fake Captchas sent to 5 hotels, malware packs and collections, modified samples, and so on) provides a distorted view of what the solution will actually do for the home user.

That said, never testing detection in isolation is also not correct. Relying on strong web filters just to cover up a mediocre 'everything else' is not right.

A perfect test should focus on both: check how well the solution does in a real-world scenario, and also do a quick check of the anti-malware capabilities on their own.
 
With the rise of remote work and business travel, employees often use their devices in places with unreliable internet access, such as on planes, in remote areas, or in hotels with spotty Wi-Fi. In these situations, the endpoint's security relies entirely on its local, on-device capabilities. A strong offline protection suite ensures that these devices remain secure from threats like malware from a compromised website or a malicious file downloaded before the connection was lost.

There are also supply chain attacks that can introduce malware into a system through a trusted third party, such as a software update or a new hardware component. This malware may be dormant or may not require an internet connection to execute its malicious payload. A security product with robust offline heuristics and behavioral analysis can detect this type of threat, even if it has no network communication and is not yet known to cloud-based threat intelligence databases. Testing this capability allows you to determine how the product would perform in a supply chain attack, where the malicious code is already present on the system.

The viability of offline protection extends well beyond a single-use case. It is a fundamental component of a comprehensive security strategy that accounts for unreliable connectivity, critical infrastructure protection, the realities of modern workforces, and the evolving nature of sophisticated attacks.
 
The only truly effective protection is very robust, skillfully crafted default-deny that covers all the intricacies and "holes" in an operating system and applications - blocking unauthorized network traffic and code execution in the first place before it is manually vetted and approved. The other part is disabling all unneeded functionality (Microsoft, for example, expects enterprises and governments to disable unneeded stuff but it doesn't advise consumers that they should do the same). But the ordinary citizen is incapable of coping with that level of cyber hygiene.
 
They don't wanna give you this default-deny because users want to use stuff and they wanna use it now. There are some implementations of default-deny, and historically there were more. In June 2009 Norton released the 2010 beta and included a detection initially called RESER.Reputation.1 (later renamed to WS.Reputation.1). It was a typical default-deny implementation that covered a wide range of formats and removed files that didn't satisfy trust checks. This caused a billion threads on tens of forums, and trillions of complaints, before Norton eventually suspended this detection. Developers especially cried their eyes out, as if it was impossible to add one folder to exclusions and just compile your executables there.

When they suspended it, their protection levels dropped. Now Trend Micro (recent file warning or whatever they call it), Webroot old edition, Kaspersky and perhaps a handful of others have some implementations of that. There are other efficient security methods as well, but usually for home users they are not provided. Home users work best with one "let it run and dwell on it" setup. This setup is not proven to be a failure.
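To make the "default-deny plus one excluded folder" idea concrete, here is a rough sketch in Python. It is not how Norton's WS.Reputation.1 (or any other product) actually works; the trusted-hash set, the excluded build folder, and the function name `allow_execution` are all made up for illustration.

```python
import hashlib
from pathlib import PurePosixPath

# Assumption: a tiny allowlist of SHA-256 digests for files that already
# passed some trust check. Real products use reputation data, not this.
TRUSTED_HASHES = {hashlib.sha256(b"known good installer").hexdigest()}

# Assumption: a developer's build folder added to exclusions.
EXCLUDED_DIRS = [PurePosixPath("/home/dev/build")]


def allow_execution(path: str, content: bytes) -> bool:
    """Default-deny: allow only excluded folders or trusted hashes."""
    p = PurePosixPath(path)
    # Anything located under an excluded folder is allowed to run.
    if any(excluded in p.parents for excluded in EXCLUDED_DIRS):
        return True
    # Everything else must match a trusted hash, otherwise it is denied.
    return hashlib.sha256(content).hexdigest() in TRUSTED_HASHES


print(allow_execution("/home/dev/build/app.bin", b"fresh compile"))
print(allow_execution("/tmp/app.bin", b"fresh compile"))
print(allow_execution("/tmp/setup.bin", b"known good installer"))
```

The freshly compiled binary is blocked everywhere except the excluded build folder, which is the workaround the post says developers could have used.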
 
A perfect test should focus on both - check how well the solution is doing in a real world scenario and also, quick check of the anti-malware capabilities on their own.
With security feature configuration inter-dependencies, it can be a difficult problem to test protections in an isolated manner - unless the software publisher is cooperative and extremely forthcoming with technical details (which virtually never happens because of their penchant for "security through obscurity"; "Oh, those technical infos are too sensitive and therefore confidential - so get lost tester").
 
They don't wanna give you this default-deny because user wants to use stuff and they wanna use it now.
Well, that's a society and people problem. Not a security solution problem.

Home users are only comfortable with a little bit of security. They are not OK with the type of security that they actually need.

Signature- and filtering-based security will never go away because it caters to hooman nature: the average hooman is lazy and intrinsically disinclined to cope with anything difficult.
 

Many consumer security tests, like the "Real-World Protection Test" from AV-Comparatives, assume a constant, stable internet connection. However, that's not the reality for many users. Think about a student on campus with spotty Wi-Fi, a remote worker on a train, or someone who downloads a file on their phone and transfers it to their PC via a USB drive. In these cases, the first line of defense is the on-device, offline protection. A test that includes these scenarios would provide actionable data for users with unreliable or no internet access.

By testing a product's offline capabilities, you're not just measuring its access to a cloud-based threat intelligence database. You're actually evaluating the fundamental effectiveness of its signature-based detection and, more importantly, its heuristic and behavioral analysis engines. These are the core technologies that allow an antivirus to detect threats it has never seen before, without needing to "phone home." This provides a deeper insight into the vendor's technology and its ability to handle zero-day threats or threats introduced via unconventional vectors, like a malicious external hard drive.

AV-Comparatives already does some offline testing, and their results show a clear trend: online detection rates almost always outperform offline rates. The difference, however, can be significant. By making this a more prominent and repeatable part of the test, AVLab would highlight this critical metric. For example, a test could include a section with results for both "Online Protection Rate" and "Offline Protection Rate," allowing consumers to make an informed decision based on their specific needs. This transparency empowers users to choose a product based on its foundational technology, not just its cloud-based features.
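As a rough idea of how an "Online Protection Rate" and "Offline Protection Rate" pair could be computed from per-sample results, here is a small Python sketch. The product names, the record layout `(product, mode, blocked)`, and the numbers are invented for illustration; they are not AVLab or AV-Comparatives data.

```python
from collections import defaultdict

# Hypothetical per-sample results: (product, mode, blocked).
results = [
    ("ProductA", "online", True),
    ("ProductA", "online", True),
    ("ProductA", "offline", True),
    ("ProductA", "offline", False),
    ("ProductB", "online", True),
    ("ProductB", "online", False),
    ("ProductB", "offline", False),
    ("ProductB", "offline", False),
]


def protection_rates(rows):
    """Return {product: {mode: percent of samples blocked}}."""
    # counts[product][mode] = [blocked, total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for product, mode, blocked in rows:
        counts[product][mode][0] += int(blocked)
        counts[product][mode][1] += 1
    return {
        product: {mode: 100.0 * b / t for mode, (b, t) in modes.items()}
        for product, modes in counts.items()
    }


rates = protection_rates(results)
for product, modes in sorted(rates.items()):
    print(product, {m: f"{r:.1f}%" for m, r in sorted(modes.items())})
```

Reporting the two rates side by side, rather than a single blended number, is what lets a reader see how much a product leans on its cloud backend.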
 
By testing a product's offline capabilities, you're not just measuring its access to a cloud-based threat intelligence database. You're actually evaluating the fundamental effectiveness of its signature-based detection and, more importantly, its heuristic and behavioral analysis engines. These are the core technologies that allow an antivirus to detect threats it has never seen before, without needing to "phone home."

This is more complicated. Most AVs use heuristic and behavioral analysis both locally and in the cloud. The comprehensive analysis is currently time- and resource-consuming and available only in the cloud. In contrast, the local analysis depends on already-trained models and does not consume much time or resources.
By testing products offline, one can only see how important the cloud backend is for a particular AV. You will not get information on how strong the heuristic and behavioral engines are. Of course, such information about local abilities can be important for some users and worth testing.
 
The way I see it, the offline test isn't meant to be the definitive measure of an AV's effectiveness.

Instead, it's a way to assess a very specific aspect of its design: its resilience and performance when its cloud resources are unavailable.

Think of it as two separate, but equally important, metrics.

Online Performance

The most crucial test. This is the AV's full potential, leveraging all of its local and cloud-based resources to provide the best possible protection. This is the scenario that reflects most users' day-to-day experience.

Offline Performance

This is the 'stress test.' It's a way to measure how well the on-device engines can handle a threat on their own. This is a crucial data point for users who may have unreliable internet connections or for scenarios where malware might be designed to disrupt cloud communication.

By combining the results of both tests, you get a much more complete picture of the product's overall capabilities and its reliance on a constant internet connection. I completely agree that the overall strength of the heuristic and behavioral engines is best judged when they're working in tandem with the cloud backend.
 
This is more complicated. Most AVs use heuristic + behavioral analysis both locally and in the cloud. The comprehensive analysis is currently time&resource-consuming and available only in the cloud. On the contrary, the local analysis depends on already trained models and does not consume much time and resources.
By testing products offline, one can only see how important the cloud backend is for a particular AV. You will not get information on how strong the heuristic and behavioral engines are. Of course, such information can be important for some users and worth testing.
There are even business solutions with zero local protection capabilities, or with only a small database of local hashes. For example, Panda and WatchGuard only keep 50 MB of very prevalent malware hashes, so testing offline will most likely turn into a disaster.
CrowdStrike is another business product that won't get you far offline (not to say nowhere at all). Webroot as well has pretty much no capabilities offline.

There are others with a more hybrid approach, and some that use online queries in rare instances when local intelligence is inconclusive. Then there are the signature-heavy ones. When you disconnect these, they miss updates and their abilities quickly degrade.

In any case, offline tests are not extremely valuable. Even if the file is coming from backup, flash drives and so on, how did it enter the backups or flash drive? Someone must download it and put it there.
 
Here are some thoughts to consider

Exploit Protection Metrics

While you're testing against in-the-wild malware, it would be awesome to see more specific data on how solutions handle exploits and vulnerability-based attacks. For example, a metric that shows how well a product prevents a zero-day exploit from dropping its payload, or how it responds to common attack vectors like malicious Office macros or a compromised browser. This goes beyond just file-based detection and gets into the core of how a security suite's behavioral engine works.
From the point of view of our methodology, the applications and the browser are updated daily (the system itself is the exception), so in line with patch management practices, effective exploitation is almost impossible.

See sections 8.4, 8.5, and 8.6 of the methodology: "Methods Of Carrying Out Automatic Tests" (AVLab Cybersecurity Foundation).

To be honest, I also see a problem here with the lack of availability of exploits in the wild.

Resource Impact (CPU, RAM, Disk I/O)

For many users, especially those with older hardware or who are on a budget, system performance is a big deal. A security suite that offers great protection but slows down their computer to a crawl isn't a practical option. Including a metric in the downloadable data that shows the average CPU, RAM, and disk I/O usage during the test, both at idle and during active scanning, would be super helpful. This would let people balance protection with performance.
This test cannot be part of the Advanced In-The-Wild Malware Test. I think AV-C does this better; we do not have the know-how to perform this test. In addition, interest from vendors may be low, and the costs of maintaining the server and machines to carry out this test may be high. This would require financial support from the community. We are not funded by the government like other labs.

False Positive Rates (with context)

This is a huge one. A product can have a perfect 100% detection rate, but if it's constantly flagging legitimate files or applications as malicious, it's a nightmare to use. Providing a list of false positives, along with a brief explanation of why the product flagged the file, would be invaluable. This transparency helps users understand the "cost" of a more aggressive protection stance. You could even create a separate metric for this, similar to what AV-Comparatives does with their False Positive Test.
To be honest, we have it on our map, and it can be done faster than you think. I think it's the main point in the next major update.

We have plenty of FPs in the wild, so adding clean installers amounting to, for example, 1% of the entire malware set in each edition could be interesting.
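For a sense of scale, the arithmetic behind the 1% idea can be sketched in a few lines of Python. The edition size of 1500 samples and the count of 3 flagged files are made-up numbers, and the helper names are hypothetical.

```python
import math


def clean_sample_count(malware_count: int, percent: int = 1) -> int:
    """Clean installers to add, rounded up so small sets still get one."""
    return math.ceil(malware_count * percent / 100)


def false_positive_rate(flagged_clean: int, total_clean: int) -> float:
    """Percentage of clean files that were wrongly blocked."""
    if total_clean == 0:
        return 0.0
    return 100.0 * flagged_clean / total_clean


# Hypothetical edition with 1500 malware samples.
n_clean = clean_sample_count(1500)
print(n_clean)
# Hypothetical product that wrongly flags 3 of those clean installers.
print(false_positive_rate(3, n_clean))
```

With a clean set this small, a single wrongly flagged installer moves the rate by several percentage points, so the raw counts are worth publishing next to the percentage.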

Network-Level Protection Data

You mentioned C&C connections in your initial post, which is fantastic. To build on that, it would be cool to see a breakdown of a product's network-level protection. For example, a metric that shows how many malicious URLs or IPs were blocked before any malware was even downloaded. This highlights the effectiveness of a product's web and network shields, which are a critical first line of defense.
We're already doing it – see the PRE_LAUNCH level :)

- web-level protection
- IP/URL reputation
- on-the-fly scanning
- payload scanning, e.g. Check Point + ZoneAlarm
- on-access scanning (but before launch)

All of this is first-layer protection, “network protection,” as PRE_LAUNCH in our Advanced In-The-Wild Malware Test.
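A per-product breakdown by level could be tallied from per-sample outcomes roughly like this. The sketch is in Python; `PRE_LAUNCH` and `POST_LAUNCH` follow the level names used in this thread, while `FAIL` and the sample data are assumptions made for illustration, not the actual format of the published CSVs.

```python
from collections import Counter

# Hypothetical per-sample outcomes for one product. "FAIL" is an assumed
# label for samples that were not stopped at either level.
outcomes = ["PRE_LAUNCH", "PRE_LAUNCH", "POST_LAUNCH", "PRE_LAUNCH", "FAIL"]


def level_breakdown(levels):
    """Return the percentage share of each level among all samples."""
    total = len(levels)
    if total == 0:
        return {}
    counts = Counter(levels)
    return {level: 100.0 * n / total for level, n in counts.items()}


breakdown = level_breakdown(outcomes)
for level, share in sorted(breakdown.items()):
    print(f"{level}: {share:.1f}%")
```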

Offline Protection Scenarios

What happens when the system isn't connected to the internet? Many threats are still able to execute or spread on an air-gapped network. Testing how a product's signature-based and heuristic engines perform without a cloud connection would provide a more complete picture of its capabilities.
By cutting off access to the AV network / cloud scanning, you also cut off the malware's access to the internet. Technically, many samples will not work, and it will be difficult to prove the usefulness of such a test. This test was more valuable a decade or more ago, when cloud development was not as widespread. Note that even if you want to scan your system offline, you need to download the latest signatures.

As for OFFLINE protection, I think this should be a feature that cuts off all processes from the internet except the AV/EDR processes. Offline isolation is available in business solutions.

This is more complicated. Most AVs use heuristic + behavioral analysis both locally and in the cloud. The comprehensive analysis is currently time&resource-consuming and available only in the cloud. On the contrary, the local analysis depends on already trained models and does not consume much time and resources.
By testing products offline, one can only see how important the cloud backend is for a particular AV. You will not get information on how strong the heuristic and behavioral engines are. Of course, such information about local abilities can be important for some users and worth testing.
Very good opinion, which is why most manufacturers will not go for this type of testing, because you cut them off from their infrastructure.

There was also an opinion about malware with modified SHA to test 0-day protection:

Please note that our tests also include 0-day threats, but it is practically impossible to show which file is a 0-day threat for a given AV. We cover this in general by measuring protection after launch, i.e. POST-LAUNCH:

- it can be assumed that many of these samples are 0-day for the manufacturer, but this also depends on the protection in the browser, whether it is implemented at all or weak, as with Comodo/Xcitium,

- the same file may be a 0-day for AV1 but not for AV2, because they may have different signatures. The same file downloaded from a URL/IP may be on one product's URL blacklist, or it may be downloaded, so for one product it will be a PRE-LAUNCH result and for the other a POST-LAUNCH result.

The modified sample is no longer from the in-the-wild set, so the test cannot be classified as Real World. In addition, such a modified file cannot be delivered from the original URL/IP; it must be a different protocol. This is a separate test, but it can be automated. We can add it to the road map and implement it in the future.

We have already considered your opinion in AMTSO, and there are more questions than answers in the case of modified samples, but the topic is still open.

Additionally, there is another problem, because depending on the placement of empty bytes, moving or adding empty bytes may result in different detection – sometimes missing, sometimes additional. What's more, why not extend the test to include false positives for clean files with additional bytes? It's getting messy :)
 
In the case of TLSH (or potentially ssdeep) usage, depending on the bytes and the distance allowance, it could be the same detection. Yes, it can be a new one as well.

That's just too much work and modifying samples by playing with bytes doesn't really create zero-days.
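The point about byte padding can be illustrated with a short Python sketch: appending empty bytes changes a cryptographic hash completely, while a similarity measure still sees nearly the same content. Note that `difflib.SequenceMatcher` is only a stand-in here, not TLSH or ssdeep, and the "sample" bytes are synthetic.

```python
import hashlib
from difflib import SequenceMatcher

# Synthetic stand-in for a sample file; not real malware.
original = bytes(range(256)) * 2
# "Modified" sample: the same content with 16 empty bytes appended.
padded = original + b"\x00" * 16

sha_original = hashlib.sha256(original).hexdigest()
sha_padded = hashlib.sha256(padded).hexdigest()

# Assumption: SequenceMatcher.ratio() plays the role of a fuzzy-hash
# similarity score. Real fuzzy hashes compare compact digests instead.
similarity = SequenceMatcher(None, original, padded, autojunk=False).ratio()

print("SHA-256 identical:", sha_original == sha_padded)
print(f"similarity: {similarity:.3f}")
```

An exact-hash blocklist treats the padded file as new, while anything distance-based with a generous allowance would likely treat it as the same file, which is why padding alone is a poor proxy for a real zero-day.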
 