AVLab.pl Advanced In-The-Wild Malware Test in March 2026 including effectiveness analysis and telemetry

Disclaimer
  1. This test shows how an antivirus behaves with certain threats, in a specific environment and under certain conditions.
We encourage you to compare these results with others and make informed decisions about which security products to use.
    Before buying an antivirus you should consider factors such as price, ease of use, compatibility, and support. Installing a free trial version allows an antivirus to be tested in everyday use before purchase.

Adrian Ścibor

From AVLab.pl
Dear Community

We have published the results for March 2026. The article has been updated with additional technical details. In each series, we will show how different solutions perform in a console or other central incident management platform.

In this edition, we’re also including Surfshark One with antivirus as an all-in-one solution, as well as CatchPulse Pro and Microsoft Business + EDR.
  1. Publication: Advanced In-The-Wild Malware Test in March 2026 including effectiveness analysis and telemetry » AVLab Cybersecurity Foundation
  2. Results: Recent Results » AVLab Cybersecurity Foundation
I would also like to add that we began testing EDR-XDR solutions in March. This process will continue for a few more weeks, with results expected in June or July 2026.

The evaluation includes a range of factors, such as:
  • telemetry,
  • process correlation,
  • host linking in lateral movement,
  • command-line visibility,
  • AI assistant evaluation,
  • attack chain visualization,
  • and many others that are useful for the Security Operations Center.
You can still sign up for the tests.
 
Thanks for sharing this update.

Notable points in this March 2026 edition
  • The report now includes additional technical details, which makes the results more useful for deeper analysis.
  • You are showing not only protection outcomes, but also how products perform through a central console or incident management platform.
  • The inclusion of Surfshark One with antivirus, CatchPulse Pro, and Microsoft Business + EDR broadens the comparison beyond traditional standalone antivirus products.
  • The planned EDR/XDR evaluation sounds especially relevant, because visibility and response workflow are often just as important as raw detection.

Why the EDR/XDR part matters
For many readers, the most valuable part may be the operational side rather than just whether malware was blocked.

Features such as:
  • telemetry quality,
  • process correlation,
  • lateral movement visibility,
  • command-line monitoring,
  • attack chain visualization,
  • and AI assistant usefulness
can make a major difference in real-world investigations, especially for managed environments and SOC teams.

A product can have strong blocking rates but still be difficult to investigate with if the console is weak, noisy, or missing context. On the other hand, good telemetry and correlation can significantly reduce response time.

A useful caution for readers
As always with this kind of testing, it is best to interpret results in context:
  • Protection results are important, but they are only one part of the picture.
  • Console design, alert fidelity, and investigation workflow can affect day-to-day usability.
  • Business products and consumer products are not always directly comparable because their goals and deployment models differ.
  • EDR/XDR results can depend heavily on configuration, policy tuning, and analyst skill.

Overall
This looks like a valuable expansion of the test methodology, especially the move toward evaluating detection visibility and investigation capability instead of focusing only on prevention. That should make the results more useful for both security enthusiasts and professional defenders.

 
Mwah, I have got a lifetime license of MBAM; this is the third test in a row in which MBAM is not the laugh of the day, but matches the protection of well-known brands (y)

Or should I wait (before replacing Defender on my wife's laptop), because "one swallow does not make a summer"?
You know, there is nothing wrong with running BOTH Defender and MB AT THE SAME TIME. Malwarebytes has a setting, "Always register Malwarebytes in the Windows Security Center". Just make sure that is turned OFF. I've been running both for many years, no problem.
Acadia
 
The interpretation of this test is more complex.
This test does not indicate that Avast or Trend Micro were worse in the wild in the testing period than other AVs.
AVLab correctly awarded all AVs (including Avast and Trend Micro) that missed 0-1 samples.

Only one AV (Surfshark) was not awarded (no Excellent mark). This means there is a statistically significant chance that Surfshark was also slightly worse in the wild during the testing period.

AVLab samples are, on average, slightly older (the test runs once a day) than AV-Comparatives Real-World tests. Older samples cause fewer missed samples and require a higher award threshold.
In AV-Comparatives tests, the award threshold is roughly 99% (usually 0-4 missed samples). In this AVLab test, the threshold is greater than 99.5% (probably 99.7%, which gives 0-1 missed samples).
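To make the threshold arithmetic concrete, here is a minimal Python sketch (my own illustration, not any lab's methodology) that converts an award threshold and a sample count into the maximum number of missed samples; the ~400-sample count is an assumption in the ballpark of both labs' test sets:

    import math

    def max_misses(samples: int, threshold_pct: float) -> int:
        # Largest number of misses that still satisfies the protection threshold.
        return math.floor(samples * (1 - threshold_pct / 100))

    print(max_misses(400, 99.0))   # 4 -> the "0-4 missed samples" regime
    print(max_misses(400, 99.7))   # 1 -> the "0-1 missed samples" regime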
 
Your description is a bit misleading in my opinion.

Our test isn't run once a day at a specific time; instead, it runs 24 hours a day, 30 days a month, constantly monitoring for new samples to test in a live Windows 11 environment. It's not a static test.

As for the differences with AV-C, I have confirmation that, unfortunately, they do not provide specific evidence to vendors, or at least not to all of them, so a vendor cannot determine whether a sample was defective, outdated, or failed to function properly at a certain point in the attack chain.

I don’t know what you mean by “AVLab samples are, on average, slightly older (the test runs once a day) than AV-Comparatives Real-World tests.” -> You don’t know that, because only AVLab provides the SHA of the samples. No other lab does that. So, evidence please :)
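Since AVLab publishes sample hashes, anyone can check a file against the published list. A minimal Python sketch, assuming a local file path ("sample.bin" is a placeholder) and one of the SHA-256 values from the AVLab hash list quoted later in this thread:

    import hashlib

    def sha256_of(path: str) -> str:
        # Hash the file in 1 MiB chunks so large samples don't exhaust memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical usage: compare a local file against a published hash.
    published = "4b16baf674e02084875303e4ae72066d7b6431340efe58a37b7840eb36b6a026"
    print(sha256_of("sample.bin") == published.lower())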
 
You've pointed out that the TEST DATE is missing from the CSV file in the Community's monthly summary - here's a sample from March :)

[Attachment: Zrzut ekranu 2026-04-17 184405.png]

We will enter the exact date of each tested sample into this CSV file. This is the kind of feedback vendors receive:

[Attachment: Zrzut ekranu 2026-04-17 184508.png]

Please note that we are the most transparent lab.
 
MS Defender is known for stopping threats predominantly post-execution rather than pre-execution, which may be related to its heavy reliance on the cloud, as shown in the July 2025 test results.
[Attachment: Capture.JPG]

Surprisingly, two quarters later, MS Defender no longer relies on the cloud as much: its pre-execution detections predominate in the January 2026 test results!
[Attachment: Capture2.JPG]
 
I don’t know what you mean by “AVLab samples are, on average, slightly older (the test runs once a day) than AV-Comparatives Real-World tests.” -> You don’t know that, because only AVLab provides the SHA of the samples. No other lab does that. So, evidence please :)

Here is why I posted this:
  1. AV-Comparatives staff use their own crawling system to search continuously for malicious sites and extract malicious URLs. They also search manually for malicious URLs.
  2. One of the MT members had contacted the AV-Comparatives staff and posted that they test samples shortly after gathering them.
  3. You can calculate the average detection rate of awarded AVs to see that, for the same AVs, it is significantly lower for AV-Comparatives compared to AVLab. This suggests that AV-Comparatives uses fresher (more demanding) samples.
However, I am not going to insist on fresher samples. You can replace "fresher" with "more demanding". So, on 400 samples, the top AVs usually miss 0-4 samples in AV-Comparatives tests instead of 0-1 samples.
Similarly, SE Labs uses even "more demanding" samples than AV-Comparatives.

There is nothing wrong with using "less demanding" samples in tests. Simply, the requirements for awards must be more demanding, as you did in the AVLab test. We talked about it a few months ago in a private conversation.
By the way, what is the current award threshold in the AVLab Advanced In-The-Wild Malware Tests?
 
MS Defender is known for stopping threats predominantly post-execution rather than pre-execution, which may be related to its heavy reliance on the cloud, as shown in the July 2025 test results.

Surprisingly, two quarters later, MS Defender no longer relies on the cloud as much: its pre-execution detections predominate in the January 2026 test results!

In previous tests, the Firefox web browser was used, which does not support Microsoft Defender's "Block at first sight". So most files were detected on launch. I posted about this issue in AVLab threads.
Now, the Opera web browser is used, which supports "Block at first sight". So, the downloaded executables are automatically checked against the Microsoft Cloud backend with no user interaction.
 
We will enter the exact date of each tested sample into this CSV file. This is the kind of feedback vendors receive.

Please note that we are the most transparent lab.

The vendors and I like this very much.:)(y)
 
AVLab samples are, on average, slightly older (the test runs once a day) than AV-Comparatives Real-World tests. Older samples cause fewer missed samples and require a higher award threshold.
In AV-Comparatives tests, the award threshold is roughly 99% (usually 0-4 missed samples). In this AVLab test, the threshold is greater than 99.5% (probably 99.7%, which gives 0-1 missed samples).
All testing agencies apply a representative set of prevalent malware. The determination of prevalence is not a gold standard, which leads to differences in test set composition/assembly (maybe independent testing organization 1 uses one sample of a certain family while another independent testing organization uses 4 of that same family, with both claiming they use a representative set based on prevalence).

What we can see is only the number of malware samples used and the average failure or block percentage per testing organization, so let's have a look at that.

AV-Test
Looking at the AV-Test real-world test: the number of samples used in the February test (spanning January and February) was 285, with an industry-average block percentage of 99.7 percent. That is over 140 samples per month, and a 0.3 percent miss rate on ~142 samples works out to 0.426 missed samples per month.

AV-Comparatives
The AV-Comparatives real-world test had 200 live cases in February and March. I calculated the average block percentage at 99.2%, so with 100 samples a month, that is on average 0.799 missed samples per month.

AVLab
AVLab published in March a real-world test of 17 AV solutions using 421 samples, and (I counted the misses) only 4 samples were missed in total, meaning a 99.94% block rate.

Conclusion:
Based on the block-rate percentages, AV-Comparatives has the most demanding samples, followed by AV-Test (at a distance, nearly half as demanding), followed by AVLab (half the distance to AV-Test as AV-Test is to AV-Comparatives), ASSUMING they use the same malware sample prevalence (which is most likely not the case).

@Andy Ful, you are right about AVLab having less demanding samples, but we don't know whether they are older (and looking at the high number of samples AVLab collects, that seems an unfair assumption). We simply can't compare their test sets. AVLab earns a compliment for publishing hashes, which makes it transparent.
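For anyone who wants to reproduce the arithmetic, a short Python sketch using the figures quoted above (the small differences from the quoted 0.426 and 0.799 figures come from rounding, e.g. using 285/2 = 142.5 samples per month):

    # Expected misses per month = samples_per_month * (1 - block_rate)
    labs = {
        # name: (samples over the period, months, average block rate)
        "AV-Test":         (285, 2, 0.997),
        "AV-Comparatives": (200, 2, 0.992),
    }
    for name, (samples, months, rate) in labs.items():
        misses = samples / months * (1 - rate)
        print(f"{name}: ~{misses:.3f} missed samples per month")

    # AVLab: 421 samples x 17 products with 4 misses in total
    block_rate = 1 - 4 / (421 * 17)
    print(f"AVLab: {block_rate:.2%} block rate")   # ~99.94%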
 
I calculated (from VirusTotal data) the average lifetime of 33 samples tested on March 1 and 2. I skipped 7 very old samples (1-6 years old) and included the 26 samples that were 0-7 days old.
Average lifetime ~ 3 days.

Here is data from the AVLab file with hashes:
0-1 day ......................9 samples from 33 samples tested on 1 and 2 March.
2 days old:
sample 19................4b16baf674e02084875303e4ae72066d7b6431340efe58a37b7840eb36b6a026
sample 29................f0210da3603f43d66ac5fd9ce665c9bd544cdb66ec47d30b03b8191a6c1c09dc
3 days old:
sample 23................b80ac13edccd2ca442221d7b22a6e8c7eb98688426582d38bf4ebbd5e0ab265b
sample 25................4f0c95a1885411100649bf8150c2f189dc0941ac569b801b3765d1ca64b760dc
sample 27................1bd50faa3a761deff7f9b94efb42cac9ee9b074e5296ec721a089f34af9e972a
sample 28................98b8e1e32402b26c6b508bf39528d1f7d7a671bd038302a87514c3d446996a16
sample 30................0d5b21c50ace8394dd64e9be86a4187362e6fe07af4be39ea5ed8c1d6fe937d0
4 days old:
sample 24................670c2800abc04f1612d73f0df5f5040f6f348a55464dac69039e5ef1ecc79575
5 days old:
sample 22................2ec9acdb6e5ea802f5cb40e00ab7e973f3b1abae4033ba46b79e10af695d9e6c
6 days old:
sample 07...............e3d30c2ff0e0307a5be519bf058390990b7a66014f25cbd75dee744804bf2ca9
sample 20...............5ff84b7c231f6c4cb4a2377ae60e07cc6f6062a32c5a7a7c7b0396d599b7e1a7
sample 21................b91cebec72ba934cde8ee67e8c4135c8c558d8ff46a70d3fdca83d9c37dd377b
7 days old:
sample 08...............287f87db7206d01932a38c7971c3b658e5e2fc932dff378c18bff88e215338b1
sample 09...............0e7d48636d29a59361c13d60dcc16aff14a0e0b16f8a1dd346825a8b139e0ef7
sample 12...............4da6fe90bd0c2ba8f7bf419991cc1f86762c9a23fb9bc24f581b9d8050320a09
sample 14...............ebd21ac4ac71e466c1441dd998895dc5f9567d3ca999a30762f6028dfc59b4d5
sample 33...............1b5c10f6101a90728b135abfd9879da78400b7939f0dbd2c851b30ef51c8276d

More than 1 year old:
sample 10................237e360ee95e03d3d287c1efda0abcb8557b897336f6abe7863015642a985e3a
More than 2 years old:
sample 11................29955ba1e2193047ee5f4561445f81e218ae4de1a295f8fd296ad536bf381f17
sample 16................84b24071b0229e189f03bc643027a63c582b02f6e96e82d730e12793cfcd9abb
More than 3 years old:
sample 17................3cc739bb1882fc9dbb056f39ebe4965771aeca0ceb44e85da39d1ba7dade693f
sample 18................602dbcf4008c585582d5e5d5c8ddb1932fdee07a14308e9cbf937904f31df1f7
More than 6 years old:
sample 15................85bd47cc708f80a3e9aebc5948404017053eec1c316f2c3b527011f19597ab1f
More than 13 years old:
sample 13................8a99353662ccae117d2bb22efd8c43d7169060450be413af763e8ad7522d2451

Except for the very old samples, the rest look like typical Real-World scenario samples.
So we cannot be sure whether "more demanding" also means "fresher".
Maybe the problem is that almost all are EXE samples. The second reason could be a crawling system that can gather samples unknown to AVs more quickly than the honeypots and other malware repositories used by AVLab.
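As a quick sanity check on the "~3 days" figure, a short Python sketch averaging the age distribution above (treating the 0-1 day bucket as 0.5 days on average is my assumption):

    # (age in days, sample count) for the 26 fresh samples listed above
    buckets = [(0.5, 9), (2, 2), (3, 5), (4, 1), (5, 1), (6, 3), (7, 5)]
    total = sum(n for _, n in buckets)                    # 26 samples
    avg = sum(age * n for age, n in buckets) / total
    print(f"average lifetime ~ {avg:.1f} days over {total} samples")  # ~3.3 days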
 
@Andy Ful

First, thank you for taking the time to check the sample age of the 33 AVLab samples (y)
- 27.7% were zero-day samples
- 72.3% were older, divided into
a) 51.0% that were 1 to 7 days old
b) 21.3% that were a year or older

Using the block percentages of the three leading testing organizations, this leads to an interesting observation (see the disclaimer below).

AV-Test uses (in their real-world test):
- 40% zero-day samples
- 60% older samples, less than 4 weeks old (they state this explicitly in their test results)

AV-Comparatives uses (in their real-world test):
- 65% zero-day samples
- 35% samples older than a day (but less than two weeks?)

I don't know this for sure, but I once read a WS forum post from when the real-world tests were introduced and Peter Stelzhammer was accused of using old samples. He explained their testing configuration: a lot of PCs, each with its own internet connection. The collected fresh samples are first checked on a separate (non-VM) PC to verify they are real malware, then tested against the AVs within an hour. In that post he also mentioned a maximum age of two weeks (older samples are dropped, but they are used in the consumer offline+online protection tests).

Disclaimer: the zero-day guesstimate is based on "block percentage distance" (assuming zero-days are the discriminating samples*), with only one observation (Andy's analysis) and one series of test results (the first-quarter results of AVLab, AV-Test, and AV-Comparatives), which makes this extrapolation a wobbly guesstimate. The block percentages of AVLab (99.94%), AV-Test (99.7%), and AV-Comparatives (99.2%) clearly indicate how demanding (to use Andy's word) the collected samples are.

*) This assumption seems fair when you look at the "older than 4 weeks" results of AV-Test, which are nearly always 100%.
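To make the "block percentage distance" concrete, a tiny Python sketch comparing the miss rates implied by the three block percentages (an illustrative ratio, not a formal measure of how demanding a sample set is):

    rates = {"AVLab": 99.94, "AV-Test": 99.7, "AV-Comparatives": 99.2}
    miss = {name: 100.0 - r for name, r in rates.items()}  # miss rate in %
    for name, m in miss.items():
        print(f"{name}: {m:.2f}% missed")
    # Relative spacing between the labs (ratios of miss rates):
    print(f"AV-Test vs AVLab: {miss['AV-Test'] / miss['AVLab']:.1f}x")                    # ~5.0x
    print(f"AV-Comparatives vs AV-Test: {miss['AV-Comparatives'] / miss['AV-Test']:.1f}x")  # ~2.7x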
 