Testing real-time protection of antiviruses with 10,124 samples

Running a malware sample directly from the desktop is a fundamentally flawed testing method for the same reason a simple script is: it ignores the context of the attack.

Sometimes this can depend on the purpose of the test. It is a common method of showing concrete weakness or vulnerability in protection. Such a test is a warning that the presented attack vector can be used as part of a dangerous attack in the wild. Of course, those tests usually do not affect the overall efficiency of the AV. Unfortunately, they are commonly misunderstood, similarly to Malware Protection tests.
 
When those 10,000+ samples hit my computer while surfing, I'll let you guys know immediately.
Your computer likely won’t be hit by a single one of them. Whoever has the know-how and well-developed software blocked them. From an archive or not, on the desktop, in documents, scanned, not scanned, the samples were blocked and cleaned… by whoever is able to do their job.

The remaining few need special conditions, ifs and buts.

Of course, excuses can always be made, when one wants to make them.
 
Sometimes this can depend on the purpose of the test. It is a common method of showing concrete weakness or vulnerability in protection. Such a test is a warning that the presented attack vector can be used as part of a dangerous attack in the wild. Of course, those tests usually do not affect the overall efficiency of the AV. Unfortunately, they are commonly misunderstood, similarly to Malware Protection tests.
Thank you for your well-articulated response. I completely agree that individual module testing serves a vital purpose. My only reservation is with the practice of declaring a product a failure based solely on these isolated results, rather than on a more holistic and comprehensive evaluation.
 
Then I ask you: how are we supposed to test 10,124 samples? I am genuinely asking, because I want to do it right.
No matter how you do or don’t do the test, people that have attachments to security software (Microsoft Defender, Eset and so on) will always find a way to criticise your test, your methodology, your samples, your results and so on, except when you show the results they want to see. In that case, the same methodology that is now declared “flawed” suddenly becomes very interesting and the most correct one.

You can easily recognise these people: just create a thread about the solution in question and, even at 2 in the morning, they will pop up in the thread within 5 minutes.

If the methodology can’t be criticised, they will say there is randomness, the difference is statistically insignificant and so on and so on. When one wants to support their choice, they can always find arguments.

Even the professional tests get this treatment: when convenient they are cited as a reference, and when not convenient they are criticised.

I suggest that you:
- Scan the samples first. Let the antivirus remove whatever it can remove, which will probably be >90%.
- Execute the rest. Manual execution is recommended, although with so many malware samples in your case, I am not sure how practical that will be.
- Scan the system afterwards and see if it was compromised.
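For bookkeeping, the three steps above could be tallied roughly like this. It is only a sketch: the class, its field names and the example numbers are all made up for illustration, and it records results you enter by hand rather than touching any samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagedResult:
    """Hypothetical tally for one product; every field name is illustrative."""
    total_samples: int         # size of the sample set
    removed_on_scan: int       # step 1: removed by the on-demand scan
    blocked_on_execution: int  # step 2: leftovers stopped when executed
    compromised: bool          # step 3: verdict of the final system scan

    def __post_init__(self) -> None:
        # Reject tallies that cannot add up.
        if not 0 <= self.removed_on_scan <= self.total_samples:
            raise ValueError("removed_on_scan must be between 0 and total_samples")
        leftovers = self.total_samples - self.removed_on_scan
        if not 0 <= self.blocked_on_execution <= leftovers:
            raise ValueError("blocked_on_execution exceeds the samples left to execute")

    @property
    def executed(self) -> int:
        """Samples that survived the scan and had to be executed."""
        return self.total_samples - self.removed_on_scan

    @property
    def missed(self) -> int:
        """Samples neither removed by the scan nor blocked on execution."""
        return self.executed - self.blocked_on_execution

    def scan_rate(self) -> float:
        """Fraction of the set removed by the scan alone."""
        return self.removed_on_scan / self.total_samples

    def protection_rate(self) -> float:
        """Fraction of the set stopped by the scan or on execution."""
        return (self.total_samples - self.missed) / self.total_samples


# Made-up numbers, only to show how the tally is used.
example = StagedResult(total_samples=10_124, removed_on_scan=9_500,
                       blocked_on_execution=600, compromised=False)
print(f"executed={example.executed} missed={example.missed} "
      f"scan={example.scan_rate():.1%} protection={example.protection_rate():.2%}")
```

Keeping the scan stage and the execution stage as separate numbers makes it visible how much of the final score comes from signatures alone.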
 
It is worth saying something about the peculiarities of such testing. In-the-wild protection mainly depends on non-signature features because most malware samples are short-lived. By contrast, 1-4 week old malware is mainly detected by signatures. So in the second case, the test results are dominated by the completeness of signatures. As we can see, the reason that makes an AV a winner in the test is not the main reason for being a winner in the wild.

In the test, signature detection roughly corresponds to the blocked samples (although the two are not identical).
In the test, Avast initially blocked significantly more samples than other competitors, so it had a greater chance of missing fewer samples after successful execution (and winning the test).
Is it possible to conclude that Avast should also win in the next similar test? No, it is not.
Here are the results of AV-Comparatives (on roughly 10000 samples in each test) from the period September 2023 - March 2025:

Vendor ......... Missed samples
Avast .......... 3+5+7+4 = 19
Avira .......... 3+5+4+2 = 14
Bitdefender .... 2+8+2+3 = 15
Eset ........... 4+7+6+5 = 22
Kaspersky ...... 3+10+5+0 = 18

In fact, any of the above AVs has great chances to be a winner in the next test.
The results for Microsoft are slightly worse:
Microsoft ...... 5+6+16+6 = 33
This can suggest that Microsoft does not care as much about the signature completeness as some other AVs (although far better than Trend Micro).
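To put the numbers above in proportion, here is a small sketch that totals the misses and converts them into a miss rate. Two things in it are assumptions and not part of the published results: each test is taken to be exactly 10,000 samples (the post only says "roughly 10000"), and the spread is estimated by treating misses as Poisson counts.

```python
import math

# Missed samples per test, copied from the figures quoted in this post.
MISSED = {
    "Avast": [3, 5, 7, 4],
    "Avira": [3, 5, 4, 2],
    "Bitdefender": [2, 8, 2, 3],
    "Eset": [4, 7, 6, 5],
    "Kaspersky": [3, 10, 5, 0],
    "Microsoft": [5, 6, 16, 6],
}

SAMPLES_PER_TEST = 10_000  # assumption: "roughly 10000 samples in each test"


def summarize(missed):
    """Return (total misses, miss rate, rough one-sigma spread in samples)."""
    total = sum(missed)
    rate = total / (SAMPLES_PER_TEST * len(missed))
    # Assumption for illustration: misses behave like Poisson counts,
    # so the expected scatter of a total is about sqrt(total).
    sigma = math.sqrt(total)
    return total, rate, sigma


# List vendors from fewest to most missed samples.
for vendor, missed in sorted(MISSED.items(), key=lambda kv: sum(kv[1])):
    total, rate, sigma = summarize(missed)
    print(f"{vendor:<12} total={total:>2}  miss rate={rate:.3%}  +/- {sigma:.1f} samples")
```

Under those assumptions, the gap between the best and worst of the first five totals (14 versus 22) is only a little more than one combined standard deviation (sqrt(14 + 22) = 6), which fits the point that any of them could win the next test.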
 
I clicked on freeze.com for a screensaver. I got about 40 viruses at once. Some were blocked. Some got through.

Some were held back. Of those that were held back, some tried again; some of these were blocked and some got through.
These situations are happening every day to real people.
 
I clicked on freeze.com for a screensaver. I got about 40 viruses at once. Some were blocked. Some got through.

Some were held back. Of those that were held back, some tried again; some of these were blocked and some got through.
freeze.com is not a screensaver website. It is a cybersecurity solution provider. You sure that it was freeze.com?
 
Same usual back-and-forth arguments about AV test results, AV performance, and whether or not the results reflect a correct test methodology and the tested product capabilities. Like this is the 49,287,199th iteration of this on MT.

I mean, it is honestly rinse, repeat. The same points repeated for the Nth time, like a deranged AI bot that just repeats iterations over and over.

Quite honestly, disputing or doubting AV test results - particularly Youtube tester tests - does not achieve anything. Few of the general public (as in perhaps only a handful) will ever arrive here at MT with or without an account and read the multi-page, long-winded thread disputes and debates about how the test methodology and the results are inaccurate or mis-characterize the tested products.

Some feel compelled to have these discussions, but I've learned from experience that it's just not worth the effort. The sole exception is that @Adrian Ścibor does care and considers others' perspectives.

With utmost respect for everyone - my comments here are not meant to offend anyone, but what I am stating is true...
 
freeze.com is not a screensaver website. It is a cybersecurity solution provider. You sure that it was freeze.com?
In the late 1990s and early 2000s, freeze was known for providing free screensavers, desktop wallpapers, and other PC customization utilities. Google acquired it in 2005 and it has since been repurchased by a security company.
 
Same usual back-and-forth arguments about AV test results, AV performance, and whether or not the results reflect a correct test methodology and the tested product capabilities. Like this is the 49,287,199th iteration of this on MT.

I mean, it is honestly rinse, repeat. The same points repeated for the Nth time, like a deranged AI bot that just repeats iterations over and over.

Quite honestly, disputing or doubting AV test results - particularly Youtube tester tests - does not achieve anything. Few of the general public (as in perhaps only a handful) will ever arrive here at MT with or without an account and read the multi-page, long-winded thread disputes and debates about how the test methodology and the results are inaccurate or mis-characterize the tested products.

Some feel compelled to have these discussions, but I've learned from experience that it's just not worth the effort. The sole exception is that @Adrian Ścibor does care and considers others' perspectives.

With utmost respect for everyone - my comments here are not meant to offend anyone, but what I am stating is true...
What is true is that it gets twisted back and forth to fit whatever narrative folks want here. One minute they agree, the next they don't. It still does not change the fact that the tests are incomplete and thus inaccurate. Everything past that is speculation and perspective.
 
Same usual back-and-forth arguments about AV test results, AV performance, and whether or not the results reflect a correct test methodology and the tested product capabilities. Like this is the 49,287,199th iteration of this on MT.

I mean, it is honestly rinse, repeat. The same points repeated for the Nth time, like a deranged AI bot that just repeats iterations over and over.

Quite honestly, disputing or doubting AV test results - particularly Youtube tester tests - does not achieve anything. Few of the general public (as in perhaps only a handful) will ever arrive here at MT with or without an account and read the multi-page, long-winded thread disputes and debates about how the test methodology and the results are inaccurate or mis-characterize the tested products.

Some feel compelled to have these discussions, but I've learned from experience that it's just not worth the effort. The sole exception is that @Adrian Ścibor does care and considers others' perspectives.

With utmost respect for everyone - my comments here are not meant to offend anyone, but what I am stating is true...
This is a public space where different people upload different tests and results. The readers (those who arrive) will decide how to interpret the results. Problem is when the same attachments we discussed on another thread are masked as "concern for the reader". Anyway, the reader understands that too. 😉
 
In the late 1990s and early 2000s, freeze was known for providing free screensavers, desktop wallpapers, and other PC customization utilities. Google acquired it in 2005 and it has since been repurchased by a security company.
So an "incident" prior to 2005. OK. Got it. I should've known to ask "How long ago did you visit the site?" Instead, like a dumbass I assumed the OP meant a recent visit and incident.

Reminder to self: Always ask for complete details. Never assume anything.
 
This is a public space where different people upload different tests and results. The readers (those who arrive) will decide how to interpret the results. Problem is when the same attachments we discussed on another thread are masked as "concern for the reader". Anyway, the reader understands that too. 😉
I get it. And I agree. I just lost interest many years ago.

I would dispute that everyone who arrives at MT has the wherewithal to really figure out the nuances and full technical implications of the discussions. That in no way diminishes the intent of the discussions, but those types of readers are still tilting their heads trying to figure stuff out.

What is true is that it gets twisted back and forth to fit whatever narrative folks want here. One minute they agree, the next they don't. It still does not change the fact that the tests are incomplete and thus inaccurate. Everything past that is speculation and perspective.
I don't disagree.

Unless AMTSO adopts a much different, far more prescriptive set of testing standards (one that explains in great detail what is required for a test to be acceptable) and recommends iteration testing, the general public is never going to get the type of testing methods and results that you advocate for. It will never do that, on the grounds that it would impose an "impossible" testing expense onto the industry. (Poppycock. All the companies with billion-Euro revenues can afford 1 million Euro of testing. They are just widely and notoriously known to be cheap-ass corporations that will do whatever is cheapest while still achieving a good marketing strategy.)

Mind you, AV-Comparatives tests are 65,000 Euros a pop - per test - and companies complain bitterly about that cost. As if an unlicensed dentist were pulling the company executives' teeth one-by-one with no anesthetic and mechanic's pliers.

Whatever AMTSO decides to do or not do, YouTube and other amateur testers are just going to do their thing. That's why I don't address test methods and results much. If I am going to try to change things, then the effort I make has to have a shred of a chance of succeeding.

Discussing stuff here has about as much chance of inducing change as rioting in Paris to force the government to reverse its policy decisions. Direct bitching to AV test labs has an even lower probability of bringing about change - even with the submission of evidence that their test methods are flawed and unsound.

As Jean Girard would say: "Maybe, just maybe... if enough rich, powerful people were pwned by inadequate security solutions en masse - and they all lost not insignificant money as a result, then perhaps there would be effective outrage and demands for great improvements to the AV industry and testing/validation thereof. Until then, nobody has a chance of effecting change."

Stoic. Cynical. Resolved to leave it all to the Fates.
 