Serious Discussion DEBATE | Are security YouTubers misleading people about antivirus effectiveness?

RoboMan (Thread author)
There's an interesting debate I'd like to discuss with you guys.

I’ve been watching a lot of antivirus tests on YouTube lately, and I think there’s a fundamental problem with how these products are being evaluated.

Many videos show solutions like Microsoft Defender, Bitdefender, Kaspersky, or ESET NOD32 being tested by dumping hundreds of malware samples onto a system in a very short period of time.
That approach completely ignores how modern protection stacks are actually designed to work.

Current antivirus solutions are not just static scanners; they rely heavily on cloud intelligence, reputation systems, behavior monitoring, and staged detection over time. When you execute hundreds of samples back-to-back, you effectively bypass the timing, context, and correlation mechanisms that are critical to real-world protection.
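To make that concrete, here's a rough sketch of the kind of layered, timing-dependent pipeline I mean. It's purely illustrative Python with made-up names, not any vendor's actual logic, but it shows why the cloud/reputation layer needs both time and context that a bulk local drop never provides:

```python
from dataclasses import dataclass
import time

# Everything below is a hypothetical stand-in; no real product API is modelled.

@dataclass
class Sample:
    sha256: str
    known_bad_locally: bool     # would a plain signature scan catch it?

@dataclass
class Context:
    origin: str                 # e.g. "browser_download" vs "dropped_locally"
    emulation_budget_s: float   # pre-execution time the product is willing to spend

def cloud_reputation(sha256: str, origin: str) -> str:
    time.sleep(0.1)             # stand-in for the network round trip
    # A real service weighs prevalence, age and delivery context; a file that
    # simply appears on disk with no download history gives it far less to work with.
    return "unknown" if origin == "dropped_locally" else "suspicious"

def layered_verdict(sample: Sample, ctx: Context) -> str:
    if sample.known_bad_locally:                                      # layer 1: static scan
        return "blocked_static"
    if cloud_reputation(sample.sha256, ctx.origin) == "suspicious":   # layer 2: cloud/reputation
        return "blocked_or_emulated_cloud"
    time.sleep(ctx.emulation_budget_s)                                # layer 3: emulation/sandboxing
    return "deferred_to_behaviour_monitoring"                         # layer 4: runtime, per process

# 200 unknown samples executed back-to-back never give layers 2-4 their time budget.
print(layered_verdict(Sample("ab12...", False), Context("dropped_locally", 0.5)))
```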

It also removes the attack chain aspect entirely. Real infections are not isolated binaries dropped in bulk; they involve delivery vectors, user interaction, scripting layers, persistence mechanisms, and often multiple stages where different protection components trigger at different points.

Another major issue is the lack of clarity around test conditions:
  • Whether cloud connectivity is active and allowed to respond in real time
  • If samples are fresh, prevalent, or already classified
  • Whether protection layers like behavior blocking, exploit mitigation, and rollback/remediation are actually being observed and measured
Focusing purely on initial detection rates in this kind of artificial scenario leads to conclusions that don’t reflect how these products perform under realistic conditions.
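None of this would be hard for testers to disclose, by the way. The bare minimum fits in a handful of fields; something like this (illustrative only, the field names are mine):

```python
from dataclasses import dataclass, field

@dataclass
class TestConditions:
    """Illustrative disclosure checklist for a public AV test, nothing more."""
    cloud_connectivity: bool              # was the cloud allowed to respond in real time?
    sample_age_hours: float               # fresh/prevalent vs. long-classified samples
    sample_delivery: str                  # "downloaded", "dropped_locally", "usb", ...
    layers_observed: list[str] = field(default_factory=lambda: [
        "static_scan",
        "behavior_blocking",
        "exploit_mitigation",
        "rollback_remediation",
    ])
    scoring: str = "final_system_state"   # rather than "initial_detection_only"
```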

In practice, what matters is not just “did it detect the file instantly”, but whether the system ends up compromised after the full execution chain, including post-execution remediation and containment.
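Scoring it that way isn't complicated either. Here's a rough sketch of what “score the end state, not the first alert” means in practice (again illustrative, the field names are mine):

```python
from dataclasses import dataclass

@dataclass
class SampleOutcome:
    blocked_pre_execution: bool   # static/cloud/emulation stopped it before launch
    blocked_at_runtime: bool      # behaviour blocking killed the chain mid-execution
    remediated_after: bool        # payload ran but was rolled back / cleaned up
    objective_achieved: bool      # files encrypted, credentials exfiltrated, etc.

def final_verdict(o: SampleOutcome) -> str:
    """Initial miss != final compromise: judge the end state of the system."""
    if o.blocked_pre_execution or o.blocked_at_runtime:
        return "protected"
    if o.remediated_after and not o.objective_achieved:
        return "protected_late"          # not pretty, but the user walks away clean
    return "compromised" if o.objective_achieved else "inconclusive"

# A product that "missed" the initial scan but killed persistence still did its job:
print(final_verdict(SampleOutcome(False, True, False, False)))   # -> protected
```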

This is why controlled methodologies from organizations like AV-Comparatives and AV-Test tend to produce results that differ significantly from these YouTube-style tests: they attempt to simulate real-world exposure rather than synthetic stress scenarios.

The problem is that the visual impact of “X antivirus missed 200 samples” is much stronger than a nuanced explanation of layered protection.

As a result, I think a lot of viewers are walking away with a distorted understanding of how effective these solutions actually are in real usage.

Curious to hear thoughts, especially from people who have experience with malware analysis or have run structured tests themselves.
 
My thoughts are that this is a double-edged sword.

Many solutions handle downloads more aggressively.

This can include emulating downloads (whereas local files are not emulated) or taking the file’s origin into account.

Where things go wrong is assuming that every solution will always pass the tests and provide exceptional protection if the files were downloaded rather than dropped.
Claiming that “all solutions provide acceptable protection in real-world scenarios” is an extrapolation, unless documentation (such as the Check Point or ESET emulation documentation, which is quite comprehensive), detection logs (like the McAfee logs) or other evidence can be supplied to prove that the solution does better with downloads.
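For what it's worth, the origin part is something anyone can check on their own machine: on Windows, downloaded files carry a Zone.Identifier alternate data stream (the Mark of the Web), which is one of the signals a product can use to treat a download differently from a locally dropped file. A quick illustrative check (the path is just an example):

```python
from pathlib import Path
from typing import Optional

def mark_of_the_web(path: str) -> Optional[str]:
    """Return the Zone.Identifier stream of a file if present (Windows/NTFS only).

    Files saved by a browser typically contain ZoneId=3 (Internet) and often the
    source URL; files dropped or unpacked locally usually have no stream at all.
    """
    try:
        return Path(path + ":Zone.Identifier").read_text(errors="ignore")
    except OSError:          # no such stream, not NTFS, or not running on Windows
        return None

print(mark_of_the_web(r"C:\Users\you\Downloads\installer.exe"))
```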

Then we can argue that the web filter is bypassed.
This is true.

But a full security suite should be able to detect malware from any source. These products are not just browser extensions.
 
Every testing video posted on YT or any other platform is appreciated, regardless of the methodology used.
Every few words describing a real-life experience are appreciated, regardless of the author's lack of experience or knowledge.
I collect data from those two sources and draw my own conclusion; any pre-delivered, bundled conclusion concerns the provider, not me.
And, of course, any AI-generated data (not derived from real life) is discarded.
 
I believe that YouTube in general has become a site full of slop content. As for videos specifically related to antivirus software, I think the vast majority of them are sponsored and paid for by the producing companies to make their products appear superior.
There are many creators who proudly take payments from vendors. Even if they don't, remember that this is an incentivised platform. Content will be, to say the least, adjusted so that it is interesting to the viewers.

When it is interesting to the viewers and funds start flowing in, this becomes a dangerous and addictive cycle where honesty, performance and accuracy hardly matter anymore.

What matters is how many subscribers there are, how many of them will click, and how far into the video they will watch.
 
That’s why methodologies from orgs like AV-Comparatives and AV-Test look very different:
  • Staged execution
  • Realistic delivery vectors (phishing, exploit kits, droppers)
  • Time-aware scoring (initial miss ≠ final compromise)
  • Consideration of user interaction and system state
Another nuance most YouTube people miss:
A product that “misses” initially but blocks persistence or kills the payload during runtime is arguably doing its job. In contrast, a product with high static detection but weak behavioral controls can still lose against slightly modified or obfuscated threats.
 
This has another nuance: behavioural monitoring is always done in async mode. I don’t know a single solution that will stall your program for five minutes while deciding whether it’s malware or not. That is the emulation’s job.

Many solutions do include policy enforcement (or whatever they want to call it), which is essentially a set of short heuristics that kick in quickly and block damage from being done, but often, by the time the malware is remediated, the system may already have been compromised.
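To put the gap in concrete terms, here is a rough illustration (made-up event names, not any product's real internals) of why an asynchronous monitor inherently leaves a window that a bounded, synchronous emulation check does not:

```python
import threading
import time

def pre_execution_emulation(budget_s: float) -> str:
    """Synchronous and bounded: the file does not run until this returns."""
    time.sleep(budget_s)                    # stand-in for emulating the binary
    return "allowed_to_run"                 # emulation can be evaded, so this may be wrong

def behaviour_monitor(events: list[str], alert: threading.Event) -> None:
    """Asynchronous: it reacts to events that have already happened."""
    suspicious = 0
    for e in events:
        time.sleep(0.05)                    # the process keeps running while we watch
        if e in {"registry_persistence", "mass_file_encrypt", "shadow_copy_delete"}:
            suspicious += 1
        if suspicious >= 2:
            alert.set()                     # whatever happened before this point is already done
            return

print(pre_execution_emulation(0.2))
alert = threading.Event()
watcher = threading.Thread(
    target=behaviour_monitor,
    args=(["create_file", "registry_persistence", "mass_file_encrypt"], alert),
)
watcher.start()
watcher.join()
print("behaviour alert raised:", alert.is_set())
```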

The fact that the solution generated alerts, graphs and reports, coupled with positive-reinforcement texts, doesn’t necessarily mean that the situation was handled.

This again comes down to understanding what exactly you are dealing with, being able to prove it through documentation and other means, and tailoring the test, rather than applying a “one size fits all” approach.
 
Ultimately, the metric that matters isn’t:

“Did the product raise an alert?”

It’s:

“Did the adversary achieve their objective before the control meaningfully intervened?”

And as you pointed out, without understanding the threat class and execution model, any generic “drop-and-run” test will keep producing misleading confidence signals.
 
That is the reason I do not put much weight on auditing the remaining samples in a test video with an on-demand scanner such as HitmanPro.
Either you have executed your test samples without an AV before the test and confirmed the malicious actions using whatever forensic analysis you use, or you have to do the same after the test.
 
I have already recorded a few tests but haven’t published them yet. These “game-changer” tests had a critical flaw: most modern malware incorporates advanced anti-analysis, anti-debugging, and anti-VM techniques, so the execution environment itself ends up biasing the results.

I have an old system with a Core i3-10100F which can be ideal for such tests. These will be much closer to real victim environments: bare metal, naturally “noisy,” and far less likely to trigger sandbox-evasion heuristics than instrumented VMs. These tests are very much in the pipeline.
 
But then I have another question: why is the solution not terminating the script/executable (whatever causes the malware to be executed), unless it has previously been excluded by the YouTuber?

Because I can tell you from experience that developing a malware pack heuristic is not exceptionally hard to do.

The correct response is: you have seen that this process launches a high number of samples, and all of them, if not outright malicious or suspicious, are unknown, mostly unsigned, and highly questionable.

At that point it becomes evident that this is not a trusted process. What stops these billion-dollar companies from processing the threat correctly, by terminating whatever caused the execution in the first place and deep-scanning the entire malware pack?
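And to show how simple such a pack heuristic can be in principle, here is roughly the shape of it (illustrative only; the names and thresholds are mine, and a real engine would obviously add signer checks, prevalence data and exclusion handling):

```python
import time
from collections import deque

WINDOW_S = 60        # look at the last minute of launches per parent process
THRESHOLD = 10       # this many unknown, unsigned children is not normal behaviour

_launches: dict[int, deque] = {}   # parent PID -> timestamps of questionable launches

def on_process_start(parent_pid: int, child_signed: bool, child_reputation: str) -> str:
    """Decide what to do when parent_pid starts a new child process.

    child_reputation is assumed to be 'known_good', 'known_bad' or 'unknown'.
    """
    if child_signed or child_reputation == "known_good":
        return "allow"
    if child_reputation == "known_bad":
        return "block_child_and_terminate_parent"
    # Child is unknown and unsigned: remember when this parent launched it.
    now = time.monotonic()
    q = _launches.setdefault(parent_pid, deque())
    q.append(now)
    while q and now - q[0] > WINDOW_S:
        q.popleft()
    if len(q) >= THRESHOLD:
        # One process spawning a flood of unknown, unsigned binaries:
        # kill the launcher and queue a deep scan of whatever it was running from.
        return "terminate_parent_and_deep_scan"
    return "allow_and_monitor"
```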

I don’t think that consumers should be charged prices as high as $150-200 nowadays for bells and whistles, and for brittle protections that rely on loads of ifs and buts.
 
Consumers should just learn to question, think and demand, and not just accept the status quo and justify it with “realistic scenarios”, “drops vs downloads” and so on.

If you allow excuses in development like “we would have done it if you had done x, y and z”, that is already not a good sign.
If all businesses used these excuses, we would still be stuck in the Windows XP era.

It is the same sort of excuse as “I don’t have time” and “I will do it next Monday”.
 
Even though I’m not directly involved in this debate (since I’m on Odysee and no one uses my method), I’m still going to weigh in.

Most YouTube testers will bombard the system. Some will even use scripts (like Malx) to overload the protection and the system.
That’s why the resident engine will struggle to neutralize everything (for example, with the Virut virus or Expiro, which will quietly infect the system while the engine is busy).

Personally: I run a scan, clean up, and then run the test myself with a short delay I set for launching the next round. I give the protection and the malware time to do their thing. :)
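The pacing part is easy to script, by the way; something along these lines (just a sketch, how you launch, snapshot and record inside the isolated machine depends on your own tooling):

```python
import time
from pathlib import Path
from typing import Callable

def run_paced_test(
    samples: list[Path],
    delay_s: float,
    launch: Callable[[Path], None],            # however your lab starts a sample
    snapshot: Callable[[], dict],              # capture of process/file/registry state
    record: Callable[[Path, dict, dict], None],
) -> None:
    """One sample per round: give the protection stack (and the malware) time to
    finish before the next launch, instead of dumping the whole pack at once."""
    for sample in samples:
        before = snapshot()
        launch(sample)
        time.sleep(delay_s)   # window for behaviour blocking, cloud lookups, remediation
        record(sample, before, snapshot())
```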

As for the packs, there’s no need to have 20,000 samples. For my part, I spend an hour or more selecting the samples myself by mixing them (old malware, worms, Trojans, stealers, ransomware, JS/VBS attacks, etc.).
 
Many YouTubers and most influencers are misleading and outright fooling people for fame and fortune, while the people, or so-called fans, are more gullible than ever: following, subscribing, believing, and fighting for them! FANS: Fans Are New-Age Slaves!
 
If all businesses used these excuses, we would still be stuck in the Windows XP era.
;)
Most YouTube testers will bombard the system. Some will even use scripts (like Malx) to overload the protection and the system.
That’s why the resident engine will struggle to neutralize everything (for example, with the Virut virus or Expiro, which will quietly infect the system while the engine is busy).
I don't like that either. It's totally unrealistic. It's like downloading a game or app and getting 100 pieces of malware at once instead of just one. I've never seen that in the real world—multiple simultaneous infections with dozens of pieces of malware all at once. :)
 
I don't see it as a fault of the content creator; any and all content ever created has an inherent bias, and it is used to sway people towards whatever message the content creator is producing. That being said, it has always been up to the customer/consumer to conduct their own research before acquiring goods. Any large investment, or any investment that leverages your life, wellbeing or finances, requires an in-depth analysis and comparison between different sources in order to arrive at your own decision and not be forced into a solution.