App Review Windows Defender Firewall Critique Part 2

It is advised to take all reviews with a grain of salt. In extreme cases some reviews use dramatization for entertainment purposes.
Content created by Ophelia
Testing malware without the delivery part has some pros and cons. If one wants to show a weak point of the tested security layer (like @cruelsister), skipping malware delivery makes the test simpler and easier to understand. I did a similar thing in my tests on dismantling AVs.
The cons follow from the fact that it is hard to conclude the real-world danger because other security layers can partially cover the exposed weak points. For example, the particular malware made by @cruelsister would be blocked on download by SmartScreen in Edge, or on execution by Windows SmartScreen - if the file was downloaded directly from a malicious URL.
 
Testing malware without the delivery part has some pros and cons.
The entire point of not testing the malware delivery part is that it is not relevant to the test.

Not testing the delivery part shows what would happen if detection or blocking during delivery failed.

This is an extremely simple concept, but there are those who argue that "any test that does not show the delivery part is an invalid test." That statement is not accurate, and the people repeating it over and over have an agenda. That agenda is to discredit @cruelsister 's tests.

The cons follow from the fact that it is hard to conclude the real-world danger because other security layers can partially cover the exposed weak points.
That is a weak argument. As I have stated many times, the spreading of malware by shared USB flash drives happens on a large scale in south central and southeast Asia. The only way to test such a scenario is to launch the malware either from the USB drive or, more typically, from the desktop. Hundreds of millions of people use PCs in that region of the world, but they do not have reliable internet. They solve this partially by sharing USB drives.

The whole argument "A test must also include the malware delivery (meaning internet download) to be valid" is a very self-centered, first-world perspective.

It is a completely false statement to say "Your test is invalid because it did not test every layer of the product." OK, so what about products that only have a single layer of protection? What about the case where SmartScreen fails to block? What then? Only certain people here at MT will say "Well, that is not real-world because the tester turned off Windows SmartScreen and other Windows Security protections. If they did not disable them, then the test would have failed." LOL, such statements are ridiculous and reveal a lack of basic understanding of test methodology. But what is really going on is that certain people here take every opportunity to attack some aspect of any test demonstration that @cruelsister makes, because their objective is to discredit the test, and thereby discredit @cruelsister herself.

Nobody here better ever go to a BlackHat conference. They will see proof-of-concept (POC) demos, vulnerability attacks, and testing after which they'll have to wash their eyes out with Clorox bleach. A significant number of demonstrations at hacker and pentest conferences involve disabling aspects of the operating system - or, more often, slightly outdated builds of the OS or software that are then exploited.

"What if" or "What could potentially occur..." testing if this or that fails (by disabling it) to protect is a standard, widely-accepted industry pentest practice. Security layers are not infallible. They can be bypassed. So honest and accurate testing of a focused aspect of a system can be done by disabling a security feature, a security layer, or devising a test that does not utilize that feature or layer. It is a completely legit form of testing.

These are very simple concepts. Children on a schoolyard playground can understand them.

When tests are performed and demonstrated, it is not the responsibility of the person(s) performing the tests to explain all the caveats to the test. Any claim otherwise just ain't true. The responsibility is on the viewer to figure it out. If they do not have that knowledge then it is on them to gain the knowledge to completely understand what the test shows - and what it does not show. What the events shown mean or imply, and what they do not.

It is not @cruelsister 's responsibility to educate every viewer on the full details of her demonstrations. It is up to the viewer to figure out the limitations, the exceptions, and the corner cases & specificity of the test.

It is for this reason that neophytes are like deer caught in the headlights at a BlackHat conference. The difference is that they are there to learn and many soon get it. Whereas the intent at MT is to criticize tests to discredit them, and the person who created and performed the test.
 
A nice comparison for business AVs can be found in the MRG Effitas "360° Assessment & Certification" tests, in the sections "Real Botnet" and "Banking Simulator" (Eset and Malwarebytes on top).
Glad that you referenced MRG (although the same is essentially true for the other Majors).

If you view their testing methodology, you will see that testing a given malicious sample is accomplished by downloading the file via Chrome and then running the file from the DOWNLOAD DIRECTORY. Having Chrome save the downloaded file to that directory is an arbitrary choice, as it could just as easily have been saved anywhere else (like the Desktop directory).

Also, they are testing specific products without either the help of or mention of SmartScreen; the test would be flawed if they did. Also not mentioned at all was the use of a Blocklist, the presence of which is barely an inconvenience for a malicious coder, as something (like Confluence Networks in the video) can work as a proxy for a Virginal website containing the malicious file.

Finally, malware can arise from anywhere (direct download, infected USB, worms over the Network, torrented files, email attachments, etc). The methodology in my videos parallels that which is used by the professional testing sites and most (if not all) non-pros. A true "Real World" test is acquiring a malicious file from ANYWHERE, plopping it on a system in ANY directory, and testing it against a product utilizing only the defense mechanisms that are installed by that product.

Oh, and "in the sections "Real Botnet" and "Banking Simulator" (Eset and Malwarebytes on top)"- ESET did not get Top marks in my recent test of it. but I suppose some tests are Crueler then others.
 
@bazang,

I did not write that:
  • all tests must include a delivery part,
  • all malware must be web-originated,
  • the test is invalid because the delivery part is missing,
  • the tester must educate others,
  • etc.
If you did not notice, I made similar videos without a delivery part.

These are very simple concepts. Children on a schoolyard playground can understand them.
I would not use such words. They will not help build your authority.

I am confused by your post. You answered questions in a way that could suggest they were asked by me (but they were not). Did you watch my videos? For example:
 

@cruelsister,


I agree with everything you wrote except that your video is a real-world test. :)
This does not mean that the test is invalid as a video demonstration. The idea of your tests is similar to several of my tests (it is not an accident, because I watched several of your videos). I would not publish my tests if I thought them invalid or not useful. Of course, my tests were also non-real-world.
 
The cons follow from the fact that it is hard to conclude the real-world danger because other security layers can partially cover the exposed weak points.
Precisely. All tests have to be taken with a grain of salt.
The whole argument "A test must also include the malware delivery (meaning internet download) to be valid" is a very self-centered, first-world perspective.
Who made this claim?
 
One thing that I guess should be mentioned is exactly what the focus of a given test is. For the Professional tests it is against whatever (verified) malware shows up in their Honeypots (or whatever) during a specific time frame (a month, a quarter, a year), and unless the test is more specific (using only ransomware or only data Stealers), it is done without regard to the mechanism by which the malware works.

My tests differ in that they are, for the most part, testing specific mechanisms without regard to age (the samples may be very old or very new - freshly coded - and these may or may not be included in any time-constrained Honeypot); this is done to determine whether the AM app being tested will protect against them.

But as with the Pro tests, these tests utilize the "Lowest Common Denominator" method- get the malware from somewhere, place the file somewhere on the system and run the malicious file on that system which has no other protection than the AM app itself.
 
For the Professional tests it is against whatever (verified) malware shows up in their Honeypots (or whatever) during a specific time frame (a month, a quarter, a year), ...

It is not so simple. The samples should be representative too. So, the tester must avoid morphed samples, POCs, etc. This is probably the most challenging part of real-world testing. It can also be controversial because most AV testing labs do not share details of how the samples are chosen. One has to believe that the AV vendors and leading AV testing labs can cooperate to make those tests reliable. I noticed that the real-world scorings of AV-Test, AV-Comparatives, and SE Labs can differ significantly over a year. So, their testing methodologies are not perfect.

Edit.
It is possible that a reliable AV testing methodology does not exist and that the test scorings are about as reliable as predicting the weather a month ahead. :)
 
I should probably explain why I cannot treat the video from this thread as a real-world test. Simply, the leading AV testing labs use this term for "0-day malware attacks, inclusive of web and e-mail threats".

It is interesting how AV testing labs choose the representative samples. For example, AV-Test registers over 450,000 new malware samples per day, and only a small part of the registered samples is used in the test.
https://www.av-test.org/en/statistics/malware/

Here is a fragment of AV-Comparatives testing methodology related to finding new threats for real-world tests:
We use our own crawling system to search continuously for malicious sites and extract malicious URLs (including spammed malicious links). We also search manually for malicious URLs. In the rare event that our in-house methods do not find enough valid malicious URLs on one day, we have contracted some external researchers to provide additional malicious URLs (initially for the exclusive use of AV-Comparatives) and look for additional (re)sources.
Another fragment related to the statistical analysis of results:
In this kind of testing, it is very important to use enough test cases. If an insufficient number of samples is used in comparative tests, differences in results may not indicate actual differences in protective capabilities among the tested products. Our tests use much more test cases (samples) per product and month than any similar test performed by other testing labs. Because of the higher statistical significance this achieves, we consider all the products in each results cluster to be equally effective, assuming that they have a false-positives rate below the industry average.
https://www.av-comparatives.org/real-world-protection-test-methodology/

For example, in the latest test, 13 AVs belong to the same cluster, so they must be treated as equally effective, even though some of them detected all tested samples and others missed 4 samples. The statistical model AV-Comparatives used says that the differences in missed samples are not real; such differences can appear with high probability as artifacts of the testing methodology.



https://www.av-comparatives.org/tests/real-world-protection-test-february-may-2024/
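To make the clustering argument concrete, here is a minimal sketch (Python, standard library only) of the kind of significance check behind it. The 0 vs. 4 missed samples come from the post above; the roughly 500 test cases per product is an assumed, illustrative figure (not the lab's published number), and a one-sided Fisher exact test is just one reasonable way to frame the question, not AV-Comparatives' actual clustering method.

```python
from math import comb

def fisher_one_sided(miss_a: int, n_a: int, miss_b: int, n_b: int) -> float:
    """P(product A misses <= miss_a purely by chance), given that the two
    products faced n_a and n_b samples and missed miss_a + miss_b in total
    (one-sided Fisher exact test on the 2x2 missed/blocked table)."""
    total_miss = miss_a + miss_b
    total = n_a + n_b
    p = 0.0
    for a in range(miss_a + 1):  # all tables at least as extreme as the observed one
        p += comb(total_miss, a) * comb(total - total_miss, n_a - a) / comb(total, n_a)
    return p

# 0 misses vs. 4 misses, each product tested on ~500 cases (assumed figure).
p_value = fisher_one_sided(miss_a=0, n_a=500, miss_b=4, n_b=500)
print(f"p ~ {p_value:.3f}")  # ~0.062: not significant at the usual 0.05 level,
                             # so "0 missed" and "4 missed" land in one cluster.
```

With counts this small, 0 vs. 4 misses is well within what random sample selection alone could produce, which is exactly why the lab reports clusters rather than a strict ranking.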

Please forgive me if this post is off-topic, but most readers usually do not realize such important details.
 
The cons follow from the fact that it is hard to conclude the real-world danger because other security layers can partially cover the exposed weak points. For example, the particular malware made by @cruelsister would be blocked on download by SmartScreen in Edge, or on execution by Windows SmartScreen - if the file was downloaded directly from a malicious URL.
My recent videos are about testing a specific product against a specific malicious mechanism of action. As to how hard it is to conclude the real-world danger of such a test, a better example would be my last video on Worms:

In that one, the malware samples employed that were ignored by ZoneAlarm were neither new in age nor in mechanism (and are definitely in the wild), and for all I know may actually have been included in the thousands of samples used in the pro tests.

Also, whether or not something like SmartScreen would have detected them if they were downloaded is inconsequential, as when testing a specific product's defenses one must NOT include any additional type of malware defense that is extraneous to the product being tested - in other words, if a SmartScreen popup did occur, the proper procedure would be to override the warning and run the file anyway (as we are testing a specific product and NOT a specific product with an assist from Microsoft).

Finally it had been written by another (not you) on MT that running a file from the Desktop folder is somehow invalid. This is obviously silly as a file must be run from somewhere- from the Download folder (in the case of the file being downloaded), from within a folder from whatever Email client is used for email attachments, or from whatever folder torrented files are stored, etc.

So although such a video may not be worth the time to watch (and as I normally get only about 100 views this seems to be the case), stating that they are not in any way Real World as they are not as inclusive as a pro test has questionable validity.
 
So although such a video may not be worth the time to watch (and as I normally get only about 100 views this seems to be the case), stating that they are not in any way Real World as they are not as inclusive as a pro test has questionable validity.
No offense. My objection is related only to the terminology. If you use the term "Real World test", many people will understand it as a test on "0-day malware attacks, inclusive of web and e-mail threats". It would be better to use terminology consistent with the professional tests. In your case, using the term "Malware Protection test" (or simply "malware test") would not cause misunderstandings. For example, AV-Comparatives uses the term "Malware Protection test" when using in-the-wild samples executed on the system (like you do):
In the Malware Protection Test, malicious files are executed on the system. While in the Real-World Protection Test the vector is the web, in the Malware Protection Test the vectors can be e.g. network drives, USB or cover scenarios where the malware is already on the disk.
https://www.av-comparatives.org/tests/malware-protection-test-march-2024/
 
@bazang,

I did not write that:
  • all tests must include a delivery part,
  • all malware must be web-originated,
  • the test is invalid because the delivery part is missing,
  • the tester must educate others,
  • etc.
Apologies. I did not mean you stated it. I was referring to others. Sorry for the confusion. I am fully aware of all your tests. The methodology you use. The specifics of the tests themselves.

Finally it had been written by another (not you) on MT that running a file from the Desktop folder is somehow invalid. This is obviously silly as a file must be run from somewhere- from the Download folder (in the case of the file being downloaded), from within a folder from whatever Email client is used for email attachments, or from whatever folder torrented files are stored, etc.
I can take you to the U.S. government lab where a completely air-gapped LAN is attacked at a specific point, and the lateral and vertical hoobahjoob results in all systems being infected with malware running in randomly chosen file directories or executed right in memory.

The hoobahjoob is a real thing, without any further specification.
 
Who made this claim?
I do not recall the user names, but it has been stated quite a few times here at MT. Perhaps @cruelsister can provide a pointer to the post or posts?

So, the tester must avoid morphed samples, POCs, etc.
90+ % of all malware is morphed. It is all morphed. The dinguses create a DevOps pipeline and bot that craps out morphed variants at the rate of X samples per hour. Those samples get uploaded to a platform that sends them along to various other platforms with different means to disperse. It ensures massive malware dispersal to the four corners of the Earth.

If any AV or security solution cannot effectively cope with morphed malware at least 95% of the time or better, then the user should find a different solution.
 
Precisely. All tests have to be taken with a grain of salt.
All tests are specific and/or contrived, whether performed by a security software test lab, a researcher, or an enthusiast. As long as the premise of the test is sound, then the test itself is valid.

Insurance companies cover a lot of things in their policies that organizations and individuals will never suffer a financial loss from, but the insurance company keeps including them and charging the policy owners for that very-unlikely-to-happen coverage. It is 100% profit for the insurance company. The insurance underwriters' argument is: "If it can happen, then it should be covered [and you should have to pay for that coverage whether or not you want it]."

This is how any security software and testing of it should be viewed. If something is within the realm of possibility, then a test showing that potentiality is valid. It matters not if one considers it "real-world" or not. Software publishers use the "Not real-world" argument to 1) dismiss or discredit a test result and 2) as the justification for not fixing demonstrated weaknesses or vulnerabilities.

Lots of security software do very well in "real-world" tests and yet they fail on a daily basis in the real-world incidents against "real-world-in-the-wild" threats. Those products that are widely acclaimed among the user base as "The best-of-the-best-of-the-best, Sir!" fail to protect hundreds of thousands of systems. And then there are those millions of users who never get infected because - regardless of the system configuration - they do not do the things that get others infected. That truth does not in any way discredit nor diminish any sound test results. Tests are assessments of "What ifs, corner cases, and abstractions of stupid human behaviors, misconfigurations, weaknesses & vulnerabilities."

The categorization of any test as "real-world" is actually a misnomer, because all security tests are fabricated or contrived, no matter who performs them or what the underlying protocols or methods are. AV test lab methodology is only an approximation of what a typical security software user would experience against the average security threat.

The term "real-world" and "360 Assessments" as a "test methodology" or suite of tests was done to quash complaints by security software publishers that the testing was not showing their product features in the best light. The babies cried "Foul! Not fair! Not fair!" So labs came up with jingaling-jingaling marketing labels for their tests. This made their sensitive clients happy because it provided tests named and designed to provide the "proof" that they are quality security software where the publisher can state "You are protected." It's 100% marketing driven - and Microsoft itself is mostly responsible for why this kind of testing and marketing exists.

Security software developers design their products around a set of features they believe to be the best way to protect against threats. Any test that does not show off these features to the publisher's satisfaction, that publisher will consider "invalid" and do everything they can to discredit the test results. Or, no matter what, the publisher - when it comes down to it - will place the blame on the user with the predictable arguments "The user did something that is not covered by the product, the user did not understand the product, the user misconfigured the product, the user selected "Allow," users do not look at advanced settings to increase protections to cover this case, etc."

Unfortunately, all the test labs have caved in to these publisher complaints and created test protocols that are acceptable to the security software publishers - who pay the labs money. Any entity that derives its living from "clients" is going to cater to those clients in order to keep them happy and the revenue inflows going. This is not to say that the testing is not well designed, or that it is fundamentally compromised by "profit before accurate test results" or similar. It just means AV labs are not going to perform any tests that would bypass every single security product. They will not assess the products in a way that goes beyond publisher-accepted "vanilla" testing. Anything outside of that, those publishers will cry "Invalid!"

The best, most accurate testing comes from independent enthusiasts who find ways to bypass specific security features. This is where you get a clear and honest demonstration and, if you understand the demonstration, you realize that what the security software publisher says just ain't true, or is not entirely true. Those with greater insight realize that a test is specific. It might even be purpose-built to show a weakness in one software product that does not exist in another. That does not invalidate what is being demonstrated.

Google's Project Zero operates on this basis. Tavis Ormandy has been notorious for ignoring complaints from security software publishers and security software enthusiasts that his findings are not valid. His reply to any detractors has always been: "F*** O**. The test results are accurate and what I am saying is the truth."

It is unfortunate, but there are those who automatically assume that a person's preference for one security software over any other automatically makes their demonstrations nefarious or wrongly biased. Well, if that is the case, then every security software publisher out there has commissioned very specific tests with assessment firms such as MRG Effitas to show their product is better than the ones the publisher picks and chooses to be assessed against - thereby guaranteeing the end result that it wants, which is "their product is better than all others."

All tests should be approached with "I need to figure out what is being shown here. What it implies. And most importantly what it does not show or imply. And I need to not add words or intent to the test. Unless something is explicitly stated then I should assume nothing. There are an infinite number of ways I can interpret the test and its results. I should remove my own biases when viewing, interpreting, and reviewing the results."

The vast majority of people cannot do that. They bring their own personal junk and can't get past themselves when interpreting anything.
 
All tests should be approached with "I need to figure out what is being shown here. What it implies. And most importantly what it does not show or imply. And I need to not add words or intent to the test. Unless something is explicitly stated then I should assume nothing. There are an infinite number of ways I can interpret the test and its results. I should remove my own biases when viewing, interpreting, and reviewing the results."
Thus, my comment. "All tests have to be taken with a grain of salt."
 
All tests are specific and/or contrived, whether performed by a security software test lab, a researcher, or an enthusiast. As long as the premise of the test is sound, then the test itself is valid.
Unfortunately, there is no way to check if the test's premise is sound. You have to believe the tester's authority. After reading the documentation of AMTSO (the Anti-Malware Testing Standards Organization), I am ready to believe them. Making a test that complies with AMTSO standards is extremely hard. I can also believe some well-known researchers. I have no reason to believe most enthusiasts (with a few exceptions).

Lots of security software do very well in "real-world" tests and yet they fail on a daily basis in the real-world incidents against "real-world-in-the-wild" threats.

That is normal. If you take the results of the real-world tests of the leading AV testing labs, the top AVs miss on average about 1 per 500 samples. There are more than 400,000 samples per day in the wild, so we have many unhappy people daily.
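A quick back-of-the-envelope sketch of the scale implied by the two figures in this post (the ~1-in-500 miss rate and the 400,000+ new samples per day). It is only a rough illustration, since it says nothing about how many users actually encounter each new sample:

```python
# Rough scale check using the figures cited above (illustrative only:
# it ignores how many users actually encounter each new sample).
new_samples_per_day = 400_000  # daily volume of new malware quoted in the thread
top_av_miss_rate = 1 / 500     # ~0.2% missed by top AVs in real-world lab tests

expected_misses_per_day = new_samples_per_day * top_av_miss_rate
print(f"Samples a top AV could be expected to miss per day: ~{expected_misses_per_day:.0f}")
# -> ~800, which is why "scores well in lab tests" and "nobody ever gets infected"
#    are two very different claims.
```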

The babies cried "Foul! Not fair! Not fair!" So labs came up with jingaling-jingaling marketing labels for their tests.

I think that the labels are sensible. The samples from the "Real-World" and "Malware Protection" tests are very different. The first category includes very fresh web-based samples. The second category includes older samples originating from non-web sources. The average infection rate for the fresh web-based samples is several times higher. The web-based samples are mainly morphed samples that are short-lived in the wild. The non-web malware can be stored (alive) for weeks.

For example:
Avast "Real-World" infection rate: 10/6748 ~ 0.0015
Avast "Malware Protection" infection rate: 15/270634 ~ 0.000055

This is not to say that the testing is not well designed, or that it is fundamentally compromised by "profit before accurate test results" or similar. It just means AV labs are not going to perform any tests that would bypass every single security product. They will not assess the products in a way that goes beyond publisher-accepted "vanilla" testing. Anything outside of that, those publishers will cry "Invalid!"
I am unsure if this is true, but one cannot exclude the possibility that the influence of AV vendors on AV testing labs can make the tests somewhat biased.
Anyway, in the end, the cumulative results of those tests are probably close to the truth:

Real-World 2021-2022: SE Labs, AV-Comparatives, AV-Test (7548 samples in 24 tests)
Norton 360, Avast, Kaspersky ............12 - 18 missed samples <------- top AVs
Microsoft, McAfee ..............................27 - 37 missed samples <------- very good AVs
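As a side note, the "about 1 per 500" figure used earlier in this post can be roughly recovered from this table; a small sketch using the sample and miss counts listed above:

```python
# Per-sample miss rates implied by the cumulative 2021-2022 figures above.
total_samples = 7_548  # 24 real-world tests combined (SE Labs, AV-Comparatives, AV-Test)

groups = [("Top AVs (Norton 360, Avast, Kaspersky)", 12, 18),
          ("Very good AVs (Microsoft, McAfee)", 27, 37)]
for name, low_missed, high_missed in groups:
    print(f"{name}: {low_missed / total_samples:.4f} to {high_missed / total_samples:.4f} missed per sample")
# Top AVs: ~0.0016 to ~0.0024, i.e. roughly one missed sample per 420-630 tested,
# which is where the "about 1 per 500" rule of thumb comes from.
```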


Google's Project Zero operates on this basis.
Yes, this project and some others (Black Hat, Bug Bounty, etc.) are valuable.

It is unfortunate, but there are those who automatically assume that a person's preference for one security software over any other automatically makes their demonstrations nefarious or wrongly biased.
Yes, it is unfortunate and irrational.

Unless something is explicitly stated then I should assume nothing. There are an infinite number of ways I can interpret the test and its results. I should remove my own biases when viewing, interpreting, and reviewing the results."
Well said. :)(y)
 
I have no reason to believe most enthusiasts (with a few exceptions).
Most enthusiasts that want to be taken seriously do the things necessary in their testing to build a solid reputation. They are easily identified and differentiated from the typical YouTube tester.

I did not mean any random YouTube tester.

I am unsure if this is true, but one cannot exclude the possibility that the influence of AV vendors on AV testing labs can make the tests somewhat biased.
Those being assessed always have influence upon the assessors and the methodology used to assess, particularly when the relationship between the assessed and the assessor is subject to multi-level conflicts of interest, collusion, a common motive or objective, a pay-for-assessment system, gaming the assessment system (cheating), etc.

It is probably a good thing to be skeptical in such arrangements, even when every bit of info available to you points to all of it being performed and completed in a trustworthy manner.

Well said. :)(y)
Lord Burghley (William Cecil) once said to someone he wanted to degrade and grievously insult:

"If it were not for you being you, you would be a much better person. You are powerless to help yourself. A curse upon your family and our society. You could have been great." (I wonder what that sounded like in the original Old English. The tone and inflection of voice he used.)

When he said it he knew it applies to us all in one way, shape or form. Himself excluded, of course. Even though he knew it to be particularly apt to himself. He was great by the mere fact that he considered himself to be one of the greatest in English history. Such it is for those that dictate the rules of Empire and their place in it. They even get to fabricate their own pleasant fictions about themselves.
 
Most enthusiasts that want to be taken seriously do the things necessary in their testing to build a solid reputation. They are easily identified and differentiated from the typical YouTube tester.

If we exclude the tests made by MT members, are there examples of easily identified enthusiasts accepted by you? I have a problem with such examples.

Those being assessed always have influence upon the assessors and the methodology used to assess, particularly when the relationship between the assessed and the assessor is subject to multi-level conflicts of interest, collusion, a common motive or objective, a pay-for-assessment system, gaming the assessment system (cheating), etc.

It is possible, but I did not discover a significant influence in the typical professional tests. Furthermore, the tests done by enthusiasts and researchers do not contradict the results of professional tests. Did you encounter any contradictions?
 
If we exclude the tests made by MT members, are there examples of easily identified enthusiasts accepted by you? I have a problem with such examples.
Most people I know who test, but who are not professional researchers or security testers, do not make videos. They make demos at conferences such as Wild West Hackin' Fest or Black Hat. They are self-taught and take their "hobby" very seriously. They're not creating YouTube videos for likes or to be influencers (not that there is actually anything wrong with either, intrinsically).

It is possible, but I did not discover a significant influence in the typical professional tests.
All the tests performed by AV-Comparatives, AV-Test, etc. are done to the dictates of what the security software publishers, as an industry group, find acceptable. That industry group has great influence, as it is the source of all AV lab revenue.

If you are a business and you do things that your clients do not find acceptable, then you will not be in business for very long.

Furthermore, the tests done by enthusiasts and researchers do not contradict the results of professional tests. Did you encounter any contradictions?
It depends upon what you define as a "contradiction."

Researchers and enthusiasts expose all the corner cases and things not covered by the dedicated professional test labs. Your video showing how publicly available info on stopping services can be weaponized against security software is a prime example. I know you did more testing than you published. You were able to bork other security software using the method, but you chose not to publish the results because you did not want to deal with MT drama from certain people.

Researchers definitely test differently than test labs and their objective is not to perform "general malware testing" like the AV test labs. They are motivated to find unique problems, problems in areas where others did not think to look, etc.

If you were given a budget of $20 million USD and instructed to hire researchers and enthusiast pentesters/security software testers to put all the leading security software through various multi-level rounds of thorough testing, then after you posted the test results here, the readers would have a much different, more cautious, less trusting attitude towards their favorite security software.

Knowing you, your counter-argument is going to immediately go to "Home users are not targeted so that kind of testing is not required and the home users need not worry." While it may be true they are not targeted, that is not the point. The point is that there are lots of ways to put holes in security software and it is not the test labs finding them. It is researchers and enthusiasts that do it.
 