The peculiarity of EXE malware testing.

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,040
The peculiarity of EXE malware testing.

Let's look at the distribution of malware in the wild by file types:

[Chart: distribution of in-the-wild malware by file type]


https://static.poder360.com.br/2021/07/relatorio.pdf

We know that archives mostly contain EXE files, documents, and scripts. So we can conclude that roughly 1/3 of initial malware are EXE files, 1/3 documents (PDF, Office), and 1/3 scripts. Attacks that start with documents or scripts often deliver EXE files at a later infection stage. So most EXE malware will not be initial malware but EXE payloads that appear later in the infection chain.

Let's inspect the protection of two AVs ("A" and "B") that have identical capabilities for detecting unknown EXE files, but "A" has additional strong protection against documents and scripts. It is obvious that:
Protection of "A" > Protection of "B"

The "A" can detect/block the document or script before the payload could be executed. So, there can be many 0-day EXE payloads in the wild attacks that will compromise "B" and will be prevented by "A".
After being compromised, the "B" creates fast signature or behavior detection, and for the "A" the payload is invisible.

If we test AVs "A" and "B" the next day, the results will look strange:
"B" will properly detect these payloads and "A" will miss them. The result of the test:
Protection of "A" < Protection of "B"

This false result holds despite the fact that "A" and "B" have identical capabilities for detecting EXE files, and despite the fact that in the wild "A" protects better.
The error follows from the fact that the AV cloud backend reacts quickly when it is compromised, and much more slowly when it is not.
Many people are involved in threat hunting, so after some time most of these (for "A") invisible EXE payloads are added to detections, even if they never compromised "A" in the wild.
Note that this reasoning does not apply to AVs that use file reputation lookup for EXE files. Such AVs will block almost all EXE samples both in the wild and in tests.
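The paradox above can be sketched as a toy simulation. All numbers and names here are invented for illustration: both AVs share the same EXE detection rate, but only "A" blocks the document stage, so only "B" gets compromised and gains signatures before the next-day test.

```python
import random

random.seed(1)

EXE_DETECT = 0.6                       # both AVs detect 60% of unknown EXE payloads
attacks = [f"payload{i}" for i in range(1000)]

A_sigs = set()                         # "A" blocks the document stage, never sees payloads
B_sigs = set()
b_infected = 0

# Day 0: attacks hit users in the wild.
for payload in attacks:
    if random.random() >= EXE_DETECT:  # payload slips past B's EXE detection
        b_infected += 1                # B's users are compromised...
        B_sigs.add(payload)            # ...and B's backend quickly adds a signature

# Day 1: an "EXE test" replays all payloads against both AVs.
a_score = sum(p in A_sigs or random.random() < EXE_DETECT for p in attacks)
b_score = sum(p in B_sigs or random.random() < EXE_DETECT for p in attacks)

print(f"infections in the wild: A=0, B={b_infected}")
print(f"next-day EXE test:      A={a_score}/1000, B={b_score}/1000")  # B scores higher
```

Despite identical EXE detection, "B" wins the next-day test simply because it was compromised first: exactly the inversion described above.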

How to minimize this error?
  1. Using initial attack samples (like in the Real-World tests).
  2. Using many 0-day unknown samples. The more unknown samples, the smaller the error.
  3. Using prevalent samples (like in the Malware Protection tests made by AV-Comparatives and AV-Test). The "A" has more chances to quickly add the detections of prevalent samples.
For a less technical explanation, please look here:
https://malwaretips.com/threads/pecuriality-of-exe-malware-testing.112854/post-979714
 

Andy Ful

The sad thing is that the "EXE test error" can differ between AVs and can also depend on random factors related to spam campaigns, targeted attacks, the choice of samples, etc. Furthermore, it is a kind of systematic error, so increasing the number of tests cannot improve the results.
We can try to roughly estimate how big it can be.

The payloads come from non-EXE initial attacks. Let's assume that:
  1. Payloads are 50% of all tested samples (probably more).
  2. Antivirus "A" can block in the wild 30% more scripts and documents than antivirus "B".
  3. Each of these additionally blocked scripts and documents could use one EXE payload (if not blocked). In real attacks, there can be more than one EXE payload, but sometimes there will be none. These payloads are unknown "false payloads" for antivirus "A", because they were not used against this antivirus in the wild.
  4. About 2/3 of these fresh & unknown "false payloads" could be detected/blocked by the antivirus "A", and 1/3 could infect the computers.
  5. About 1/2 of these undetected "false payloads" could be added before the test to the detection basis via threat hunting and borrowing the detections from other AVs. This can be the main source of differences between AVs. Some vendors are slow and some much faster in adding detections for "false payloads". The slow vendors will get greater "EXE test error" as compared to fast vendors.
  6. The chances of reusing the payload as initial malware are negligible. This assumption is not true when lateral movement is an important attack vector or flash drives are extensively used for sharing files, cracks, and pirated software. See also:
    https://malwaretips.com/threads/pecuriality-of-exe-malware-testing.112854/post-979543

Rate of false infections = 0.5 × 0.3 × 1/3 × 1/2 = 0.025 ≈ 2.5%

Of course, this calculation is only an example. But it shows that the testing error can hit a few percent. Furthermore, it does not apply to AVs that use file reputation lookup (the factor 1/3 must be replaced by something close to 0).
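The estimate above can be written as a small helper. The parameter names are mine; the numbers are just the assumptions from points 1-5, not measured data.

```python
def exe_test_error(payload_share, extra_blocked, escape_rate, undetected_at_test):
    """Fraction of tested samples that "A" misses only because the wild
    attack was stopped before the payload stage ("false payloads")."""
    return payload_share * extra_blocked * escape_rate * undetected_at_test

# Consumer scenario: 0.5 * 0.3 * 1/3 * 1/2
consumer = exe_test_error(0.5, 0.3, 1/3, 1/2)
print(round(consumer, 4))   # 0.025, i.e. ~2.5%

# Business-network scenario (payloads ~1/10 of samples, see a later post)
business = exe_test_error(1/10, 0.3, 1/3, 1/2)
print(round(business, 4))   # 0.005, i.e. ~0.5%
```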

 

MacDefender

Level 16
Verified
Top Poster
Oct 13, 2019
779
I definitely agree: a lot of the drive-by samples that I get by clicking through malvertising on my spam honeypot aren't PE executables. My best guess is that with all the SmartScreen and other warnings, a PE download might tip off the user that something unusual is happening.

For PowerShell/VBS initial vectors, I've seen a lot of them attempt to disable AV services in various ways. Most commonly, they look in Program Files for the presence of specific AV software and then either run or download another script that cleverly tries to turn off each AV. It seems like they've thought through how to defeat some anti-tamper systems: for some AVs it's as simple as stopping a service. For others, they'll edit a configuration file or change a registry key.

That might be a big part of the answer to how easily detectable EXE payloads seem to make it through AVs.
 

Andy Ful

Although the "EXE test error" makes the interpretation of test results much harder, the EXE tests can still show one thing.
One should prefer AVs that have strong protection/prevention against all kinds of files (especially scripts and documents). But using an additional security layer for EXE files (or better, for PE files) can still be recommended in businesses.
In businesses, many payloads can be delivered directly via lateral movement without using documents (and truly malicious scripts). So we have more initial malware and fewer true payloads (probably 1/10 instead of 1/2) on the computers in an already infected network. The "EXE test error" is smaller:

Rate of false infections = 1/10 × 0.3 × 1/3 × 1/2 = 0.005 ≈ 0.5%

The "EXE test error" is now caused by ignoring some scripting methods in the test.


At home, a similar situation can arise when flash drives are extensively used for sharing files, cracks, and pirated software. Most EXE payloads can then be delivered without using documents and scripts.
 

Andy Ful

The importance of MOTW in testing.

MOTW is Mark Of The Web.

It is important in Real-World testing. MOTW is attached to files downloaded from the Internet by web browsers, OneDrive, etc. Many AVs use it for better detection of PE files (EXE, DLL), documents, MSI installers, and scripts. These additional protection layers are usually triggered even when the MOTW is empty.

Of course, payloads downloaded by the initial malware lack MOTW. So, when testing how AVs protect against lateral movement in businesses and organizations, one should probably use samples without MOTW.
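For reference, MOTW is stored in an NTFS alternate data stream named "Zone.Identifier" whose content is a small INI fragment. Below is a minimal parsing sketch; the helper name and sample content are illustrative, and reading the stream itself (e.g. opening "file.exe:Zone.Identifier") works only on NTFS.

```python
from configparser import ConfigParser
from typing import Optional

# URLZONE values used by Windows; ZoneId=3 ("Internet") is what browsers write.
ZONE_NAMES = {0: "Local machine", 1: "Local intranet", 2: "Trusted sites",
              3: "Internet", 4: "Restricted sites"}

def motw_zone(stream_text: str) -> Optional[str]:
    """Return the zone name from Zone.Identifier content, or None if absent."""
    parser = ConfigParser()
    parser.read_string(stream_text)
    if not parser.has_option("ZoneTransfer", "ZoneId"):
        return None
    return ZONE_NAMES.get(parser.getint("ZoneTransfer", "ZoneId"))

# Typical content written by a browser for a downloaded file:
sample = "[ZoneTransfer]\nZoneId=3\nHostUrl=https://example.com/setup.exe\n"
print(motw_zone(sample))   # Internet
```

An "empty MOTW" as mentioned above would be a stream with no usable ZoneId, for which this helper returns None.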
 

Andy Ful

Is Defender better than TrendMicro?

Here are the results of AVLab tests from 2019-2021. I skipped the January 2022 test because the results for Defender (only L3-level detections) contradict the AVLab testing methodology.

Missed samples per test (x = AV not tested that month):
Month:       J   S   O   N   j   m   M   J   S   N   j   m   M   J   S   N
Defender     x   x  17   0   x  20   x   x   0   x   8   0   0   x   2   x   = 47
TrendMicro   x   x   x   x   x   2 158   x   x   x   x   x   x   x   x   x   = 160

TrendMicro missed more samples in 2 tests than Defender in 8 tests. So the result for TrendMicro is much worse.
But does it mean that TrendMicro has much worse protection against EXE files? I do not think so. In that case, TrendMicro could not get top results (better than Defender free) in the Real-World tests. So what is happening here?

Defender is used by consumers as a free AV and in enterprises as a paid AV. Defender free on default settings does not have such strong protection/prevention against weaponized documents as TrendMicro (I am not sure about scripts). It also seems that Microsoft adds detections of "false payloads" more diligently. So the much better result of Defender (113 fewer missed samples) is not real and is probably caused by the "EXE test error".

I think that "EXE test error" can be also responsible for the results in AV-Comparatives Malware Protection tests.
Malware Protection 2019-2021, missed samples:
Microsoft........36
TrendMicro...481
Samples.....61 919

The "EXE test error" (about 445 samples per 61919) is in fact relatively small ~ 0.7% . This can be related to using prevalent samples two-weeks-old (on average). In the case of AV-Comparatives, the samples were prevalent, so it is probable that Defender free missed in the wild most of these 445 payloads (via undetected documents), when they were 0-day or 1-day old samples (before the test).

 

Andy Ful

YouTube tests and "EXE test error"

My previous posts were rather technical and probably hard to understand, so it may be useful to describe the issue in a simpler way.

Many people are convinced that home-made AV tests on YouTube reflect the true capabilities of AVs. Testers use thousands of few-days-old (or few-weeks-old) EXE samples and think that using so many samples makes their tests reliable. They also think that if a particular AV does not score very well in such tests, then it cannot protect very well against unknown 0-day EXE malware.

Unfortunately, they are wrong for several reasons. One of these reasons is the "EXE test error". Such tests could be reliable if the samples were a few hours old, but not a few days old.
When one uses older EXE samples in testing, the test simply shows what the protection would be if attackers bothered to reuse older EXE payloads as initial malware instead of creating new malware. So the result of such a test is highly artificial. Attackers prefer new samples because they are far more effective.

Generally, the "EXE test error" means that AVs without good protection against scripts and weaponized documents are favored (get undeservedly better scores) compared to AVs with stronger protection/prevention against scripts and weaponized documents. This happens even when the AVs have the same capabilities to detect EXE files.
The size of the "EXE test error" can also depend on how much the AV vendor cares about older samples that never hit its customers.
Anyway, it does not apply to AVs that use file reputation lookup for EXE files (or a smart default-deny approach), because these AVs can protect against EXE files independently of the non-EXE protection layers.


Can the EXE tests be valuable?

Yes, they can. But not to compare the protective abilities against EXE malware (when few-days-old or older samples are used).

The EXE tests can be still valuable in several ways:

  1. They can mimic the scenario when flash drives (USB drives) are extensively used to share cracks, pirated software, etc.
  2. For AV vendors to improve their products.
  3. For people who want more information about how AVs work.
  4. As a demonstration of capabilities.
  5. As a kind of support test for Real-World testing.
An "EXE + scripts" test can mimic the scenario of attacks via lateral movement in businesses and organizations.

 

Andy Ful

Can the "EXE error" be an excuse for poor AVs?

One can think that the "EXE error" is a convenient excuse for poor AVs. Such thinking is natural after seeing test results like these:

AV-Comparatives + AV-Test, Consumer Malware Protection tests 2021
Missed samples - tested few-weeks-old prevalent samples: only EXE files:
Avira ..................... 39
TrendMicro ......... 224

So the guy in the OP claims that TrendMicro can still have similar capabilities for detecting/blocking EXE malware? He must be crazy or simply stupid.:)

But, look here:
AV-Comparatives + AV-Test, Consumer Real-World Protection tests 2021
Missed samples - tested 0-day samples: EXE files, documents, scripts:
TrendMicro ...... 7
Avira ................ 27

Something here is upside down, isn't it?
So, no: the "EXE error" is not an excuse. It is a probable explanation only in specific situations, when one AV scores better in Real-World tests (which test EXEs, scripts, and documents) and significantly worse in Malware Protection tests (which test only EXEs). If this AV has good protection against scripts/documents and scores better on fresh samples (Real-World tests), then it simply cannot have weaker capabilities to detect/block EXE files. So Avira does not have greater capabilities than TrendMicro to detect/block EXE files, and the "EXE error" says the same.

In fact, there is a simple explanation for the first example. Avira had also missed most of the samples missed by TrendMicro, but this happened long before the test, via undetected documents and scripts. The difference is that Avira missed these samples in the wild (some consumers were infected), while TrendMicro missed them in the test (consumers were not infected). The test also shows that TrendMicro consumers might be infected anyway if attackers bothered to reuse these samples (payloads) in the wild. Furthermore, it shows that TrendMicro does not care much about adding detections for such malware.
 

wat0114

Level 11
Verified
Top Poster
Well-known
Apr 5, 2021
547
So in a nutshell I guess you are saying an AV with initial protection against documents and scripts in addition to detection of exe's is better than just detecting exe's using latest definitions? This seems obvious to me, but maybe you mean something a little different? Just like Windows Defender for home users is so much stronger when something like the vaunted Configure Defender is used to augment it :)

BTW, not meaning to nitpick, but thought I'd point out that you may have meant to use "Peculiarity" as the noun in the thread title ;)
 

Andy Ful

So in a nutshell I guess you are saying an AV with initial protection against documents and scripts in addition to detection of exe's is better than just detecting exe's using latest definitions?

Not exactly. I say that it is as important as detecting EXE files by all methods (latest definitions, behavior blocking, detonation in a sandbox, etc.). If you skip documents and scripts in a test, then you will face the "EXE error", which can make some results look anomalous (and misleading).

Just like Windows Defender for home users is so much stronger when something like the vaunted Configure Defender is used to augment it :)

Something like that. Nowadays, even perfect protection against EXE files is not sufficient. The fileless methods have become too popular.

BTW, not meaning to nitpick, but thought I'd point out that you may have meant to use "Peculiarity" as the noun in the thread title ;)

Yes (corrected).:)(y)
 

Andy Ful

wat0114,

Your last post has given me an idea about better and worse signatures in relation to the "EXE error".
Let's look again at the test results:

AV-Comparatives + AV-Test, Consumer Malware Protection tests 2021
Missed samples - tested few-weeks-old prevalent samples: only EXE files:
Avira ..................... 39
TrendMicro ......... 224

The results for relatively old and prevalent EXE files depend mostly on signatures. So it is natural to conclude that TrendMicro has worse signatures.
But this is not necessarily true, because the test does not exercise all signatures.
The test uses the signatures for EXE files but ignores the possible signatures for documents, scripts, etc. In reality, signatures for documents and scripts can provide better and more efficient protection than signatures for EXE files. In many cases, the EXE payloads are delivered only by documents and scripts. Furthermore, one signature for a document or script can protect against several EXE payloads: the scripts and documents often include only the URL to the payload, and the payload hosted there often changes after a few hours.

So in this example, TrendMicro got a much worse result not because of worse signatures. The issue is the testing methodology, which is anomalous for TrendMicro. The test sees the signatures for bullets (which killed people) but ignores the signatures for guns and shooters.
 

Andy Ful

Interesting article about the delivery of ransomware:

From this article, it follows that about 87% of ransomware was delivered via weaponized documents with macros.
This is a striking example showing that the results of tests with only EXE files are unreliable in the context of real-life protection against threats. Such tests skip the possible protection against non-EXE samples, which can prevent the delivery/execution of EXE payloads.

When we have two AVs (AV1 and AV2) that scored 100% and 90% in an EXE test, this result does not mean that AV1 protected users better than AV2; we cannot even exclude that AV2 protected users in the wild better than AV1.
The EXE tests would be reliable only under the assumption that the tested AVs have the same protection against non-EXE samples (which is obviously untrue).


It is very probable that despite very different scorings in the EXE tests, the real-life protection of most AVs is very similar.

A 100% protection score in a test with few-days-old EXE samples usually means that some samples could have compromised AV1 in the wild as 0-day EXE malware, and that AV1 already has signatures for them at test time. So we can have (for example) 90% protection in the wild and 100% protection in the test.

A 90% protection score in a test with few-days-old EXE samples is sometimes possible even when AV2 blocked the attacks in the wild, but the attacks were stopped before the EXE samples could be delivered to the machine. A good example is when a ransomware attack starts with a weaponized document and AV2 blocks the macro. The never-seen-before EXE sample was not delivered, so there is no signature for it in AV2's signature base at test time. Finally, we can have (for example) 100% protection in the wild and 90% protection in the EXE test.
 

Andy Ful

Today I analyzed a sample sent to me by @SeriousHoax.
The sample "220816-z6jahaebgl_pw_infected.zip" is a password-protected ZIP archive that contains an EXE payload.
It is an interesting example for this thread:
  1. The ZIP sample is detected by Microsoft Defender via BAFS (when downloaded from the URL or OneDrive). The detection is: Script/Ulthar.A!ml
  2. This ZIP sample is not detected by AVs on VirusTotal (including Microsoft Defender).
  3. The EXE file embedded in the ZIP sample is not detected by Microsoft Defender (I could also execute it on the computer with Microsoft Defender).
[Screenshots: Microsoft Defender BAFS detection and VirusTotal results for the samples]
This example shows that Microsoft Defender can sometimes block an attack in the wild when the initial malware is an archive from the Internet, yet fail in a test with the EXE payload used in that attack. Such EXE files can increase the "EXE test error" for Microsoft Defender. We cannot expect that whenever Defender blocks an attack in the wild, it can also block all EXE payloads that were going to be used in that attack.

This example also shows that Defender protection (default settings) is not comprehensive but rather very practical and focused on web threats. As we can see on VirusTotal, some popular AVs detect the EXE payload that was missed by Defender, so they will not fail in the EXE test.
 

Bot

AI-powered Bot
Verified
Apr 21, 2016
3,317
Thank you for sharing your insights into the peculiarity of EXE malware testing. The point you made about the cloud backend of AVs reacting differently when compromised is very interesting. It's also important to note that using initial attack samples, many 0-day unknown samples, or prevalent samples can help minimize the error in testing. Overall, it seems that testing for malware is an ever-evolving process that requires flexibility and adaptability.
 
