A Brief Critique of Professional AV Tests

Shadowra · Apr 29, 2023

Finally someone is talking about it, and it is our own @cruelsister!
Thanks

Indeed, I give NO credit to AV-C or AV-Test. Using samples, many of which may contain anti-VM checks or simply run unobtrusively, may skew the test.
Some labs have even used old malware packs, because publishers can pay them to be ranked 1st.

ForgottenSeer 97327 · Apr 29, 2023

The big money in security is earned in the corporate market. There the bottom line counts: not the test results, not the number of malware samples blocked, but keeping mission-critical systems running. That is why we will see more security vendors offering insurance (paying a penalty to the client when they get infected). Those security companies also have to assess their own risk of having to pay out to their customers. This is the rationale behind security vendors paying professional labs to test their products.

A number one rank combined with a bad protection level will generate a lot of "infection claims", with matching damage to the financial position of that security company, its reputation and ultimately the trust investors have in the company's stock value. See what happened to First Republic and Credit Suisse. So all those conspiracy theories about professional tests being manipulated are more of an urban legend ("broodje aap", or a monkey sandwich, in Dutch), fed by the fuzzy, non-transparent way those professional labs gather their test data.

So I do not agree with @Shadowra, but I am with @cruelsister. She raises a valid point. The professional labs should be more transparent about the way they validate and select their samples. AVLab.pl is a positive exception and provides insight into how its test set is built.

Trident · Apr 29, 2023

Samples containing anti-VM checks would alter any test, not just those of AV-C or AV-Test. VM evasion can be controlled and reduced to a minimum; the points malware checks to determine whether it’s running on a real system have been discussed many times in many different places.
For example, Joe Sandbox, ANY.RUN, Hybrid Analysis/Falcon Sandbox and the Check Point emulation are all highly resistant to evasion.

There are other, deeper issues with these tests that Cruelsister discussed in the notepad files already.
This coin has multiple sides though.

On one side we have:

Small malware scope. Many vendors claim they block about 400 new malware samples per second. 12K samples for a whole month in this case are nothing.
Potentially badly curated malware, not studied enough, not understood enough. I personally know how easy it is to go down a rabbit hole with one sample or group of samples, and you can devote a few days to that. It is impossible to research 12K samples in depth.
Everyone produces high, sometimes inconsistent scores, so when it comes to choosing a product the test becomes useless. I mean, AV-Test continuously rates over 10 products as “TOP” with 6 on protection, 6 on performance and 6 on usability. Who do you go for in that case?
There are various other issues but if I keep discussing them I will write a whole book and we are not here for that.
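To put the scale point in numbers, here is a rough back-of-the-envelope sketch. It takes the quoted vendor claim of 400 new samples per second at face value and assumes a 30-day month; both the constant names and the month length are illustrative assumptions, not figures from any lab.

```python
# Back-of-the-envelope comparison of a monthly test set with the
# claimed volume of new malware. All inputs are assumptions taken
# from the claim quoted above, not measured data.
CLAIMED_SAMPLES_PER_SECOND = 400   # vendor claim quoted in the post
TEST_SET_SIZE = 12_000             # monthly test set size mentioned in the post
DAYS_IN_MONTH = 30                 # assumption: a 30-day month


def monthly_volume(per_second: int, days: int = DAYS_IN_MONTH) -> int:
    """Number of new samples in a month at a constant per-second rate."""
    return per_second * 60 * 60 * 24 * days


def coverage(test_set: int, per_second: int, days: int = DAYS_IN_MONTH) -> float:
    """Fraction of the claimed monthly volume that the test set represents."""
    return test_set / monthly_volume(per_second, days)


def seconds_of_traffic(test_set: int, per_second: int) -> float:
    """How many seconds of claimed new-malware flow the test set equals."""
    return test_set / per_second


if __name__ == "__main__":
    print(monthly_volume(CLAIMED_SAMPLES_PER_SECOND))
    print(f"{coverage(TEST_SET_SIZE, CLAIMED_SAMPLES_PER_SECOND):.6%}")
    print(seconds_of_traffic(TEST_SET_SIZE, CLAIMED_SAMPLES_PER_SECOND))
```

Under those assumptions, 12K samples correspond to about 30 seconds of the claimed flow, or roughly one thousandth of a percent of a month.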

But then there is the other side of the coin.

It’s easy to sit down and say “my javascript was not detected, my vbs script was not detected” so the antivirus is rubbish. But:

Everybody knows antiviruses miss malware. By publishing such statements one is hardly rediscovering hot water. If antivirus could detect and prevent everything, malware creation would become unprofitable and end, and together with that, antivirus development would have to be suspended too. So far we see this hasn’t happened.
How much of this malware affects home users? Is it even relevant to test Avira Antivirus Free against malware that targets business environments? There is a ton more defences that businesses use on top of “antivirus”. You can’t expect the one layer out of many that is offered to home users to block such attacks.
Antivirus software is developed not only with detection and protection in mind. There is always the performance challenge, which is extreme: you only have milliseconds to analyse a file and output the verdict. You always need to make sure false positives are not produced, as these rob users of certain experiences and create enormous overhead for developers, who are not at work all day to deal with anti-malware vendors. And finally, there is automation. You can’t produce constant messages and prompts; decisions should be automated.

All this leads to anti-malware vendors researching, balancing and compromising, and producing solutions for real people experiencing real life problems. Such solutions can’t perform well on various tests.
It’s also difficult to come up with a strategy that properly tests their abilities. The majority of people criticising antivirus software haven’t actually developed their own alternatives, have they? If anyone thinks any antivirus is rubbish because it didn’t detect their JavaScripts and VBS files, they are welcome to point us to the antivirus they/their business develops, that detects and blocks 100% of everything.

TairikuOkami · Apr 29, 2023

Now to address the main issue of the video. Where do we get that cat?

Andy Ful · Apr 29, 2023

It is worth mentioning that a valid critique that follows from the video relates to the AV-Comparatives Malware Protection tests and to the part of the AV-Test tests performed on the "reference set". Those tests use the prevalent malware samples discovered in the last few weeks. Such testing cannot say much about real protection because most malware in the wild is short-lived. Good scores do not show good protection in the real world. Furthermore, some AVs with poor scores can provide good protection in the real world.

One cannot use similar arguments in relation to the Real-World tests made by AV-Comparatives and AV-Test. The methodology of current AVLab tests is similar to these tests. The Real-World tests are more reliable, but they contain a big random factor due to the small number of samples. So, one such test cannot say much about AV protection. The randomness of the scores decreases when many tests are taken into account.
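A small sketch of why pooling tests helps, assuming each sample is an independent pass/fail trial with the same underlying block rate (a simplification, since real samples are neither independent nor equally hard). The block rate and sample counts below are hypothetical placeholders, not figures from any lab.

```python
import math


def standard_error(block_rate: float, samples: int) -> float:
    """Binomial standard error of a measured block rate.

    Assumes independent, identically distributed pass/fail samples,
    which is only an approximation of a real AV test.
    """
    return math.sqrt(block_rate * (1.0 - block_rate) / samples)


def pooled_standard_error(block_rate: float, samples_per_test: int, tests: int) -> float:
    """Standard error after pooling several equally sized tests."""
    return standard_error(block_rate, samples_per_test * tests)


if __name__ == "__main__":
    rate = 0.99        # hypothetical true block rate
    per_test = 500     # hypothetical number of samples in one test
    for k in (1, 4, 10):
        se = pooled_standard_error(rate, per_test, k)
        print(f"{k:>2} test(s): +/- {se:.4%} (one standard error)")
```

Under this model the uncertainty shrinks with the square root of the number of pooled tests, so ten tests narrow it by a factor of about 3.2, not 10.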

Andy Ful · Apr 29, 2023

Shadowra said:
Using samples, many of which may contain anti-VM checks or simply run unobtrusively, may skew the test.

Yes, this can be a problem for one test. But the percentage of samples that could detect the testing environment is small (compared to all tested samples), so the error is not big when we take into account many tests.

Harputlu · Apr 29, 2023

I think these discussions are unnecessary. Home users should use effective anti-ransomware software and keylogger protection, along with an antivirus program.

Adrian Ścibor · Apr 29, 2023

Interesting topic... I was just watching the Matrix trilogy once again and felt bored, so I switched to MalwareTips news or something, and voilà! The real game is better!

Getting back to the point and the video, with a quote from AVLab's methodology. I need to add a full description of how we choose a malware file for testing:

Methodology » AVLab Cybersecurity Foundation

Question - What does the name of the "Advanced In-The-Wild Malware Test" mean?

avlab.pl

We use in-the-wild malware sources:

MWDB project by CERT Poland.
Malware Bazaar project by Abuse.ch.
Starting from May, we will use public URLs from urlquery.net - Automated URL scanner
Custom honeypots based on Dionaea -> but this is not a good source: there are a lot of samples, but many are repeats with duplicate SHA hashes.
We can use additional sources. If you can help us with that, we can include them and automate them in our tests. We tried, for example, app.any.run, but they do not have the right API to get only URLs.

Generally, the whole industry used something like WildList, which is no longer being developed. At AMTSO we are working on replacing it with our own list, but it's not easy. Building something like this requires a lot of people, often working for free. AMTSO currently has its own RTTL list, but we do not use it.

What is going on next?

1. Every URL potentially contains something that can be downloaded. We download it and check it (mal score.png)

a. the file is scanned by a Linux tool to match the file type and to check for a duplicate SHA256 in the database (if it is a duplicate, it is rejected and a new queue starts)
b. the file is scanned with some Yara rules:

Rules included in packer_compiler_signatures.yar to detect broken or damaged portable executable files.
Rules included in maldocs_index.yar to detect good or bad files with macros contained in Microsoft Office.
Rules included in anti_sandboxing.yar to detect anti-VM techniques that prevent execution in a Windows virtual environment.

c. only after all of this does the sample run in the black box (Windows, without AV protection) to check for potentially malicious changes based on Sysmon rules and logs - as mentioned by the author of the thread.

If the candidate file is "good" based on point c, a parameter from the URL is passed to the browser on all machines with security products installed. From this point onwards, the malware is analyzed on all of them at the same time and the response of the security software is checked.
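The ordering of steps a to c can be sketched roughly as below. This is not AVLab's code: the function names are hypothetical, the Yara scans and the black-box run are replaced by plain callables so the sketch stays self-contained, and it assumes (the post does not say so explicitly) that a static rule match causes the sample to be dropped.

```python
import hashlib
from typing import Callable, Iterable, Set, Tuple

# A static check is a (label, predicate) pair; the predicate returns True
# when the sample should be dropped. In a real pipeline these would wrap
# Yara rule sets; here they are hypothetical stand-ins.
StaticCheck = Tuple[str, Callable[[bytes], bool]]


def sha256_of(data: bytes) -> str:
    """Hex SHA256 digest used as the de-duplication key."""
    return hashlib.sha256(data).hexdigest()


def triage(
    data: bytes,
    seen_hashes: Set[str],
    static_checks: Iterable[StaticCheck],
    shows_malicious_changes: Callable[[bytes], bool],
) -> str:
    """Return 'accepted' or a 'rejected: ...' reason for one downloaded file."""
    # Step a: reject anything whose SHA256 is already in the database.
    digest = sha256_of(data)
    if digest in seen_hashes:
        return "rejected: duplicate SHA256"
    seen_hashes.add(digest)

    # Step b: static rules (assumption: a match means the sample is dropped).
    for label, should_drop in static_checks:
        if should_drop(data):
            return f"rejected: {label}"

    # Step c: only now run the sample in the unprotected reference machine.
    if not shows_malicious_changes(data):
        return "rejected: no malicious changes observed"
    return "accepted"


if __name__ == "__main__":
    seen: Set[str] = set()
    checks = [("broken executable", lambda d: len(d) == 0)]  # toy stand-in
    print(triage(b"sample-1", seen, checks, lambda d: True))
    print(triage(b"sample-1", seen, checks, lambda d: True))
```

Doing the cheap hash and static checks first keeps the expensive dynamic run for files that have a chance of being usable test samples.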

In addition, after the May edition, we'll publish an external CSV with 3rd party scanner opinion about malware. This is implemented and it's not a secret. The technology provider is Arcabit/MKS_VIR from Poland. We do not test it, so I do not see a conflict of interest here. We will add these data to our changelog soon.

Hopefully, this will further exclude usage of potentially useless, non-malware samples in AVLab tests (Advanced In The Wild Malware Test).

#####
If you are interested, I can make a video for you showing how it all works, step by step, from the inside.

Andy Ful · Apr 29, 2023

Harputlu said:
... home users should use a successful anti-ransomware software and a keylogger protection, along with an antivirus program.

It is not bad advice ...

but unnecessary here.

Shadowra · Apr 29, 2023

Adrian Ścibor said:
Interesting topic... I was just watching the Matrix trilogy once again and felt bored, so I switched to MalwareTips news or something, and voilà! The real game is better!

Adrian, my remark was not aimed at you. I admire your work, and yours is the only testing company I trust

(and that completes my tests)

Trident · Apr 29, 2023

Andy Ful said:
Yes, this can be a problem for one test. But the percentage of samples that could detect the testing environment is small (compared to all tested samples), so the error is not big when we take into account many tests.

If they are focused on testing executables, many, if not all, contain logic that will inspect different parts of the OS and hardware, and will suspend delivery of the real malicious payload.
This logic includes (here also a book could be written), but is not limited to:

Checking recently opened documents, browser history, browser tabs.
Checking the list of installed programs to determine whether it is a minimal, clean installation or a real production system.
Checking system parameters such as firmware name, CPU temperature, RAM and free disk space size, and others.
Using listeners to determine whether there is keyboard/mouse activity. You’ve just opened the malicious sample, so you should still be at the computer and using it.
Performing malicious actions only when the document is scrolled to the last page.
Requiring buttons to be clicked.
Checking debug privileges.
Playing with time: delaying execution, getting the system installation time/date, and others.

And many, many, many more, most of which can somehow be controlled, but of course malware authors are creative. There are many projects online that have researched VM/emulation (which also uses a VM plus logic to speed up execution so more can be observed). Many of these checks (still not all) are discussed here:

Evasion techniques

evasions.checkpoint.com

I haven’t seen a more comprehensive guide on how to control VM evasion than this one.
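From the tester's side, the list above can be turned into a simple pre-flight audit of the test machine. The sketch below is only an illustration: the `GuestProfile` fields mirror some of the checks named in the post, and every threshold is a hypothetical placeholder rather than a value taken from the Check Point guide or from any lab.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GuestProfile:
    """Facts a tester records about the test machine (hypothetical schema)."""
    recent_documents: int
    browser_history_entries: int
    installed_programs: int
    ram_gb: float
    free_disk_gb: float
    days_since_install: int
    simulates_user_input: bool


def audit(profile: GuestProfile) -> List[str]:
    """List the ways the machine still looks like a fresh analysis box.

    All thresholds are illustrative placeholders; tune them to your setup.
    """
    findings = []
    if profile.recent_documents == 0:
        findings.append("no recently opened documents")
    if profile.browser_history_entries == 0:
        findings.append("empty browser history")
    if profile.installed_programs < 20:
        findings.append("program list looks like a clean installation")
    if profile.ram_gb < 4:
        findings.append("very little RAM")
    if profile.free_disk_gb < 50:
        findings.append("very small disk")
    if profile.days_since_install < 30:
        findings.append("system was installed very recently")
    if not profile.simulates_user_input:
        findings.append("no keyboard/mouse activity is simulated")
    return findings


if __name__ == "__main__":
    fresh_vm = GuestProfile(0, 0, 5, 2, 20, 0, False)
    for finding in audit(fresh_vm):
        print("-", finding)
```

An empty result does not prove the environment is undetectable; it only means these particular, easy signals have been addressed.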

cruelsister · Apr 29, 2023

When VM evasion is spoken of, many infer that the malware will "break out" of the box when in actuality they just won't initiate and will be treated as junk in testing.

Trident · Apr 29, 2023

cruelsister said:
When VM evasion is spoken of, many infer that the malware will "break out" of the box when in actuality they just won't initiate and will be treated as junk in testing.

Yeah, but “breaking out” is generally not under the evasion category; it falls under the exploits category -> Virtualisation Escape.

Evasion always refers to tactics that camouflage the malicious behaviour/payload, so that the various analysis tools as well as researchers won’t notice it.

They shouldn’t really be confused.

Andy Ful · Apr 29, 2023

Trident said:
There are many projects online that have researched VM/emulation (which also uses a VM plus logic to speed up execution so more can be observed). Many of these checks (still not all) are discussed here:

Evasion techniques

evasions.checkpoint.com

I haven’t seen a more comprehensive guide on how to control VM evasion than this one.

Nice source about evasion techniques and possible countermeasures.

Trident · Apr 29, 2023

Andy Ful said:
Nice source about evasion techniques and possible countermeasures.

Hands down the best.
They have one more but it’s more for malware/forensic analysts. It explores anti-debug tricks.

Anti-Debug Tricks

anti-debug.checkpoint.com
