Do you really understand AV test results?

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Hi @Andy Ful, this looks like a very interesting post, but maybe I am tired or something: could you please summarize the main point? Could you also give us a short explanation of why a lower detection result might actually mean better protection?
Summary:
  1. In standard tests, many undetected 0-day samples can be valued incorrectly when the AV uses AI to create post-infection malware signatures. This incorrect valuation underestimates the protection of the AV (with AI).
  2. The error from point 1 can matter: if it is big enough, an AV (with AI) can get a worse 0-day detection score than one without AI, yet still give better 0-day protection. It is like measuring people's height by looking only at their heads, while ignoring that some of them may wear heels and some may be barefoot.
  3. The above can also affect tests with one-month-old samples, because the one-month results for AVs are very close to one another.
This effect can be important if AI can create post-infection signatures within minutes and does not miss many malware samples. Sadly, we do not know for sure whether this is true for real AI implementations, because a special kind of test (for example, a repeated test) would be required, and such tests are hard to do.
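To make point 2 less abstract, here is a minimal Python sketch. All the numbers in it (the 96% and 90% first-scan rates, 100 encounters per sample, and the assumption that a post-infection signature protects every later encounter) are invented for illustration only; they are not measured values for any real AV.

```python
# Minimal sketch (invented numbers): why a first-scan detection score can
# understate protection when an AV pushes post-infection signatures.

def test_vs_protection(first_scan_rate, postinfection_ai, encounters_per_sample=100):
    """Return (standard test score, real-world protection) as fractions."""
    detected = first_scan_rate                  # what a one-round test measures
    missed = 1.0 - first_scan_rate
    if postinfection_ai:
        # Each missed sample infects one "first victim"; once the AI pushes a
        # post-infection signature, the remaining encounters are blocked.
        protection = detected + missed * (encounters_per_sample - 1) / encounters_per_sample
    else:
        protection = detected                   # every later encounter is also lost
    return detected, protection

for name, rate, ai in [("AV without AI", 0.96, False), ("AV with AI", 0.90, True)]:
    det, prot = test_vs_protection(rate, ai)
    print(f"{name}: test score {det:.1%}, protection {prot:.1%}")
# AV without AI: test score 96.0%, protection 96.0%
# AV with AI: test score 90.0%, protection 99.9%
```

Under these assumptions the AV with the lower test score ends up protecting more of its user base, which is exactly the gap between detection and protection described in point 2.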
 
Last edited:
  • Like
Reactions: Handsome Recluse

509322

Just looking at AV-Test's October results: Windows Defender scores 96.3% (1) and Bitdefender, rated top by AV-Test, scores 100% (2). When the statistical relevance of a 3 to 4 percent protection difference is that low (near zero), why bother to install a third-party antivirus? Just keep Windows Defender. ;)

Behind the test results, AV-Test also shows the industry average. If AV-Test's info is true and the zero-day malware protection average is indeed 99%, why would market research show annual growth of 12-15% in security spending by companies (3)? :unsure:

1: Link to Windows Defender test results: AV-TEST – The Independent IT-Security Institute
2: Link to Bitdefender test results: AV-TEST – The Independent IT-Security Institute
3: Link to cybersecurity market forecast: The Cybersecurity Market Report covers the business of cybersecurity, including market sizing and industry forecasts, spending, notable M&A and IPO activity, and more.

The problem is that the AV test labs test using out-of-date or obsolete (variant) samples. Like I said in my post, "Windows Defender is great!" just shows that people don't understand AV testing.

The full complement of Microsoft Enterprise protection is sufficient, but Windows Defender all by itself is not. If you throw a bunch of new stuff at Defender, it falls apart.

Average Joe sitting in front of a workstation is the reason for 54+% of all corporate and government agency infections. That's the reason for increased IT spending. Anyone that relies solely on Windows Defender in a higher risk environment invariably ends up with infected systems.

Ask people why they spend money on AVs, and the reasons span from the rational to the absurd: "I can't be bothered to manage my security," "I don't have the knowledge," "I don't understand IT security," "I want a security soft to tell me what to do."

Like I said initially, it all comes down to understanding those 4 criteria I listed.
 
Last edited by a moderator:

shmu26

Level 85
Verified
Honorary Member
Top Poster
Content Creator
Well-known
Jul 3, 2015
8,153
Summary:
  1. In standard tests, many undetected 0-day samples can be valued incorrectly when the AV uses AI to create post-infection malware signatures. This incorrect valuation underestimates the protection of the AV (with AI).
  2. The error from point 1 can matter: if it is big enough, an AV (with AI) can get a worse 0-day detection score than one without AI, yet still give better 0-day protection. It is like measuring people's height by looking only at their heads, while ignoring that some of them may wear heels and some may be barefoot.
  3. The above can also affect tests with one-month-old samples, because the one-month results for AVs are very close to one another.
This effect can be important if AI can create post-infection signatures within minutes and does not miss many malware samples. Sadly, we do not know for sure whether this is true for real AI implementations, because a special kind of test (for example, a repeated test) would be required, and such tests are hard to do.
Thanks. I think I got it now:
the first unfortunate user, the one who got infected, actually saves everyone else, because there is post-infection AI detection and a cloud database to push it in real time to everyone using that AV. Please correct any mistakes here.

Which major AVs have post-infection AI detection? Or should I ask which ones do not?
 

509322

Thanks. I think I got it now:
the first unfortunate user, the one who got infected, actually saves everyone else, because there is post-infection AI detection and a cloud database to push it in real time to everyone using that AV. Please correct any mistakes here.

Which major AVs have post-infection AI detection? Or should I ask which ones do not?

AI/Machine Learning is (signature) detection by another name.
 
Last edited by a moderator:

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Maybe I should give an example of sample valuation in the repeated test.
Let's check 0-hour samples every 10 minutes, as follows:
  1. In the first round, we check all samples. All detected samples are valued as 1 and removed from the pool.
  2. We run another check after ten minutes. Some samples may be detected; they are valued as 0.8 and removed from the pool.
  3. We continue checking, valuing, and removing. Detected samples are valued 0.2 less than in the previous round.
  4. The last check is done after an hour (for 0-hour samples). Samples detected there are valued as 0.2 and removed from the pool. The rest are valued as 0.
In the standard (one-round) test, all samples that were not detected in the first round are valued as 0 (there are no more rounds), so their values are underestimated (except for the samples that would still be undetected in the last round, whose true value really is 0). In the repeated test, samples with values less than 1 but greater than 0 can be called partially detected.
The above test would give a good estimate of partially detected 0-hour samples if AI could create post-infection signatures in about 20 minutes (on average).
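A minimal sketch of this valuation follows. The 15-minute round spacing (chosen so that the 0.2 steps end at 0.2 in the final round at the one-hour mark) and the detection times are my own illustrative choices, not lab data.

```python
# Sketch of the repeated-test valuation described above. The round spacing
# and the detection times are illustrative assumptions, not lab data.

ROUND_TIMES = (0, 15, 30, 45, 60)     # minutes after the sample appeared

def sample_value(first_detected_at):
    """Value a sample by the round in which it was first detected:
    1.0 in the first round, 0.2 less per later round, 0.0 if never detected."""
    if first_detected_at is None:
        return 0.0
    for i, t in enumerate(ROUND_TIMES):
        if first_detected_at <= t:
            return round(1.0 - 0.2 * i, 1)
    return 0.0

# None = still undetected at the end of the one-hour window.
detections = [0, 0, 12, 28, 58, None]
values = [sample_value(t) for t in detections]
print(values)                         # [1.0, 1.0, 0.8, 0.6, 0.2, 0.0]
print(sum(values) / len(values))      # ~0.6, the average "partial detection" score
```

In a standard one-round test the same pool would score 2/6 (about 0.33), because the four samples detected only in later rounds would all count as plain misses.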
 

509322

Maybe I should give an example of sample valuation in the repeated test.
Let's check 0-hour samples every 10 minutes, as follows:
  1. In the first round, we check all samples. All detected samples are valued as 1 and removed from the pool.
  2. We run another check after ten minutes. Some samples may be detected; they are valued as 0.8 and removed from the pool.
  3. We continue checking, valuing, and removing. Detected samples are valued 0.2 less than in the previous round.
  4. The last check is done after an hour (for 0-hour samples). Samples detected there are valued as 0.2 and removed from the pool. The rest are valued as 0.
In the standard (one-round) test, all samples that were not detected in the first round are valued as 0 (there are no more rounds), so their values are underestimated (except for the samples that would still be undetected in the last round, whose true value really is 0). In the repeated test, samples with values less than 1 but greater than 0 can be called partially detected.
The above test would give a good estimate of partially detected 0-hour samples if AI could create post-infection signatures in about 20 minutes (on average).

That's all fine and dandy, but AV lab tests don't work like this. And they certainly don't report their results like this. In fact, they go well out of their way to avoid any kind of grey interpretation of their test results. The only one who makes their test results overly complicated to interpret is AvLab.pl. In fact, I would argue that AV-C, AV-Test, etc. make their test results too simplistic (essentially all graph-based), or I should say that they do not emphasize enough that their pretty pictures are absolutely meaningless without the reader's full comprehension of the notes and methodology, which they do a poor job of explaining in their documentation. I have sat and watched an average Joe read AV lab supplementary notes, look up, and go, "huh, errrr... what?"

The labs do not explain anything in simple, easy to understand terms and examples that a layperson outside the industry can readily grasp.
 
Last edited by a moderator:

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Thanks. I think I got it now:
the first unfortunate user, the one who got infected, actually saves everyone else, because there is post-infection AI detection and a cloud database to push it in real time to everyone using that AV. Please correct any mistakes here.

Which major AVs have post-infection AI detection? Or should I ask which ones do not?
Yes, that is so. Post-infection signatures are made by Defender for all malware types. Bitdefender can do it for ransomware (of that I am sure). I have not researched other AVs. But one should remember that using AI is not the same as having the AI create post-infection signatures. In my first post, I gave the example of Avast's CyberCapture module to show the difference.
 

Rebsat

Level 6
Verified
Well-known
Apr 13, 2014
254
MalwareTips tests of Avast (from the last few months) show that it actually has very strong 0-day protection for EXE files (the CyberCapture module), but not such strong protection against scripts and scriptlets.

Have you found a way, or tweaked any settings, to strengthen Avast's protection against scripts and scriptlets? Thanks bro :)
 
  • Like
Reactions: Andy Ful

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
That's all fine and dandy, but AV lab tests don't work like this.
Yes. And I think that they will not in the near future. That is the problem. Such a test would be very hard to do.
But the point is that the standard tests do not measure the real protection (especially against 0-hour and 0-day samples), because they ignore some important AI capabilities.
 
  • Like
Reactions: shmu26 and Azure

509322

Yes. And I think that they will not in the near future. That is the problem. Such a test would be very hard to do.
But the point is that the standard tests do not measure the real protection (especially against 0-hour and 0-day samples), because they ignore some important AI capabilities.

Read their test methodology. For the October AV-C test they used 316 "live URLs" of "wide-spread malicious samples." Whatever that means. To me, that sounds like a bunch of days-old malware. So to me, that makes any detection below 100% unacceptable - but that's just me.
 
Last edited by a moderator:

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Have you found a way, or tweaked any settings, to strengthen Avast's protection against scripts and scriptlets? Thanks bro :)
Yes. Any SRP solution (AppGuard, Hard_Configurator, SSRP, SRP via GPO, SRP registry tweaks). But this is a prevention-type solution: 'do not open/execute, to be safe'.
Most people do not like prevention.
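For anyone curious what the 'SRP registry tweaks' refer to, here is a minimal, read-only Python sketch that checks whether a Software Restriction Policies default level is set in the registry. The key path and level values below are the standard SRP locations as far as I know, but treat this as an illustration only (and use a tool like Hard_Configurator if you actually want to change these settings).

```python
# Read-only sketch: report whether Software Restriction Policies (SRP) are
# configured on this machine. Windows-only; it changes nothing.

import winreg

SRP_KEY = r"SOFTWARE\Policies\Microsoft\Windows\Safer\CodeIdentifiers"
LEVELS = {
    0x00000: "Disallowed (default-deny)",
    0x20000: "Basic User",
    0x40000: "Unrestricted (default-allow)",
}

try:
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, SRP_KEY) as key:
        level, _ = winreg.QueryValueEx(key, "DefaultLevel")
        print("SRP DefaultLevel:", LEVELS.get(level, hex(level)))
except FileNotFoundError:
    print("No SRP policy found in HKLM; SRP does not appear to be configured.")
```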
 

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Read their test methodology. In some tests they are using samples collected within the prior 10 days. LOL.

So it's a joke when Windows Defender scores 96.3 %.
The post-infection error is related to 0-hour (0-day) samples.
Tests that do not combine a '0-day' result with a 'longer time interval' result, but instead randomly use samples only from the 'longer time interval', have a lower error. That follows from the fact that 0-day samples are rare in the pool of all samples. But then the differences between the tested AVs are also lower, so it is hard to differentiate between AVs when looking only at the test results. I wrote about this effect in my first post (one-month tests).
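A rough back-of-the-envelope sketch of that dilution effect; the 5% share of 0-day samples and all the detection rates below are invented numbers, only meant to show how the gap shrinks:

```python
# Invented numbers: how a large gap on 0-day samples almost disappears once
# those samples are only a small fraction of the whole test pool.

zero_day_share = 0.05        # assumed share of true 0-day samples in the pool
older_rate = 0.998           # assumed detection rate on older, well-known samples

def blended_score(zero_day_rate):
    return zero_day_share * zero_day_rate + (1 - zero_day_share) * older_rate

av_a = blended_score(0.97)   # stronger first-scan 0-day detection
av_b = blended_score(0.90)   # weaker first-scan 0-day detection
print(f"AV A: {av_a:.2%}   AV B: {av_b:.2%}")   # ~99.66% vs ~99.31%
```

With these numbers, a 7-point gap on 0-day samples shrinks to about a third of a percentage point in the blended score, so the published results for different AVs end up looking almost identical.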
So it's a joke when Windows Defender scores 96.3 %.
I do not know which test you have in mind. But anyway, occasionally bad results are possible, because we do not know the samples. For example, if a significant share of the samples in the pool were ones for which the AI had only created post-infection signatures, the result for the AV (with AI) would look bad.
 
Last edited:

509322

The post-infection error is related to 0-hour (0-day) samples.
Tests that do not combine a '0-day' result with a 'longer time interval' result, but instead randomly use samples only from the 'longer time interval', have a lower error. That follows from the fact that 0-day samples are rare in the pool of all samples. But then the differences between the tested AVs are also lower, so it is hard to differentiate between AVs when looking only at the test results. I wrote about this effect in my first post (one-month tests).

Read my revised post. Their methodology states "316 URLs pointing to wide-spread malicious samples."

Some labs use samples collected within the previous 10 days. Other labs do similar sampling. Whatever the case may be, they are using old and/or obsolete (known variant/polymorphic) samples.

So it is not like the labs are giving a representation of true "zero-day" capabilities. If they did, the test results would be much different, and probably unacceptable marketing-wise. For most publishers there would be no value in paying the AV labs to test their product, because the result would not be conducive to promoting sales. But like I've said repeatedly, the labs are purportedly using sample sets that are "representative" of what a user might expect to encounter during real-world computing. This vague definition is where the arguments and disagreements begin.
 
Last edited by a moderator:

Sunshine-boy

Level 28
Verified
Top Poster
Well-known
Apr 1, 2017
1,782
post-infection malware signatures
Sorry if I ask, but what is this? :notworthy: I just don't know!

Also, please, someone tell me which one is a real zero-day:
1. The malware that Opcode created to bypass VS?
2. The CCleaner malware?
3. The samples (especially those .jar files) in the MH!
To me, numbers 1 and 2 are real zero-days :D Am I wrong?
 
Last edited:
  • Like
Reactions: Andy Ful

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
The problem is that the AV test labs test using out-of-date or obsolete (variant) samples. Like I said in my post, "Windows Defender is great!" just shows that people don't understand AV testing.

The full complement of Microsoft Enterprise protection is sufficient, but Windows Defender all by itself is not. If you throw a bunch of new stuff at Defender, it falls apart.

Average Joe sitting in front of a workstation is the reason for 54+% of all corporate and government agency infections. That's the reason for increased IT spending. Anyone that relies solely on Windows Defender in a higher risk environment invariably ends up with infected systems.

Ask people why they spend money on AVs, and the reasons span from the rational to the absurd: "I can't be bothered to manage my security," "I don't have the knowledge," "I don't understand IT security," "I want a security soft to tell me what to do."

Like I said initially, it all comes down to understanding those 4 criteria I listed.
That is true. Furthermore, for enterprises/companies the post-infection signatures are not so good (I wrote about this in my first post). For a high-risk environment, prevention solutions are the best. :)

Sorry if I ask, but what is this? :notworthy: I just don't know!

Also, please, someone tell me which one is a real zero-day:
1. The malware that Opcode created to bypass VS?
2. The CCleaner malware?
3. The samples (especially those .jar files) in the MH!
To me, numbers 1 and 2 are real zero-days :D Am I wrong?
Some AVs can use AI to create a malware signature in real time (within minutes), based on telemetry from the infected computer. This will not save the infected computer, but can save other computers.
.
You should ask the one who used the phrase 'real zero-day'. :)
 
Last edited by a moderator:

Sunshine-boy

Level 28
Verified
Top Poster
Well-known
Apr 1, 2017
1,782
I think a zero-day is malware that is not detected by any signature, but a BB or some other advanced detection method may detect it, right?
So those samples in the MH are not zero-days, because most of them have detections on VT! But the Opcode malware is different! I want to know which AV can block Opcode's hacks lol.
but can save other computers
Thanks, I didn't know that!
 
Last edited:
  • Like
Reactions: shmu26

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
Read my revised post. Their methodology states "316 URLs pointing to wide-spread malicious samples."
.
I could only find this:
"The results are based on the test set of 316 live test cases (malicious URLs found in the field), consisting of working exploits (i.e. drive-by downloads) and URLs pointing directly to malware. Thus exactly the same infection vectors are used as a typical user would experience in everyday life."
https://www.av-comparatives.org/wp-content/uploads/2017/11/avc_factsheet2017_10.pdf
So they rather say that this can include 0-hour and 0-day samples. But the result for Defender in the above test was 99.1% (not 96.3%), so maybe we are talking about different tests.
.
But please do not force me to defend Defender. I used Bitdefender and Defender only as examples. Their true AI capabilities were not tested and may have bugs (very possible for Defender). The main idea of the thread is that standard tests miss the AI capabilities related to post-infection signatures, and this was stated very clearly.
So let's stop talking about Defender. There is no test proving that it is a very good AV, and there is no proof that it cannot be a very good AV. I would be cautious about believing that it is one of the best AVs.
 
Last edited:

shmu26

Level 85
Verified
Honorary Member
Top Poster
Content Creator
Well-known
Jul 3, 2015
8,153
So the upshot is that WD is better than the tests make it sound, but we don't know how much better, and reliable protection against zero-days can only be achieved by a default/deny strategy such as SRP or anti-exe.

In MT malware hub testing, which uses fresh samples, the outstanding exception to the above rule is Kaspersky. It seems to come out clean almost every time. That always puzzled me. Maybe it is because some testing is done with Trusted Applications Mode enabled, or other similar tweaks that make it function in a quasi-default/deny mode?
 

Andy Ful

From Hard_Configurator Tools
Thread author
Verified
Honorary Member
Top Poster
Developer
Well-known
Dec 23, 2014
8,593
So the upshot is that WD is better than the tests make it sound, but we don't know how much better, and reliable protection against zero-days can only be achieved by a default/deny strategy such as SRP or anti-exe.
Yes, that is the point. And Defender is only an example of the problem.
.
In MT malware hub testing, which uses fresh samples, the outstanding exception to the above rule is Kaspersky. It seems to come out clean almost every time. That always puzzled me. Maybe it is because some testing is done with Trusted Applications Mode enabled, or other similar tweaks that make it function in a quasi-default/deny mode?
Every test shows that Kaspersky is one of the best AVs.:)
 
