Andy Ful
Many people think so. I also thought it was a simple thing to understand. But I was wrong.
I realized that an AV with a lower detection result (static + dynamic) can give better protection than another AV with a higher detection result.
It sounds like a paradox, but please read the post first, and then decide for yourself.
.
The possibility of using Artificial Intelligence (AI) locally and in the cloud (extended AI) has made AV testing more complicated than it was a few years ago. AI can recognize suspicious behavior on an infected computer and create a malware (postinfection) signature. The more signatures it creates, the smarter it gets and the better its behavior monitoring becomes, so better signatures are created in a shorter time. For example, the BitDefender 2017 Anti-Ransomware module can create a postinfection signature in the cloud within about 30 minutes. It then takes approximately 3 hours to propagate it via product update to all endpoints not connected to the cloud. So, in the BitDefender 2017 AV cloud, 0-day ransomware in the wild will live only up to about 30 minutes (thanks to the postinfection signature), compared to several hours in standard AV clouds.
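To make the timing easier to follow, here is a tiny Python sketch using the figures quoted above; the 'standard AV cloud' lifetime of several hours is only my assumption for comparison:

```python
# Rough sketch of the exposure windows described above.
# The timings are the approximate figures quoted for BitDefender 2017;
# the "standard cloud" value is an illustrative assumption.

SIGNATURE_CREATION_MIN = 30     # cloud AI creates the postinfection signature
UPDATE_PROPAGATION_MIN = 180    # product update reaches endpoints not connected to the cloud
STANDARD_CLOUD_MIN = 8 * 60     # assumed lifetime of 0-day ransomware in a "standard" AV cloud

def exposure_window(connected_to_cloud: bool) -> int:
    """Minutes a 0-day sample can stay undetected for a given endpoint type."""
    if connected_to_cloud:
        return SIGNATURE_CREATION_MIN
    return SIGNATURE_CREATION_MIN + UPDATE_PROPAGATION_MIN

print("cloud-connected endpoint:", exposure_window(True), "min")    # ~30 min
print("offline endpoint:        ", exposure_window(False), "min")   # ~210 min
print("standard AV cloud:       ", STANDARD_CLOUD_MIN, "min")       # several hours
```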
In this scenario (AI included), infecting one of the computers connected to the AV cloud before the test can have a substantial influence on the AV's malware detection afterward. It follows from the fact that postinfection signatures can be created by the AI in the cloud within minutes.
.
The conclusion is that an actual real-world 0-hour (0-day) protection test should:
- Contain only 'true real-world samples' that have already infected a computer in every tested AV cloud (hard to achieve in practice).
- Make use of repeated tests.
In repeated tests, the malware samples that bypassed AV protection are checked again after some minutes. This procedure should be repeated for all undetected samples. For some samples, several repetitions will be required, up to one hour (for 0-hour tests). Samples that bypass AV protection in all repetitions are counted as undetected (valued as 0); otherwise they are counted as partially detected (valued as something between 0 and 1).
The malware detection will be valued slightly less than 1 when the sample is detected just a few minutes after the first test.
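Here is a minimal Python sketch of that repeated-test scoring idea; the detect() stub and the particular partial-detection weighting are only my illustration, not any testing lab's formula:

```python
# Sketch of the repeated-test scoring described above.
# detect(sample) stands for the lab's detection check; the weighting
# 1 - k/(max_retries + 1) is a made-up example of a partial value.

def score_sample(detect, sample, max_retries=6):
    """Return 1.0 (detected), 0.0 (undetected), or a partial value in between."""
    if detect(sample):
        return 1.0                          # detected at the first attempt
    for k in range(1, max_retries + 1):
        # wait some minutes, then re-check (the sleep is omitted in this sketch)
        if detect(sample):
            return 1.0 - k / (max_retries + 1)   # detected after k re-checks
    return 0.0                              # bypassed protection in every repetition

# Hypothetical stub: a sample that starts being detected after the 3rd re-check,
# i.e. the cloud AI needed roughly half an hour to create the signature.
def make_detect_stub():
    calls = {"n": 0}
    def detect(sample):
        calls["n"] += 1
        return calls["n"] > 3
    return detect

print(score_sample(make_detect_stub(), "sample.exe"))   # ~0.57, partially detected
```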
.
Why some AVs score high in tests.
A BitDefender example for a 0-hour real-world malware test:
1. The non-standard signatures of 'malware caught by BitDefender AI on customers' computers' are excellent.
2. The standard (before execution) BitDefender signatures are excellent.
3. The non-signature protection is excellent.
In this case, the result of a standard 0-hour protection test will be excellent.
Also, because BitDefender has decent standard signatures, real-world test samples that happen to be new for BitDefender are very rare (if any exist at all).
.
A Microsoft Defender example for a 0-hour malware test:
1. The non-standard signatures of 'malware caught by Defender AI on customers' computers' are (probably) excellent.
2. The standard (before execution) Defender signatures are poor.
3. The non-signature protection is (probably) excellent.
4. The number of test samples that require Defender AI to create postinfection signatures can be statistically significant.
Point 4 is the key factor behind the bad opinion about Defender. If Defender could block 99.99% of malware samples, the AI would need to create postinfection signatures very rarely (for 0.01% of samples). As we know, this is not the case, and Defender does not block malware that efficiently in known tests. That is why postinfection signatures are important for Defender.
.
The detection results (static + dynamic) will be lower for the Defender cloud than in reality, because the samples from point 4 are not properly valued in standard tests. They are always counted as undetected, but they should be counted as partially detected, because some minutes later that malware will mostly be stopped in the Defender cloud (the AI creates a postinfection signature).
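To show what counting 'partially detected' samples would change in practice, here is a toy calculation; the 1000-sample split and the 0.9 partial value are invented numbers, not test data:

```python
# Invented numbers, only to show how the two counting rules differ.
blocked_immediately = 970    # detected at first contact
caught_minutes_later = 25    # stopped after the AI creates a postinfection signature
never_caught = 5
total = blocked_immediately + caught_minutes_later + never_caught

# Standard rule: delayed detections count as fully undetected.
standard_score = blocked_immediately / total

# Proposed rule: delayed detections count as partially detected (assumed value 0.9).
partial_value = 0.9
adjusted_score = (blocked_immediately + partial_value * caught_minutes_later) / total

print(f"standard counting:      {standard_score:.1%}")   # 97.0%
print(f"with partial detection: {adjusted_score:.1%}")   # ~99.2%
```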
I used the phrase '(probably) excellent' because the protection related to Defender AI (locally and in the cloud) was not directly measured (even though in theory it should work excellently). On the other hand, the analogous Anti-Ransomware protection in BitDefender 2017 works excellently, so '(probably) excellent' for Defender is not just a baseless assumption. One should also bear in mind that Defender can use Windows' massive telemetry about system events, and Microsoft has enormous resources to make Defender AI the best.
.
The 0-hour malware files have an important influence on 0-day detection scores, because the rate of infection is highest in the first few hours after the malware appears in the wild (about 30% of infections happen in the first 4 hours).
Furthermore, when you read the malware test reports, you can see substantial differences between AVs in 0-day detection and very small differences in the detection of malware discovered over a period of some weeks. So in fact, the better the 0-day result, the better the combined, final AV score.
Generally, AVs that use postinfection telemetry to create malware signatures on the fly will have underestimated protection scores, and the underestimation grows the shorter the time needed to create a signature and the weaker the standard signatures. In real life, weaker standard signatures are compensated by the postinfection signatures created by AI.
.
So, are standard AV detection tests useless? I do not think so.
Let's take a Microsoft Defender example for one-month-old malware samples:
1. The non-standard signatures of 'malware caught by Defender AI on customers' computers' are (probably) excellent.
2. The standard (before execution) Defender signatures are good (better than at 0-hour or 0-day).
3. The non-signature protection is (probably) excellent.
4. The number of test samples that require Defender AI to create postinfection signatures is not statistically significant.
5. The one-month detection result should be close to excellent, because the number of malware samples that should be counted as partially detected (instead of undetected) is no longer statistically significant. Also, the one-month standard Defender signatures are better than the 0-hour or 0-day ones.
As an example, let's take the AV-Test one-month detection results (static + dynamic, averaged over a one-year interval, October 2016 - October 2017):
BitDefender, F-Secure, Kaspersky, McAfee, Norton, Trend Micro: ~ 100%
Avast, Avira, ESET, Microsoft Defender: ~ 99.7% - 99.9%
https://www.av-test.org/en/antivirus/home-windows/
.
As a second example, let's take the AV-Comparatives one-month real-world detection results (static + dynamic, averaged over 4 months, July - October 2017, with user-dependent actions counted as infected):
Avira, BitDefender, F-Secure, Kaspersky, Norton, Trend Micro: ~ 99.6% - 100%
Avast, ESET, McAfee, Microsoft Defender: ~ 98.9% - 99.5%
Real-World Protection Test - AV-Comparatives
.
The differences are very small, and one of the best AVs in the AV-Test scoring (McAfee) got the worst result in the AV-Comparatives scoring. Also, BitDefender, F-Secure, Kaspersky, Norton, and Trend Micro have excellent scores in both tests.
One should look at other protective factors in order to differentiate AVs, especially anti-0-day capabilities (like CyberCapture in Avast, 'Application Control' in Kaspersky, 'Advanced Threat Control' in BitDefender, DeepGuard in F-Secure, etc.) and other unique features.
.
It would be interesting to compare Avast cloud with Defender cloud.
In Avast 2017, the CyberCapture feature was introduced to fight 0-day malware. If an EXE or DLL file is recognized as suspicious, CyberCapture locks the file and can upload it to the cloud for detailed analysis in a controlled environment. After the analysis is finished, which can take up to about 2 hours, the file is unlocked; if it is malicious, a signature is created on the fly and Avast quarantines the file. That feature gave Avast excellent detection results, as good as those of the best AVs. CyberCapture is an idea to protect all users at the price of some inconvenience (the file is locked for up to 2 hours). CyberCapture does not create postinfection signatures (the file stays locked until the analysis is finished).
.
Defender in Windows 10 adopted the well-known military solution: protect millions by sacrificing a few. The time for cloud analysis was shortened to 10 seconds (it can be extended to 60 s). If the file is recognized as dangerous, a malware signature is created. After this, the file is unlocked and is either quarantined (infected) or executed (not infected). Such analysis is not as deep as Avast CyberCapture's, so some malware files can slip through the detection procedure and infect the computer. Yet the telemetry about suspicious system events continues and is analyzed by the local AI models and the extended cloud AI models. When signs of infection are recognized, all the data is reanalyzed in the cloud. If some files are recognized as malicious, malware signatures are created and the files are quarantined (the average time to create postinfection signatures on the fly is not known). Defender AI can create both before-execution and postinfection signatures on the fly.
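A toy Python model contrasting the two flows described above; the function names, verdict strings, and the reduction of each cloud analysis to a single pre-computed verdict are my own simplification, not Avast or Microsoft APIs:

```python
# Toy contrast of the two flows described above (illustrative only).

def avast_cybercapture(deep_cloud_verdict: str) -> str:
    """File stays locked until the deep cloud analysis finishes (up to ~2 hours)."""
    if deep_cloud_verdict == "malicious":
        return "quarantined, never executed"
    return "unlocked and executed after the analysis"

def defender_cloud(quick_cloud_verdict: str, postinfection_verdict: str) -> str:
    """Quick 10-60 s cloud check; postinfection telemetry may catch what slips through."""
    if quick_cloud_verdict == "malicious":
        return "quarantined before execution"
    # The file executes; local and cloud AI keep analyzing telemetry afterwards.
    if postinfection_verdict == "malicious":
        return "executed, then quarantined via a postinfection signature"
    return "executed and treated as clean"

print(avast_cybercapture("malicious"))        # long lock, but no infection
print(defender_cloud("clean", "malicious"))   # brief infection window, then cleanup
```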
.
Final words.
I should underline that the above reasoning relates to home user protection. Because of targeted attacks and fast malware propagation in big local networks, postinfection signatures are not so useful for companies/enterprises.
The second point is that I am not accusing anyone of publishing wrong detection test results. I accept that the detection results are correct within the boundaries of the applied testing procedures.
One should remember, anyway, that detection results for some AVs cannot easily be translated into protection results, especially when malware signatures can also be created on the fly by AI (in some minutes) on the basis of postinfection telemetry (Defender, BitDefender, etc.).
.
YouTube tests.
Homemade tests are not very useful, because they have many cons:
- The test may be done by an AV fanboy, so only tests favorable to a particular AV may be published.
- The malware samples are not real-world. They are taken from malware collection sites, and many of the samples have no chance of infecting any computer.
- The number and diversity of samples are usually too small to be statistically representative.
- Often only one AV is tested, so it is impossible to compare the results with other AVs (on the same pool of samples).
- If more than one AV is tested, they are not tested at the same time; a difference of some hours can have a substantial influence on the results.
- Sometimes several AVs are tested one after another without refreshing the system, so in fact they are tested on different system states.
MalwareTips tests.
I like them because they show how a concrete AV can fight a concrete malware file. All samples have analyses available on hybrid-analysis.com and virustotal.com. From those tests, one can see the strong and weak points of AV protection. For example, MalwareTips tests for Avast (from the last few months) show that it actually has very strong 0-day protection for EXE files (the CyberCapture module), but not so strong protection against scripts and scriptlets.
They also show that most malicious documents use embedded or external scripts (scriptlets) to download and execute payloads. The script trojan-downloaders are often highly obfuscated, but after de-obfuscation it is evident that in many cases they differ only in the links to malware sites. Those are only some examples; the research potential is great.
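As a small illustration of that last observation, one could mask out the URLs in the de-obfuscated downloader scripts and compare what remains; the regex and helpers below are only a hypothetical sketch of such an analysis:

```python
import re

# Sketch: compare de-obfuscated downloader scripts after masking out URLs.
# If the masked bodies are identical, the samples differ only in their payload links.
URL_RE = re.compile(r"https?://\S+")

def mask_urls(script_text: str) -> str:
    return URL_RE.sub("<URL>", script_text)

def same_except_links(script_a: str, script_b: str) -> bool:
    return mask_urls(script_a) == mask_urls(script_b)

# Hypothetical, harmless stand-ins for two de-obfuscated downloaders.
a = 'GET("http://badsite1.example/payload.bin"); RUN();'
b = 'GET("http://badsite2.example/payload.bin"); RUN();'
print(same_except_links(a, b))   # True: only the malware link differs
```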
So, those tests can be much more useful than tests focused only on AV scoring results.
.
Please post your thoughts related to AV tests here.