Randomness in the AV Labs testing.
<blockquote data-quote="Andy Ful" data-source="post: 1123461" data-attributes="member: 32260"><p>The OP has been updated.</p><p></p><p>The probability of finding x = 0, 1, 2, 3, ... undetected malware samples was calculated in the OP:</p><p><strong>p( x ) = B( m - k , n - x ) * B( k , x ) / B( m , n )</strong></p><p>where B( p , q ) is the binomial coefficient.</p><p></p><p><img src="https://malwaretips.com/attachments/1744978145533-png.288143/" alt="1744978145533.png" class="fr-fic fr-dii fr-draggable " style="width: 264px" /></p><p></p><p>I noticed (by numerical experiments) that for a sufficiently large number of samples in the wild ( <strong>m >> k , n</strong> ) and a small number of missed samples ( <strong>x << n</strong> ), the function <strong>p(x)</strong> depends only on the infection rate (<strong> r = k/m</strong> ) and the number of tested samples ( <strong>n</strong> ). In that case we can use the approximate (binomial) formula:</p><p></p><p><strong>p( x ) ~ B( n , x ) * r ^ x * (1 - r ) ^ ( n - x )</strong></p><p></p><p>[ATTACH=full]288198[/ATTACH]</p><p></p><p>So, increasing the number of in-the-wild samples does not significantly change the probabilities, as long as the infection rate <strong>k/m</strong> stays the same and <strong>m</strong> is big enough.</p><p>We do not know exactly how the AV Labs choose the malware samples. Most probably, they pick the test samples from large feeds (over 300,000 suspicious and malicious threats per day) and possibly remove some morphed samples of the same malware. If so, the approximate formula for p(x) is very accurate.</p><p>An example of such a malware feed:</p><p>[URL unfurl="true"]https://www.mrg-effitas.com/services/threat-feeds-malware/[/URL]</p><p></p>
<p>If we know the average infection rate of the top AVs, the formula for p(x) can be used to decide whether a particular AV can be awarded in a test (as a top AV) or not. For example, the missed-samples threshold <strong>t</strong> can be calculated from:</p><p><span style="font-size: 18px">p(<span style="color: rgb(0, 168, 133)"><strong>t</strong></span>) < 0.05</span></p><p><strong>It means that missing <span style="font-size: 18px"><span style="color: rgb(0, 168, 133)">t</span></span> samples disqualifies that AV's result from the top award, because a top AV would produce such a result by pure chance less than 5% of the time.</strong></p><p></p><p></p><p>Edit.</p><p>Corrected a typo in the formula for p(x).</p></blockquote><p></p>
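<p>For anyone who wants to check the quoted formulas numerically, here is a minimal Python sketch. It is not the poster's code. It reads the symbols as m = samples in the wild, k = in-the-wild samples the AV misses, n = tested samples, which is how they are used in the quote. The function names and the numbers (m = 300,000, k = 600, n = 500) are made up for illustration. The threshold search starts at the most likely number of misses, which is an assumption on my part so that small, unlikely miss counts are not picked up by the p(t) criterion.</p>

```python
from math import comb, floor


def p_exact(x: int, m: int, k: int, n: int) -> float:
    """Exact formula from the post: B(m-k, n-x) * B(k, x) / B(m, n)."""
    # The binomial coefficients are exact integers; only the final
    # division produces a float.
    return comb(m - k, n - x) * comb(k, x) / comb(m, n)


def p_approx(x: int, n: int, r: float) -> float:
    """Approximate formula from the post: B(n, x) * r^x * (1-r)^(n-x)."""
    return comb(n, x) * r**x * (1 - r) ** (n - x)


def missed_threshold(n: int, r: float, alpha: float = 0.05) -> int:
    """Smallest t at or above the most likely miss count with p(t) < alpha.

    Assumption (not from the post): start at floor((n + 1) * r), the mode
    of the approximate distribution, so that only unusually HIGH miss
    counts can become the threshold. Returns n + 1 if no such t exists.
    """
    t = floor((n + 1) * r)
    while t <= n and p_approx(t, n, r) >= alpha:
        t += 1
    return t


if __name__ == "__main__":
    # Hypothetical numbers, chosen only for illustration (r = 0.002).
    m, k, n = 300_000, 600, 500
    r = k / m
    for x in range(6):
        print(x, round(p_exact(x, m, k, n), 4), round(p_approx(x, n, r), 4))
    print("threshold t =", missed_threshold(n, r))
```

<p>With these made-up numbers the exact and approximate columns should agree closely, and the search stops at t = 4. Note that this uses the point probability p(t), as in the quote, not the cumulative chance of missing t or more samples.</p>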