<blockquote data-quote="WiseVector" data-source="post: 916387" data-attributes="member: 76851">Sorry for the late reply.

I think the most important things in machine learning are how deeply you can parse a file, the training set you select, and the features you extract. Algorithms and ideas are secondary. Take PE files, for example: there are many compilers (VB, .NET, Delphi, VC), packers (UPX, VMP, ASPack) and installers (NSIS, SFX, Inno). The accuracy of an ML model depends on how deeply you parse these files.

On the other hand, it is fundamentally impossible for machine learning to avoid FPs. Suppose we have two files. One calls UrlDownloadToFile to download a file from Microsoft and then executes it. The other downloads malware from a malicious website and executes it. The pseudocode is:

File one:
UrlDownloadToFile(hxxps://www.microsoft.com/xx.exe, good.exe)
shellexecute(good.exe)

File two:
UrlDownloadToFile(hxxps://www.xxx.com/xx.exe, good.exe)
shellexecute(good.exe)

As you can see, there are only minor differences between the two files. If you parse the files deeply enough, the AI will eventually recognize that file two poses a greater threat than file one. But deeper parsing has a bigger performance impact. That is why ML engines often have more FPs than signature-based engines.

We keep improving our ability to parse files in order to reduce FPs. WV is nearly three years old, and during this time we have received a number of FP files from users. These files help us greatly in reducing FPs. If you can parse a file very well and have a good data set, you can do anything you want: for example, identify malware by training on legitimate files, or identify legitimate files by training on malware.

We have come to realize that AI-based static scanning has too many limitations, so we have spent a lot of time developing AI-based event analysis and AI-based memory scanning. In the end, malware has to perform its malicious behavior or decrypt its payload in memory.</blockquote>
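
To make the two-file example in the quote concrete, here is a rough Python sketch of the difference between shallow and deep feature extraction. It is only a toy illustration, not how WiseVector or any real engine works: the call-list representation, the function names and the `TRUSTED_DOMAINS` allowlist are all assumptions made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist, for illustration only.
TRUSTED_DOMAINS = {"microsoft.com"}

# Each file is modeled as a list of (api_name, arguments) pairs,
# mirroring the pseudocode in the quoted post.
FILE_ONE = [
    ("UrlDownloadToFile", ("hxxps://www.microsoft.com/xx.exe", "good.exe")),
    ("shellexecute", ("good.exe",)),
]
FILE_TWO = [
    ("UrlDownloadToFile", ("hxxps://www.xxx.com/xx.exe", "good.exe")),
    ("shellexecute", ("good.exe",)),
]


def shallow_features(calls):
    """Shallow parse: keep only the sequence of API names."""
    return tuple(name for name, _ in calls)


def refang(url):
    """Turn a defanged 'hxxp(s)' URL back into 'http(s)' so it parses normally."""
    return url.replace("hxxp", "http", 1)


def host_is_trusted(url):
    """Check whether the URL's host is an allowlisted domain or a subdomain of one."""
    host = urlparse(refang(url)).hostname or ""
    return any(host == d or host.endswith("." + d) for d in TRUSTED_DOMAINS)


def deep_features(calls):
    """Deeper parse: also inspect the download URL argument."""
    features = []
    for name, args in calls:
        if name == "UrlDownloadToFile":
            features.append((name, host_is_trusted(args[0])))
        else:
            features.append((name, None))
    return tuple(features)


if __name__ == "__main__":
    # Shallow features cannot tell the two files apart.
    print(shallow_features(FILE_ONE) == shallow_features(FILE_TWO))  # True
    # Deeper features can, at the cost of extra parsing work per call.
    print(deep_features(FILE_ONE) == deep_features(FILE_TWO))        # False
```

With only the API names as features, a model sees the two files as identical, so it has to either flag both (an FP on file one) or miss both. Looking at the URL argument separates them, but that extra parsing is the performance cost the quote mentions.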
