Machine Learning for Cyber Security – Static Detection of Malicious PE Files

Andy Ful

Dec 23, 2014
This article is from 2019 but is still worth recalling. It contains some well-known information about the factors that matter for static malware detection. Here are some interesting fragments:

"PE Imports
A PE can import code from other PEs. To do so, it specifies the PE file name and the functions to import. It is important to analyze the imports to get a coherent image of what the PE is doing. Some of the imported functions are indicative of potentially malicious operations, such as crypto APIs used for unpacking/encryption or APIs used for anti-debugging. Some examples of potentially malicious imports:

Import Name | Potential Malicious Usage
KERNEL32.DLL!MapViewOfFile | Code Injection
KERNEL32.DLL!GetThreadContext | Code Injection
KERNEL32.DLL!ReadProcessMemory | Code Injection
KERNEL32.DLL!ResumeThread | Code Injection
KERNEL32.DLL!WriteProcessMemory | Code Injection
USER32.DLL!SetWindowsHookExW | API Hooking
KERNEL32.DLL!CreateToolhelp32Snapshot | Process Enumeration
ADVAPI32.DLL!OpenThreadToken | Token Manipulation
ADVAPI32.DLL!DuplicateTokenEx | Token Manipulation
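
The quoted article does not show code at this point. As a rough illustration only (not the authors' implementation), the import list above could be turned into a simple lookup. The names `SUSPICIOUS_IMPORTS` and `flag_imports` are hypothetical, and the sketch assumes the (DLL, function) pairs have already been extracted from the import table by a PE parser of your choice.

```python
# Lookup table built only from the imports listed above.
# Assumption: DLL names are compared case-insensitively, function names exactly.
SUSPICIOUS_IMPORTS = {
    ("kernel32.dll", "MapViewOfFile"): "Code Injection",
    ("kernel32.dll", "GetThreadContext"): "Code Injection",
    ("kernel32.dll", "ReadProcessMemory"): "Code Injection",
    ("kernel32.dll", "ResumeThread"): "Code Injection",
    ("kernel32.dll", "WriteProcessMemory"): "Code Injection",
    ("user32.dll", "SetWindowsHookExW"): "API Hooking",
    ("kernel32.dll", "CreateToolhelp32Snapshot"): "Process Enumeration",
    ("advapi32.dll", "OpenThreadToken"): "Token Manipulation",
    ("advapi32.dll", "DuplicateTokenEx"): "Token Manipulation",
}


def flag_imports(imports):
    """Return {(dll, function): usage} for every import found in the table.

    `imports` is an iterable of (dll_name, function_name) pairs.
    """
    flagged = {}
    for dll, func in imports:
        usage = SUSPICIOUS_IMPORTS.get((dll.lower(), func))
        if usage is not None:
            flagged[(dll, func)] = usage
    return flagged
```

A hit in this table is only a hint: many benign programs import the same functions, which is why the article feeds such features into a model rather than treating them as verdicts.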

All these features enable us to learn about the new PE before it is executed or loaded, and therefore before it affects the system.



From these results, we can conclude that the most useful feature for distinguishing between benign and malicious PE files is the maximum entropy over all PE sections. This observation fits our assumption that high entropy is uncommon in benign PE files. In addition, the signature status of the file appears to be very important: if the PE file is unsigned, or signed with an unverified signature, it is very likely to be malicious.
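
For readers who want to see what "maximum section entropy" means in practice, here is a minimal sketch using Shannon entropy over byte values (0 to 8 bits per byte). It is an illustration, not the article's code; `shannon_entropy` and `max_section_entropy` are hypothetical names, and the raw bytes of each section are assumed to be already extracted.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_section_entropy(sections) -> float:
    """Largest entropy among sections.

    `sections` maps a section name to its raw bytes (assumed input shape).
    """
    return max((shannon_entropy(raw) for raw in sections.values()), default=0.0)
```

Packed or encrypted sections tend to score close to 8, while ordinary code and data usually score lower, so the maximum over all sections highlights a single packed section even when the rest of the file looks normal.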

The next most important features are related to section names and permissions. Malware often uses packing techniques to avoid being detected by antivirus signatures. This results in nonstandard section names and write permissions.
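
A small sketch of how such section features might be derived (again an illustration, not taken from the article). The two flag values are the documented PE section characteristics `IMAGE_SCN_MEM_EXECUTE` and `IMAGE_SCN_MEM_WRITE`. The list of "common" section names and the choice to flag sections that are both writable and executable are assumptions made for this example.

```python
# Section characteristic flags from the PE format.
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Assumption: a short, non-exhaustive list of names emitted by common compilers.
COMMON_SECTION_NAMES = {
    ".text", ".data", ".rdata", ".rsrc", ".reloc",
    ".idata", ".edata", ".pdata", ".bss", ".tls",
}


def section_flags(name: str, characteristics: int) -> dict:
    """Return two boolean features for one section.

    `name` is the section name with padding NULs already stripped;
    `characteristics` is the 32-bit Characteristics field of the section header.
    """
    writable = bool(characteristics & IMAGE_SCN_MEM_WRITE)
    executable = bool(characteristics & IMAGE_SCN_MEM_EXECUTE)
    return {
        "nonstandard_name": name not in COMMON_SECTION_NAMES,
        "writable_and_executable": writable and executable,
    }
```

For example, a packer stub section named `UPX0` with characteristics `0xE0000080` sets both features, while a typical `.text` section with `0x60000020` sets neither.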

We also notice that the categories of suspicious imports had an impact on the model accuracy. In these features, we grouped different suspicious API functions by categories such as evasion, encryption, remote allocation, etc. In each group, there can be several functions from different DLLs. This allowed us to learn the malicious activity without overfitting to specific functions."
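
The grouping idea from the last quoted paragraph can be sketched as follows. The category names come from the article, but which functions belong to which category is my assumption for illustration (the article does not list them), and `CATEGORIES` and `category_features` are hypothetical names.

```python
# Assumption: example members per category; the article does not give its lists.
CATEGORIES = {
    "evasion": {"IsDebuggerPresent", "CheckRemoteDebuggerPresent"},
    "encryption": {"CryptEncrypt", "CryptDecrypt"},
    "remote_allocation": {"VirtualAllocEx", "WriteProcessMemory"},
}


def category_features(imported_functions):
    """Count how many distinct imported functions fall into each category.

    `imported_functions` is an iterable of function names, regardless of
    which DLL exports them.
    """
    imported = set(imported_functions)
    return {name: len(imported & funcs) for name, funcs in CATEGORIES.items()}
```

Because the model sees one number per category instead of one feature per function, swapping one API for a related one in the same category leaves the feature vector largely unchanged, which matches the article's point about not overfitting to specific functions.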

Full article: