Q&A VM in malware

Discussion in 'Develop Coding Skills - Tutorials' started by Zhou He, Dec 22, 2017.

  1. Zhou He

    Zhou He Level 1

    Mar 13, 2017
    Windows 10
    Some of the more advanced malware and packers use some kind of VM. It's hard for me even to detect that a sample uses a VM, and analysing it afterwards, or writing a debugger or disassembler for it, is harder still.

    Is there any good resource about virtualization used by malware or packers, tutorials/books or any kind of posts?
  2. Opcode

    Opcode Level 24
    Content Creator

    Aug 17, 2017
    I mop the floors at Lidl
    "Virtual Machine" here doesn't mean quite what it means in a normal scenario. For example, the malicious process isn't executed under a "virtual environment" which is isolated from the Host environment. It works in a similar way to MSIL executables, so I'll try to explain it as best I can to help you out.

    Let's say we have a program which does the following.
    1. Creates a variable of data-type integer and sets the value to 50.
    2. Creates another variable of data-type ULONG (unsigned long) and sets it to 0.
    3. Calls printf (C run-time function) to print the value of the first variable.
    4. Calls MessageBoxW (USER32 - Win32 API function) to display an alert with the main body text as "This is a test application".
    5. Enumerates through all the running processes and for each one found, calls NtSuspendProcess (NTDLL - Native API).

    Now if we think about MSIL (Microsoft Intermediate Language), which is what the .NET Framework is based on, we know that there is a Just-In-Time compiler. This means that the managed assembly is not "natively" compiled. When the managed executable runs, its instructions are translated by the Common Language Run-Time (CLR). The CLR needs to be initialised in any process executing managed code, or the code simply cannot run (unless you have a replacement for the CLR, such as manual emulation of it). Of course, a managed assembly will still contain some native code, because it won't simply "know" to load the run-time libraries when it starts up, among other things. It'll still be a valid Portable Executable (PE) and follow the PE File Format (structure), but the author's code is nothing but byte-code, and the CLR understands that byte-code so it can translate it into genuine instructions which the system is able to execute. This is why you can easily reverse a managed assembly (even if it's obfuscated, you can try deobfuscation): the byte-code must be understood by the CLR, and that opens the opportunity for others to understand it themselves and "reverse" the byte-code back to readable source code.

    The sort of virtualisation protection you're thinking of regarding packers is pretty much the same thing, unless I've misunderstood: byte-code, but produced manually from natively compiled code. The original code which is to be "virtualised" is "scrambled" so it doesn't make any sense on its own, but when the program executes, the "virtualisation engine" (or whichever title you prefer, maybe the "interpreter" or the "emulator") understands the scrambled code (also known as "byte-code") and knows how to translate it back into the original behaviour.
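
    To make the idea concrete, here is a minimal sketch of such an engine in Python. Everything in it is made up for illustration: the opcode numbers, the stack-based design and the names `run` and `PROGRAM` are assumptions, not taken from any real packer. Note that this sketch interprets each byte-code instruction directly instead of rebuilding native instructions, which is one common way such engines are built.

```python
# Hypothetical opcode numbering; a real protector picks its own values.
OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT = 0x10, 0x22, 0x37, 0x4C, 0xFF


def run(bytecode: bytes) -> list[int]:
    """Interpret the toy byte-code and return every value it 'printed'."""
    stack: list[int] = []
    output: list[int] = []
    pc = 0  # virtual program counter: an index into the byte-code
    while True:
        op = bytecode[pc]
        if op == OP_PUSH:
            stack.append(bytecode[pc + 1])  # one-byte immediate operand
            pc += 2
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == OP_MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == OP_PRINT:
            output.append(stack.pop())
            pc += 1
        elif op == OP_HALT:
            return output
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pc}")


# The "original" logic, print((20 + 5) * 2), expressed as byte-code.
PROGRAM = bytes([OP_PUSH, 20, OP_PUSH, 5, OP_ADD,
                 OP_PUSH, 2, OP_MUL, OP_PRINT, OP_HALT])

if __name__ == "__main__":
    print(run(PROGRAM))  # [50]
```

    Anyone who disassembles `PROGRAM` as native code sees meaningless bytes; only the interpreter gives them meaning.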

    If we take the program which does the 5 steps from earlier, the packer will scramble all of it. This will leave it in a nonsense state for anyone who investigates the routine (function stub) via manual disassembly (static reverse-engineering technique). When the program executes, the scrambled code is translated into its original instructions. However, it isn't always that straightforward. A more difficult scenario is one where each scrambled section of code is passed in for unscrambling (emulation) only as it needs to be executed, meaning you cannot obtain all of the original instructions easily without spending lots of time.

    C#, VB.NET and Java all rely on Just-In-Time compilation, because they compile to byte-code (MSIL for .NET, JVM byte-code for Java) instead of native code. This packing technique simply revolves around replicating what Microsoft did for the .NET Framework, but with the packer's own byte-code rules and its own emulator in the "Virtual Machine". The "Virtual Machine" is the engine which accepts the scrambled instructions and unscrambles them so they can be understood by the system when executed. This can be done in such a way that the scrambled code in memory always remains scrambled at its location: the "Virtual Machine" takes a copy of the scrambled code passed in through its parameter/s, unscrambles that copy and executes it for the requester. The original scrambled code stays scrambled, the unscrambled copy is freed from memory once it has been executed, and this continues until all the scrambled code has been executed.
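
    The "stays scrambled in memory" variant can be sketched as follows. It is a toy under stated assumptions: a single-byte XOR key (`KEY`), a made-up four-opcode instruction set, and the hypothetical names `scramble` and `run_scrambled`. Real engines use far stronger schemes.

```python
KEY = 0x5A  # hypothetical single-byte key, for illustration only

OP_PUSH, OP_ADD, OP_PRINT, OP_HALT = 1, 2, 3, 4
# Total length in bytes of each instruction (opcode plus operands).
LENGTHS = {OP_PUSH: 2, OP_ADD: 1, OP_PRINT: 1, OP_HALT: 1}


def scramble(plain: bytes) -> bytes:
    """What the packer does at build time: XOR every byte with KEY."""
    return bytes(b ^ KEY for b in plain)


def run_scrambled(blob: bytes) -> list[int]:
    """Execute scrambled byte-code one instruction at a time.

    Only the current instruction ever exists in clear form, as a short-lived
    copy; `blob` itself is never modified.
    """
    stack: list[int] = []
    output: list[int] = []
    pc = 0
    while True:
        op = blob[pc] ^ KEY                  # peek at the opcode to learn the size
        size = LENGTHS[op]
        insn = bytes(b ^ KEY for b in blob[pc:pc + size])  # temporary clear copy
        if op == OP_PUSH:
            stack.append(insn[1])
        elif op == OP_ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == OP_PRINT:
            output.append(stack.pop())
        elif op == OP_HALT:
            return output
        pc += size
        del insn                             # drop the clear copy before moving on


BLOB = scramble(bytes([OP_PUSH, 40, OP_PUSH, 10, OP_ADD, OP_PRINT, OP_HALT]))

if __name__ == "__main__":
    print(run_scrambled(BLOB))  # [50]
```

    A memory dump taken at any point shows `BLOB` still scrambled, plus at most one decoded instruction.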

    This sort of packing is a lot more difficult to get around, and requires more reverse-engineering time depending on the scenario. One idea would be to reverse-engineer the "Virtual Machine" engine to understand how it works, which will allow you to decode the scrambled code (the byte-code to be understood by the Virtual Machine engine). Another idea would be to monitor memory during execution to catch the original instructions before they are freed from memory (if they are), or to see whether the engine "decrypts" the encrypted code all at once for you to steal, etc.
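
    The first idea, reversing the engine, usually ends with an opcode table, and once you have one a disassembler for the byte-code is short. A minimal sketch, assuming a hypothetical table (`OPCODES`) of the kind you might recover from the engine's dispatcher; the opcode values, mnemonics and the `disassemble` helper are all invented for this example:

```python
# Hypothetical table recovered from the VM's dispatcher:
# opcode byte -> (mnemonic, number of operand bytes)
OPCODES = {
    0x10: ("PUSH", 1),
    0x22: ("ADD", 0),
    0x37: ("MUL", 0),
    0x4C: ("PRINT", 0),
    0xFF: ("HALT", 0),
}


def disassemble(bytecode: bytes) -> list[str]:
    """Turn VM byte-code into readable lines of the form 'offset: MNEMONIC operands'."""
    lines: list[str] = []
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if op not in OPCODES:
            # Unknown byte: emit it as raw data and keep going.
            lines.append(f"{pc:04x}: DB {op:#04x}")
            pc += 1
            continue
        name, n_operands = OPCODES[op]
        operands = bytecode[pc + 1:pc + 1 + n_operands]
        text = name + "".join(f" {b}" for b in operands)
        lines.append(f"{pc:04x}: {text}")
        pc += 1 + n_operands
    return lines


SAMPLE = bytes([0x10, 20, 0x10, 5, 0x22, 0x10, 2, 0x37, 0x4C, 0xFF])

if __name__ == "__main__":
    print("\n".join(disassemble(SAMPLE)))
```

    The hard part in practice is recovering the table, not writing the loop, and a protector may use a different table in every build.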

    I should note though, because it'd be mean of me not to, that these packing mechanisms will do absolutely nothing against API Monitoring. At the end of the day, the API calls will have to be executed one way or another. The only ways to bypass API Monitoring which is occurring from user-mode are to perform system calls directly or to unhook the hooked routines, and in my experience the vast majority of malware does neither. Therefore, if you're really struggling due to these packing mechanisms, give monitoring API calls a shot and save yourself some time. It isn't embarrassing to get stuck with "Virtual Machine" packing... I think it is normal for that to happen, because these packers step things up far beyond traditional and common methods. These more advanced packing methods are on the rise, though, and in a few years' time I expect them to be as common as traditional stub encryption and decryption at run-time.
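
    Why virtualisation doesn't hide API usage can be shown with another toy. In this sketch the byte-code is opaque, but every call still crosses the API boundary, so a hook placed there sees it. The stand-in routines (`message_box`, `suspend_process`), the `monitored` wrapper, the opcodes and the API index table are all hypothetical; a real monitor hooks the actual exported routines instead of wrapping Python functions.

```python
import functools

call_log: list[tuple[str, tuple]] = []


def monitored(name, func):
    """Wrap a routine so every call is recorded before being forwarded."""
    @functools.wraps(func)
    def hook(*args):
        call_log.append((name, args))
        return func(*args)
    return hook


# Harmless stand-ins for real API routines.
def message_box(text):
    return 1


def suspend_process(pid):
    return 0


# The table the VM calls through, with the monitor's hooks installed.
API = {
    0: monitored("MessageBoxW", message_box),
    1: monitored("NtSuspendProcess", suspend_process),
}

OP_PUSH, OP_CALL, OP_HALT = 0xA1, 0xB2, 0xC3  # made-up opcodes


def run(bytecode: bytes, constants: list) -> None:
    """Toy VM: PUSH <const index>, CALL <api index> (one argument), HALT."""
    stack = []
    pc = 0
    while True:
        op = bytecode[pc]
        if op == OP_PUSH:
            stack.append(constants[bytecode[pc + 1]])
            pc += 2
        elif op == OP_CALL:
            API[bytecode[pc + 1]](stack.pop())
            pc += 2
        elif op == OP_HALT:
            return
        else:
            raise ValueError(f"unknown opcode {op:#x} at offset {pc}")


CONSTANTS = ["This is a test application", 1234]
PROGRAM = bytes([OP_PUSH, 0, OP_CALL, 0, OP_PUSH, 1, OP_CALL, 1, OP_HALT])

if __name__ == "__main__":
    run(PROGRAM, CONSTANTS)
    for name, args in call_log:
        print(name, args)
```

    The monitor never needs to understand `PROGRAM`; the log alone shows which routines were called and with what arguments.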

    Packing may mess with the Import Address Table (IAT). The IAT is basically a large table containing statically linked imports... For example, a program might have ntdll.dll linked for NtTerminateProcess (only native images, which use NTDLL directly and none of the Win32 API libraries such as kernel32.dll, can import from NTDLL alone, otherwise the process would crash), and this will show up in the Import Address Table. However, packing may change it so that all the statically linked routines the author's code makes use of are turned into "dynamic imports". A dynamic import means acquiring the address of a routine during run-time and calling the routine through that address, so it isn't statically linked (until execution you won't know of the usage, unless of course the parameter data is leaked through string references in static analysis, or the like). Malware authors may also fake the Import Address Table to put you off-guard and leave you wondering about API calls which are never actually made, but are referenced dozens of times among junk code. To confront this, and to beat it, you can use API Monitoring and look for the usage of LdrGetProcedureAddress. The only problem is that it isn't a necessity to use LdrGetProcedureAddress/Ex (which is called by GetProcAddress): these routines use the Export Address Table scanner within the Windows Loader, and malware authors can simulate this activity manually by scanning the Export Address Table themselves, so they don't depend on Windows routines for acquiring the address. That said, unless 'manual mapping' (also known as 'reflective loading') is used for external DLLs, LdrLoadDll (NTDLL) will be triggered for module loads, so this can be beneficial to track sometimes as well.
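
    The manual Export Address Table scan mentioned above comes down to walking three arrays: names, name ordinals and function addresses. Below is a sketch of that walk over a synthetic buffer. The 40-byte header follows the IMAGE_EXPORT_DIRECTORY layout, but the rest is simplified and assumed: the "module" contains only an export directory at offset 0, RVAs are treated as plain offsets, forwarded exports are ignored, the addresses are invented, and `build_fake_module` and `resolve` are hypothetical helper names.

```python
import struct
from typing import Optional

# IMAGE_EXPORT_DIRECTORY: Characteristics, TimeDateStamp, MajorVersion,
# MinorVersion, Name, Base, NumberOfFunctions, NumberOfNames,
# AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals (40 bytes).
EXPORT_DIR = struct.Struct("<IIHHIIIIIII")


def build_fake_module(exports: dict[str, int]) -> bytes:
    """Build a synthetic image holding only an export directory at offset 0."""
    order = list(exports)      # function array keeps insertion order
    names = sorted(exports)    # name array is sorted, as in real modules
    n = len(names)
    funcs_off = EXPORT_DIR.size
    names_off = funcs_off + 4 * n
    ords_off = names_off + 4 * n
    strings_off = ords_off + 2 * n
    strings, name_rvas = b"", []
    for name in names:
        name_rvas.append(strings_off + len(strings))
        strings += name.encode("ascii") + b"\0"
    header = EXPORT_DIR.pack(0, 0, 0, 0, 0, 1, n, n, funcs_off, names_off, ords_off)
    body = struct.pack(f"<{n}I", *(exports[name] for name in order))
    body += struct.pack(f"<{n}I", *name_rvas)
    body += struct.pack(f"<{n}H", *(order.index(name) for name in names))
    return header + body + strings


def resolve(image: bytes, wanted: str) -> Optional[int]:
    """Find an export by name without calling any loader routine."""
    fields = EXPORT_DIR.unpack_from(image, 0)
    n_names, funcs, names, ords = fields[7], fields[8], fields[9], fields[10]
    for i in range(n_names):
        (name_rva,) = struct.unpack_from("<I", image, names + 4 * i)
        end = image.index(b"\0", name_rva)
        if image[name_rva:end].decode("ascii") == wanted:
            # The name index maps to an ordinal, which indexes the function array.
            (ordinal,) = struct.unpack_from("<H", image, ords + 2 * i)
            (func_rva,) = struct.unpack_from("<I", image, funcs + 4 * ordinal)
            return func_rva
    return None


MODULE = build_fake_module({
    "NtTerminateProcess": 0x1000,   # invented addresses
    "LdrLoadDll": 0x2000,
    "NtSuspendProcess": 0x3000,
})

if __name__ == "__main__":
    print(hex(resolve(MODULE, "LdrLoadDll")))  # 0x2000
```

    Because no loader routine is called during the lookup, a monitor watching only LdrGetProcedureAddress would see nothing here.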

    Therefore if you get really stuck, try out API Monitoring. As long as you can find out which routines the sample uses, you're good to go a lot of the time. It doesn't always cut it, but from my experience, API Monitoring is one of the best techniques available. If user-mode doesn't cut it, then you can patch IA32_LSTAR and intercept KiSystemCallXx for enhanced monitoring (watch out for PatchGuard on 64-bit systems).

    I did some online searching just now to see if I could find any good references about your specific question, and I did find an article published by McAfee during 2017 (author: Thomas Roccia): Malware Packers Use Tricks to Avoid Analysis, Detection | McAfee Blogs - there's a bit about virtualisation covered.

    The following is quite old (2006) but is an interesting read about 'dynamic translation' from VirusBulletin: Virus Bulletin :: Improving proactive detection of packed malware

    PS: You might want to use the Malware Analysis sub-forum for questions like these instead of this code section. ;)
  3. Zhou He

    Zhou He Level 1

    Mar 13, 2017
    Windows 10
    Thank you for the reply.