Malware Analysis Dynamic Forking identification [TIPS ONLY]

Discussion in 'Malware Analysis' started by Opcode, Nov 20, 2017.

  1. Opcode

    Opcode Level 18
    Content Creator

    Aug 17, 2017
    890
    6,302
    Caille
    Windows 10
    #1 Opcode, Nov 20, 2017
    Last edited: Nov 20, 2017
    Hi all!

    Dynamic Forking has many names; you may also see it called 'Process Hollowing' or 'RunPE'. It used to be quite prevalent in malicious software but has declined somewhat, because it isn't very common in threats like ransomware, and those are the threats which are most prevalent nowadays.

    Dynamic forking works by replacing a PE image with another in memory. The launcher starts another process in a suspended state, replaces the PE image in memory with another, and then resumes the process so it can start executing code. The only difference is that the newly started process is executing code from another PE image (e.g. a malicious one) instead of the original PE image which was mapped into memory for that process.

    To elaborate... If we have a program called hithere.exe and we applied Dynamic Forking, we can have the PE image of ohno.exe executed under the process for hithere.exe instead of the code present under the PE image for hithere.exe.

    I plan to do a full analysis example on dynamic forking and explain it in full, but for now I have some tips for analysts who may encounter it. You can check the Import Address Table (and API logs) for usage of the following functions in sequence. The full sequence isn't always present, so you can also check which process the API calls are aimed at (e.g. by resolving the process handle passed to some of the calls).

    Simple explanation outline
    - Process creation with the CREATE_SUSPENDED flag. -> functions like CreateProcessA/W (KERNEL32).
    - Handle acquiring -> functions like OpenProcess (KERNEL32) -> NtOpenProcess (NTDLL)
    - Memory unmapping -> NtUnmapViewOfSection (NTDLL), using the handle which was acquired. Some process hollowing deployments use a custom wrapper for unmapping memory because they do not have much success with this function.
    - Memory allocation -> functions like VirtualAllocEx -> NtAllocateVirtualMemory.
    - Memory write operations -> functions like WriteProcessMemory -> NtWriteVirtualMemory.
    - Getting the thread context and setting it -> functions like GetThreadContext (KERNEL32 -> NtGetContextThread (NTDLL)) and SetThreadContext (KERNEL32 -> NtSetContextThread (NTDLL)).
    - Process resuming -> functions like ResumeThread (KERNEL32 -> NtResumeThread (NTDLL)) or NtResumeProcess.
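
    As a rough sketch of how the outline above could be checked against an API log: the log format here (pairs of API name and target process ID) is hypothetical, the names `STAGES`, `stages_in_order` and `find_candidates` are made up for illustration, and the CREATE_SUSPENDED flag check is left out for brevity. The unmapping stage is treated as optional because of the custom wrappers mentioned above.

```python
from collections import defaultdict

# (label, API names that satisfy the stage, whether the stage may be skipped)
STAGES = [
    ("create", {"CreateProcessA", "CreateProcessW"}, False),
    ("unmap", {"NtUnmapViewOfSection"}, True),  # custom wrappers may hide this
    ("allocate", {"VirtualAllocEx", "NtAllocateVirtualMemory"}, False),
    ("write", {"WriteProcessMemory", "NtWriteVirtualMemory"}, False),
    ("set_context", {"SetThreadContext", "NtSetContextThread"}, False),
    ("resume", {"ResumeThread", "NtResumeThread", "NtResumeProcess"}, False),
]


def stages_in_order(api_names):
    """Return True if every required stage appears, in order, in api_names."""
    position = 0  # index of the next stage we expect to see
    for name in api_names:
        for index in range(position, len(STAGES)):
            _, apis, optional = STAGES[index]
            if name in apis:
                position = index + 1
                break
            if not optional:
                break  # cannot skip past a required stage
    return position == len(STAGES)


def find_candidates(events):
    """events: iterable of (api_name, target_pid) pairs in log order."""
    per_target = defaultdict(list)
    for api_name, target in events:
        per_target[target].append(api_name)
    return sorted(t for t, names in per_target.items() if stages_in_order(names))
```

    A match only means the target deserves a closer look; unrelated calls in between are ignored, so plenty of noise in the log is fine.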

    About handle acquiring: with functions like CreateProcessA/W, the author doesn't need to open a handle afterwards because one is already returned in the process information structure (PROCESS_INFORMATION) filled in by the start-up request.

    The thread context is retrieved because the launcher needs the context information for the main thread of the targeted process. Because the process was started in a suspended state, only one thread is present, the main one, until it is resumed. This is the thread which is targeted, and its handle is returned from the process start-up request just like the process handle. Therefore OpenThread (KERNEL32 -> NtOpenThread (NTDLL)) does not need to be called to retrieve the handle to the main thread.

    The context data is modified and then applied with SetThreadContext (-> NtSetContextThread). Usually the target is 32-bit, in which case the register of interest is EAX: in the initial context of the suspended main thread, EAX holds the start address, so the launcher points it at the entry point of the new PE image written into the process's memory. When the main thread is resumed, it starts executing at the new entry point instead of crashing on the replaced image. For 64-bit targets the start address in the initial context is held in RCX rather than RAX, so look at RCX there.

    The reason dynamic forking may be applied in the first place is concealment. If someone opens up Task Manager and finds their browser running, it will look normal (some browsers, like Google Chrome, stay running in the background anyway), so it appears as though a trusted process is running. It can be concealed even better with system processes like svchost.exe. In actual fact, while the process still points to the correct PE image on disk, the PE image in memory is not the same.

    In other words, thread hijacking takes place because the main thread is hijacked into executing code at another address than the original. The PE image in memory is replaced with a malicious copy. After the operation has succeeded, the launcher in malware will usually terminate itself and get auto-deleted in some cases.

    For identifying process hollowing in security software development, you can make a memory scanner which checks the characteristics of all running processes and takes action based on the results. If the process points to a PE image on disk but the image in memory does not match it, the process has probably been hollowed, unless it did a lot of run-time patching of itself, which would also be suspicious. You can improve accuracy by comparing specific sections of the PE image in memory to the one on disk, in case there were some legitimate run-time modifications to memory, to help avoid false positive detections. There are many other ways to identify process hollowing though, and many ways to deploy it other than the one described above.
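
    A minimal sketch of the section comparison idea, assuming the section bytes have already been extracted by some other means (parsing the file on disk and reading the process memory are both out of scope here). The function names and the dictionary layout are hypothetical.

```python
import hashlib


def section_digest(data: bytes) -> str:
    """Hash a section's bytes so large sections compare cheaply."""
    return hashlib.sha256(data).hexdigest()


def mismatched_sections(disk_sections, memory_sections, watched=(".text",)):
    """Return the watched section names whose bytes differ between the
    on-disk image and the in-memory image.

    Both arguments map a section name to its raw bytes (assumed layout).
    A section missing on either side is also reported.
    """
    mismatches = []
    for name in watched:
        on_disk = disk_sections.get(name)
        in_memory = memory_sections.get(name)
        if on_disk is None or in_memory is None:
            mismatches.append(name)
            continue
        if section_digest(on_disk) != section_digest(in_memory):
            mismatches.append(name)
    return mismatches
```

    Only comparing chosen sections is the design choice here: data sections change at run-time by design. Even code sections can differ legitimately when relocations were applied at load time, so a real scanner would have to account for that before raising an alert.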

    For prevention during behavioural analysis, Native API routines such as NtSetContextThread can be detoured to prevent thread hijacking. This should also mitigate dynamic forking: every usage of it I've seen, at least, relies on that routine to point the start address at the entry point of the new image.

    This should help for the time being, stay focused! Stay tuned, I'll try to get something more thorough pushed out for Christmas time if it helps you guys analyse more malware

    And thank you for reading. :)
     
  2. Opcode

    #2 Opcode, Nov 20, 2017
    Last edited: Nov 20, 2017
    NtCreateFile/NtOpenFile and NtReadFile are commonly triggered because the launcher will most likely read the rogue image from disk into memory for the virtual memory operations. NtQueryInformationFile will be triggered if the launcher queries the total file size of the image (required for the read operation). Other routines may be invoked too, because the launcher needs to remotely retrieve the image base address of the image present within the suspended process.

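    EAX is used for 32-bit targets. For 64-bit targets the start address in the initial thread context is held in RCX (not RAX, as is often assumed). Native process hollowing for managed images will, I believe, also involve initialisation of the Common Language Runtime (CLR), which is a bit more complicated.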

    I will be glad if any of these tips help someone studying process hollowing with real live samples. I don't currently have any samples which deploy the technique, but once I do I should be able to make a proper analysis example for it if it helps.
     
  3. tim one

    tim one Level 18
    Trusted AV Tester

    Jul 31, 2014
    893
    9,013
    Europe
    Windows 10
    Emsisoft
    Another great post @Opcode :)

    A process is an entity created by the operating system and managed by the kernel; in addition to containing the code to execute, it also contains all the information that defines its state. A thread is a part of the process that executes independently of the state of the rest of the process, so you can see a process as a set of threads. Each process has at least one thread (the main one), but it can have more, and they all work independently of one another. The execution time of one thread is also independent of that of the others.

    But a technical question mate: is it possible to run a process inside a thread, or do its logical structure/limitations rule that out?
     
  4. Opcode

    No problem mate. :)

    My answer is no, because threads are responsible for executing the code; the process itself is just a "shell" to hold it all together. The process points to the PE image on disk and holds all the characteristics, but at the end of the day the "process" can only execute code thanks to its threads, and a process cannot be "contained" under a thread. Thanks to threads, more than one function can be executed simultaneously (e.g. 5 threads could be doing something and a 6th one could be updating on-screen results). If you mean having a full process in itself on a thread, then no, but if you mean spawning a process' code from memory within an existing one, then yes, that can definitely happen.
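
    To illustrate the "several threads inside one process" point, here is a small sketch in Python (the function name `run_workers` is made up). All the workers live in the same process and share its memory, which is why they can update the same counter. Note that CPython threads take turns rather than running truly in parallel, so treat this as a model of the idea only.

```python
import threading


def run_workers(worker_count=5, steps=1000):
    """Start worker_count threads that all increment one shared counter."""
    total = 0
    lock = threading.Lock()  # protects the shared counter

    def work():
        nonlocal total
        for _ in range(steps):
            with lock:
                total += 1

    threads = [threading.Thread(target=work) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # wait until every worker has finished
    return total
```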

    You can however execute a PE image from memory (as well as a DLL). Maybe that is what you meant in the first place and I misunderstood? Hahaha. You can also deploy Dynamic Forking in a file-less manner (executable code in memory and not on disk within a file).
     
  5. tim one

    Thanks buddy :)
    Oh no, you got the point, that was what I thought, but your clarification confirms it :)
     
  6. Zhou He

    Zhou He Level 1

    Mar 13, 2017
    29
    76
    China
    Windows 10
    ESET
    After mapping our PE, do we need relocation?

    Just mapping header, mapping sections, and setting EAX/RAX is enough?
     
  7. Opcode

    @Zhou He You can perform relocation of addresses within the rogue image after copying it across to the suspended process (which you do after unmapping memory starting at its calculated image base address of course).
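
    For anyone who wants to see what "relocation of addresses" means numerically, here is the arithmetic on a local buffer only. It is a sketch: the helper names are hypothetical, the addresses are made-up example values, and it shows a single 32-bit fix-up of the kind the PE format calls IMAGE_REL_BASED_HIGHLOW, not a full relocation table walk.

```python
import struct


def relocation_delta(preferred_base: int, actual_base: int) -> int:
    """Difference between where the image was built to load and where it is."""
    return actual_base - preferred_base


def apply_delta_32(buffer: bytearray, offset: int, delta: int) -> None:
    """Add delta to the little-endian 32-bit address stored at offset."""
    (value,) = struct.unpack_from("<I", buffer, offset)
    struct.pack_into("<I", buffer, offset, (value + delta) & 0xFFFFFFFF)


# Example values only: image built for 0x00400000 but placed at 0x00A50000.
delta = relocation_delta(0x00400000, 0x00A50000)
image = bytearray(struct.pack("<I", 0x00401234))  # one absolute address
apply_delta_32(image, 0, delta)
fixed = struct.unpack_from("<I", image, 0)[0]  # 0x00A51234
```

    If the image lands on its preferred base the delta is zero and nothing needs fixing, which is why relocation isn't always seen.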

    The modification to EAX is there to change the start address of the main thread, so that when it is resumed it'll start executing code at the new address for your own image which was placed in the process. That is for 32-bit processes; for a 64-bit process the start address in the initial thread context sits in RCX, so that is the register to change there. As for the naming, RAX is the 64-bit equivalent of EAX (AX would be the 16-bit version, for example): EAX can store a value of up to 4 bytes whereas RAX can hold up to 8 bytes (which is why some use RAX for API hooking on 64-bit processes, moving the callback routine address into RAX and then redirecting execution flow with a simple JMP RAX instruction).
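
    The register sizes can be shown with plain masking. This is just arithmetic to illustrate that EAX, AX and AL are the low 4, 2 and 1 bytes of RAX; the function name is made up.

```python
def sub_registers(rax: int) -> dict:
    """Split a 64-bit RAX value into the narrower views of the same register."""
    rax &= 0xFFFFFFFFFFFFFFFF  # keep 8 bytes
    return {
        "RAX": rax,               # 8 bytes
        "EAX": rax & 0xFFFFFFFF,  # low 4 bytes
        "AX": rax & 0xFFFF,       # low 2 bytes
        "AL": rax & 0xFF,         # low 1 byte
    }


views = sub_registers(0x1122334455667788)
```

    Here `views["EAX"]` is 0x55667788 and `views["AX"]` is 0x7788, so a 64-bit address simply does not fit in the 32-bit view.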

    The context activity is thread hijacking, because you are controlling execution by abusing a thread which belongs to the targeted process. Thread hijacking can be used for non-dynamic forking techniques as well. For example, you can allocate memory in the process, write your shell-code into that allocated memory, and then hijack a thread within the process via NtSetContextThread to execute at the address of your shell-code now present in the address space of the targeted process. You could have the shell-code perform reflective DLL loading, load a DLL normally and then patch the Process Environment Block (hiding the loaded DLL from the PEB without having to manually map it in the first place), and tons of other things.

    I found an online paper just now which you may like to read, it appears to be quite good but I've just done a skim read, I would assume it will go into a bit more detail for you: http://www.autosectools.com/process-hollowing.pdf

    Anyhow, you don't need to worry about identifying relocation and what-not; you can just look for functions involved in thread hijacking like NtSetContextThread. If you find them, then you can back-track and look for virtual memory operations against the process targeted by the thread hijacking, process creation in a suspended state, etc. It doesn't need to be over-complicated to identify process hollowing from API logs; I normally watch for code injection by default anyway. Bear in mind the API logs can get huge if you don't do it right, because every genuine memory allocation etc. adds to the logged calls, and you can easily go from 0 API calls in the logs to over a million if you're using a tool like API Monitor.
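
    The back-tracking can be sketched like this, assuming the log has been exported to (API name, target process ID) pairs. That format and the names `INJECTION_APIS`, `CONTEXT_APIS` and `backtrack` are hypothetical. The idea is to start from the set-context calls and only then pull the related calls for the same target, so the genuine noise never has to be read by hand.

```python
CONTEXT_APIS = {"SetThreadContext", "NtSetContextThread"}

# Calls worth keeping once a target has been singled out.
INJECTION_APIS = CONTEXT_APIS | {
    "CreateProcessA", "CreateProcessW",
    "NtUnmapViewOfSection",
    "VirtualAllocEx", "NtAllocateVirtualMemory",
    "WriteProcessMemory", "NtWriteVirtualMemory",
    "ResumeThread", "NtResumeThread", "NtResumeProcess",
}


def backtrack(events):
    """Map each target of a set-context call to its related calls, in order."""
    events = list(events)  # two passes are needed
    targets = {target for api, target in events if api in CONTEXT_APIS}
    related = {target: [] for target in targets}
    for api, target in events:
        if target in related and api in INJECTION_APIS:
            related[target].append(api)
    return related
```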

    A more elegant method for identifying dynamic forking during malware analysis would be to make a custom DLL which detours NtSetContextThread. In the callback routine you can obtain the ID of the process the thread belongs to (using the thread handle from the function parameters), and if it isn't the current process then the caller is attempting to hijack a thread within another process. At this point you can have it automatically noted down for you and then investigate further if necessary to understand whether it was using thread hijacking for dynamic forking, general shell-code injection, etc. A bit like building a HIPS, but optimised for independent malware analysis. Watch out for evasive malware that scans for unwanted modules and attempts to unload them, attempts to unhook APIs, and similar though. If you're building a custom analysis tool in kernel-mode, you can call PsGetProcessId() and PsGetThreadProcessId() and compare the results, after checking that the first PID is for the process being monitored. Bear in mind that hooking the system call in kernel-mode isn't viable on x64 because of PatchGuard, although a lot of malware is still 32-bit because malware authors want a wider scope for attack, so it is still useful in some cases.
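
    The decision made inside the callback boils down to one comparison. Below is a model of just that logic in Python, with the IDs passed in as plain numbers. In a real user-mode hook they would presumably come from something like GetCurrentProcessId and GetProcessIdOfThread on the thread handle, which is an assumption on my part about how you would wire it up; the class and function names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextCall:
    """One intercepted set-context request (hypothetical record layout)."""
    caller_pid: int        # process that issued the call
    thread_owner_pid: int  # process that owns the targeted thread


def is_cross_process(call: ContextCall) -> bool:
    """True when the caller is changing a thread it does not own."""
    return call.caller_pid != call.thread_owner_pid


def review(calls):
    """Return only the calls worth noting down for further investigation."""
    return [call for call in calls if is_cross_process(call)]
```

    A process adjusting the context of one of its own threads is left alone, which keeps the notes short.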

    PsGetThreadProcessId() takes one parameter, the thread (PETHREAD), and returns the ID of the process which the thread belongs to. ObReferenceObjectByHandle can be used prior to this to obtain the PETHREAD, since NtSetContextThread takes a HANDLE to the thread.
     