Code injection identification [Malware Analysis]

Deleted member 65228

Guest
#1
Some images would not load properly so I attached them to the thread. They should all be in order.


Code injection and malware analysis


Introduction
Code injection is a technique used by many different types of malware (“malicious software”) for different purposes. The result is always the same: code runs within the address space of another running program, which means that program ends up executing code written by someone else. The goal, however, varies. For example, banking malware may inject into web browser processes to detour API functions so it can log the credentials of accounts on the websites the user visits (form-grabbing/WebInject), whereas a user-mode rootkit may inject code into running programs to prevent them from finding or terminating a protected process. You can even use code injection for process termination, or as a way of bypassing firewall rules (by using a trusted program which is already running to deploy the malicious code).

Code injection has been around for a very long time, and I do not think it is going anywhere anytime soon, considering how broad the topic is and how often it is abused. In fact, it is also used by security solutions (not always, but it is not uncommon).

To name a few examples of malware which have used code injection: Zeus, Carberp, Kronos and SpyEye (all banking malware). More general types of malware, such as backdoors and rogue anti-virus software, may rely on code injection for user-mode rootkit technology (bear in mind that genuine security software may also perform code injection for behavior monitoring purposes).


Different types of code injection

There are many code injection techniques. To name a few: DLL injection, code-cave injection, Asynchronous Procedure Call (APC) injection / Atom bombing, thread hijacking, and window hooking (which can make use of a DLL, although it is not essential).


Code injection techniques can also be used as building blocks for other techniques. For example, Dynamic Forking (also known as Process Hollowing or RunPE) revolves around replacing a process' PE image in memory with another.

I’ll be going over DLL injection in the most depth, the other ones I’ll be brief with. This article is not supposed to be a fully-fledged tutorial on how you can perform code injection, but just explain a bit about how it works and how you can apply the knowledge with malware analysis. There are however plenty of resources online to study code injection further.


DLL injection
There are different ways that DLL injection can be performed, however the most common method is forcing the target program to call a Win32 API function called LoadLibraryA (the ANSI version of the function; you can use LoadLibraryW instead if you’d like). It works as follows: you open a handle to the process you are targeting with sufficient access rights for virtual memory operations and thread creation; you allocate memory within the target process, large enough to hold the file path to your DLL on-disk; you fill the allocated memory with the file path to your DLL; and you finish by creating a remote thread within the process which will call LoadLibraryA/W, passing the address of the memory which holds the file path as the argument.

A basic example of this technique is the following code, which I wrote to demonstrate it (educational purposes). There are many open-source examples of this injection technique online, including more sophisticated ones.



Code:
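#include <windows.h>
#include <string.h>

BOOL InjectDll(HANDLE ProcessHandle, char *DllPath)
{
    // resolve LoadLibraryA; kernel32.dll is already loaded in our own process
    FARPROC fpLoadLibraryA = GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");
    if (fpLoadLibraryA)
    {
        // path length in bytes, including the terminating null character
        SIZE_T DllLength = strlen(DllPath) + 1;
        PVOID pvDllMemory = VirtualAllocEx(ProcessHandle, NULL, DllLength, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);
        if (pvDllMemory)
        {
            SIZE_T NumberOfBytesWritten = 0;
            if (WriteProcessMemory(ProcessHandle, pvDllMemory, DllPath, DllLength, &NumberOfBytesWritten))
            {
                // the new thread starts at LoadLibraryA with the remote path as its argument
                HANDLE ThreadHandle = CreateRemoteThread(ProcessHandle, NULL, 0, (LPTHREAD_START_ROUTINE)fpLoadLibraryA, pvDllMemory, 0, NULL);
                if (ThreadHandle)
                {
                    CloseHandle(ThreadHandle); // the thread keeps running without our handle
                    return TRUE;
                }
            }
        }
    }
    return FALSE;
}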
I’ll explain how the above code works to help you understand how this basic injection method is performed. To keep things simple to follow, I’ll put it in numbered stages.

1. The injection function accepts a handle to the target process and the DLL path for the parameters.

2. The injection function gets the address of LoadLibraryA using the Win32 API function GetProcAddress (from kernel32.dll). We will be forcing the program we wish to inject into to call this function so our DLL can be injected; our DLL won’t magically end up within the address space of the target process - we have to make the target process run code which will load our DLL! The LoadLibrary function (A for ANSI, W for Unicode) is part of the Win32 API and is used to load a module (*.DLL - Dynamic Link Library) into the memory of the process performing the call.

3. The injection function calculates the length of our DLL path - the same path we passed as the second parameter to the function. We use it later to specify how many bytes need to be allocated in the target process and written to the allocated memory (we only want enough to hold our DLL path). To do this, a function called strlen is used (this function is not part of the Win32 API, but of the C/C++ standard library; it is declared in <string.h>, or <cstring> in C++). Note that strlen does not count the terminating null character, so the buffer should really be strlen(DllPath) + 1 bytes; passing only strlen tends to work in practice because VirtualAllocEx hands back whole, zero-filled pages.

4. The injection function will allocate memory within the target process but only for enough bytes to hold our DLL file path; there is no need for us to allocate too much memory and if we allocate too little then our DLL file path won’t fit! VirtualAllocEx is used to allocate memory for another process.

5. The injection function will write to the allocated memory to actually insert our DLL file path into the memory. We use WriteProcessMemory to do this, a Win32 API function just like VirtualAllocEx.

6. We finish the injection by creating a remote thread within the target process which is used to call LoadLibraryA. We pass the address of the memory which we allocated and wrote to for holding the DLL path which exists in the target process as the argument so the LoadLibraryA function will use our DLL path for the call - resulting in our DLL being loaded in the target process!
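
As a small aside on step 3, here is a rough sketch of the size calculation on its own. The helper names AnsiPathBytes and WidePathBytes are made up for illustration, and char16_t is only a stand-in for the 2-byte WCHAR that LoadLibraryW expects, so that the snippet does not depend on windows.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Bytes needed for an ANSI path, including the terminating '\0'.
// This is the size you would allocate and write when the remote
// thread starts at LoadLibraryA.
std::size_t AnsiPathBytes(const char* path)
{
    assert(path != nullptr);
    return std::strlen(path) + 1;
}

// Bytes needed for a UTF-16 path (the LoadLibraryW case): two bytes per
// code unit plus a two-byte terminator. char16_t stands in for WCHAR.
std::size_t WidePathBytes(const std::u16string& path)
{
    return (path.size() + 1) * sizeof(char16_t);
}
```

For the path “C:\\Opcode.dll” this gives 14 bytes for the ANSI version and 28 bytes for the wide version.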


The GetProcAddress function is exported by kernel32.dll, a module that every non-native program makes use of and which contains a lot of functions that are part of the Win32 API. When it is called, code execution will eventually land at LdrGetProcedureAddress (NTAPI), which is exported by ntdll.dll - it is an undocumented function.

The function prototype (i.e. its signature) is below.
Code:
FARPROC WINAPI GetProcAddress(
  _In_ HMODULE hModule,
  _In_ LPCSTR  lpProcName
);
In the injection function, we use GetModuleHandle to get an HMODULE for kernel32.dll and we pass “LoadLibraryA” as the second parameter because we want the address of that function, which is exported by kernel32.dll. The reason we called GetModuleHandle is that we did not already have an HMODULE for the module. You can also use the return value of LoadLibraryA/W as the first parameter if you are targeting a module which is not already loaded within the process (if it is already loaded then it won’t be “reloaded”; the already loaded module is used instead).


The VirtualAllocEx function is also exported by kernel32.dll. When the function is called, code execution will eventually land at NtAllocateVirtualMemory (NTAPI) - it is documented under the Zw* prefix for kernel-mode development.

The function prototype is below.
Code:
LPVOID WINAPI VirtualAllocEx(
  _In_     HANDLE hProcess,
  _In_opt_ LPVOID lpAddress,
  _In_     SIZE_T dwSize,
  _In_     DWORD  flAllocationType,
  _In_     DWORD  flProtect
);
In the injection function, we use VirtualAllocEx to allocate the memory within the target process which we use later on to store the path of our DLL. We must allocate memory and write the DLL path into it because the remote LoadLibraryA call can only read memory in the target's own address space. A pointer to a string in your process means nothing in the target process: at best LoadLibraryA fails, at worst you crash the process! A work-around would be shell-code injection which calls LoadLibraryA, where the shell-code itself carries the DLL file path as a char array (each array item being one character of the path).

The VirtualAllocEx function takes in 5 parameters. The first parameter is the process handle (the handle to the process you are trying to allocate memory within). The second parameter is the desired start address for the allocation (if you pass 0, aka. NULL, the system picks a free region for you). The third parameter is the size of the allocation in bytes (in our case we only want enough bytes to hold our DLL path!). The fourth parameter is the allocation type (check the MSDN page for VirtualAllocEx for more information on this). The fifth and last parameter is the memory protection. I use PAGE_READWRITE because the memory does not need to be executable: it only stores the file path, so we need write access to put the path there and read access so the LoadLibraryA call can read it later on.
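
One detail worth knowing about the size parameter: memory is handed out in whole pages, so a request for a handful of bytes still commits at least one page. The sketch below only shows the rounding arithmetic. The 4 KiB page size is an assumption (it is the usual value on x86/x64 Windows) and RoundUpToPages is a made-up helper, not a Win32 function.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: 4 KiB pages, the usual size on x86/x64 Windows.
constexpr std::size_t kPageSize = 4096;

// Round a requested byte count up to a whole number of pages, which is
// what effectively happens to the dwSize you pass when lpAddress is NULL.
std::size_t RoundUpToPages(std::size_t bytes)
{
    assert(bytes > 0);
    return ((bytes + kPageSize - 1) / kPageSize) * kPageSize;
}
```

So a 14-byte DLL path still ends up occupying one full 4096-byte page in the target process.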



The WriteProcessMemory function is exported by kernel32.dll. When the function is called, code execution will land at NtWriteVirtualMemory (NTAPI - exported by ntdll.dll however undocumented).

The function prototype is below.
Code:
BOOL WINAPI WriteProcessMemory(
  _In_  HANDLE  hProcess,
  _In_  LPVOID  lpBaseAddress,
  _In_  LPCVOID lpBuffer,
  _In_  SIZE_T  nSize,
  _Out_ SIZE_T  *lpNumberOfBytesWritten
);
In the injection function we use WriteProcessMemory to write to the allocated memory so we can actually store the path to your DLL on-disk. This means when we pass the address of the allocated memory for the remote thread creation later on, it will actually hold the path to the DLL file which means LoadLibraryA will be able to access the DLL path.


The WriteProcessMemory function takes in 5 parameters. The first parameter is for the handle of the process we are targeting. The second parameter is for the address in the process’ memory which we wish to write to… The memory write operation will start at the address passed in for this second parameter. The third parameter is for the data we actually wish to write to the memory (we pass our DLL path for this parameter). The fourth parameter is for how many bytes we wish to write to (e.g. if we wish to only write to 5 bytes starting at the lpBaseAddress then we pass in 5 for this parameter, the total calculation is lpBaseAddress + nSize = final destination for the end of the write operation). The last parameter is passed out which is why we use the & for the NumberOfBytesWritten variable in the injection function, and it lets us know how many bytes were written to (so we can ensure the correct amount was written to should you need to).
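
To make the lpBaseAddress + nSize arithmetic concrete, here is a minimal sketch of a bounds check you could run before writing. WriteFitsInRegion is a hypothetical helper of my own and plain integers stand in for the remote addresses; nothing here touches another process.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true when [writeBase, writeBase + writeSize) lies completely
// inside [regionBase, regionBase + regionSize). Written so that the
// additions cannot overflow.
bool WriteFitsInRegion(std::uintptr_t regionBase, std::size_t regionSize,
                       std::uintptr_t writeBase, std::size_t writeSize)
{
    if (writeBase < regionBase) return false;
    std::uintptr_t offset = writeBase - regionBase;
    if (offset > regionSize) return false;
    return writeSize <= regionSize - offset;
}
```

For example, a 14-byte write at the start of a 4096-byte region fits, while the same write starting 8 bytes before the end of the region does not.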


The CreateRemoteThread function is exported by kernel32.dll. On Windows Vista and later, its implementation ends up performing the thread-creation system call through the Native API (NTAPI) function NtCreateThreadEx, exported by ntdll.dll. ntdll.dll also exports a helper called RtlCreateUserThread; unlike NtAllocateVirtualMemory and NtWriteVirtualMemory, it does not perform a system call itself and instead calls NtCreateThreadEx internally. Both RtlCreateUserThread and NtCreateThreadEx are undocumented by Microsoft, however as with a lot of undocumented functions, you are bound to find information on them online.


To keep it short, it goes like this:
CreateRemoteThread (KERNEL32) -> NtCreateThreadEx (NTDLL) -> Kernel
RtlCreateUserThread (NTDLL) -> NtCreateThreadEx (NTDLL) -> Kernel


There will be many other functions called during that operation though.

The function prototype for CreateRemoteThread is below.

Code:
HANDLE WINAPI CreateRemoteThread(
  _In_  HANDLE                 hProcess,
  _In_  LPSECURITY_ATTRIBUTES  lpThreadAttributes,
  _In_  SIZE_T                 dwStackSize,
  _In_  LPTHREAD_START_ROUTINE lpStartAddress,
  _In_  LPVOID                 lpParameter,
  _In_  DWORD                  dwCreationFlags,
  _Out_ LPDWORD                lpThreadId
);
The function takes in 7 parameters, however we are not interested in all of them for the injection operation. We only take note of 3 of the parameters: the first one (data-type HANDLE); the fourth one (data-type LPTHREAD_START_ROUTINE); and the fifth one (data-type LPVOID). The ones we do not have interest in we can just pass 0 or NULL (which we do in the demonstration code), however for more information on how those parameters work/what they are used for, I recommend checking the MSDN documentation.

We use the first parameter for targeting the process we are injecting into, using the HANDLE passed to our injection function. We use the fourth parameter to specify the address of the LoadLibraryA function which we want our new thread to call, and we use the fifth parameter to specify the address within the process’ memory which holds the path to our DLL (remember the memory we allocated and wrote to which was within the address space of the target process?).

Now that the demonstration code has been explained (e.g. why we use the functions we do, and the NTAPI equivalents which can be used to enhance the complexity of the injection operation), we need to discuss the limitations. I will list some below.


#1 - The injection operation will not support processes which are running under other user accounts. For example, if you use this injection technique on a program running as SYSTEM (NT AUTHORITY\SYSTEM) or under another normal user account, the demonstration code will simply fail. Why is this, and how is this limitation resolved? The issue exists because to open a process running under another user account with the access rights needed to allocate and write memory, you’ll require a privilege known as “debugging rights” (SeDebugPrivilege). This privilege can be enabled via Win32 API functions such as AdjustTokenPrivileges (the NTAPI equivalent is NtAdjustPrivilegesToken, however you can use RtlAdjustPrivilege, which ends up calling that function anyway and is easier to use). The other reason this limitation exists is that CreateRemoteThread itself can fail for such processes, for example when the target process is in a different session, as the MSDN documentation notes! The solution to this issue is to use an NTAPI function for the thread creation: as discussed earlier, CreateRemoteThread ultimately relies on NtCreateThreadEx, so calling RtlCreateUserThread or NtCreateThreadEx directly will do just fine.

#2 - The injection operation cannot be used for LdrLoadDll/LdrpLoadDll without changing the technique into code-cave/shell-code injection. The reason is that the function the thread starts executing at receives only one argument! In case you were unaware of LdrLoadDll, it is the NTAPI function which eventually gets called once LoadLibraryA/W is called (LdrLoadDll is exported by ntdll.dll, LoadLibraryA/W by kernel32.dll), and LdrLoadDll makes a call to LdrpLoadDll internally (this function is not exported by ntdll.dll, therefore if you wish to use it you have to find its address manually). LdrpLoadDll is used for manual map injection, which I will talk about briefly after this part. The solution is to resort to code-cave injection (for example), which consists of allocating memory for a loader function, writing the loader function into the allocated memory, and then calling LdrLoadDll/LdrpLoadDll from within the loader function (or any other function with multiple parameters which you wish the remote thread to call). You can pass a structure as the argument and read the function addresses from it within the loader function, unless you write your own GetProcAddress/LoadLibrary replacement.

Manual map injection is a stealthier technique for DLL injection. It still involves memory allocation, memory writing and remote thread creation, but you also need to copy the actual DLL image into the memory of the target process. As I briefly noted beforehand, LdrLoadDll is called by LoadLibraryA/W, and it is responsible for loading the DLL into the address space of the calling process. To do this, LdrLoadDll relies on other functions; one of them is an unexported and undocumented function called LdrpLoadDll. Manual map injection relies on code-cave/shell-code injection for a loader function, and this loader function handles import resolving and the actual loading, either by calling LdrpLoadDll or (as many public implementations do) by mapping the sections, applying relocations and calling the entry point itself.

The point of manual map injection is to be stealthier, and it is stealthier because the injected module ends up “hidden”. The Process Environment Block (PEB) is a huge structure, and there is one for every running process on the system; it is huge because it contains many entries holding data about the process. One of the entries is a list of modules (I’ll refer to it as ModulesList for now), and this list contains all the loaded modules (DLLs) for that process. With manual map injection, the injected module is never added to the PEB ModulesList, because that bookkeeping is part of the normal LdrLoadDll path, which you are bypassing!
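
The “pass a structure as the single argument” idea can be sketched without any Windows headers. Everything below is illustrative: LoadRoutine is a made-up three-argument stand-in (it is not the real LdrLoadDll prototype), LoaderData and LoaderStub are hypothetical names, and FakeLoad only exists so the snippet can be run locally. In a real injection the structure and the stub would both have to be written into the target process first.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a routine that needs several arguments.
using LoadRoutine = int (*)(const char* searchPath, unsigned flags, const char* moduleName);

// Everything the loader needs, packed behind one pointer.
struct LoaderData {
    LoadRoutine load;       // address resolved before the injection
    const char* searchPath;
    unsigned    flags;
    const char* moduleName;
};

// Same shape as a thread start routine: one pointer in, one status out.
unsigned long LoaderStub(void* parameter)
{
    LoaderData* data = static_cast<LoaderData*>(parameter);
    if (!data || !data->load) return 1;
    // unpack the structure and make the multi-argument call
    return data->load(data->searchPath, data->flags, data->moduleName) == 0 ? 0 : 1;
}

// Fake target, used only to show that all the arguments arrive.
std::string g_lastModule;
int FakeLoad(const char*, unsigned, const char* moduleName)
{
    g_lastModule = moduleName;
    return 0;
}
```

Calling LoaderStub with a filled LoaderData behaves like the remote thread would: the stub receives one pointer and still manages to call a function with three parameters.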

The Process Environment Block (PEB) structure is below:
Code:
typedef struct _PEB {
    BYTE Reserved1[2];
    BYTE BeingDebugged;
    BYTE Reserved2[21];
    PPEB_LDR_DATA LoaderData;
    PRTL_USER_PROCESS_PARAMETERS ProcessParameters;
    BYTE Reserved3[520];
    PPS_POST_PROCESS_INIT_ROUTINE PostProcessInitRoutine;
    BYTE Reserved4[136];
    ULONG SessionId;
} PEB;
Notice the PPEB_LDR_DATA entry in the PEB structure? That is for holding data about the loaded modules within the process. It is a pointer to a PEB_LDR_DATA structure.

The PEB_LDR_DATA structure is referenced from the Process Environment Block and holds the data about the loaded modules. By “ModulesList” I am referring to the InMemoryOrderModuleList (LIST_ENTRY) entry; I say ModulesList for short. The structure is below for you (thanks to NirSoft - source: struct PEB_LDR_DATA):

Code:
typedef struct _PEB_LDR_DATA
{
     ULONG Length;
     UCHAR Initialized;
     PVOID SsHandle;
     LIST_ENTRY InLoadOrderModuleList;
     LIST_ENTRY InMemoryOrderModuleList;
     LIST_ENTRY InInitializationOrderModuleList;
     PVOID EntryInProgress;
} PEB_LDR_DATA, *PPEB_LDR_DATA;
You can learn more about the PEB_LDR_DATA over on MSDN as well: https://msdn.microsoft.com/en-us/library/windows/desktop/aa813708(v=vs.85).aspx

When manual map injection has been performed, if you enumerate through the loaded modules of a process using normal techniques, you simply won’t find the manually mapped DLL. You’ll have to resort to other memory scanning techniques.
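
To show why “normal techniques” miss such a module, here is a toy model of walking a LIST_ENTRY-style circular list. ListEntry and ModuleEntry are simplified stand-ins I made up for LIST_ENTRY and the loader's per-module entry; the real structures have many more fields. The point is only that a walk finds what is linked into the list and nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins for LIST_ENTRY and a loader table entry.
struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

struct ModuleEntry {
    const char* Name;
    ListEntry   Links; // plays the role of InMemoryOrderLinks
};

void InitListHead(ListEntry* head) { head->Flink = head->Blink = head; }

void InsertTail(ListEntry* head, ListEntry* entry)
{
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Same idea as the CONTAINING_RECORD macro: step back from the embedded
// links to the start of the structure which contains them.
ModuleEntry* EntryFromLinks(ListEntry* links)
{
    return reinterpret_cast<ModuleEntry*>(
        reinterpret_cast<char*>(links) - offsetof(ModuleEntry, Links));
}

// Walk from the head until we arrive back at the head.
std::vector<std::string> WalkModuleList(ListEntry* head)
{
    std::vector<std::string> names;
    for (ListEntry* cur = head->Flink; cur != head; cur = cur->Flink)
        names.push_back(EntryFromLinks(cur)->Name);
    return names;
}
```

If you link two entries and leave a third one unlinked (the “manually mapped” one), the walk returns only the two linked names, which is why you have to fall back to memory scanning to find the third.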

Before I finish with the theory for this section regarding DLL injection, I have written some example code on how you can allocate/write to memory and create a remote thread via the Native API (ntdll.dll exported functions). The source code uses NtAllocateVirtualMemory, NtWriteVirtualMemory and RtlCreateUserThread.

You can find the example source code over on my GitHub: GitHub - NtOpcode/NTAPI-Injector: DLL injection which relies on NTAPI for memory allocation, memory write and thread creation operations.


Code-cave injection
Code-cave injection is much stealthier than DLL injection because you won’t require any dependencies - this means you won’t need to drop anything to disk at all. It is used in malicious software, however only more complex threats tend to resort to this technique over DLL injection because it requires a more skilled developer.

Code-cave injection is more or less the same as shell-code injection. The difference is that with the term “code-cave” I am referring to you writing your injected stub in a language like C or C++ and then having this injected into the target process, whereas with shell-code you work out the byte representations of the operation code (opcode - yes I am an opcode too!) instructions (ASM) yourself and then inject those.

Shell-code does not have to appear one way all the time. For example, below are examples of shell-code (it is not real shell-code for you to use, just a format example):

#1 - 0xE9, 0x00, 0x00, 0x00, 0x00
#2 - “\xE9\x00\x00\x00\x00”


In #1 we have an array of opcode bytes (the machine-code encoding of ASM instructions). In #2 the same bytes are written as a string literal, with a \ and then an “x” and then the hexadecimal value of each byte.
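
Converting between the two formats is purely mechanical. Below is a minimal sketch that turns the array form into the escaped-string form; ToEscapedString is a made-up helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Turn raw bytes into the "\xNN" text form used in string literals.
std::string ToEscapedString(const std::vector<std::uint8_t>& bytes)
{
    std::string out;
    char buf[5]; // '\\', 'x', two hex digits, terminating '\0'
    for (std::uint8_t b : bytes) {
        std::snprintf(buf, sizeof(buf), "\\x%02X", static_cast<unsigned>(b));
        out += buf;
    }
    return out;
}
```

Feeding it the five bytes from #1 produces the text of #2.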

You can allocate memory in another process and then fill the allocated memory with your code, and then create a remote thread so it will execute at the address of your injected code. It does not matter if you rely on C/C++ for your injected code or rely on shell-code (bytes from the function), but you need to know that there are rules when using this technique… For example, you cannot just call a Win32 API function from within the injected code, you’ll need to have the address! With C/C++ you can have your injected function accept a parameter (e.g. a structure) which you fill to hold addresses of functions before the injection so you can access addresses that way, or you can just write your own GetProcAddress/GetModuleHandle/LoadLibrary wrapper so you can handle it all yourself (which you’ll typically do with shell-code anyway).
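
As a rough idea of what a hand-written GetProcAddress replacement does, here is a sketch of the lookup over a heavily simplified export table. ExportTable is my own model of the three parallel arrays a PE export directory points to (names, name ordinals, function RVAs); it is not the real IMAGE_EXPORT_DIRECTORY layout, and forwarded exports and lookups by ordinal are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simplified model of the three tables behind a PE export directory.
struct ExportTable {
    std::vector<const char*>   names;        // models AddressOfNames
    std::vector<std::uint16_t> nameOrdinals; // models AddressOfNameOrdinals
    std::vector<std::uint32_t> functions;    // models AddressOfFunctions (RVAs)
};

// Returns the RVA of the named export, or 0 when it is not found.
std::uint32_t FindExportRva(const ExportTable& table, const char* wanted)
{
    for (std::size_t i = 0; i < table.names.size() && i < table.nameOrdinals.size(); ++i) {
        if (std::strcmp(table.names[i], wanted) == 0) {
            // the name index maps to an index into the function table
            std::uint16_t index = table.nameOrdinals[i];
            return index < table.functions.size() ? table.functions[index] : 0;
        }
    }
    return 0;
}
```

A real implementation would add the module base address to the returned RVA to get the function's address.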

The good news is that, thanks to how Windows handles its own modules (e.g. ntdll.dll and kernel32.dll), the addresses of functions exported by those DLLs in the target process will normally be the same as in your own process (as long as both processes have the same bitness, since the system DLLs keep the same base address in every process until the next reboot). This means if you are trying to inject code which will call TerminateProcess from kernel32.dll, and you get the address of kernel32!TerminateProcess before the injection and somehow pass it to your injected function when it gets executed on the remote thread (e.g. via the parameter argument), that address will be valid for the function in the process you injected into. In most other cases you will need to re-write GetProcAddress (just scan the Export Address Table, and use a re-write of LoadLibrary if the address retrieval does fail), etc.

E.g:
Code:
BYTE Opcode[5] = {
    0xE9,                  // opcode byte for the x86 instruction JMP rel32
    0x00, 0x00, 0x00, 0x00 // placeholder for the 4-byte relative displacement
};
In fact, code-cave injection can be used as a stealthier way of performing API hooking. Usually most people rely on a DLL to deploy their hook initialization so API calls end up being redirected to a callback function written by the author of the hook (to intercept the call and perform operations prior to the real call being executed, or to block it/manipulate the original call). However, you can rely on code-cave injection to have a function executed which is responsible for setting the hooks (of course you would allocate memory for your callback functions, write the callback code into the allocated memory, and then pass the addresses to the injected function which gets executed on the remote thread so it knows where the callbacks are). You’d also need a way of knowing the address of the unhook/hook functions from within the callback (to unhook/re-hook), or a trampoline to allow the call to pass through.
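
Filling in that placeholder is simple arithmetic: the 4 bytes after 0xE9 are a little-endian displacement measured from the end of the 5-byte instruction, not an absolute address. A minimal sketch for 32-bit addresses follows; BuildJmpRel32 is a made-up helper name.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "JMP rel32" located at address `from` so that it lands on `to`.
std::array<std::uint8_t, 5> BuildJmpRel32(std::uint32_t from, std::uint32_t to)
{
    // The displacement is relative to the address of the next instruction.
    // Unsigned wrap-around gives the two's complement value for backward jumps.
    std::uint32_t rel = to - (from + 5);

    std::array<std::uint8_t, 5> jmp = {{
        0xE9,
        static_cast<std::uint8_t>(rel),       // little-endian: low byte first
        static_cast<std::uint8_t>(rel >> 8),
        static_cast<std::uint8_t>(rel >> 16),
        static_cast<std::uint8_t>(rel >> 24)
    }};
    return jmp;
}
```

For a jump placed at 0x1000 which should land on 0x2000, the displacement is 0x2000 - 0x1005 = 0xFFB, so the bytes are E9 FB 0F 00 00.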


You can perform code-cave injection using the same functions used in the DLL injection method we discussed earlier, except the goal is different. You’ll need to make changes of course, and ensure your injected code does not call any functions which it cannot reach (e.g. you must get the addresses beforehand, otherwise you’ll cause a crash - pass a structure to the injected function which holds the data required).


Asynchronous Procedure Calls
Asynchronous Procedure Calls (APC) are another common method of code injection, and the method works by having code executed within a target thread - the downside is that there is a queue system, and this can result in waiting for the thread to be scheduled again. MSDN has a very good explanation of how it works and I am not going to be able to provide a better one.

An asynchronous procedure call (APC) is a function that executes asynchronously in the context of a particular thread. When an APC is queued to a thread, the system issues a software interrupt. The next time the thread is scheduled, it will run the APC function. An APC generated by the system is called a kernel-mode APC. An APC generated by an application is called a user-mode APC. A thread must be in an alertable state to run a user-mode APC.

Each thread has its own APC queue. An application queues an APC to a thread by calling the QueueUserAPC function. The calling thread specifies the address of an APC function in the call to QueueUserAPC. The queuing of an APC is a request for the thread to call the APC function.
Source: https://msdn.microsoft.com/en-us/library/windows/desktop/ms681951(v=vs.85).aspx
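
The queue behaviour described in the quote can be modelled in a few lines. This is only a toy simulation, not the real scheduler: SimulatedThread is a made-up class, QueueApc mimics what QueueUserAPC requests, and EnterAlertableWait stands for the thread reaching an alertable state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Toy model of one thread's user-mode APC queue.
class SimulatedThread {
public:
    // Queuing is only a request; nothing runs at this point.
    void QueueApc(std::function<void(std::uintptr_t)> fn, std::uintptr_t data)
    {
        pending_.push_back({std::move(fn), data});
    }

    // Models the thread entering an alertable state: every pending APC
    // runs in first-in, first-out order. Returns how many were run.
    std::size_t EnterAlertableWait()
    {
        std::size_t ran = 0;
        while (!pending_.empty()) {
            Apc apc = std::move(pending_.front());
            pending_.pop_front();
            apc.fn(apc.data);
            ++ran;
        }
        return ran;
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    struct Apc {
        std::function<void(std::uintptr_t)> fn;
        std::uintptr_t data;
    };
    std::deque<Apc> pending_;
};
```

The thing to take away is that a queued APC just sits there until the thread becomes alertable, which is exactly why APC injection can take a while to fire.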


The above quote from MSDN states that you can queue an APC to a thread by calling QueueUserAPC, which is part of the Win32 API. It is exported by kernel32.dll. The Native API (NTAPI) equivalent for this function is NtQueueApcThread (undocumented).

The function prototype for QueueUserAPC is below.
Code:
DWORD WINAPI QueueUserAPC(
  _In_ PAPCFUNC  pfnAPC,
  _In_ HANDLE    hThread,
  _In_ ULONG_PTR dwData
);
We can see that the function takes in three parameters: pfnAPC, hThread and dwData.


The first parameter, pfnAPC (data-type PAPCFUNC) is responsible for holding the address of the function which needs to be called by the thread you will be targeting. In terms of DLL injection, you would pass the address of LoadLibraryA/W. You may need to type-cast the address to PAPCFUNC, but it will still work.

The second parameter, hThread (data-type HANDLE), holds the handle to the thread you are targeting. APC works by having a function called by the targeted thread, therefore you’ll need to target a thread within another running process if you wish to use it for code injection purposes. You can enumerate through the threads of a process, pick one of them, open a handle to that thread and use it for the APC injection. In fact, if you’re using it for DLL injection, you can attempt APC injection on every single thread you can get a handle to within the target process: the same DLL can only be loaded into a process once (no matter how many times LdrLoadDll is called), so the extra attempts do no harm, and they increase the chances of the injection occurring sooner and being successful (since, as noted earlier, there is a queue system and you’ll need to wait for the thread).


The third parameter, dwData (data-type ULONG_PTR) is for the argument parameter which can be passed to the function for the APC. For example, if you remember earlier when we discussed DLL injection and we used LoadLibraryA (and we passed the address of the memory we made in the target process to hold the DLL path location), in this case if we were putting the address of LoadLibraryA as the pfnAPC, we would pass the address of the DLL path within the target process in the dwData parameter. However, you would type-cast the PVOID from the memory allocation to (ULONG_PTR).

I have written an example of APC usage for DLL injection to help you further understand how it works. The injection function will take in a HANDLE to the target process you wish to inject into and a char* for the DLL path (e.g. “C:\\Opcode.dll”).

The injection function will do the following in-order:
1. Get the PID from the HANDLE
2. Get the address of kernel32!LoadLibraryA
3. Calculate the length of the DLL path
4. Allocate memory to hold the DLL path in the target process
5. Write to the allocated memory to actually store the DLL path
6. Enumerate through all the threads and filter for the ones within the target process (via a comparison which references the PID we obtained from the HANDLE)
7. Obtain a handle to any threads filtered for the target process via kernel32!OpenThread (lands at ntdll!NtOpenThread)
8. Perform a call to kernel32!QueueUserAPC to attempt the injection (lands at ntdll!NtQueueApcThread).


The example is below.
Code:
#include <windows.h>
#include <tlhelp32.h>
#include <string.h>

BOOL InjectDll(HANDLE ProcessHandle, char *DllPath)
{
    // check if the process handle and the DLL path are valid
    if (!ProcessHandle || !DllPath)
    {
        return FALSE;
    }

    // get the PID of the process from the handle
    DWORD dwProcessId = GetProcessId(ProcessHandle);

    // check if the PID was successfully received
    if (!dwProcessId)
    {
        return FALSE;
    }

    // get the address of kernel32.dll!LoadLibraryA
    FARPROC fpLoadLibraryA = GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA");

    // check if the address could be obtained
    if (!fpLoadLibraryA)
    {
        return FALSE;
    }

    // get the length of the DLL path, including the terminating null character
    SIZE_T sDllLength = strlen(DllPath) + 1;

    // allocate memory which we'll use later to store the DLL path
    PVOID pvDllMemory = VirtualAllocEx(ProcessHandle, NULL, sDllLength, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE);

    // check if the memory allocation was successful
    if (!pvDllMemory)
    {
        return FALSE;
    }

    // write to the allocated memory to store the DLL path
    if (!WriteProcessMemory(ProcessHandle, pvDllMemory, DllPath, sDllLength, NULL))
    {
        return FALSE;
    }

    // take a snapshot of all the threads on the system
    HANDLE hSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
    if (hSnapshot == INVALID_HANDLE_VALUE)
    {
        return FALSE;
    }

    THREADENTRY32 ThreadEntry32;
    ThreadEntry32.dwSize = sizeof(THREADENTRY32);

    // Thread32First must fill in the first entry before Thread32Next is used
    if (Thread32First(hSnapshot, &ThreadEntry32))
    {
        do
        {
            // check if the thread is within the target process we want to inject into
            if (ThreadEntry32.th32OwnerProcessID == dwProcessId)
            {
                // THREAD_SET_CONTEXT is the access right QueueUserAPC requires
                HANDLE ThreadHandle = OpenThread(THREAD_SET_CONTEXT, FALSE, ThreadEntry32.th32ThreadID);

                // check if the thread could be opened
                if (ThreadHandle)
                {
                    // try to inject with QueueUserAPC
                    QueueUserAPC((PAPCFUNC)fpLoadLibraryA, ThreadHandle, (ULONG_PTR)pvDllMemory);
                    CloseHandle(ThreadHandle);
                }
            }
        } while (Thread32Next(hSnapshot, &ThreadEntry32)); // keep looping!
    }

    CloseHandle(hSnapshot);

    // return success because after all, we got as far as we were gonna get if it was gonna work
    return TRUE;
}
You can suspend the targeted threads before the APC attempt and resume them afterwards to improve it, but it still works well regardless. Keep in mind that, as the MSDN quote above says, a thread only runs a user-mode APC once it enters an alertable state.



Code injection identification

We have gotten most of the theory about some methods of code injection out of the way, however one of the most important things we have discussed so far is the set of API functions which are commonly used for code injection - you may have noticed similarities between the DLL injection, code-cave/shell-code injection and APC injection methods. The similarity is the typically used API functions for memory allocation, memory writing and thread creation. With shell-code injection you can just make a remote thread to deploy the shell-code, but this will require a method of getting that remote thread created… In the end, it’ll land at NtCreateThreadEx: on modern versions of Windows, CreateRemoteThread is forwarded to CreateRemoteThreadEx (KERNELBASE) which calls NtCreateThreadEx, and RtlCreateUserThread (ntdll.dll) is a separate wrapper which also ends up calling NtCreateThreadEx (NtCreateThreadEx is where the system call is performed so execution lands in the kernel, which is where the thread creation really occurs). Closer to the end, functions like LdrInitializeThunk (ntdll.dll) will be called in the context of the new thread within the target process (which can actually be locally hooked with ease to prevent thread creation as a self-protection feature).

Key functions we have encountered so far:
- LoadLibraryA/W -> LdrLoadDll -> LdrpLoadDll
- VirtualAllocEx -> NtAllocateVirtualMemory
- WriteProcessMemory -> NtWriteVirtualMemory
- CreateRemoteThread -> CreateRemoteThreadEx -> NtCreateThreadEx (RtlCreateUserThread is a separate ntdll.dll wrapper which also reaches NtCreateThreadEx)
- QueueUserAPC -> NtQueueApcThread

Other key functions we have not yet discussed which you will benefit knowing about:
- OpenProcess -> NtOpenProcess
- OpenThread -> NtOpenThread
- GetThreadContext -> NtGetContextThread
- SetThreadContext -> NtSetContextThread
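
To keep the two lists above handy while reading API logs, here is a small sketch of a lookup table in portable C++. The function names NativeCounterparts and ToNative are made up for this example, and the table only records where each Win32 function eventually ends up (intermediate wrappers are left out).

Code:
```cpp
#include <cassert>
#include <map>
#include <string>

// Win32 name -> native (ntdll.dll) routine it eventually reaches.
// Helper names are hypothetical; the pairs come from the lists above.
inline const std::map<std::string, std::string>& NativeCounterparts()
{
    static const std::map<std::string, std::string> table = {
        {"LoadLibraryA",       "LdrLoadDll"},
        {"LoadLibraryW",       "LdrLoadDll"},
        {"VirtualAllocEx",     "NtAllocateVirtualMemory"},
        {"WriteProcessMemory", "NtWriteVirtualMemory"},
        {"CreateRemoteThread", "NtCreateThreadEx"},
        {"QueueUserAPC",       "NtQueueApcThread"},
        {"OpenProcess",        "NtOpenProcess"},
        {"OpenThread",         "NtOpenThread"},
        {"GetThreadContext",   "NtGetContextThread"},
        {"SetThreadContext",   "NtSetContextThread"},
    };
    return table;
}

// Returns the native name, or an empty string when the API is not in the table.
inline std::string ToNative(const std::string& win32Name)
{
    const std::map<std::string, std::string>& table = NativeCounterparts();
    const auto it = table.find(win32Name);
    return it == table.end() ? std::string() : it->second;
}
```

For example, ToNative("WriteProcessMemory") gives back "NtWriteVirtualMemory", which is the name you would look for when a sample skips KERNEL32 and calls ntdll.dll directly.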


I have a test sample which is based off the example source codes from this article, however the DLL injection sample relies on the NTAPI injector (source code at my GitHub). I’ll be using it to demonstrate some ways of identifying malicious software which may perform code injection, or will perform it.


Static analysis
In this part we’ll be performing a quick inspection of the non-packed sample (in the same condition it was in after Visual Studio compiled it).


Injector1.exe
We’ll start off by checking the detection ratio at VirusTotal as this is typically something most analysts do with real samples from the wild.

At the time of writing this article, the first test injector sample appears to be FUD (Fully Undetected) according to VirusTotal. Bear in mind, the sample is not actually malicious software (it won’t cause harm and I did not develop it for malicious purposes, just for this thread). The VT link is here, maybe it will have detections by the time you view it depending on the date: https://www.virustotal.com/#/file/921b0fe6062006f3663b706914ad6811da53fcce7abd79a0b1c69b860af5829b/detection (if you go to the Details tab on VT you can even see the path for ‘debug artifacts’ is “C:\github\NTAPI Injector\x64\Release\NTAPI Injector.pdb”)





I will open the sample up in IDA so we can obtain other information. The sample is compiled as 64-bit therefore you will see 64-bit registers being used during the disassembly such as RAX, RDI, RBX.

Imports:
Code:
Address          Ordinal Name                                  Library
-------          ------- ----                                  -------
0000000140027000         WriteProcessMemory                    KERNEL32
0000000140027008         GetModuleHandleA                      KERNEL32
0000000140027010         CloseHandle                           KERNEL32
0000000140027018         GetProcAddress                        KERNEL32
0000000140027020         VirtualAllocEx                        KERNEL32
0000000140027028         CreateRemoteThread                    KERNEL32
0000000140027030         VirtualFreeEx                         KERNEL32
0000000140027038         GetLastError                          KERNEL32
0000000140027040         CreateToolhelp32Snapshot              KERNEL32
0000000140027048         Process32Next                         KERNEL32
0000000140027050         CreateFileW                           KERNEL32
0000000140027058         WideCharToMultiByte                   KERNEL32
0000000140027060         EnterCriticalSection                  KERNEL32
0000000140027068         LeaveCriticalSection                  KERNEL32
0000000140027070         DeleteCriticalSection                 KERNEL32
0000000140027078         SetLastError                          KERNEL32
0000000140027080         InitializeCriticalSectionAndSpinCount KERNEL32
0000000140027088         CreateEventW                          KERNEL32
0000000140027090         Sleep                                 KERNEL32
0000000140027098         TlsAlloc                              KERNEL32
00000001400270A0         TlsGetValue                           KERNEL32
00000001400270A8         TlsSetValue                           KERNEL32
00000001400270B0         TlsFree                               KERNEL32
00000001400270B8         GetSystemTimeAsFileTime               KERNEL32
00000001400270C0         GetModuleHandleW                      KERNEL32
00000001400270C8         EncodePointer                         KERNEL32
00000001400270D0         DecodePointer                         KERNEL32
00000001400270D8         MultiByteToWideChar                   KERNEL32
00000001400270E0         CompareStringW                        KERNEL32
00000001400270E8         LCMapStringW                          KERNEL32
00000001400270F0         GetLocaleInfoW                        KERNEL32
00000001400270F8         GetStringTypeW                        KERNEL32
0000000140027100         GetCPInfo                             KERNEL32
0000000140027108         SetEvent                              KERNEL32
0000000140027110         ResetEvent                            KERNEL32
0000000140027118         WaitForSingleObjectEx                 KERNEL32
0000000140027120         RtlCaptureContext                     KERNEL32
0000000140027128         RtlLookupFunctionEntry                KERNEL32
0000000140027130         RtlVirtualUnwind                      KERNEL32
0000000140027138         UnhandledExceptionFilter              KERNEL32
0000000140027140         SetUnhandledExceptionFilter           KERNEL32
0000000140027148         GetCurrentProcess                     KERNEL32
0000000140027150         TerminateProcess                      KERNEL32
0000000140027158         IsProcessorFeaturePresent             KERNEL32
0000000140027160         IsDebuggerPresent                     KERNEL32
0000000140027168         GetStartupInfoW                       KERNEL32
0000000140027170         QueryPerformanceCounter               KERNEL32
0000000140027178         GetCurrentProcessId                   KERNEL32
0000000140027180         GetCurrentThreadId                    KERNEL32
0000000140027188         InitializeSListHead                   KERNEL32
0000000140027190         RtlUnwindEx                           KERNEL32
0000000140027198         RtlPcToFileHeader                     KERNEL32
00000001400271A0         RaiseException                        KERNEL32
00000001400271A8         FreeLibrary                           KERNEL32
00000001400271B0         LoadLibraryExW                        KERNEL32
00000001400271B8         HeapAlloc                             KERNEL32
00000001400271C0         HeapReAlloc                           KERNEL32
00000001400271C8         HeapFree                              KERNEL32
00000001400271D0         GetStdHandle                          KERNEL32
00000001400271D8         WriteFile                             KERNEL32
00000001400271E0         GetModuleFileNameA                    KERNEL32
00000001400271E8         ExitProcess                           KERNEL32
00000001400271F0         GetModuleHandleExW                    KERNEL32
00000001400271F8         GetCommandLineA                       KERNEL32
0000000140027200         GetCommandLineW                       KERNEL32
0000000140027208         GetACP                                KERNEL32
0000000140027210         GetFileType                           KERNEL32
0000000140027218         IsValidLocale                         KERNEL32
0000000140027220         GetUserDefaultLCID                    KERNEL32
0000000140027228         EnumSystemLocalesW                    KERNEL32
0000000140027230         FlushFileBuffers                      KERNEL32
0000000140027238         GetConsoleCP                          KERNEL32
0000000140027240         GetConsoleMode                        KERNEL32
0000000140027248         ReadFile                              KERNEL32
0000000140027250         SetFilePointerEx                      KERNEL32
0000000140027258         GetProcessHeap                        KERNEL32
0000000140027260         FindClose                             KERNEL32
0000000140027268         FindFirstFileExA                      KERNEL32
0000000140027270         FindNextFileA                         KERNEL32
0000000140027278         IsValidCodePage                       KERNEL32
0000000140027280         GetOEMCP                              KERNEL32
0000000140027288         GetEnvironmentStringsW                KERNEL32
0000000140027290         FreeEnvironmentStringsW               KERNEL32
0000000140027298         SetEnvironmentVariableA               KERNEL32
00000001400272A0         SetStdHandle                          KERNEL32
00000001400272A8         ReadConsoleW                          KERNEL32
00000001400272B0         HeapSize                              KERNEL32
00000001400272B8         WriteConsoleW                         KERNEL32


All the found imports are from kernel32.dll. The functions which suggest virtual memory operations/thread creation (which can be used for code injection as we have already established) which are listed in the Imports are below.

Code:
WriteProcessMemory
VirtualAllocEx
CreateRemoteThread
VirtualFreeEx
The above functions are indeed used for the DLL injection the test sample performs, however the sample also supports injection via the same technique using the NTAPI equivalents, and the used NTAPI functions are dynamic imports. We’ll investigate more and find evidence throughout the static analysis of the dynamic import activity.
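
Picking those names out by eye works for one sample, but the same triage can be sketched in code. The following is only an illustration in portable C++: it assumes you already have the import names as plain strings (e.g. copied from the IDA Imports view), and the watch-list and the function names FlagInjectionImports / HasClassicTriad are my own choices, not part of any tool.

Code:
```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Returns the imports which appear on a small watch-list of functions
// commonly involved in code injection, in the order they were given.
// The watch-list is an assumption for this sketch and is far from complete.
std::vector<std::string> FlagInjectionImports(const std::vector<std::string>& imports)
{
    static const std::set<std::string> watch = {
        "OpenProcess", "OpenThread", "VirtualAllocEx", "VirtualFreeEx",
        "WriteProcessMemory", "CreateRemoteThread", "QueueUserAPC",
        "GetThreadContext", "SetThreadContext"};

    std::vector<std::string> hits;
    for (const std::string& name : imports)
        if (watch.count(name) != 0)
            hits.push_back(name);
    return hits;
}

// True when allocation, writing and remote thread creation are all imported.
bool HasClassicTriad(const std::vector<std::string>& imports)
{
    const std::set<std::string> present(imports.begin(), imports.end());
    return present.count("VirtualAllocEx") != 0
        && present.count("WriteProcessMemory") != 0
        && present.count("CreateRemoteThread") != 0;
}
```

A hit here is only a hint: plenty of legitimate software (debuggers, security products) imports the same functions, and a sample which resolves everything dynamically will show none of them.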


Before we start trying to find the main function of the program, we’ll quickly check the strings of the PE. You can use a free tool called Strings however I have access to IDA and it supports string scanning.

Due to not wanting to exceed the character limit I have uploaded the contents of the strings dump to Pastebin (lifetime expiration): https://pastebin.com/DsfQcVm7


The strings that can provide a bit of information to us:
Code:
.rdata:0000000140036EE0 0000000A C ntdll.dll
.rdata:0000000140036EC8 00000018 C NtAllocateVirtualMemory
.rdata:0000000140036EF0 00000015 C NtWriteVirtualMemory
.rdata:0000000140037048 0000000E C NtOpenProcess
.rdata:0000000140036F08 00000014 C RtlCreateUserThread
.rdata:0000000140036FE0 0000000C C notepad.exe
.rdata:0000000140036FF0 0000000E C C:\\opcode.dll
We have found references to: Native API (ntdll.dll exported) functions; a Windows built-in program (hard-coded); and a static file path (hard-coded) for a DLL (Dynamic Link Library) file, within the strings. This implies that at some point a program called notepad.exe will be used in some shape or form, and the same goes for the DLL file (“opcode.dll”). The Native API function names would only show up in the strings if they were staged to distract someone into looking at why they are there (even if they had no real purpose), or because they are going to be used for dynamic imports. This does not mean that dynamically imported functions are always leaked in a strings scan; this is not the case at all. In this scenario, no anti-reversing techniques have been applied (e.g. string obfuscation/scrambling, debugger checks, etc.).
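
If you do not have IDA or the Strings tool at hand, the idea behind a strings scan is simple enough to sketch. The following portable C++ is a minimal example under my own assumptions: it only handles ASCII (no UTF-16 strings), uses a minimum length of 4, and the name check in LooksLikeNativeApiName is a rough heuristic I made up for this post.

Code:
```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Collects runs of printable ASCII bytes which are at least minLen long.
std::vector<std::string> ExtractAsciiStrings(const std::vector<unsigned char>& data,
                                             std::size_t minLen = 4)
{
    std::vector<std::string> out;
    std::string current;
    for (unsigned char b : data)
    {
        if (b >= 0x20 && b <= 0x7E)
        {
            current.push_back(static_cast<char>(b));
        }
        else
        {
            if (current.size() >= minLen)
                out.push_back(current);
            current.clear();
        }
    }
    if (current.size() >= minLen)
        out.push_back(current);
    return out;
}

// True when s starts with prefix and the next character is an upper-case letter.
static bool HasPrefixThenUpper(const std::string& s, const std::string& prefix)
{
    return s.size() > prefix.size()
        && s.compare(0, prefix.size(), prefix) == 0
        && std::isupper(static_cast<unsigned char>(s[prefix.size()])) != 0;
}

// Heuristic: names such as NtOpenProcess or RtlCreateUserThread.
bool LooksLikeNativeApiName(const std::string& s)
{
    return HasPrefixThenUpper(s, "Nt") || HasPrefixThenUpper(s, "Zw")
        || HasPrefixThenUpper(s, "Rtl") || HasPrefixThenUpper(s, "Ldr");
}
```

Feeding the bytes of the PE into ExtractAsciiStrings and filtering with LooksLikeNativeApiName would surface entries like the ones listed above, but it will also produce false positives, so treat the output as a starting point only.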

We can also find references to ‘iostream’ and ‘std’ (the first is a header for input/output in C++, the second is a name-space in C++).



If we go to the Exports tab we will find an entry for ‘start’. This is not the main function of the program which was made by me, it was auto-generated by the compiler and it will call other functions related to initialization.






We can see from the above image that the start function which the PE exports will call a function, and then it will return the result of a call to a second function (which means whatever the second function returns becomes the return value of start).

The pseudo-code for that start function:
Code:
signed __int64 start()
{
  sub_140009C20();
  return sub_140009138();
}



We’re not interested in the function which is called (with the CALL instruction), but we are interested in the function which is used for the return. Right now we are trying to find the main function of the PE which the author wrote (which is me in this case).



All of these operations are performed by the CRT (C Run-Time library). The sample was written in native C++ (not C++/CLI for .NET) but it still requires a run-time so it can function.

To make it easier to read, I’ll move to pseudo-code view.
Code:
signed __int64 sub_140009138()
{
  char v0; // si@3
  char v1; // bl@3
  __int64 v2; // rcx@3
  _QWORD *v4; // rax@10
  void (__fastcall **v5)(_QWORD, signed __int64); // rbx@10
  void (__fastcall *v6)(_QWORD, signed __int64); // rbx@12
  _QWORD *v7; // rax@13
  _QWORD *v8; // rbx@13
  __int64 *v9; // rax@16
  __int64 v10; // rdi@16
  _DWORD *v11; // rax@16
  _DWORD *v12; // rbx@16
  __int64 v13; // rax@16
  int v14; // ebx@16
  __int64 v15; // rcx@16

  if ( !(unsigned __int8)sub_140009300(1i64) )
  {
    sub_140009998(7i64);
    __debugbreak();
  }

  v0 = 0;
  v1 = sub_1400092C4();
  v2 = (unsigned int)dword_14003D7D8;
  if ( dword_14003D7D8 == 1 )
    sub_140009998(7i64);
  if ( (_DWORD)v2 )
  {
    v0 = 1;
  }
  else
  {
    dword_14003D7D8 = 1;
    if ( sub_14001441C(&unk_140027348, &unk_140027390) )
      return 255i64;
    sub_1400143B8(&unk_1400272E8, &unk_140027340);
    dword_14003D7D8 = 2;
  }

  LOBYTE(v2) = v1;
  sub_1400094C8(v2);
  LODWORD(v4) = sub_140009D1C();
  v5 = (void (__fastcall **)(_QWORD, signed __int64))v4;
  if ( *v4 && (unsigned __int8)sub_14000942C(v4) )
  {
    v6 = *v5;
    j___guard_check_icall_fptr(v6);
    v6(0i64, 2i64);
  }

  LODWORD(v7) = sub_140009D24();
  v8 = v7;

  if ( *v7 && (unsigned __int8)sub_14000942C(v7) )
    sub_140014710(*v8);

  LODWORD(v9) = sub_1400147D0();
  v10 = *v9;
  LODWORD(v11) = sub_1400147C8();
  v12 = v11;
  LODWORD(v13) = sub_140014360();
  v14 = sub_140002130(*v12, v10, v13);

  if ( !(unsigned __int8)sub_140009AE4() )
    sub_140014754((unsigned int)v14);

  if ( !v0 )
    sub_1400146F4();
 
  LOBYTE(v15) = 1;
  sub_1400094EC(v15, 0i64);
  return (unsigned int)v14;
}



If you look at the code in the CODE tags, at the very end you will see the function ends by returning v14. v14 has a data-type of integer, but where is it assigned? Let's find where it is used in the pseudo-code.




We can clearly see from the above image that the returned value from another function is put into the v14 variable, and the entire function we are currently looking at will return the value of v14. The reason this called function is important to us is because it is the main function from the author (well, me)! Since the sample is written in C++, the main function will return an integer… And the CRT will move control to the main function once it has finished executing its own initialization code, so all that was needed was to follow the start() function and find where execution control is given to (in our case, it is a function called sub_140002130 - I have globally renamed the function to ‘mainfunc’).





Code:
int sub_140002130()
{
  HANDLE v0; // rbx@1
  __int64 v1; // rax@2
  char *v2; // rcx@2
  __int64 v3; // rax@2
  int v4; // edx@3
  int v5; // er8@3
  unsigned __int64 v6; // rcx@8
  unsigned __int64 v7; // rax@10
  DWORD v8; // ebx@15
  unsigned __int64 v9; // rcx@17
  unsigned __int64 v10; // rax@19
  __int64 v11; // rax@24
  __int64 v12; // rax@24
  __int64 v13; // rcx@24
  DWORD v14; // ebx@26
  __int64 v15; // rcx@26
  __int64 v16; // rax@26
  __int64 v17; // rax@26
  __int64 v18; // rbx@26
  __int64 v19; // rax@26
  __int64 v20; // rdx@26
  __int64 v21; // rdi@26
  void (__fastcall ***v22)(_QWORD, _QWORD); // rax@27
  unsigned __int8 v23; // al@29
  __int64 v25; // [sp+0h] [bp-1A8h]@30
  __int128 v26; // [sp+20h] [bp-188h]@1
  __int128 v27; // [sp+30h] [bp-178h]@1
  __int128 v28; // [sp+40h] [bp-168h]@24
  const char *v29; // [sp+50h] [bp-158h]@24
  __int64 v30; // [sp+58h] [bp-150h]@1
  PROCESSENTRY32 pe; // [sp+60h] [bp-148h]@1
  __int64 v32; // [sp+190h] [bp-18h]@30
  v30 = -2i64;
  *((_QWORD *)&v27 + 1) = 15i64;
  LOBYTE(v26) = 0;
  *(_QWORD *)&v27 = 11i64;
  sub_14000A640(&v26, "notepad.exe", 11i64);
  BYTE11(v26) = 0;

  pe.dwSize = 304;
  v0 = CreateToolhelp32Snapshot(2u, 0);

  while ( 1 )
  {
    LODWORD(v1) = sub_1400023F0(&v26);
    v2 = pe.szExeFile;
    v3 = v1 - ((_QWORD)&pe + 44);

    do
    {
      v4 = (unsigned __int8)v2[v3];
      v5 = (unsigned __int8)*v2 - v4;

      if ( (unsigned __int8)*v2 != v4 )
        break;
      ++v2;
    }
    while ( v4 );
    if ( !v5 )
      break;
    if ( !Process32Next(v0, &pe) )
    {
      CloseHandle(v0);
      if ( *((_QWORD *)&v27 + 1) >= 0x10ui64 )
      {
        v6 = v26;
        if ( (unsigned __int64)(*((_QWORD *)&v27 + 1) + 1i64) >= 0x1000 )
        {
          if ( v26 & 0x1F || (v7 = *(_QWORD *)(v26 - 8), v7 >= (unsigned __int64)v26) || (v6 = v26 - v7 - 8, v6 > 0x1F) )
          {
            sub_14000E984(v6);
            __debugbreak();
          }
          else
          {
            v6 = *(_QWORD *)(v26 - 8);
          }
        }
        sub_14000901C(v6);
      }
      v8 = 0;
      goto LABEL_24;
    }
  }
  CloseHandle(v0);
  v8 = pe.th32ProcessID;
  if ( *((_QWORD *)&v27 + 1) >= 0x10ui64 )
  {
    v9 = v26;
    if ( (unsigned __int64)(*((_QWORD *)&v27 + 1) + 1i64) >= 0x1000 )
    {
      if ( v26 & 0x1F || (v10 = *(_QWORD *)(v26 - 8), v10 >= (unsigned __int64)v26) || (v9 = v26 - v10 - 8, v9 > 0x1F) )
      {
        sub_14000E984(v9);
        __debugbreak();
      }
      else
      {
        v9 = *(_QWORD *)(v26 - 8);
      }
    }
    sub_14000901C(v9);

  }

LABEL_24:
  LOBYTE(v26) = 0;
  _mm_storeu_si128((__m128i *)&v27, _mm_load_si128((const __m128i *)&xmmword_140037060));
  LODWORD(v11) = sub_140005D30(v8);
  *(_QWORD *)&v28 = v11;
  *((_QWORD *)&v28 + 1) = 0x100000000i64;
  v29 = "C:\\opcode.dll";
  v26 = v28;
  *(_QWORD *)&v27 = "C:\\opcode.dll";
  LODWORD(v12) = sub_140001110((__int64)&v26);
  if ( v12 )
  {
    sub_140004C80(v13, "Success\n");
  }
  else
  {
    v14 = GetLastError();
    LODWORD(v16) = sub_140004C80(v15, "Failed");
    LODWORD(v17) = sub_140002580(v16, v14);
    v18 = v17;
    *((_QWORD *)&v28 + 1) = *(_QWORD *)(*(_QWORD *)(*(_DWORD *)(*(_QWORD *)v17 + 4i64) + v17 + 64) + 8i64);
    (*(void (**)(void))(**((_QWORD **)&v28 + 1) + 8i64))();
    LODWORD(v19) = sub_140004AD0(&v28);
    v21 = v19;
    if ( *((_QWORD *)&v28 + 1) )
    {
      LODWORD(v22) = (*(int (**)(void))(**((_QWORD **)&v28 + 1) + 16i64))();
      if ( v22 )
        (**v22)(v22, 1i64);
    }
    LOBYTE(v20) = 10;
    v23 = (*(int (__fastcall **)(__int64, __int64))(*(_QWORD *)v21 + 64i64))(v21, v20);
    sub_1400059E0(v18, v23);
    sub_140004780(v18);
  }
  sub_14000E738();
  return sub_140008C90((unsigned __int64)&v25 ^ v32);
}
First off, we can see variables being declared. The ones we can take interest in to start off would be the following ones:
Code:
HANDLE v0;
PROCESSENTRY32 pe;
If you are a programmer and are familiar with Win32 API structures and functions, and have ever needed to loop through all the running processes or similar on the system for whatever purpose, you’ll know what PROCESSENTRY32 is commonly used for.

Code:
typedef struct tagPROCESSENTRY32 {
  DWORD     dwSize;
  DWORD     cntUsage;
  DWORD     th32ProcessID;
  ULONG_PTR th32DefaultHeapID;
  DWORD     th32ModuleID;
  DWORD     cntThreads;
  DWORD     th32ParentProcessID;
  LONG      pcPriClassBase;
  DWORD     dwFlags;
  TCHAR     szExeFile[MAX_PATH];
} PROCESSENTRY32, *PPROCESSENTRY32;
As you can see from the above structure, a lot of information can be stored (about a process running).
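
The layout of this structure is worth knowing because the decompiled main function sets pe.dwSize to 304 and indexes the entry at &pe + 44. The sketch below mirrors the ANSI PROCESSENTRY32 with fixed-width types so the numbers can be checked without the Windows headers. The struct name ProcessEntry32Mirror64 is hypothetical, and the mirror assumes a 64-bit build (ULONG_PTR is 8 bytes) with MAX_PATH being 260.

Code:
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the ANSI PROCESSENTRY32 as laid out in a 64-bit build.
struct ProcessEntry32Mirror64
{
    std::uint32_t dwSize;
    std::uint32_t cntUsage;
    std::uint32_t th32ProcessID;
    alignas(8) std::uint64_t th32DefaultHeapID; // ULONG_PTR, 8 bytes on x64
    std::uint32_t th32ModuleID;
    std::uint32_t cntThreads;
    std::uint32_t th32ParentProcessID;
    std::int32_t  pcPriClassBase;
    std::uint32_t dwFlags;
    char          szExeFile[260];               // MAX_PATH
};

// 304 is the value assigned to pe.dwSize in the pseudo-code.
static_assert(sizeof(ProcessEntry32Mirror64) == 304,
              "size should match pe.dwSize = 304");

// 44 is the offset used in the expression (_QWORD)&pe + 44.
static_assert(offsetof(ProcessEntry32Mirror64, szExeFile) == 44,
              "szExeFile should start at offset 44");
```

Both checks pass at compile time, which tells us the 304 is simply sizeof(PROCESSENTRY32) and that &pe + 44 is the address of szExeFile (the executable name of the process entry).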


If we keep looking at the mainfunc function, we will see one of the very first things it does is call a function referencing “notepad.exe”.



Code:
sub_14000A640((__m128i *)&v26, (const __m128i *)"notepad.exe", 0xBui64);
Notice the & character before the v26 variable is referenced? & is the address-of operator: the function receives a pointer to v26 rather than a copy of its contents, which means the function can write into v26. This implies that v26 will receive a value and it is somehow linked to “notepad.exe”. Let's keep following the function.





Code:
 pe.dwSize = 304;
 v0 = CreateToolhelp32Snapshot(2u, 0);
Earlier a variable called ‘pe’ (data-type PROCESSENTRY32) was declared and there was also a v0 variable (data-type HANDLE). The above pseudo-code shows us that a function called CreateToolhelp32Snapshot is going to be called.

Code:
HANDLE WINAPI CreateToolhelp32Snapshot(
  _In_ DWORD dwFlags,
  _In_ DWORD th32ProcessID
);
In the pseudo-code the value 2u is being passed as the dwFlags parameter (the u suffix just marks the constant as unsigned, so this is the same as 0x00000002). Let’s go to the MSDN documentation and check what can be passed as the dwFlags parameter: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682489(v=vs.85).aspx


TH32CS_SNAPPROCESS
0x00000002

After looking at the above, we can see that the value for TH32CS_SNAPPROCESS is being passed into the function, and the second is NULL (0). This tells us that the sample is creating a snapshot of all the running processes.
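
Decompilers show flag parameters as raw numbers, so it helps to turn them back into names. Below is a small sketch in portable C++ which does this for the dwFlags parameter of CreateToolhelp32Snapshot. The constant values are the documented TH32CS_* values; the function name DescribeSnapshotFlags is made up for this example.

Code:
```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Turns a CreateToolhelp32Snapshot dwFlags value into a readable string.
std::string DescribeSnapshotFlags(std::uint32_t flags)
{
    struct Flag { std::uint32_t value; const char* name; };
    static const Flag known[] = {
        {0x00000001u, "TH32CS_SNAPHEAPLIST"},
        {0x00000002u, "TH32CS_SNAPPROCESS"},
        {0x00000004u, "TH32CS_SNAPTHREAD"},
        {0x00000008u, "TH32CS_SNAPMODULE"},
        {0x00000010u, "TH32CS_SNAPMODULE32"},
        {0x80000000u, "TH32CS_INHERIT"},
    };

    std::string out;
    for (const Flag& f : known)
    {
        if ((flags & f.value) != 0)
        {
            if (!out.empty())
                out += " | ";
            out += f.name;
        }
    }
    return out.empty() ? std::string("0") : out;
}
```

DescribeSnapshotFlags(2u) returns "TH32CS_SNAPPROCESS" (process enumeration, as in this sample), whereas a value of 4 would be TH32CS_SNAPTHREAD, which is what you would expect to see in a sample that walks threads for APC injection or thread hijacking.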


Straight after this code has been executed, a loop occurs. Let's take a look at the rest of the entire operation.




Code:
while ( 1 )
  {
    LODWORD(v1) = sub_1400023F0(&v26);
    v2 = pe.szExeFile;
    v3 = v1 - ((_QWORD)&pe + 44);
    do
    {
      v4 = (unsigned __int8)v2[v3];
      v5 = (unsigned __int8)*v2 - v4;
      if ( (unsigned __int8)*v2 != v4 )
        break;
      ++v2;
    }
    while ( v4 );
    if ( !v5 )
      break;
    if ( !Process32Next(v0, &pe) )
    {
      CloseHandle(v0);
      if ( *((_QWORD *)&v27 + 1) >= 0x10ui64 )
      {
        v6 = v26;
        if ( (unsigned __int64)(*((_QWORD *)&v27 + 1) + 1i64) >= 0x1000 )
        {
          if ( v26 & 0x1F || (v7 = *(_QWORD *)(v26 - 8), v7 >= (unsigned __int64)v26) || (v6 = v26 - v7 - 8, v6 > 0x1F) )
          {
            sub_14000E984(v6);
            __debugbreak();
          }
          else
          {
            v6 = *(_QWORD *)(v26 - 8);
          }
        }
        sub_14000901C(v6);
      }
      v8 = 0;
      goto LABEL_24;
    }
  }
  CloseHandle(v0);
  v8 = pe.th32ProcessID;
  if ( *((_QWORD *)&v27 + 1) >= 0x10ui64 )
  {
    v9 = v26;
    if ( (unsigned __int64)(*((_QWORD *)&v27 + 1) + 1i64) >= 0x1000 )
    {
      if ( v26 & 0x1F || (v10 = *(_QWORD *)(v26 - 8), v10 >= (unsigned __int64)v26) || (v9 = v26 - v10 - 8, v9 > 0x1F) )
      {
        sub_14000E984(v9);
        __debugbreak();
      }
      else
      {
        v9 = *(_QWORD *)(v26 - 8);
      }
    }
    sub_14000901C(v9);
  }
To cut it short, the sample walks through the snapshot with Process32Next, which fills the PROCESSENTRY32 structure (pe) with one process entry per call. The inner do/while loop is a byte-by-byte string comparison between pe.szExeFile and the string held by v26, which most likely holds “notepad.exe”. When the comparison finds a match, v8 is assigned the value of pe.th32ProcessID, so my guess is that the code tries to find the PID of notepad.exe (which would be useless on a normal machine as no one usually leaves notepad.exe running and the sample does not spawn it itself). If the end of the snapshot is reached without a match, v8 is set to 0 instead.
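
The inner do/while loop in the pseudo-code is easier to follow once it is written out by hand. The sketch below is my readable rewrite of it in portable C++ (the function name CompareLikeDecompiled is made up, and an index is used instead of the pointer-difference trick v2[v3] the compiler generated). The comments map the locals back to the decompiler's v4 and v5.

Code:
```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Walks both strings in lock-step and returns 0 when they are identical,
// or the difference between the first pair of mismatching bytes otherwise.
int CompareLikeDecompiled(const char* exeName, const char* target)
{
    std::size_t i = 0;
    int diff = 0;              // v5 in the pseudo-code
    unsigned char t = 0;       // v4 in the pseudo-code
    do
    {
        t = static_cast<unsigned char>(target[i]);
        diff = static_cast<unsigned char>(exeName[i]) - t;
        if (diff != 0)
            break;             // mismatch, stop early
        ++i;
    }
    while (t != 0);            // stop after the terminating zero was compared
    return diff;
}
```

This behaves like strcmp, so the comparison is case-sensitive: an entry named “Notepad.exe” would not match “notepad.exe”. A result of 0 corresponds to the `if ( !v5 ) break;` line which leaves the enumeration loop.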

It also appears that the code for this process enumeration operation belonged within another function, but the compiler pushed it together (most likely because of optimization features being enabled).

In C++ you could expect something like this:
Code:
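DWORD dwRetProcessId(std::string processname)
{
    HANDLE hProcess;
    PROCESSENTRY32 processEntry32;
    processEntry32.dwSize = sizeof(PROCESSENTRY32);
    hProcess = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
    // note: like the sample (Process32First is absent from its imports), this never
    // calls Process32First, so the first comparison reads an unfilled szExeFile buffer
    do {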
        if (!strcmp(processEntry32.szExeFile, processname.c_str()))
        {
            CloseHandle(hProcess);
            return processEntry32.th32ProcessID;
        }
    } while (Process32Next(hProcess, &processEntry32));
    CloseHandle(hProcess);
    return 0;
}
After this operation of searching through processes has finished, another function is called whilst passing the value of v8 as the parameter - v8 now holds the process ID of what was being searched for (notepad.exe in this case).



Code:
LODWORD(v11) = sub_140005D30(v8);
Let's go to the function being called with v8 passed in.






Code:
int __fastcall sub_140005D30(unsigned int a1)
{
  __int64 v1; // rbx@1
  HMODULE v2; // rax@1
  FARPROC v3; // rax@1
  int result; // eax@4
  __int64 v5; // [sp+0h] [bp-78h]@4
  __int64 v6; // [sp+20h] [bp-58h]@3
  __int64 v7; // [sp+28h] [bp-50h]@3
  __int64 v8; // [sp+30h] [bp-48h]@3
  int v9; // [sp+38h] [bp-40h]@3
  __int64 v10; // [sp+40h] [bp-38h]@3
  __int64 v11; // [sp+48h] [bp-30h]@3
  int v12; // [sp+50h] [bp-28h]@3
  __int128 v13; // [sp+58h] [bp-20h]@3
  __int64 v14; // [sp+68h] [bp-10h]@4

  v1 = a1;

  v2 = GetModuleHandleA("ntdll.dll");
  v3 = GetProcAddress(v2, "NtOpenProcess");

  if ( v3
    && (_DWORD)v1
    && (v9 = 48,
        v6 = 0i64,
        v10 = 0i64,
        v12 = 0,
        v11 = 0i64,
        v8 = 0i64,
        _mm_storeu_si128((__m128i *)&v13, 0i64),
        v7 = v1,
        ((int (__fastcall *)(__int64 *, signed __int64, int *, __int64 *))v3)(&v6, 0x2000000i64, &v9, &v7) >= 0) )
  {
    result = sub_140008C90((unsigned __int64)&v5 ^ v14);
  }
  else
  {
    result = sub_140008C90((unsigned __int64)&v5 ^ v14);
  }
  return result;
}
We can see that ntdll.dll and NtOpenProcess are being referenced. The variable v2 has a data-type of HMODULE and receives the return value of GetModuleHandleA, and then afterwards the function GetProcAddress is called (GetProcAddress is part of the Win32 API and is used to retrieve the address of a function exported by a loaded module). Afterwards, the address is used for a function call.

To cut it short, this function takes in a parameter which is for the PID of the target process and then it will call ntdll!NtOpenProcess to get a HANDLE to that process, and then it will return the process handle.
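
Two of the constants in that call can be decoded without running anything. The sketch below uses fixed-width mirrors of the native structures (the names ObjectAttributesMirror64 and ClientIdMirror64 are hypothetical and assume a 64-bit build) to show that v9 = 48 matches the Length field of an OBJECT_ATTRIBUTES structure, and that 0x2000000 is the MAXIMUM_ALLOWED access mask.

Code:
```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit mirror of CLIENT_ID (two HANDLE-sized members).
struct ClientIdMirror64
{
    std::uint64_t UniqueProcess; // v7 in the pseudo-code (the PID)
    std::uint64_t UniqueThread;  // v8 in the pseudo-code (zero)
};

// Hypothetical 64-bit mirror of OBJECT_ATTRIBUTES.
struct ObjectAttributesMirror64
{
    std::uint32_t Length;                          // v9 = 48
    alignas(8) std::uint64_t RootDirectory;        // HANDLE
    std::uint64_t ObjectName;                      // PUNICODE_STRING
    std::uint32_t Attributes;
    alignas(8) std::uint64_t SecurityDescriptor;   // PVOID
    std::uint64_t SecurityQualityOfService;        // PVOID
};

static_assert(sizeof(ClientIdMirror64) == 16, "CLIENT_ID is 16 bytes on x64");
static_assert(sizeof(ObjectAttributesMirror64) == 48,
              "matches v9 = 48 in the pseudo-code");

// MAXIMUM_ALLOWED as defined in winnt.h.
constexpr std::uint32_t kMaximumAllowed = 0x02000000u;

// True when the requested access mask includes MAXIMUM_ALLOWED.
constexpr bool RequestsMaximumAllowed(std::uint32_t desiredAccess)
{
    return (desiredAccess & kMaximumAllowed) != 0;
}
```

So the four arguments line up with NtOpenProcess(ProcessHandle, DesiredAccess, ObjectAttributes, ClientId): &v6 receives the handle, 0x2000000 asks for the maximum access the caller is allowed, &v9 is the zeroed OBJECT_ATTRIBUTES and &v7 is the CLIENT_ID holding the PID.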

Going back to the mainfunc function, we can see the “C:\\opcode.dll” DLL path being put into a variable.




Code:
v29 = "C:\\opcode.dll";
However before that we can find:
Code:
 *(_QWORD *)&v28 = v11;
  *((_QWORD *)&v28 + 1) = 0x100000000i64;
It appears that there is a structure and values are being put into the structure, however we do not have access to the structure contents. The return values from various operations are all being grouped together for usage of an operation.
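
The 0x100000000 value looks odd until you remember that the decompiler merged two neighbouring 32-bit fields into one 64-bit store. Here is a tiny sketch in portable C++ which splits it back up (SplitQword and SplitLittleEndianQword are names I made up). How the two halves are used is my inference from the injection routine, which reads DWORD2(v27) and DWORD3(v27), so take the field meanings as an assumption.

Code:
```cpp
#include <cassert>
#include <cstdint>

// The two 32-bit values which share one 64-bit slot on a little-endian CPU.
struct SplitQword
{
    std::uint32_t low;   // stored at the lower address (offset +8 in v28)
    std::uint32_t high;  // stored at the higher address (offset +12 in v28)
};

// Splits a 64-bit store into the two DWORD fields it actually writes.
constexpr SplitQword SplitLittleEndianQword(std::uint64_t value)
{
    return { static_cast<std::uint32_t>(value & 0xFFFFFFFFu),
             static_cast<std::uint32_t>(value >> 32) };
}
```

SplitLittleEndianQword(0x100000000) gives low = 0 and high = 1. A plausible reading is that the structure holds the process handle, then a 32-bit process ID (0 here, since a handle is already supplied), then a 32-bit switch set to 1, followed by the DLL path pointer.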




Code:
 LODWORD(v12) = sub_140001110((__int64)&v26);
  if ( v12 )
  {
    sub_140004C80(v13, (__int64)"Success\n");
  }
A function is being called and its return value is going into v12. Depending on this return value, the string “Success\n” is used in another function call. The “\n” suggests output via std::cout (cout is the standard output stream object declared in the iostream header in C++), although “\n” isn’t always present in such calls. If v12 is zero then execution lands at the else { } of the if statement, where it puts the value from a call to GetLastError() into a variable and references “Failed”.

We will look at the function being called which uses the contents of v26.







Code:
int __fastcall sub_140001110(__int64 a1)
{
  __int128 v1; // xmm1@1
  SIZE_T v2; // rax@1
  void *v3; // rbx@1
  LPCVOID v4; // rdi@1
  void *v5; // rax@6
  HMODULE v6; // rax@7
  FARPROC v7; // rax@7
  DWORD *v8; // r15@8
  HMODULE v9; // rax@8
  HMODULE v10; // rax@8
  int (__fastcall *v11)(void *, LPVOID, LPCVOID, _QWORD); // rsi@8
  HMODULE v12; // rax@8
  FARPROC v13; // r12@8
  __int64 *v14; // rcx@8
  DWORD (__stdcall *v16)(LPVOID); // rsi@18
  void *v17; // rax@19
  __int64 v18; // [sp+0h] [bp-61h]@17
  DWORD flProtect[2]; // [sp+20h] [bp-41h]@11
  DWORD dwCreationFlags[2]; // [sp+28h] [bp-39h]@11
  LPDWORD lpThreadId; // [sp+30h] [bp-31h]@13
  LPVOID lpAddress; // [sp+38h] [bp-29h]@1
  SIZE_T dwSize; // [sp+40h] [bp-21h]@3
  char *v24; // [sp+48h] [bp-19h]@1
  SIZE_T NumberOfBytesWritten; // [sp+50h] [bp-11h]@3
  char v26; // [sp+58h] [bp-9h]@13
  __int128 v27; // [sp+68h] [bp+7h]@1
  LPCVOID lpBuffer; // [sp+78h] [bp+17h]@1
  FARPROC v29; // [sp+80h] [bp+1Fh]@8
  __int64 v30; // [sp+88h] [bp+27h]@10

  v1 = *(_OWORD *)a1;
  v2 = -1i64;
  lpBuffer = *(LPCVOID *)(a1 + 16);
  v3 = (void *)v1;
  v4 = lpBuffer;
  v27 = v1;
  lpAddress = 0i64;
  v24 = 0i64;
  do
    ++v2;
  while ( *((_BYTE *)lpBuffer + v2) );
  dwSize = v2;
  NumberOfBytesWritten = 0i64;
  if ( !(_QWORD)v1 )
  {
    if ( !DWORD2(v27) )
      return sub_140008C90((unsigned __int64)&v18 ^ v30);
    LODWORD(v5) = sub_140005D30(DWORD2(v27));
    v3 = v5;
    if ( !v5 )
      return sub_140008C90((unsigned __int64)&v18 ^ v30);
  }
  v6 = GetModuleHandleA("kernel32.dll");
  v7 = GetProcAddress(v6, "LoadLibraryA");
  if ( !DWORD3(v27) )
  {
    v16 = (DWORD (__stdcall *)(LPVOID))v7;
    if ( v7 )
    {
      v17 = VirtualAllocEx(v3, 0i64, dwSize, 0x3000u, 4u);
      lpAddress = v17;
      if ( !v17 )
      {
LABEL_16:
        CloseHandle(v3);
        return sub_140008C90((unsigned __int64)&v18 ^ v30);
      }
      if ( WriteProcessMemory(v3, v17, lpBuffer, dwSize, &NumberOfBytesWritten) )
      {
        v24 = (char *)CreateRemoteThread(v3, 0i64, 0i64, v16, lpAddress, 0, 0i64);
        if ( v24 )
        {
LABEL_22:
          if ( lpAddress )
            VirtualFreeEx(v3, lpAddress, dwSize, 0x8000u);
          CloseHandle(v3);
          return sub_140008C90((unsigned __int64)&v18 ^ v30);
        }
      }
    }
LABEL_14:
    if ( lpAddress )
      VirtualFreeEx(v3, lpAddress, dwSize, 0x8000u);
    goto LABEL_16;
  }
  *(_QWORD *)&v27 = v7;
  v8 = (DWORD *)v7;
  v9 = GetModuleHandleA("ntdll.dll");
  *((_QWORD *)&v27 + 1) = GetProcAddress(v9, "NtAllocateVirtualMemory");
  v10 = GetModuleHandleA("ntdll.dll");
  lpBuffer = GetProcAddress(v10, "NtWriteVirtualMemory");
  v11 = (int (__fastcall *)(void *, LPVOID, LPCVOID, _QWORD))lpBuffer;
  v12 = GetModuleHandleA("ntdll.dll");
  v13 = GetProcAddress(v12, "RtlCreateUserThread");
  v29 = v13;
  v14 = (__int64 *)&v27;
  while ( *v14 )
  {
    ++v14;
    if ( v14 == &v30 )
    {
      dwCreationFlags[0] = 4;
      flProtect[0] = 12288;
      if ( (*((int (__fastcall **)(_QWORD, _QWORD, _QWORD, _QWORD))&v27 + 1))(v3, &lpAddress, 0i64, &dwSize) >= 0 )
      {
        *(_QWORD *)flProtect = 0i64;
        if ( v11(v3, lpAddress, v4, (unsigned int)dwSize) >= 0 )
        {
          v24 = &v26;
          dwSize = (SIZE_T)&v24;
          lpThreadId = v8;
          *(_QWORD *)dwCreationFlags = 0i64;
          *(_QWORD *)flProtect = 0i64;
          if ( ((int (__fastcall *)(void *, _QWORD, _QWORD, _QWORD))v13)(v3, 0i64, 0i64, 0i64) >= 0 )
            goto LABEL_22;
        }
      }
      goto LABEL_14;
    }
  }
  return sub_140008C90((unsigned __int64)&v18 ^ v30);
}
The above function must be the injection routine (responsible for injecting the DLL). We can see that functions such as VirtualAllocEx, WriteProcessMemory and CreateRemoteThread are called. However, we can also see an alternate route: the addresses of NTAPI functions (NtAllocateVirtualMemory, NtWriteVirtualMemory and RtlCreateUserThread) are retrieved via GetProcAddress, and then the variables containing the addresses for said functions are used for the calls.
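
The numeric arguments in the routine (0x3000u, 4u, 0x8000u and the decimal 12288) become much clearer once they are mapped back to their documented names. The sketch below does that in portable C++; the constant values are the documented MEM_* and PAGE_* values, while the function names DescribeAllocationType and DescribeProtection are made up for this example.

Code:
```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decodes the allocation/free type bits used by VirtualAllocEx/VirtualFreeEx.
std::string DescribeAllocationType(std::uint32_t value)
{
    struct Flag { std::uint32_t bit; const char* name; };
    static const Flag known[] = {
        {0x1000u, "MEM_COMMIT"},
        {0x2000u, "MEM_RESERVE"},
        {0x4000u, "MEM_DECOMMIT"},
        {0x8000u, "MEM_RELEASE"},
    };

    std::string out;
    for (const Flag& f : known)
    {
        if ((value & f.bit) != 0)
        {
            if (!out.empty())
                out += " | ";
            out += f.name;
        }
    }
    return out.empty() ? std::string("0") : out;
}

// Decodes the basic page protection constants (modifier bits are ignored here).
std::string DescribeProtection(std::uint32_t value)
{
    switch (value)
    {
    case 0x01u: return "PAGE_NOACCESS";
    case 0x02u: return "PAGE_READONLY";
    case 0x04u: return "PAGE_READWRITE";
    case 0x10u: return "PAGE_EXECUTE";
    case 0x20u: return "PAGE_EXECUTE_READ";
    case 0x40u: return "PAGE_EXECUTE_READWRITE";
    default:    return "unknown";
    }
}
```

With this, VirtualAllocEx(v3, 0, dwSize, 0x3000u, 4u) reads as MEM_COMMIT | MEM_RESERVE with PAGE_READWRITE, which makes sense as the buffer only stores the DLL path and never has to be executable. Shell-code injection would typically show one of the PAGE_EXECUTE_* values instead.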





It is hard to make much sense of what is going on because it looks complicated. You can rename variables and functions to make it easier to read, but it can still be quite difficult. However, the pseudo-code gives us an idea of what is going on, how the sample works and what behaviour we should expect when performing dynamic analysis.


Dynamic analysis
We know that a common but simple method of DLL injection will rely on VirtualAllocEx, WriteProcessMemory and CreateRemoteThread. We also know that common code-cave injection uses the same functions but for a different purpose, and that shell-code injection still relies on getting its code executed somehow (e.g. thread hijacking for shell-code execution works differently, but typical shell-code injection can rely on a new thread). We even know that you can use an existing thread to execute a function via APC!

VirtualAllocEx (KERNEL32) -> NtAllocateVirtualMemory (NTDLL)
WriteProcessMemory (KERNEL32) -> NtWriteVirtualMemory (NTDLL)
CreateRemoteThread (KERNEL32) -> CreateRemoteThreadEx (KERNELBASE) -> NtCreateThreadEx (NTDLL)
RtlCreateUserThread (NTDLL) -> NtCreateThreadEx (NTDLL)
QueueUserAPC (KERNEL32) -> NtQueueApcThread (NTDLL)

We know that the injection techniques discussed in this article rely on a handle to a process and/or a thread, so there are other functions we should be interested in:

OpenProcess (KERNEL32) -> NtOpenProcess (NTDLL)
OpenThread (KERNEL32) -> NtOpenThread (NTDLL)
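
When reading through an API Monitor log it can help to have the pairs above in a form you can query. The following is only a minimal sketch in C: the table restates the KERNEL32 -> NTDLL pairs listed above (keeping just the final Nt* routine), and the names api_pair and native_name_for are made up for illustration, they are not part of any Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per pair listed above: the documented KERNEL32 export and the
   NTDLL routine the call ends up in. */
struct api_pair {
    const char *win32_name;
    const char *native_name;
};

static const struct api_pair k_api_pairs[] = {
    { "VirtualAllocEx",     "NtAllocateVirtualMemory" },
    { "WriteProcessMemory", "NtWriteVirtualMemory"    },
    { "CreateRemoteThread", "NtCreateThreadEx"        },
    { "QueueUserAPC",       "NtQueueApcThread"        },
    { "OpenProcess",        "NtOpenProcess"           },
    { "OpenThread",         "NtOpenThread"            },
};

/* Returns the NTDLL name for a KERNEL32 name, or NULL when the name is not
   in the table (hypothetical helper for reading logs). */
const char *native_name_for(const char *win32_name)
{
    size_t count = sizeof(k_api_pairs) / sizeof(k_api_pairs[0]);

    assert(win32_name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(k_api_pairs[i].win32_name, win32_name) == 0)
            return k_api_pairs[i].native_name;
    }
    return NULL;
}
```

If a sample imports WriteProcessMemory, native_name_for("WriteProcessMemory") tells you which NTDLL entry to tick in the monitor so that direct NTAPI calls are caught as well.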


I’ll be using API Monitor to monitor the API calls. You can find out more information about it here: API Monitor: Spy on API Calls and COM Interfaces (Freeware 32-bit and 64-bit Versions!) | rohitab.com





I have set a break-point (Before call) for RtlCreateUserThread, however I’ve marked NtAllocateVirtualMemory and NtWriteVirtualMemory to be logged on the call list. I already ran notepad.exe so the sample would be able to find the process. You can see here that there is no loaded module prior to the injection operation.












After starting to monitor the program, the injection attempt was so quick that the break-point for RtlCreateUserThread (used for the injection) was hit almost immediately. By the time this break-point was hit, the process had already opened a handle to notepad.exe, allocated memory within the process and filled that allocated memory with data.

The allocation operations (NtAllocateVirtualMemory) are not all related to the injection routine; two of them were performed by kernel32.dll, related to the program itself. The third NtAllocateVirtualMemory API call which is logged was related to the injection routine, and the fourth entry for NtWriteVirtualMemory was also related to the injection routine.




After the memory allocation/write operations had completed, the remote thread creation was attempted.







I let the call pass through and then checked the loaded modules within notepad.exe. As predicted, the injection operation had been successfully completed. You can see opcode.dll listed under the modules (it is in the Process Environment Block for the modules list so Process Hacker can find it there).




The only problem is that we never got to know the PID of the target process being attacked by the injection routine. We get the HANDLE parameters in the API call logs, but a handle value does not tell us the PID. The solution is to also log NtOpenProcess, which lets you check the CLIENT_ID parameter: it contains a member called UniqueProcess, which is declared as a HANDLE (a pointer-sized value) but actually holds the PID.
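
To show what "a pointer-sized value which holds the PID" means in practice, here is a minimal sketch in C. The struct is a portable stand-in for the native CLIENT_ID (so the sketch compiles anywhere), and client_id_t, pid_from_client_id and pid_from_logged_value are names made up for illustration. It also assumes the logging tool prints the member as a hexadecimal pointer value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Portable stand-in for the native CLIENT_ID structure: both members are
   pointer-sized, even though they carry IDs rather than addresses. */
typedef struct {
    void *UniqueProcess; /* holds the process ID */
    void *UniqueThread;  /* holds the thread ID (usually 0 when opening a process) */
} client_id_t;

/* Reinterprets the pointer-sized member as a numeric PID. */
uint32_t pid_from_client_id(const client_id_t *cid)
{
    assert(cid != NULL);
    return (uint32_t)(uintptr_t)cid->UniqueProcess;
}

/* Converts a value printed in hex by a logging tool (e.g. "0x00000a4c")
   into the decimal PID you would see in a process list. */
uint32_t pid_from_logged_value(const char *hex_text)
{
    assert(hex_text != NULL);
    return (uint32_t)strtoul(hex_text, NULL, 16);
}
```

For example, a logged UniqueProcess of 0x00000a4c corresponds to PID 2636, which you can then match against the process list in Process Hacker.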












Under real circumstances where you do not know what the sample will be doing, it can be very appropriate to rely on debugging. Two very popular debuggers are WinDbg and OllyDbg, and you can even debug within IDA Pro (which I am a fan of). However, monitoring API calls with API Monitor is fine as well - just make sure to watch out for forced crash attempts and similar, and log things as you go along in case the monitored malware does something to crash the system or make you lose the results.


Thank you for reading and hopefully this helped you,
- Opcode
 

Last edited by a moderator:

Andy Ful

Level 30
Content Creator
Verified
Joined
Dec 23, 2014
Messages
1,957
OS
Windows 10
Antivirus
Microsoft
#3
Very nice.:)
I used API Monitor to identify calls into Safer Apis, related to Software Restriction Policies. It is a very good piece of software.
Maybe you will find some time to write something about code injection mitigations in Windows 10.
 
D

Deleted member 65228

Guest
#4
8. Perform a call to kernel32!QueueUserAPC to attempt the injection (lands at ntdll!NtQueueApcThread)
You can use NtQueueApcThread (NTDLL) with ease; it is hardly any trickier than using QueueUserAPC (KERNEL32).

Code:
typedef NTSTATUS(NTAPI *pNtQueueApcThread)(HANDLE ThreadHandle, PIO_APC_ROUTINE ApcRoutine, PVOID ApcRoutineContext OPTIONAL, PIO_STATUS_BLOCK ApcStatusBlock OPTIONAL, ULONG ApcReserved OPTIONAL);
To set up the function for usage (dynamic import):
Code:
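// declared outside of the check so the pointer is still in scope for the call later
pNtQueueApcThread fNtQueueApcThread = NULL;
HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
if (hNtdll)
{
    // stays NULL if the export cannot be found
    fNtQueueApcThread = (pNtQueueApcThread)GetProcAddress(hNtdll, "NtQueueApcThread");
}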
To call the function:
Code:
fNtQueueApcThread(THREADHANDLE, (PIO_APC_ROUTINE)TARGETADDRESS, ARGUMENTADDRESS, NULL, 0);
Parameters explained:
THREADHANDLE -> HANDLE to the thread you are targeting.
TARGETADDRESS -> address of the function you want the target thread to call (e.g. LoadLibraryA)
ARGUMENTADDRESS -> address of the parameter for the function at the TARGETADDRESS (put NULL if you don't have one), e.g. where you allocated the memory for the DLL path storage.

You can make use of NtSuspendThread/NtAlertResumeThread to enhance it. You suspend the thread before performing the APC attempt, and then you call NtAlertResumeThread so you resume the thread but also "wake" it up for the APC queue.

Maybe you will find some time to write something about code injection mitigations in Windows 10.
Prior to Windows 10 there were already implementations for process protection embedded within the Windows Kernel. One example is the function PspOpenProcess (it isn't exported by ntoskrnl.exe, so to use it/hook it you'd need to find its address manually), which blocked the call depending on the protected process flag within the EPROCESS structure passed to the function and the requested access rights. Since Windows 8.1 there has also been "Protected Processes Light" (PPL), which has different protection levels for stricter protection and is another method of blocking code injection attacks. All of this is exclusive to Windows protected processes such as csrss.exe and smss.exe. PPL is the exception, but Microsoft would need to grant you permission to use it, unless you are already in kernel-mode and enable it manually (which can be done without triggering anything like a BugCheck BSOD since it isn't part of PatchGuard AFAIK).

In Windows 10, built-in software such as Microsoft Edge does have mitigations against attacks like code injection. If you try to inject a DLL normally it will fail, however manual map injection should not fail. I've just done a check and memory allocation/write attacks are successful, even thread creation and APC targeting handles within MicrosoftEdge.exe/MicrosoftEdgeCP.exe (processes) are successful, however you cannot just target LoadLibraryA/W for it. Since memory allocation/write attempts work, you can manually copy the target DLL over to the address space of MicrosoftEdge.exe/MicrosoftEdgeCP.exe and then you can inject your own stub code to resolve imports and call the DLL main entry point. That should work fine.

Regarding Microsoft Edge, it actually has a Windows module called apphelp.dll loaded. Therefore if you do inject into it, be cautious with how you retrieve addresses... Don't rely on GetProcAddress to retrieve function addresses; scan the Export Address Table instead. The reason is that apphelp.dll is linked to the Shim engine which was implemented in Windows 8, and this can cause spoofed address returns when you are trying to get an address from a module like ntdll.dll, depending on the circumstances. I encountered this before after injecting code into a running program to detour a few APIs, but getting the address from the Export Address Table will resolve the problem.

I took a look at it awhile ago, but here are a few images which can imply things regarding hooks. If you inspect apphelp.dll, shimeng.dll you can find some information about it. Here are just some quick screenshots from quick inspection, not proper analysis.









As always (since Vista) you can use kernel-mode callbacks which are of course still supported on Windows 10 systems:
- ObRegisterCallbacks -> Block handle creation for the process itself and its threads depending on the requested access rights (strip the access rights).
- PsSetLoadImageNotifyRoutine -> Notification for when an image is loaded in memory. You could filter to detect when a new module is loaded within your own process and then take any necessary steps to remove the injected module if you can.
- PsSetCreateThreadNotifyRoutine -> Notification for thread creation so you could filter for your own process and then sort it out if the thread creation is not permitted.
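
To illustrate what "strip the access rights" means for ObRegisterCallbacks, here is a minimal sketch of just the masking step in C. The PROCESS_* values are the ones from the Windows SDK, but they are redefined here so the sketch compiles outside of a driver project; strip_injection_rights and INJECTION_RIGHTS are hypothetical names. In a real driver this logic would sit in the pre-operation callback and be applied to the DesiredAccess of the handle being created, and which rights you strip is a design decision, not something fixed by Windows.

```c
#include <stdint.h>

/* Process access rights (same values as in the Windows SDK headers). */
#define PROCESS_TERMINATE                 0x0001u
#define PROCESS_CREATE_THREAD             0x0002u
#define PROCESS_VM_OPERATION              0x0008u
#define PROCESS_VM_READ                   0x0010u
#define PROCESS_VM_WRITE                  0x0020u
#define PROCESS_QUERY_LIMITED_INFORMATION 0x1000u

/* Rights needed for the classic allocate / write / create-thread sequence.
   This selection is an assumption made for the example. */
#define INJECTION_RIGHTS \
    (PROCESS_CREATE_THREAD | PROCESS_VM_OPERATION | PROCESS_VM_WRITE)

/* Returns the requested access mask with the injection-related bits removed;
   every other requested right is left untouched. */
uint32_t strip_injection_rights(uint32_t desired_access)
{
    return desired_access & ~INJECTION_RIGHTS;
}
```

The caller still gets a handle, it is simply too weak for VirtualAllocEx, WriteProcessMemory or CreateRemoteThread to succeed with it, which is usually less disruptive than denying the open outright.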

You can block code injection attacks entirely from user-mode as well though, if it is essential. Here are some examples:
- Local hook on ntdll.dll!LdrLoadDll. This would allow you to block unwanted modules from being loaded in your process... Hooking ntdll.dll!LdrpLoadDll locally would be more secure. This would mitigate attacks which rely on a remote thread to call LoadLibraryA/W.

- Local hook on ntdll.dll!LdrInitializeThunk. This would allow you to block unwanted thread creation within your own process, so if another process creates a remote thread in your process the hook callback will be triggered.

- Local hook on ntdll.dll!KiUserApcDispatcher. This would allow you to block unwanted Asynchronous Procedure Calls targeting threads within your own process.
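
As a rough idea of the decision a local LdrLoadDll hook callback could make, here is a minimal sketch of an allow-list check in C. It only covers the filtering logic, not the hook itself. The real routine receives the module name as a UNICODE_STRING; plain char strings are used here to keep the sketch portable, and the allow-list contents and function names are made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list; a real one would be built for the protected program. */
static const char *const k_allowed_modules[] = {
    "ntdll.dll", "kernel32.dll", "user32.dll",
};

/* Case-insensitive comparison, since module names are not case-sensitive. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b; /* both strings must end at the same position */
}

/* Returns 1 when the file name part of the path is on the allow-list. */
int is_module_allowed(const char *path)
{
    size_t count = sizeof(k_allowed_modules) / sizeof(k_allowed_modules[0]);
    const char *base;

    assert(path != NULL);
    base = strrchr(path, '\\');
    base = base ? base + 1 : path; /* skip the directory part, if any */

    for (size_t i = 0; i < count; i++) {
        if (equals_ignore_case(base, k_allowed_modules[i]))
            return 1;
    }
    return 0;
}
```

Checking only the file name is weak on its own (a DLL with an allowed name can sit in any folder), so a stricter version would compare the full path or verify the file's signature as well.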






Using kernel-mode would be safer though from a security point-of-view, and the Microsoft protected process implementations are safe IMO - you can actually remove the PPL feature from a process which has it but you'd need to be in kernel-mode to do it anyway, and of course if malware is there then it is already game over.

Hope this helped,
- Opcode.
 
D

Deleted member 65228

Guest
#5
To name a few examples of malware which used code injection for whichever purpose: Zeus (banking malware); Carberp (banking malware); Kronos (banking malware); SpyEye (banking malware)
Banking malware tends to inject code into the browser processes so it can detour APIs exported by modules such as nss3.dll (old Google Chrome versions, current Firefox versions) and wininet.dll (Internet Explorer & Microsoft Edge), while current Google Chrome versions rely on SSL functions.










A complex banking Trojan could rely on shell-code injection for both the API hook operation and the callback routines (memory allocation, memory write and thread creation/APC to use an existing thread). A less complex banking Trojan could rely on standard DLL injection attacks, however in this case browsers such as Microsoft Edge would be protected (because you need to do more than the basic common methods to successfully inject, such as using manual mapping for a DLL instead).

Form-grabbing/WebInject can be really dangerous since the last thing you want is to have your login credentials stolen in real-time and find out the next morning that your bank account has been emptied out. Form-grabbing functionality will target functions used for sending the requests performed by the browser, whereas WebInject functionality will target functions called by the browser whilst receiving data after it has been decrypted (but before it is presented to the user).

Win32 networking functions such as WSASend and WSARecv can be intercepted as well.

It is great to see browsers such as Microsoft Edge trying to tackle code injection attacks because really preventing code injection is a key part in mitigating banking malware. I can't say the same about Firefox though... Yet.

It would be a wise decision to use a tool to protect the memory of your web browsers, or run them within a contained environment (e.g. Avast have a secure browser which can be used). This would help mitigate code injection attacks to your web-browser, thus helping stop form-grabbing/WebInject attacks should you ever become infected and not be aware!
 

Andy Ful

#6
...
It would be a wise decision to use a tool to protect the memory of your web browsers, or run them within a contained environment (e.g. Avast have a secure browser which can be used). This would help mitigate code injection attacks to your web-browser, thus helping stop form-grabbing/WebInject attacks should you ever become infected and not be aware!
...
I think that something like Excubits MemProtect could be adopted to mitigate code injection attacks on web browsers.
 

Umbra

Level 85
Content Creator
Verified
Joined
May 16, 2011
Messages
18,417
OS
Windows 10
Antivirus
Default-Deny
#8
Hi

Thanks for the write-up. It's too difficult for me to comprehend.

As an end user I would like to know what software can protect me from the different types of code injection techniques?
AVs with Behavior Blockers or HIPS. Anti-exe, Anti-exploits, and some SRPs.
 
Last edited:
D

Deleted member 65228

Guest
#9
As an end user I would like to know what software can protect me from the different types of code injection techniques?
The short answer would be to use software which has BB/HIPS and supports preventing code injection attacks, but you will never be "100% secure" from code injection attacks altogether. Why? New techniques are invented all the time, for example via new exploitation methods (look at Atom Bombing from last year), and if malware is already embedded on the system with high privileges (e.g. kernel-mode) then it can bypass whatever you have running to protect you (which means it is already game over and time for you to format and reinstall the OS).

However, here are examples of security software which can be beneficial:
1. Emsisoft Anti-Malware -> Behavior Blocker
2. Kaspersky Internet Security -> Application Control
3. Comodo Sandbox / Sandboxie
4. Excubits MemProtect (protect the processes for things like the web browser)
5. HitmanPro.Alert -> Detects a compromised browser
6. AppGuard -> SRP

If you use User Account Control properly then you will be better off as well. For malware to inject code into a process running on another user-account (e.g. SYSTEM) it needs to be elevated and to have acquired debugging rights (SeDebugPrivilege). Due to how the Windows Integrity Mechanism works, a standard rights process isn't going to be injecting code into an elevated process. Bypasses for UAC can pop up from time to time but if you keep your OS up-to-date with the latest security patches then you'll get patches for them too.

There is no need to be entirely paranoid about it though. Stick to a neat configuration and apply your own knowledge and you'll be fine hopefully. Personally if I had to decide between the above products I would go for either EAM, KIS or WD + HMP.A whilst keeping UAC and Smart Screen enabled. Just my two cents. :)
 

Umbra

#10
However, here are examples of security software which can be beneficial
1. Emsisoft Anti-Malware -> Behavior Blocker
2. Kaspersky Internet Security -> Application Control
3. Comodo Sandbox / Sandboxie
4. Excubits MemProtect (protect the processes for things like the web browser)
5. HitmanPro.Alert -> Detects a compromised browser
6. AppGuard -> SRP
I'm using 1,3 (equivalent) , 5 and 6 ; i think i'm good :D
 

Sunshine-boy

Level 26
Verified
Joined
Apr 1, 2017
Messages
1,587
OS
Windows 10
Antivirus
ESET
#11
@Opcode
Thanks for sharing your knowledge with us:notworthy:
I appreciate you :giggle:Unfortunately, I cant understand your analysis because it's too complicated for me but I like guys like you pls keep posting:notworthy:
PPl like you can make this forum a better place:)
What about HIPS module in ESET?
Can you test the Eset hips in interactive mode??
 

HarborFront

Level 41
Content Creator
Verified
Joined
Oct 9, 2016
Messages
3,040
#12
The short answer would be to use software which has BB/HIPS and supports preventing code injection attacks, but you will never be "100% secure" from code injection attacks altogether. Why? New techniques are invented all the time (e.g. via new exploitation methods (e.g. look at Atom Bombing from last year)), and if malware is already embedded on the system with high privileges (e.g. kernel-mode) then it can bypass whatever you have running to protect (which means it is already game over and time for you to format and reinstall OS).

However, here are examples of security software which can be beneficial
1. Emsisoft Anti-Malware -> Behavior Blocker
2. Kaspersky Internet Security -> Application Control
3. Comodo Sandbox / Sandboxie
4. Excubits MemProtect (protect the processes for things like the web browser)
5. HitmanPro.Alert -> Detects a compromised browser
6. AppGuard -> SRP

If you use User Account Control properly then you will be better off as well. For malware to inject code into a process running on another user-account (e.g. SYSTEM) it needs to be elevated and to have acquired debugging rights (SeDebugPrivilege). Due to how the Windows Integrity Mechanism works, a standard rights process isn't going to be injecting code into an elevated process. Bypasses for UAC can pop up from time to time but if you keep your OS up-to-date with the latest security patches then you'll get patches for them too.

There is no need to be entirely paranoid about it though. Stick to a neat configuration and apply your own knowledge and you'll be fine hopefully. Personally if I had to decide between the above products I would go for either EAM, KIS or WD + HMP.A whilst keeping UAC and Smart Screen enabled. Just my two cents. :)
Many thanks.

Is it possible to categorize which software for protection against which type of code injection technique(s)? Not possible to install so many, right?

Thanks again
 
D

Deleted member 65228

Guest
#13
What about HIPS module in ESET?
Can you test the Eset hips in interactive mode??
You can configure the ESET HIPS so it will perform better, but I haven't used ESET for a long time (years I think). The best person to ask about configurations would be Umbra, Cruelsister or Lockdown though.

Is it possible to categorize which software for protection against which type of code injection technique(s)? Not possible to install so many, right?
Emsisoft Anti-Malware is sufficient for blocking code injection attacks like DLL injection (standard methods and maybe manual mapping as well), APC injection and code-cave injection.

HitmanPro.Alert might not actually block code injection (hopefully someone who uses it can correct me if I am wrong) but it is capable of identifying a compromised browser (e.g. if banking malware detours APIs exported by modules the browser will use to steal data).

As for Kaspersky Application Control (within the Internet Security) depending on the configuration and the trust level of the malicious process, it won't be able to perform operations for allocating memory/writing to memory within another process, let alone make a remote thread. Kaspersky also have a safe browser for online banking which is isolated so processes running on the host cannot intercept the safe browser at all (e.g. prevent banking malware affecting it).

Sandbox solutions like Comodo Sandbox and Sandboxie will isolate the program and thus, depending on the configuration, it won't be able to attack non-isolated processes at all. Comodo uses virtualisation; I believe Sandboxie does not, but I could be wrong.

I'm using 1,3 (equivalent) , 5 and 6 ; i think i'm good :D
There's more chance of Bill Gates getting infected with malware than you! :D

--------------------------------------------------------------------------------
To add to the list I mentioned above, you could use an Ai system alongside an AV solution or anti-exploit, etc. I don't like doing this but I have noticed that many people tend to do this so I will do a quick test to demonstrate a positive of doing this.
Let's take the source code to the injector demo I wrote. We will pack it with UPX just for demonstration purposes. Now we will see the VoodooAi score:




Now I will take the source code and perform some changes, edit the resource information for the PE after compilation and manually pump the file size up with a HEX editor.

Time to run the sample...




Look at that! No alert will be made with the Auto-Pilot because the sample appeared too clean to the Ai, and even with Smart Mode enabled you can see how low that score is for a sample containing code which can be easily altered for injecting code into the browser or similar (without even requiring administrative rights). The sample will inject a DLL called opcode.dll into another process on the system. This is evidence that packing is a disadvantage!

Since a lot of malware tends to be packed, the entropy calculation for those PE files will be taken into account by Ai systems like VoodooShield which would increase the Ai score. Thus detecting more malicious programs.

The point here isn't that VoodooAi or other Ai systems can be defeated by thinking outside the box. Instead, the point is that since a majority of malware samples do rely on packing, Ai systems are more likely to mark those samples at a suspicious level. If you ever do happen to run a packed malware sample by accident, a big alert with a red background and an Ai score which doesn't look good to your eye could make you think carefully before pressing Allow.

I would have tested with products like Cylance but I don't have access to them. However if you do mix Ai, I recommend you keep UAC (use it properly) and use a normal security solution which possesses dynamic blocking capabilities as well (because, for example, if the sample I wrote to test all of this was real malware being deployed and you had an Auto-Pilot mode enabled on an Ai, then it would just go straight through and there'd be nothing to block operations in real-time!)

Sorry if this bit was a bit off-topic, since we were talking about configurations I just thought to mention mixing Ai could be beneficial in some ways.
 
D

Deleted member 65228

Guest
#15
Wtf hacker:notworthy:
Don't do this...You scared me:notworthy:so AI products are useless against advanced attacks?or what?
No, they aren't useless. I was using it as a demonstration to explain why they can be beneficial.

1. A lot of malware authors seem to think packing will make their malware foolproof (they clearly do not know of Ai/Cloud file reputation and BB/HIPS/Dynamic Heuristics).
2. Ai is more likely to flag a packed PE as suspicious because it would usually take the entropy calculation into account. Entropy is higher for a PE which is packed/obfuscated, and lower for a PE which is not.
3. Due to a lot of samples being packed these days, a lot of the malicious PE (*.exe) files used for Ai training are probably packed, which means the Ai can get better at flagging packed samples on its own (because the scanned PE would be more like the malicious trained files... Low amount of imports, higher entropy, etc.).
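
To make point 2 more concrete, here is a minimal sketch of a byte-entropy calculation in C. It only illustrates the general idea (Shannon entropy in bits per byte, ranging from 0 to 8); it is an assumption that any particular Ai product computes it this way, and the function names are made up. log2 is hand-rolled so the sketch needs nothing beyond the standard headers.

```c
#include <assert.h>
#include <stddef.h>

/* log2(x) for 0 < x <= 1, computed without libm: scale x into [1, 2), then
   extract the fractional binary digits of the result by repeated squaring. */
static double log2_fraction(double x)
{
    double result = 0.0;
    double bit = 0.5;

    while (x < 1.0) {          /* remove the (negative) integer part */
        x *= 2.0;
        result -= 1.0;
    }
    for (int i = 0; i < 40; i++) {
        x *= x;                /* squaring doubles the remaining logarithm */
        if (x >= 2.0) {
            x /= 2.0;
            result += bit;
        }
        bit /= 2.0;
    }
    return result;
}

/* Shannon entropy of a buffer in bits per byte: 0.0 when every byte is the
   same, 8.0 when all 256 byte values are equally frequent. */
double shannon_entropy(const unsigned char *data, size_t length)
{
    size_t counts[256] = {0};
    double entropy = 0.0;

    assert(data != NULL || length == 0);
    if (length == 0)
        return 0.0;

    for (size_t i = 0; i < length; i++)
        counts[data[i]]++;

    for (int value = 0; value < 256; value++) {
        if (counts[value] == 0)
            continue;          /* unused byte values contribute nothing */
        double p = (double)counts[value] / (double)length;
        entropy -= p * log2_fraction(p);
    }
    return entropy;
}
```

Running this per PE section rather than over the whole file tends to be more informative, because a packed section stands out next to low-entropy headers and padding (which is also why pumping the file size up with filler bytes can pull an overall score down).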

If you want my advice if you use an Ai product, I would suggest to keep using it alongside a security solution which has dynamic blocking capabilities just in-case it does fail (it is all about layered protection these days!).
 
D

Deleted member 65228

Guest
#17
Regarding Microsoft Edge, it actually has a Windows module called apphelp.dll loaded. Therefore if you do inject into it, be cautious with how you retrieve addresses... Don't use an address to GetProcAddress to use that function, scan the Export Address Table instead. Reason being is apphelp.dll is linked to the Shim engine which was implemented in Windows 8, and this can cause spoofed address returns when you are trying to get an address from a module like ntdll.dll depending on the circumstances. I encountered this before after injecting code into a running program to detour a few APIs, but getting the address from the Export Address Table will resolve the problem.
kernel32.dll!GetProcAddress will internally call ntdll.dll!LdrGetProcedureAddress, which obtains the address from the Export Address Table. If LdrGetProcedureAddress is hooked, the returned address can be spoofed to point you to a callback function from the author of the hook instead of the real function. The solution is to scan through the exported functions of the target module (*.DLL) yourself to retrieve the real address. This is also useful when performing code-cave injection, where you won't be able to use the functions which are statically linked and present within the Import Address Table by default (you'd cause a crash).

Code:
FARPROC fpFindExportAddress(HMODULE hModule, LPCSTR lpFunctionName)
{
    void *fpFunctionAddress = 0; // used later to hold the address from EAT for the target function

    // get the DOS header from the module image
    IMAGE_DOS_HEADER *ImageDosHeader = (IMAGE_DOS_HEADER*)hModule;

    // check the image dos signature
    if (ImageDosHeader->e_magic == IMAGE_DOS_SIGNATURE)
    {
        // get the image nt headers
        IMAGE_NT_HEADERS *ImageNtHeaders = (IMAGE_NT_HEADERS*)((BYTE*)ImageDosHeader + ImageDosHeader->e_lfanew);

        // check the nt headers signature
        if (ImageNtHeaders->Signature == IMAGE_NT_SIGNATURE)
        {
            // get the image optional headers
            IMAGE_OPTIONAL_HEADER *ImageOptionalHeader = &ImageNtHeaders->OptionalHeader;

            // get the data directory entry which describes the export table
            IMAGE_DATA_DIRECTORY *ImageDataDirectory = &ImageOptionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];

            // a module without an export table has nothing for us to scan
            if (!ImageDataDirectory->VirtualAddress)
            {
                return NULL;
            }

            // get the image export directory
            IMAGE_EXPORT_DIRECTORY *ImageExportDirectory = (IMAGE_EXPORT_DIRECTORY*)((BYTE*)hModule + ImageDataDirectory->VirtualAddress);

            // get the array of name RVAs (one entry per exported name; NumberOfNames tells us how many we need to compare with)
            ULONG *ulExportedNames = (ULONG*)((BYTE*)hModule + ImageExportDirectory->AddressOfNames);

            UINT uAddress = 0; // used later for calculations

            const char *FunctionName = NULL; // used later for holding the current function name being checked

            // loop through all the functions exported by name
            for (DWORD i = 0; i < ImageExportDirectory->NumberOfNames; i++)
            {
                // assign the function name
                FunctionName = (char*)((BYTE*)hModule + ulExportedNames[i]);

                // compare to check if it is the function we want the EAT address of
                if (!strcmp(FunctionName, lpFunctionName))
                {
                    // grab the ordinal for the function so we can get the address after
                    USHORT uExportedOrdinal = ((USHORT*)((BYTE*)hModule + ImageExportDirectory->AddressOfNameOrdinals))[i];

                    // grab the function address using the ordinal we obtained
                    uAddress = ((UINT*)((BYTE*)hModule + ImageExportDirectory->AddressOfFunctions))[uExportedOrdinal];

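                    // now the EAT address for our function = address of module + function address we obtained :)
                    // note: forwarded exports are not handled (the RVA then points to a forwarder string, not code)
                    fpFunctionAddress = (void*)((BYTE*)hModule + uAddress);

                    // we found our function, so there is no need to check the remaining names
                    break;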
                }
            }
        }
    }
    return (FARPROC)fpFunctionAddress; // return back the address
}
You can adapt the above function I wrote to compare addresses between the Import Address Table (IAT) and the Export Address Table (EAT) to detect address manipulation within the IAT (IAT hooking), and then restore it by changing the address in the IAT back to the one from the EAT. You can also adapt it to actually hook the Export Address Table. The function supports both x86 and x64 processes because of the data types it uses, such as ULONG and IMAGE_NT_HEADERS (instead of IMAGE_NT_HEADERS32 or IMAGE_NT_HEADERS64; it changes depending on the compile configuration), so the right one will be used automatically. I stuck to using FARPROC for the return type because GetProcAddress also returns FARPROC.
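
One thing to watch out for when adapting the function: some exports are forwarded to another module (for example kernel32.dll forwards HeapAlloc to NTDLL.RtlAllocateHeap). For those, the RVA in the address table points back inside the export directory, at a forwarder string rather than at code. A minimal sketch of the check in C follows; is_forwarded_export is a made-up name, and the two directory values would come from the VirtualAddress and Size of the IMAGE_DIRECTORY_ENTRY_EXPORT data directory entry.

```c
#include <stdint.h>

/* Returns 1 when an export's RVA lies inside the export directory itself,
   which means it refers to a forwarder string such as "NTDLL.RtlAllocateHeap"
   instead of executable code. */
int is_forwarded_export(uint32_t function_rva,
                        uint32_t export_dir_rva,
                        uint32_t export_dir_size)
{
    /* the subtraction is only evaluated once we know it cannot underflow */
    return function_rva >= export_dir_rva &&
           function_rva - export_dir_rva < export_dir_size;
}
```

If the check returns 1, the value should not be returned as a callable address; you would instead parse the "MODULE.Function" string and repeat the lookup in that module.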

winnt.h:
Code:
#ifdef _WIN64
typedef IMAGE_NT_HEADERS64                  IMAGE_NT_HEADERS;
typedef PIMAGE_NT_HEADERS64                 PIMAGE_NT_HEADERS;
#else
typedef IMAGE_NT_HEADERS32                  IMAGE_NT_HEADERS;
typedef PIMAGE_NT_HEADERS32                 PIMAGE_NT_HEADERS;
#endif
If you'd like to learn more about PE files:
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format
https://www.curlybrace.com/archive/PE File Structure.pdf
http://www.pelib.com/resources/luevel.txt
PE - OSDev Wiki
Anatomy of a .NET Assembly - PE Headers - Simple Talk (.NET PE analysis)
 

Andy Ful

#18
...
You can adapt the above function I wrote to compare addresses between the Import Address Table (IAT) and the Export Address Table (EAT) to detect address manipulation within the IAT (IAT hooking), and then restore it by changing the address in the IAT back to the one from the EAT. You can also adapt it to actually hook the Export Address Table.
...
I would like to see the @wave (@kram7750) posts here. That would be an interesting conversation.
 
D

Deleted member 65228

Guest
#19
It is great to see browsers such as Microsoft Edge trying to tackle code injection attacks because really preventing code injection is a key part in mitigating banking malware. I can't say the same about Firefox though... Yet.
Google Chrome is better at blocking code injection compared to Internet Explorer, Firefox and Opera as well.



I did find a way to do it via normal methods (which it would have usually blocked such as remote thread/APC w/ LoadLibrary, image above). You inject before the process' main thread is resumed, so when the process starts actually executing code, your DLL is already loaded. It seems Google Chrome does not check the loaded DLLs when it starts-up, but only mitigates code injection after it has started running.

An idea for Google Chrome would be to check the loaded modules at start-up.

Microsoft Edge is one step ahead because what worked on Google Chrome failed for Microsoft Edge. Good on Microsoft! :)
 