Threathunting Malware Analysis Series A5
0. Quote
“Things go wrong. The odds catch up. Probability is like gravity: you cannot negotiate with gravity”.
(Det. James 'Sonny' Crockett | Miami Vice movie - 2006)
1. Introduction
Welcome to the fifth article of Malware Analysis Series (MAS). If readers haven’t read the first four
articles yet, all of them are available on the following links:
▪ MAS_1: https://exploitreversing.com/2021/12/03/malware-analysis-series-mas-article-1/
▪ MAS_2: https://exploitreversing.com/2022/02/03/malware-analysis-series-mas-article-2/
▪ MAS_3: https://exploitreversing.com/2022/05/05/malware-analysis-series-mas-article-3/
▪ MAS_4: https://exploitreversing.com/2022/05/12/malware-analysis-series-mas-article-4/
We have covered many different topics so far and, in general, the main goal continues to be presenting
fundamental concepts and applied, practical approaches to malware analysis, helping readers
build up the necessary skills and move forward on their own analysis and learning path. As many readers
know, it isn’t my intention to propose hard samples because that would be useless for effective
learning and daily work and, in my opinion, would also be unnecessary showing off.
In the first four articles we presented, explained, applied, and learned techniques related to:
▪ malware profiling and main tools used to get fundamental information on malware samples.
▪ basic obfuscation (Control Flow Flattening) and anti-forensics concepts.
▪ main indicators used to recognize packed code and their associated challenges.
▪ usual and well-known unpacking tricks and how to fix IAT (Import Address Table) of extracted
binaries from memory.
▪ code injection review and managing DLL / API hashing resolution cases.
▪ unpacking methods and APIs on which to set breakpoints during the unpacking procedure.
▪ how to write C2 data configuration extractors using Python, IDC, and IDA Python.
▪ how to write string de-obfuscating scripts.
▪ relevant IDA Pro plugins commonly used during the analysis.
▪ .NET reflection concepts, internals unpacking and de-obfuscation.
▪ parsing and recognizing PE structures using IDA Pro.
▪ handling C++ structures through enumeration, local types, and structures.
1|Page
https://exploitreversing.com
Therefore, we’re ready to move forward and, in this article, we’ll be analyzing a few features of the
Bumblebee malware. As I’ve already mentioned, we don’t have any intention of dissecting it completely,
but only of examining some interesting aspects of it.
2. Acknowledgments
I’d like to publicly thank Ilfak Guilfanov (@ilfak) and Hex-Rays (@HexRaysSA) for supporting this project
by providing me with a personal license of IDA Pro.
My gratitude is endless because I certainly couldn’t keep writing this series without a personal license
(that is, without depending on corporate licenses).
Honestly, I don’t have enough words to say how happy, thankful, and fortunate I feel in receiving
their help. Although it was already much more than I could have dreamed of receiving, in June 2022 Ilfak
and Hex-Rays once again kindly agreed to help me by providing new licenses of IDA Pro for other
platforms, due to a new series that I’ve just started writing and plan to release as soon as possible.
Personally, Ilfak’s words expressing his trust in and praise for this series of articles so far are what
matter most to me.
Once again: thank you for everything, Ilfak.
3. Environment Setup
This article has a lab setup using the following environment:
▪ Windows 11 running in a virtual machine. You’re able to download a virtual machine for VMware,
Hyper-V, VirtualBox or Parallels from Microsoft at:
https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/. If you already have a valid
license for Windows 11, you can download the ISO file from:
https://www.microsoft.com/software-download/windows11
▪ IDA Pro or IDA Home version (@HexRaysSA): https://hex-rays.com/ida-pro/ . I’ll be using IDA Pro
version 8.x and, mainly, the Hex-Rays Decompiler in this article.
▪ x64dbg(@x64dbg): https://x64dbg.com/
▪ PEBear (@hasherezade): https://github.com/hasherezade/pe-bear-releases
▪ DiE (from @horsicq): https://github.com/horsicq/DIE-engine/releases
▪ CFF Explorer: https://ntcore.com/?page_id=388
▪ HxD editor: https://mh-nexus.de/en/hxd/
▪ Resource Hacker: http://www.angusj.com/resourcehacker/
▪ Malwoverview: https://github.com/alexandreborges/malwoverview
▪ Floss: pip install -U flare-floss | https://github.com/mandiant/flare-floss/releases/tag/v2.0.0
▪ Capa: pip install -U flare-capa | https://github.com/mandiant/capa/releases
To get further information about the lab configuration, I recommend that readers reserve some time to
review the first and second articles of this series. Both articles present concepts about unpacking and
other details that, eventually, could be useful.
4. References
I could find several articles analyzing Bumblebee and, although I haven’t had the opportunity to read them
(my time is incredibly short), I recommend that readers do so because they were written by excellent
security researchers and companies, which covered and analyzed several aspects of the same family, and
readers can learn what’s most appropriate for their work. The list below is in no particular order:
▪ https://blog.google/threat-analysis-group/exposing-initial-access-broker-ties-conti/
▪ https://blog.sekoia.io/bumblebee-a-new-trendy-loader-for-initial-access-brokers/
▪ https://blog.cyble.com/2022/06/07/bumblebee-loader-on-the-rise/
▪ https://symantec-enterprise-blogs.security.com/blogs/threat-intelligence/bumblebee-loader-cybercrime
▪ https://research.nccgroup.com/2022/04/29/adventures-in-the-land-of-bumblebee-a-new-malicious-loader/
▪ https://www.cynet.com/blog/orion-threat-alert-flight-of-the-bumblebee/
▪ https://www.proofpoint.com/us/blog/threat-insight/bumblebee-is-still-transforming
Certainly, there’re several other excellent blogs explaining concepts and applied techniques related to
the mentioned topics. I’ll include these references in the next articles as soon as I learn about them.
6. Gathering Information
We’ll be examining the following sample:
57c4bdf0a644df4fd39f3d73d4570e6c88d8b7239ab4a395dba441ab15a5024f.
As usual, we should acquire initial information about our sample, which is available to download from
Malware Bazaar:
Readers can unzip (using 7z e <zip file>) the malware sample and there will be a file with .img extension.
Afterwards, it’s quite easy to use the 7z command to “unpack” this .img file and we’re going to find the
following files:
[Figure 3] Checking the inf.bat content and decoding the .lnk file
[Figure 4] Decoding the .lnk file – truncated output for saving space
In a few words, we learned from the figures above that:
a. the order of execution is: ScannedDocuments-0622.lnk → inf.bat → information.dll
b. the DLL is a 64-bit binary and one of its exported functions, named hKOgtkmCis, is executed.
Thus, it’s clear that we must analyze the DLL (this time it is a 64-bit one) to understand what the threat
does. However, before proceeding, let’s collect further information that could, eventually, help us:
disabling the system’s or the binary’s memory randomization (also known as dynamic rebasing) to prevent the
executable from being loaded at a different memory address each time it’s executed.
To accomplish this task readers could take three approaches:
▪ Disabling memory randomization globally on the system:
▪ Disabling binary’s memory randomization feature (dynamic rebasing) using command line:
▪ setdllcharacteristics.exe -d <binary>
Usually, people prefer acting on the binary instead of the system, but it is a personal decision.
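For readers who prefer scripting the per-binary approach, the idea can be sketched in Python as shown below. This is only a minimal sketch and not the implementation of setdllcharacteristics.exe: the function name clear_dynamic_base is hypothetical, and the code assumes a well-formed PE header and simply clears the IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE bit (0x0040) of the DllCharacteristics field, without recomputing the PE checksum.

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
# DllCharacteristics sits at offset 70 of the optional header in both PE32 and PE32+.
DLLCHAR_OFFSET_IN_OPTIONAL_HEADER = 70


def clear_dynamic_base(image: bytearray) -> bool:
    """Clear the DYNAMIC_BASE bit in place; return True if the bit was set."""
    if image[:2] != b"MZ":
        raise ValueError("not an MZ image")
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # 4 bytes of signature + 20 bytes of COFF file header precede the optional header.
    field = e_lfanew + 4 + 20 + DLLCHAR_OFFSET_IN_OPTIONAL_HEADER
    flags = struct.unpack_from("<H", image, field)[0]
    struct.pack_into("<H", image, field,
                     flags & ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
    return bool(flags & IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
```

A possible usage is to read the DLL into a bytearray, call clear_dynamic_base( ) on it and write the result to a new file, always keeping the original sample untouched.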
As we saw in previous articles of this series, the recommended approach to debug a DLL is to use
rundll32.exe (C:\Windows\System32) and pass the DLL and the respective exported function (or ordinal
number) as arguments. Thus, launch x64dbg, open rundll32.exe in the debugger, go to File |
Change Command Line and alter its content as shown below:
▪ "C:\Windows\System32\rundll32.exe" C:\Users\Administrator\Desktop\ARTICLES\MAS_5\mas_5_dll.bin,hKOgtkmCis
Restart/reload the x64dbg session (CTRL+F2) and the debugger will stop at the System Breakpoint. Run once
(F9) and the debugger will hit the Entry Breakpoint. Now you can configure breakpoints on the functions
mentioned previously. If you want, after hitting the first breakpoint, you can set a breakpoint at the
beginning of the exported function (hKOgtkmCis) by following the standard procedure: CTRL+G and enter
mas_5_dll.bin.hKOgtkmCis.
Readers will realize that they can’t get anything useful by following the dump of the data pointed to by
the address returned by VirtualAlloc (RAX). Nonetheless, you’re able to see an ERW region as shown below:
We have to right-click each one of the found addresses and then choose Follow in Dump:
Open the extracted binary in PE-bear and readers will see a binary with a destroyed IAT, but that isn’t a
problem. As the unpacked malware has been dumped from memory, it’s in “mapped format”, so its
section addresses represent memory (virtual) addresses and not raw (on-disk) addresses:
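A common way of handling a dump in “mapped format” is to make each section’s raw fields match its virtual layout. The sketch below illustrates that idea in Python; it is an assumption about the usual fix and not necessarily the exact procedure followed in this article. The function name realign_sections is hypothetical, and the code only rewrites SizeOfRawData and PointerToRawData in the section table (it doesn’t touch ImageBase or rebuild the IAT).

```python
import struct

SECTION_HEADER_SIZE = 40


def realign_sections(image: bytearray) -> int:
    """Make raw size/offset equal to virtual size/address; return section count."""
    e_lfanew = struct.unpack_from("<I", image, 0x3C)[0]
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    num_sections = struct.unpack_from("<H", image, e_lfanew + 6)[0]
    opt_header_size = struct.unpack_from("<H", image, e_lfanew + 20)[0]
    # The section table follows the signature, COFF header and optional header.
    table = e_lfanew + 24 + opt_header_size
    for index in range(num_sections):
        header = table + index * SECTION_HEADER_SIZE
        # Offsets inside a section header: +8 VirtualSize, +12 VirtualAddress,
        # +16 SizeOfRawData, +20 PointerToRawData.
        virtual_size, virtual_address = struct.unpack_from("<II", image, header + 8)
        struct.pack_into("<II", image, header + 16, virtual_size, virtual_address)
    return num_sections
```

After this change, tools that read the file by raw offsets see the same bytes that were mapped in memory, which is what makes the dumped binary readable in a PE editor.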
▪ __cdecl
o The caller is responsible for cleaning the stack.
o In the same way as __stdcall, arguments are passed on the stack.
o This calling convention is the default one for C and C++ programs.
o The __cdecl keyword is specific to Microsoft compilers.
o Variadic functions use this convention because the callee doesn’t know how many arguments
were passed.
▪ __fastcall
o This convention is only applied to x86 architecture.
o In Microsoft compilers, the first two arguments are passed in registers (ECX and EDX,
respectively). All remaining arguments are passed on the stack. Other compilers use different
schemes.
o The callee is responsible for cleaning the stack.
▪ __thiscall
o It’s the convention used for C++ class member functions on x86 architecture.
o The callee is responsible for cleaning the stack.
o The “this” pointer is passed through ECX register.
o “this” pointers are available for non-static C++ member functions.
▪ __clrcall
o It specifies that a function can only be called from managed code.
o It must be used for virtual functions called from managed code.
o This convention can’t be used for functions being called from native code.
▪ __vectorcall
o Arguments are passed through registers (when possible).
o This convention uses more registers than __fastcall.
o It is only supported in native code on x86/x64 processors with SSE2 support.
o Three types of arguments can be passed by register in vectorcall: integer, vector, and
homogeneous vector aggregate types.
o The __vectorcall keyword is specific to Microsoft compilers.
In the x64 architecture, things are a bit different because there’s only one calling convention
(x64 __fastcall) and many other details:
▪ All parameters and values are 64-bit (QWORD).
▪ The returned value goes to RAX.
▪ The concept of shadow home (also known as home space) comes up. In a few words, the shadow
home (allocated by the caller) reserves 0x20 bytes in which the callee can save the first 4
parameters, and it is reserved even if no parameter is being passed!
▪ If the callee has any local variables, it’ll allocate space for them in addition to the
0x20 bytes (32 bytes).
▪ If the shadow home is not used for storing the function’s parameters (because there aren’t any), the
compiler can use this space to save non-volatile registers.
▪ The first four parameters are passed in the RCX, RDX, R8 and R9 registers, respectively. All remaining
parameters are passed on the stack.
▪ The caller (non-leaf function) usually saves volatile registers such as RAX, RCX, RDX, R8, R9, R10 and
R11 because they can be changed by the callee. Other registers, such as XMM4 and XMM5, are also saved.
▪ The callee saves R12, R13, R14, R15, RSI, RDI, RBP, RSP, RBX (non-volatile) and restores them later.
▪ The caller (non-leaf function) is responsible for allocating space for the parameters being passed to
the callee, which makes the usage of variadic functions easier.
▪ The stack must be 16-byte aligned. Usually, it’s not an issue, but dynamic stack allocations made
through alloca( ) must preserve this alignment (malloc( ) allocates from the heap, so it doesn’t touch
the stack, and it returns 16-byte aligned memory).
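To make the parameter-passing rule above concrete, the following Python sketch lists where each integer/pointer argument is found at the call site (just before the CALL instruction executes). The function name arg_locations is hypothetical, and the sketch deliberately ignores floating-point arguments, which use XMM registers instead.

```python
INT_ARG_REGISTERS = ("RCX", "RDX", "R8", "R9")
SHADOW_HOME_SIZE = 0x20  # 4 slots * 8 bytes, always reserved by the caller


def arg_locations(arg_count: int) -> list[str]:
    """Return the location of each integer argument as seen at the call site."""
    locations = []
    for index in range(arg_count):
        if index < len(INT_ARG_REGISTERS):
            locations.append(INT_ARG_REGISTERS[index])
        else:
            # Stack arguments start right above the 0x20-byte shadow home.
            offset = SHADOW_HOME_SIZE + 8 * (index - len(INT_ARG_REGISTERS))
            locations.append(f"[RSP+{offset:#x}]")
    return locations
```

For a call with six arguments, the fifth and sixth ones are written to [RSP+0x20] and [RSP+0x28] before the call, which is the pattern readers will see repeatedly in x64 disassembly.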
A possible picture to illustrate a non-leaf function being called follows (we aren’t considering alloca()
function neither a frame pointer being used):
Return Address
Stack Parameters
R9 home space
R8 home space
▪ The x64 code supports instruction pointer-relative data addressing (RIP-relative addressing). In
other words, it’s possible to access data at a location given by an offset from the current
instruction pointer.
▪ In binaries containing PIC (Position Independent Code), the address is not stored “in the
instruction”, but only an offset from the current instruction pointer (RIP).
▪ Functions can’t allocate stack space in the middle of the code as may occur in x86, but only at their
beginning (prolog). Therefore, there aren’t pop and push instructions in the middle of the function.
Because of this rule, the stack size stays constant over the function’s life.
▪ Most of the time, the RSP register acts as both stack pointer and frame pointer, but there’s an
exception: alloca( ).
▪ PE32+ binaries have an additional section named .pdata (also known as the Exception Directory), which
holds information used for handling exceptions.
▪ The GS register is used to access the TEB (Thread Environment Block) from user mode, but it is
also used to access the KPCR (Kernel Processor Control Region) when the code is executing in
kernel mode.
▪ If readers don’t know about the KPCR, there’s one for each logical processor and it is a
structure that contains general information about the processor and its status. Use WinDbg (or,
in some cases, Sysinternals’ livekd) and try the commands: a. dt nt!_KPCR ; b. dt nt!_KPRCB ; c. !pcr
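Regarding the RIP-relative addressing mentioned in the first items of the list above, the effective address is computed from the address of the next instruction plus a signed 32-bit displacement. A minimal Python sketch follows (the function name rip_relative_target is hypothetical, and the instruction length must be provided by the disassembler):

```python
def rip_relative_target(insn_addr: int, insn_len: int, disp32: int) -> int:
    """Resolve a RIP-relative operand: next instruction address + signed disp32."""
    if disp32 & 0x80000000:      # interpret the raw 32-bit value as signed
        disp32 -= 1 << 32
    return (insn_addr + insn_len + disp32) & 0xFFFFFFFFFFFFFFFF
```

For example, a 7-byte instruction located at 0x180001000 with displacement 0x2000 references the data at 0x180003007.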
If readers have spare time to examine the Windows system, there’re many examples of the details
mentioned above. For example, in the function below, non-volatile registers are saved before the
function continues:
[Figure 19] Disassembling a kernel function: passing arguments through register and stack
Finally, that’s a good example of non-volatile registers being restored at a later point:
Click on any called function and try to perform a similar and very quick analysis. Additionally, if readers
want to examine and learn a bit more about the structures related to the function table entry for the given
function and even exceptions, it’s recommended to execute:
[Figure 22] Examining the function table structures and fields related to exception
I’m not sure whether readers have already touched this stuff previously, but a few notes could be useful:
▪ BeginAddress: offset (RVA) of the start of the function. By adding this offset to the base of the
module, we get the address of the function.
▪ All three fields (BeginAddress, EndAddress and UnwindInfoAddress) are part of the
_RUNTIME_FUNCTION structure, which is located inside the .pdata section.
▪ When an exception occurs, the first step is to use the RIP to search for an entry of this structure
that describes the current function.
▪ We should note that several registers are listed as non-volatile, so they must be saved before the
function overwrites them and, later, they will be restored when appropriate. Furthermore,
the remaining flags tell us a bit more about the context:
▪ UWOP_ALLOC_LARGE: allocates a large area on the stack. If the operation info is equal to 1, the
unscaled size of the allocation (up to 4GB - 8 bytes) is recorded in the next two slots. If the
operation info is zero, the size of the allocation divided by 8 (136 bytes up to 512K - 8 bytes) is
recorded in the next slot.
▪ UWOP_ALLOC_SMALL (not shown): allocates a small area on the stack (between 8 and 128 bytes);
the size is given by the operation info * 8 + 8.
▪ UWOP_PUSH_NONVOL: this unwind operation code means a push of a non-volatile integer register,
which decrements RSP by 8.
▪ UWOP_SAVE_NONVOL: this unwind operation code indicates that the function saves a non-volatile
integer register on the stack using a MOV instead of a PUSH.
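The unwind operation codes described above can be used to estimate how many bytes a prolog takes from the stack. The Python sketch below walks an array of UNWIND_CODE slots (2 bytes each: prolog offset, then a byte holding the operation code in the low 4 bits and the operation info in the high 4 bits). The function name prolog_stack_bytes is hypothetical, only pushes and allocations are accounted for (UWOP_PUSH_MACHFRAME is ignored), and the sample bytes shown after the code are made up for illustration and were not taken from the analyzed function.

```python
import struct

UWOP_PUSH_NONVOL, UWOP_ALLOC_LARGE, UWOP_ALLOC_SMALL = 0, 1, 2
UWOP_SAVE_NONVOL, UWOP_SAVE_NONVOL_FAR = 4, 5
UWOP_SAVE_XMM128, UWOP_SAVE_XMM128_FAR = 8, 9

# Extra 2-byte slots consumed by operations that don't change RSP.
EXTRA_SLOTS = {UWOP_SAVE_NONVOL: 1, UWOP_SAVE_NONVOL_FAR: 2,
               UWOP_SAVE_XMM128: 1, UWOP_SAVE_XMM128_FAR: 2}


def prolog_stack_bytes(codes: bytes) -> int:
    """Sum the stack space taken by pushes and allocations in an unwind code array."""
    total, slot, slot_count = 0, 0, len(codes) // 2
    while slot < slot_count:
        op_byte = codes[2 * slot + 1]
        op, info = op_byte & 0x0F, op_byte >> 4
        slot += 1
        if op == UWOP_PUSH_NONVOL:
            total += 8
        elif op == UWOP_ALLOC_SMALL:
            total += info * 8 + 8
        elif op == UWOP_ALLOC_LARGE and info == 0:
            total += struct.unpack_from("<H", codes, 2 * slot)[0] * 8
            slot += 1
        elif op == UWOP_ALLOC_LARGE:
            total += struct.unpack_from("<I", codes, 2 * slot)[0]
            slot += 2
        else:
            slot += EXTRA_SLOTS.get(op, 0)
    return total
```

As an illustration, the hypothetical array 0B 01 78 00 04 70 03 60 02 50 01 30 describes one UWOP_ALLOC_LARGE of 0x78 * 8 = 0x3C0 bytes followed by four pushes of non-volatile registers, so the function returns 0x3C0 + 32 bytes.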
Picking up a function from stack we have:
▪ 3c0: ALLOC_LARGE
▪ (4 * 8): five registers pushed.
▪ 8: return address’s size
▪ 8: UWOP_ALLOC_SMALL
There’re other quite relevant details that, sometimes, are very useful for the daily job in reverse
engineering and programming. For example, functions can be declared with the naked attribute and, as a
consequence, these functions have neither prolog nor epilog. We can use a naked function as a trampoline
function, which is responsible for restoring the overwritten instructions, while writing a hooking
program.
As readers probably already know, hooking is a legitimate mechanism used for monitoring, instrumentation,
and extension of a target function. In the malware world, it’s also a technique used by malicious
binaries to modify the system aiming to hide multiple artifacts and activities.
Hooking can be performed inline or at the IAT (Import Address Table), for example. At the end, the general
idea is having the following execution flow:
▪ Original function is called.
▪ The inline hook at its beginning takes effect and redirects the execution to the malicious
function.
▪ At the end of the malicious function, the execution flow is redirected to the trampoline function.
▪ The trampoline function restores the overwritten instructions.
▪ The trampoline function jumps to the original (hooked) function to execute the remaining
instructions.
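The redirection mentioned in the flow above is frequently implemented by overwriting the first bytes of the target function with a relative jump. As a minimal illustration (an assumption about one common encoding, not a description of any specific malware), the Python sketch below builds the 5-byte JMP rel32 instruction; the function name jmp_rel32 is hypothetical.

```python
import struct


def jmp_rel32(patch_addr: int, target_addr: int) -> bytes:
    """Encode 'JMP target' (opcode E9 + rel32) to be written at patch_addr."""
    # The displacement is relative to the address of the next instruction.
    rel = target_addr - (patch_addr + 5)
    if not -0x80000000 <= rel <= 0x7FFFFFFF:
        raise ValueError("target is out of rel32 range")
    return b"\xE9" + struct.pack("<i", rel)
```

On 64-bit targets the rel32 form only reaches addresses within 2 GB of the patched location, which is why hooking code often falls back to an indirect jump through an absolute 64-bit address.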
Anyway, I don’t intend to go into detail about hooking in this article and I’ll probably return to
this topic in future texts.
▪ Load important libraries (remember: the sample is 64-bit) for analysis: View | Open Subviews |
Type Libraries (SHIFT+F11 hotkey) and insert libraries (INS hotkey) such as:
▪ ntapi64_win7
▪ ntddk64_win7 (it’s usually necessary while analyzing kernel drivers)
[Figure 25] Main strings found in the unpacked sample (second part)
Based on the lists from Figures 24 and 25, a few considerations follow below:
▪ Clearly the malware detects a series of debuggers and tools such as OllyDbg, IDA Pro (idaq.exe),
Immunity Debugger, WinDbg, x64dbg/x32dbg, Process Hacker, Process Monitor, Bochs and so on.
The array containing all tool names was renamed to searched_tools[ ].
▪ The sub_18004D60( ) itself calls sub_180050380( ), which contains a typical sequence of calls such
as CreateToolhelp32Snapshot( ) + Process32FirstW( ) + Process32NextW( ). In a few words, these
last two calls are used to walk through the snapshot result.
▪ Indeed, WMI will be used several times to help detect virtual environment artifacts and, as
readers are going to notice, COM (Component Object Model) will be involved in this context. In
a few pages we’ll return to this subject.
▪ Another anti-analysis approach used by the sample is detecting sandboxes and testing for virtual
machine artifacts and, in particular, common usernames used by test/sandbox environments. The
subroutine sub_18004F860 (renamed to ab_CheckUserNames) is responsible for it and readers can
see the list of them in Figure 28 (next page).
[Figure 30] Checking MAC Addresses
▪ The string "Checking reg key HARDWARE\\Description\\System - %s is set to %s" refers directly to
the code of a well-known project to detect virtual machines, which readers can easily search for on
the Internet.
▪ There’re other similar subroutines such as sub_18004D3D0 (checks for VirtualBox Services),
sub_18004D520 (checks for VirtualBox processes), sub_18004D520 (checks for VirtualBox
executables, DLLs, and drivers), sub_18004D790 (checks for VirtualBox Guest Additions),
sub_18004D8C0 (checks for VirtualBox devices and pipes used for IPC and debugging) and
sub_18004D9E0( ) that checks for VirtualBox shared folders.
▪ Additionally, there’re two calls for FindWindowW( ) to check for windows related to VirtualBox:
[Figure 33] Subroutine used to check for the VirtualBox hardware artifacts.
▪ There’re many other functions checking for VirtualBox, VirtualPC, BOCHS, QEMU and other artifacts:
▪ CoSetProxyBlanket
The respective code is shown below:
A COM object exposes well-defined interfaces so that any client can easily use
its services. In this case, it doesn’t matter whether or not the client is on the same machine/system.
Indeed, an interface is only a set of functions, and objects and clients communicate with each other
through them.
However, let’s make it clear: a client interacts with pointers to interfaces and doesn’t have access to
anything else inside the object. A COM object (an instance of a class) can implement multiple
interfaces, but it’s also important to highlight that the class must implement all functions defined by an
interface (even if they do nothing).
Servers and clients can communicate with each other through a pair of proxy and stub objects, where the
proxy works as a local representation of the remote server running in the client’s address space and the
stub is a sort of listener running on the server side and waiting for client requests. In other words, a
possible scheme to illustrate it is the following one:
▪ invoke method -> interface -> local proxy -> remote stub -> interface -> remote method
Once again, clients consume services through interfaces offered by the COM object, but don’t have any idea
of the services’ implementation, which is a well-known example of encapsulation (as already mentioned
earlier).
A COM object is an instance of a COM class (classes can be instantiated, but interfaces can’t). The class
represents the object definition and implements the IUnknown interface, which is the base interface of all
interfaces in COM and, at least, supports the QueryInterface method (used for service discovery). In
other words, QueryInterface( ) returns a pointer to the requested interface back to the client.
A COM component is a binary module represented by an executable (the server is implemented as
a standalone executable module) or a DLL (the server is implemented as a module to be loaded and
executed within the address space of the client). COM also supports the concepts of inheritance and
polymorphism, besides security features offered by interfaces, such as access control and
impersonation. A COM object’s lifetime is managed by reference counting through the
IUnknown::AddRef and IUnknown::Release methods.
The general idea on how a COM binary is built up is shown below:
[Figure 36] Representation of part of a COM object (adapted, but based on COM Specification and
“Learning DCOM” book by L. Thai Thuan, which was released in 1999)
Few points of the image above:
▪ vtbl: virtual table (table of function pointers).
▪ vptr: virtual table pointer to the vtbl.
▪ IUnknown interface: as explained, it’s the parent of all COM interfaces and it’s used, through the
IUnknown::QueryInterface method, to dynamically find other interfaces and, through the
IUnknown::AddRef and IUnknown::Release methods, to perform lifetime management.
Therefore, each instantiated object has its own associated vptr, although there’s only one vtbl per class.
To recap:
▪ An interface is composed of one or more functions.
▪ A class is the implementation of one or more interfaces.
▪ An object is an instance of a COM class.
▪ A binary is composed of one or more COM classes.
▪ Classes can be instantiated, but interfaces can’t be instantiated.
▪ Each instantiated object has its own vptr.
▪ There’s only one vtbl per class.
▪ A COM component can be represented by an executable (standalone process) or a DLL (loaded into
the client’s process).
▪ The proxy runs on the client side and the stub runs on the server side as a listener.
There’s a different kind of COM object named Factory that is responsible for creating and/or instantiating
other COM objects associated with a given COM class. In this case, standard factories are
represented by the IClassFactory interface, which is derived from the IUnknown interface. Requests received
by IClassFactory to instantiate a COM object trigger IClassFactory::CreateInstance, which is responsible for
accomplishing the task. Thus, we have: COM Factory -> COM Object.
Class factories and COM objects can be packed into either an executable or a DLL file. If they are packed as
an executable, they can run as a service or local/remote server in a different execution context from the
COM client. Furthermore, executable COM classes can be registered using CoRegisterClassObject( ). If they
are packed as a DLL, it’s an in-process server that is usually loaded by the client.
In the COM world, a function can be invoked using two different approaches:
▪ Static Invocation:
▪ Dynamic Invocation:
There’re many COM functions and, among them, CoCreateInstance( ), which creates a COM
object, is certainly one of the most important:
▪ rclsid: The CLSID associated with the data and code that will be used to create the object.
▪ pUnkOuter: If NULL, indicates that the object is not being created as part of an aggregate. If non-
NULL, pointer to the aggregate object's IUnknown interface (the controlling IUnknown).
▪ dwClsContext: Context in which the code that manages the newly created object will run.
▪ riid: A reference to the identifier of the interface to be used to communicate with the object.
▪ ppv: Address of pointer variable that receives the interface pointer requested in riid.
We are mainly interested in three of these parameters: rclsid, riid and ppv. The CLSID and the IID are each
represented by a GUID (a 128-bit value, usually written in hexadecimal), which is unique (eliminating any
chance of name collision) and is how classes and interfaces are referred to. One key aspect of interfaces
is that they are immutable, so they are not versioned.
COM classes are registered into the operating system and identified by these GUIDs, which are used as a
representation of the class within Registry:
▪ HKLM\Software\Classes\CLSID
▪ HKCU\Software\Classes\CLSID
As readers likely already learned from other articles, during a COM hijacking attack, malware and
adversaries could establish persistence by replacing a legit COM entry in the Registry or even enumerating
CLSID subkeys such as LocalServer32 and InProcServer32 to discover abandoned binary references, which
is not so rare due to failed uninstallation processes.
In terms of nomenclature associated with the Registry:
▪ Server: it’s a binary that’s referred to by the CLSID inside the Registry.
▪ LocalServer32 key: it’s the path to an executable (.exe) implementation.
▪ InProcServer32 key: it’s the path to a dynamic-link library (.dll) implementation.
The fundamental COM APIs used to write COM clients are:
▪ CoInitialize( )/CoInitializeEx( )
▪ CoCreateInstance( )
▪ CoUninitialize( )
CoCreateInstance( ) internally calls CoGetClassObject( ) to get a class factory and then uses
IClassFactory::CreateInstance to create the requested COM object. Moreover, calling CoGetClassObject( )
directly is commonly done to create multiple objects of a given class.
To decode GUIDs (also known as UUIDs – Universally Unique Identifiers) we can use a simple script written
in IDC as shown below:
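The same decoding can also be sketched in plain Python. A GUID is stored in memory as 16 bytes, with its first three fields in little-endian order, and the standard uuid module handles this layout through the bytes_le argument. The function name guid_from_memory is hypothetical, and the way the 16 bytes are obtained (for example, with ida_bytes.get_bytes(ea, 16) in IDA Python) is left to the reader.

```python
import uuid


def guid_from_memory(raw: bytes) -> str:
    """Convert the 16 raw bytes of a GUID, as stored in memory, to its text form."""
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes long")
    return str(uuid.UUID(bytes_le=bytes(raw))).upper()
```

For instance, the bytes 87 A6 12 DC 7F 73 CF 11 88 4D 00 AA 00 4B 2E 24 decode to DC12A687-737F-11CF-884D-00AA004B2E24.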
Of course, we always have the option (and, many times, it’s an easier way) to search for the GUID on the
Internet, where we can soon find some important and related information:
▪ https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-wmi/46710c5c-d7ab-4e4c-b4a5-ebff311fdcd1
Personally, I like to search about COM information and definitions on the following Microsoft website:
▪ https://referencesource.microsoft.com
Querying by the interface ID (DC12A687-737F-11CF-884D-00AA004B2E24) we got:
[InterfaceTypeAttribute(0x0001)]
[TypeLibTypeAttribute(0x0200)]
[GuidAttribute("DC12A687-737F-11CF-884D-00AA004B2E24")]
[ComImport]
interface IWbemLocator
{
    [PreserveSig] int ConnectServer_(
        [In][MarshalAs(UnmanagedType.BStr)] string strNetworkResource,
        [In][MarshalAs(UnmanagedType.BStr)] string strUser,
        [In]IntPtr strPassword,
        [In][MarshalAs(UnmanagedType.BStr)] string strLocale,
        [In] Int32 lSecurityFlags,
        [In][MarshalAs(UnmanagedType.BStr)] string strAuthority,
        [In][MarshalAs(UnmanagedType.Interface)] IWbemContext pCtx,
        [Out][MarshalAs(UnmanagedType.Interface)] out IWbemServices ppNamespace);
}
As a summary, we have:
▪ Class: WbemLocator
▪ InProcServer32: C:\Windows\system32\wbem\wbemprox.dll
▪ Interface: IWbemLocator
▪ Explanation: the IWbemLocator interface is used to obtain the initial namespace pointer to the
IWbemServices interface for WMI, which is done by calling IWbemLocator::ConnectServer. With this
pointer, we can access Windows Management.
The next step is to use this information to make the code reversed by IDA Pro easier to understand. There’re
two paths to proceed, and combining them can help us a lot:
a. The first method consists in changing the type (“Y” hotkey) of the ppv parameter. It’s used as the
output of the CoCreateInstance function and contains the requested interface pointer and, as its type is
LPVOID, it’s necessary to make a cast. Thus, change it from “LPVOID” to “IWbemLocator*”,
as well as any variable receiving its value. No doubt, performing this task on the pseudo code produced
by the IDA Decompiler is always recommended, but it isn’t enough. Why? Because new functions
will come up and their arguments will need to be adjusted too.
b. The second path that could help us to get a better result consists of the following steps:
a. If the structure isn’t already present, insert it (SHIFT+F9 and then INS) using the following
nomenclature: <interface name>Vtbl (for example, IWbemLocatorVtbl).
b. Use the “T” hotkey to apply the structure over the disassembled code (mainly for calls).
Before applying changes, it’s relevant to show the pseudo code from IDA Decompiler:
After applying a list of changes, which are explained in the next paragraphs, we get the following result:
▪ Changing the type (“Y” hotkey) of the ppNamespace argument to IWbemServices ** according to the
expected prototype of the IWbemLocator::ConnectServer function described on MSDN.
▪ Editing the signature (“Y” hotkey) of IWbemLocator::ConnectServer so that the type of its
ppNamespace argument is also IWbemServices **, according to the same prototype.
On the assembly side, I’ve used the T hotkey to apply the correct type to every indirect call instruction
within this subroutine (sub_180050460), according to the pseudo code.
If readers compare Figures 43 and 45, they’ll notice a better result and, this time, it’s possible to
interpret the code, although it could still be improved.
There’re some necessary explanations about what’s happening so far:
▪ CoCreateInstance( ) creates one object that is an instance of a given class (WbemLocator , given by
the CLSID) on the local system .
▪ The class is the implementation of one or more COM interfaces and the interface is given by the
iid parameter (IWbemLocator).
▪ Remember that an interface is really a class containing only pure virtual functions, which
must be implemented by an implementation class.
▪ Clients access the object through its virtual pointer (vptr), which references the virtual table
holding pointers to the real functions.
▪ The final COM binary (DLL / exe) can be composed of one or more classes.
▪ Clients don’t communicate with the class directly, but through interfaces. In the same way, clients
don’t need to know where a COM object is located or how it is implemented because, as
stated, the interface is the main point of contact.
▪ All interfaces inherit from the IUnknown interface, which has three methods: QueryInterface( ),
AddRef( ) and Release( ).
▪ The QueryInterface( ) aims to query an object for a given interface.
▪ The CoCreateInstance( ) returns, as output in the ppv parameter, an interface pointer to the
IWbemLocator.
▪ The IWbemLocator interface has only one method that’s ConnectServer( ), which is responsible for
creating a connection to a WMI namespace using its first parameter (strNetworkResource).
▪ The output from IWbemLocator::ConnectServer( ) is its last parameter, ppNamespace (the eighth one in
the MSDN prototype; Figure 44), which receives a pointer to an IWbemServices object. That’s the reason
why we changed its type.
▪ So far, the malware code can connect to WMI, specifically to the ROOT\CIMv2 namespace, get
a pointer to the IWbemServices interface and make the next calls.
▪ The CoSetProxyBlanket( ) function sets the authentication information that will be used
to make calls through a proxy. Its fifth and sixth parameters (dwAuthnLevel and dwImpLevel), both
with value 3 (RPC_C_AUTHN_LEVEL_CALL and RPC_C_IMP_LEVEL_IMPERSONATE), say that
authentication happens only at the beginning of each RPC, when the server receives the request, and that the
server process can impersonate the client's security context while acting on behalf of the client.
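To relate the vptr and virtual table concepts above to what shows up in the disassembly, the following minimal Python sketch maps method names to virtual table slots and byte offsets. The method order follows the public declarations of IUnknown, IWbemLocator and IEnumWbemClassObject; the helper name vtable_offsets and the INTERFACES dictionary are hypothetical names introduced only for this illustration.

```python
# Sketch: map COM method names to virtual table slots and byte offsets.
# Every interface starts with the three IUnknown methods, in this order.
IUNKNOWN = ["QueryInterface", "AddRef", "Release"]

# Hypothetical lookup table, limited to two interfaces seen in this analysis.
INTERFACES = {
    "IWbemLocator": IUNKNOWN + ["ConnectServer"],
    "IEnumWbemClassObject": IUNKNOWN + ["Reset", "Next", "NextAsync", "Clone", "Skip"],
}


def vtable_offsets(interface, pointer_size=8):
    """Return {method name: byte offset} for the interface's virtual table."""
    return {name: slot * pointer_size
            for slot, name in enumerate(INTERFACES[interface])}


if __name__ == "__main__":
    for iface in INTERFACES:
        for method, offset in vtable_offsets(iface).items():
            print(f"{iface}::{method:<15} -> [vtbl+{offset:#04x}]")
```

With 8-byte pointers (x64), ConnectServer lands at offset 0x18, so an indirect call through that offset of an IWbemLocator virtual table is a good candidate for it. Passing pointer_size=4 gives the equivalent offsets for a 32-bit binary.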
As readers can realize, a simple piece of code can carry a lot of subtle information and many concepts that
help us to better understand what’s happening. Of course, analyzing each piece of
code in detail might be time-consuming, but on several occasions there’s no alternative.
Readers should note that I used an IDA Pro feature named “Collapsing Local Declarations” before
taking the snapshot, so they should do the same to match the line numbers I’m referring to.
A few comments follow:
▪ On line 9, the type declaration of the first parameter of the call to sub_180050460 is
IWbemServices *ptr_IWbemServices.
▪ On line 9, the type declaration of the second parameter of the call to sub_180050460 is
IWbemLocator *ptr_IWbemLocator.
▪ On line 12, the string containing the WQL query is formed (“SELECT * FROM
Win32_ComputerSystem”). The Win32_ComputerSystem class holds a series of members
(properties) representing information about the local system.
▪ The IWbemServices::ExecQuery( ) call on line 17 executes the query above to retrieve the matching
objects. The output of this method is stored in its fifth (and last) parameter (ppEnum), which has
the following type declaration: IEnumWbemClassObject *ppEnum. In general terms, this last
parameter (ppEnum) holds an enumerator that will be used to access the query results. Readers should
notice that this parameter is copied to a variable on line 30.
▪ Using the enumerator obtained through ExecQuery( ), the code walks through each returned object by
using IEnumWbemClassObject::Next( ) on line 34.
▪ On line 38, property values are retrieved using the IWbemClassObject::Get method. In this case,
the code is checking the “Model” property and looking for strings such as “VirtualBox”, “HVM domU”
and “VMware”.
▪ After all checks are done, the code releases all objects by using the Release( ) method inherited
from the IUnknown interface and closes the COM library on the current thread by calling
CoUninitialize( ).
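The decision taken on the “Model” property can be summarized by the short Python sketch below. The three markers come from the analysis above; the function name model_looks_virtual, the plain substring comparison and the sample Model values in the demo are assumptions made only for illustration, so the real routine may compare strings differently.

```python
# Sketch of the anti-VM decision on the Win32_ComputerSystem "Model" property.
# Markers are the strings mentioned in the analysis; matching by substring is
# an assumption of this sketch.
VM_MODEL_MARKERS = ("VirtualBox", "HVM domU", "VMware")


def model_looks_virtual(model):
    """Return the first marker found in the Model value, or None."""
    for marker in VM_MODEL_MARKERS:
        if marker in model:
            return marker
    return None


if __name__ == "__main__":
    # Illustrative Model values, not taken from the sample.
    for model in ("VMware Virtual Platform", "VirtualBox", "HVM domU", "Latitude 7420"):
        print(f"{model!r:28} -> {model_looks_virtual(model)}")
```

A sandbox that wants to survive this check has to make the WMI query return a Model value that contains none of these markers.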
Readers will find COM code in many different malware samples, and I hope this helps you to get a better
understanding of how to analyze COM functions. If readers check other parts of the code that
use CoCreateInstance( ), they will be able to apply a similar approach:
In my experience, this kind of check for virtual environments is usually performed at the
beginning of the malware execution and, most of the time, right before the malicious code does
something useful. Therefore, by listing which subroutines call sub_18004CD50, we reach the
subroutine sub_18000A120 and, apparently, new findings are possible:
Using Lumina is quite simple. To apply Lumina definitions, go to the Lumina menu and pick the “Pull all
metadata” option (F12 hotkey), and Lumina’s metadata will enrich the idb database (most of the entries are
marked in green). For example, for the first lines of sub_18000A120 we get the following result:
If the metadata fetched by Lumina is correct (never trust it blindly), our first finding is that the malware
author used the Catch unit testing framework for C++. If readers don’t know Catch, there are good
references for it:
▪ https://catch2.docsforge.com/
▪ https://github.com/catchorg/Catch2
Initially, examining all functions in the database, it seems that there’s only one Catch routine. Of course,
Lumina brought in much more metadata for many other functions, and all of it can help us during our analysis:
As readers can realize, Lumina helps us a lot with C++ template libraries and general C++ functions,
for example.
Before proceeding, we make a very simple modification to our database by adding new
library modules in the Signatures tab (SHIFT+F5 hotkey and then the INSERT key):
▪ vs64mfc
▪ msmfc64
Together, these modules contribute over 200 recognized functions. Fair enough.
The next important clue is that the sample has lots of strings associated to the Boost C++ library, as shown
below:
[Figure 53] Several strings indicating the presence of Boost C++ Library
There’s important information in the image above, such as:
a. The malware is using the Boost C++ library: https://www.boost.org/
b. The author is using Boost version 1.78, which is available here:
https://boostorg.jfrog.io/artifactory/main/release/1.78.0/source/
c. There are clues that Beast, a header-only library that serves as a foundation for writing
interoperable networking libraries supporting HTTP, WebSocket and other networking protocols, is also
present: https://github.com/boostorg/beast
Identifying the libraries used by the malware is only the first step in tackling reversing problems
such as, for example, unknown functions. As readers can realize, C++ functions and templates represent a
serious issue during reversing tasks, and managing this problem demands having further details in our
hands, such as:
▪ Identifying potential libraries and their respective versions (the strings command helps in this task).
▪ Identifying the compiler, and its version, used to compile the malware code (DiE might help us).
▪ Identifying the operating system and version used to compile the malware.
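As a rough illustration of the first item, the Python sketch below mimics the strings command and looks for Boost version hints in the extracted strings. It assumes that the library leaves path fragments such as boost_1_78_0 inside the binary; the function names, the regular expression and the sample path in the demo are hypothetical and only serve this example.

```python
import re


def ascii_strings(data, min_len=6):
    """Return printable ASCII runs of at least min_len bytes (like strings)."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, data)]


# Assumption: version hints look like "boost_1_78_0" or "boost-1_78".
BOOST_VERSION = re.compile(r"boost[_-](\d+)_(\d+)(?:_(\d+))?")


def boost_versions(data):
    """Return the set of Boost versions hinted at by embedded strings."""
    versions = set()
    for text in ascii_strings(data):
        for major, minor, patch in BOOST_VERSION.findall(text):
            versions.add(".".join(p for p in (major, minor, patch) if p))
    return versions


if __name__ == "__main__":
    # Hypothetical build path surrounded by non-printable bytes.
    sample = b"\x00\x01" + rb"C:\src\boost_1_78_0\boost/beast/core/error.hpp" + b"\x00"
    print(sorted(boost_versions(sample)))
```

To use it on a real file, read the file in binary mode and pass its bytes to boost_versions( ). The same idea applies to any other library that embeds version strings or source paths.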
As I mentioned above, using DiE is useful for identifying the compiler and its version, as shown below:
[Figure 54] DiE helps us to identify the compiler and respective version.
As the figure above shows, it seems the malware author used Microsoft Visual C++ 2015, which is
great information. However, which version of Windows was used? That isn’t easy to determine and, eventually,
we could try to find some evidence in the code using the strings command.
I’ll be using Windows 8.1 and 11, and Visual Studio 2015, to simulate a possible environment and
compile the Boost library. In the end, several individual libraries will be generated and, with luck, we could
help IDA Pro to recognize a few functions. Of course, there’s a catch here: the malware author also used the
Beast header-only library, so we should try something about it too, but let’s move one step at a time.
To compile the boost C++ library:
▪ Download and unpack Boost C++ version 1.78:
https://boostorg.jfrog.io/artifactory/main/release/1.78.0/source/
▪ Download Visual Studio 2015 and install it along with its respective SDK:
o (web installer) https://go.microsoft.com/fwlink/?LinkId=532606&clcid=0x409
o (iso image) https://go.microsoft.com/fwlink/?LinkId=615448&clcid=0x409
▪ Compile the boost library by going into its unpacked directory and executing the following:
o bootstrap.bat
o .\b2 --toolset=msvc-14.0
For example, in this case, 175 functions were identified including only this library module:
Of course, we should repeat the same process for each static library that we believe has been used in
the code. As readers can notice, it’s a time-consuming task.
There are many other procedures that might eventually help you, but I suggest trying them on a
separate database. The success rate varies, but such procedures have already helped in some cases.
Another issue is that most external libraries introduce many structure types that we know
nothing about and, worse, we don’t have enough time to learn about them.
One possible alternative (actually, it’s a hack) to manage the lack of external structure data
types provided by this sort of library follows below (there are many ways to perform the same steps):
a. Write a program using the same library whose structure definitions you want to extract. Include
headers from this library to force them into the final executable. It could seem hard,
but it isn’t, because all these external libraries provide tutorial pages with many examples, so
you don’t need to learn everything about the library itself. Indeed, the program might be very
simple, or even empty with a single main function, as long as you include all headers whose
definitions you want to extract.
b. Configure the environment to use the same compiler version that the binary was compiled with. For
example, in this case, I’m using Visual Studio 2015 because we’ve identified it through DiE
(Detect It Easy), and the same architecture (for example, x64).
c. Don’t forget to add the header directory to compile the program. In Visual Studio 2015: Properties |
VC++ Directories | Include Directories. For example:
C:\Users\Administrador\Desktop\MAS\MAS_5\boost_1_78_0\boost_1_78_0.
d. Don’t forget to add the compiled libraries to link the program. In Visual Studio 2015:
Properties | VC++ Directories | Library Directories. For example:
C:\Users\Administrador\Desktop\MAS\MAS_5\boost_1_78_0\boost_1_78_0\stage\lib
e. Disable any optimization. In Visual Studio 2015: Project Properties | C/C++ | Optimization |
Optimization: Disabled. In the same window, set “Whole Program Optimization” to No.
f. Don’t forget to build a Debug version (and not a Release version). We need this setting
because Visual Studio is going to generate a static library (or executable) and its respective pdb
file automatically.
g. This step doesn’t apply to VS 2015, but it’s recommended for Visual Studio 2022: Project
Properties | Configuration Properties | C/C++ | General and change “Scan Sources for Module
Dependencies” to “Yes” and “Translate Includes to Imports” to “Yes (/translateInclude)”. This
setting forces the compiler to scan the code for dependencies to be included as header units.
h. Visual Studio allows you to produce an executable (.exe), a DLL or a static library (.lib). My best
results came from using a static library because it forces everything to be included in a single file. You
can configure it by going to Project Properties | Configuration Type | Static Library (.lib).
j. Compile the program. In Visual Studio: Build | Build Solution (CTRL+SHIFT+B). The Output
window shows the folder where the static library file and its respective .pdb file
are saved.
k. After the program has been compiled, open the resulting .lib file in IDA Pro.
l. Now we want to generate an IDC file containing all structure type definitions. Go to File | Produce
file | Dump typeinfo to IDC File. The recommendation is to use an output name following the
syntax: <program name>.idc. Avoid using something like <program name>.exe.idc on Windows.
m. Now comes the final part. Using IDA Pro, apply this script to the idb of the malware you’re
analyzing. Go to File | Script File and execute the generated IDC script.
n. Many structure types will be created, and you can check them on the Structures tab
(SHIFT+F9).
o. Go to the Decompiler tab (pseudo code tab) and press F5. Several structure type
definitions created from the external library (headers) will likely be applied automatically.
Note: it is impossible to claim this trick (a hack) will work for you, but it has helped me previously. Of course,
this procedure might mess up your idb file (again: make a copy first), but trying costs you nothing and,
if you’re lucky, you can get a reasonable result. Additionally, it has a good advantage: it’s applicable to
any library used by malware authors and, in case you already know which libraries were used, it’s worth
trying once. Let me provide a simple and concrete example of what I described here:
1. Create a new C++ project (Console Application), give it a name, and choose a directory to save all
files of the solution. Usually, I create a separate folder for each project. I could have added more
headers, but this example is only a demonstration. Note: the code below is NOT mine and, usually,
you won’t have enough time to learn about the library, but you should remember: the code could
be very short, only including headers, because we are concerned with headers and nothing more.
4. Change the “Include Directories” and “Library Directories” settings as explained previously and as
shown below:
[Figure 63] VS 2015: VC++ Directories settings: Includes and Libraries Directories
5. Turn off any kind of optimization:
a. <program name>.lib
b. <program name>.pdb
8. Open the <program name>.lib in IDA Pro and choose <application name>.obj:
9. Choose COFF (Windows AMD64) and, after this screen, choose to load the debugging symbols:
[Figure 69] IDA Pro: pseudo code before executing the IDC file
Now readers should check the same instructions from line 146 to line 195 after the script has been
executed (it’s pretty different, isn’t it?!):
[Figure 70] IDA Pro: pseudo code AFTER executing the IDC file
If you need to know any new type definition, double click on it:
▪ #include <boost/beast/core/stream_traits.hpp>
▪ #include <boost/beast/core/ostream.hpp>
▪ #include <boost/beast/core/buffers_to_string.hpp>
▪ #include <boost/beast/core/file_win32.hpp>
▪ #include <boost/beast/http/fields.hpp>
▪ #include <boost/beast/http/type_traits.hpp>
▪ #include <boost/beast/http/message.hpp>
▪ #include <boost/beast/http/status.hpp>
▪ #include <boost/beast/http/basic_parser.hpp>
▪ #include <boost/beast/http/impl/basic_parser.hpp>
▪ #include <boost/beast/http/impl/error.hpp>
▪ #include <boost/static_string.hpp>
There are many other hacks that could eventually be helpful. One such approach for adding external
type definitions to another idb database (in our case, the malware idb database), which I learned from a
comment by Igor Skochinsky (Hex-Rays developer -- @IgorSkochinsky) on the Stack Exchange | Reverse
Engineering website, is the following one:
a. Using IDA Pro, open ConsoleApplication1.idb, which belongs to the same application that we’ve created
using Visual Studio and in which we included all headers from the Boost C++ Library.
b. Keeping the .idb file open, copy <program name>.til to another folder (I’ve copied it into the same
folder of IDA Pro because there I have the help’s support).
d. Copy the resulting .til file to /til/pc folder (C:\Program Files\IDA Pro 8.0\til\pc).
e. Open the second .idb (in this case, the idb database of the malware we’re analyzing) and add this
new type library to the list (SHIFT + F11 and then INS).
f. Although the new type definitions won’t be shown in the Local Types tab, fortunately they will be
available through the Structures tab (SHIFT+F9) by inserting a new structure (INS) and choosing “Add
standard structure”. Additionally, these structures are available to the decompiler too.
According to Igor, this procedure is not officially supported and, like any other hack, it could cause many
issues and conflicts in the idb file, so the same recommendation is valid: make a backup of the .idb file
before testing this approach.
Once again: these last two approaches are experimental and they may or may not work. However, when you’re
analyzing malware that includes libraries and new types, you could spend a considerable amount of
time understanding those new types, so these approaches might (maybe...maybe...) be useful for your
analysis. In my experience, I never (ever) disregard any approach, suggestion or trick while
reversing code because it can always be useful at some moment, so any new knowledge is always
valuable and welcome.
The string “Vcffi2rj6t15”, which could be a possible key, is assigned to the v120 local variable and,
afterwards, this v120 variable is assigned to the v4, v5, v6 and v7 local variables. Therefore, I renamed:
▪ v120 to var_key
▪ v4 to var_key_1
▪ v5 to var_key_2
▪ v6 to var_key_3
▪ v7 to var_key_4
If readers pay close attention, the same string (Vcffi2rj6t15) is passed to sub_1800012D0 in each of the
four times it’s called:
Of course, this “scheme” might change or have variants, but most of the time it happens this way.
Therefore, it seems that in Figure 75 the parameters of sub_180001360 have the following meanings:
▪ a1 is the context
▪ a2 is the key
▪ a3 is the key length
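A routine that receives a context, a key and a key length has the usual shape of an RC4 key-scheduling function. As a reference for comparison with the decompiled code, the sketch below is a textbook RC4 in Python; it is not a transcription of sub_180001360, and the function names rc4_init and rc4_crypt are hypothetical. Here the context is the 256-byte state, and the key length is implicit in the bytes object.

```python
def rc4_init(key):
    """Key-scheduling (KSA): build the 256-byte state (context) from the key."""
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) & 0xFF
        state[i], state[j] = state[j], state[i]
    return state


def rc4_crypt(state, data):
    """Keystream generation (PRGA) XORed with data; same call decrypts."""
    state = state[:]  # work on a copy so the caller's context stays untouched
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + state[i]) & 0xFF
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    # Well-known public test vector, unrelated to the sample.
    context = rc4_init(b"Key")
    print(rc4_crypt(context, b"Plaintext").hex())
```

When reading the decompiled routine, the two telltale signs are a loop that fills a 256-entry table with 0..255 and a second 256-iteration loop that swaps entries using the key indexed modulo its length.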
Proceeding with the analysis, let’s examine sub_180001670 -> sub_1800014E0:
[Figure 78] First part of script to extract and decrypt C2 List and botnet
If readers compare the output above against the list offered by Triage, retrieved through the malwoverview
tool or even through the online report (https://tria.ge/220616-tng2tsfhfl), they will notice that there’s a perfect
match for each IP address and botnet.
Actually, many quite interesting aspects are present in this binary and, undoubtedly, we could extend this
analysis over many pages. Anyway, there’s nothing special about it and, at the end of the day, it’s only a matter
of reading code, renaming variables and functions, re-typing, and interpreting APIs along the analysis.
Some functions that could be interesting:
▪ sub_180039CC0: read files
▪ sub_18004A32C: read files
▪ sub_180039B90: write file
▪ sub_18004AA18: write file
▪ sub_18003DF44: create process
▪ sub_180050380: enumerate processes
▪ sub_1800124B0: socket (receive data)
▪ sub_180012AA0: socket (send data)
▪ sub_180012030: socket configuration
▪ sub_180015360: socket configuration
▪ .data section: there are two other PE executables embedded.
About the last statement, it’s quite easy to check it by running a simple strings.exe command:
[Figure 81] Confirming that there are two other executables within the extracted payload
One of the many ways to extract PE files and other types of objects from a given binary is by using Binary
Refinery (https://github.com/binref/refinery), which is a sort of command-line version of CyberChef.
To install Binary Refinery:
▪ pip install -U binary-refinery
or
▪ python -m venv test
▪ . ./test/bin/activate
▪ (test) $ pip install -U git+https://github.com/binref/refinery.git
To extract both embedded executables, we’re going to extract the first one into a file (payload_1) and,
afterwards, the second one into another file (payload_2), as shown below:
▪ emit mas_5_unpacked.bin | carve-pe -r | dump payload_1
▪ emit payload_1 | carve-pe | dump payload_2
Using Binary Refinery, we’re able to visualize the necessary information about both files:
As readers were able to notice, we extracted a 32-bit DLL and a 64-bit DLL payload. Don’t worry about payload_1
(32-bit DLL) having the second DLL inside it because, during execution, only the first PE is regarded.
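To make the carving step less of a black box, the following Python sketch locates embedded PE images by looking for the MZ signature and validating the PE signature pointed to by e_lfanew, then reports the Machine field to tell 32-bit from 64-bit images. This is a simplified illustration and not how Binary Refinery implements carve-pe; the function name find_embedded_pe is hypothetical, and it only reports offsets instead of computing image sizes.

```python
import struct

# IMAGE_FILE_HEADER.Machine values for the two architectures of interest.
MACHINE_NAMES = {0x014C: "x86 (32-bit)", 0x8664: "x64 (64-bit)"}


def find_embedded_pe(data):
    """Yield (offset, machine) for every plausible PE header found in data."""
    pos = data.find(b"MZ")
    while pos != -1:
        if pos + 0x40 <= len(data):
            # e_lfanew (offset 0x3C of the DOS header) points to "PE\0\0".
            (e_lfanew,) = struct.unpack_from("<I", data, pos + 0x3C)
            nt = pos + e_lfanew
            if data[nt:nt + 4] == b"PE\x00\x00" and nt + 6 <= len(data):
                # Machine is the first field after the PE signature.
                (machine,) = struct.unpack_from("<H", data, nt + 4)
                yield pos, MACHINE_NAMES.get(machine, hex(machine))
        pos = data.find(b"MZ", pos + 1)
```

Calling list(find_embedded_pe(data)) on the bytes of the unpacked payload would list every candidate offset, including the outer file itself at offset 0 if it is a PE. A real carver also has to compute where each image ends, which is the part the tool handles for us.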
Do readers want to extract and decrypt the C2 List? It’s trivial because we already have the virtual
address, its respective size and, most important, the RC4 key (read page 66). Therefore, execute:
13. Conclusion
In this article, I presented the first 64-bit sample of the MAS series, as well as concepts related to x64
Assembly and COM (Component Object Model), handled difficulties in reversing code that uses external
libraries, and extracted and decrypted the C2 List and its respective botnet.
Recently a professional (Twitter: @bushuo12) translated the first three articles of this series into Chinese
and, in case you’re able to read the language, the Chinese versions follow below:
▪ (MAS): Article 1 -- https://www.yuque.com/docs/share/619f03dc-1bc9-42f7-828e-fc17d82786e7
▪ (MAS): Article 2 -- https://www.yuque.com/docs/share/d16efbd6-e2e6-4325-9b9e-23c613bd2280
▪ (MAS): Article 3 -- https://www.yuque.com/docs/share/7dca2583-8456-4ca5-8862-0524fc6faaf9
Just in case you want to keep in touch:
▪ Twitter: @ale_sp_brazil
▪ Blog: https://exploitreversing.com
Keep reversing and I’ll see you next time!
Alexandre Borges