Aww, another month or more has apparently passed right before my eyes. As some of you might have noticed, the school year has already ended (about two weeks ago), which finally lets me carry out some more research and get back to this blog. I expect a few more posts to be written in the coming days; hopefully this will work out.

In this particular post, I would like to describe some curiosities I found inside the kernel32.dll (and KernelBase.dll in the case of Windows 7 RC) and ntdll.dll default Windows libraries. Not only do I want to share the ideas that occurred to me during this small research, but I would also like to hear about new techniques of making use of what I found. Feel free to add new facts or ideas regarding this post, as I might have overlooked some obvious assumption. Remember that this is not, and shouldn’t be considered, a thorough report. To make everything clear, the entire post covers the situation on x86 versions of Microsoft Windows.

Actually, I want to write about a few things, all of which are listed below:

  • Modifying the initial main thread CONTEXT structure via static DllMain code
  • LoadLibrary/FreeLibrary APIs handling the module reference counter
  • Undocumented FreeLibrary(AndExitThread) functionality and practical ways of making use of it
  • Fooling module management APIs by crafting the TEB structure

Although I would like to publish one post covering all the points listed above, I’ll instead create a short series of posts, each describing a separate mechanism. This should both help me manage the information and keep the reader from being overwhelmed by the contents.

DllMain function and the initial process context

One of the most interesting things I’ve recently come across was a short post on Skywing’s blog about a neither documented nor well-known fact: what the lpvReserved DllMain parameter really contains. It isn’t my intention to rewrite his post, so I’ll just try to summarise the main point (although I encourage you to read it first, if you haven’t done so yet). The author uncovers a curious fact that has been documented very poorly by Microsoft: the meaning of the mysterious lpvReserved parameter of the standard DllMain module entry function:

BOOL WINAPI DllMain(
  __in  HINSTANCE hinstDLL,
  __in  DWORD fdwReason,
  __in  LPVOID lpvReserved
);

Due to its type and name, one could assume the argument is unimportant to a programmer and is presumably set to NULL. However, the documentation gives a small hint that there’s more to this value:

If fdwReason is DLL_PROCESS_ATTACH, lpvReserved is NULL for dynamic loads and non-NULL for static loads.

If fdwReason is DLL_PROCESS_DETACH, lpvReserved is NULL if FreeLibrary has been called or the DLL load failed and non-NULL if the process is terminating.

As the above quotation states, our library is able to tell a static load from a dynamic one and execute the proper piece of code for the current situation. That is basically all the information Microsoft itself provides. As Skywing shows, however, more data is passed to our routine during process initialization: the LPVOID lpvReserved parameter can also be treated as PCONTEXT lpvContext! The CONTEXT structure it points to is the main thread’s initial processor context, which is applied using the NtContinue system call after the PE loader finishes process initialization. Since the mechanism behind this has already been described, I will only try to give some examples of how this little curiosity can be used in practice. On the other hand, one must remember that even though this ‘feature’ is confirmed to work on current Windows versions, it is not guaranteed to stay in newer systems forever.
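To make the two cases concrete, here is a minimal portable sketch of the decision a DllMain could make on DLL_PROCESS_ATTACH. It is only a model: context_sketch is a hypothetical three-field stand-in for the real CONTEXT from winnt.h (so the code compiles anywhere), and classify_attach is an illustrative helper, not a Windows API. On Windows one would simply cast lpvReserved to PCONTEXT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily trimmed stand-in for the x86 CONTEXT structure;
   the real one (winnt.h) contains many more fields. */
typedef struct {
    uint32_t Eip;
    uint32_t Esp;
    uint32_t EFlags;
} context_sketch;

enum load_kind { LOAD_DYNAMIC, LOAD_STATIC };

/* Documented part: on DLL_PROCESS_ATTACH, lpvReserved is NULL for dynamic
   loads and non-NULL for static loads. Undocumented part (per Skywing):
   for static loads it points at the main thread's initial context. */
static enum load_kind classify_attach(void *lpvReserved, context_sketch **ctx_out) {
    assert(ctx_out != NULL);
    if (lpvReserved == NULL) {
        *ctx_out = NULL;                 /* LoadLibrary: no context to play with */
        return LOAD_DYNAMIC;
    }
    *ctx_out = (context_sketch *)lpvReserved;
    return LOAD_STATIC;
}
```

Any change made through the returned pointer before DllMain returns is what the later scenarios rely on, since the loader applies this context when it resumes the main thread.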

Let’s take a look at some of the possible scenarios:

  • Unpackers altering the EntryPoint field

If an exe-packer developer took advantage of the fact that the DllMain function has full control over the main thread’s initial context, this could be used to create a special unpacking library. After compressing the main Portable Executable file, one of its import records would be altered to reference unpack.dll. The “additional” module’s goal would be to dynamically allocate a small piece of memory, copy the loader’s code there and change lpvContext->Eip to point to the new code. After the load stub generated by unpack.dll finished its execution, every single memory page allocated by the unpacker-related code would be freed (including the unpack.dll image section). The purpose of such an approach is to keep the PE file as clean as possible (modifying the smallest number of header fields etc). The same goes for the in-memory application layout, which would be intended not to contain any additional memory blocks compared to the original process.
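The core of such an unpack.dll could look roughly like the portable sketch below, assuming a hypothetical context_sketch model with only an Eip field. It copies the loader stub into fresh memory and diverts the initial instruction pointer there, remembering the old value so the stub can resume later. A real implementation would allocate executable memory (for example with VirtualAlloc and PAGE_EXECUTE_READWRITE) rather than use malloc; install_unpack_stub is an illustrative name only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the one CONTEXT field this sketch touches. */
typedef struct {
    uintptr_t Eip;
} context_sketch;

/* Copies the stub into a new buffer and points the initial Eip at it.
   The previous Eip (a system-DLL thunk that later calls the EntryPoint)
   is saved so the stub can jump back there when it is done. Returns the
   buffer so it can be released afterwards, or NULL on failure. */
static void *install_unpack_stub(context_sketch *ctx, const uint8_t *stub,
                                 size_t len, uintptr_t *saved_eip) {
    assert(ctx != NULL && stub != NULL && len > 0 && saved_eip != NULL);
    void *mem = malloc(len);          /* stand-in for VirtualAlloc(RWX) */
    if (mem == NULL)
        return NULL;
    memcpy(mem, stub, len);
    *saved_eip = ctx->Eip;            /* where execution was going */
    ctx->Eip = (uintptr_t)mem;        /* ...and where it goes now */
    return mem;
}
```

Keeping the buffer pointer around is what would let the stub free its own pages (and unpack.dll) at the end, as described above.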

  • Choosing EntryPoint depending on the system version

Since the initial context contents clearly differ between various versions of Microsoft Windows, this fact could be used as an OS-detection technique. There are a number of fields that could be taken into account, such as lpvContext->Eip (pointing at an additional execution layer that runs before the EntryPoint, e.g. somewhere inside kernel32.dll on Windows XP) or the stack contents (pointed to by lpvContext->Esp). If an application had problems with some specific Windows version, or some operations had to be handled differently, the developer could simply create a separate entry function for each system and map the context values characteristic of a given OS to the appropriate EP routine.
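A minimal sketch of that mapping: a table of initial-Eip ranges, each tied to an entry routine, with a fallback for unknown systems. The ranges and the entry_xp / entry_vista / entry_generic routines are hypothetical placeholders; real values would have to be collected per Windows build by inspecting lpvContext->Eip from DllMain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*entry_fn)(void);

/* Hypothetical per-OS entry routines (placeholders). */
int entry_xp(void)      { return 1; }
int entry_vista(void)   { return 2; }
int entry_generic(void) { return 0; }

/* One row per known system: if the initial Eip falls into
   [eip_low, eip_high), use the matching entry routine. */
typedef struct {
    uint32_t eip_low;
    uint32_t eip_high;
    entry_fn entry;
} os_entry;

static entry_fn pick_entry(const os_entry *table, size_t n,
                           uint32_t initial_eip, entry_fn fallback) {
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (initial_eip >= table[i].eip_low && initial_eip < table[i].eip_high)
            return table[i].entry;
    return fallback;              /* unknown system: default EntryPoint */
}
```

In the DllMain scenario, the chosen pointer would then be written into lpvContext->Eip.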

  • Debug Registers modification

Playing with the Debug Register values could serve both as a software protection layer and as an additional debugging aid. Up to 4 hardware breakpoints can be set up using the DRx registers. Having control over these fields makes it possible to use this debug functionality from the very beginning of the application’s execution path. Beyond its practical uses, it can also be used to fool a reverser trying to analyze our code: I would probably have a lot of trouble finding out how the breakpoints magically appear at the EntryPoint, unless I thoroughly checked the imports’ DllMain code first.
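Setting such a breakpoint means putting an address into one of Dr0..Dr3 and enabling it in Dr7. The sketch below computes the Dr7 bits following the Intel SDM layout (local-enable bit Ln at bit 2n, R/Wn at bits 16+4n, LENn at bits 18+4n). It is plain arithmetic and does not touch any real context; in the DllMain scenario, the result would be stored in lpvContext->Dr7, and the context’s ContextFlags would likely also need CONTEXT_DEBUG_REGISTERS for the values to be applied.

```c
#include <assert.h>
#include <stdint.h>

/* DR7 R/Wn condition values. Execution breakpoints must use length 1. */
enum { DR_EXEC = 0, DR_WRITE = 1, DR_IO = 2, DR_RW = 3 };
/* DR7 LENn values: 0 = 1 byte, 1 = 2 bytes, 3 = 4 bytes. */
enum { DR_LEN1 = 0, DR_LEN2 = 1, DR_LEN4 = 3 };

/* Returns dr7 with local breakpoint `index` (0..3) enabled and its
   condition and length fields replaced. */
static uint32_t dr7_enable_local(uint32_t dr7, unsigned index,
                                 unsigned rw, unsigned len) {
    assert(index < 4 && rw < 4 && len < 4);
    unsigned shift = 16 + 4 * index;              /* R/Wn, then LENn */
    dr7 &= ~(UINT32_C(0xF) << shift);             /* clear old fields */
    dr7 |= (uint32_t)(rw | (len << 2)) << shift;
    dr7 |= UINT32_C(1) << (2 * index);            /* Ln: local enable */
    return dr7;
}
```

For example, an execution breakpoint in Dr0 yields 0x1, while a 4-byte write breakpoint in Dr1 yields 0xD00004.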

  • Creating an EntryPoint detour

Another security-related idea is to set up a kind of “trampoline” that would be executed just before the OEP and return to the original address right after doing its job (accomplished by simply changing lpvContext->Eip). The additional code could do whatever the author wanted it to do, e.g. perform some additional executable integrity checks (checksums and the like). I find this technique very effective for a few reasons. Firstly, what I’m writing about here is poorly documented and unknown to most people. Secondly, any “external” code run before the EP seems to be almost invisible to those using debuggers like OllyDbg, which sets an initial breakpoint on the EntryPoint value taken from the PE header and waits for it to be hit. If the context modification routine were decently obfuscated, one could have real difficulty dealing with such a trick.
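The control flow of such a detour can be modelled portably as below. Here the “instruction pointer” is a function pointer inside a hypothetical context_sketch, integrity_check is a placeholder, and original_start stands in for whatever code the initial Eip originally pointed to. A real trampoline would restore registers and jump (not call) back to the saved address.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*code_ptr)(void);

/* Hypothetical model: the context holds only the instruction pointer. */
typedef struct {
    code_ptr Eip;
} context_sketch;

static code_ptr g_original_eip;   /* where execution was supposed to go */
static int g_check_passed;        /* result of the placeholder check */
static int g_original_ran;        /* set by the stand-in startup code */

/* Stand-in for the code the initial Eip originally pointed to. */
static void original_start(void) { g_original_ran = 1; }

/* Placeholder for checksums or other integrity checks. */
static int integrity_check(void) { return 1; }

/* Runs first, then hands control back to the saved address. */
static void trampoline(void) {
    g_check_passed = integrity_check();
    if (g_check_passed)
        g_original_eip();
}

/* What DllMain would do: remember the old Eip, install the trampoline. */
static void install_detour(context_sketch *ctx) {
    assert(ctx != NULL && ctx->Eip != NULL);
    g_original_eip = ctx->Eip;
    ctx->Eip = trampoline;
}
```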

  • Setting the TF bit in the EFLAGS register with an SE handler

Yet another anti-debugging trick, making use of the EFLAGS bit mask, particularly the Trap Flag. This “trick” is old and well known in the community, though it can be quite a surprise when applied to the initial flags value. Before the first program instruction executes, an EXCEPTION_SINGLE_STEP exception would be triggered and execution would be passed to the already-installed handlers. The absence of this exception would indicate the presence of an active debugger consuming it.
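The logic boils down to three pieces: set bit 8 (TF) in the initial EFlags, note in the handler that the single-step exception arrived, and check that note once the entry code runs. The portable sketch below models those pieces; on_exception only mimics what a real handler (for example one registered with AddVectoredExceptionHandler, returning EXCEPTION_CONTINUE_EXECUTION) would do, and the exception code is repeated as a local constant so the code compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_TF UINT32_C(0x100)                    /* Trap Flag, bit 8 */
#define SKETCH_EXCEPTION_SINGLE_STEP 0x80000004u     /* EXCEPTION_SINGLE_STEP */

static_assert(EFLAGS_TF == (UINT32_C(1) << 8), "TF is bit 8 of EFLAGS");

static volatile int g_single_step_seen;

/* Value to store in lpvContext->EFlags. */
static uint32_t with_trap_flag(uint32_t eflags) {
    return eflags | EFLAGS_TF;
}

/* Models the handler: returns 1 if the exception was ours. */
static int on_exception(uint32_t code) {
    if (code != SKETCH_EXCEPTION_SINGLE_STEP)
        return 0;                 /* not ours, let others handle it */
    g_single_step_seen = 1;
    return 1;
}

/* Checked from the entry code: no exception means something ate it. */
static int debugger_suspected(void) {
    return !g_single_step_seen;
}
```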

  • Zeroing specific segment registers

I consider the next technique pretty universal in terms of possible applications. Setting one of the segment registers to NULL could be used for a great variety of purposes, such as monitoring the application’s memory references, code obfuscation through exceptions and more. The advantages of controlling the segment registers in a protected mode environment are going to be covered in a separate post.

  • Passing information between two or more static modules

As Gynvael Coldwind suggested, the fact that all the libraries statically loaded into a new process operate on the same piece of memory could be used to let the modules “communicate” with each other (or rather, pass information to one another). To do this, one could use the existing main thread stack. I am curious about realistic scenarios making use of this idea and am waiting for any interesting concepts : P

  • Running the application using ret-based programming

Last but not least, an idea that occurred to me just a few minutes ago: return-oriented programming! This subject has already been thoroughly documented by many independent researchers (and a few presentations have been given at more or less popular conferences). If you want to get familiar with the basics of the technique, you are strongly advised to read Generalizing Return-Oriented Programming to RISC and Return-Oriented Exploiting. In general, return-oriented programming aims to build fully functional code by chaining together assembly snippets that end with a “ret” instruction. The idea itself is just a way of doing things, so it is not a generic solution to one specific problem, but rather a tool that can be used for various purposes. However, code of this type is mainly seen in very advanced ret-to-libc exploits. In my case, I would like to use it as an obfuscation (anti-reversing) technique, additionally supported by the DllMain trick.

The very first task is to generate the initial stack data, containing the pre-generated return-oriented components of separate “opcode” instructions: executable memory pointers together with their parameters. You can learn how to generate such code from the aforementioned papers. With this code in hand, the only remaining objective is to replace the lpvContext->Esp value with the dynamically generated stack and make the lpvContext->Eip field point to a ret (0xc3) instruction. The last step could also be simplified by setting Eip to the top stack value and advancing Esp past it, which emulates the ret without having to find and execute a real “ret” instruction. As soon as all the modifications are applied, the process should begin its execution right from the indirect code placed on the emulated stack. With such an approach, the coder can be sure that the original EntryPoint will never be reached unless the return-oriented code decides to do so. Used together with the DllMain hack, this technique could also serve to build something similar to a simple VM execution environment, with the bytecode (in the form of return-oriented code) placed on the stack. Even though dynamic analysis (debugging) of such code does not look hopeless, I think static analysis could be considered so.
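To illustrate the “VM with bytecode on the stack” idea without real gadgets, the sketch below simulates it portably: each stack slot is either a “return address” (a gadget) or an operand, gadgets pop their own operands like `pop eax; ret`, and the dispatcher loop plays the role of repeated `ret`. The gadgets and the single emulated register are hypothetical; a real chain would consist of addresses of instruction sequences already present in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rop_state;
typedef void (*gadget)(struct rop_state *s);

/* One stack slot: either a "return address" (gadget) or an operand. */
typedef union {
    gadget g;
    intptr_t imm;
} slot;

typedef struct rop_state {
    intptr_t eax;        /* single emulated register */
    const slot *esp;     /* next slot to be consumed */
    int halted;
} rop_state;

/* Each gadget mimics a short sequence ending in `ret`. */
static void g_pop_eax(rop_state *s)     { s->eax = (s->esp++)->imm; }
static void g_add_eax_imm(rop_state *s) { s->eax += (s->esp++)->imm; }
static void g_neg_eax(rop_state *s)     { s->eax = -s->eax; }
static void g_halt(rop_state *s)        { s->halted = 1; }

/* Emulates `ret` in a loop: pop a slot, transfer control to it.
   Starting here corresponds to Esp = crafted stack, Eip = a ret. */
static intptr_t rop_run(const slot *stack) {
    assert(stack != NULL);
    rop_state s = { 0, stack, 0 };
    while (!s.halted) {
        gadget next = (s.esp++)->g;
        next(&s);
    }
    return s.eax;
}
```

With the stack [g_pop_eax, 40, g_add_eax_imm, 2, g_halt], the run leaves 42 in the emulated eax. Note that the program’s logic lives entirely in the data, which is exactly what makes static analysis unpleasant.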

In general, the above list presents every single thing I was able to think of so far.

What is more, I managed to prepare a small bonus: a very simple KeygenMe application showing how some of the presented ideas work on a real computer. One note: the global protection scheme is what matters here and deserves more attention than the application’s mechanics (CryptoAPI functions and so on). If you encounter any problems with the provided executable, please let me know (so far, it has been confirmed to work on Windows XP SP3 and Windows Vista SP2). The package can be downloaded from here.

Furthermore, I would be very pleased to hear any other suggestions regarding this subject (if you find any kind of mistake, feedback will also be appreciated).

This is actually all for now; I will do my best to keep writing posts about other interesting Windows internals I often come across ; )

As a loyal user of the standard Windows shell (explorer.exe), I often run into problems with the number of windows open on one desktop. Since my notebook is hardly ever shut down, neither is the user’s shell. After a few working evenings, I often have difficulty locating the desired windows. With something like 40-50 of them open, it is usually hard to switch between the internet browser, IDA, the programming IDE, virtual machines, the file manager and so on. The worst thing for me turned out to be looking for the Total Commander window (the one I use most frequently). Such a situation obviously caused a lot of wasted time and, consequently, frustration.

I came up with a few possible solutions, listed below:

  1. Having the taskbar items sorted at any time, thus making the current work state much clearer.
  2. Creating a set of system-wide hotkeys, each responsible for setting focus on the associated window or a group of windows.
  3. Start using some kind of Virtual Desktop software and reorganize the whole work environment.

All of them sound pretty good, in fact, and each is worth describing in detail. What is more, there is a great amount of free software designed just to help users with such problems. However, as everyone should already know, the best solution is the self-made one 😉

Although all of the ideas have their own advantages, disadvantages and difficulty levels, that’s not the subject of this post. This time, I would like to focus on one particular approach, listed as the 2nd, but in a slightly less complicated form. To be exact, I will show how to make ONE specific application perform some actions in response to a hotkey signal. As you might have guessed, in my case this application is Total Commander. The hotkey we will use to bring up totalcmd will be ALT+1.

What I eventually wanted to achieve using this slight hack was to:

  • Have an ALT+1 hotkey registered in the system.
  • Handle the incoming hotkey events in a loop and perform specific actions (set the focus on totalcmd’s window) in some cases.

Since I wanted to be able to use the key combination at any time, the message loop would also have to be active all the time. To be honest, having an additional process running on my system just to handle one simple window event is not a good option for me. As we need only one thread to keep the event handling loop active, we can easily put it inside the affected process itself (totalcmd.exe). Thus, before beginning the real work, we have to ensure we’ve got an active thread running inside our target. This can be accomplished in a few ways (what a surprise!):

  • Spoof one of the program’s imported DLL files, redirect all of its exports to original functions and start a new thread in the DllMain routine.
  • Do the same thing at runtime: use CreateRemoteThread function to inject our own DLL module into the target’s address space and perform some actions inside DllMain.
  • Patch the application executable directly so that it creates the new event-loop thread and deals with the hotkeys. In this case no additional, external files are needed.

As having external DLL code executed in the process context gives the greatest control over the execution flow, I chose the first option. What makes it different from the second one is that the latter would need an additional ‘loader’ program to inject the DLL into totalcmd, while the DLL-spoofing technique takes advantage of the fact that the attacker’s DLL gets loaded automatically by the Windows loader.

The next step is to choose the library to spoof. Note that only non-system DLLs can be spoofed. The list of modules considered “system” ones can be found in the registry:

HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs

After comparing the totalcmd imports with the list above, we can extract the names of the libraries we are able to spoof:

  • comctl32.dll
  • mpr.dll
  • winmm.dll
  • winspool.dll
  • version.dll (Vista only).

The “Vista only” note means that the version.dll library was one of the KnownDLLs up to Windows XP, but was removed from the list in Windows Vista. As my intention was to make the hack work on Vista, and I didn’t know how the situation differed on Windows XP, I chose the VERSION module as the spoofing target.
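The comparison itself is a simple set difference with case-insensitive names. The portable sketch below shows it; the import and KnownDLLs lists in the test are illustrative only (they are not the real totalcmd import table or registry contents), and spoof_candidates is a hypothetical helper. On Windows, the known names would be read from the registry key above, for example with RegEnumValue.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* ASCII case-insensitive equality: DLL names are not case-sensitive. */
static int name_equal(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++, b++;
    }
    return *a == *b;              /* both strings must end together */
}

/* Writes into `out` every import not listed in `known` (the KnownDLLs
   entries); those are the spoofing candidates. Returns their number. */
static size_t spoof_candidates(const char *const *imports, size_t n_imports,
                               const char *const *known, size_t n_known,
                               const char **out) {
    assert(out != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n_imports; i++) {
        int is_known = 0;
        for (size_t k = 0; k < n_known && !is_known; k++)
            is_known = name_equal(imports[i], known[k]);
        if (!is_known)
            out[count++] = imports[i];
    }
    return count;
}
```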

If we want to spoof a DLL with one provided by us, the new library must have identical exports (both their names and ordinal numbers). As I aimed to use the MinGW package to compile this project, the DLL-creation details below are specific to dllwrap. First of all, I created a version.def file containing the exports together with their forwarding targets. We are not interested in injecting our code into any of the exports, only in having the DllMain function called, so all the exported functions simply forward to the original ones. The following listing presents the final version.def file:

EXPORTS
GetFileVersionInfoA= myVersion.GetFileVersionInfoA     @1
GetFileVersionInfoExW= myVersion.GetFileVersionInfoExW     @2
GetFileVersionInfoSizeA= myVersion.GetFileVersionInfoSizeA     @3
GetFileVersionInfoSizeExW= myVersion.GetFileVersionInfoSizeExW     @4
GetFileVersionInfoSizeW= myVersion.GetFileVersionInfoSizeW     @5
GetFileVersionInfoW= myVersion.GetFileVersionInfoW     @6
VerFindFileA= myVersion.VerFindFileA     @7
VerFindFileW= myVersion.VerFindFileW     @8
VerInstallFileA= myVersion.VerInstallFileA     @9
VerInstallFileW= myVersion.VerInstallFileW    @10
VerLanguageNameA= myVersion.VerLanguageNameA    @11
VerLanguageNameW= myVersion.VerLanguageNameW    @12
VerQueryValueA= myVersion.VerQueryValueA    @13
VerQueryValueW= myVersion.VerQueryValueW    @14
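Since every line of such a listing follows the same pattern, it could also be generated from a list of export names. The sketch below is a hypothetical helper (not part of dllwrap or MinGW) that emits lines in the same shape, whitespace aside, numbering ordinals in list order; this matches the listing only if the names are given in the original ordinal order.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Formats one forwarding export: "<name>= <target>.<name> @<ordinal>".
   Returns snprintf's result (length of the full line). */
static int def_forwarder(char *buf, size_t size, const char *name,
                         const char *target_module, int ordinal) {
    assert(buf != NULL && name != NULL && target_module != NULL);
    assert(ordinal > 0 && strchr(name, ' ') == NULL);
    return snprintf(buf, size, "%s= %s.%s @%d", name, target_module, name, ordinal);
}

/* Writes a complete .def file: the EXPORTS header, then one forwarder
   per name, with ordinals starting at 1. */
static void write_def(FILE *out, const char *const *names, size_t n,
                      const char *target_module) {
    char line[256];
    fputs("EXPORTS\n", out);
    for (size_t i = 0; i < n; i++) {
        def_forwarder(line, sizeof line, names[i], target_module, (int)i + 1);
        fprintf(out, "%s\n", line);
    }
}
```

Calling write_def(stdout, names, 14, "myVersion") with the fourteen VERSION export names would print a file equivalent to the one above.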

As you can see, there are only 14 exported addresses, all of them pointing to their equivalents inside the original library, myVersion.dll. You can read more about DLL export forwarding in [1]. The last thing we need to build our fake version.dll file is the hotkey-handling code itself. Let’s begin with the DllMain part:

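#include <windows.h>
DWORD WINAPI MessageOnlyWindow(LPVOID arg);  // defined below

// extern "C" keeps the name unmangled, so the DLL startup code finds our DllMain.
extern "C" BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD dwReason, LPVOID lpvReserved)
{
  if(dwReason == DLL_PROCESS_ATTACH)  // start the hotkey thread once, at load time
    CreateThread(NULL, 0, MessageOnlyWindow, NULL, 0, NULL);
  return TRUE;
}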

Nothing really interesting: we just create a thread starting in the MessageOnlyWindow function. Note that CreateThread is called only once, right after the application is launched (since DLL_PROCESS_ATTACH is delivered to every statically loaded module once, before the program’s EntryPoint is called). Let’s go a step further:

LRESULT CALLBACK MainWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam);  // defined below

DWORD WINAPI MessageOnlyWindow(LPVOID arg)
{
  MSG msg;
  WNDCLASS wndclass;

  wndclass.style = 0;
  wndclass.lpfnWndProc = MainWndProc;
  wndclass.cbClsExtra = 0;
  wndclass.cbWndExtra = 0;
  wndclass.hInstance = GetModuleHandle(0);
  wndclass.hIcon = NULL;
  wndclass.hCursor = 0;
  wndclass.hbrBackground = 0;
  wndclass.lpszMenuName = NULL;
  wndclass.lpszClassName = TEXT("TotalcmdBringToTop");

  if(RegisterClass(&wndclass) == 0)
    return FALSE;

  // HWND_MESSAGE as the parent makes this a message-only window.
  if(CreateWindow(TEXT("TotalcmdBringToTop"), TEXT("TotalcmdBringToTop"), 0, CW_USEDEFAULT, CW_USEDEFAULT,
                  CW_USEDEFAULT, CW_USEDEFAULT, HWND_MESSAGE, NULL, GetModuleHandle(0), NULL) == NULL)
    return FALSE;

  // Standard message loop; WM_HOTKEY messages arrive through it.
  while(GetMessage(&msg, NULL, 0, 0))
  {
    DispatchMessage(&msg);
  }
  return (DWORD)msg.wParam;
}

It actually looks like a standard function creating a window (do we want to create any window? ^_*) and running an event-handling loop. The only interesting thing here is the HWND_MESSAGE constant passed as a CreateWindow argument. It indicates that we want to create a message-only window.

Such windows are usually created to handle events that are not related to any particular visible window (read more in [2]). In this case, I used it to deal with the WM_HOTKEY events generated every time a keyboard hotkey is used. Everything should become clear after seeing the last part of the library code:

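LRESULT CALLBACK MainWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
  switch (uMsg)
  {
    case WM_CREATE:  // bind ALT+1 to this window under the ID 0x1337
      RegisterHotKey(hWnd, 0x1337, MOD_ALT, '1');
      break;
    case WM_HOTKEY:
    {
      HWND hMainWnd = FindWindow(TEXT("TTOTAL_CMD"), NULL);
      if(hMainWnd != NULL) { ShowWindow(hMainWnd, SW_MAXIMIZE); SetForegroundWindow(hMainWnd); }
      break;
    }
    case WM_CLOSE:
      UnregisterHotKey(hWnd, 0x1337);
      break;
    default:
      return (DefWindowProc(hWnd, uMsg, wParam, lParam));
  }
  return 0;
}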

What can be seen here is a hotkey being registered and given the 0x1337 identifier during window initialization (RegisterHotKey @ MSDN). The new hotkey is associated with the current window through the hWnd handle passed as the first argument. From now on, we’re guaranteed to receive a WM_HOTKEY message whenever the defined combination (MOD_ALT+1) is pressed. When the callback function receives such an event, the Total Commander window handle is obtained (TTOTAL_CMD is the class of the application’s main window) and then used to maximize and set focus on that window. Note that the FindWindow approach is quite unreliable, since it returns only one window handle, which is a problem when many Total Commander instances are running on one machine. This leaves us with a design problem: which of the windows should be chosen to focus on? The code can easily be extended to perform more advanced actions; this one is just a concept of how an application’s behaviour can be customized to fit our needs. When something goes wrong and a WM_CLOSE event occurs (for example, when Total Commander itself decides to exit), we use the 0x1337 ID to unregister our hotkey from the system.
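If more hotkeys were ever registered, the window procedure would need to tell them apart. For WM_HOTKEY, wParam carries the ID passed to RegisterHotKey, while lParam holds the modifiers in its low word and the virtual-key code in its high word. The portable sketch below decodes that; the constants mirror the winuser.h values but are redefined locally (with SKETCH_ names) so it compiles anywhere, and is_totalcmd_hotkey is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MOD_ALT 0x0001u   /* MOD_ALT */
#define SKETCH_VK_1    0x31u     /* virtual-key code of the '1' key */
#define HOTKEY_ID      0x1337    /* ID used with RegisterHotKey above */

static_assert(SKETCH_VK_1 == '1', "digit VK codes equal their ASCII codes");

/* Returns nonzero if a WM_HOTKEY message is our ALT+1 combination. */
static int is_totalcmd_hotkey(uintptr_t wParam, uintptr_t lParam) {
    unsigned modifiers = (unsigned)(lParam & 0xFFFFu);
    unsigned vk = (unsigned)((lParam >> 16) & 0xFFFFu);
    return wParam == HOTKEY_ID && modifiers == SKETCH_MOD_ALT && vk == SKETCH_VK_1;
}
```

With a single registered hotkey, checking wParam alone would be enough; decoding lParam is just a sanity check.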

After putting everything into one .cpp file, we can eventually compile the hack:

19:46:11 Vexillium> g++ dllmain.cpp -o dllmain.o -c
19:46:35 Vexillium> dllwrap --def version.def -o version.dll dllmain.o --driver-name g++
19:46:40 Vexillium> strip version.dll

The last thing to do is to copy the fake version.dll file to the \totalcmd directory and do the same with the original VERSION module from \Windows\System32, renaming it to myVersion.dll in the process. Once both are copied, we can launch the totalcmd.exe executable and use the ALT+1 hotkey every time we want to get back to the Total Commander window.

The original code package can be downloaded from here.
Have a nice evening!

PS. The previous post has been updated – as I promised, a Proof of Code package can be downloaded now (link).

References & Links

  1. Exported functions that are really forwarders
  3. Dll Spoofing in Windows
  4. DLL forwarding is not the same as delay-loading
  5. An In-Depth Look into the Win32 Portable Executable File Format, Part 2
  6. RegisterHotKey Function