Aww, another month or even more has apparently passed right before my eyes. As some of you might have realized, the school year has already ended (something like two weeks ago), which lets me carry out some more research and remember about this blog. I expect to write a few more posts in the next few days; I hope that works out.

In this particular post, I would like to describe some curiosities I found inside the default Windows libraries kernel32.dll (and KernelBase.dll in the case of Windows 7 RC) and ntdll.dll. Not only do I want to share the ideas that occurred to me during this small research, but I would also like to hear about new ways of making use of what I found. Feel free to add new facts or ideas regarding this post, as I may have overlooked something obvious. Remember that this is not, and shouldn’t be considered, a thorough report. To be clear, the entire post covers only the x86 versions of Microsoft Windows.

Actually, I want to write about a few things, all of which are listed below:

  • Modifying the initial main thread CONTEXT structure via static DllMain code
  • LoadLibrary/FreeLibrary APIs handling the module reference counter
  • Undocumented FreeLibrary(AndExitThread) functionality and practical ways of making use of it
  • Fooling module management APIs by crafting the TEB structure

Although I would like to publish one post covering all the points listed above, I’ll try to create a short series of posts instead, each describing a separate mechanism. This should both help me manage the information and keep the reader from feeling overwhelmed by the contents.

DllMain function and the initial process context

One of the most interesting things I’ve recently come across was a short post on Skywing’s blog about a fact that is neither documented nor well known: what the lpvReserved parameter of DllMain really contains. It isn’t my intention to rewrite his post, so I’ll just try to summarise the main point (although I encourage you to read it first, if you haven’t done so yet). The author uncovers a very curious fact that has been documented very poorly by Microsoft, concerning the mysterious lpvReserved parameter of the standard DllMain module entry function:

BOOL WINAPI DllMain(
  __in  HINSTANCE hinstDLL,
  __in  DWORD fdwReason,
  __in  LPVOID lpvReserved
);

Due to its type and name, one could assume the argument is not important to a programmer and is presumably set to NULL. However, the documentation gives a small sign that there’s something more about this value:

If fdwReason is DLL_PROCESS_ATTACH, lpvReserved is NULL for dynamic loads and non-NULL for static loads.

If fdwReason is DLL_PROCESS_DETACH, lpvReserved is NULL if FreeLibrary has been called or the DLL load failed and non-NULL if the process is terminating.

As the quotation above says, our library is able to tell a static load from a dynamic one and execute the appropriate code for each situation. That is basically everything Microsoft itself has to say on the matter. As Skywing shows, however, more data is passed to our routine during process initialization: for a static load, the LPVOID lpvReserved parameter can just as well be treated as PCONTEXT lpvContext! The CONTEXT structure it points to is the main thread’s processor context, which is set using the NtContinue system call after the PE loader finishes process initialization. The mechanism behind this has already been described, so I will only try to give some examples of how this little curiosity can be used in practice. On the other hand, one must remember that even though this ‘feature’ is confirmed to work on current Windows versions, it is not guaranteed to survive in newer systems.
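To make the idea concrete, here is a minimal sketch in C of a DllMain that treats lpvReserved as a pointer to the initial CONTEXT on a static load. The helper name redirect_initial_eip and the placeholder new_entry_stub are my own inventions for illustration, and the small stand-in CONTEXT type exists only so that the helper can be compiled and exercised outside of 32-bit Windows; real code would rely on <windows.h>. Treat it as a sketch of the undocumented behaviour described above, not as a guaranteed interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(_WIN32) && (defined(_M_IX86) || defined(__i386__))
#include <windows.h>
#define HAVE_X86_WINDOWS 1
#else
/* Stand-in with only the two fields used here (NOT the real layout). */
typedef struct { uint32_t Eip; uint32_t Esp; } CONTEXT;
#endif

/* Hypothetical helper: point the initial context at new_eip and return
 * the previous Eip so that the caller can jump back to it later. */
uint32_t redirect_initial_eip(CONTEXT *ctx, uint32_t new_eip)
{
    uint32_t old_eip;
    assert(ctx != NULL);
    old_eip = (uint32_t)ctx->Eip;
    ctx->Eip = new_eip;
    return old_eip;
}

#ifdef HAVE_X86_WINDOWS
uint32_t g_original_eip;            /* saved for the later jump back */

/* Hypothetical replacement entry. It has to finish by transferring
 * control to g_original_eip itself rather than by returning. */
static void new_entry_stub(void)
{
    /* ... extra work goes here ... */
}

BOOL WINAPI DllMain(HINSTANCE hinstDLL, DWORD fdwReason, LPVOID lpvReserved)
{
    (void)hinstDLL;
    /* A non-NULL lpvReserved on attach means a static load; according to
     * the post, it then points at the main thread's initial CONTEXT. */
    if (fdwReason == DLL_PROCESS_ATTACH && lpvReserved != NULL) {
        g_original_eip = redirect_initial_eip(
            (CONTEXT *)lpvReserved, (uint32_t)(uintptr_t)&new_entry_stub);
    }
    return TRUE;
}
#endif
```

Keeping the context edit in a separate helper is just a convenience: the same function can then be reused by any of the scenarios that only differ in what Eip is replaced with.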

Let’s take a look at some of the possible scenarios:

  • Unpackers altering the EntryPoint field

If an exe-packer developer took advantage of the fact that the DllMain function has full control over the main thread’s initial context, it could be successfully used to create a special unpacking library. After compressing the main Portable Executable file, one of its import records would be altered to reference unpack.dll. The goal of this “additional” module would be to dynamically allocate a small piece of memory, copy the loader’s code there and change lpvContext->Eip to point to the new code. After the loader stub generated by unpack.dll finished executing, every memory page allocated by unpacker-related code would be freed (including the unpack.dll image sections). The purpose of this approach is to keep the PE file as clean as possible (modifying the smallest possible number of header fields, etc.). The same goes for the in-memory layout of the application, which is intended not to contain any memory blocks that the original process would not have.

  • Choosing EntryPoint depending on the system version

Since the contents of the initial context clearly differ between versions of Microsoft Windows, they could be used as an OS-detection technique. There are a number of fields that could be taken into account, such as lpvContext->Eip (which points into system code, for example somewhere inside kernel32.dll, an additional execution layer before the EntryPoint) or the stack contents (pointed to by lpvContext->Esp). If a given application had problems with a specific Windows version, or some operations had to be handled differently, the developer could simply create a separate entry function for each system and map the context values characteristic of a given OS to the appropriate EP routine.

  • Debug Registers modification

Playing with the debug register values could serve both as a software protection layer and as additional debugging functionality. Up to four hardware breakpoints can be set through the DRx registers. Controlling these fields makes that debug functionality available from the very beginning of the application’s execution path. Besides the practical advantages, it can also be used to fool a reverser trying to analyze our code: I would probably have a lot of trouble finding out how the breakpoints magically appear at the EntryPoint, unless I thoroughly checked the DllMain code of the imported modules first.
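For illustration, the debug register setup boils down to a bit of arithmetic on DR7. The sketch below computes a DR7 value that enables a local breakpoint in one of the four slots, following the x86 layout (enable bits in the low byte, R/W and LEN fields from bit 16 upwards). The function name dr7_enable_local and the enum names are hypothetical. In a DLL, the result would go into lpvContext->Dr7 and the breakpoint address into one of Dr0 to Dr3; I assume that ContextFlags would also need CONTEXT_DEBUG_REGISTERS set for the values to be applied, which I have not verified for the initial context.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings for a DR7 breakpoint slot. */
enum { DR7_RW_EXECUTE = 0, DR7_RW_WRITE = 1, DR7_RW_READWRITE = 3 };
/* LEN field encodings (execute breakpoints must use DR7_LEN_1). */
enum { DR7_LEN_1 = 0, DR7_LEN_2 = 1, DR7_LEN_4 = 3 };

/* Return dr7 with breakpoint `slot` (0..3) locally enabled and its
 * R/W and LEN fields replaced by the given encodings. */
uint32_t dr7_enable_local(uint32_t dr7, unsigned slot, unsigned rw, unsigned len)
{
    unsigned ctl_shift;
    assert(slot < 4u);
    ctl_shift = 16u + slot * 4u;                 /* 4 control bits per slot */
    dr7 &= ~((uint32_t)0xFu << ctl_shift);       /* clear old R/W and LEN   */
    dr7 |= (uint32_t)((rw & 3u) | ((len & 3u) << 2)) << ctl_shift;
    dr7 |= (uint32_t)1u << (slot * 2u);          /* L0..L3 sit at bits 0,2,4,6 */
    return dr7;
}
```

An execute breakpoint at the EntryPoint would then be something like Dr0 = entry point address and Dr7 = dr7_enable_local(0, 0, DR7_RW_EXECUTE, DR7_LEN_1).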

  • Creating an EntryPoint detour

Another security-related idea is to set up a kind of “trampoline” code that would be executed just before the OEP and return to the original address right after doing its job (accomplished by simply changing lpvContext->Eip). The additional code could do whatever the author wanted it to do, e.g. perform some additional executable integrity checks (checksums and the like). I find this technique very effective for a few reasons. Firstly, what I’m writing about here is poorly documented and unknown to most people. Secondly, any “external” code run before the EP seems to be almost invisible to those using debuggers like OllyDbg, which sets an initial breakpoint at the EntryPoint value taken from the PE header and waits for it to be hit. If the context modification routine was decently obfuscated, one could have real difficulty dealing with such a trick.

  • Setting the TF bit in the EFLAGS register with an SE handler

Yet another anti-debugging trick, this time making use of the EFLAGS bit mask, particularly the Trap Flag (TF), one of its members. The trick is old and well known in the community, though it can come as quite a surprise when applied to the initial flags value. Before the first program instruction executes, an EXCEPTION_SINGLE_STEP exception would be triggered and execution would be passed to the already-installed handlers. The absence of the exception would indicate the presence of an active debugger consuming it.
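The flag manipulation itself is tiny, as the sketch below tries to show. The helper names arm_trap_flag and trap_flag_armed, the global g_single_step_seen and the handler on_exception are all hypothetical. Note also that, for brevity, the Windows-only part uses a vectored exception handler instead of the SE handler mentioned above; that is my substitution, not a claim about how the trick has to be done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFLAGS_TF 0x00000100u   /* Trap Flag is bit 8 of EFLAGS */

/* Set TF in a saved EFLAGS value (e.g. the EFlags field of the context). */
void arm_trap_flag(uint32_t *eflags)
{
    assert(eflags != NULL);
    *eflags |= EFLAGS_TF;
}

/* Non-zero if TF is set in the given EFLAGS value. */
int trap_flag_armed(uint32_t eflags)
{
    return (eflags & EFLAGS_TF) != 0;
}

#ifdef _WIN32
#include <windows.h>

volatile LONG g_single_step_seen = 0;

/* Records that the single-step exception reached the process. If a
 * debugger swallowed it, the flag stays 0 and can be checked later. */
LONG CALLBACK on_exception(PEXCEPTION_POINTERS info)
{
    if (info->ExceptionRecord->ExceptionCode == EXCEPTION_SINGLE_STEP) {
        g_single_step_seen = 1;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}
/* In DllMain, on a static load, one would register the handler with
 * AddVectoredExceptionHandler(1, on_exception) and set EFLAGS_TF in
 * the EFlags field of the context behind lpvReserved. */
#endif
```

The program would then inspect g_single_step_seen somewhere after the EntryPoint and react if it is still zero.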

  • Zeroing specific segment registers

The next technique is one I consider pretty universal in terms of possible applications. Having one of the segment registers set to zero (a null selector) could be used for a great variety of purposes, such as monitoring the application’s memory references, code obfuscation through exceptions and more. The advantages of controlling the segment registers in a protected mode environment are going to be covered in a separate post.

  • Passing information between two or more static modules

As Gynvael Coldwind suggested, the fact that all the libraries statically loaded into a new process operate, one after another, on the same piece of memory could be used to let the modules “communicate” with each other, although “passing information” sounds better to me. In order to do this, one could use the existing main thread stack. I am curious about realistic scenarios making use of this idea, and I’m waiting for any interesting concepts : P

  • Running the application using ret-based programming

Last but not least, an idea that occurred to me just a few minutes ago: return-oriented programming! This subject has already been thoroughly documented by many independent researchers (a few presentations at more or less popular conferences have also been given). If you want to get familiar with the basics of the technique, you are strongly advised to read Generalizing Return-Oriented Programming to RISC and Return-Oriented Exploiting. In general, return-oriented programming aims to build fully functional code by chaining together assembly code snippets that each end with a “ret” instruction. The idea itself is just a way of doing things, so it is not a solution to one specific problem but rather a tool that can be used for various purposes. However, code of this type is mainly seen in very advanced return-to-libc exploits. In my case, I would like to use it as an obfuscation (anti-reversing) technique, additionally supported by the DllMain trick.

The very first task is to generate the initial stack data, containing the pre-generated return-oriented components of the individual “opcode” instructions: pointers to executable memory together with their parameters. You can learn how to generate such code from the aforementioned papers. With this code in hand, the only remaining step is to replace the lpvContext->Esp value with the address of the dynamically generated stack and make the lpvContext->Eip field point to a ret (0xc3) instruction. That last step could just as well be simplified by setting Eip to the value at the top of the new stack and advancing Esp past it, which avoids having to find and execute a “ret” instruction. As soon as all the modifications are applied, the process should begin its execution right from the indirect code placed on the emulated stack. With this approach, the coder can be sure that the original EntryPoint will never be reached unless the return-oriented code decides to go there. Used together with the DllMain hack, this technique could also serve to develop something similar to a simple VM execution environment, with the bytecode (in the form of return-oriented code) placed on the stack. Even though performing dynamic analysis (debugging) on such code does not look like a hopeless task, I think that static analysis could be considered one.
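The final context edit from the paragraph above can be modelled in a few lines of C. This is only a sketch: InitialContext is a stand-in for the two CONTEXT fields involved (not the real structure), start_from_chain is a hypothetical name, and generating the chain itself is out of scope here. It implements the simplified variant, where Eip receives the first stack entry directly and Esp is moved past it, exactly as an initial “ret” would have left things.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two CONTEXT fields involved (NOT the real layout). */
typedef struct { uint32_t Eip; uint32_t Esp; } InitialContext;

/* chain      : prepared return-oriented stack image (gadget addresses
 *              interleaved with their parameters)
 * count      : number of 32-bit entries in the image
 * chain_addr : 32-bit address at which the image lives in the process */
void start_from_chain(InitialContext *ctx, const uint32_t *chain,
                      size_t count, uint32_t chain_addr)
{
    assert(ctx != NULL && chain != NULL && count > 0);
    /* What an initial ret would have popped into Eip. */
    ctx->Eip = chain[0];
    /* Stack pointer as that ret would have left it. */
    ctx->Esp = chain_addr + (uint32_t)sizeof(uint32_t);
}
```

In a DLL, the same two assignments would be made on the context behind lpvReserved, with chain_addr being the address of a buffer that stays allocated for as long as the chain runs.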

In general, the above list presents every single thing I was able to think of so far.

What is more, I managed to prepare a small bonus: a very simple sample KeygenMe application showing how some of the presented ideas work on a real computer. One note: the overall protection scheme is much more important, and deserves more attention, than the application’s mechanics (CryptoAPI functions and so on). If you encounter any problems with the provided executable, please let me know (so far, it has been confirmed to work on Windows XP SP3 and Windows Vista SP2). The package can be downloaded from HERE.

Furthermore, I would be very pleased to hear any other suggestions regarding this subject (if you find any kind of mistake, feedback will also be appreciated).

That’s all for now. I will do my best to carry on writing posts about other interesting Windows internals stuff I often come across ; )