When trying to understand a binary, it’s key to be able to identify functions and, with them, their parameters and local variables. This helps the reverser figure out APIs, data structures and so on; in short, it leads to a deep understanding of the software. When dealing with functions, it’s essential to identify the calling convention in use, as that often lets the reverser make educated guesses about the arguments and local variables the function uses. I’ll describe here a few points that may help in identifying the calling convention of a given function, as well as the number and ordering of its parameters.
Calling Conventions
A calling convention defines how functions are called in a program: where the arguments go (stack or registers), in which order they are pushed, and who removes them from the stack once the call returns. A comprehensive definition of calling conventions is beyond the scope of this blog; nonetheless, the most common ones on 32-bit x86 are briefly described below.
cdecl
Description: Standard C/C++ calling convention. Allows functions to receive a variable number of parameters.
Cleans the stack: The caller is responsible for restoring the stack after making a function call.
Arguments passed: On the stack, pushed in reverse order (i.e. from right to left). The last argument is pushed first and the first argument is pushed last, so the first argument ends up closest to the return address.
void __cdecl fun();
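To make the push order concrete, here is a minimal Python sketch that simulates a cdecl call and reports the `[ebp+N]` offset at which the callee would find each argument. The helper name `cdecl_frame_offsets` is made up for this post, and the sketch assumes 32-bit code, 4-byte arguments and a standard `push ebp / mov ebp, esp` prologue.

```python
def cdecl_frame_offsets(args, slot=4):
    """Map each argument name to its [ebp+N] offset inside the callee.

    Assumes 32-bit code, 4-byte arguments and a standard
    `push ebp / mov ebp, esp` prologue (saved EBP at [ebp+0],
    return address at [ebp+4]).
    """
    pushed = list(reversed(args))  # the caller pushes right to left
    offsets = {}
    # The last value pushed sits closest to the return address, at [ebp+8].
    for distance, name in enumerate(reversed(pushed)):
        offsets[name] = 2 * slot + distance * slot
    return pushed, offsets


pushed, offsets = cdecl_frame_offsets(["p1", "p2", "p3"])
print(pushed)   # ['p3', 'p2', 'p1']
print(offsets)  # {'p1': 8, 'p2': 12, 'p3': 16}
```

Because the first argument is pushed last, it always lands at `[ebp+8]`, which is why reading the offsets in ascending order gives the parameters in source order.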
fastcall
Description: Passes the first arguments in registers instead of on the stack, which can make calls slightly faster.
Cleans the stack: The callee is responsible for restoring the stack before returning.
Arguments passed: The first two arguments are passed in registers (ECX and EDX). The rest are pushed onto the stack from right to left.
void __fastcall fun();
stdcall
Description: Very common in Windows (used by most APIs).
Cleans the stack: The callee is responsible for cleaning up the stack before returning. Usually by means of a RETN #N instruction.
Arguments passed: On the stack, pushed from right to left (same as cdecl). The first argument is pushed last.
void __stdcall fun();
thiscall
Description: Used when a C++ method with a fixed number of parameters is called. Designed to make calls in OO languages cheaper: VC++ reserves ECX for the this pointer, while GCC pushes the this pointer onto the stack last, as if it were the first parameter. When a variable number of parameters is required, compilers usually fall back to cdecl and pass the this pointer as the first parameter on the stack.
Cleans the stack: In GCC, caller cleans the stack. In Microsoft VC++ the callee is responsible for cleaning up.
Arguments passed: On the stack, from right to left (as in cdecl). The last argument is pushed first, and the first argument is pushed last.
void __thiscall fun();
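The four conventions can also be written down as data. The sketch below is only an illustration: `CONVENTIONS` and `describe_call` are hypothetical names, the thiscall entry models the VC++ flavour (so the first element of `args` is expected to be the this pointer), and every argument is assumed to be 4 bytes wide.

```python
# Summary of the 32-bit x86 conventions described above (4-byte arguments assumed).
CONVENTIONS = {
    "cdecl":    {"regs": [],             "cleanup": "caller"},
    "stdcall":  {"regs": [],             "cleanup": "callee"},
    "fastcall": {"regs": ["ecx", "edx"], "cleanup": "callee"},
    # VC++ flavour of thiscall: `this` travels in ECX.
    "thiscall": {"regs": ["ecx"],        "cleanup": "callee"},
}


def describe_call(convention, args, slot=4):
    """Return (register map, push order, bytes the callee pops on return)."""
    info = CONVENTIONS[convention]
    in_regs = dict(zip(info["regs"], args))   # leading arguments go to registers
    on_stack = args[len(in_regs):]
    push_order = list(reversed(on_stack))     # stack arguments go right to left
    callee_pops = len(on_stack) * slot if info["cleanup"] == "callee" else 0
    return in_regs, push_order, callee_pops


print(describe_call("cdecl", ["p1", "p2", "p3"]))
# ({}, ['p3', 'p2', 'p1'], 0)
print(describe_call("fastcall", ["p1", "p2", "p3"]))
# ({'ecx': 'p1', 'edx': 'p2'}, ['p3'], 4)
```

The third value is what you would expect to see as the operand of the final `retn` in a callee-cleanup function.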
Let the small table below serve as a quick reminder.
Figuring it out
To determine the calling convention of a given function we have to look at the function’s prologue and epilogue, and at the code around its call sites. They provide enough information to narrow down the options and help discover how many parameters the function takes. The first thing to find out is who builds up and who tears down the stack.
If the caller is responsible for cleaning up the stack, we’re more than likely looking at a cdecl function. It could also be a GCC thiscall, in which case there would be one extra argument (the this pointer) pushed onto the stack. The latter is less common, and to tell the two apart we’ll need to spot that pointer. In other words, if the function takes one or more parameters (usually referenced as ebp+X, with X>=8) and ends with a simple RET with no operand, the calling convention is most likely cdecl. See the example below:
    cdecl_fun:
        mov eax, dword ptr [ebp+8]
        mov ecx, dword ptr [ebp+0Ch]
        [...]
        mov eax, 1    # return value in eax
        ret           # plain ret: the caller cleans up the stack
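Spotting the `ebp+X` references can be automated on a text listing. The following is a rough sketch, not a real disassembler plugin: `stack_params_from_listing` is a hypothetical helper that assumes an EBP-based frame, 4-byte parameters and hexadecimal offsets written as in the listings in this post. The result is a lower bound, since a function may never touch some of its parameters.

```python
import re

# Matches [ebp+8], [ebp+c], [ebp+0Ch], ... and captures the hex offset.
EBP_REF = re.compile(r"\[ebp\+([0-9a-f]+)h?\]", re.IGNORECASE)


def stack_params_from_listing(listing, slot=4):
    """Estimate how many stack parameters a function reads.

    Collects every [ebp+X] reference with X >= 8 and derives the count
    from the highest offset seen.
    """
    offsets = sorted({int(m, 16) for m in EBP_REF.findall(listing)
                      if int(m, 16) >= 2 * slot})
    if not offsets:
        return 0, []
    count = (offsets[-1] - 2 * slot) // slot + 1
    return count, offsets


listing = """
mov eax, dword ptr [ebp+8]
mov ecx, dword ptr [ebp+c]
mov eax, 1
ret
"""
print(stack_params_from_listing(listing))  # (2, [8, 12])
```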
If the callee is responsible for tearing down the stack, there are more options to start with: VC++ thiscall, stdcall and fastcall. It gets complicated for functions with 0 or 1 parameters. However, a function with just 1 parameter may not require that we completely identify the calling convention, as there’s no doubt about the parameter ordering. The following tips will help you identify the convention in the remaining cases.
If a valid pointer is loaded into ECX before calling a function, and the remaining parameters are pushed onto the stack without using EDX, we’re looking at a VC++ thiscall. The callee then treats ECX as the this pointer and removes the stack arguments itself. See example ASM below.
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]
    mov edx, [ebp+0Ch]    # EDX is only scratch here; ECX still holds this
    [...]
    pop ebp
    retn 8
If both ECX and EDX are read within the function before being written (meaning they are used as parameters and were loaded with valid data by the caller), we’re looking at a fastcall. See example ASM below.
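    push ebp
    mov ebp, esp
    mov eax, dword ptr [ecx+0Ch]    # ECX read before being written: first argument
    mov ebx, dword ptr [edx]        # EDX read before being written: second argument
    add eax, ebx
    mov ebx, dword ptr [ebp+8]      # third argument, taken from the stack
    [...]
    pop ebp
    retn 4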
If all arguments are on the stack and the final ret instruction has an operand of at least four times the number of parameters of the function (exactly four times when every parameter is 32 bits wide), we’re looking at a stdcall. If the operand is less than four times the number of parameters, we might be looking at a fastcall with three or more arguments, since the first two travel in registers and are not counted. See example ASM below.
    push ebp
    mov ebp, esp
    mov eax, dword ptr [ebp+8]
    mov ecx, dword ptr [ebp+0Ch]
    [...]
    mov eax, 1    # return value in eax
    pop ebp
    ret 8         # the callee removes 8 bytes of arguments
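The rules of thumb in this section can be collected into a small decision function. This is just a sketch of the heuristics described in this post, with a hypothetical name (`guess_convention`) and inputs that you would gather by hand or with a disassembler script. It returns candidates, not a verdict, and it inherits the ambiguity for functions with 0 or 1 parameters.

```python
def guess_convention(ret_operand, reads_ecx, reads_edx):
    """Return the candidate calling conventions for one function.

    ret_operand: bytes removed by the final ret (0 for a plain `ret`).
    reads_ecx / reads_edx: True when the function reads that register
    before writing to it, i.e. the caller put a value there.
    """
    if reads_ecx and reads_edx:
        return ["fastcall"]            # both registers carry arguments
    if reads_ecx:
        return ["thiscall (VC++)"]     # ECX carries the this pointer
    if ret_operand > 0:
        return ["stdcall"]             # callee cleanup, everything on the stack
    return ["cdecl", "thiscall (GCC)"]  # caller cleanup


print(guess_convention(8, False, False))  # ['stdcall']
print(guess_convention(4, True, True))    # ['fastcall']
print(guess_convention(8, True, False))   # ['thiscall (VC++)']
print(guess_convention(0, False, False))  # ['cdecl', 'thiscall (GCC)']
```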
Arguments
For those calling conventions where the callee is responsible for restoring the stack before returning, the operand of the ret instruction is very helpful for guessing the number of arguments the function receives. Without any further observation, a simple instruction like the one below offers a lot of information.
retn 8
We can make an educated guess based on that retn. First, we know it is not cdecl, since the function is unwinding the stack itself and not leaving that task to the caller. We also know that the function takes at least 2 arguments, since it unwinds 8 bytes from the stack, and at most 4 (if the calling convention were fastcall, the first two would be in ECX and EDX). All this, of course, assumes 32-bit parameters and a 32-bit architecture.
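The same arithmetic works for any operand. Below is a minimal sketch, with the hypothetical helper name `arg_count_range`, that assumes callee cleanup, a 32-bit target and 4-byte arguments.

```python
def arg_count_range(ret_operand, slot=4):
    """Return (min, max) argument counts implied by `retn <ret_operand>`.

    The minimum is the stdcall reading (everything on the stack); the
    maximum is the fastcall reading (two more arguments in ECX and EDX).
    """
    if ret_operand % slot:
        raise ValueError("ret operand is not a multiple of the slot size")
    on_stack = ret_operand // slot
    return on_stack, on_stack + 2


print(arg_count_range(8))  # (2, 4)
```

A VC++ thiscall reading (the stack arguments plus the this pointer in ECX) falls inside the same range.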
Conclusion
In order to decipher undocumented APIs, it’s key to identify the calling convention in use. Misreading the convention, and with it the order in which arguments are pushed or the registers that carry them, can turn the signature of a function from fun(p1,p2,p3) into fun(p3,p2,p1), hence the need to identify it clearly. I hope it’s more than evident that figuring out the calling convention, as well as the number of parameters a function takes, is the first step towards understanding its inner workings.
As always, if there’s anything to add, ask or correct, don’t hesitate to comment!
Take care!