@Psy and Danny:
Haha, it's more or less something that you should never ever do in production, but it's still very interesting.
void disp(int n){printf("%d ", n&(0xffff));}
This creates a function disp that, when given an integer, issues a call to printf to print the low 16 bits (2 bytes) of that integer.
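To see what the mask does on its own, here's a minimal sketch. The names low16 and disp16 are made up for illustration and aren't part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: keep only the low 16 bits of n, like disp does. */
unsigned low16(int n) {
    return (unsigned)n & 0xffffu;
}

/* Prints the same thing as disp above, written with the helper. */
void disp16(int n) {
    printf("%u ", low16(n));
}
```

So low16(0x12345) gives 0x2345: everything above bit 15 is simply thrown away.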
Now a backslash (\) in a string denotes that the following sequence of characters is to be treated as a special escape sequence. '\x0' is the character whose character code is 0, and '\x20' is the ASCII character whose code is 32 (the 20 is in hexadecimal). But let's break from tradition here. Because a character is just a byte in memory, we can also consider a string to be a byte stream (an array of bytes) containing some arbitrary data.
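Here's a small sketch of reading a string as plain bytes. byte_at is a hypothetical helper name, and the comparison with ' ' assumes an ASCII-compatible character set.

```c
#include <stddef.h>

/* Hypothetical helper: read byte i of a string as a plain number (0-255). */
unsigned byte_at(const char *s, size_t i) {
    /* Go through unsigned char so bytes above 0x7f don't come out negative. */
    return (unsigned char)s[i];
}
```

With this, byte_at("\x20", 0) is 32 (the same as ' '), and byte_at("\xc3", 0) is 0xc3. One thing to watch out for: a \x escape swallows as many hex digits as it can, so "\x00B" is a single byte 0x0B rather than a zero byte followed by 'B'.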
Here's the interesting part:
Let's assume that you have some arbitrary function, random_func.
We can actually recast this function to do something else entirely:
(int(*)()) &random_func; // Casts random_func to a pointer to a function that returns int
so that when you call int i = ((int(*)()) &random_func)(), the call is treated as returning an integer, which is assigned to i.
This is what int(*)() means. It's the type "pointer to a function that returns int": the (*) says it's a pointer, the trailing () says the thing it points to is a function, and the int in front is that function's return type.
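For comparison, here's what the same kind of pointer looks like when it's used the well-behaved way. This is only a sketch: the typedef name int_fn, the helper call_it, and this particular body for random_func are assumptions made up for the example.

```c
/* A pointer-to-function type: the function takes no arguments and returns int. */
typedef int (*int_fn)(void);

/* Hypothetical stand-in for random_func. */
int random_func(void) {
    return 42;
}

/* Call whatever function fp points to and hand back its result. */
int call_it(int_fn fp) {
    return fp();
}
```

Here call_it(&random_func) returns 42. Since random_func really does return int, no cast is needed; the casts in this post are only there to force the compiler to accept things that aren't functions of that type.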
While people don't usually cast strings into functions, it's possible.
(int(*)())"Hi Mom!" // <- this is now a function pointer. You can call it.
// Note: "Hi Mom!" is a string literal. Used as a value it becomes a pointer to its first character (char* in C, const char* in C++), and a pointer is really just an address. The characters themselves live in the program's read-only data, NOT on the heap (not important right now, I'll talk about this later).
Of course, when you do actually call that function, you will most likely get a crash, because the CPU tries to execute the bytes of "Hi Mom!" as if they were instructions, and they aren't meaningful machine code.
So what can be used to call a function?
Well, since strings can be converted into functions, why can't we convert functions into strings?
(char*) &random_func; // This is what the string of a func looks like
I won't bore you with the details but upon further investigation, a function is made up of a bunch of illegible characters that form the machine code of the function (surprise).
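If you want to look at those bytes yourself, here's a hedged sketch. peek_code and this body for random_func are hypothetical, and the whole thing assumes a platform where code lives in the same address space as data and code pages are readable. That's true on mainstream desktop systems, but the C standard doesn't promise it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for random_func. */
int random_func(void) {
    return 7;
}

/* Copy the first n bytes of a function's machine code into out.
   Assumption: code pages are readable and share an address space with data. */
size_t peek_code(int (*fn)(void), unsigned char *out, size_t n) {
    /* Going through uintptr_t avoids a direct function-to-object pointer cast. */
    const unsigned char *p = (const unsigned char *)(uintptr_t)fn;
    memcpy(out, p, n);
    return n;
}
```

Printing the copied bytes in hex shows the raw opcodes, which differ from one compiler, optimization level, and CPU to the next.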
So if we simply type machine code into a string and then cast it into a function, the compiler will accept it and the CPU will run that machine code, right?
Well, not quite. Take the following instance for example:
((int(*)())"\xc3... valid machine code ...")() // Called straight from the string literal: this can run and return an arbitrary number
char code[] = "\xc3... valid machine code ..."; ((int(*)())code)() // Called from a writable copy of the same bytes: this will typically crash with a segmentation fault
If the first one runs, and the second one holds exactly the same bytes, why would it fail? (Note that casting the literal to (char*) by itself changes nothing, because a cast doesn't move the data. The difference only appears once the bytes are copied somewhere else.)
The simple answer is that what matters is where the bytes live in memory, not what type they have.
The longer answer: a char array like code is a writable copy of the string, which lives on the stack (or in the data/BSS section if it's a global), and anything created from malloc/calloc lives on the heap. This is where all of the changing data resides, so the creators of our operating systems have wisely decided that the IP (instruction pointer, which holds the address in memory of the current instruction; basically whatever code it points to gets executed) should never enter these areas, as that would be too dangerous. However, when you write a string in "" (double quotes), its bytes are fixed at compile time and stored in the program's read-only data, which on many (especially older) setups is loaded right alongside the code with the same permissions. That is why this data can be executed at runtime there. Newer toolchains often mark read-only data as non-executable too, in which case even the first call crashes.
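If you ever want to run bytes without depending on where the compiler happened to put a string, the usual route is to ask the OS for memory the instruction pointer is allowed to enter. Below is a minimal sketch of that idea, not what the original code does. It assumes POSIX mmap/mprotect and an x86 or x86-64 CPU, and the names run_bytes and RETURN_42 are made up for the example.

```c
#define _DEFAULT_SOURCE  /* asks glibc to expose MAP_ANONYMOUS in strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* x86 machine code for: mov $42, %eax ; ret (valid in 32-bit and 64-bit mode). */
const unsigned char RETURN_42[] = { 0xb8, 0x2a, 0x00, 0x00, 0x00, 0xc3 };

/* Copy code into a fresh page, mark the page executable, call it, and
   return its result. Returns -1 if the page can't be set up or the CPU
   isn't x86 (assumption: the bytes are x86 code ending in ret). */
int run_bytes(const unsigned char *code, size_t len) {
#if defined(__i386__) || defined(__x86_64__)
    /* Start with a writable, non-executable page. */
    void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED) return -1;
    memcpy(page, code, len);
    /* Swap write permission for execute permission. */
    if (mprotect(page, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(page, len);
        return -1;
    }
    int (*fn)(void) = (int (*)(void))(uintptr_t)page;
    int result = fn();
    munmap(page, len);
    return result;
#else
    (void)code;
    (void)len;
    return -1;
#endif
}
```

On an x86 Linux box, run_bytes(RETURN_42, sizeof RETURN_42) should come back as 42. Some locked-down systems refuse to make memory executable at all, in which case you get -1 instead.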
So onto the rest of the code
"\x66\x31\xc0\x8b\x5c\x24\x04\x66\x40\x50\xff\xd3\x58\x66\x3d\xe8\x03\x75\xf4\xc3"
This is machine code that can be roughly translated into the following assembly:
int FORLOOP(int addr){ // addr sits four bytes above the stack pointer, just past the return address -> 4(%esp) or [esp + 4]
// From now on, (;) starts a comment
xor %ax, %ax ; zeroes ax (the low 16 bits of eax)
mov 4(%esp), %ebx ; loads the value at esp+4 (addr, the address of the function to call) into ebx
body: ; this label isn't actually in the machine code
inc %ax ; adds one to ax (so this will print from 1 to 1000 instead of 0-999)
push %eax ; pushes eax onto the stack so that it'll be passed as the argument to the function call on the next line
call *%ebx ; calls the function whose address is in ebx, which should be something like &disp
pop %eax ; pops the counter back into eax, restoring it even if disp overwrote eax (as long as disp left its argument on the stack untouched)
cmp $1000, %ax ; compares ax with 1000
jne body ; if ax is not 1000, goto body
ret
}
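So by passing in the address of the disp function, this calls disp 1000 times, each time passing in the %ax register, which is used as the counter of the for loop. Only the low 16 bits of eax are ever zeroed, so the upper half of the value that gets pushed is whatever happened to be there before, which is why disp masks its argument with 0xffff.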
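For reference, here's roughly what that machine code does when written as ordinary C. It's a sketch of the equivalent loop, not the original author's code. forloop, record, call_count, and last_value are hypothetical names, with record standing in for disp so the calls can be counted.

```c
#include <stdint.h>

int call_count = 0;  /* how many times the callback ran */
int last_value = 0;  /* the argument of the most recent call */

/* Hypothetical callback standing in for disp. */
void record(int n) {
    call_count++;
    last_value = n;
}

/* Call fn with 1, 2, ..., 1000.
   The counter is 16 bits wide, like the %ax register in the assembly. */
void forloop(void (*fn)(int)) {
    uint16_t i = 0;
    do {
        i++;        /* inc %ax */
        fn(i);      /* push %eax ; call *%ebx ; pop %eax */
    } while (i != 1000);  /* cmp $1000, %ax ; jne body */
}
```

Calling forloop(record) leaves call_count at 1000 and last_value at 1000, matching the 1 to 1000 range of the assembly version.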