Páginas

sexta-feira, 6 de março de 2015

Assembly Adventure

 One day I decided to take a few hours of my day to research a bit about programming aware of Cache Memory. There is simply lots and lots of resources around, and I came to understand that this type of knowledge is a big extra when deciding which algorithms and data structure to use, and how to define their characteristics (the best choice is always context specific). There's an amazing resource, What every programmer should know about memory", which can be found here.

But I also found this article reposted in Gamasutra: “A Low Level Curriculum for C and C++", by Alex Darby (you can read it here). The article convinced me that Assembly Debugging is not that of an invincible monster – but it's not easy either. Come on, most programmers had to rely on it or some obscure output logging tricks and memory dumps, so why not give this a try?

And thus I've decided to follow the articles and play a bit around Visual Studio and generated Assembly code. This is purely for learning purposes, and understanding a bit more on how the compiler and it's optimization works, removing that feeling of “magical optimizer” that programmers might either distrust or trust too much. Note should be taken, my purpose here is not to optimize or develop in Assembly! During my tests and “discoveries”, I'll by posting then for the interested readers – I believe some might find this as interesting as I did. C++ code highlightning were made in hilite.me and Asm highlighting were made in tohtml.com.

Following the article, I changed my project settings (in Visual Studio) by setting “Basic Runtime Checks” to “default” in Debug builds, and started with the simplest code:


1
2
3
4
5
6
7
8
int main(int argc, char *argv[]) {
    int x = 1;
    int y = 2;
    int z = 0;

    z = x + y;
    return(0);
}

This a pretty dull code: simply adds two local variables to a third local variable. Here's the code generated by Visual Studio (be aware that these might change with compiler versions, configurations, and targets).

    int main(int argc, char *argv[]) {
00D81260  push        ebp  
00D81261  mov         ebp,esp  
00D81263  sub         esp,4Ch  
00D81266  push        ebx  
00D81267  push        esi  
00D81268  push        edi  
    int x = 1;
00D81269  mov         dword ptr [ebp-4],1  
    int y = 2;
00D81270  mov         dword ptr [ebp-8],2  
    int z = 0;
00D81277  mov         dword ptr [ebp-0Ch],0  

    z = x + y;
00D8127E  mov         eax,dword ptr [ebp-4]  
00D81281  add         eax,dword ptr [ebp-8]  
00D81284  mov         dword ptr [ebp-0Ch],eax  
    return(0);
00D81287  xor         eax,eax  
}
00D81289  pop         edi  
00D8128A  pop         esi  
00D8128B  pop         ebx  
00D8128C  mov         esp,ebp  
00D8128E  pop         ebp  
00D8128F  ret

The first six lines (00D81260 to 00D81268) are Stack operations. Remember, main is a function called from somewhere else. The stack is something too complicated for me to tackle right now, so the interested reader should research about it, or read the “A Low Level Curriculum for C and C++" series, in which Alex Darby did a very thorough explanation.

But it is important to understand a bit about the registers being used. EBP holds the base address of the “current” Stack, and ESP holds the top address of the “current” stack. Since in most machines and compilers, and unsurprisingly, mine too, the stack grows downwards, we can see that our local variables are accessed by negative positions offsets from EBP:

  • x address is EBP-4
  • y address is EBP-8
  • z address is EBP-0Ch

What? 0Ch? Well, 0 does not change the value, and the “h” it's to show the address is in hexadecimal. So, 0Ch is, converted to decimal, 12. I guess I can assume that ints are 4 bytes long on my machine!

Lines 00D8127E to 00D81284 executes our sum operation. The value at address EBP-4 (that is, x) is moved to the register EAX, and then the value at address EBP-8 is added to the same register. Address EBP-0Ch receives the value stored in EAX and our program is done! Integers and memory addresses are returned in the EAX register, so the register stores 0 by doing an exclusive or operation with it's own value. Neat! My guess is that Xor operations in register are the fastest way to zero their value. The following 6 lines are Stack operations, and again, I won't tackle them here.

Now, I'll change to release build, and compile optimized to speed. The resulting assembly just shows how meaningless this code is:

    int x = 1;
    int y = 2;
    int z = 0;

    z = x + y;
    return(0);
01381000  xor         eax,eax  
}
01381002  ret

In fact, the code seems to not do anything to the “outside of main”. Not even receive input or generate output. The compiler then just threw everything away and returned 0 right off the bat. So I just changed the code to return z directly:


1
2
3
4
5
6
7
8
int main(int argc, char *argv[]) {
    int x = 1;
    int y = 2;
    int z = 0;

    z = x + y;
    return(z);
}

And the resulting Assembly:

    int x = 1;
    int y = 2;
    int z = 0;

    z = x + y;
    return(z);
01161000  mov         eax,3  
}
01161005  ret

My guess is that the compiler noticed that z depends on two local variables that aren't changed anywhere else and just treated them as constants. To optimize for speed, it just set the main to return 3 directly, not needing to even “allocate” the variables in the compiled code. Even if we do more operations:

    int x = 1;
    int y = 2;
    int z = 0;

    x += y;
    z = 12;
    z = y + x;

    z += x + y;
    return(z);
00D11000  mov         eax,0Ah  
}
00D11005  ret

Awesome! You can see that on line 00D11000 the compiler did all the math and compiled the code to return a constant value. Indeed, this is way faster than doing the calculations during runtime. Some compilers, though, might not do this optimization automatically in some complicated circumstances. You can help it by setting variables that won't change as const:


1
2
3
4
5
6
7
8
9
int main(int argc, char *argv[]) {
    const int x = 1;
    const int y = 2;
    int z = 0;

    z = x + y;

    return(z);
}

Finally, as a curiosity, what if we declare a local variable as const, but actually do change its value? Changing a constant is not something that you'd usually do...


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
int main(int argc, char *argv[]) {
    const int x = 1;
    const int y = 2;
    int z = 0;

    int *cx = (int*)&x;
    *cx = 10;

    z = x + y;

    return(z);
}

Building as Debug, the Disassembly got quite long!

    int main(int argc, char *argv[]) {
00BF1260  push        ebp  
00BF1261  mov         ebp,esp  
00BF1263  sub         esp,54h  
00BF1266  mov         eax,dword ptr ds:[00BF600Ch]  
00BF126B  xor         eax,ebp  
00BF126D  mov         dword ptr [ebp-4],eax  
00BF1270  push        ebx  
00BF1271  push        esi  
00BF1272  push        edi  
    const int x = 1;
00BF1273  mov         dword ptr [ebp-8],1  
    const int y = 2;
00BF127A  mov         dword ptr [ebp-0Ch],2  
    int z = 0;
00BF1281  mov         dword ptr [ebp-10h],0  

    int *cx = (int*)&x;
00BF1288  lea         eax,[ebp-8]  
00BF128B  mov         dword ptr [ebp-14h],eax  
    *cx = 10;
00BF128E  mov         eax,dword ptr [ebp-14h]  
00BF1291  mov         dword ptr [eax],0Ah  

    z = x + y;
00BF1297  mov         dword ptr [ebp-10h],3  

    return(z);
00BF129E  mov         eax,dword ptr [ebp-10h]  
}
00BF12A1  pop         edi  
00BF12A2  pop         esi  
00BF12A3  pop         ebx  
00BF12A4  mov         ecx,dword ptr [ebp-4]  
00BF12A7  xor         ecx,ebp  
00BF12A9  call        00BF1019  
00BF12AE  mov         esp,ebp  
00BF12B0  pop         ebp  
00BF12B1  ret

First of all, I don't quite get why the first lines (Stack operations) are different from before. This is, to be honest, beyond my current knowledge (that isn't much, anyway). So now, in offsets from EBP, x is 8, y is 12, z is 16, and cx is 20. The lines here confused me a bit, so let's go one by one. Please remember, I'm new to this, so I might be wrong.

00BF1288: LEA is Load Effective Address. It will put the address of the second operand into the register specified by the first operand. So it is putting the address of memory location EBP-8, that is X, into the register EAX.

00BF128B: The value stored in EAX is copied to the memory at location EBP-14h, that is cx. If I'm right, dword ptr is something like Double Word Pointer. Double to mean 32 bits, in my case..

Thought pointers were confusing when first learning them? Its the same feeling when working on them on Assembly...


  • 00BF128E: Copies the value stored at memory address EBP-14h, that is, the address of EBP-8 (x) to the EAX register.

  • 00BF1291: Copies the number 0Ah (10 converted to hexadecimal) to the memory address that is stored in EAX. We effectively managed to change the value of the constant x from 1 to 10!

  • 00BF1297: Finally, we store the sum of x + y that is... Hey! The compiler optimized this, and stored 3 into z directly, even though we changed x from 1 to 10. The compiler missed our changing of the constant x. Makes sense, since we aren't supposed to change constants, but, as you can see, we got an different behavior than what we would think from reading the code.

Of course, it would be way easier to see that z has the unexpected value by using the debugger than by reading the Disassembly window, but the point here is to satisfy the curiosity about low level. Next time, I'll be playing with arrays allocation and initializations. See you there!