X64 Assembly
From UIC
| Infos | |
|---|---|
| Author: | Daniel Pistelli |
| Email: | |
| Website: | http://ntcore.com |
| Date: | 01/01/2007 (dd/mm/yyyy) |
| Level: | |
| Language: | English |
| Comments: | |
Links and References
Introduction
This article is extracted from "Moving to Windows x64" by Daniel Pistelli (Ntoskrnl). It was published in New Years Pack 2, downloadable from "Documents from UIC".
Essay
Now I'll try to explain the basics of x64 assembly. I assume the reader is already familiar with x86 assembly; otherwise this section will be hard to make heads or tails of.
Moreover, since this is just a very (but very) brief guide, you'll have to look into the AMD64 documentation for more advanced stuff. Some things I won't even mention; you'll see for yourself that certain idioms are no longer in use. For instance, the lea instruction has completely taken the place of mov offset.
What you're going to notice at once is that x64 has some more registers:
- 8 new general-purpose registers (GPRs).
- 8 new 128-bit XMM registers.
Of course, all general-purpose registers are 64 bits wide. The old ones we already knew are easy to recognize in their 64-bit form: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp (and rip if we want to count the instruction pointer). These old registers can still be accessed in their smaller bit ranges, for instance: rax, eax, ax, ah, al.
The new registers go from r8 to r15, and can be accessed in their various bit ranges like this: r8 (qword), r8d (dword), r8w (word), r8b (low byte).
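To make the naming of the smaller ranges concrete, here is a minimal C sketch that models a 64-bit register as a `uint64_t` and extracts the ranges listed above. The helper names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Models how the narrower register names alias the low bits of a
   64-bit register. All helper names are made up for this sketch. */

/* eax / r8d: low 32 bits */
uint32_t view_dword(uint64_t reg) { return (uint32_t)reg; }

/* ax / r8w: low 16 bits */
uint16_t view_word(uint64_t reg) { return (uint16_t)reg; }

/* al / r8b: low 8 bits */
uint8_t view_low_byte(uint64_t reg) { return (uint8_t)reg; }

/* ah: bits 8..15 (a high-byte name exists only for rax, rbx, rcx, rdx) */
uint8_t view_high_byte(uint64_t reg) { return (uint8_t)(reg >> 8); }
```

For a register holding 0x1122334455667788, the four helpers return 0x55667788, 0x7788, 0x88 and 0x77 respectively.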
Here's a figure taken from the AMD docs:
Applications can still use segment registers as a base for addressing, but 64-bit mode only recognizes three of the old ones (CS, FS and GS), and only two of them (FS and GS) can be used for base address calculations. Here's another figure:
And now, the most important things: calling convention and stack. On Windows, x64 code uses a fastcall-style calling convention, meaning that the first 4 parameters are passed in registers and the remaining ones on the stack.
Thus, the stack frame is made of: the stack parameters, the register parameters, the return address (which, I remind you, is a qword) and the local variables.
The first parameter goes in the rcx register, the second in rdx, the third in r8 and the fourth in r9. Saying that the parameter registers are part of the stack frame also makes it clear that any function that calls a child function has to set up the stack with space for these four registers, even if the child function takes fewer than four parameters.
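As a rough illustration of this mapping, the C sketch below returns the register used for the n-th integer or pointer argument and the offset of the stack slot reserved for it, as seen by the callee on entry. The function names are hypothetical; the offsets assume that [rsp] holds the return address and that the four register slots follow it directly.

```c
#include <stddef.h>

/* Register carrying the n-th (0-based) integer or pointer argument.
   Returns NULL when the argument is passed on the stack instead.
   Names are illustrative only. */
const char *arg_register(size_t index)
{
    static const char *const regs[4] = { "rcx", "rdx", "r8", "r9" };
    return index < 4 ? regs[index] : NULL;
}

/* Offset from rsp, on entry to the callee, of the slot that belongs to
   argument `index`: the return address sits at [rsp], the slots follow. */
size_t arg_slot_offset(size_t index)
{
    return 8 + 8 * index;
}
```

So the fourth argument travels in r9 and has its slot at [rsp+20h], while a fifth argument has no register and is found at [rsp+28h].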
The stack pointer is adjusted only once, in the prologue of a function. The reserved space has to be large enough to hold all the arguments passed to child functions, and it is always the caller's duty to clean the stack. Now, the most important thing for understanding how much space is reserved in the stack frame is that the stack has to be 16-byte aligned.
More precisely, the stack pointer has to be 16-byte aligned whenever a call is made. The call itself pushes an 8-byte return address, so on entry to a function rsp is off by 8. The space reserved in the prologue will therefore always be something like 16n + 8, where n depends on the number of parameters. Here's a small figure of a stack frame:
Don't worry if you haven't completely figured out how it works: now we will see a few code samples, which, in my opinion, always make the theory a lot easier to understand. Let us take for instance a hello-world application like:
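#include <windows.h>
#include <tchar.h>
int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPTSTR lpCmdLine, int nCmdShow)
{
MessageBox(NULL, _T("Hello World!"), _T("My First x64 Application"), 0);
return 0;
}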
This code disassembled would look like:
.text:0000000000401220
.text:0000000000401220 arg_0= qword ptr 8
.text:0000000000401220 arg_8= qword ptr 10h
.text:0000000000401220 arg_10= qword ptr 18h
.text:0000000000401220 arg_18= dword ptr 20h
.text:0000000000401220
.text:0000000000401220 mov [rsp+arg_18], r9d
.text:0000000000401225 mov [rsp+arg_10], r8
.text:000000000040122A mov [rsp+arg_8], rdx
.text:000000000040122F mov [rsp+arg_0], rcx
.text:0000000000401234 sub rsp, 28h
.text:0000000000401238 xor r9d, r9d ; uType
.text:000000000040123B lea r8, Caption ; "My First x64 Application"
.text:0000000000401242 lea rdx, Text ; "Hello World!"
.text:0000000000401249 xor ecx, ecx ; hWnd
.text:000000000040124B call cs:MessageBoxA
.text:0000000000401251 xor eax, eax
.text:0000000000401253 add rsp, 28h
.text:0000000000401257 retn
.text:0000000000401257 sub_401220 endp
The stack pointer adjustment follows directly from what I said earlier.
Since we are calling a child function with parameters, we need space for all four parameter registers (0x20, a value that is already a multiple of 16) plus 0x08 to make up for the return address pushed by the call and bring the stack back to 16-byte alignment. Thus, we'll have 0x28.
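The alignment side of this arithmetic can be checked mechanically. The following C sketch simulates a call followed by the callee's `sub rsp, frame` and reports whether the stack is aligned again. The function names and the stack address used in the note below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* rsp must be a multiple of 16 whenever a call instruction executes. */
bool is_call_aligned(uint64_t rsp)
{
    return (rsp & 0xF) == 0;
}

/* Simulates `call` (which pushes an 8-byte return address) followed by
   the callee's `sub rsp, frame`, and tells whether the callee may then
   safely call further functions. */
bool frame_keeps_alignment(uint64_t caller_rsp, uint64_t frame)
{
    uint64_t rsp_on_entry = caller_rsp - 8; /* return address pushed */
    return is_call_aligned(rsp_on_entry - frame);
}
```

Starting from an aligned caller stack pointer such as 0x12FF60, a frame of 0x28 leaves the stack aligned, whereas a frame of 0x20 does not.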
Remember that if the stack space is too small or is not aligned, your code will crash at once. Also, don't wonder why there's no ExitProcess in this function: compiling the code above with Visual C++ always adds a stub (WinMainCRTStartup) which then calls our WinMain.
So, the ExitProcess is in the stub code. But what happens when the code before the MessageBox calls a function which takes seven parameters instead of four?
.text:0000000000401180 ; sub_4011F0+11 p
.text:0000000000401180
.text:0000000000401180 var_28= qword ptr -28h
.text:0000000000401180 var_20= qword ptr -20h
.text:0000000000401180 var_18= qword ptr -18h
.text:0000000000401180
.text:0000000000401180 sub rsp, 48h
.text:0000000000401184 lea rax, unk_402040
.text:000000000040118B mov [rsp+48h+var_18], rax
.text:0000000000401190 lea rax, unk_402044
.text:0000000000401197 mov [rsp+48h+var_20], rax
.text:000000000040119C lea rax, unk_402048
.text:00000000004011A3 mov [rsp+48h+var_28], rax
.text:00000000004011A8 lea r9, qword_40204C ; __int64
.text:00000000004011AF lea r8, qword_40204C+4 ; __int64
.text:00000000004011B6 lea rdx, unk_402054 ; __int64
.text:00000000004011BD lea rcx, aAa ; "ptr"
.text:00000000004011C4 call TakeSevenParameters
.text:00000000004011C9 xor r9d, r9d ; uType
.text:00000000004011CC lea r8, Caption ; "My First x64 Application"
.text:00000000004011D3 lea rdx, Text ; "Hello World!"
.text:00000000004011DA xor ecx, ecx ; hWnd
.text:00000000004011DC call cs:MessageBoxA
.text:00000000004011E2 add rsp, 48h
.text:00000000004011E6 retn
.text:00000000004011E6 sub_401180 endp
As said, the child function takes 7 parameters, making it necessary to provide space for 3 extra parameters on the stack in addition to the four register slots. So, 7 * 8 = 0x38, which rounded up to a multiple of 16 is 0x40. Adding the 8 bytes that compensate for the return address makes it 0x48, which is indeed our value.
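The two values seen so far, 0x28 and 0x48, can be reproduced with a few lines of C. This is only a sketch of the 16n + 8 arithmetic used in this article, with a hypothetical function name; a compiler is free to reserve a different amount as long as the space and alignment requirements hold.

```c
#include <stddef.h>

/* Bytes to subtract from rsp in the prologue, following the 16n + 8 rule:
   room for the widest child call (never fewer than four slots) plus the
   locals, rounded up to a multiple of 16, plus 8 to compensate for the
   return address. The name frame_size is illustrative. */
size_t frame_size(size_t max_child_args, size_t local_bytes)
{
    size_t slots = max_child_args < 4 ? 4 : max_child_args;
    size_t area = slots * 8 + local_bytes;
    size_t rounded = (area + 15) & ~(size_t)15;
    return rounded + 8;
}
```

With no locals, `frame_size(4, 0)` gives 0x28 and `frame_size(7, 0)` gives 0x48, matching the two disassemblies.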
I think you have understood the stack-frame logic by now. It's actually quite easy, but it takes a moment to switch from the old x86/stdcall logic to this one. Enough of this: now that we've seen how x64 code works, we'll try compiling an assembly source ourselves.
Before we start, I have to make something clear. There are some assemblers on the internet which make the job easier, mainly because they set up the stack by themselves or they create code that is easy to convert from/to x86.
But I think that is not the point of this article. In fact, I'm going to use the Microsoft assembler (ml64.exe), which requires you to write everything down, just like in the disassembly. Another option would be to assemble with a different assembler and then link with ml64.
I think the reader should really make these decisions on his own. As far as I am concerned, I believe that little code should be written in assembly, and that it should be avoided whenever possible. This new x64 technology is a good opportunity to rethink these matters.
In the last years I always wrote 64-bit compatible code in C/C++ (I mean unmanaged, of course), and when I had to recompile a project of 70,000 lines of code for x64, I didn't have to change a single line of code (I'll talk about C/C++ programming later). Despite all the macros an assembler offers, I seriously doubt that people who wrote their whole code in assembly will be able to switch so easily to x64 (remember that one day even the IA64 syntax could be adopted). I think in most cases the obvious choice will be not to convert to the new technology and to stick to x86, but this isn't always possible; it depends on the software category.
The Microsoft assembler is contained in the SDK and in the DDK (WDK for Vista). Right now, I'm using Vista's WDK, which I freely downloaded from MSDN. The first sample of code I'm going to show you is a simple Hello-World messagebox application.
extrn MessageBoxA : proc
extrn ExitProcess : proc
.data
body db 'Hello World!', 0
capt db 'My First x64 Application', 0
.code
Main proc
sub rsp, 28h
xor r9d, r9d ; uType = 0
lea r8, capt ; lpCaption
lea rdx, body ; lpText
xor rcx, rcx ; hWnd = NULL
call MessageBoxA
xor ecx, ecx ; exit code = 0
call ExitProcess
Main endp
end
As you can see, I didn't bother unwinding the stack, since I call ExitProcess. The syntax is very similar to the old MASM one, although there are a few dissimilarities. The ml64 console output should be something like this:
The command line to compile is:
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib
/entry:Main
If the libs are not in the same directory as ml64.exe, you'll have to provide the path as I did. The entry point has to be specified; otherwise you would have to use WinMainCRTStartup as the entry point.
The next sample of code I'm going to show you displays a window by calling CreateWindowEx. What you're going to learn through this code is structure alignment and how to integrate resources into your projects.
As I said earlier, I don't want to encourage you to write your windows in assembly, but I believe that this sort of code is good for learning. First the code, then the explanation.
extrn MessageBoxA : proc
extrn RegisterClassExA : proc
extrn CreateWindowExA : proc
extrn DefWindowProcA : proc
extrn ShowWindow : proc
extrn GetMessageA : proc
extrn TranslateMessage : proc
extrn DispatchMessageA : proc
extrn PostQuitMessage : proc
extrn DestroyWindow : proc
extrn GetModuleHandleA : proc
extrn ExitProcess : proc
WNDCLASSEX struct
cbSize dd ?
style dd ?
lpfnWndProc dq ?
cbClsExtra dd ?
cbWndExtra dd ?
hInstance dq ?
hIcon dq ?
hCursor dq ?
hbrBackground dq ?
lpszMenuName dq ?
lpszClassName dq ?
hIconSm dq ?
WNDCLASSEX ends
POINT struct
x dd ?
y dd ?
POINT ends
MSG struct
hwnd dq ?
message dd ?
padding1 dd ? ; padding
wParam dq ?
lParam dq ?
time dd ?
pt POINT <>
padding2 dd ? ; padding
MSG ends
.const
NULL equ 0
CS_VREDRAW equ 1
CS_HREDRAW equ 2
COLOR_WINDOW equ 5
; WS_OVERLAPPEDWINDOW = (WS_OVERLAPPED | WS_CAPTION | WS_SYSMENU | WS_THICKFRAME |
; WS_MINIMIZEBOX | WS_MAXIMIZEBOX)
WS_OVERLAPPEDWINDOW equ 0CF0000h
CW_USEDEFAULT equ 80000000h
SW_SHOW equ 5
WM_DESTROY equ 2
WM_COMMAND equ 111h
IDC_MENU equ 109
IDM_ABOUT equ 104
IDM_EXIT equ 105
.data
szWindowClass db 'FirstApp', 0
szTitle db 'My First x64 Windows', 0
szHelpTitle db 'Help', 0
szHelpText db 'This will be a big help...', 0
.data?
hInstance qword ?
hWnd qword ?
wndclass WNDCLASSEX <>
wmsg MSG <>
.code
WndProc: ; proc hWnd : qword, uMsg : dword, wParam : qword, lParam : qword
mov [rsp+8], rcx ; hWnd (save parameters in their stack slots)
mov [rsp+10h], edx ; Msg
mov [rsp+18h], r8 ; wParam
mov [rsp+20h], r9 ; lParam
sub rsp, 38h
cmp edx, WM_DESTROY
jnz @next1
xor ecx, ecx ; exit code
call PostQuitMessage
xor rax, rax
add rsp, 38h ; release the frame before returning
ret
@next1:
cmp edx, WM_COMMAND
jnz @default
mov rax, rsp ; rax is volatile, so unlike rbx it needs no saving
add rax, 38h
mov r10, [rax+18h] ; wParam
cmp r10w, IDM_ABOUT
jz @about
cmp r10w, IDM_EXIT
jz @exit
jmp @default
@about:
xor r9d, r9d
lea r8, szHelpTitle
lea rdx, szHelpText
xor ecx, ecx
call MessageBoxA
jmp @default
@exit:
mov rax, rsp
add rax, 38h
mov rcx, [rax+8h] ; hWnd
call DestroyWindow
@default:
mov rax, rsp
add rax, 38h
mov r9, [rax+20h] ; lParam
mov r8, [rax+18h] ; wParam
mov edx, [rax+10h] ; Msg
mov rcx, [rax+8] ; hWnd
call DefWindowProcA
add rsp, 38h
ret
MyRegisterClass: ; proc hInst : qword
sub rsp, 28h
mov wndclass.cbSize, sizeof WNDCLASSEX
mov eax, CS_VREDRAW
or eax, CS_HREDRAW
mov wndclass.style, eax
lea rax, WndProc
mov wndclass.lpfnWndProc, rax
mov wndclass.cbClsExtra, 0
mov wndclass.cbWndExtra, 0
mov wndclass.hInstance, rcx
mov wndclass.hIcon, NULL
mov wndclass.hCursor, NULL
mov wndclass.hbrBackground, COLOR_WINDOW + 1 ; system color index plus 1, as the class brush expects
mov wndclass.lpszMenuName, IDC_MENU
lea rax, szWindowClass
mov wndclass.lpszClassName, rax
mov wndclass.hIconSm, NULL
lea rcx, wndclass
call RegisterClassExA
add rsp, 28h
ret
InitInstance: ; proc hInst : qword
sub rsp, 78h
mov rax, CW_USEDEFAULT
xor rbx, rbx
mov [rsp+58h], rbx ; lpParam
mov [rsp+50h], rcx ; hInstance
mov [rsp+48h], rbx ; hMenu = NULL
mov [rsp+40h], rbx ; hWndParent = NULL
mov [rsp+38h], rbx ; Height
mov [rsp+30h], rax ; Width
mov [rsp+28h], rbx ; Y
mov [rsp+20h], rax ; X
mov r9d, WS_OVERLAPPEDWINDOW ; dwStyle
lea r8, szTitle ; lpWindowName
lea rdx, szWindowClass ; lpClassName
xor ecx, ecx ; dwExStyle
call CreateWindowExA
mov hWnd, rax
mov edx, SW_SHOW
mov rcx, hWnd
call ShowWindow
mov rax, hWnd ; set return value
add rsp, 78h
ret
Main proc
sub rsp, 28h
xor rcx, rcx
call GetModuleHandleA
mov hInstance, rax
mov rcx, rax
call MyRegisterClass
test rax, rax
jz @close ; if RegisterClassEx fails, exit
mov rcx, hInstance
call InitInstance
test rax, rax
jz @close ; if InitInstance fails, exit
@handlemsgs: ; message processing routine
xor r9d, r9d
xor r8d, r8d
xor edx, edx
lea rcx, wmsg
call GetMessageA
test eax, eax
jz @close
lea rcx, wmsg
call TranslateMessage
lea rcx, wmsg
call DispatchMessageA
jmp @handlemsgs
@close:
xor ecx, ecx
call ExitProcess
Main endp
end
As you can see, I tried to stay as low level as I could. The reason why I avoided the proc macro for every function other than Main is that ml64 adds a prologue and an epilogue by itself, which I didn't want.
Avoiding the macro made it possible to define my own stack frame without any interference from the assembler. The first thing to notice when scrolling through this code is the MSG structure:
MSG struct
hwnd dq ?
message dd ?
padding1 dd ? ; padding
wParam dq ?
lParam dq ?
time dd ?
pt POINT <>
padding2 dd ? ; padding
MSG ends
It requires two padding fields which the x86 declaration of the same structure didn't. The reason, in a few words, is that qword members should be aligned to qword boundaries (this accounts for the first padding).
The additional padding at the end of the structure follows the rule that every structure should be aligned to its largest member. Since its largest member is a qword, the structure's size has to be a multiple of 8 bytes.
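A C compiler applies the same two rules automatically, which makes them easy to observe. The sketch below declares stand-ins for POINT and MSG with fixed-width types, so that it compiles without windows.h, and exposes the resulting size and offsets. The type and function names are hypothetical, and the layout assumes a typical 64-bit target where a `uint64_t` is 8-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the Win64 POINT and MSG layouts (illustrative names).
   Assumes a 64-bit target where uint64_t has 8-byte alignment. */
struct Point32 {
    int32_t x;
    int32_t y;
};

struct Msg64 {
    uint64_t hwnd;      /* offset 0 */
    uint32_t message;   /* offset 8, followed by 4 bytes of padding */
    uint64_t wParam;    /* offset 16 */
    uint64_t lParam;    /* offset 24 */
    uint32_t time;      /* offset 32 */
    struct Point32 pt;  /* offset 36, followed by 4 bytes of tail padding */
};

size_t msg64_size(void)          { return sizeof(struct Msg64); }
size_t msg64_wparam_offset(void) { return offsetof(struct Msg64, wParam); }
size_t msg64_pt_offset(void)     { return offsetof(struct Msg64, pt); }
```

The members add up to 40 bytes, yet `msg64_size()` reports 48: the 8 extra bytes are exactly padding1 and padding2 from the assembly declaration.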
To compile the window sample, the command line is:
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\kernel32.lib
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib /entry:Main
c:\myapp\test.res
test.res is a file I took from a VC++ wizard project; I was too lazy to make one myself. Anyway, making a resource file is very easy with VC++, but nothing stops you from using Notepad; it just takes more time.
To compile the resource file, all you need is the command line: "rc test.rc".
I think the rest of the code is pretty easy to understand. I didn't cover everything in this section, but now you should have quite a good insight into x64 assembly.
Daniel Pistelli