X64 Assembly

Links And References

AMD64 documentation

This article is extracted from “Moving to Windows x64” by Daniel Pistelli (Ntoskrnl)


Now I’ll try to explain the basics of x64 assembly. I assume the reader is already familiar with x86 assembly; otherwise this section will be hard to follow.
Moreover, since this is just a very brief guide, you’ll have to look into the AMD64 documentation for more advanced material. Some things I won’t even mention; you’ll see for yourself that some instructions are no longer in use. For instance, the lea instruction has completely taken the place of mov offset.

What you’re going to notice at once is that x64 has some more registers:

  • 8 new general-purpose registers (GPRs).
  • 8 new 128-bit XMM registers.

Of course, all general-purpose registers are 64 bits wide. The old ones we already knew are easy to recognize in their 64-bit form: rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp (and rip if we want to count the instruction pointer). These old registers can still be accessed in their smaller bit ranges, for instance: rax, eax, ax, ah, al.
The new registers go from r8 to r15, and can be accessed in their various bit ranges like this: r8 (qword), r8d (dword), r8w (word), r8b (low byte).
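
As a rough illustration of these bit ranges, the C sketch below extracts from a 64-bit value the parts that the smaller register names refer to. The helper names are made up for this example; the only facts taken from the text are the register names and their widths.

```c
#include <stdint.h>

/* Each helper returns the part of a 64-bit register value that the
   corresponding sub-register name refers to. Names are hypothetical. */

/* eax, r8d: low 32 bits */
uint32_t low_dword(uint64_t reg) { return (uint32_t)reg; }

/* ax, r8w: low 16 bits */
uint16_t low_word(uint64_t reg) { return (uint16_t)reg; }

/* al, r8b: low 8 bits */
uint8_t low_byte(uint64_t reg) { return (uint8_t)reg; }

/* ah: bits 8..15; only the legacy registers rax, rbx, rcx, rdx
   have such a high-byte name, r8..r15 do not */
uint8_t high_byte(uint64_t reg) { return (uint8_t)(reg >> 8); }
```

For example, if rax held 1122334455667788h, low_dword would yield the value seen through eax and high_byte the value seen through ah.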

Here’s a figure taken from the AMD docs:



Applications can still use segment registers as a base for addressing, but 64-bit mode only recognizes three of the old ones (and only two can be used for base address calculations). Here’s another figure:

And now, the most important things: calling convention and stack. x64 code on Windows uses a FASTCALL-like calling convention, meaning the first 4 parameters are passed in registers and the remaining ones on the stack.
Thus, the stack frame is made of: the stack parameters, the register parameters, the return address (which, I remind you, is a qword) and the local variables.
The first parameter goes in the rcx register, the second in rdx, the third in r8 and the fourth in r9. Since the register parameters are part of the stack frame, any function that calls a child function has to reserve stack space for these four registers, even if the child function takes fewer than four parameters.
The stack pointer is adjusted only in the prologue of a function, the reserved area has to be large enough to hold all the arguments passed to child functions, and it is always the caller’s duty to clean the stack. The key to understanding how the space is provided in the stack frame is that the stack has to be 16-byte aligned.
More precisely, rsp has to be 16-byte aligned at every call instruction; since the call pushes an 8-byte return address, a function is entered with rsp off by 8. So, the stack space reserved in the prologue will always be something like 16n + 8, where n depends on the number of parameters. Here’s a small figure of a stack frame:


Don’t worry if you haven’t completely figured out how it works: now we will see a few code samples, which, in my opinion, always make the theory a lot easier to understand. Let us take for instance a hello-world application like:

int WINAPI _tWinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR szCmdLine, int iCmdShow)
{
    MessageBox(NULL, _T("Hello World!"), _T("My First x64 Application"), 0);
    return 0;
}

This code disassembled would look like:

.text:0000000000401220 sub_401220 proc near          ; CODE XREF: start+10E p
.text:0000000000401220 arg_0= qword ptr 8
.text:0000000000401220 arg_8= qword ptr 10h
.text:0000000000401220 arg_10= qword ptr 18h
.text:0000000000401220 arg_18= dword ptr 20h
.text:0000000000401220    mov [rsp+arg_18], r9d
.text:0000000000401225    mov [rsp+arg_10], r8
.text:000000000040122A    mov [rsp+arg_8], rdx
.text:000000000040122F    mov [rsp+arg_0], rcx
.text:0000000000401234    sub rsp, 28h
.text:0000000000401238    xor r9d, r9d               ; uType
.text:000000000040123B    lea r8, Caption            ; "My First x64 Application"
.text:0000000000401242    lea rdx, Text              ; "Hello World!"
.text:0000000000401249    xor ecx, ecx               ; hWnd
.text:000000000040124B    call cs:MessageBoxA
.text:0000000000401251    xor eax, eax
.text:0000000000401253    add rsp, 28h
.text:0000000000401257    retn
.text:0000000000401257 sub_401220 endp
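
To make the arg_0 ... arg_18 offsets in the listing easier to read, here is a small C sketch of what a function finds at rsp on entry, before its own sub rsp. The struct and field names are hypothetical; only the offsets come from the disassembly and from the calling convention described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Stack layout seen by a callee at its first instruction.
   Offsets are relative to rsp at that moment. */
struct entry_frame {
    uint64_t return_address; /* [rsp+0]   pushed by the call instruction */
    uint64_t home_rcx;       /* [rsp+8]   arg_0:  space for parameter 1  */
    uint64_t home_rdx;       /* [rsp+10h] arg_8:  space for parameter 2  */
    uint64_t home_r8;        /* [rsp+18h] arg_10: space for parameter 3  */
    uint64_t home_r9;        /* [rsp+20h] arg_18: space for parameter 4  */
    uint64_t stack_args[];   /* [rsp+28h] parameter 5 onwards, if any    */
};
```

The four mov instructions at the top of sub_401220 simply store rcx, rdx, r8 and r9d into the home_* slots that the caller reserved.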

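Here the child function (MessageBox) takes 4 parameters, all passed in registers, so only the space for the four register parameters is needed: 4 * 8 = 0x20, which is already aligned to 16 bytes, and the extra 8 bytes that compensate for the return address make it 0x28, our value indeed. By the same logic, a child function taking 7 parameters makes it necessary to provide space for 3 extra parameters on the stack. So, 7 * 8 = 0x38, which aligned to 16 bytes is 0x40, and the extra 8 bytes make it 0x48.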
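
The arithmetic can be sketched as a small C helper. The function name is invented for this example, and the rule it encodes is the rule of thumb used in this article (round up to 16, then add 8); it ignores local variables, and any size of the form 16n + 8 that is large enough would keep the stack aligned as well.

```c
#include <stddef.h>

/* Bytes to subtract from rsp in the prologue, given the largest number
   of parameters passed to any child function. Hypothetical helper. */
size_t frame_size(size_t max_child_params)
{
    /* space for the 4 register parameters is reserved in any case */
    size_t slots = max_child_params < 4 ? 4 : max_child_params;
    size_t bytes = slots * 8;             /* one qword per parameter   */
    bytes = (bytes + 15) & ~(size_t)15;   /* round up to 16 bytes      */
    return bytes + 8;                     /* 16n + 8                   */
}
```

With 4 parameters this gives 28h, the value in the listing above; with 7 parameters it gives 48h.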
I think you have understood the stack-frame logic by now. It’s actually quite easy, but it takes a moment to switch from the old x86/stdcall logic to this one. Enough of this: now that we’ve seen how x64 code works, we’ll try assembling a source file ourselves.

Before we start, I have to make something clear. There are some assemblers available on the internet which make the job easier, mainly because they initialize the stack by themselves or they create code that is easy to convert from/to x86.
But I think that is not the point of this article. In fact, I’m going to use the Microsoft assembler (ml64.exe), which requires you to write everything down, just like in the disassembly. Another option could be assembling with another assembler and then linking it with ml64.
I think the reader should really make these decisions on his own. As far as I am concerned, I don’t believe that much code should be written in assembly; it should be avoided whenever possible. This new x64 technology is a good opportunity to rethink these matters.
In recent years I have always written 64-bit compatible code in C/C++ (I mean unmanaged, of course) and when I had to recompile a project of 70,000 lines of code for x64, I didn’t have to change a single line of code (I’ll talk about C/C++ programming later). Despite all the macros an assembler offers, I seriously doubt that people who wrote their whole code in assembly will be able to switch so easily to x64 (remember, one day even the IA64 syntax could be adopted). I think in most cases the obvious choice will be not to convert to the new technology and to stick to x86, but this isn’t always possible; it depends on the software category.

The Microsoft assembler is contained in the SDK and in the DDK (WDK for Vista). Right now, I’m using Vista’s WDK, which I freely downloaded from MSDN. The first sample of code I’m going to show you is a simple Hello-World messagebox application.

extrn MessageBoxA : proc
extrn ExitProcess : proc

.data
body db 'Hello World!', 0
capt db 'My First x64 Application', 0

.code
Main proc
sub rsp, 28h        ; 20h for the register parameters + 8 for alignment
xor r9d, r9d        ; uType = 0
lea r8, capt        ; lpCaption
lea rdx, body       ; lpText
xor rcx, rcx        ; hWnd = NULL
call MessageBoxA
xor ecx, ecx        ; exit code = 0
call ExitProcess
Main endp

end


As you can see, I didn’t bother unwinding the stack, since I call ExitProcess. The syntax is very similar to the old MASM one, although there are a few dissimilarities. The ml64 console output should be something like this:

The command line to compile is:

ml64 C:\...\test.asm /link /subsystem:windows

If the libs are not in the same directory as ml64.exe, you’ll have to provide the path like I did. The entry point has to be provided as well; otherwise you would have to use WinMainCRTStartup as the main entry.

The next sample of code I’m going to show you displays a window by calling CreateWindowEx. What you’re going to learn through this code is structure alignment and how to integrate resources in your projects.
Like I said earlier, I don’t want to encourage you to write your windows in assembly, but I believe that this sort of code is good for learning. Now the code, afterwards the explanation.

extrn GetModuleHandleA : proc
extrn MessageBoxA : proc
extrn RegisterClassExA : proc
extrn CreateWindowExA : proc
extrn DefWindowProcA : proc
extrn ShowWindow : proc
extrn GetMessageA : proc
extrn TranslateMessage : proc
extrn DispatchMessageA : proc
extrn PostQuitMessage : proc
extrn DestroyWindow : proc
extrn ExitProcess : proc

WNDCLASSEX struct
  cbSize            dd      ?
  style             dd      ?
  lpfnWndProc       dq      ?
  cbClsExtra        dd      ?
  cbWndExtra        dd      ?
  hInstance         dq      ?
  hIcon             dq      ?
  hCursor           dq      ?
  hbrBackground     dq      ?
  lpszMenuName      dq      ?
  lpszClassName     dq      ?
  hIconSm           dq      ?
WNDCLASSEX ends

POINT struct
  x                 dd      ?
  y                 dd      ?
POINT ends

MSG struct    
  hwnd              dq      ?
  message           dd      ?
  padding1          dd      ?      ; padding
  wParam            dq      ?
  lParam            dq      ?
  time              dd      ?
  pt                POINT   <>
  padding2          dd      ?      ; padding
MSG ends

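NULL equ 0
CS_VREDRAW equ 1
CS_HREDRAW equ 2
COLOR_WINDOW equ 5
WS_OVERLAPPEDWINDOW equ 0CF0000h
CW_USEDEFAULT equ 80000000h
SW_SHOW equ 5
WM_DESTROY equ 2
WM_COMMAND equ 111h
IDC_MENU equ 109
IDM_ABOUT equ 104
IDM_EXIT equ 105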

.data
szWindowClass db 'FirstApp', 0
szTitle db 'My First x64 Windows', 0
szHelpTitle db 'Help', 0
szHelpText db 'This will be a big help...', 0

hInstance qword ?
hWnd qword ?
wndclass WNDCLASSEX <>
wmsg MSG <>


.code

WndProc:                 ; proc hWnd : qword, uMsg : dword, wParam : qword, lParam : qword
  mov [rsp+8], rcx       ; hWnd (save parameters as locals)
  mov [rsp+10h], edx     ; Msg
  mov [rsp+18h], r8      ; wParam
  mov [rsp+20h], r9      ; lParam
  sub rsp, 38h
  cmp edx, WM_DESTROY
  jnz @next1

  xor ecx, ecx           ; exit code
  call PostQuitMessage
  xor rax, rax           ; return 0
  add rsp, 38h
  ret

@next1:
  cmp edx, WM_COMMAND
  jnz @default

  mov rbx, rsp
  add rbx, 38h           ; rbx = rsp as it was on entry
  mov r10, [rbx+18h]     ; wParam
  cmp r10w, IDM_ABOUT
  jz @about
  cmp r10w, IDM_EXIT
  jz @exit
  jmp @default

@about:
  xor r9d, r9d
  lea r8, szHelpTitle
  lea rdx, szHelpText
  xor ecx, ecx
  call MessageBoxA
  jmp @default

@exit:
  mov rbx, rsp
  add rbx, 38h
  mov rcx, [rbx+8h]      ; hWnd
  call DestroyWindow

@default:
  mov rbx, rsp
  add rbx, 38h
  mov r9, [rbx+20h]      ; lParam
  mov r8, [rbx+18h]      ; wParam
  mov edx, [rbx+10h]     ; Msg
  mov rcx, [rbx+8]       ; hWnd
  call DefWindowProcA
  add rsp, 38h
  ret

MyRegisterClass:         ; proc hInst : qword
  sub rsp, 28h
  mov wndclass.cbSize, sizeof WNDCLASSEX
  mov eax, CS_VREDRAW
  or eax, CS_HREDRAW
  mov wndclass.style, eax
  lea rax, WndProc
  mov wndclass.lpfnWndProc, rax
  mov wndclass.cbClsExtra, 0
  mov wndclass.cbWndExtra, 0
  mov wndclass.hInstance, rcx
  mov wndclass.hIcon, NULL
  mov wndclass.hCursor, NULL
  mov wndclass.hbrBackground, COLOR_WINDOW
  mov wndclass.lpszMenuName, IDC_MENU
  lea rax, szWindowClass
  mov wndclass.lpszClassName, rax
  mov wndclass.hIconSm, NULL
  lea rcx, wndclass
  call RegisterClassExA
  add rsp, 28h
  ret

InitInstance:                   ; proc hInst : qword
  sub rsp, 78h
  mov rax, CW_USEDEFAULT
  xor rbx, rbx
  mov [rsp+58h], rbx            ; lpParam
  mov [rsp+50h], rcx            ; hInstance
  mov [rsp+48h], rbx            ; hMenu = NULL
  mov [rsp+40h], rbx            ; hWndParent = NULL
  mov [rsp+38h], rbx            ; Height
  mov [rsp+30h], rax            ; Width
  mov [rsp+28h], rbx            ; Y
  mov [rsp+20h], rax            ; X
  mov r9d, WS_OVERLAPPEDWINDOW  ; dwStyle
  lea r8, szTitle               ; lpWindowName
  lea rdx, szWindowClass        ; lpClassName
  xor ecx, ecx                  ; dwExStyle
  call CreateWindowExA
  mov hWnd, rax
  mov edx, SW_SHOW
  mov rcx, hWnd
  call ShowWindow
  mov rax, hWnd                 ; set return value
  add rsp, 78h
  ret

Main proc
  sub rsp, 28h
  xor rcx, rcx
  call GetModuleHandleA
  mov hInstance, rax
  mov rcx, rax
  call MyRegisterClass
  test rax, rax
  jz @close              ; if the RegisterClassEx fails, exit

  mov rcx, hInstance
  call InitInstance
  test rax, rax
  jz @close              ; if the InitInstance fails, exit

@handlemsgs:             ; message processing routine
  xor r9d, r9d
  xor r8d, r8d
  xor edx, edx
  lea rcx, wmsg
  call GetMessageA
  test eax, eax
  jz @close
  lea rcx, wmsg
  call TranslateMessage
  lea rcx, wmsg
  call DispatchMessageA
  jmp @handlemsgs

@close:
  xor ecx, ecx           ; exit code = 0
  call ExitProcess
Main endp

end
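
In InitInstance, CreateWindowExA takes 12 parameters, so parameters 5 to 12 are written to the stack at [rsp+20h] through [rsp+58h] before the call. The C sketch below computes those offsets; the function name is hypothetical, and it only restates the layout visible in the listing (one qword slot per parameter, with the first four slots left for the register parameters).

```c
#include <stddef.h>

/* Offset from rsp, at the moment of the call instruction, of the slot
   that belongs to the n-th parameter (1-based). Slots 1..4 are the
   space reserved for rcx, rdx, r8 and r9; the caller only writes to
   slots 5 and up. Hypothetical helper for illustration. */
size_t outgoing_slot_offset(unsigned n)
{
    return (size_t)(n - 1) * 8;
}
```

The last slot ends at 58h + 8 = 60h, so the 78h reserved by InitInstance is large enough and still has the 16n + 8 form.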


As you can see, I tried to stay as low level as I could. The reason why I avoided the proc macro for the functions other than Main is that ml64 adds a prologue and an epilogue by itself, which I didn’t want.
Avoiding the macro made it possible to define my own stack frame without any interference from the assembler. The first thing to notice when scrolling through this code is the structure:

MSG struct
  hwnd              dq      ?
  message           dd      ?
  padding1          dd      ?      ; padding
  wParam            dq      ?
  lParam            dq      ?
  time              dd      ?
  pt                POINT   <>
  padding2          dd      ?      ; padding
MSG ends

It requires two paddings which the x86 declaration of the same structure didn’t. The reason, in a few words, is that qword members should be aligned to qword boundaries (hence the first padding, which moves wParam from offset 12 to offset 16).
The additional padding at the end of the structure follows the rule that every structure should be aligned to its largest member. Since its largest member is a qword, the structure should be aligned to an 8-byte boundary, and the second padding brings its size from 44 to 48 bytes.
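
A C compiler inserts the same two paddings automatically, which gives an easy way to double-check the hand-written MASM declaration. The sketch below uses fixed-width types and made-up struct names instead of the real Windows headers, and it assumes a typical 64-bit ABI where 8-byte integers are aligned to 8 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for POINT and MSG with the same member sizes as in the
   MASM declaration; the names are hypothetical. */
struct point64 {
    int32_t x;
    int32_t y;
};

struct msg64 {
    uint64_t       hwnd;     /* offset 0                               */
    uint32_t       message;  /* offset 8, followed by 4 bytes padding  */
    uint64_t       wParam;   /* offset 16                              */
    int64_t        lParam;   /* offset 24                              */
    uint32_t       time;     /* offset 32                              */
    struct point64 pt;       /* offset 36, followed by 4 bytes padding */
};

/* Compile-time checks: these fail to build if the layout differs. */
_Static_assert(offsetof(struct msg64, wParam) == 16, "padding1 missing");
_Static_assert(offsetof(struct msg64, pt) == 36, "unexpected pt offset");
_Static_assert(sizeof(struct msg64) == 48, "padding2 missing");
```

The offsets and the total of 48 bytes match the MASM structure once padding1 and padding2 are counted.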

To compile this sample, the command line is:

ml64 c:\myapp\test.asm /link /subsystem:windows
/defaultlib:C:\WinDDK\6000\lib\wnet\amd64\user32.lib /entry:Main

test.res is a file I took from a VC++ wizard project; I was too lazy to make one myself. Anyway, making a resource file is very easy with VC++, but nothing stops you from using Notepad, it just takes more time.
To compile the resource file all you need to do is to use the command line: “rc test.rc”.

I think the rest of the code is pretty easy to understand. I didn’t cover everything with this paragraph, but now you should have quite a good insight into x64 assembly.

Daniel Pistelli