X86-assembly

Table of contents

1 x86 PC assembly tutorial

2 Basic information

3 Assembly in real mode

3.1 The mnemonics used in realmode x86-assembly
3.2 The realmode addressing model
3.3 The PC memory layout
3.4 Interrupts in realmode
3.5 Example code

4 Protected Mode

x86 PC assembly tutorial

This is a tutorial, not a complete scientific description of how the x86 processor works.

This text is intended for those who want to gain an insight into programming in real assembly language. Because x86 processors are so common, most of you should be able to assemble most of the code in this tutorial on your own computer.

This tutorial uses standard Intel syntax, not the AT&T syntax in which most Linux assembly programs are written. All the code in this tutorial is intended for ordinary PCs.

If you want to assemble the code in this tutorial, download the free Netwide Assembler (NASM) from this webpage: http://nasm.sourceforge.net/

Familiarity with hexadecimal numbers will help you follow this tutorial.

Basic information

There are mainly two different modes in which an x86 processor can work: real mode and protected mode. Because of the need for backward compatibility, the processor always starts in real mode. Almost all modern operating systems work in protected mode.

This tutorial starts with a brief introduction to real-mode assembly and then goes on to a larger protected-mode section.

Assembly in real mode

There are eight 16-bit general registers that are commonly used by the average application programmer. Each register is specialized for one kind of task, and instructions that perform that task often have a shorter encoding when the right register is used (smaller code often runs faster). Here are the most used registers in real mode:

 Data registers
 AX, the accumulator
 BX, the base register
 CX, the counter register
 DX, the data register

 Address registers
 SI, the source index register
 DI, the destination index register
 SP, the stack pointer register
 BP, the stack base pointer register

All the data registers have 8-bit versions. There are two 8-bit registers "inside" each 16-bit register: ZH for the high 8 bits and ZL for the low 8 bits, where Z is the first letter of the 16-bit register. For example, AH is an 8-bit register that contains the same bits as the high 8 bits of AX, and CL contains the low 8 bits of CX.

 AH, AL, BH, BL, CH, CL, DH and DL.
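
A small sketch of how the 8-bit halves overlap their 16-bit register. The values are arbitrary and chosen only for illustration:

```nasm
[BITS 16]
    mov ax, 0x1234    ; AX = 0x1234, so AH = 0x12 and AL = 0x34
    mov al, 0xff      ; only the low byte changes:  AX = 0x12ff
    mov ah, 0x00      ; only the high byte changes: AX = 0x00ff
    xchg ah, al       ; swap the two halves:        AX = 0xff00
```

Writing to AL or AH never touches the other half, which is why a single 16-bit register can be used as two independent byte variables.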

 Segment registers (not part of the 8 general registers)
 CS, the code segment register
 DS, the data segment register
 ES, an extra segment register
 FS, another extra segment register (not implemented before the 80386)
 GS, yet another extra segment register (not implemented before the 80386)
 SS, the stack segment register

 Other registers (not part of the 8 general registers)
 IP, the instruction pointer register
 FLAGS, the flag register

The IP register points to where the processor currently executes the code (i.e. where in the program the processor "is".) The IP register cannot be accessed by the programmer directly.
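
Although IP cannot be named as an operand, its value can still be obtained indirectly. The following is a minimal sketch of one common trick; the label name next is only illustrative:

```nasm
[BITS 16]
    call next         ; a near call pushes the offset of the instruction after it
next:
    pop ax            ; AX = offset of "next", i.e. the value IP had at this point
```

Jumps, calls, returns and interrupts are the instructions that change IP; everything else simply lets it advance to the next instruction.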

The FLAGS register contains the current state of the processor. Each bit in this register is called a flag, and each flag can be either 1 (set) or 0 (not set). Some of the flags in the FLAGS register are carry, overflow, zero and trap (single step). The flags are often used to control the execution flow of the program: "IF A = B THEN A = C" and similar constructs require the use of the FLAGS register.
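
As a sketch of how "IF A = B THEN A = C" might look, assume (purely for illustration) that A lives in AX, B in BX and C in CX:

```nasm
[BITS 16]
    cmp ax, bx        ; computes AX - BX, throws the result away, sets ZF if A = B
    jne skip          ; ZF is clear, so A != B: skip the assignment
    mov ax, cx        ; A = C
skip:
    nop               ; execution continues here in both cases
```

The cmp instruction only updates FLAGS; the conditional jump that follows reads the zero flag to decide which path to take.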

The mnemonics used in realmode x86-assembly

They are:

aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor

(copied from IA-32)
You will rarely use most of these instructions.

The realmode addressing model

This is quite simple, but still much hated by ordinary programmers. Two registers together point to one address: a segment register and an offset register. Any general register (see above) can hold the offset half of such a pointer, but in 16-bit code only BX, BP, SI and DI may appear inside a memory operand such as [bx+si].

The segment register is shifted 4 bits left and then added to the offset register. The formula looks like this: segment*0x10+offset.

For example, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, together they point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. One quick way to do this without a hexadecimal calculator is to append a zero to the hexadecimal number in the segment register and then add the content of the offset register to that number. The above becomes 0xDEAD0 + 0xCAFE, which is quite easy to calculate in your head.
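
The same calculation can be sketched as a small routine. The routine name linear and the choice of input and output registers are assumptions made for this example only:

```nasm
[BITS 16]
; in:  BX = segment, SI = offset
; out: DX:AX = 20-bit linear address (DX holds the upper bits)
linear:
    mov ax, bx
    mov dx, 16
    mul dx            ; DX:AX = segment * 0x10
    add ax, si        ; add the offset to the low word
    adc dx, 0         ; and carry into the high word
    ret
```

With BX = 0xDEAD and SI = 0xCAFE, the multiplication gives DX:AX = 0x000D:0xEAD0, and adding the offset gives 0x000E:0xB5CE, which is the address 0xEB5CE from the text.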

Usually, the two registers (the segment- and the offset-register) are written like this to denote that they are together pointing to some memory address: segment-register:offset-register. For example: DS:DX, CS:IP, SS:SP, DS:SI and ES:DI.

There are some special combinations of segment registers and general registers that point to interesting things:

 CS:IP points to the address where the processor is currently executing its code.
 SS:SP points to the location of the last item pushed onto the stack.
 DS:SI is often used to point to data that is about to be copied to ES:DI
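
A minimal sketch of the DS:SI to ES:DI copy, assuming it is built as a DOS .COM program. The labels source, dest and len are hypothetical names for this example:

```nasm
[BITS 16]
[ORG 0x100]               ; assumption: assembled as a DOS .COM program
    push ds
    pop es                ; ES = DS, so both buffers live in our own segment
    mov si, source        ; DS:SI -> first byte to read
    mov di, dest          ; ES:DI -> first byte to write
    mov cx, len           ; CX = number of bytes to copy
    cld                   ; clear the direction flag so SI and DI count upwards
    rep movsb             ; copy CX bytes from DS:SI to ES:DI
    mov ax, 0x4c00        ; DOS function 0x4C: terminate with return code 0
    int 0x21

source: db "hello"
len     equ $ - source    ; length of the source string in bytes
dest:   times len db 0    ; destination buffer of the same size
```

The movsb instruction uses DS:SI and ES:DI implicitly, which is why these two pairs are the conventional choice for source and destination pointers.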

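The PC memory layout

   00000-003FF  IVT (Interrupt Vector Table)
   00400-004FF  BDA (BIOS Data Area)
   00500-9FFFF  ordinary application RAM <-- the place where our programs will be
   A0000-BFFFF  Video memory
   C0000-EFFFF  Video BIOS and adapter (option) ROMs
   F0000-FFFFF  System BIOS ROM

That leaves a little under 640 kB of conventional RAM, which DOS and the applications share.

The first 65,520 bytes above 0xFFFFF (0x100000-0x10FFEF) are called the "high memory area". They can be reached from real mode as FFFF:0010 to FFFF:FFFF when the A20 address line is enabled. Memory beyond that is called extended memory and cannot be addressed in real mode.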

Interrupts in realmode

An interrupt makes the processor stop what it is doing, save FLAGS, CS and IP on the stack, and jump to an Interrupt Service Routine (ISR). The ISR ends with iret, which resumes the interrupted code. There are two kinds of interrupts: software and hardware interrupts. A typical software interrupt in real mode is interrupt 0x21 (the ISR that handles it takes a function number and all the parameters from the program's registers and then executes the selected DOS function). Another is int3 (breakpoint, often used to enter some sort of software debugger). A hardware interrupt occurs when some external circuit decides that it needs attention from the CPU. For example, each tick of the system timer triggers interrupt 0x08, whose BIOS handler in turn calls interrupt 0x1C so that programs can hook the tick.
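
As a sketch of a software interrupt in use, the following DOS .COM program calls interrupt 0x21 twice: function 0x09 prints a "$"-terminated string, and function 0x4C ends the program. The label msg and the message text are made up for this example:

```nasm
[BITS 16]
[ORG 0x100]               ; assumption: assembled as a DOS .COM program
    mov ah, 0x09          ; function number: print a "$"-terminated string
    mov dx, msg           ; parameter: DS:DX -> the string
    int 0x21              ; call the DOS ISR
    mov ax, 0x4c00        ; function 0x4C: terminate, return code 0
    int 0x21

msg: db "Hello from INT 0x21", 13, 10, "$"
```

The function number goes in AH and the parameters in other registers, so a single interrupt number can offer many different services.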

At the very beginning of memory lies the Interrupt Vector Table (IVT). The IVT contains pointers to all the Interrupt Service Routines (ISRs).

The pointers to the different ISRs wired to the interrupts are stored in this format:

 [offset_0][segment_0][offset_1][segment_1][... ...][offset_255][segment_255]
 (each integer, that is each offset or segment value, is 16 bits wide)

There are 256 different interrupts, each with its own 4-byte pointer, so the pointer for interrupt n starts at address n*4.
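
A minimal sketch of reading one entry from the table, here the pointer for interrupt 0x1C. The choice of output registers is an assumption for illustration:

```nasm
[BITS 16]
    xor ax, ax
    mov es, ax            ; ES = 0, the segment where the IVT lives
    mov bx, 0x1c * 4      ; each entry is 4 bytes, so entry 0x1C is at 0x70
    mov ax, [es:bx]       ; AX = offset of the ISR
    mov dx, [es:bx+2]     ; DX = segment of the ISR
```

After this, DX:AX holds the segment:offset address of the routine that currently handles interrupt 0x1C.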

Example code

 [ORG 0x100]
 [BITS 16]
    jmp installISR
 ;***********************
 ISRstart:
    cli               ; keep interrupts disabled while we work
    push ax           ; save all the registers we use so that no one
    push ds           ; notices our presence.
    push di
    mov ax, 0xb800    ; point to color text-mode video memory
    mov ds, ax
    xor di, di        ; start at the top left corner of the screen
 loop1:
    mov al,[ds:di]    ; check one letter
    cmp al, "t"       ; was it "t"?
    jne search        ; if not, search for it...
    mov al,[ds:di+2]  ; check next letter (every other byte is an attribute)
    cmp al, "h"       ; was it "h"?
    jne search        ; if not, search for it...
    mov al,[ds:di+4]  ; check the last letter
    cmp al, "e"       ; was it "e"?
    jne search        ; if not, search for it...
                      ; if we are here, we have found a "the"

    ; replace "the" with "=O)"
    mov al, "="
    mov [ds:di], al   ; replace "t" with "="
    mov al, "O"
    mov [ds:di+2], al ; replace "h" with "O"
    mov al, ")"
    mov [ds:di+4], al ; and finally replace "e" with ")"

    jmp loop1         ; are there any more "the"s?
 end:
    pop di            ; restore all registers
    pop ds            ; and no one will notice our presence! ;)
    pop ax
    sti               ; re-enable the interrupts
    iret              ; and return to whatever misc. activity the
                      ; computer was doing..
 search:
    cmp di, 0x0f9a    ; 80*25 cells * 2 bytes = 0xfa0; fewer than 3 cells left?
    jae end           ; yes: stop searching!
    add di, 2         ; no: select the next letter and return
    jmp loop1         ; to the find-more-"the"-strings procedure
 ISRend:
 ISRlen EQU ISRend-ISRstart
 ;***********************
 installISR:
    cli               ; no tick may arrive while the vector is half-written
    xor ax, ax
    mov es, ax        ; es = 0
    mov di, 0x70      ; 0:0070 (INT 0x1C offset, 0x1C * 4)
    mov ax, ISRstart  ; get our start-IP
    mov [es:di], ax   ; write it to the IVT interrupt pointer for 0x1C
    mov di, 0x72      ; 0:0072 (INT 0x1C segment)
    mov ax, cs        ; get our code segment
    mov [es:di], ax   ; write it to the IVT interrupt segment for 0x1C
    sti
    mov ax,0x3100     ; select DOS function "TSR_Install", errorcode 0.
    mov dx, (0x100 + (ISRend - $$) + 15) >> 4 ; paragraphs to keep: PSP + code up to ISRend
    int 0x21          ; make DOS reserve the piece of code and exit back to the shell.
 ; here, the computer will continue to work as if nothing happened..
 ; but trying to write "the" anywhere on the screen =O) ...it doesn't work.
 ;***********************

Assemble with "nasm -f bin filename.asm -o filename.com".

Protected Mode

Protected mode is the mode in which most modern operating systems run their code. When the PC boots it first enters real mode; the operating system is responsible for switching into protected mode.

There are some new registers in protected mode:

All the general application registers (AX, BX, CX, DX, SI, DI, SP and BP) are extended to a total of 32 bits. To denote that you intend the 32-bit register instead of the low 16-bit part, add an E before the name of the register.

 EAX, EBX, ECX, EDX, ESI, EDI, ESP and EBP
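
A short sketch of how the 16-bit and 8-bit names still refer to the low part of the extended register. The values are arbitrary:

```nasm
[BITS 32]
    mov eax, 0x12345678   ; EAX = 0x12345678
    mov ax, 0xffff        ; only the low 16 bits change: EAX = 0x1234ffff
    mov al, 0x00          ; only the low 8 bits change:  EAX = 0x1234ff00
```

There is no separate name for the upper 16 bits of EAX; they can only be reached through the full 32-bit register.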

The segment registers remain 16 bits wide and do not change their names.

 CS, DS, ES, FS, GS and SS

The segment registers do not work as they did in real mode. Instead, each one holds a selector that picks out a descriptor in a table located by the GDTR or LDTR register; the descriptor supplies the segment's base address, limit and access rights.

The FLAGS and IP registers are also extended to 32 bits.

 EIP, EFLAGS

There are also some totally new registers. They are useful only for system programmers.

 CR0, CR1, CR2, CR3, TR4, TR5, TR6, TR7
 GDTR, LDTR, IDTR, TR (the task register, which points to the current TSS)..

They are used by the operating system and cannot be accessed by ordinary user programs.

There are also some other registers, like the debug registers, MMX, XMM and some more.

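How to switch to protected mode:

 * load GDTR with the 6-byte wide pointer (16-bit limit, 32-bit base) to the GDT.
 * load IDTR with the 6-byte wide pointer to the IDT OR disable interrupts.
 * set the PE bit in the CR0 register.
 * make a far jump to the 32-bit code; this loads CS with a code selector from the GDT.
 * load the data segment registers (DS, ES, SS, ...) with data selectors from the GDT.
 * optional: load LDTR with the selector of the LDT (this can only be done once protected mode is on).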
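
The steps above can be sketched as a tiny boot sector. This is only an illustration under several assumptions: the code is loaded by the BIOS at linear address 0x7C00, interrupts are simply disabled instead of setting up an IDT, no LDT is used, and the A20 line is left untouched (so memory above 1 MB may not be reachable as-is). All label names are hypothetical.

```nasm
[BITS 16]
[ORG 0x7C00]                    ; assumption: loaded as a boot sector

start:
    cli                         ; no IDT yet, so keep interrupts off
    xor ax, ax
    mov ds, ax                  ; DS = 0 so that [gdt_ptr] is found correctly
    lgdt [gdt_ptr]              ; load GDTR with the limit and base of the GDT
    mov eax, cr0
    or eax, 1                   ; set the PE bit (bit 0)
    mov cr0, eax
    jmp 0x08:pm_entry           ; far jump: CS = code selector 0x08

[BITS 32]
pm_entry:
    mov ax, 0x10                ; data selector 0x10
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov esp, 0x7C00             ; stack grows down from below the boot sector
.hang:
    hlt                         ; nothing more to do in this sketch
    jmp .hang

gdt:
    dq 0                        ; entry 0: mandatory null descriptor
    dq 0x00CF9A000000FFFF       ; 0x08: 32-bit code, base 0, limit 4 GB
    dq 0x00CF92000000FFFF       ; 0x10: data, base 0, limit 4 GB
gdt_end:

gdt_ptr:
    dw gdt_end - gdt - 1        ; 16-bit limit (size of the GDT minus one)
    dd gdt                      ; 32-bit linear base address of the GDT

    times 510 - ($ - $$) db 0   ; pad to 510 bytes
    dw 0xAA55                   ; boot sector signature
```

Both descriptors describe a flat segment that starts at address 0, so offsets in the 32-bit code are the same as linear addresses. A real operating system would go on to install an IDT and re-enable interrupts.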

All Wikipedia text is available under the terms of the GNU Free Documentation License