This is a tutorial, not a complete scientific description of how the x86 processor works.
This text is intended for those who want to gain an insight into programming real assembly language. Because x86 processors are so common, most of you should be able to assemble most of the code in this tutorial on your own computer.
This tutorial uses standard Intel syntax, not the AT&T syntax in which most Linux assembly programs are written. All the code in this tutorial is intended for ordinary PC computers.
If you want to assemble the code you find here, you will have to download the free Netwide Assembler (NASM). Download it from this webpage: http://nasm.sourceforge.net/
Read about hexadecimal numbers for a better understanding of this tutorial.
There are two main modes in which an x86 processor can work: real mode and protected mode. Because of the need for backward compatibility, the processor always starts in real mode. Almost all modern operating systems work in protected mode.
This tutorial starts with a brief introduction to real-mode assembly and then goes on to a larger protected-mode section.
There are eight 16-bit processor registers that are commonly used by the average application programmer. Each register is specialized for one thing, and operations that deal with that thing are often encoded in fewer bytes if the right register is used (smaller code often runs faster). Here are the most used registers in real mode:
Data registers:
* AX, the accumulator
* BX, the base register
* CX, the counter register
* DX, the data register
Address registers:
* SI, the source register
* DI, the destination register
* SP, the stack pointer register
* BP, the stack base pointer register
All the data registers have 8-bit versions. There are two 8-bit registers "inside" each 16-bit data register: ZH for the high 8 bits and ZL for the low 8 bits, where Z is the first letter of the 16-bit register. For example, AH is an 8-bit register that contains the same bits as the high 8 bits of AX, and CL is a register that contains the low 8 bits of CX. The eight 8-bit registers are:
AH, AL, BH, BL, CH, CL, DH and DL.
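The overlap between the 8-bit and 16-bit registers can be seen in a small sketch. It assumes the code is assembled with NASM as a DOS .com program (hence the ORG 0x100 and the DOS exit call at the end); the values are arbitrary.

```nasm
[ORG 0x100]
[BITS 16]
    mov ax, 0x1234      ; AX = 0x1234, so AH = 0x12 and AL = 0x34
    mov al, 0xFF        ; only the low byte changes:  AX = 0x12FF
    mov ah, 0x00        ; only the high byte changes: AX = 0x00FF
    mov cx, ax          ; CX = 0x00FF, so CH = 0x00 and CL = 0xFF

    mov ax, 0x4C00      ; DOS function 0x4C: exit with return code 0
    int 0x21
```

Writing to AL or AH never touches the other half, which is why the two halves can be used as independent 8-bit registers.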
Segment registers (not part of the 8 general registers):
* CS, the code segment register
* DS, the data segment register
* ES, an extra segment register
* FS, another extra segment register (not implemented before the 80386)
* GS, yet another extra segment register (not implemented before the 80386)
* SS, the stack segment register
Other registers (not part of the 8 general registers):
* IP, the instruction pointer register
* FLAGS, the flag register
The IP register points to where the processor currently executes the code (i.e. where in the program the processor "is".) The IP register cannot be accessed by the programmer directly.
The FLAGS register contains the current state of the processor. Each bit in this register is called a flag. Each flag can be either 1 or 0, set or not set. Some of the flags that the FLAGS register contains are carry, overflow, zero and single step. The flags are often used to control the execution flow of the program. "IF A = B THEN A = C" and the like require the use of the FLAGS register.
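As a minimal sketch of how "IF A = B THEN A = C" maps onto the flags, the following DOS .com program keeps A, B and C in AX, BX and CX (an arbitrary choice made for this example). CMP sets the zero flag when its operands are equal, and JNE jumps when the zero flag is clear.

```nasm
[ORG 0x100]
[BITS 16]
    mov ax, 5           ; A
    mov bx, 5           ; B
    mov cx, 9           ; C

    cmp ax, bx          ; computes AX - BX and sets the flags; ZF = 1 if A = B
    jne not_equal       ; ZF = 0 means A <> B: skip the assignment
    mov ax, cx          ; A = C
not_equal:

    mov ah, 0x4C        ; DOS function 0x4C: exit; AL (the low byte of A)
    int 0x21            ; is passed on as the return code
```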
The instructions available in real mode are:
aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor
(copied from IA-32)
You will rarely, if ever, need most of these instructions.
Memory addressing in real mode is quite simple, but still much disliked by ordinary programmers. It uses two registers to point to one address: one segment register and one offset register. In 16-bit code only BX, BP, SI and DI can be used as the offset inside an instruction's memory operand; other pairs, such as DS:DX, are used as a notation for passing pointers around.
The segment register is shifted 4 bits left and then added to the offset register. The formula looks like this: segment*0x10+offset.
For example, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, they would together point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. One quick way to do this without a hexadecimal calculator is to append a zero to the hexadecimal number in the segment register and then add the content of the offset register to that number. The above would be 0xDEAD0 + 0xCAFE, which is quite easy to calculate in your head :-)
Usually, the two registers (the segment- and the offset-register) are written like this to denote that they are together pointing to some memory address: segment-register:offset-register. For example: DS:DX, CS:IP, SS:SP, DS:SI and ES:DI.
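Because of the segment*0x10+offset formula, several segment:offset pairs can name the same byte. The sketch below writes two characters into text video memory through two different segments. It assumes a DOS .com program running in a colour text mode, where the screen starts at linear address 0xB8000 and each character cell takes two bytes.

```nasm
[ORG 0x100]
[BITS 16]
    mov ax, 0xB800
    mov es, ax
    mov byte [es:0x0000], 'A'   ; 0xB800*0x10 + 0x0000 = 0xB8000 (first cell)

    mov ax, 0xB000
    mov es, ax
    mov byte [es:0x8002], 'B'   ; 0xB000*0x10 + 0x8002 = 0xB8002 (second cell)

    mov ax, 0x4C00              ; DOS function 0x4C: exit with return code 0
    int 0x21
```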
There are some special combinations of segment registers and general registers that point to interesting things:
* CS:IP points to the address where the processor is currently executing its code.
* SS:SP points to the location of the last item pushed onto the stack.
* DS:SI is often used to point to data that is about to be copied to ES:DI.
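The DS:SI to ES:DI convention is what the string instructions rely on. As a sketch, this DOS .com program copies a short string with REP MOVSB and then prints the copy; the labels src, dst and src_len are names invented for the example.

```nasm
[ORG 0x100]
[BITS 16]
    push ds
    pop es              ; ES = DS (already true in a .com file, made explicit)
    mov si, src         ; DS:SI -> source
    mov di, dst         ; ES:DI -> destination
    mov cx, src_len     ; number of bytes to copy
    cld                 ; clear direction flag: SI and DI count upwards
    rep movsb           ; copy CX bytes from DS:SI to ES:DI

    mov ah, 0x09        ; DOS function 0x09: print '$'-terminated string
    mov dx, dst         ; DS:DX -> the copy
    int 0x21

    mov ax, 0x4C00      ; DOS function 0x4C: exit with return code 0
    int 0x21

src     db "copied!", 13, 10, "$"
src_len equ $ - src
dst     times src_len db 0
```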
0-3FF IVT (Interrupt Vector Table)
400-4FF BDA (BIOS Data Area)
500-9FFFF ordinary application RAM <-- the place where our programs will be, probably :)
A0000-BFFFF Video memory
C0000-EFFFF Adapter ROMs (video BIOS and other expansion ROMs)
F0000-FFFFF System BIOS ROM
That means that we have a little less than 640 kB of application RAM.
Everything above 0xFFFFF is called "extended memory"; the first 64 kB of it (minus 16 bytes) is the "high memory area".
An interrupt is what it sounds like. There are two kinds of interrupts: software interrupts and hardware interrupts. A typical software interrupt in real mode is interrupt 0x21 (the ISR that handles this interrupt gets the function number and all the parameters from the program and then executes the selected DOS function). Another is interrupt 3 (breakpoint, often used to enter some sort of software debugger). A typical hardware interrupt occurs when some external circuit decides that it needs attention from the CPU. When the system timer ticks, for example, it triggers interrupt 0x08, whose BIOS handler in turn calls interrupt 0x1C.
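A minimal sketch of a software interrupt in use: this DOS .com program calls interrupt 0x21 twice, once with function 0x09 (print a '$'-terminated string at DS:DX) and once with function 0x4C (exit). The label msg is a name chosen for the example.

```nasm
[ORG 0x100]
[BITS 16]
    mov ah, 0x09        ; function number goes in AH
    mov dx, msg         ; parameter: DS:DX -> string (DS = CS in a .com file)
    int 0x21            ; let the DOS ISR do the work

    mov ax, 0x4C00      ; function 0x4C (exit), return code 0 in AL
    int 0x21

msg db "Hello, world!", 13, 10, "$"
```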
At the very beginning of the memory lies the Interrupt Vector Table (IVT). The IVT contains pointers to all the Interrupt Service Routines (ISRs).
The pointers to the different ISRs wired to the interrupts are saved in this format:
[offset_0][segment_0][offset_1][segment_1] ... [offset_255][segment_255]
(each integer, that is, each offset or segment pointer, is 16 bits wide)
There are 256 different interrupts, each with its own 4-byte pointer, so the pointer for interrupt N is stored at address 0:N*4.
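Given this layout, looking up the handler of an interrupt is a matter of reading two words at 0:N*4. The sketch below, again assuming a DOS .com program, reads the vector for interrupt 0x1C; VECTOR is a constant invented for the example.

```nasm
[ORG 0x100]
[BITS 16]
VECTOR equ 0x1C

    xor ax, ax
    mov es, ax                  ; ES = 0, the segment of the IVT
    cli                         ; keep the entry from changing between the reads
    mov bx, [es:VECTOR*4]       ; offset of the ISR   (address 0:0070)
    mov cx, [es:VECTOR*4+2]     ; segment of the ISR  (address 0:0072)
    sti
    ; CX:BX now holds the segment:offset address of the handler

    mov ax, 0x4C00              ; DOS function 0x4C: exit with return code 0
    int 0x21
```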
[ORG 0x100]
[BITS 16]
    jmp installISR
;***********************
ISRstart:
    cli                 ; disable interrupts
    push ax             ; save all the registers so that no one notices
    push ds             ; our presence.
    push di
    mov ax, 0xb800      ; point to text video memory
    mov ds, ax
    xor di, di
loop1:
    mov al, [ds:di]     ; check one letter
    cmp al, "t"         ; was it "t"?
    jne search          ; if not, search for it...
    mov al, [ds:di+2]   ; check next letter
    cmp al, "h"         ; was it "h"?
    jne search          ; if not, search for it...
    mov al, [ds:di+4]   ; check the last letter
    cmp al, "e"         ; is it an "e"?
    jne search          ; if not, search for it...
                        ; if we are here, we have found a "the"
                        ; replace "the" with "=O)"
    mov al, "="
    mov [ds:di], al     ; replace "t" with "="
    mov al, "O"
    mov [ds:di+2], al   ; replace "h" with "O"
    mov al, ")"
    mov [ds:di+4], al   ; and finally replace "e" with ")"
    jmp loop1           ; is there any more "the"?
end:
    pop di              ; restore all registers
    pop ds              ; and no one will notice our presence! ;)
    pop ax
    sti                 ; and reenable the interrupts
    iret                ; and return to whatever misc. activity the
                        ; computer was doing..
search:
    cmp di, 0x1f40      ; did we search all the letters?
    jae end             ; yes: stop searching!
    add di, 2           ; no: select the next letter and return
    jmp loop1           ; to the find-more-"the"-strings procedure
ISRend:
ISRlen EQU ISRend-ISRstart
;***********************
installISR:
    xor ax, ax
    mov es, ax          ; es = 0
    mov di, 0x70        ; 0:0070 (INT 0x1C offset)
    mov ax, ISRstart    ; get our start-IP
    mov [es:di], ax     ; write it to the IVT interrupt pointer for 0x1C
    mov di, 0x72        ; 0:0072 (INT 0x1C segment)
    mov ax, cs          ; get our code segment
    mov [es:di], ax     ; write it to the IVT interrupt segment for 0x1C
    mov ax, 0x3100      ; select DOS function "TSR_Install", errorcode 0.
    mov dx, (ISRend - $$ + 0x100 + 15) >> 4
                        ; DX = paragraphs (16-byte units) to keep resident:
                        ; the 0x100-byte PSP plus the code up to ISRend
    int 0x21            ; make DOS reserve the piece of code and exit back to the shell.
; here, the computer will continue to work as if nothing happened..
; but trying to write "the" anywhere on the screen =O) ...it doesn't work.
;***********************
Assemble with "nasm filename -o filename.com". (see NASM)
Protected mode is the mode in which most modern operating systems run their code. When a PC boots it first enters real mode; the operating system is responsible for switching into protected mode.
There are some new registers in protected mode:
All the general application registers (AX, BX, CX, DX, SI, DI, SP and BP) are extended to a total of 32 bits. To denote that you intend the 32-bit register instead of the low 16-bit part, add an E before the name of the register:
EAX, EBX, ECX, EDX, ESI, EDI, ESP and EBP
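The relation between a 32-bit register and its 16-bit and 8-bit parts can be sketched as follows. The extended registers are also usable from 16-bit code on a 80386 or later, so this example is written as a DOS .com program and assumes such a processor; the values are arbitrary.

```nasm
[ORG 0x100]
[BITS 16]
    mov eax, 0x12345678 ; EAX = 0x12345678, so AX = 0x5678
    mov ax, 0xFFFF      ; only the low 16 bits change: EAX = 0x1234FFFF
    mov al, 0x00        ; only the low 8 bits change:  EAX = 0x1234FF00
    shr eax, 16         ; the high half has no name of its own;
                        ; shift it down to reach it: EAX = 0x00001234

    mov ax, 0x4C00      ; DOS function 0x4C: exit with return code 0
    int 0x21
```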
The segment registers remain 16 bits wide and do not change their names.
CS, DS, ES, FS, GS and SS
The segment registers do not work as they did in real mode. Instead, each one holds a selector that picks an entry (a descriptor) in a table pointed to by the GDTR or LDTR register.
The FLAGS and IP registers are also extended to 32 bits.
EIP, EFLAGS
There are also some totally new registers. They are useful only for system programmers.
CR0, CR1 (reserved), CR2, CR3, TR4, TR5, TR6, TR7, GDTR, LDTR, IDTR, TR (the task register, which selects the current TSS)..
They are used by the operating system and cannot be accessed by ordinary user programs.
There are also some other registers, like the debug registers, MMX, XMM and some more.
How to switch to protected mode:
* load GDTR with the pointer to the GDT.
* optional: load LDTR with the pointer to the LDT.
* load IDTR with the 6-byte wide pointer to the IDT OR disable interrupts.
* set the PE bit in the CR0 register.
* make a far jump to the 32-bit code.
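The steps above can be sketched as a tiny boot sector. This is only an illustration under several assumptions that are not part of the text above: the code is loaded by the BIOS at 0x7C00, it uses a flat GDT with one code and one data descriptor (selectors 0x08 and 0x10), it skips the optional LDT, and it disables interrupts instead of setting up an IDT. All labels are invented for the example.

```nasm
[ORG 0x7C00]                    ; assumption: loaded as a boot sector
[BITS 16]
start:
    cli                         ; no IDT is loaded, so disable interrupts
    xor ax, ax
    mov ds, ax                  ; DS = 0 so that [gdt_desc] is found
    lgdt [gdt_desc]             ; load GDTR with the pointer to the GDT
    mov eax, cr0
    or eax, 1                   ; set the PE bit (bit 0) in CR0
    mov cr0, eax
    jmp 0x08:pm_entry           ; far jump: loads CS with the code selector

[BITS 32]
pm_entry:
    mov ax, 0x10                ; data selector
    mov ds, ax
    mov es, ax
    mov ss, ax
    mov esp, 0x7C00             ; stack grows down from below our code
    mov byte [0xB8000], 'P'     ; show a 'P' in the top-left screen corner
hang:
    hlt
    jmp hang

gdt:
    dq 0                        ; entry 0: mandatory null descriptor
    dq 0x00CF9A000000FFFF       ; entry 1 (0x08): code, base 0, limit 4 GB
    dq 0x00CF92000000FFFF       ; entry 2 (0x10): data, base 0, limit 4 GB
gdt_desc:
    dw gdt_desc - gdt - 1       ; limit: size of the GDT minus one
    dd gdt                      ; linear address of the GDT

    times 510-($-$$) db 0       ; pad to 510 bytes
    dw 0xAA55                   ; boot sector signature
```

Assembled with "nasm filename -o filename.img", the result should be bootable in a PC emulator; on real hardware a complete loader would also have to enable the A20 line before using memory above 1 MB.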