Experiment and Programming Environment
To have a uniform experience, we use the following experiment environment for this course:
- An x86 Debian Linux system running on an Oracle VM VirtualBox virtual machine
On the x86 Debian Linux system, install the required application packages with the command
apt-get install -y nasm qemu qemu-system hexedit
The above command must be run as root. On a Debian system, you can first switch to root with the command
su
If you have installed and set up the sudo package, you can instead prefix the apt-get command with sudo to install the required packages, i.e.,
sudo apt-get install -y nasm qemu qemu-system hexedit
CPU Emulator
QEMU is a CPU emulator. To view the specifications of the x86 system it emulates, we can read one of its manual pages:
man qemu-system-i386
BIOS
The Basic Input/Output System (BIOS) contains the bootstrap code, which is the first code the CPU executes when we power on the system. The bootstrap code is responsible for starting the process that loads the OS. It begins by loading the code in the boot sector, typically located at the very first sector of a boot device such as a hard disk drive. A system may have multiple boot sectors. We call the first boot sector that the BIOS loads the Master Boot Sector (or Master Boot Record, MBR); its code in turn loads and executes code in other boot sectors or the operating system kernel. When you install and set up an operating system, the system installer writes the MBR code to the Master Boot Sector.
We can examine the code in the MBR using the following commands. Reading the disk device requires root privileges, and /dev/sda should be replaced with the disk name reported by the first command if it differs:
cat /proc/diskstats
dd if=/dev/sda of=mbr.bin bs=512 count=1
hexedit mbr.bin
ndisasm -b16 -o7c00h mbr.bin > mbr.asm
vi mbr.asm
For more see https://thestarman.pcministry.com/asm/mbr/GRUB.htm.
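As a quick sanity check on a dumped sector such as mbr.bin, the sketch below tests the two properties a BIOS-bootable sector is conventionally expected to have: a length of 512 bytes and the signature bytes 0x55, 0xAA at offsets 510 and 511. The helper name looks_bootable and the synthetic sector are illustrative assumptions, not part of the course material.

```python
SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # the word 0xaa55 stored little-endian


def looks_bootable(sector: bytes) -> bool:
    """Return True if the sector is 512 bytes long and ends with 55 AA."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# Synthetic stand-in for mbr.bin: 510 zero bytes followed by the signature.
sample = bytes(510) + BOOT_SIGNATURE
print(looks_bootable(sample))      # True
print(looks_bootable(bytes(512)))  # False: signature missing
```

To apply the check to a real dump, one would pass the bytes read from mbr.bin (opened in binary mode) instead of the synthetic sample.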
Experimenting with Boot Sector Code
We can write our own boot sector code. Although we could write the code to a disk's boot sector, we instead run it in an emulated x86 system using QEMU. The following examples are from Nick Blundell.
Compiling and Running Boot Sector Code
For the programs given here, the procedure to compile and run the code is as follows, provided that the program to run is in the file example.asm:
- Compile example.asm. Open a terminal window and run
nasm example.asm -f bin -o example.bin
- Run example.bin on the x86 system emulated by QEMU. In the terminal window, run
qemu-system-i386 -drive format=raw,file=example.bin \
-curses \
-monitor telnet:127.0.0.1:54321,server,nowait
- Open the QEMU monitor. Open another terminal window and run
telnet 127.0.0.1 54321
To quit the QEMU emulator, issue the quit command at the monitor prompt, as in
(qemu) quit
Example 1 Infinite Loop
Following the steps above, we do the following:
- Create boot0.asm. On the Linux system, create the boot0.asm file using nano, vi, or any other editor you have installed.
; boot0.asm
; a do-nothing infinite loop
loop:
jmp loop
times 510-($-$$) db 0
dw 0xaa55
- Compile boot0.asm. Open a terminal window and run
nasm boot0.asm -f bin -o boot0.bin
- Run boot0.bin on the x86 system emulated by QEMU. In the terminal window, run
qemu-system-i386 -drive format=raw,file=boot0.bin \
-curses \
-monitor telnet:127.0.0.1:54321,server,nowait
- Open the QEMU monitor. Open another terminal window and run
telnet 127.0.0.1 54321
To quit the QEMU emulator, issue the quit command at the monitor prompt, as in
(qemu) quit
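The last two lines of boot0.asm fix the layout of the output file: times 510-($-$$) db 0 pads the code with zero bytes up to offset 510, and dw 0xaa55 appends the two-byte boot signature, giving exactly 512 bytes. The sketch below mirrors that arithmetic on the host. The function name build_boot_sector is hypothetical, and the two code bytes EB FE assume that the assembler emits the short form of a jump to its own address.

```python
def build_boot_sector(code: bytes) -> bytes:
    """Pad code with zeros to 510 bytes, then append the word 0xaa55."""
    if len(code) > 510:
        raise ValueError("code does not fit in a 512-byte boot sector")
    padding = bytes(510 - len(code))            # times 510-($-$$) db 0
    signature = (0xAA55).to_bytes(2, "little")  # dw 0xaa55
    return code + padding + signature


# Assumption: "jmp" to its own address assembles to the short form EB FE.
sector = build_boot_sector(b"\xeb\xfe")
print(len(sector))        # 512
print(sector[-2:].hex())  # 55aa
```

Comparing these bytes with a hexedit view of boot0.bin is one way to confirm what the assembler actually produced.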
Example 2 Printing a Message
Similarly, you can run and observe the following code.
; boot1.asm
; print Hello on console
mov ah, 0x0e
mov al, 'H'
int 0x10
mov al, 'e'
int 0x10
mov al, 'l'
int 0x10
mov al, 'l'
int 0x10
mov al, 'o'
int 0x10
jmp $
times 510-($-$$) db 0
dw 0xaa55
Example 3 Printing a Message in a Loop
Now print “Hello” a few times.
; boot2.asm
; print Hello 3 times
mov bx, 3
mov ah, 0x0e
print_hello:
mov al, 'H'
int 0x10
mov al, 'e'
int 0x10
mov al, 'l'
int 0x10
mov al, 'l'
int 0x10
mov al, 'o'
int 0x10
mov al, ' '
int 0x10
dec bx
cmp bx, 0
jne print_hello
jmp $
times 510-($-$$) db 0
dw 0xaa55
Example 4 Printing Message in Memory
mov ah, 0x0e
mov bx, HELLO_MSG
add bx, 0x7c00
LOOP:
mov al, [bx]
cmp al, 0
je DONE_PRINT
int 0x10
inc bx
jmp LOOP
DONE_PRINT:
jmp $
HELLO_MSG :
db 'Hello, World!', 0 ;
times 510-($-$$) db 0
dw 0xaa55
The BIOS of an x86 system loads the boot sector at address 0x7c00. However, the assembler computes the value of the label HELLO_MSG as an offset from the beginning of this code. Using hexedit to view the machine code file boot3.bin, we can see that the offset is 0x16. Since the labeled data is part of the code, its actual address in memory is 0x7c00 + 0x16 = 0x7c16. Following this idea, we compute the address of the message data in the code:
mov bx, HELLO_MSG
add bx, 0x7c00
In the next example, we let the assembler do this computation for us.
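The address arithmetic above can be checked on the host with a short sketch: find the offset of the message inside the sector image, then add the load address. The function name runtime_address is hypothetical, and the sector built here is only a stand-in for boot3.bin, with 0x16 filler bytes in place of the real machine code.

```python
LOAD_ADDRESS = 0x7C00  # where the BIOS places the boot sector


def runtime_address(sector: bytes, data: bytes) -> int:
    """Return the memory address of data once the sector is loaded."""
    offset = sector.find(data)
    if offset < 0:
        raise ValueError("data not found in sector")
    return LOAD_ADDRESS + offset


# Stand-in for boot3.bin: 0x16 filler bytes in place of the real machine
# code, then the message, zero padding, and the boot signature.
message = b"Hello, World!\x00"
body = b"\x90" * 0x16 + message
sector = body + bytes(510 - len(body)) + b"\x55\xaa"

print(hex(sector.find(message)))              # 0x16
print(hex(runtime_address(sector, message)))  # 0x7c16
```

With the real boot3.bin, the same function could be given the file contents read in binary mode.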
Example 5 Printing Message in Memory Revisited
[org 0x7c00]
mov ah, 0x0e
mov bx, HELLO_MSG
LOOP:
mov al, [bx]
cmp al, 0
je DONE_PRINT
int 0x10
inc bx
jmp LOOP
DONE_PRINT:
jmp $
HELLO_MSG :
db 'Hello, World!', 0 ;
times 510-($-$$) db 0
dw 0xaa55
In this example, the directive [org 0x7c00] tells the assembler that the code will be loaded at address 0x7c00, so the assembler adds 0x7c00 to the offset of every label.
Example 6 Printing Message using a Function
We now write a print_msg function. The interface (or prototype) of this function, written in C, is
void print_msg(char *msg);
When we call this function, we pass the message argument via register bx. In fact, when you write code in a high-level programming language and a function has only a few parameters, the compiler often passes the arguments via registers. Likewise, the compiler often keeps local variables in registers.
[org 0x7c00]
mov ax, HELLO_MSG ; 1st variable
mov cx, WORLD_MSG ; 2nd variable
mov bx, ax ; pass argument via bx
call print_msg ; call the function
mov bx, cx ; pass argument via bx
call print_msg ; call the function
mov bx, ax
call print_msg
mov bx, cx
call print_msg
jmp $
; we implement a function with interface
; void print_msg(char* msg)
print_msg:
pusha ; push all registers to stack
mov ah, 0x0e
LOOP:
mov al, [bx]
cmp al, 0
je DONE_PRINT_MSG
int 0x10
inc bx
jmp LOOP
DONE_PRINT_MSG:
popa ; pop all registers from stack
ret
HELLO_MSG :
db 'Hello!', 0 ;
WORLD_MSG :
db 'World!', 0 ;
times 510-($-$$) db 0
dw 0xaa55
The equivalent C code is roughly:
char *hello_msg = "Hello!";
char *world_msg = "World!";
print_msg(hello_msg);
print_msg(world_msg);
print_msg(hello_msg);
print_msg(world_msg);
It is important that we save the registers on the stack when entering the function and restore them before returning to the calling code. In the assembly code, the pusha instruction pushes all general-purpose registers onto the stack, while the popa instruction pops them back off the stack.
What would happen if we remove these two statements? Perhaps, you should run an experiment and see what happens if you remove or comment out these two statements.
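One way to reason about this before running the experiment is a small host-side model of the registers. The sketch below is only a model under stated assumptions: it tracks just ax, bx, and cx (the real pusha and popa cover all general-purpose registers), places the two messages at hypothetical addresses, and uses zero-filled memory, which real RAM does not guarantee. The names make_memory, print_msg, and run are illustrative.

```python
HELLO_MSG = 0x7C30  # hypothetical addresses chosen for this model
WORLD_MSG = 0x7C37


def make_memory() -> bytearray:
    memory = bytearray(0x10000)  # 64 KiB, zero-filled
    memory[HELLO_MSG:HELLO_MSG + 7] = b"Hello!\x00"
    memory[WORLD_MSG:WORLD_MSG + 7] = b"World!\x00"
    return memory


def print_msg(regs: dict, memory: bytearray, save_registers: bool = True) -> str:
    saved = dict(regs) if save_registers else None  # pusha
    printed = []
    regs["ax"] = 0x0E00 | (regs["ax"] & 0x00FF)     # mov ah, 0x0e
    while True:
        al = memory[regs["bx"]]                     # mov al, [bx]
        regs["ax"] = (regs["ax"] & 0xFF00) | al
        if al == 0:                                 # cmp al, 0 / je
            break
        printed.append(chr(al))                     # int 0x10
        regs["bx"] = (regs["bx"] + 1) & 0xFFFF      # inc bx
    if saved is not None:
        regs.update(saved)                          # popa
    return "".join(printed)


def run(save_registers: bool = True) -> list:
    memory = make_memory()
    regs = {"ax": HELLO_MSG, "bx": 0, "cx": WORLD_MSG}
    output = []
    for source in ("ax", "cx", "ax", "cx"):
        regs["bx"] = regs[source]                   # mov bx, ax / mov bx, cx
        output.append(print_msg(regs, memory, save_registers))
    return output


print(run(save_registers=True))  # ['Hello!', 'World!', 'Hello!', 'World!']
```

Calling run(save_registers=False) models the version without pusha and popa. Because the model's memory is zero-filled, treat its result as a hint only and confirm the behavior in QEMU.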