Understanding System Bootstrap Process (1 of 3)

Table of Contents

Experiment and Programming Environment
CPU Emulator
BIOS
Experimenting Boot Sector Code

Experiment and Programming Environment

To give everyone a uniform experience, we use the following experiment environment for this course:

An x86 Debian Linux system running on an Oracle VM VirtualBox virtual machine

On the x86 Debian Linux system, install the required packages with the command:

```
apt-get install -y nasm qemu qemu-system hexedit
```

The above command must be run as root. On Debian, first switch to root with the command:

```
su
```

If you have installed and set up the sudo package, you can instead run the apt-get command through sudo:

```
sudo apt-get install -y nasm qemu qemu-system hexedit
```

CPU Emulator

QEMU is a CPU emulator. To view the specifications of the x86 system it emulates, read its manual page:

```
man qemu-system-i386
```

BIOS

The Basic Input/Output System (BIOS) contains the bootstrap code, the first code the CPU executes when we power on the system. The bootstrap code is responsible for loading the OS. It begins by loading the code in a boot sector, typically the very first 512-byte sector of a boot device such as a hard disk drive. A system may have multiple boot sectors. The first boot sector the BIOS loads is called the Master Boot Sector, or Master Boot Record (MBR); the MBR code in turn loads and executes code in other boot sectors or the operating system kernel. When you install and set up an operating system, the system installer writes the MBR code to the Master Boot Sector.

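We can examine the code in the MBR with the following commands. Reading /dev/sda requires root, and the disk may have a different name on your system; /proc/diskstats lists the disks present: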

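```
cat /proc/diskstats                      # list disks and partitions, e.g. sda
dd if=/dev/sda of=mbr.bin bs=512 count=1 # copy the first 512-byte sector
hexedit mbr.bin                          # view the raw bytes
ndisasm -b16 -o7c00h mbr.bin > mbr.asm   # disassemble as 16-bit code loaded at 0x7c00
vi mbr.asm                               # read the disassembly
```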

For more details, see https://thestarman.pcministry.com/asm/mbr/GRUB.htm.
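A BIOS typically treats a sector as bootable only if its last two bytes, at offsets 510 and 511, are 0x55 and 0xAA, so the MBR copied into mbr.bin should end with them. As a cross-check on what hexedit shows, here is a minimal C sketch that reads the first 512 bytes of a file and tests for that signature. The helper names read_sector and has_boot_signature are our own, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SECTOR_SIZE 512

/* Read the first SECTOR_SIZE bytes of the file at path (e.g. "mbr.bin")
   into sector. Returns 0 on success, -1 if the file cannot be opened
   or is shorter than one sector. */
int read_sector(const char *path, unsigned char sector[SECTOR_SIZE])
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(sector, 1, SECTOR_SIZE, f);
    fclose(f);
    return n == SECTOR_SIZE ? 0 : -1;
}

/* A boot sector ends with the bytes 0x55, 0xAA: the 16-bit word 0xaa55
   stored low byte first, since x86 is little-endian. */
int has_boot_signature(const unsigned char sector[SECTOR_SIZE])
{
    assert(sector != NULL);
    return sector[510] == 0x55 && sector[511] == 0xAA;
}
```

After the dd step, read_sector("mbr.bin", buf) followed by has_boot_signature(buf) should report a signature on a disk with a BIOS-bootable MBR.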

Experimenting Boot Sector Code

We can write our own boot sector code. Although we could write the code to a disk's boot sector, we instead run it on an x86 system emulated by QEMU. The following examples are from Nick Blundell.

Compiling and Running Boot Sector Code

For the programs given here, the procedure to compile and run the code is as follows, assuming the program is in the file example.asm:

Compile example.asm. Open a terminal window and run:
```
nasm example.asm -f bin -o example.bin
```

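Run example.bin on the x86 system emulated by QEMU. In the terminal window, run the command below. Here -drive attaches example.bin as a raw disk image, -curses draws the emulated text-mode screen in the terminal, and -monitor serves the QEMU monitor over telnet on port 54321: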

```
qemu-system-i386 -drive format=raw,file=example.bin \
 -curses \
 -monitor telnet:127.0.0.1:54321,server,nowait
```

Open the QEMU monitor. In another terminal window, run:
```
telnet 127.0.0.1 54321 
```
To quit the QEMU emulator, issue the quit command at the monitor prompt:
```
(qemu) quit
```

Example 1 Infinite Loop

Following the steps above:

Create boot0.asm. On the Linux system, create the boot0.asm file using nano, vi, or any other editor you have installed.
```
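; boot0.asm
; a do-nothing infinite loop
loop:
 jmp  loop              ; jump back to this same instruction forever
times 510-($-$$) db 0   ; pad with zeros up to byte 510
dw 0xaa55               ; boot signature, stored as bytes 0x55, 0xAA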
```
Compile boot0.asm. Open a terminal window and run:
```
nasm boot0.asm -f bin -o boot0.bin
```

Run boot0.bin on the x86 system emulated by QEMU. In the terminal window, run:

```
qemu-system-i386 -drive format=raw,file=boot0.bin \
 -curses \
 -monitor telnet:127.0.0.1:54321,server,nowait
```

Open the QEMU monitor. In another terminal window, run:
```
telnet 127.0.0.1 54321 
```
To quit the QEMU emulator, issue the quit command at the monitor prompt:
```
(qemu) quit
```
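The last two lines of boot0.asm turn the code into a valid boot sector. In NASM, $ is the address at the start of the current line and $$ is the start of the section, so $-$$ is the number of bytes emitted so far; times 510-($-$$) db 0 pads with zeros up to byte 510, and dw 0xaa55 stores the signature as the bytes 0x55, 0xAA. The C sketch below mimics that layout. build_boot_sector is a hypothetical helper, and the two code bytes 0xEB, 0xFE used in the example are the encoding of a short jump to itself, which is what jmp loop assembles to here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Lay out a boot sector the way boot0.asm does:
   code bytes, zero padding up to offset 510, then 0x55 0xAA.
   Returns 0 on success, -1 if the code does not fit in 510 bytes
   (NASM reports an error for a negative times count in that case). */
int build_boot_sector(const unsigned char *code, size_t len,
                      unsigned char image[SECTOR_SIZE])
{
    assert(code != NULL && image != NULL);
    if (len > SECTOR_SIZE - 2)
        return -1;
    memset(image, 0, SECTOR_SIZE);  /* times 510-($-$$) db 0 */
    memcpy(image, code, len);       /* the instructions themselves */
    image[510] = 0x55;              /* dw 0xaa55, low byte first */
    image[511] = 0xAA;
    return 0;
}
```

Comparing the result byte by byte with boot0.bin in hexedit is a quick way to confirm the layout.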

Example 2 Printing a Message

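Similarly, compile, run, and observe the following code. BIOS interrupt 0x10 with AH = 0x0e (the teletype output function) prints the character in AL at the cursor position and advances the cursor, and jmp $ jumps to itself, looping forever.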

```
; boot1.asm
; print Hello on console
mov ah, 0x0e    ; select the BIOS teletype output function
mov al, 'H'
int 0x10
mov al, 'e'
int 0x10
mov al, 'l'
int 0x10
mov al, 'l'
int 0x10
mov al, 'o'
int 0x10
jmp $           ; loop forever

times 510-($-$$) db 0
dw 0xaa55
```
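To connect boot1.asm with the bytes you see when opening boot1.bin in hexedit, here is a C sketch that hand-assembles the same pattern for any short string. It relies on these standard 16-bit encodings: mov ah, imm8 is B4 ib; mov al, imm8 is B0 ib; int 0x10 is CD 10; and jmp $ is EB FE. The function name emit_print_code is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512

/* Produce the machine code boot1.asm would contain for msg:
   mov ah, 0x0e; then mov al, c / int 0x10 per character; then jmp $.
   The image is zero-padded and signed like a boot sector.
   Returns the number of code bytes, or -1 if msg is too long. */
int emit_print_code(const char *msg, unsigned char image[SECTOR_SIZE])
{
    assert(msg != NULL && image != NULL);
    size_t n = strlen(msg);
    if (2 + 4 * n + 2 > SECTOR_SIZE - 2)
        return -1;

    memset(image, 0, SECTOR_SIZE);
    size_t pos = 0;
    image[pos++] = 0xB4; image[pos++] = 0x0E;      /* mov ah, 0x0e */
    for (size_t i = 0; i < n; i++) {
        image[pos++] = 0xB0;                       /* mov al, imm8 */
        image[pos++] = (unsigned char)msg[i];
        image[pos++] = 0xCD; image[pos++] = 0x10;  /* int 0x10 */
    }
    image[pos++] = 0xEB; image[pos++] = 0xFE;      /* jmp $ */
    image[510] = 0x55; image[511] = 0xAA;          /* boot signature */
    return (int)pos;
}
```

For "Hello" the code occupies 2 + 5 × 4 + 2 = 24 bytes, so those should be the first 24 bytes of boot1.bin.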

Example 3 Printing a Message in a Loop

Now print “Hello” three times, using register BX as a loop counter.

```
; boot2.asm
; print Hello 3 times
mov bx, 3
mov ah, 0x0e

print_hello:
    mov al, 'H'
    int 0x10
    mov al, 'e'
    int 0x10
    mov al, 'l'
    int 0x10
    mov al, 'l'
    int 0x10
    mov al, 'o'
    int 0x10
    mov al, ' '
    int 0x10
    dec bx
    cmp bx, 0
    jne print_hello

jmp $

times 510-($-$$) db 0
dw 0xaa55
```
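The loop in boot2.asm is a do-while loop: the body runs before BX is tested, so the message is printed once per count, and a starting value of 0 would wrap around to 0xffff instead of printing nothing. The C sketch below models it. tty_putc and tty_clear are hypothetical stand-ins for int 0x10 that append to a buffer so the output can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the screen: int 0x10 with AH = 0x0e appends AL here. */
static char screen[64];
static size_t cursor;

void tty_clear(void)
{
    memset(screen, 0, sizeof screen);
    cursor = 0;
}

/* Models one mov al, c / int 0x10 pair. */
static void tty_putc(char al)
{
    assert(cursor + 1 < sizeof screen);  /* leave room for the NUL */
    screen[cursor++] = al;
    screen[cursor] = '\0';
}

/* Mirrors boot2.asm: the body runs first, then dec/cmp/jne decide
   whether to repeat, i.e. a do-while loop on a 16-bit counter. */
void print_hello_times(uint16_t bx)
{
    const char *word = "Hello ";
    do {
        for (const char *p = word; *p != '\0'; p++)
            tty_putc(*p);
        bx--;                  /* dec bx: 0 would wrap to 0xffff */
    } while (bx != 0);         /* cmp bx, 0 / jne print_hello */
}
```

Calling tty_clear() and then print_hello_times(3) leaves "Hello Hello Hello " in screen, matching what boot2.bin shows in QEMU.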

Example 4 Printing a Message Stored in Memory

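```
; boot3.asm
; print a 0-terminated message stored after the code
mov ah, 0x0e
mov bx, HELLO_MSG   ; offset of the message within this file
add bx, 0x7c00      ; plus the address where the BIOS loads the code

LOOP:
    mov al, [bx]    ; fetch the next character
    cmp al, 0       ; stop at the terminating 0
    je DONE_PRINT
    int 0x10
    inc bx
    jmp LOOP

DONE_PRINT:
    jmp $

HELLO_MSG:
    db 'Hello, World!', 0

times 510-($-$$) db 0
dw 0xaa55
```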

The BIOS of an x86 system loads the boot sector at memory address 0x7c00. However, the assembler (NASM) computes the value of the label HELLO_MSG as an offset from the beginning of this code, that is, from the start of the file. Viewing the machine code file boot3.bin with hexedit, we can see that this offset is 0x16. Since the message is loaded into memory together with the code, its actual address is 0x7c00 + 0x16 = 0x7c16. Following this idea, we compute the address of the message data in the code:

```
mov bx, HELLO_MSG
add bx, 0x7c00
```
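The hexedit inspection and the address arithmetic above can be sketched in C: search the assembled image for the message bytes to get its file offset, then add the load address. find_bytes, runtime_address, and LOAD_ADDRESS are names made up for this illustration; for boot3.bin the search should report 0x16, giving 0x7c16.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOAD_ADDRESS 0x7c00u  /* where the BIOS places the boot sector */

/* Return the offset of the first occurrence of needle (n bytes) in
   image (len bytes), or -1 if absent, like scanning the file in hexedit. */
long find_bytes(const unsigned char *image, size_t len,
                const void *needle, size_t n)
{
    assert(image != NULL && needle != NULL);
    for (size_t i = 0; n <= len && i <= len - n; i++)
        if (memcmp(image + i, needle, n) == 0)
            return (long)i;
    return -1;
}

/* Run-time address of data at a given file offset: the same sum that
   mov bx, HELLO_MSG followed by add bx, 0x7c00 computes. */
unsigned runtime_address(unsigned long offset)
{
    return LOAD_ADDRESS + (unsigned)offset;
}
```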

In the next example, we let the assembler do this computation for us.

Example 5 Printing a Message Stored in Memory Revisited

```
[org 0x7c00]        ; label values now include the load address

mov ah, 0x0e
mov bx, HELLO_MSG

LOOP:
    mov al, [bx]
    cmp al, 0
    je DONE_PRINT
    int 0x10
    inc bx
    jmp LOOP

DONE_PRINT:
    jmp $

HELLO_MSG:
    db 'Hello, World!', 0

times 510-($-$$) db 0
dw 0xaa55
```

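In this example, the directive [org 0x7c00] tells the assembler that the code will be loaded at address 0x7c00, so the assembler adds 0x7c00 to the offset of every label. As a result, mov bx, HELLO_MSG loads the actual memory address of the message, and the add instruction is no longer needed.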
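To see why the origin matters, the C sketch below models the 64 KiB that a 16-bit BX can address as an array, copies a boot image to 0x7c00 as the BIOS does, and reads memory through two candidate label values: the raw file offset (what a label means without org) and the offset plus 0x7c00 (what it means with [org 0x7c00]). It assumes DS = 0, so [bx] refers to address BX; the array and function names are hypothetical. In the model, low memory is zero, whereas on a real machine low addresses belong to the real-mode interrupt vector table rather than to your message.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MEM_SIZE  0x10000u  /* 64 KiB reachable through BX with DS = 0 */
#define LOAD_ADDR 0x7c00u   /* where the BIOS copies the boot sector */

static unsigned char memory[MEM_SIZE];

/* Copy a boot image into the modelled memory, as the BIOS does. */
void bios_load(const unsigned char *image, size_t len)
{
    assert(LOAD_ADDR + len <= MEM_SIZE);
    memcpy(memory + LOAD_ADDR, image, len);
}

/* Model of mov al, [bx] with DS = 0. */
unsigned char read_byte(unsigned bx)
{
    assert(bx < MEM_SIZE);
    return memory[bx];
}

/* The value the assembler gives a label: its file offset plus org. */
unsigned label_value(unsigned offset, unsigned org)
{
    return org + offset;
}
```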

Example 6 Printing a Message Using a Function

We now write a print_msg function. In C, the interface (or prototype) of this function would be:

```
void print_msg(char *msg);
```

When we call this function, we pass the message argument, the address of the string, in register BX. In fact, when you write code in a high-level programming language and a function has only a few parameters, the compiler often passes the arguments in registers. Likewise, the compiler often keeps local variables in registers.

```
[org 0x7c00]
mov ax, HELLO_MSG ; 1st variable: address of the first message
mov cx, WORLD_MSG ; 2nd variable: address of the second message

mov bx, ax        ; pass argument via bx
call print_msg    ; call the function
mov bx, cx        ; pass argument via bx
call print_msg    ; call the function

mov bx, ax
call print_msg
mov bx, cx
call print_msg

jmp $

; we implement a function with interface
;     void print_msg(char* msg)
print_msg:
    pusha           ; save all general-purpose registers on the stack
    mov ah, 0x0e
LOOP:
    mov al, [bx]
    cmp al, 0
    je DONE_PRINT_MSG
    int 0x10
    inc bx
    jmp LOOP
DONE_PRINT_MSG:
    popa            ; restore all general-purpose registers from the stack
    ret

HELLO_MSG:
    db 'Hello!', 0

WORLD_MSG:
    db 'World!', 0

times 510-($-$$) db 0
dw 0xaa55
```

The equivalent C code is roughly:

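```
void print_msg(char *msg);  /* the assembly function above */

char *hello_msg = "Hello!";
char *world_msg = "World!";

void boot_main(void)        /* hypothetical name for the boot code */
{
    print_msg(hello_msg);
    print_msg(world_msg);
    print_msg(hello_msg);
    print_msg(world_msg);
    for (;;) ;              /* jmp $ */
}
```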

It is important that we save the registers on the stack when entering the function and restore them before returning to the calling code. In the assembly code, the pusha instruction pushes all general-purpose registers onto the stack, and popa pops them back off.

What would happen if we removed these two instructions? Run an experiment: remove or comment out these two instructions and see what happens.
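The C sketch below can help you reason about that experiment. It models the three registers involved as fields of a hypothetical Regs struct and memory as a flat 64 KiB array (DS = 0 assumed); print_msg follows the assembly routine line by line, and its save flag stands in for the pusha/popa pair. All names are ours, and the model assumes int 0x10 leaves the registers unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the registers print_msg and its caller use. */
typedef struct { uint16_t ax, bx, cx; } Regs;

static uint8_t memory[0x10000];  /* flat 64 KiB, DS = 0 assumed */
static char screen[128];         /* text printed by the int 0x10 model */
static size_t cursor;

/* Place a 0-terminated string at addr, like db '...', 0. */
void poke_string(uint16_t addr, const char *s)
{
    size_t n = strlen(s) + 1;
    assert((size_t)addr + n <= sizeof memory);
    memcpy(memory + addr, s, n);
}

void screen_reset(void)
{
    cursor = 0;
    screen[0] = '\0';
}

/* Models print_msg; save != 0 plays the role of pusha ... popa. */
void print_msg(Regs *r, int save)
{
    Regs saved = *r;                                /* pusha */
    r->ax = (uint16_t)((r->ax & 0x00FF) | 0x0E00);  /* mov ah, 0x0e */
    for (;;) {
        uint8_t al = memory[r->bx];                 /* mov al, [bx] */
        r->ax = (uint16_t)((r->ax & 0xFF00) | al);
        if (al == 0)                                /* cmp al, 0 / je */
            break;
        if (cursor + 1 < sizeof screen) {           /* int 0x10 */
            screen[cursor++] = (char)al;
            screen[cursor] = '\0';
        }
        r->bx++;                                    /* inc bx */
    }
    if (save)
        *r = saved;                                 /* popa */
}
```

In this model, with save set to 0, AX comes back as 0x0e00 (AH = 0x0e, AL = the terminating 0) rather than the address of HELLO_MSG, so the caller's next mov bx, ax no longer points at the message.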