Guest Page Table

The next step is to map the guest memory space to run some code in the guest.

Guest-physical vs. Host-physical addresses

With hardware-assisted virtualization enabled, a RISC-V CPU (like other modern CPUs) has four address spaces:

  • Guest-virtual address (new)
  • Guest-physical address (new)
  • Host-virtual address
  • Host-physical address (so-called physical memory space)

In host mode, memory addresses are translated from host-virtual to host-physical addresses using the host's page table.

In guest mode, however, address translation happens in two stages:

  • Stage 1: Guest-virtual → guest-physical (the guest's own page table, which the guest sets through satp; the hardware keeps it in vsatp)
  • Stage 2: Guest-physical → host-physical (guest page table in hgatp)

This isolation is what allows multiple VMs to run simultaneously - each guest thinks it has exclusive access to physical memory, but the hypervisor secretly maps them to different host memory regions.
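
To make the two stages concrete, here is a toy model that runs on an ordinary host with std, not inside the hypervisor. It's only a sketch: ToyTable, two_stage, and every address in it are made up for illustration, and each multi-level page table is flattened into a simple lookup map.

```rust
use std::collections::HashMap;

const PAGE_SIZE: u64 = 4096;

/// A toy "page table": source page number -> destination page number.
/// Real hardware walks multi-level tables; this only models the result.
struct ToyTable(HashMap<u64, u64>);

impl ToyTable {
    /// Builds a table from (from_addr, to_addr) pairs of page-aligned addresses.
    fn new(mappings: &[(u64, u64)]) -> Self {
        Self(
            mappings
                .iter()
                .map(|&(from, to)| (from / PAGE_SIZE, to / PAGE_SIZE))
                .collect(),
        )
    }

    /// Translates an address, keeping the offset within the page.
    /// Returns None where real hardware would raise a page fault.
    fn translate(&self, addr: u64) -> Option<u64> {
        let page = self.0.get(&(addr / PAGE_SIZE))?;
        Some(page * PAGE_SIZE + addr % PAGE_SIZE)
    }
}

/// Stage 1 (guest-virtual -> guest-physical) followed by
/// stage 2 (guest-physical -> host-physical).
fn two_stage(stage1: &ToyTable, stage2: &ToyTable, guest_vaddr: u64) -> Option<u64> {
    let guest_paddr = stage1.translate(guest_vaddr)?;
    stage2.translate(guest_paddr)
}
```

If two VMs use the same stage 1 mapping but different stage 2 tables, the same guest-virtual address ends up at different host-physical addresses, which is exactly the isolation described above.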

Page allocator

A page allocator is a memory allocator that hands out memory in pages, where each page is a fixed-size memory region. In most cases, a page is 4KiB (4096 bytes).

TIP

RISC-V and other modern CPUs also support larger page sizes, such as 2MiB and 1GiB.

In this book, we'll use the global allocator for simplicity. Build a Layout describing the required length and alignment, and pass it to the allocator:

src/allocator.rs
rust
pub fn alloc_pages(len: usize) -> *mut u8 {
    debug_assert!(len % 4096 == 0, "len must be a multiple of 4096");
    let layout = Layout::from_size_align(len, 4096).unwrap();
    unsafe { GLOBAL_ALLOCATOR.alloc_zeroed(layout) as *mut u8 }
}
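
Since alloc_pages expects a length that is a multiple of 4096, a caller with an arbitrary byte count has to round up first. Here's a small sketch of that arithmetic that you can try on your host. PAGE_SIZE, align_up_to_page, pages_needed, and page_layout are hypothetical helpers, not part of the book's allocator:

```rust
use core::alloc::Layout;

const PAGE_SIZE: usize = 4096;

/// Rounds `len` up to the next multiple of the page size.
/// Works because PAGE_SIZE is a power of two.
const fn align_up_to_page(len: usize) -> usize {
    (len + PAGE_SIZE - 1) & !(PAGE_SIZE - 1)
}

/// Number of 4KiB pages needed to hold `len` bytes.
const fn pages_needed(len: usize) -> usize {
    align_up_to_page(len) / PAGE_SIZE
}

/// Builds a page-aligned layout that is large enough for `len` bytes.
fn page_layout(len: usize) -> Layout {
    Layout::from_size_align(align_up_to_page(len), PAGE_SIZE).unwrap()
}
```

For example, a 4-byte request rounds up to one full page, and 4097 bytes need two pages.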

Building a guest page table

Let's build a guest page table. While it's a separate data structure, it's mostly identical to the user-mode page table we built in OS in 1,000 Lines:

src/guest_page_table.rs
rust
use core::mem::size_of;

pub const PTE_R: u64 = 1 << 1; /* Readable */
pub const PTE_W: u64 = 1 << 2; /* Writable */
pub const PTE_X: u64 = 1 << 3; /* Executable */
const PTE_V: u64 = 1 << 0; /* Valid */
const PTE_U: u64 = 1 << 4; /* User */
const PPN_SHIFT: usize = 12;
const PTE_PPN_SHIFT: usize = 10;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(transparent)]
struct Entry(u64);

impl Entry {
    pub fn new(paddr: u64, flags: u64) -> Self {
        let ppn = (paddr as u64) >> PPN_SHIFT;
        Self(ppn << PTE_PPN_SHIFT | flags)
    }

    pub fn is_valid(&self) -> bool {
        self.0 & PTE_V != 0
    }

    pub fn paddr(&self) -> u64 {
        (self.0 >> PTE_PPN_SHIFT) << PPN_SHIFT
    }
}

#[repr(transparent)]
struct Table([Entry; 512]);

impl Table {
    pub fn alloc() -> *mut Table {
        crate::allocator::alloc_pages(size_of::<Table>()) as *mut Table
    }

    pub fn entry_by_addr(&mut self, guest_paddr: u64, level: usize) -> &mut Entry {
        let index = (guest_paddr >> (12 + 9 * level)) & 0x1ff; // extract 9-bits index
        &mut self.0[index as usize]
    }
}

pub struct GuestPageTable {
    table: *mut Table,
}

impl GuestPageTable {
    pub fn new() -> Self {
        Self {
            table: Table::alloc(),
        }
    }

    pub fn hgatp(&self) -> u64 {
        // MODE = 9 selects Sv48x4; the low bits hold the root table's PPN.
        (9u64 << 60) | (self.table as u64 >> PPN_SHIFT)
    }

    pub fn map(&mut self, guest_paddr: u64, host_paddr: u64, flags: u64) {
        let mut table = unsafe { &mut *self.table };
        for level in (1..=3).rev() { // level = 3, 2, 1
            let entry = table.entry_by_addr(guest_paddr, level);
            if !entry.is_valid() {
                let new_table_ptr = Table::alloc();
                *entry = Entry::new(new_table_ptr as u64, PTE_V);
            }

            table = unsafe { &mut *(entry.paddr() as *mut Table) };
        }

        let entry = table.entry_by_addr(guest_paddr, 0);
        println!("map: {:08x} -> {:08x}", guest_paddr, host_paddr);
        assert!(!entry.is_valid(), "already mapped");
        *entry = Entry::new(host_paddr, flags | PTE_V | PTE_U);
    }
}

  • Entry represents a page table entry. It holds flag bits plus a physical page number (PPN) in host memory, which points to either the next-level table or the mapped page.
  • Table represents a page table at one level. It has 512 entries of Entry (8 bytes each), that is 512 * 8 = 4096 bytes. It fits into a 4KiB page nicely!
  • GuestPageTable is a guest page table. It contains the pointer to the top-level page table.
  • GuestPageTable::map maps a guest-physical address to a host-physical address. It walks the page table levels from top to bottom, allocates a new intermediate table wherever one is missing, and finally writes the leaf entry.
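
To check the bit arithmetic by hand, the helpers below restate entry_by_addr, Entry::new, Entry::paddr, and hgatp as standalone functions that run on your host. This is just a sketch: the function names and the sample addresses are assumptions for illustration, not code from the hypervisor.

```rust
const PTE_V: u64 = 1 << 0;
const PTE_R: u64 = 1 << 1;
const PTE_W: u64 = 1 << 2;
const PTE_X: u64 = 1 << 3;
const PTE_U: u64 = 1 << 4;

/// 9-bit table index used at `level` (3 = top, 0 = leaf), as in `entry_by_addr`.
fn table_index(guest_paddr: u64, level: usize) -> usize {
    ((guest_paddr >> (12 + 9 * level)) & 0x1ff) as usize
}

/// Encodes a PTE like `Entry::new`: PPN from bit 10 upwards, flags below it.
fn encode_pte(paddr: u64, flags: u64) -> u64 {
    ((paddr >> 12) << 10) | flags
}

/// Recovers the physical address from a PTE, like `Entry::paddr`.
fn pte_paddr(pte: u64) -> u64 {
    (pte >> 10) << 12
}

/// Builds an hgatp value like `GuestPageTable::hgatp`: MODE = 9 (Sv48x4)
/// in the top bits, root table PPN in the low bits.
fn hgatp_value(root_paddr: u64) -> u64 {
    (9u64 << 60) | (root_paddr >> 12)
}
```

For the guest-physical address 0x100000, the indices at levels 3, 2, 1, and 0 are 0, 0, 0, and 256. Mapping it to the host page 0x80305000 with all of V, R, W, X, and U set gives the leaf entry 0x200c141f.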

Loading a guest kernel

We're ready to fill the guest memory space! But what should we put in the guest memory? Let's start with a minimal boot code which just infinitely loops:

guest.S
asm
.section .text
.global guest_boot
guest_boot:
    j guest_boot

Since building the guest code with Rust is a bit tricky, we'll use Clang and LLVM instead:

run.sh
sh
#!/bin/sh
set -ev

# macOS users need to install "llvm" in Homebrew:
export PATH="$(brew --prefix llvm)/bin:$PATH"

clang \
    -Wall -Wextra --target=riscv64-unknown-elf -ffreestanding -nostdlib \
    -Wl,-eguest_boot -Wl,-Ttext=0x100000 -Wl,-Map=guest.map \
    guest.S -o guest.elf

llvm-objcopy -O binary guest.elf guest.bin

RUSTFLAGS="-C link-arg=-Thypervisor.ld -C linker=rust-lld" \
  cargo build --bin hypervisor --target riscv64gc-unknown-none-elf

llvm-objcopy converts the ELF file to a raw binary file (guest.bin) that can be loaded into the guest memory as-is. Load it using include_bytes! and map it into the guest page table:

src/main.rs
rust
use crate::{
    allocator::alloc_pages,
    guest_page_table::{GuestPageTable, PTE_R, PTE_W, PTE_X},
};

fn main() -> ! {
    /* ... */

    let kernel_image = include_bytes!("../guest.bin");
    let guest_entry = 0x100000;

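    // Copy guest kernel to a guest memory buffer. alloc_pages expects a
    // multiple of 4096 bytes, so round the image size up to whole pages.
    let kernel_memory = alloc_pages((kernel_image.len() + 4095) & !4095);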
    unsafe {
        let dst = kernel_memory as *mut u8;
        let src = kernel_image.as_ptr();
        core::ptr::copy_nonoverlapping(src, dst, kernel_image.len());
    }

    // Map the guest memory into the guest page table.
    let mut table = GuestPageTable::new();
    table.map(guest_entry, kernel_memory as u64, PTE_R | PTE_W | PTE_X);

    let mut hstatus = 0;
    hstatus |= 2u64 << 32; // VSXL: XLEN for VS-mode (64-bit)
    hstatus |= 1u64 << 7; // SPV: Supervisor Previous Virtualization mode (1 = sret enters the guest, V=1)

    unsafe {
        asm!(
            "csrw hstatus, {hstatus}",
            "csrw hgatp, {hgatp}",
            "csrw sepc, {sepc}",
            "sret",
            hstatus = in(reg) hstatus,
            hgatp = in(reg) table.hgatp(),
            sepc = in(reg) guest_entry,
        );
    }

    // sret jumps into the guest, so execution never reaches this point.
    unreachable!();
}
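
The single map call above covers one 4KiB page, which is plenty for this tiny guest. A larger image would need one map call per page. Here's a rough sketch of how the address pairs could be generated, assuming the image is copied contiguously into host memory. page_mappings is a hypothetical helper, not something the hypervisor defines:

```rust
const PAGE_SIZE: u64 = 4096;

/// Yields one (guest_paddr, host_paddr) pair per 4KiB page of an image that
/// is `image_len` bytes long and stored contiguously starting at `host_base`.
fn page_mappings(
    guest_base: u64,
    host_base: u64,
    image_len: u64,
) -> impl Iterator<Item = (u64, u64)> {
    // Round up so a partially filled last page is still mapped.
    let num_pages = (image_len + PAGE_SIZE - 1) / PAGE_SIZE;
    (0..num_pages).map(move |i| (guest_base + i * PAGE_SIZE, host_base + i * PAGE_SIZE))
}

// In the hypervisor, each pair would be passed to GuestPageTable::map, e.g.:
//
//     for (guest_paddr, host_paddr) in page_mappings(guest_entry, base, len) {
//         table.map(guest_paddr, host_paddr, PTE_R | PTE_W | PTE_X);
//     }
```

With the 4-byte guest.bin from this chapter, this yields exactly one pair, matching the single map call in main.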

Let's try it:

$ ./run.sh
Booting hypervisor...
map: 00100000 -> 80305000

It doesn't cause a guest page fault anymore! However, it's unclear if it's working because it doesn't print anything.

A quick way to check if it's working is to use the info registers command in the QEMU monitor:

QEMU 10.0.0 monitor - type 'help' for more information
(qemu) info registers

CPU#0
 V      =   1
 pc       0000000000100000

As you can see, the program counter is stuck at 0x100000 in the guest mode (V=1). Yay!