
Thursday, April 09, 2026

QRV on Real Hardware: First Boot on SiFive Unmatched

Two evening sessions, about four to five hours each. At the end of the second one, QRV was booting on a real RISC-V board — a SiFive HiFive Unmatched with the U740 SoC, 16 GiB of RAM — getting through the full startup sequence, loading taskman, resolving 5,717 symbols, and running deep into process manager initialization before hitting a crash. 


The Gap Between QEMU and Real Hardware

QEMU is a comfortable environment. Hardware addresses are fixed and documented. The memory layout is predictable. The MMU behaves helpfully — for example, it manages the Accessed and Dirty bits in page table entries automatically, so you never notice if you forgot to set them. Real hardware does not do any of those things.

The SiFive U740 has its own ideas. It took a cluster of fixes to get QRV past the point where QEMU masks hardware requirements.

PTE Accessed/Dirty/Global bits

The U74 core uses trap-based A/D bit management. If a leaf PTE doesn't have the Accessed (PTE_A) bit set, or is written without the Dirty (PTE_D) bit set, the CPU raises a page fault and expects the OS to set the bit in the fault handler. QEMU's emulated MMU updates these bits automatically, so this was invisible until the Unmatched.

The fix: pre-set PTE_A, PTE_D, and PTE_G (global) on all kernel leaf PTEs at construction time. Also fix split_megapage(): when a 2MB megapage is split into 512 4KB pages, the original flags mask only preserved V/R/W/X/U — stripping A/D/G from every resulting leaf PTE, causing page faults on any access through the split region after MMU enable.

FreeBSD source code helped greatly here. 
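The flag handling described above can be sketched as follows. This is a minimal illustration, not QRV's code: the bit positions come from the RISC-V privileged specification, while make_leaf_pte(), split_megapage_sketch() and the two mask names are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdint.h>

/* Sv39 PTE flag bits (RISC-V privileged specification). */
#define PTE_V (1ULL << 0)
#define PTE_R (1ULL << 1)
#define PTE_W (1ULL << 2)
#define PTE_X (1ULL << 3)
#define PTE_U (1ULL << 4)
#define PTE_G (1ULL << 5)
#define PTE_A (1ULL << 6)
#define PTE_D (1ULL << 7)

/* Hypothetical names: the old mask lost A/D/G, the new one keeps them. */
#define PTE_FLAGS_NARROW (PTE_V | PTE_R | PTE_W | PTE_X | PTE_U)
#define PTE_FLAGS_FULL   (PTE_FLAGS_NARROW | PTE_G | PTE_A | PTE_D)

#define PTE_PPN_SHIFT 10
#define PTE_PPN_MASK  ((1ULL << 44) - 1)

/* Build a leaf PTE from a 4 KiB-aligned physical address and flag bits. */
static inline uint64_t make_leaf_pte(uint64_t pa, uint64_t flags)
{
    assert((pa & 0xfffULL) == 0);
    return ((pa >> 12) << PTE_PPN_SHIFT) | flags;
}

/* Split one 2 MiB megapage PTE into 512 4 KiB leaf PTEs, keeping A/D/G. */
static void split_megapage_sketch(uint64_t mega_pte, uint64_t out[512])
{
    uint64_t flags = mega_pte & PTE_FLAGS_FULL;
    uint64_t base_pa = ((mega_pte >> PTE_PPN_SHIFT) & PTE_PPN_MASK) << 12;

    for (unsigned i = 0; i < 512; i++)
        out[i] = make_leaf_pte(base_pa + ((uint64_t)i << 12), flags);
}
```

With PTE_FLAGS_NARROW in place of PTE_FLAGS_FULL, every resulting leaf would lose A, D and G, which is the failure mode described for the Unmatched.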

Boot stack stomped by heap allocator

On QEMU, the initrd (the CPIO module package) is loaded immediately after the kernel binary. This means alloc_ram for the module package naturally claims the region right after _end, which happens to include the startup stack. The stack survives because it's claimed before anything else touches it.

On the Unmatched, U-Boot loads the initrd far away in high memory. The region after _end is left "free", and calloc_ram for the boot heap happily allocated it — zeroing 16 KB of live stack in the process, producing a clean hang with no output.

Fix: explicitly include the boot stack region (set up in head.S at _end + 0x4000) in the kernel memory claim before the heap allocator runs.
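A rough sketch of why the claim has to cover the stack, assuming a single half-open region and a trivial placement check. The struct and both function names are hypothetical; only the 0x4000 stack size comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define BOOT_STACK_SIZE 0x4000ULL /* 16 KB above _end, per head.S in the post */

/* Half-open physical range [start, end). */
struct region {
    uint64_t start;
    uint64_t end;
};

/* Kernel claim that covers the image and the boot stack sitting above it. */
static struct region kernel_claim_with_stack(uint64_t image_start,
                                             uint64_t image_end)
{
    assert(image_start < image_end);
    struct region r = { image_start, image_end + BOOT_STACK_SIZE };
    return r;
}

/* Return `want` if [want, want + size) avoids the claim, else the claim end. */
static uint64_t place_outside(struct region claim, uint64_t want, uint64_t size)
{
    int overlaps = want < claim.end && want + size > claim.start;
    return overlaps ? claim.end : want;
}
```

If the claim ends at _end, an allocation requested right after the image is granted in place and lands on the live stack; with the extended claim it is pushed past the stack.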

FDT-based device discovery

The old code had QEMU's virt machine addresses hardcoded: UART0_BASE, CLINT, PLIC, KERNBASE. None of these match the Unmatched's memory map. The fix was already partially done for drivers in v0.20; this sprint extended it to the startup code itself.

board.c now does a second FDT pass to discover PLIC, CLINT, UART, RTC, and RAM base from the device tree, using a two-phase accumulation to handle arbitrary FDT property ordering. A compat_match() helper scans null-separated compatible string lists — necessary for the Unmatched, where "sifive,plic-1.0.0" is the second string in the PLIC's compatible list and a naive prefix check would miss it.
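A helper of this kind might look roughly like the sketch below. The signature is an assumption (QRV's actual compat_match() may differ); the point is that each NUL-terminated entry of the property value is compared in full, not just the start of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `want` equals any entry of a "compatible" property value:
 * `len` bytes holding one or more NUL-terminated strings back to back.
 */
static int compat_match_sketch(const char *list, size_t len, const char *want)
{
    assert(list != NULL && want != NULL);

    size_t wlen = strlen(want);
    size_t off = 0;

    while (off < len) {
        /* Bounded scan, in case the last entry lacks its terminator. */
        const char *entry = list + off;
        const char *nul = memchr(entry, '\0', len - off);
        size_t n = nul ? (size_t)(nul - entry) : len - off;

        if (n == wlen && memcmp(entry, want, n) == 0)
            return 1;
        off += n + 1;
    }
    return 0;
}
```

For a list whose second entry is "sifive,plic-1.0.0", comparing only the start of the property value fails, while the scan above finds the match. The first entry used in any test of this sketch is just an illustrative placeholder.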

The include/platform/qemu_virt.h header — and everything that included it — is gone. Addresses are now runtime, not compile-time.

Two additional MMU fixes: an sfence.vma before csrw satp (matching FreeBSD's sequence), and an identity mapping of the FDT region if U-Boot placed it outside normal RAM (the Unmatched puts it just below 4 GiB).
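The fence-then-switch ordering can be sketched like this. The satp layout (MODE in bits 63..60, ASID in 59..44, PPN in 43..0, Sv39 = mode 8) is from the RISC-V privileged specification; the function names are hypothetical, and the inline assembly is compiled only on RV64 so the value computation can be checked on any host.

```c
#include <assert.h>
#include <stdint.h>

#define SATP_MODE_SV39 (8ULL << 60)

/* Compose a satp value for Sv39 from the root page table's physical address. */
static inline uint64_t make_satp_sv39(uint64_t root_pa, uint64_t asid)
{
    assert((root_pa & 0xfffULL) == 0);
    return SATP_MODE_SV39 | ((asid & 0xffffULL) << 44) | (root_pa >> 12);
}

/* Flush stale translations, then switch address spaces. */
static inline void mmu_enable_sketch(uint64_t satp)
{
#if defined(__riscv) && (__riscv_xlen == 64)
    __asm__ volatile(
        "sfence.vma zero, zero\n\t" /* fence first, as described above */
        "csrw satp, %0"
        :
        : "r"(satp)
        : "memory");
#else
    (void)satp; /* no-op on non-RISC-V hosts */
#endif
}
```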


Where It Gets To

Here is what the serial console shows with the current build on the Unmatched:

+--------------------------------------------+
| QRV Operating System Kernel version 0.20.3 |
+--------------------------------------------+

Startup runs: clocks initialized, 16 GiB of RAM discovered (0x80000000..0x480000000), Sv39 page tables built, MMU enabled, PLIC configured for 127 sources, four CPUs (the U740's four U74 application cores; hart 0, the S7 monitor core without MMU support, is correctly skipped, a behavior added in v0.20).

Then the kernel loads taskman:

[1] kerlink: boot/taskman.qkx: resolved 5717/5717 external symbols
[1] kerlink: boot/taskman.qkx: link complete, entry taskman_main=ffffffc0803c44e8
[1] taskman: message_init...
[1] taskman: sysmgr_init...
[1] taskman: pathmgr_init...

And then, during pathmgr_init's walk of the address space info, while processing the sysram entry that covers memory above 4 GiB:

*** trap from S-mode on cpu 1 (act=ffffffc080270000 tid0 pid1) ***
[1]   scause: 12 (instruction page fault)
[1]   sepc:   0xffffffc0803d75c8
[1]   stval:  0x40803d75c8

Reading the Trap Dump

The values here are telling. sepc is 0xffffffc0803d75c8, a valid kernel address: the instruction that faulted. For an instruction page fault, stval contains the target address that couldn't be fetched, here 0x40803d75c8. That is not a valid Sv39 address: bit 38 is set, but bits 63 through 39 are zero rather than all ones, as a canonical kernel address requires. The canonical form would be 0xffffffc0803d75c8; what we got is that address with bits 63 through 39 cleared.

Something truncated a kernel function pointer to a smaller integer type and then used it as a call target.
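The canonical-address reasoning can be checked mechanically. In Sv39, bits 63..39 must all equal bit 38; the two helpers below are hypothetical names for that check and for extracting the low 39 bits.

```c
#include <assert.h>
#include <stdint.h>

#define SV39_VA_BITS 39

/* Canonical Sv39 address: bits 63..38 are all zeros or all ones. */
static inline int sv39_is_canonical(uint64_t va)
{
    uint64_t upper = va >> (SV39_VA_BITS - 1);              /* bits 63..38 */
    uint64_t all_ones = (1ULL << (64 - SV39_VA_BITS + 1)) - 1;

    assert(SV39_VA_BITS == 39);
    return upper == 0 || upper == all_ones;
}

/* Keep only the 39 bits that take part in Sv39 translation. */
static inline uint64_t sv39_low_bits(uint64_t va)
{
    return va & ((1ULL << SV39_VA_BITS) - 1);
}
```

Applied to the dump, sepc passes the check, stval fails it, and the low 39 bits of sepc are equal to stval.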

The last successful log line before the crash is walk_asinfo iterating over the sysram entry start=0xfe713a00 end=0x47fffffff — the portion of physical RAM that extends above 4 GiB. On QEMU's virt machine with 256 MB of RAM, no address ever comes close to 4 GiB, and this code path is never exercised. With 16 GiB on the Unmatched, end=0x47fffffff is completely normal.

The original QNX physical allocator code used (unsigned) casts throughout: quantum counts, block sizes, pointer arithmetic. Under LP64, unsigned is 32 bits while pointers and size_t are 64. Addresses and sizes above 4 GiB truncate silently to 32 bits, and the result, zero-extended back to 64 bits, is a value with its upper bits lost: the same kind of corruption we see in the trap dump.


The Hypothesis: LP64 Truncation in the Physical Allocator

A first LP64 audit pass has been applied: LEN_TO_NQUANTUM, NQUANTUM_TO_LEN, OFF_TO_QUANTUM widened to size_t; block_head fields num_to_clean, num_quanta, max_free_run widened; the (unsigned) cast on the quantum index in PQ_TO_PADDR removed; quantum count parameters and locals in pa_alloc_given, pa_free, _pa_free widened. A stale FUTURE comment in pa.c that noted the >4G block issue was removed alongside these changes.
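The effect of the (unsigned) casts, and of widening to size_t, can be sketched on an LP64 host as below. The 4 KiB quantum and the macro bodies are assumptions for illustration; only the name LEN_TO_NQUANTUM comes from the allocator, and the _NARROW/_WIDE suffixes are made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The sketch assumes LP64: 32-bit unsigned, 64-bit size_t. */
_Static_assert(sizeof(unsigned) == 4, "expects 32-bit unsigned");
_Static_assert(sizeof(size_t) == 8, "expects 64-bit size_t");

#define QUANTUM_BITS 12 /* assumed 4 KiB quantum */

/* Narrow form: the length is cut to 32 bits before the shift. */
#define LEN_TO_NQUANTUM_NARROW(len) ((unsigned)(len) >> QUANTUM_BITS)
/* Widened form: the full 64-bit length survives. */
#define LEN_TO_NQUANTUM_WIDE(len)   ((size_t)(len) >> QUANTUM_BITS)
#define NQUANTUM_TO_LEN_WIDE(n)     ((size_t)(n) << QUANTUM_BITS)

/* Round-trip a 64-bit value through unsigned, as a stray cast would. */
static inline uint64_t through_unsigned(uint64_t v)
{
    uint64_t narrowed = (uint64_t)(unsigned)v; /* zero-extended on the way back */
    assert(narrowed <= UINT32_MAX);
    return narrowed;
}
```

With these definitions, a 16 GiB length has zero quanta in the narrow form and 0x400000 in the wide one, and the sysram end address 0x47fffffff comes back as 0x7fffffff. A 32-bit round trip of the kernel address 0xffffffc0803d75c8 gives 0x803d75c8, whereas the stval in the dump still has bit 38 set, so a plain 32-bit cast of the pointer itself would not reproduce that exact value.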

The crash persists. The physical allocator is the right area — the evidence still points there — but the specific truncation that produces the corrupted call target has not been identified yet. Work continues.


Meanwhile on QEMU: A Lot Happened

The bare-metal work ran in parallel with substantial feature development on the QEMU side. Since v0.20.1:

A filesystem. fs-qrv — a user-space filesystem resource manager backed by devb-virtio — now works. Three bugs were fixed to get there: fs_read() was writing file data into the 1.5 KB resmgr dispatch buffer instead of building IOV entries pointing into the block cache (large reads corrupted the dispatch context); fs_lseek() wasn't checking _IO_COMBINE_FLAG, so combined lseek+read messages from the ELF loader received the seek offset instead of file data; and qrvfs_init() had a race where devb-virtio's IST thread wasn't ready on the first block read.
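The fs_read() fix, building IOV entries that point into the block cache instead of copying into a small dispatch buffer, can be illustrated with POSIX struct iovec. This is only a sketch: the real resource manager uses the QNX-style IOV macros and its own cache structures, and the block size, function name and flat block-pointer array here are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define CACHE_BLOCK_SIZE 512u /* assumed cache block size */

/*
 * Describe `nbytes` of file data starting at `offset` as iovec entries that
 * point directly into cached blocks. Returns the number of parts, or -1 if
 * the request runs past the cached blocks or needs more than max_parts.
 */
static int build_read_iov(char *const *blocks, size_t nblocks,
                          size_t offset, size_t nbytes,
                          struct iovec *iov, int max_parts)
{
    int parts = 0;

    assert(blocks != NULL && iov != NULL);
    while (nbytes > 0) {
        size_t blk = offset / CACHE_BLOCK_SIZE;
        size_t in_off = offset % CACHE_BLOCK_SIZE;
        size_t chunk = CACHE_BLOCK_SIZE - in_off;

        if (chunk > nbytes)
            chunk = nbytes;
        if (blk >= nblocks || parts >= max_parts)
            return -1;

        iov[parts].iov_base = blocks[blk] + in_off; /* no copy, just a pointer */
        iov[parts].iov_len = chunk;
        parts++;
        offset += chunk;
        nbytes -= chunk;
    }
    return parts;
}
```

Because nothing is copied, the size of the dispatch buffer no longer limits how much data one read can return.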

Pipe resource manager. pipe — a POSIX pipe/FIFO resource manager ported from QNX 6.4 — runs in user space. QNX's procmgr_daemon() replaced with procmgr_detach(), the host-filesystem FIFO passthrough removed, the thread pool adapted for QRV's model.

cpiofs readdir fixed. ls /rd/bin previously returned top-level entries regardless of which directory was opened — the OCB wasn't storing the directory path, so readdir had no way to filter. Fixed.
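The filtering that readdir needs once the OCB knows its directory path can be sketched as a direct-child test. It assumes archive entry names are stored as relative paths such as "bin/ls" and that the root directory is the empty string; the function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `path` names an entry directly inside directory `dir`.
 * Both are relative paths without a leading '/'; "" means the archive root.
 */
static int is_direct_child(const char *dir, const char *path)
{
    assert(dir != NULL && path != NULL);

    size_t dlen = strlen(dir);
    const char *rest = path;

    if (dlen > 0) {
        /* Must start with "dir/" exactly, not merely share a prefix. */
        if (strncmp(path, dir, dlen) != 0 || path[dlen] != '/')
            return 0;
        rest = path + dlen + 1;
    }
    /* A direct child has a non-empty name with no further separators. */
    return rest[0] != '\0' && strchr(rest, '/') == NULL;
}
```

A readdir on "bin" would then return "bin/ls" but skip "sbin/init", "bin/sub/x" and the top-level entries.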

ls -l and ls -a. Long listing with mode, link count, uid, gid, size. Dotfiles. Small but visible progress.

mount utility. Ported from QNX 6.4. With no arguments, walks /proc/mount and reports mounted servers. Searches for both mount_<type> and fs-<type> helpers.

Syscall conformance test suite. userland/test/syscall_testing — a framework for testing kernel calls against their specified behavior. First subsystem: 25 tests for TimerCreate/TimerDestroy covering success paths, error returns, the _r variants, resource exhaustion, and ID reuse. All 25 pass.

Kernel call reference documentation. LaTeX-formatted reference pages for individual syscalls: InterruptAttachThread, ConnectAttach, TimerCreate, TimerDestroy. Growing slowly but present.


What Comes Next

Finding the crash. The LP64 physical allocator work was the right direction but hasn't closed it yet — there is still a truncated pointer somewhere in the pathmgr_init / walk_asinfo path when processing memory above 4 GiB. The target is a shell prompt on the Unmatched's physical serial console today.
