It boots. On a real RISC-V board, pressing a real reset button, with a real
serial cable connected to a real terminal. The shell prompt appears. pidin
runs and shows 15 GB free out of 16. shutdown works. This is v0.21 on a
SiFive HiFive Unmatched.
The full console capture:
U-Boot 2025.10-0ubuntu0.24.04.1 (Nov 19 2025 - 17:26:40 +0000)
CPU: sifive,bullet0
Model: SiFive HiFive Unmatched A00
DRAM: 16 GiB
...
=> run bootqrv
385008 bytes read in 0 ms
3778560 bytes read in 4 ms (900.9 MiB/s)
Starting kernel ...
===============================================================================
Boot command line: -Dsbi
+------------------------------------------+
| QRV Operating System Kernel version 0.21 |
+------------------------------------------+
init_clocks
init_raminfo
init_mmu
ram_top = 480000000
identity_map RAM 80000000..480000000
identity_map CLINT 2000000
identity_map PLIC c000000
high_va_map RAM
high_va_map CLINT
high_va_map PLIC
page tables ready, root at 80a62000
enable_mmu...
enable_mmu: Sv39 paging active
MMU enabled
init_intrinfo: timer(base=5,n=1) plic(base=32,n=128)
init_qtime: cycles_per_sec=1000000 timer_rate=1000 intr=5
init_cpuinfo: 4 CPUs, rv64imac, 1 MHz
init_system_private
Startup complete, entering kernel...
[2] modpkg: bin/esh: executable (recorded)
[2] modpkg: boot/taskman.qkx: kernel extension (recorded)
[2] modpkg: lib/ld-qrv.so: shared library (recorded)
[2] modpkg: lib/libc.qrl: shared library (recorded)
[2] modpkg: loaded bin/esh (sv39) at VA 10000, entry=120a4
[2] modpkg: ld-qrv.so: applied 160/160 RELATIVE relocations
[2] modpkg: loaded ld-qrv.so at VA 100000, entry=101610
[2] modpkg: bin/esh is dynamic, entering via ld-qrv.so at 101610, sp=3fffffdf60
[2] modpkg: 5 module(s) found
[2] plic_init: 127 sources configured
*** QRV kernel: preparing to start taskman ***
[2] kerlink: lib/libc.qrl: ET_DYN, 8 program headers
[2] kerlink: lib/libc.qrl: exported 703 symbols for dynamic linking
[2] kerlink: boot/taskman.qkx: ET_REL, 22 sections
[2] kerlink: boot/taskman.qkx: resolved 5716/5716 external symbols (0 unresolved)
[2] kerlink: boot/taskman.qkx: applied 9954/9954 relocations (0 skipped)
[2] kerlink: boot/taskman.qkx: link complete, entry taskman_main=ffffffc0803c44c8
[2] taskman: message_init...
[2] taskman: rsrcdbmgr_init...
[2] taskman: sysmgr_init...
[2] taskman: pathmgr_init...
[2] taskman: bootimage_init...
[2] taskman: namedsem_init...
[2] taskman: message_start...
[2] cpu_start_others: hart 1 started
[2] cpu_start_others: hart 3 started
[2] cpu_start_others: hart 4 started
[2] bootimage_start: opening /dev/console
[2] bootimage: spawned /rd/init (pid 2)
***********************************
* Welcome to QRV Operating System *
***********************************
pid  tid  name     prio  STATE    Blocked
1    1    taskman  255r  RUNNING
1    2    taskman  255r  RUNNING
1    3    taskman  255r  RUNNING
1    4    taskman  255r  RUNNING
1    5    taskman  255r  RECEIVE  1
1    6    taskman  10r   RUNNING
1    7    taskman  255r  RECEIVE  1
1    8    taskman  255r  RECEIVE  1
1    9    taskman  10r   RECEIVE  1
1    10   taskman  10r   RECEIVE  1
1    11   taskman  10r   RECEIVE  1
1    12   taskman  10r   RECEIVE  1
1    13   taskman  10r   RECEIVE  1
2    1    esh      10r   REPLY    1
2    2    esh      10r   RECEIVE  1
3    1    pidin    10r   REPLY    1
3    2    pidin    10r   RECEIVE  1
Starting devc-ser8250...
[2] devc-ser8250: no UART found in hwinfo and no -p/-i given
system console set to /dev/ser1
Unable to reopen /dev/ser1
Starting devb-virtio...
devb-virtio: no virtio device in hwinfo and no -p/-i given
Starting fs-qrv...
fs-qrv: cannot open /dev/vblk0: No such file or directory
Use shutdown command to shut down the machine
# pidin info
CPU:RISC-V Release:0.21 FreeMem:15Gb/16Gb BootTime:0
Processes: 4, Threads: 18
Processor1: 1 rv64imac 1MHz FPU
Processor2: 2 rv64imac 1MHz FPU
Processor3: 3 rv64imac 1MHz FPU
Processor4: 4 rv64imac 1MHz FPU
# shutdown
Shutting down...
What the Log Shows
The Unmatched has 16 GiB of RAM across physical addresses
0x80000000–0x47fffffff. All of it is discovered, mapped, and visible to
the allocator. pidin info reports FreeMem:15Gb/16Gb — the kernel and
taskman together occupy about 1 GB of a 16 GB machine, which is reasonable
for a debug build with no memory trimming.
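The address range can be checked with a few lines of arithmetic. This is a minimal sketch, not QRV code: RAM_BASE and RAM_SIZE are taken from the boot log, and the helper names are made up for illustration.

```c
#include <stdint.h>

/* Values read off the boot log above; the macro names are illustrative. */
#define RAM_BASE 0x80000000ULL   /* start of DRAM on the Unmatched */
#define RAM_SIZE (16ULL << 30)   /* 16 GiB, as reported by U-Boot */

/* Exclusive top of RAM: the value init_mmu prints as ram_top. */
static inline uint64_t ram_top(void)
{
    return RAM_BASE + RAM_SIZE;
}

/* Last valid byte address in RAM. */
static inline uint64_t ram_last(void)
{
    return ram_top() - 1;
}
```

Under these assumptions ram_top() evaluates to 0x480000000, matching the "ram_top = 480000000" line in the log, and ram_last() to 0x47fffffff.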
Four CPUs are running: the boot hart plus harts 1, 3, and 4, which
cpu_start_others brings up. These are the U74 application cores; hart 0, the
S7 monitor core without MMU support, is correctly skipped. Thirteen taskman
threads run across them. The init script runs and spawns pidin, the
process table is displayed, the shell accepts input, and shutdown works.
The virtio and UART driver failures are expected: devb-virtio and
devc-ser8250 currently discover their hardware through the syspage hwinfo
section, which is populated from FDT nodes for virtio,mmio and ns16550a.
The Unmatched's UART carries a different compatible string (sifive,uart0), and
there is no virtio device on real hardware. Both fail gracefully with a
diagnostic. The console falls back to the in-kernel devtext driver, which is
why the shell prompt is reachable at all.
These are the next targets.
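Why the UART driver finds nothing can be illustrated with a simplified compatible-string match. This is a sketch under assumptions, not the QRV hwinfo code: the table and the function name are hypothetical, and a real FDT compatible property is a list of strings rather than a single one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table of compatible strings a 16550-style driver accepts. */
static const char *const known_uarts[] = { "ns16550a" };

/* Returns 1 if the node's compatible string is in the table, else 0. */
static int uart_is_supported(const char *compatible)
{
    for (size_t i = 0; i < sizeof known_uarts / sizeof known_uarts[0]; i++) {
        if (strcmp(compatible, known_uarts[i]) == 0)
            return 1;
    }
    return 0;
}
```

With this table, "ns16550a" matches and "sifive,uart0" does not, so on the Unmatched nothing reaches hwinfo and the driver exits with its diagnostic.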
What Fixed It
Three bugs stood between the first crash and a clean boot. Two were hardware-specific — invisible on QEMU, fatal on real silicon.
CPU_SYSTEM_PADDR_END = 0xFFFFFFFF. The physical allocator's ceiling
was 4 GB. RAM on the Unmatched starts at 0x80000000, so 14 of its 16 GiB
were simply outside the allocatable universe. Any address in that region
passed through the allocator with undefined results: function pointers
truncated to 32 bits, corrupted call targets, instruction page faults. The
fix is a one-line change to a constant that was set on March 14th and sat
dormant until today:
- #define CPU_SYSTEM_PADDR_END 0xFFFFFFFFUL
+ #define CPU_SYSTEM_PADDR_END ((1UL << 56) - 1)
Along with it: the paddr32/paddr64 split mechanism — an x86 PAE relic
that made no sense on LP64 RISC-V — was removed end-to-end. On a 64-bit
architecture where every pointer is 64 bits, the per-process opt-in to
high physical memory is pointless. restrict_user now covers the full
0 … last_paddr range.
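The effect of the old ceiling can be worked out directly. The helper below is an illustrative sketch, not part of the QRV allocator. It assumes RAM is one contiguous region and that the ceiling is an inclusive last address below UINT64_MAX.

```c
#include <stdint.h>

/*
 * Bytes of the RAM region [base, base + size) that lie above an inclusive
 * physical-address ceiling. Hypothetical helper for illustration only;
 * paddr_end must be less than UINT64_MAX so that paddr_end + 1 does not wrap.
 */
static uint64_t bytes_above_ceiling(uint64_t base, uint64_t size,
                                    uint64_t paddr_end)
{
    uint64_t top = base + size;       /* exclusive top of RAM */
    uint64_t limit = paddr_end + 1;   /* first address past the ceiling */

    if (top <= limit)
        return 0;                     /* all of RAM is allocatable */
    if (base >= limit)
        return size;                  /* none of RAM is allocatable */
    return top - limit;               /* the part that sticks out */
}
```

For the Unmatched (base 0x80000000, 16 GiB) the old ceiling of 0xFFFFFFFF leaves 14 GiB out of reach, while the new ceiling of (1UL << 56) - 1 leaves none. For a 256 MB QEMU guest at the same base, both ceilings leave none, which is why the bug never showed there.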
PTE Accessed/Dirty bits on user mappings. The U74 uses trap-based A/D
bit management: if a leaf PTE doesn't have PTE_A set (and PTE_D for
writable pages), the CPU raises a page fault on first access and expects the
OS to set them. QEMU emulates hardware-managed A/D updates and sets the bits
automatically, masking the bug entirely. prot_to_pte_bits() now pre-sets
PTE_A on all mappings and PTE_D on writable ones.
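A rough sketch of what pre-setting the bits looks like follows. The PTE bit positions come from the RISC-V privileged specification. Everything else is assumed: the SK_PROT_* flags, the function name, and its signature are hypothetical, since the post does not show the real prot_to_pte_bits().

```c
#include <stdint.h>

/* Sv39 leaf PTE flag bits, per the RISC-V privileged specification. */
#define PTE_V (1ULL << 0)   /* valid */
#define PTE_R (1ULL << 1)   /* readable */
#define PTE_W (1ULL << 2)   /* writable */
#define PTE_X (1ULL << 3)   /* executable */
#define PTE_U (1ULL << 4)   /* user-accessible */
#define PTE_A (1ULL << 6)   /* accessed */
#define PTE_D (1ULL << 7)   /* dirty */

/* Hypothetical protection flags; the real QRV flags are not shown. */
#define SK_PROT_READ  0x1u
#define SK_PROT_WRITE 0x2u
#define SK_PROT_EXEC  0x4u

/* Illustrative stand-in for prot_to_pte_bits(); the signature is assumed. */
static uint64_t sketch_prot_to_pte_bits(unsigned prot, int user)
{
    uint64_t bits = PTE_V | PTE_A;    /* pre-set Accessed on every leaf */

    if (prot & SK_PROT_READ)
        bits |= PTE_R;
    if (prot & SK_PROT_WRITE)
        bits |= PTE_R | PTE_W | PTE_D; /* W without R is reserved; pre-set Dirty */
    if (prot & SK_PROT_EXEC)
        bits |= PTE_X;
    if (user)
        bits |= PTE_U;
    return bits;
}
```

The trade-off in this approach is that the kernel can no longer use A and D to track which pages were actually touched or written. The benefit is that a core with trap-based A/D management never faults just to have the bits set.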
cpiofs_lseek return value in combined messages. When the ELF loader
sends a combined lseek+read message (rcvid == 0), cpiofs_lseek was
returning _RESMGR_PTR instead of EOK. _RESMGR_PTR breaks the
combined-message dispatch loop in _resmgr_io_handler — the read handler
never runs, proc_read gets a short reply, and retries with a shifted
buffer pointer. On QEMU this produced intermittent load failures; on the
Unmatched it was consistent enough to block the boot entirely.
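The failure mode can be reproduced in a toy model of a combined-message loop. This is not the real _resmgr_io_handler, and none of the names below are QNX or QRV APIs. ST_CONTINUE stands in for EOK and ST_REPLIED stands in for a return value such as _RESMGR_PTR, which tells the loop that a reply has been produced.

```c
#include <stddef.h>

/* Hypothetical status codes for the toy model. */
enum { ST_CONTINUE = 0, ST_REPLIED = 1 };

typedef int (*part_handler)(void *ctx);

struct toy_state {
    long offset;    /* set by the lseek part */
    int  read_ran;  /* set to 1 if the read part executed */
};

/*
 * Run the parts of a combined message in order. Processing stops at the
 * first part that returns anything other than ST_CONTINUE.
 * Returns the number of parts that ran.
 */
static size_t dispatch_combined(const part_handler *parts, size_t n, void *ctx)
{
    size_t ran = 0;
    for (size_t i = 0; i < n; i++) {
        ran++;
        if (parts[i](ctx) != ST_CONTINUE)
            break;
    }
    return ran;
}

/* Buggy lseek part: performs the seek but reports that it already replied. */
static int lseek_buggy(void *ctx)
{
    ((struct toy_state *)ctx)->offset = 64;
    return ST_REPLIED;
}

/* Fixed lseek part: performs the seek and lets the loop continue. */
static int lseek_fixed(void *ctx)
{
    ((struct toy_state *)ctx)->offset = 64;
    return ST_CONTINUE;
}

/* Read part: the last part of the message, so it produces the reply. */
static int read_part(void *ctx)
{
    ((struct toy_state *)ctx)->read_ran = 1;
    return ST_REPLIED;
}
```

Dispatching { lseek_buggy, read_part } runs one part and leaves read_ran at 0. Dispatching { lseek_fixed, read_part } runs both. In this model the seek itself succeeds either way, which is part of why the symptom looked like a short read rather than a failed seek.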
The March 14 Comment
The CPU_SYSTEM_PADDR_END constant was set on March 14th — the same week
as first user-mode ecall dispatch, the same week the runtime linker first
self-relocated. The comment at the time read:
QEMU virt tops out at ~4GB for now.
A temporary constraint, noted honestly, and then left in place while the system grew around it. On QEMU with 256 MB of RAM, no physical address ever came near the 4 GB ceiling. Six weeks later, on a machine with 64 times as much RAM as QEMU was given, the constraint became visible.
That is the nature of this kind of porting work. The bugs that matter are not the ones that crash immediately — those are easy to find. The ones that matter are the ones that were correct for the world they were written in.
What Comes Next
UART driver support for the Unmatched's sifive,uart0 hardware, so the
shell runs on the real serial driver rather than devtext. After that: block
storage on real hardware, which means either NVMe through the PCIe slot
the Unmatched provides, or SD card. The PCI server from QNX 6.6 is on the
horizon.