Fourth in a series on building a firmware-free VideoBIOS for RISC-V. In Part 3 the card's own x86 firmware ran to a clean far return on my RISC-V machine, sign-on banner and all. This post is about what happened next: by every register I could read the card was up, and the monitor still showed nothing.
The story so far. I taught a RISC-V CPU to emulate x86 well enough to run the GK208's video BIOS from the first byte to its sign-on banner to a clean far return. Counted in instructions, POST finished. The screen stayed black.
Did the display code even run?
First I had to rule out the boring explanation: that the firmware's display code never actually executed and I'd fooled myself with a nice-looking trace.
The device-init handlers in this firmware are reached through dispatch records, each with a code pointer a few bytes in. Once I found that structure I put a log on the pointer and watched the display handlers get called and run. So they ran. The problem was past "did it run."
Walking back from the glass
When the monitor is black, you start at the connector and walk backwards, asking at each stage whether the signal is there.
The DAC I could power on. Cost me an evening to learn the live output on this board is DAC-1, not DAC-0. The 8×16 font uploaded into the right plane. The mode-3 raster geometry programmed fine. Everything near the monitor looked plausible.
The signal died at the clock. The pixel clock on this head comes from a PLL the hardware calls VPLL0, and VGA text mode wants it locked at 28.322 MHz. No locked clock, no timing; no timing, no sync; no sync, black — no matter what's downstream. My VPLL0 wasn't locking.
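As a sanity check on why the clock sits at the head of that chain, here is a minimal sketch of the arithmetic. It assumes the standard VGA 720×400 text-mode totals (900 dot clocks per scanline, 449 scanlines per frame); those are the usual VGA figures, not values read back from this card, and the function name is just for illustration.

```python
# Standard VGA 720x400 text-mode totals (assumed, not read from the GK208).
PIXEL_CLOCK_HZ = 28_322_000   # the clock VPLL0 has to lock at
H_TOTAL_DOTS = 900            # 720 visible dots plus blanking, per scanline
V_TOTAL_LINES = 449           # 400 visible lines plus blanking, per frame

def sync_rates(pixel_clock_hz, h_total, v_total):
    """Return (horizontal rate in kHz, vertical rate in Hz) for a pixel clock."""
    h_rate_hz = pixel_clock_hz / h_total   # scanlines per second
    v_rate_hz = h_rate_hz / v_total        # frames per second
    return h_rate_hz / 1000.0, v_rate_hz

h_khz, v_hz = sync_rates(PIXEL_CLOCK_HZ, H_TOTAL_DOTS, V_TOTAL_LINES)
print(f"hsync {h_khz:.3f} kHz, vsync {v_hz:.3f} Hz")
```

Both sync rates are plain divisions of the pixel clock, roughly 31.47 kHz and 70 Hz here, which is why an unlocked PLL takes everything downstream with it.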
One bit
To see what a locked VPLL looks like, I did what I'd end up doing over and over: booted the card-equipped machine into Linux with nouveau, let it bring up a login screen, and read the display registers off a card that was demonstrably working. That became my golden reference.
The difference was one bit. The VPLL control register at 0x614140 read 0x02030002 on the nouveau card and 0x02010002 on mine: bit 17, the lock bit, set on the one showing a login screen and clear on the one showing black. Everything else in my modeset matched the golden values closely enough not to matter. So it wasn't a wrong color or a font in the wrong plane. The display engine just never crossed from "configured" to "on," and it said so with one unlit bit.
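The comparison itself is a single XOR. A minimal sketch using the two readbacks quoted above; the helper name `differing_bits` is made up for illustration.

```python
GOLDEN = 0x02030002   # 0x614140 read back under nouveau (login screen up)
MINE   = 0x02010002   # 0x614140 after my bring-up (black screen)

def differing_bits(a, b):
    """Return the bit positions at which two 32-bit register values differ."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [bit for bit in range(32) if diff & (1 << bit)]

print(differing_bits(GOLDEN, MINE))   # [17]
```

The same helper works across a whole register dump: run it on each golden/observed pair and anything that returns a non-empty list is worth a look.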
The supervisor stuck at phase 1
That bit stays clear because of the display supervisor — a small state machine in Kepler's display controller that runs a modeset as a sequence of phases and enables the output at the end. Mine stalled at phase 1 and never advanced. No advance, no output enable, no PLL lock, black screen.
I spent days pushing on it. Instrumented the output teardown to find where it wedged. Wrote code to fake the supervisor-advance signals so the firmware would go program its own VPLL — and it did, I got the VBIOS programming the pixel clock itself. Still no lock.
So I took apart the display script the firmware decompresses at runtime (the self-inflating hole from Part 3), decoded its opcode table, and found the instruction that wouldn't let go: a timed condition-wait, one opcode among many, waiting for a hardware condition that on my machine never came true. That was the gate. Nothing was broken and my emulation wasn't wrong — the firmware was correctly waiting for something my cold card never gave it.
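In general terms, a timed condition-wait is a poll-with-timeout loop. The sketch below shows only that general shape; every name in it is hypothetical. `read_reg` stands in for an MMIO read, and the address, mask, expected value, and timeout are illustrative, not the script's real operands or encoding. What the real script does when the timer expires is not modeled.

```python
import time

def timed_condition_wait(read_reg, addr, mask, expected, timeout_s, poll_s=0.001):
    """Poll (read_reg(addr) & mask) until it equals expected or time runs out.

    Returns True if the condition came true, False if the timeout expired.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if (read_reg(addr) & mask) == expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)

# Stand-in for a cold card: the awaited bit never shows up.
def cold_card_read(addr):
    return 0x00000000

# Stand-in for a warm card: something earlier already set the bit.
def warm_card_read(addr):
    return 0x00000001

print(timed_condition_wait(cold_card_read, 0x0, 0x1, 0x1, timeout_s=0.05))  # False
print(timed_condition_wait(warm_card_read, 0x0, 0x1, 0x1, timeout_s=0.05))  # True
```

The loop is identical in both calls. Only the state of the hardware behind `read_reg` decides whether it gets through, which is the sense in which nothing on the emulation side was wrong.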
Two firmwares in one ROM
What it was waiting for got clearer when I looked harder at the ROM. It isn't one program. The image I'd been running for months is the legacy x86 video BIOS, the option ROM that answers INT 10h. Sitting next to it in the same ROM is a second, separate image: a 64-bit x86 UEFI GOP driver, the module a modern PC's UEFI loads to drive this exact card.
That re-framed the first image. A GK208 in a real PC isn't brought up by the INT 10h path I'd been so carefully emulating; it's brought up by UEFI running the GOP driver, inside a pile of services and prior init the legacy path has quietly come to depend on. The old text-mode firmware is still there for compatibility, but it no longer carries the whole recipe for taking a stone-cold card to a lit display, because on the machines it actually runs on it never has to.
The warm-card model
This is what I started calling the warm-card model. The firmware doesn't expect the card I was handing it. It expects one that's already partway warm — touched by a real PC's UEFI, by the GOP driver, by the handshake a motherboard does around the video BIOS — and its display path is written for that. On a genuinely cold card with nothing under it but a RISC-V CPU running x86, the supervisor hits its timed gate, waits for a condition only a warm machine would have set up, and never advances.
The thing I was proud of in Part 3 — running the real firmware faithfully, with nothing faked — is exactly why it couldn't finish. A real firmware run faithfully will faithfully wait for the world it was written for.
The floor
This is the wall, and it's worth being clear about which kind it is: not the kind that more of the same effort climbs. I could make the x86 emulation flawless, serve every byte of the PROM, match the golden registers digit for digit, and the card would still sit at that gate. The missing piece was never fidelity. It was a whole warm-machine context, a UEFI and a GOP driver and a PC around them, that my board doesn't provide and that I had no intention of building. I'd set out to give my computer a small voice at power-on and found that this road ended at rebuilding most of a PC just to say hello.
The last commit on this road
At the very tip of that project's history there's a quiet commit, and I like it in hindsight, because it's where the whole approach turns. After all the fighting to make someone else's firmware finish, the last thing I did was write code to just watch: log the full sequence of register writes a real, working init performs, and save it. I didn't quite see it that night, but that little capture tool was me giving up on running the x86 firmware and starting to ask a better question — if I can see exactly what a working card wants written to it, why do I need the x86 firmware at all?
The rest of the series is the answer. It starts with throwing away everything above.
Next: Burning the Boats — Abandoning x86 Entirely.