Fifth in a series on building a firmware-free VideoBIOS for RISC-V. In Part 4 the card's own x86 firmware ran to a clean far return and the monitor still showed black, because that firmware was written for a warm PC my board is not. This post is about giving up on it — and the better approach that giving up made obvious.
The story so far. I got a RISC-V CPU to run the GK208's x86 video BIOS (full init, sign-on banner, clean far return) on a card that stayed dark. The firmware was waiting for a machine mine isn't, and more fidelity would not change that.
The last thing I did on the old road
The last commit on the x86 project is not a fix. It is a small tool that only watches: while a known-good driver brings the card up, it logs every register write — address and value — to a text file. I wrote it as one more debugging instrument, but it was the end of the x86 road. With that log, the question answers itself: if I can see every write a working card receives, why run the x86 firmware at all?
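A minimal sketch of what one record in such a log could look like, and how to read it back. The line layout (`W <address> <value>` in hex) and the names `reg_write`, `format_write` and `parse_write` are assumptions made for illustration, not the format the actual tool writes.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* One captured register write: where it went and what was written. */
struct reg_write {
    uint32_t addr;
    uint32_t value;
};

/* Format one record as "W 0xAAAAAAAA 0xVVVVVVVV\n" (hypothetical layout).
 * Returns the number of characters written, as snprintf does. */
int format_write(char *buf, size_t len, struct reg_write w)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "W 0x%08" PRIx32 " 0x%08" PRIx32 "\n",
                    w.addr, w.value);
}

/* Parse one record. Returns 1 on success, 0 if the line is not a write. */
int parse_write(const char *line, struct reg_write *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, "W %" SCNx32 " %" SCNx32,
                  &out->addr, &out->value) == 2;
}
```

Keeping the log as plain text means the same file can be read by eye, diffed between runs, and later turned into a table of steps.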
What the capture meant
A GPU init is a sequence of register writes. The x86 firmware is one elaborate way to produce that sequence — self-decompressing, ROM-reading, assuming a PC around it — but the card does not care which CPU issues the writes; the same values in the same order bring it up the same way. Everything I had built on the x86 road — the emulator, the disassembler fixes, the decompression hole, the warm-card handshake I never satisfied — existed only to generate that sequence, and I already had it captured.
Reading a driver instead of running a firmware
The known-good driver is nouveau, the open Linux driver for these cards. It mattered as a reference to read, not as code to run: I am not going to ship a Linux kernel to light the screen at power-on. Unlike the legacy VBIOS, nouveau is a cold-start driver: it assumes nothing warm underneath and brings the card up from nothing, in readable C. Where the firmware was a black box, nouveau was a specification. None of it ships; what ships is my own sequence, in my own C, checked against a driver that does the same job in the open.
Burning the boats
The hard part was throwing away work I liked. The emulator I had compiled for RISC-V, the disassembler I had repaired, the patches that went upstream — all of it served running the firmware, and I was deciding that running the firmware was the wrong goal. Not a failed attempt at a good plan, but a good attempt at a plan that could not work, because the firmware wants a PC and my machine is not one. Cortés burned his ships so no one could turn back; a new, empty repository with no x86 in it was the same move. Mostly it was a relief.
A cold init is a sequence of writes
The new program is short to describe: replay the captured sequence from C, on the RISC-V CPU, into the register window. It is not a flat list, because a cold init has to wait for the hardware in places, so the recipe becomes a small op-table of three step kinds — write a value, poll a register until it reads what I expect, wait a fixed time. Each poll is bounded by the GPU's own timer, so a step that never completes fails in milliseconds instead of hanging the machine. Clocks, memory controller, privilege ring: a few hundred lines, no interpreter, no x86 ROM.
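The three step kinds can be sketched as a small table runner. This is an illustration under stated assumptions, not the project's code: the `op`, `io` and `run_ops` names are invented here, hardware access goes through function pointers so the same table can run against a simulation, and the `SIM_*` registers are made up. On real hardware `now_ns` would read the GPU's timer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three step kinds: write a value, poll for a value, wait a fixed time. */
enum op_kind { OP_WRITE, OP_POLL, OP_WAIT };

struct op {
    enum op_kind kind;
    uint32_t reg;    /* register offset (unused by OP_WAIT) */
    uint32_t value;  /* value to write, or value a poll expects */
    uint32_t mask;   /* bits a poll compares */
    uint64_t ns;     /* poll timeout or wait duration, in nanoseconds */
};

/* Hypothetical hardware interface; now_ns stands in for the GPU's timer. */
struct io {
    uint32_t (*rd32)(void *ctx, uint32_t reg);
    void     (*wr32)(void *ctx, uint32_t reg, uint32_t val);
    uint64_t (*now_ns)(void *ctx);
    void *ctx;
};

/* Returns -1 on success, else the index of the first poll that timed out. */
int run_ops(const struct io *io, const struct op *ops, size_t n)
{
    assert(io != NULL && ops != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct op *o = &ops[i];
        uint64_t start;
        switch (o->kind) {
        case OP_WRITE:
            io->wr32(io->ctx, o->reg, o->value);
            break;
        case OP_POLL:
            start = io->now_ns(io->ctx);
            while ((io->rd32(io->ctx, o->reg) & o->mask) !=
                   (o->value & o->mask)) {
                if (io->now_ns(io->ctx) - start > o->ns)
                    return (int)i;   /* bounded: fail instead of hanging */
            }
            break;
        case OP_WAIT:
            start = io->now_ns(io->ctx);
            while (io->now_ns(io->ctx) - start < o->ns)
                ;                    /* busy-wait for the fixed delay */
            break;
        }
    }
    return -1;
}

/* Simulated back end with made-up registers, for illustration only. */
enum { SIM_CTRL = 0x0, SIM_STATUS = 0x4 };

struct sim {
    uint32_t ctrl;
    uint64_t t;         /* fake clock, advances 100 ns on every access */
    uint64_t ready_at;  /* STATUS reads 1 from this time on, once CTRL is set */
};

static uint32_t sim_rd32(void *ctx, uint32_t reg)
{
    struct sim *s = ctx;
    s->t += 100;
    if (reg == SIM_STATUS)
        return (s->ctrl != 0 && s->t >= s->ready_at) ? 1u : 0u;
    return reg == SIM_CTRL ? s->ctrl : 0u;
}

static void sim_wr32(void *ctx, uint32_t reg, uint32_t val)
{
    struct sim *s = ctx;
    s->t += 100;
    if (reg == SIM_CTRL)
        s->ctrl = val;
}

static uint64_t sim_now(void *ctx)
{
    struct sim *s = ctx;
    s->t += 100;
    return s->t;
}
```

Returning the index of the failing step, rather than a bare error code, points straight at the line of the recipe that did not complete.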
Replay is not parroting
A blind replay does not work. Some steps are only values in order, but others depend on the state the card is in when you reach them, and the privilege ring is the clear case: it has to be brought up and re-enumerated before whole regions of registers will answer, and writing to them early returns a fault pattern instead of data. You have to know what each step is for, so that when a captured value does not produce the captured result you can tell whether you are early, wrong, or reading a region the ring has not connected yet.
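One way to make that distinction explicit is to classify every read-back instead of comparing it blindly. The sketch below is an assumption-laden illustration: the `0xbad` prefix is a placeholder for whatever fault pattern the card actually returns, and `classify_read` and the `verdict` names are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: fault reads carry a recognisable prefix in the top 12 bits.
 * 0xbad is a placeholder; substitute the pattern your own card returns. */
#define FAULT_PREFIX_MASK  0xfff00000u
#define FAULT_PREFIX_VALUE 0xbad00000u

enum verdict {
    V_MATCH,          /* the captured result was reproduced */
    V_NOT_CONNECTED,  /* fault pattern: the region is not answering yet */
    V_MISMATCH        /* real data, but not the captured value: early or wrong */
};

static int is_fault_pattern(uint32_t v)
{
    return (v & FAULT_PREFIX_MASK) == FAULT_PREFIX_VALUE;
}

/* Compare a read-back against the captured value under a bit mask. */
enum verdict classify_read(uint32_t got, uint32_t expected, uint32_t mask)
{
    assert(mask != 0);  /* a zero mask would match anything */
    if (is_fault_pattern(got))
        return V_NOT_CONNECTED;
    if ((got & mask) == (expected & mask))
        return V_MATCH;
    return V_MISMATCH;
}
```

A `V_MISMATCH` alone cannot say whether the step was early or simply wrong; retrying under a timeout separates the two, since a value that was only early eventually matches.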
The card wakes, firmware-free
I powered a cold GK208 that had never been POSTed, ran the op-table, and it came up: PMC_BOOT_0 read a live chip ID, the memory controller started, the privilege ring stood, the register map answered. No x86 ran anywhere, and no emulator was even in the binary.
And the monitor was still black. Waking the silicon and lighting the display are different jobs, and the x86 road had died at the second one — the clock that would not lock, the supervisor stuck at phase one. Burning the boats did not remove that problem; it only meant I would solve it in my own modeset, with no firmware underneath me. That is the next post.
Next: First Light — the Modeset That Finally Locked the Clock.