Yesterday I had a working macro preprocessor. Today I have a working assembler that generates real GMS/359 machine code.
The Debug Hunt
The day started with a classic "it compiles but doesn't work" situation. The preprocessor built fine, but %define was being interpreted as %elif, which usually means a wrong value is being returned somewhere.
The culprit? A mismatch between the pptok.h enum values and the hash lookup function. NASM's original design used a perfect hash (computed by Perl scripts) where the array index equals the enum value. My simplified version returned array indices, but the enum values were carefully designed for conditional processing:
    /* The enum assigns specific values for conditional grouping */
    PP_ELIF   = 0,    /* Base for %elif variants */
    PP_IF     = 32,   /* Base for %if variants */
    PP_DEFINE = 67,   /* Way up here! */
When pp_token_hash("%define") returned 0 (the array index), the switch statement thought it was PP_ELIF. Chaos ensued.
The Fix
Rather than regenerating perfect hashes (which would require porting NASM's Perl infrastructure), I created a proper lookup table:
    static const struct {
        const char *name;
        enum preproc_token token;
    } pp_directive_table[] = {
        { "%elif",   PP_ELIF },
        { "%define", PP_DEFINE },
        /* ... all 80+ directives ... */
        { NULL,      PP_INVALID }
    };
Linear search? Yes. Fast enough for ~80 directives? Absolutely. The preprocessor is I/O bound anyway.
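For illustration, here is a minimal sketch of what the lookup over that table might look like. The function name pp_lookup, the three-row table, and the value of PP_INVALID are assumptions made for this example, not the actual code; case folding of directive names is also left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the token enum. The three values come from the
   excerpt above; PP_INVALID = -1 is an assumption for this sketch. */
enum preproc_token {
    PP_ELIF    = 0,
    PP_IF      = 32,
    PP_DEFINE  = 67,
    PP_INVALID = -1
};

static const struct {
    const char *name;
    enum preproc_token token;
} pp_directive_table[] = {
    { "%elif",   PP_ELIF    },
    { "%define", PP_DEFINE  },
    { "%if",     PP_IF      },
    { NULL,      PP_INVALID }   /* sentinel row ends the scan */
};

/* Linear scan. The key point: return the enum value stored in the row,
   never the row index, so the result is independent of table order. */
enum preproc_token pp_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; pp_directive_table[i].name != NULL; i++) {
        if (strcmp(pp_directive_table[i].name, name) == 0)
            return pp_directive_table[i].token;
    }
    return PP_INVALID;
}
```

With this shape, "%define" sits in row 1 of the table but still resolves to 67, which is exactly the property the broken version was missing.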
With that fixed, plus a couple of null pointer issues in macro expansion, the preprocessor started working. %define, %ifdef, %macro — all correct.
Building the Instruction Encoder
With preprocessing working, the next step was instruction encoding. GMS/359 uses the classic S/360 instruction formats, but with little-endian byte order:
    RR-format: [opcode] [R1<<4|R2]                        (2 bytes)
    RX-format: [opcode] [R1<<4|X2] [D2-lo] [B2<<4|D2-hi]  (4 bytes)
    SI-format: [opcode] [I2]       [D1-lo] [B1<<4|D1-hi]  (4 bytes)
    S-format:  [opcode] [00]       [D2-lo] [B2<<4|D2-hi]  (4 bytes)
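The byte layouts above translate fairly directly into code. The following is a rough sketch rather than the real encoder: the function names and signatures are made up for this example, and it assumes a 12-bit displacement as in S/360.

```c
#include <assert.h>
#include <stdint.h>

/* Shared tail of RX/SI/S: low displacement byte first, then the base
   register in the high nibble and displacement bits 8..11 in the low one. */
static void put_base_disp(uint8_t out[2], unsigned base, unsigned disp)
{
    assert(base < 16 && disp < 4096);   /* 4-bit register, 12-bit displacement */
    out[0] = (uint8_t)(disp & 0xFF);
    out[1] = (uint8_t)((base << 4) | (disp >> 8));
}

/* RR: [opcode] [R1<<4|R2] */
void encode_rr(uint8_t out[2], uint8_t opcode, unsigned r1, unsigned r2)
{
    assert(r1 < 16 && r2 < 16);
    out[0] = opcode;
    out[1] = (uint8_t)((r1 << 4) | r2);
}

/* RX: [opcode] [R1<<4|X2] [D2-lo] [B2<<4|D2-hi] */
void encode_rx(uint8_t out[4], uint8_t opcode,
               unsigned r1, unsigned x2, unsigned b2, unsigned d2)
{
    assert(r1 < 16 && x2 < 16);
    out[0] = opcode;
    out[1] = (uint8_t)((r1 << 4) | x2);
    put_base_disp(&out[2], b2, d2);
}

/* SI: [opcode] [I2] [D1-lo] [B1<<4|D1-hi] */
void encode_si(uint8_t out[4], uint8_t opcode,
               uint8_t i2, unsigned b1, unsigned d1)
{
    out[0] = opcode;
    out[1] = i2;
    put_base_disp(&out[2], b1, d1);
}

/* S: [opcode] [00] [D2-lo] [B2<<4|D2-hi] */
void encode_s(uint8_t out[4], uint8_t opcode, unsigned b2, unsigned d2)
{
    out[0] = opcode;
    out[1] = 0x00;
    put_base_disp(&out[2], b2, d2);
}
```

Calling encode_si with opcode 92h, immediate 00h, base 0 and displacement 048h fills the buffer with 92 00 48 00. For BC, the condition mask goes in the R1 slot of encode_rx.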
I implemented a complete instruction table with all the core S/360 instructions plus the GMS/359 I/O instructions:
| Format | Instructions |
|---|---|
| RR | LR, AR, SR, MR, DR, CR, NR, OR, XR, LTR, LCR, LPR, LNR, BCR, BALR, BCTR |
| RX | L, ST, A, S, M, D, C, N, O, X, LA, LH, STH, AH, SH, MH, CH, BC, BAL, BCT |
| SI | MVI, NI, OI, XI, CLI, TM |
| S | LPSW, SIO, TIO, HIO, TCH, HALT |
Plus all the branch aliases: B, BH, BL, BE, BNE, BO, BP, BM, BZ, BR, NOP, NOPR...
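The aliases are just BC or BCR with a fixed condition mask, so a small table covers them. This sketch assumes GMS/359 keeps the standard S/360 extended-mnemonic masks; the struct and function names are hypothetical and only the aliases named above are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One alias: the mnemonic, the real instruction it expands to,
   and the condition mask that goes in the R1 field. */
struct branch_alias {
    const char *name;
    const char *base;   /* "BC" (RX) or "BCR" (RR) */
    unsigned    mask;   /* 4-bit condition mask */
};

/* Masks follow the S/360 extended mnemonics (assumed unchanged here). */
static const struct branch_alias branch_aliases[] = {
    { "B",    "BC",  15 },  /* unconditional */
    { "NOP",  "BC",   0 },  /* never taken */
    { "BH",   "BC",   2 },  /* high (after compare) */
    { "BL",   "BC",   4 },  /* low */
    { "BE",   "BC",   8 },  /* equal */
    { "BNE",  "BC",   7 },  /* not equal */
    { "BO",   "BC",   1 },  /* overflow / ones */
    { "BP",   "BC",   2 },  /* plus (after arithmetic) */
    { "BM",   "BC",   4 },  /* minus */
    { "BZ",   "BC",   8 },  /* zero */
    { "BR",   "BCR", 15 },  /* unconditional, register form */
    { "NOPR", "BCR",  0 },  /* never taken, register form */
    { NULL,   NULL,   0 }
};

/* Returns the alias row for a mnemonic, or NULL if it is not an alias. */
const struct branch_alias *find_branch_alias(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; branch_aliases[i].name != NULL; i++) {
        if (strcmp(branch_aliases[i].name, mnemonic) == 0)
            return &branch_aliases[i];
    }
    return NULL;
}
```

An assembler front end could call find_branch_alias first and, on a hit, rewrite "BNE target" into "BC 7,target" before handing it to the normal RX encoder.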
The Test: IPL Bootstrap
To verify everything works, I assembled a real IPL bootstrap — the code that would run when the GMS/359 powers on:
    ; GMS/359 IPL: Display "HELLO" on console
    %define CONSOLE 10h
    %define CAW     48h
            ORG     200h
    START:
            LA      R2,[CCW]
            ST      [CAW],R2
    POLL:   TIO     CONSOLE
            BC      1,POLL
            SIO     CONSOLE
            BC      2,ERROR
    WAIT:   TIO     CONSOLE
            BC      1,WAIT
            LPSW    [DONEPSW]
The assembler generated:
    MVI [048h], 00h   → 92 00 48 00 ✓
    SIO 010h          → 9c 00 10 00 ✓
    TIO 010h          → 9d 00 10 00 ✓
    BC 2, poll_loop   → 47 20 10 00 ✓
Every byte correct. The instruction encoding matches my Python reference implementation exactly.
RDOFF Output
The final piece: generating linkable object files. I chose RDOFF (Relocatable Dynamic Object File Format) — NASM's simple, well-documented format. It's perfect for a homebrew system:
- Clean header with module name and architecture tag
- Separate code and data segments
- Simple relocation records
- No unnecessary complexity
The assembler now outputs .rdf files that can be linked with ldrdf or converted to raw binary with rdf2bin.
What's Working
- ✅ Complete NASM-style macro preprocessor
- ✅ All %define, %macro, %if, %rep directives
- ✅ GMS/359 instruction encoding (RR, RX, SI, S formats)
- ✅ Labels and forward references
- ✅ RDOFF v2 object file output
- ✅ Two-pass assembly with proper error reporting
What's Next
The immediate next step is linking multiple object files and generating the final binary for the FPGA. Then: channel programs, device drivers, and eventually a small operating system.
Reflection
This project started as "let's build a simple assembler" and turned into a deep dive through 60 years of computer architecture. The S/360 designers knew what they were doing — channel I/O, the PSW, the clean instruction formats — it all still makes sense today.
And now I have the tools to write software for my own implementation of those ideas.