
Thursday, January 15, 2026

asm359: From Preprocessor to Working Assembler

Yesterday I had a working macro preprocessor. Today I have a working assembler that generates real GMS/359 machine code.

The Debug Hunt

The day started with a classic "it compiles but doesn't work" situation. The preprocessor built fine, but %define was being interpreted as %elif, which pointed to a wrong value being returned somewhere in the directive lookup.

The culprit? A mismatch between the pptok.h enum values and the hash lookup function. NASM's original design used a perfect hash (computed by Perl scripts) where the array index equals the enum value. My simplified version returned array indices, but the enum values were carefully designed for conditional processing:

/* The enum assigns specific values for conditional grouping */
PP_ELIF = 0,      /* Base for %elif variants */
PP_IF = 32,       /* Base for %if variants */
PP_DEFINE = 67,   /* Way up here! */

When pp_token_hash("%define") returned 0 (the array index), the switch statement thought it was PP_ELIF. Chaos ensued.
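To make the failure mode concrete, here is a minimal reproduction. It is only a sketch: the name table and `buggy_lookup` are hypothetical stand-ins for the simplified hash function, the value of `PP_INVALID` is assumed, and only the three enum values come from the excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Enum values mirror the excerpt above; they are not contiguous from 0. */
enum preproc_token {
    PP_ELIF    = 0,
    PP_IF      = 32,
    PP_DEFINE  = 67,
    PP_INVALID = -1    /* assumed sentinel; the post does not give its value */
};

/* Hypothetical name table whose order has nothing to do with the enum. */
static const char *const pp_names[] = { "%define", "%elif", "%if" };

/* Buggy lookup: returns the array index, which callers then treat
   as an enum preproc_token value. */
static int buggy_lookup(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof pp_names / sizeof pp_names[0]; i++) {
        if (strcmp(name, pp_names[i]) == 0)
            return (int)i;
    }
    return PP_INVALID;
}
```

Here `buggy_lookup("%define")` returns 0, which is numerically equal to `PP_ELIF`, so a `switch` on the result takes the %elif branch.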

The Fix

Rather than regenerating perfect hashes (which would require porting NASM's Perl infrastructure), I created a proper lookup table:

static const struct {
    const char *name;
    enum preproc_token token;
} pp_directive_table[] = {
    { "%elif",   PP_ELIF },
    { "%define", PP_DEFINE },
    /* ... all 80+ directives ... */
    { NULL, PP_INVALID }
};

Linear search? Yes. Fast enough for ~80 directives? Absolutely. The preprocessor is I/O bound anyway.
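A lookup over that table could look roughly like the following. This is a sketch under assumptions, not the actual asm359 code: `pp_lookup_directive` is a hypothetical name, the table is cut down to three entries, the value of `PP_INVALID` is assumed, and the comparison is case-sensitive for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum preproc_token {
    PP_ELIF    = 0,
    PP_IF      = 32,
    PP_DEFINE  = 67,
    PP_INVALID = -1    /* assumed sentinel; the post does not give its value */
};

static const struct {
    const char *name;
    enum preproc_token token;
} pp_directive_table[] = {
    { "%elif",   PP_ELIF },
    { "%if",     PP_IF },
    { "%define", PP_DEFINE },
    { NULL,      PP_INVALID }   /* NULL name terminates the scan */
};

/* Hypothetical helper: linear scan that returns the stored enum value,
   never the position in the table. */
static enum preproc_token pp_lookup_directive(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; pp_directive_table[i].name != NULL; i++) {
        if (strcmp(name, pp_directive_table[i].name) == 0)
            return pp_directive_table[i].token;
    }
    return PP_INVALID;
}
```

Because each entry carries its own token, reordering the table can no longer change what a directive means.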

With that fixed, plus a couple of null pointer issues in macro expansion, the preprocessor started working. %define, %ifdef, %macro — all correct.

Building the Instruction Encoder

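With preprocessing working, the next step was instruction encoding. GMS/359 uses the classic S/360 instruction formats, but with little-endian byte order: the 12-bit displacement is stored low byte first, and the base register shares the final byte with the displacement's high nibble: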

RR-format:  [opcode] [R1<<4|R2]                    (2 bytes)
RX-format:  [opcode] [R1<<4|X2] [D2-lo] [B2<<4|D2-hi]  (4 bytes)
SI-format:  [opcode] [I2] [D1-lo] [B1<<4|D1-hi]    (4 bytes)
S-format:   [opcode] [00] [D2-lo] [B2<<4|D2-hi]    (4 bytes)
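The four layouts above translate almost directly into byte-packing helpers. The following is a minimal sketch, assuming registers are 4-bit fields and displacements are 12 bits as in S/360; the function names are hypothetical and not taken from asm359.

```c
#include <assert.h>
#include <stdint.h>

/* RR: [opcode] [R1<<4|R2] */
static void encode_rr(uint8_t out[2], uint8_t opcode, unsigned r1, unsigned r2)
{
    assert(r1 < 16 && r2 < 16);
    out[0] = opcode;
    out[1] = (uint8_t)((r1 << 4) | r2);
}

/* Shared tail of RX, SI and S: displacement low byte first, then the
   base register in the high nibble next to the displacement's top 4 bits. */
static void put_base_disp(uint8_t out[2], unsigned base, unsigned disp)
{
    assert(base < 16 && disp < 0x1000);
    out[0] = (uint8_t)(disp & 0xFF);
    out[1] = (uint8_t)((base << 4) | (disp >> 8));
}

/* RX: [opcode] [R1<<4|X2] [D2-lo] [B2<<4|D2-hi] */
static void encode_rx(uint8_t out[4], uint8_t opcode,
                      unsigned r1, unsigned x2, unsigned b2, unsigned d2)
{
    assert(r1 < 16 && x2 < 16);
    out[0] = opcode;
    out[1] = (uint8_t)((r1 << 4) | x2);
    put_base_disp(&out[2], b2, d2);
}

/* SI: [opcode] [I2] [D1-lo] [B1<<4|D1-hi] */
static void encode_si(uint8_t out[4], uint8_t opcode,
                      unsigned i2, unsigned b1, unsigned d1)
{
    assert(i2 < 256);
    out[0] = opcode;
    out[1] = (uint8_t)i2;
    put_base_disp(&out[2], b1, d1);
}

/* S: [opcode] [00] [D2-lo] [B2<<4|D2-hi] */
static void encode_s(uint8_t out[4], uint8_t opcode, unsigned b2, unsigned d2)
{
    out[0] = opcode;
    out[1] = 0x00;
    put_base_disp(&out[2], b2, d2);
}
```

For example, `encode_si(buf, 0x92, 0x00, 0, 0x048)` fills `buf` with 92 00 48 00, and `encode_s(buf, 0x9C, 0, 0x010)` gives 9C 00 10 00. With base register 5 and displacement 123h, the last two bytes become 23 51.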

I implemented a complete instruction table with all the core S/360 instructions plus the GMS/359 I/O instructions:

Format  Instructions
RR      LR, AR, SR, MR, DR, CR, NR, OR, XR, LTR, LCR, LPR, LNR, BCR, BALR, BCTR
RX      L, ST, A, S, M, D, C, N, O, X, LA, LH, STH, AH, SH, MH, CH, BC, BAL, BCT
SI      MVI, NI, OI, XI, CLI, TM
S       LPSW, SIO, TIO, HIO, TCH, HALT

Plus all the branch aliases: B, BH, BL, BE, BNE, BO, BP, BM, BZ, BR, NOP, NOPR...
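Each alias is just BC or BCR with a fixed condition mask, so a small table is enough to expand them. The sketch below assumes GMS/359 keeps the standard S/360 mask values (8, 4, 2, 1 for condition codes 0 to 3), which the post does not state explicitly; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct branch_alias {
    const char *name;   /* extended mnemonic as written in the source */
    const char *base;   /* real instruction: "BC" (RX) or "BCR" (RR) */
    unsigned mask;      /* 4-bit condition mask that goes in the R1 field */
};

/* Mask values follow the S/360 convention (an assumption for GMS/359). */
static const struct branch_alias branch_aliases[] = {
    { "B",    "BC",  15 },   /* unconditional */
    { "NOP",  "BC",   0 },   /* never taken */
    { "BH",   "BC",   2 },   /* high, after a compare */
    { "BL",   "BC",   4 },   /* low */
    { "BE",   "BC",   8 },   /* equal */
    { "BNE",  "BC",   7 },   /* not equal */
    { "BO",   "BC",   1 },   /* overflow, or all ones after TM */
    { "BP",   "BC",   2 },   /* plus, after arithmetic */
    { "BM",   "BC",   4 },   /* minus */
    { "BZ",   "BC",   8 },   /* zero */
    { "BR",   "BCR", 15 },   /* unconditional, register target */
    { "NOPR", "BCR",  0 },   /* never taken, register form */
};

/* Returns the alias entry for a mnemonic, or NULL if it is not an alias. */
static const struct branch_alias *find_branch_alias(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t i = 0; i < sizeof branch_aliases / sizeof branch_aliases[0]; i++) {
        if (strcmp(mnemonic, branch_aliases[i].name) == 0)
            return &branch_aliases[i];
    }
    return NULL;
}
```

With this table, `BE LOOP` expands to `BC 8,LOOP` and `BR R14` to `BCR 15,R14`. BH/BP, BL/BM and BE/BZ share masks because they name the same condition code after a compare and after an arithmetic instruction respectively.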

The Test: IPL Bootstrap

To verify everything works, I assembled a real IPL bootstrap — the code that would run when the GMS/359 powers on:

; GMS/359 IPL: Display "HELLO" on console
%define CONSOLE 10h
%define CAW     48h

        ORG     200h
START:
        LA      R2,[CCW]
        ST      [CAW],R2
POLL:   TIO     CONSOLE
        BC      1,POLL
        SIO     CONSOLE
        BC      2,ERROR
WAIT:   TIO     CONSOLE
        BC      1,WAIT
        LPSW    [DONEPSW]

The assembler generated:

MVI [048h], 00h  →  92 00 48 00  ✓
SIO 010h         →  9c 00 10 00  ✓
TIO 010h         →  9d 00 10 00  ✓
BC  2, poll_loop →  47 20 10 00  ✓

Every byte correct. The instruction encoding matches my Python reference implementation exactly.

RDOFF Output

The final piece: generating linkable object files. I chose RDOFF (Relocatable Dynamic Object File Format) — NASM's simple, well-documented format. It's perfect for a homebrew system:

  • Clean header with module name and architecture tag
  • Separate code and data segments
  • Simple relocation records
  • No unnecessary complexity

The assembler now outputs .rdf files that can be linked with ldrdf or converted to raw binary with rdf2bin.

What's Working

  • ✅ Complete NASM-style macro preprocessor
  • ✅ All %define, %macro, %if, %rep directives
  • ✅ GMS/359 instruction encoding (RR, RX, RS, SI, S formats)
  • ✅ Labels and forward references
  • ✅ RDOFF v2 object file output
  • ✅ Two-pass assembly with proper error reporting
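Forward references are the reason for the two passes. The post does not show how asm359 implements them, so the following is only a generic sketch with hypothetical types and names: pass 1 assigns an address to every label, and pass 2 resolves operands once all labels are known.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified source line. */
struct src_line {
    const char *label;    /* label defined on this line, or NULL */
    unsigned size;        /* encoded size in bytes (2 or 4) */
    const char *target;   /* label used as an operand, or NULL */
    unsigned resolved;    /* address of target, filled in by pass 2 */
};

struct symbol {
    const char *name;
    unsigned addr;
};

/* Pass 1: advance the location counter and record each label's address.
   syms must have room for at least n entries. Returns the symbol count. */
static size_t pass1(const struct src_line *lines, size_t n,
                    unsigned origin, struct symbol *syms)
{
    assert(lines != NULL && syms != NULL);
    size_t nsyms = 0;
    unsigned loc = origin;
    for (size_t i = 0; i < n; i++) {
        if (lines[i].label != NULL) {
            syms[nsyms].name = lines[i].label;
            syms[nsyms].addr = loc;
            nsyms++;
        }
        loc += lines[i].size;
    }
    return nsyms;
}

/* Pass 2: all labels are known, so forward references resolve exactly
   like backward ones. Returns the number of undefined symbols. */
static unsigned pass2(struct src_line *lines, size_t n,
                      const struct symbol *syms, size_t nsyms)
{
    unsigned errors = 0;
    for (size_t i = 0; i < n; i++) {
        if (lines[i].target == NULL)
            continue;
        size_t j = 0;
        while (j < nsyms && strcmp(syms[j].name, lines[i].target) != 0)
            j++;
        if (j == nsyms)
            errors++;                        /* undefined label */
        else
            lines[i].resolved = syms[j].addr;
    }
    return errors;
}
```

A real assembler would also detect duplicate labels and report the line of each undefined symbol; both are left out here to keep the sketch short.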

What's Next

The immediate next step is linking multiple object files and generating the final binary for the FPGA. Then: channel programs, device drivers, and eventually a small operating system.

Reflection

This project started as "let's build a simple assembler" and turned into a deep dive through 60 years of computer architecture. The S/360 designers knew what they were doing — channel I/O, the PSW, the clean instruction formats — it all still makes sense today.

And now I have the tools to write software for my own implementation of those ideas.
