In this blog I share my observations, thoughts and experience about computers, linguistics, philosophy and many other things that interest me.

Wednesday, January 21, 2026

GMS 2950: A Cryptographic Coprocessor Joins the Mainframe

Today marks a significant milestone for the GMS/359 project: the successful integration and verification of the GMS 2950 Cryptographic Processor — a modular exponentiation accelerator connected to our Multiplexor Channel.

The Achievement

GMS 2950 Crypto Test
====================
Test: 3^5 mod 7 = ?
Computing...
SUCCESS! Z = 0x05

That humble 0x05 represents something remarkable: our little FPGA mainframe just computed its first cryptographic operation. The number 243 (which is 3⁵) modulo 7 equals 5 — verified in hardware, orchestrated through authentic System/360-style Channel I/O.
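The same result can be checked by hand. Splitting the exponent into binary digits (5 = 101 in base 2) also previews the square-and-multiply method the hardware uses:

```latex
\begin{aligned}
3^5 &= 243 = 34 \cdot 7 + 5 \;\Rightarrow\; 3^5 \bmod 7 = 5 \\
3^1 \bmod 7 &= 3, \qquad 3^2 \bmod 7 = 9 \bmod 7 = 2, \qquad 3^4 \bmod 7 = 2^2 \bmod 7 = 4 \\
3^5 \bmod 7 &= (3^4 \cdot 3^1) \bmod 7 = (4 \cdot 3) \bmod 7 = 12 \bmod 7 = 5
\end{aligned}
```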

Why Modular Exponentiation?

The operation z = y^x mod m is the mathematical foundation of modern public-key cryptography:

  • RSA encryption and decryption
  • Diffie-Hellman key exchange
  • Digital signatures
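To make that concrete, here is a toy RSA and Diffie-Hellman exchange in Python in which every core step is the same y^x mod m operation. Python's built-in three-argument pow() stands in for the GMS 2950, and the textbook-sized numbers (p = 61, q = 53 for RSA; prime 23 and generator 5 for Diffie-Hellman) are illustrative only and trivially breakable.

```python
# Toy RSA and Diffie-Hellman: every core step is z = y^x mod m,
# computed here by Python's built-in three-argument pow().
# Textbook-sized numbers for illustration only; trivially breakable.

# --- RSA ---
p, q = 61, 53
n = p * q                  # public modulus m = 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime to phi
d = pow(e, -1, phi)        # private exponent via modular inverse (Python 3.8+)

message = 65
ciphertext = pow(message, e, n)     # encrypt
recovered = pow(ciphertext, d, n)   # decrypt
signature = pow(message, d, n)      # sign with the private exponent
verified = pow(signature, e, n)     # verify with the public exponent

# --- Diffie-Hellman ---
prime, g = 23, 5                    # public parameters
a, b = 4, 3                         # each party's secret exponent
A = pow(g, a, prime)                # value sent by party A
B = pow(g, b, prime)                # value sent by party B
shared_a = pow(B, a, prime)         # secret computed by A
shared_b = pow(A, b, prime)         # secret computed by B

print(ciphertext, recovered, verified, shared_a, shared_b)
```

Running it prints 2790 65 65 18 18: the message survives the round trip, the signature verifies, and both parties derive the same secret.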

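IBM's on-processor CP Assist for Cryptographic Functions (CPACF) only arrived with the z990 in 2003, and even IBM's earlier mainframe crypto hardware came as optional coprocessor features decades after System/360. Our 1965-era architecture recreation now has a capability that Big Iron took decades to acquire!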

Architecture

The GMS 2950 connects to the Multiplexor Channel as device 0x2A:

┌─────────────────────────────────────────────────────┐
│            2870 MULTIPLEXOR CHANNEL                 │
├─────────────────────────────────────────────────────┤
│  10h Video Controller     ──── VGA Output           │
│  11h Keyboard Controller  ──── PS/2 Input           │
│  12h UART                 ──── Serial Console       │
│  2Ah Crypto Processor     ──── NEW!                 │
│  2Bh SYSINFO              ──── System Information   │
│  2Eh Console              ──── Console Area         │
└─────────────────────────────────────────────────────┘

The programming model follows our Channel I/O conventions:

WRITE 25 bytes to load operands and start computation:

Bytes 0-7:   X (exponent)   - 64-bit little-endian
Bytes 8-15:  Y (base)       - 64-bit little-endian  
Bytes 16-23: M (modulus)    - 64-bit little-endian
Byte 24:     CONTROL        - bit 0 = START

READ 8 bytes to retrieve the result:

Bytes 0-7:   Z (result)     - 64-bit little-endian
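On the host side (for a test harness or simulator, say), the same layout maps directly onto Python's struct module. This is a sketch; the helper names pack_operands and unpack_result are hypothetical, and only the byte layout comes from the programming model above.

```python
import struct

# Hypothetical host-side helpers (names are illustrative) for the documented layout:
#   WRITE: bytes 0-7 X, 8-15 Y, 16-23 M (64-bit little-endian), byte 24 CONTROL
#   READ:  bytes 0-7 Z (64-bit little-endian)
WRITE_FORMAT = "<QQQB"  # '<' = little-endian with no padding, Q = uint64, B = uint8
RESULT_FORMAT = "<Q"
CONTROL_START = 0x01    # CONTROL bit 0 = START


def pack_operands(x: int, y: int, m: int, start: bool = True) -> bytes:
    """Build the 25-byte WRITE payload requesting z = y^x mod m."""
    return struct.pack(WRITE_FORMAT, x, y, m, CONTROL_START if start else 0)


def unpack_result(data: bytes) -> int:
    """Decode the 8-byte READ payload into the integer Z."""
    (z,) = struct.unpack(RESULT_FORMAT, data)
    return z


payload = pack_operands(x=5, y=3, m=7)
print(len(payload), payload.hex())
```

For X = 5, Y = 3, M = 7 the 25 bytes are identical to the crypto_operands block in the test program: three little-endian 64-bit values followed by the START byte.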

The beauty of Channel I/O shines here: the CPU issues SIO 02Ah, and the channel handles all the byte-by-byte transfers autonomously. The CPU can poll with TIO or (eventually) receive an interrupt when computation completes.
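The SIO/TIO handshake can be rehearsed on the host before touching hardware. Below is a hypothetical Python behavioural model of device 2Ah plus a driver that follows the same write, poll, read, compare sequence as the test program. The class name, BUSY_POLLS, and the rule that the device reports busy until the result is ready are assumptions for illustration, not the real device's status semantics.

```python
import struct


class Crypto2950Model:
    """Hypothetical behavioural model of device 2Ah, not derived from the RTL.

    A 25-byte WRITE latches X, Y, M and CONTROL. If START (bit 0) is set, the
    model reports busy for BUSY_POLLS status checks, after which an 8-byte READ
    returns Z = Y^X mod M.
    """
    BUSY_POLLS = 3  # arbitrary stand-in for the ~30,000-cycle computation

    def __init__(self) -> None:
        self.z = 0
        self.busy_left = 0

    def write(self, payload: bytes) -> None:
        x, y, m, control = struct.unpack("<QQQB", payload)
        if control & 0x01:
            self.z = pow(y, x, m)  # the value the hardware is expected to produce
            self.busy_left = self.BUSY_POLLS

    def busy(self) -> bool:
        """Status check, loosely analogous to TIO followed by BB."""
        if self.busy_left:
            self.busy_left -= 1
            return True
        return False

    def read(self) -> bytes:
        return struct.pack("<Q", self.z)


def run_crypto_test(dev: Crypto2950Model) -> bool:
    dev.write(struct.pack("<QQQB", 5, 3, 7, 0x01))  # like SIO with the WRITE CCW
    while dev.busy():                              # poll until the device is idle
        pass
    result = dev.read()                            # like SIO with the READ CCW
    return result[0] == 5                          # compare low byte, as LB/CMP do


print("SUCCESS!" if run_crypto_test(Crypto2950Model()) else "FAILED")
```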

The Test Program

; Load operands and start crypto
        LFI     R1, crypto_write_ccw
        ST      [tMSVA.CAW], R1
        SIO     02Ah                    ; Start WRITE to crypto

poll_write:
        TIO     02Ah
        BB      poll_write              ; Wait for completion

; ... computation happens in hardware ...

; Read result
        LFI     R1, crypto_read_ccw
        ST      [tMSVA.CAW], R1
        SIO     02Ah                    ; Start READ from crypto

poll_read:
        TIO     02Ah
        BB      poll_read               ; Wait for completion

; Check result
        LFI     R2, result_buffer
        LB      R3, [R2]                ; Load first byte of result
        LFI     R4, 5                   ; Expected value
        CMP     R3, R4
        BNE     test_failed             ; Branch if not equal

        ; SUCCESS!

The CCW (Channel Command Word) setup is straightforward:

crypto_write_ccw:
        DB      01h                     ; WRITE command
        DB      00h                     ; No flags
        DW      25                      ; 25 bytes
        DD      crypto_operands         ; Source address

crypto_operands:
        DB      05h, 00h, 00h, 00h, 00h, 00h, 00h, 00h  ; X = 5
        DB      03h, 00h, 00h, 00h, 00h, 00h, 00h, 00h  ; Y = 3
        DB      07h, 00h, 00h, 00h, 00h, 00h, 00h, 00h  ; M = 7
        DB      01h                                      ; START
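For tooling such as a channel-program dumper, the same 8-byte CCW can be packed on the host. This Python sketch assumes DW is 16 bits and DD is 32 bits (the usual meaning of those directives), little-endian byte order like the crypto operands, and a made-up address for crypto_operands; none of that is confirmed by the listing above.

```python
import struct

# Assumed CCW layout, read off the DB/DB/DW/DD directives:
# 1-byte command, 1-byte flags, 16-bit byte count, 32-bit data address,
# stored little-endian like the crypto operands (an assumption).
CCW_FORMAT = "<BBHI"
CMD_WRITE = 0x01


def encode_ccw(command: int, flags: int, count: int, address: int) -> bytes:
    """Pack one 8-byte channel command word."""
    return struct.pack(CCW_FORMAT, command, flags, count, address)


OPERANDS_ADDR = 0x1000  # hypothetical address of crypto_operands

ccw = encode_ccw(CMD_WRITE, 0x00, 25, OPERANDS_ADDR)
print(ccw.hex())
```

Under these assumptions it prints 0100190000100000: command 01h, flags 00h, count 0019h, then the 32-bit address.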

Implementation Details

The crypto core implements the classic square-and-multiply algorithm for modular exponentiation, built from three layers:

  1. modm_adder — computes (x + y) mod m in 3 pipeline stages
  2. modm_multiplier — computes (x × y) mod m using repeated addition
  3. modm_exponentiation — computes y^x mod m using square-and-multiply
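The three layers map naturally onto plain integer code. The Python sketch below mirrors them in software, assuming the multiplier's "repeated addition" is interleaved double-and-add (one modular doubling per multiplier bit, plus an addition when that bit is 1) and that square-and-multiply scans the exponent from its most significant bit. The function names echo the VHDL entities but are not their interfaces, and nothing here models the adder's 3 pipeline stages.

```python
# Software mirror of the three hardware layers (names illustrative; assumes the
# multiplier uses interleaved double-and-add and exponentiation runs MSB first).

def modm_add(x: int, y: int, m: int) -> int:
    """(x + y) mod m for 0 <= x, y < m: one addition, one conditional subtraction."""
    s = x + y
    return s - m if s >= m else s


def modm_mul(x: int, y: int, m: int, bits: int = 64) -> int:
    """(x * y) mod m for 0 <= x < m, 0 <= y < 2**bits, using only modm_add."""
    acc = 0
    for i in reversed(range(bits)):       # walk y from its most significant bit
        acc = modm_add(acc, acc, m)       # acc = 2 * acc mod m
        if (y >> i) & 1:
            acc = modm_add(acc, x, m)     # acc = acc + x mod m
    return acc


def modm_exp(y: int, x: int, m: int, bits: int = 64) -> int:
    """y^x mod m by left-to-right square-and-multiply, using only modm_mul."""
    base = y % m
    z = 1 % m                             # 1 % m handles the m == 1 edge case
    for i in reversed(range(bits)):       # walk exponent x from its most significant bit
        z = modm_mul(z, z, m, bits)       # square
        if (x >> i) & 1:
            z = modm_mul(z, base, m, bits)  # multiply
    return z


print(modm_exp(3, 5, 7))
```

modm_exp(3, 5, 7) returns 5, the same value the hardware test reports; the argument order follows the post's z = y^x mod m convention.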

For 64-bit operands, a full exponentiation takes approximately 30,000 clock cycles, about 2.4 ms at our 12.5 MHz system clock. That is imperceptible to humans, and the CPU is free to do other work (or service other channel programs) during the computation.
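The 2.4 ms figure is simply the cycle count divided by the clock rate. For a 64-bit exponent, square-and-multiply performs at most 64 squarings and 64 multiplications, so those cycles cover at most 128 modular multiplications:

```latex
t = \frac{30\,000\ \text{cycles}}{12.5 \times 10^{6}\ \text{cycles/s}} = 2.4 \times 10^{-3}\ \text{s} = 2.4\ \text{ms}
```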

The design is parameterized: changing two generic constants scales it to 128-bit or 256-bit operands. The tradeoff is FPGA resources and computation time, both of which scale quadratically with bit width.
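To see why computation time grows quadratically, one can count multiplier iterations for the worst case (an all-ones exponent) under a simplified bit-serial cost model: one squaring per exponent bit, one extra multiplication per 1 bit, and one iteration per operand bit inside each multiplication. The helper name is illustrative, and this model is an assumption; real cycle counts also depend on pipeline stages, and Fmax tends to change with width as well.

```python
# Simplified bit-serial cost model (an assumption, not measured cycles).

def exp_iterations(x: int, bits: int) -> int:
    """Multiplier iterations for square-and-multiply over a `bits`-wide exponent x,
    assuming each modular multiplication takes `bits` iterations."""
    multiplications = bits + bin(x).count("1")  # one squaring per bit, one multiply per 1 bit
    return multiplications * bits


for bits in (64, 128, 256):
    worst = exp_iterations((1 << bits) - 1, bits)  # all-ones exponent
    print(bits, worst)
```

It prints 8192, 32768 and 131072 iterations for 64, 128 and 256 bits: each doubling of the width quadruples the work.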

Modular Design

The channel controller now supports compile-time configuration:

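entity gms_2870_multiplexor_channel is
    generic (
        ENABLE_CRYPTO : boolean := false  -- false: crypto logic is optimized away
    );
    -- port list omitted
end entity gms_2870_multiplexor_channel;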

When ENABLE_CRYPTO is false, the synthesizer optimizes away all crypto-related logic. Our Makefile selects the appropriate top-level file:

ifeq ($(WITH_CRYPTO),1)
    TOP_FILE = rtl/gms359_top_crypto.vhd
else
    TOP_FILE = rtl/gms359_top.vhd
endif

Building with crypto is now as simple as make WITH_CRYPTO=1; otherwise the standard rtl/gms359_top.vhd is used.

CPU Enhancements

Testing the crypto processor also drove expansion of the GMS 2050 instruction set. New instructions added today:

Opcode   Mnemonic   Description
0x14     NR         AND Register
0x16     OR         OR Register
0x17     XR         XOR Register
0x19     CR         Compare Register
0x1B     SR         Subtract Register
0x88     SHR        Shift Right Logical
0x89     SHL        Shift Left Logical
0x8A     SAR        Shift Right Arithmetic
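For anyone writing a software model of the GMS 2050, the three shifts differ only in what enters the vacated bits. This Python sketch assumes 32-bit general registers as on System/360 and non-negative shift counts; it ignores condition codes, and the function names are illustrative rather than taken from the GMS sources.

```python
# Assumes 32-bit registers as on System/360; condition codes are not modelled.
MASK32 = 0xFFFFFFFF


def shl(value: int, count: int) -> int:
    """Shift Left Logical: zeros enter from the right, bits past bit 31 are lost."""
    return (value << count) & MASK32


def shr(value: int, count: int) -> int:
    """Shift Right Logical: zeros enter from the left."""
    return (value & MASK32) >> count


def sar(value: int, count: int) -> int:
    """Shift Right Arithmetic: copies of the sign bit (bit 31) enter from the left."""
    value &= MASK32
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32


x = 0xF0000010
print(hex(shl(x, 4)), hex(shr(x, 4)), hex(sar(x, 4)))
```

For the negative value 0xF0000010, SHR shifts in zeros (0x0F000001) while SAR shifts in ones (0xFF000001); SHL drops the top nibble (0x00000100).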

The assembler now supports "smart" mnemonics that automatically select RR (register-register) or RX (register-memory) format based on operands. LD R4, R3 assembles to LR, while LD R4, [R5+100] assembles to L.
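The operand-based selection can be sketched as a small classifier. In this hypothetical Python version only the LD to LR/L pairing comes from the text above; the register syntax R0 to R15 (assuming 16 general registers, as on System/360), the [Rb+disp] pattern, and the lookup table are assumptions about the assembler, not its actual code.

```python
import re
from typing import Tuple

# Hypothetical sketch of smart-mnemonic selection (not the real assembler).
# Only the LD entry is documented; any other entries would be assumptions.
SMART = {"LD": ("LR", "L")}

REG = re.compile(r"^R(?:1[0-5]|[0-9])$")  # R0..R15, assuming 16 general registers
MEM = re.compile(r"^\[\s*R(?:1[0-5]|[0-9])\s*(?:\+\s*\d+)?\s*\]$")  # [Rb] or [Rb+disp]


def select_format(line: str) -> Tuple[str, str]:
    """Return (machine mnemonic, instruction format) for a smart-mnemonic line."""
    mnemonic, operands = line.split(None, 1)
    dst, src = (part.strip() for part in operands.split(",", 1))
    rr_form, rx_form = SMART[mnemonic.upper()]
    if not REG.match(dst):
        raise ValueError(f"first operand must be a register: {dst!r}")
    if REG.match(src):
        return rr_form, "RR"   # register-register
    if MEM.match(src):
        return rx_form, "RX"   # register-memory
    raise ValueError(f"unsupported second operand: {src!r}")


print(select_format("LD R4, R3"))        # ('LR', 'RR')
print(select_format("LD R4, [R5+100]"))  # ('L', 'RX')
```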

Resource Utilization

Adding the crypto processor and new CPU instructions had a noticeable impact:

Metric       Before      After
Wires        ~100K       251K
Fmax         ~33 MHz     21.75 MHz
Build time   1×

The maximum clock frequency dropped by roughly a third, primarily due to the shift instructions (barrel shifter logic) and the 64-bit crypto datapath. Even so, 21.75 MHz is still about 73% above our 12.56 MHz target, which leaves plenty of headroom.
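The headroom and slowdown follow directly from the table (using the approximate "before" values):

```latex
\frac{21.75\ \text{MHz}}{12.56\ \text{MHz}} \approx 1.73 \;\Rightarrow\; \approx 73\%\ \text{above target},
\qquad
\frac{21.75\ \text{MHz}}{33\ \text{MHz}} \approx 0.66 \;\Rightarrow\; f_{\max}\ \text{down by roughly a third},
\qquad
\frac{251\text{K}}{100\text{K}} \approx 2.5\times\ \text{the wires}
```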

What's Next

The GMS 2950 opens up interesting possibilities:

  • Larger key sizes — scale to 256 bits and beyond (real-world RSA and Diffie-Hellman use moduli of 2048 bits or more)
  • RSA implementation — full encrypt/decrypt in software using the accelerator
  • Performance measurement — compare against pure software implementation
  • Integration with the Pico 2 — the planned GMS 2350 RISC-V accelerator could work alongside the crypto unit

Closing Thoughts

There's something deeply satisfying about watching a recreation of 1960s mainframe architecture perform modern cryptographic operations. The GateMate A1 FPGA, drawing a few hundred milliwatts from a USB port, now does what would have been unimaginable to the engineers who designed the original System/360.

The Channel I/O model proves its elegance once again: adding a complex coprocessor required no changes to how the CPU performs I/O. The channel handles all communication; the CPU just says "start I/O to device 2A" and waits for completion. This is the architecture that ran the world's banking systems, airline reservations, and scientific computing for decades, and it's still teaching us lessons about clean system design.

3⁵ mod 7 = 5

A small calculation. A big step for GMS/359.


The GMS/359 project recreates IBM System/360 architecture on modern FPGA hardware. Source code will be available at gitlab.com/gatemate/s359.
