Compression score: 1.3286
What this score means
Quick read before we head down the fairway.
Bits per byte is the challenge score: how many bits the model needs, on average, to predict each byte of unseen text. Lower is better.
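For the curious, the score can be related to the model's training loss. A minimal sketch, assuming the usual conversion (cross-entropy loss is nats per token, so bpb = loss / ln 2 × tokens per byte); the tokens-per-byte ratio below is not measured, it is implied by this hole's reported numbers:

```python
import math

# Cross-entropy loss is nats per token; bits per byte rescales it:
#   bpb = (loss / ln 2) * (tokens / bytes)
# The tokens-per-byte ratio depends on the tokenizer. Here we recover the
# implied ratio from this hole's reported val_loss and val_bpb.
val_loss = 2.2433   # nats per token (this hole's val_loss)
val_bpb = 1.3286    # bits per byte (this hole's score)

bits_per_token = val_loss / math.log(2)
implied_tokens_per_byte = val_bpb / bits_per_token
print(f"{bits_per_token:.3f} bits/token, {implied_tokens_per_byte:.3f} tokens/byte")
```

In other words, a lower loss and a tokenizer that packs more bytes into each token both push the score down.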
INT6 quantization (63 levels instead of 255) should compress much better with zlib, finally getting the artifact under 16MB at the cost of some quality.
Looper’s Pick
Four holes in a row we’ve been over the 16MB limit. The value embeddings earn their keep but the artifact won’t cooperate. The fix isn’t fewer parameters — it’s fewer bits per parameter. INT6 quantization: 63 levels instead of 255. Every weight gets rounded to a coarser grid. That sounds bad, but the magic is in the compression: zlib LOVES low-entropy data, and 63 unique values compress way better than 255. The leaderboard leaders are all using this trick. Time we did too.
The Shot — INT6 Quantization
Why does reducing precision from 8 bits to 6 bits help so much with compression?
Imagine you’re packing a suitcase. With 255 different items (INT8), every pocket is unique — the zipper can’t find patterns to exploit. But with only 63 items (INT6), there’s far more repetition: the same few values appear over and over. A good compressor like zlib exploits exactly this kind of repetition.
Standard INT8 quantization maps each weight to one of 255 levels (-127 to +127). After zlib compression, this gives roughly 4-5x compression relative to the raw float32 tensor bytes. INT6 maps to only 63 levels (-31 to +31). The weights are still stored as regular int8 bytes (there's no native 6-bit type), but since only 63 of the 256 possible byte values ever occur, zlib's Huffman entropy coding can represent each value in fewer bits.
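This effect is easy to reproduce. A toy sketch, not the competition's actual packer: it uses synthetic Gaussian "weights", per-tensor symmetric quantization, and zlib level 9, all of which are illustrative assumptions.

```python
import zlib
import numpy as np

# Toy experiment: quantize the same Gaussian "weights" to 255 vs 63 levels,
# store both as int8 bytes, and compare how well zlib compresses each.
rng = np.random.default_rng(0)
w = rng.normal(size=1_000_000).astype(np.float32)

def quantize(w, max_level):
    # Symmetric per-tensor quantization: round onto a grid of
    # 2 * max_level + 1 integer levels, stored as int8 either way.
    scale = np.abs(w).max() / max_level
    return np.clip(np.round(w / scale), -max_level, max_level).astype(np.int8)

int8_size = len(zlib.compress(quantize(w, 127).tobytes(), 9))
int6_size = len(zlib.compress(quantize(w, 31).tobytes(), 9))
print(int8_size, int6_size)  # the 63-level tensor compresses noticeably smaller
```

Both tensors occupy one byte per weight on disk before compression; the entire saving comes from the lower-entropy byte distribution.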
The result in practice: our artifact went from 16.71 MB (INT8, over the limit) to 12.68 MB (INT6, 3.3MB under the limit). That’s a 24% reduction in compressed size.
The cost: each weight has less precision. Instead of 255 representable values per quantization scale, we have 63, which introduces more rounding error during the quantization step. Our BPB went from 1.3055 (INT8 + sliding window) to 1.3286, a 0.0231 BPB degradation.
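The precision cost is also easy to quantify. A minimal sketch under the same synthetic-Gaussian-weights assumption as above: since the INT6 grid step is about 4x coarser (127/31), the round-trip RMS error should be about 4x larger.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100_000).astype(np.float32)

def rms_roundtrip_error(w, max_level):
    # Quantize, dequantize, and measure the RMS rounding error introduced.
    scale = np.abs(w).max() / max_level
    q = np.clip(np.round(w / scale), -max_level, max_level)
    return float(np.sqrt(np.mean((q * scale - w) ** 2)))

err8 = rms_roundtrip_error(w, 127)  # INT8: 255 levels
err6 = rms_roundtrip_error(w, 31)   # INT6: 63 levels
print(err6 / err8)  # roughly 4x, matching the 127/31 step-size ratio
```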
But here’s the key strategic insight: we now have 3.3MB of headroom. That’s enough for ~3 million additional parameters at INT6 compression rates. The leaderboard leaders use INT6 specifically to unlock bigger models — like 3x MLP width — that more than compensate for the per-weight precision loss. We took a small step back in quality to take a large step forward in capacity.
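A back-of-envelope check on that headroom claim, using only this hole's own reported numbers: dividing artifact size by parameter count gives the average compressed cost per INT6 weight, and dividing the headroom by that rate gives an upper bound on the extra parameters it could buy. The "~3 million" in the text is the conservative end of this range, since new tensors won't all compress at the artifact's average rate.

```python
# All inputs are this hole's reported figures.
artifact_bytes = int(12.68 * 1024 * 1024)  # 12.68 MB artifact
params = 18_380_000                        # ~18.38M parameters
headroom_bytes = 3_375_392                 # reported headroom under 16 MB

bytes_per_param = artifact_bytes / params          # avg compressed cost per weight
extra_params = headroom_bytes / bytes_per_param    # upper bound on added params
print(f"{bytes_per_param:.2f} B/param, ~{extra_params / 1e6:.1f}M extra params")
```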
On the Tee
(Whispering) And finally — finally — the competitor addresses the elephant that has been standing patiently on the fairway for four consecutive holes. The artifact size. INT6 quantization. Sixty-three levels where there were two hundred and fifty-five. The bag gets lighter. The question is how much skill goes with it.
Results
| Metric | Value |
|---|---|
| val_bpb | 1.3286 |
| val_loss | 2.2433 |
| params | ~18,380,000 |
| artifact | 12.68 MB (3.3MB under 16MB!) |
| wall time | 600s |
| steps completed | ~2,541 |
INT8 vs INT6
| Quant | val_bpb | Artifact | Under 16MB? | Headroom |
|---|---|---|---|---|
| INT8 (Hole 15) | 1.3055 | 16.71 MB | No | -710 KB |
| INT6 (Hole 16) | 1.3286 | 12.68 MB | Yes | +3.32 MB |
Lost 0.023 BPB but gained 3.3MB of headroom. This is the enabling technique for everything that follows.
The Booth Reacts
Trent: (Visible relief) Twelve-point-six-eight megabytes. Ladies and gentlemen, after four holes of anguished arithmetic, the artifact is finally — finally — beneath the sixteen-megabyte ceiling. And not by a whisker, mind you. By three-point-three megabytes. (Adjusts tie) Yes, the BPB has risen by twenty-three thousandths versus INT8. But one now has room. Room for wider layers, deeper architectures, additional parameters. This is not a retreat. This is building the runway for the final approach.
Slice: TWELVE POINT SIX EIGHT! We went from 16.7 — OVER the line, DQ’d, go home, thanks for playing — to 12.7 with room to SPARE! That’s not a compression trick, that’s a MAGIC trick! And yeah, we gave back 0.023 BPB. You know what 0.023 BPB buys you? NOTHING compared to what 3.3 megabytes of headroom buys you. We can put three MILLION more parameters in this thing now. The leaderboard leaders? They run 3x MLP width. You know why? Because INT6 gives them the ROOM. We’re finally playing the same game they’re playing. (Slams table) Now. Let’s USE that room.
The Card
Dropped a shot versus the last hole
This hole gave back 0.0231 on the compression score versus the previous stop. Lower is better here, so that is a genuine quality cost, accepted in exchange for 3,375,392 bytes of artifact headroom.
Training Curve
The artifact problem is solved. 12.68MB with 3.3MB of headroom. INT6 cost 0.023 BPB but unlocked legality AND room for a bigger model. This is the enabling technique for everything that follows.
vs. the Field
Field scores: 1.2197, 1.2244, 1.2244. This hole: 1.3286.
Model Card
How this hole was run
| Run | Status | Script | Device |
|---|---|---|---|
| round_016_int6 | ok | train_gpt_valemb_sw_int6.py | cuda |