Compression score: 1.3414 bits per byte
What this score means
Quick read before we head down the fairway.
Bits per byte is the challenge score: how many bits the model needs, on average, to predict each byte of unseen text. Lower is better.
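For the curious, here is one common way to turn per-token cross-entropy into bits per byte. This is a sketch, not the competition's official scorer, and the bytes-per-token figure below is only what this hole's card implies.

```python
import math

# Sketch of the usual cross-entropy -> bits-per-byte conversion.
# (Assumption: the challenge scores this way; the official scorer may differ.)
def bits_per_byte(ce_nats_per_token: float, bytes_per_token: float) -> float:
    bits_per_token = ce_nats_per_token / math.log(2)  # nats -> bits
    return bits_per_token / bytes_per_token

# Consistency check against this hole's card: val_loss 2.2649 nats/token and
# val_bpb 1.3414 together imply roughly 2.44 bytes per token on the eval set.
print(2.2649 / (math.log(2) * 1.3414))  # ~2.436
```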
Reducing from 5 value embedding tables to 3 should shrink the artifact enough to fit under 16MB while preserving most of the quality.
Looper’s Pick
Hole 12 found gold but the suitcase was too big. The artifact hit 16.7MB — 700KB over the 16MB limit. The value embedding tables are the culprit: five tables at 262K params each. Let’s try three tables instead, sharing across layer triplets instead of pairs. That cuts ~500KB from the artifact. If the quality holds, we’re close to legal.
The Shot — Fewer Value Embedding Tables
How much sharing can value embeddings tolerate?
In golf, you can share a caddy between two players in a casual round. Three players sharing one caddy is a stretch — the advice gets thinner, the reads get slower. But a great caddy can still help three players better than no caddy at all.
Value embeddings face the same sharing trade-off. In Hole 10, we used 5 tables shared across layer pairs (layers 0-1, 2-3, 4-5, 6-7, 8). Each pair got its own dedicated value embedding. Now we’re trying 3 tables shared across triplets (layers 0-2, 3-5, 6-8). Each table serves more layers, which means the embeddings can’t specialize as much for each layer’s specific needs.
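A minimal sketch of that mapping, assuming the sharing is plain floor-division over the layer index (the actual wiring in train_gpt_valemb_slim.py may differ; names here are illustrative):

```python
NUM_LAYERS = 9

def table_for_layer(layer: int, group_size: int) -> int:
    """Map a transformer layer to the value-embedding table it shares."""
    return layer // group_size

# Hole 10: 5 tables shared across pairs (0-1, 2-3, 4-5, 6-7, 8 alone).
assert [table_for_layer(l, 2) for l in range(NUM_LAYERS)] == [0, 0, 1, 1, 2, 2, 3, 3, 4]

# Hole 13: 3 tables shared across triplets (0-2, 3-5, 6-8).
assert [table_for_layer(l, 3) for l in range(NUM_LAYERS)] == [0, 0, 0, 1, 1, 1, 2, 2, 2]
```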
The key question: were the 5 tables actually specializing, or were some of them learning redundant information? If adjacent layers want similar value embeddings anyway (which is plausible — nearby layers in a transformer tend to capture similar levels of abstraction), then sharing across triplets costs very little quality while saving ~500KB in the compressed artifact.
The savings come from having 3 × 262,144 = 786K params instead of 5 × 262,144 = 1.3M params. At INT8 + zlib, that’s roughly 500KB of compressed artifact size — enough to potentially bring us under the 16MB competition limit.
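To keep that arithmetic honest, here is a back-of-the-envelope check plus a toy INT8 + zlib packer. Only the parameter counts come from the post; the table contents and the compression behavior of random data are stand-ins for the real checkpoint.

```python
import zlib
import numpy as np

# From the post: 262,144 params per value-embedding table.
PARAMS_PER_TABLE = 262_144
saved_params = (5 - 3) * PARAMS_PER_TABLE                 # 524,288 params
print(f"raw INT8 savings: {saved_params / 1024:.0f} KB")  # 512 KB before zlib

# Toy INT8 + zlib packing for a single table. Real weights compress a little
# under zlib, which is why the post estimates ~500KB rather than the raw 512KB.
rng = np.random.default_rng(0)
table = rng.standard_normal(PARAMS_PER_TABLE).astype(np.float32)
scale = np.abs(table).max() / 127.0                 # symmetric per-tensor scale
q = np.round(table / scale).astype(np.int8)         # one byte per parameter
blob = zlib.compress(q.tobytes(), level=9)
print(f"one table: {q.nbytes / 1024:.0f} KB raw, {len(blob) / 1024:.0f} KB zlib")
```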
On the Tee
(Whispering) The competitor returns to the tee with a lighter bag. Three value embedding tables where there were five. The question is not whether the quality will hold — one rather suspects it will — but whether the arithmetic of compression will finally cooperate.
Results
| Metric | Value |
|---|---|
| val_bpb | 1.3414 |
| val_loss | 2.2649 |
| params | ~17,850,000 |
| artifact | 16.20 MB (STILL over 16MB by 254KB) |
| wall time | 600s |
| steps completed | 2,552 |
| step avg | 235ms |
Value Embedding Table Count Comparison (10-min runs)
| Tables | val_bpb | Artifact | Under 16MB? |
|---|---|---|---|
| 5 (Hole 12) | 1.3394 | 16.72 MB | No (720KB over) |
| 3 (Hole 13) | 1.3414 | 16.20 MB | No (254KB over) |
| 2 (next) | ??? | ~15.9 MB? | Hopefully |
Quality essentially identical (0.002 BPB difference is noise). But still over budget. Need one more trim.
The Booth Reacts
Trent: One-point-three-four-one-four. (Nods approvingly) Virtually indistinguishable from the five-table version. The two discarded tables were, as one suspected, largely ornamental. However. (Adjusts glasses) Sixteen-point-two megabytes. Still over the line. Two hundred and fifty-four kilobytes over, to be precise. One more trim, one imagines, and we shall finally be within the ropes.
Slice: Two tables doing NOTHING and we were carrying them around like dead weight! Classic over-packing. But look — we’re SO close. 254KB. That’s like being 254 yards from the green on a par 5. One more good shot and we’re on the dance floor. Drop to two tables. If the quality holds again — and I bet it does — we’ve got a legal artifact AND a 1.34 BPB. That’s a card I’d sign.
The Card
Dropped a shot versus the last hole
This hole lost 0.0020 on the compression score versus the previous stop. Lower is better here, so that's a small step backward in predictive efficiency, and the artifact still has no headroom: it remains 254KB over the 16MB budget.
Training Curve
3 tables perform the same as 5 — the extra two weren't pulling their weight. But still 254KB over budget. Need to trim further.
vs. the Field
[Field chart: this hole's 1.3414 bpb against field scores of 1.2197, 1.2244, and 1.2244]
Model Card
How this hole was run
| Run | Status | Script | Device |
|---|---|---|---|
| round_013_valemb_slim | ok | train_gpt_valemb_slim.py | cuda |