Est. 2026 • Members Welcome

Gradient Descent Country Club

Members-only scorecards from the Parameter Golf circuit

Round 1, Hole 1 · Par · March 18, 2026

1.2244

compression score

What this score means

Quick read before we head down the fairway.

Bits per byte is the challenge score: how many bits the model needs, on average, to predict each byte of unseen text. Lower is better.

vs baseline: 0.0000 · vs last hole: —
Tee Box: R1 · H1
Artifact: 15.82 MB
Headroom: 0.18 MB (room left under the 16 MB limit)
Tempo: 44 ms/step · 13,780 steps
Looper's Technical Read

Start with the untouched official baseline so every future gain has a trustworthy reference point.


Looper’s Pick

No pick yet, boss. This is the practice hole. We’re just here to see what the course plays like — stock clubs, stock swing, stock everything. You don’t change your grip on the first tee.

The Shot — The Practice Round

What is a baseline run, and why do we start here?

In golf, you walk the course before the tournament starts. You note where the bunkers are, how the greens break, what the wind does at the turn. You don’t try anything clever. You just play your natural game and see where you land.

That’s what a baseline run is in machine learning. OpenAI provides a reference training script — a 9-layer transformer with 17 million parameters, trained for 10 minutes on 8 NVIDIA H100 GPUs. It uses a small vocabulary (1,024 tokens), the Muon optimizer for weight matrices, and grouped query attention, which shares key/value projections across groups of query heads to save parameters.
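Those parameter savings are easy to see on the back of an envelope. A minimal sketch with hypothetical dimensions (the reference script's exact sizes aren't stated here; only "9 layers, 17M params, GQA" is given):

```python
# Hypothetical per-layer attention dimensions, for illustration only.
d_model, n_q_heads, n_kv_heads = 512, 8, 2
head_dim = d_model // n_q_heads          # 64
kv_dim = n_kv_heads * head_dim           # 128: K/V shared across head groups

# Standard multi-head attention: Q, K, V, and output are all d_model x d_model.
mha_params = 4 * d_model * d_model

# Grouped query attention shrinks only the K and V projections.
gqa_params = (
    d_model * d_model        # Q projection
    + 2 * d_model * kv_dim   # K and V projections
    + d_model * d_model      # output projection
)

print(f"MHA attention params per layer: {mha_params:,}")  # 1,048,576
print(f"GQA attention params per layer: {gqa_params:,}")  # 655,360 (37.5% fewer)
```

With 8 query heads sharing 2 key/value heads, the K/V projections shrink fourfold, and every parameter saved is artifact budget you can spend elsewhere.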

The model gets compressed after training: every floating-point weight gets rounded to an 8-bit integer (a process called INT8 quantization), then the whole thing gets zlib-compressed. The final artifact — code plus compressed model — must fit in 16 megabytes. This baseline squeezes in at 15.82 MB.
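The compression step itself is only a few lines. A sketch under stated assumptions: toy random weights standing in for the trained model, and symmetric per-tensor INT8 scaling, which may differ from the reference script's exact scheme:

```python
import zlib
import numpy as np

# Toy stand-in for trained weights (the real run packs ~17M of them).
rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000_000).astype(np.float32)

# Symmetric INT8 quantization: map the float range onto [-127, 127] and round.
scale = float(np.abs(weights).max()) / 127.0
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# zlib-compress the quantized bytes; the submission's code bytes count too.
blob = zlib.compress(q.tobytes(), level=9)
print(f"{len(blob) / 1e6:.2f} MB compressed")
```

At evaluation time the process runs in reverse: decompress, multiply by `scale`, and score the dequantized model, so the only accuracy cost is the rounding error (at most half a quantization step per weight).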

The score is measured in bits per byte (BPB) — how many bits the model needs, on average, to predict each byte of unseen text. Lower is better. The baseline lands at 1.2244 BPB. That’s our par for the course. Everything we do from here is measured against this number.
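The conversion from validation loss to BPB follows the relation on the scorecard, BPB = (val_loss / ln 2) × (tokens / bytes). The held-out split sizes below are hypothetical, chosen only to show the shape of the calculation:

```python
import math

def bits_per_byte(val_loss_nats: float, n_tokens: int, n_bytes: int) -> float:
    """BPB = (val_loss / ln 2) * (tokens / bytes), per the scorecard relation."""
    return (val_loss_nats / math.log(2)) * (n_tokens / n_bytes)

# Hypothetical split sizes (not from the source). With the reported val_loss of
# 2.0727 nats, a tokens/bytes ratio near 0.41 lands close to the 1.2244 score,
# i.e. each 1,024-vocab token covers roughly 2.4 bytes of raw text.
print(round(bits_per_byte(2.0727, 409_500, 1_000_000), 4))
```

The tokens/bytes factor is what makes the metric tokenizer-agnostic: a model with a bigger vocabulary pays less loss per token but covers more bytes per token, and the two effects cancel in the ratio.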

The key insight from this practice hole: the model trains for 13,780 steps before hitting the 10-minute wall clock limit, averaging 43.5ms per step. That step budget — and how efficiently we use it — is the fundamental constraint of this challenge.
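That arithmetic checks out: dividing the wall clock by the average step time lands within a few steps of the observed count.

```python
# 600 s of wall clock at ~43.5 ms per step:
wall_clock_s = 600
avg_step_ms = 43.5
max_steps = int(wall_clock_s * 1000 / avg_step_ms)
print(max_steps)  # 13793; the run stopped at 13,780, right at the wall
```

Every optimization in later holes is really an attack on one of these two numbers: shave milliseconds off the step, or make each of the ~13,800 steps worth more.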

On the Tee

(Whispering) And here we are at the first tee of what promises to be a remarkable tournament. The conditions are standard — eight H100s, ten minutes on the clock, sixteen megabytes in the bag. The model approaches… steps up… and we’re away.

Results

Metric     Value
val_bpb    1.2244
val_loss   2.0727
params     17,059,912
artifact   15.82 MB (yes, < 16 MB)
wall time  600 s

Training Curve (tail)

Step     Loss           Avg Step
13600    2.0234         43.54 ms
13650    2.0316         43.54 ms
13700    2.0323         43.55 ms
13750    1.9910         43.54 ms
13780    — (stopped)    43.54 ms

The Booth Reacts

Trent: And there it is. One-point-two-two-four-four. A perfectly respectable opening hole. Nothing flashy, nothing reckless — just solid, workmanlike golf from the baseline configuration. Fifteen-point-eight megabytes. Comfortably under the limit with a touch of room to spare. One senses there is more to be found here.

Slice: BORING! I mean, is it a fine hole of golf? Sure. Is it the kind of golf that gets you on SportsCenter? Absolutely not. Thirteen thousand steps and this is what we’ve got? When I was qualifying in ‘04, I could tell you by step five thousand whether a training run had the juice or not. This one? (makes so-so hand gesture) It’s a rental car. Gets you there. Doesn’t make you feel anything. Let’s get Looper out here with a real club.


The Card

Scorecard
Result: Baseline

Even par versus baseline

This score sits 0.0000 versus the official baseline. Lower is better because it means the model is spending fewer bits to model the same text, and there are 184,153 bytes left in the bag under the 16 MB cap.

1.2244 — val_bpb (vs baseline: 0.0000) · Bits per byte, the headline score
How many bits the model needs, on average, to predict each byte of unseen text. This is the challenge's primary metric. It's tokenizer-agnostic, so models with different vocabularies can be compared fairly. Lower is better. The baseline scores 1.2244.

2.0727 — val_loss · Validation cross-entropy loss
The model's prediction error on held-out text, measured in nats (natural log units). This is the raw loss before converting to bits per byte, related by BPB = (val_loss / ln 2) × (tokens / bytes). Lower is better.

17,059,912 — params · Total trainable parameters
The number of individual weight values in the model. More parameters generally means more capacity to learn, but also a larger artifact. The 16 MB limit constrains how many parameters you can afford: at INT8, roughly 16 million; at ternary (1.58 bits), roughly 80 million.

15.82 MB — artifact · Compressed model + code size
The total submission size: your training script's code bytes plus the model weights compressed via INT8 quantization and zlib. Must be under 16,000,000 bytes (decimal 16 MB). The model is decompressed and dequantized before evaluation.

600 s — wall time · Training wall clock time
Real-world elapsed time for the training run. The challenge caps training at 10 minutes on 8×H100 GPUs. Our L40S iteration runs use shorter time limits since we're just getting directional signal, not final scores.

13,780 — steps · Training steps completed
Each step processes one batch of tokens, computes the loss, and updates the model weights. More steps generally means a better-trained model. The number of steps you get depends on your batch size, GPU speed, and wall clock limit.

44 ms — step avg · Average time per training step
How long each gradient update takes in milliseconds. Faster steps mean more training in the same wall clock budget. Affected by batch size, model size, and GPU capability. The 8×H100 baseline averages 43.5 ms; our L40S averages ~230–1000 ms depending on batch size.
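The parameter-budget numbers quoted for the 16 MB cap follow directly from the byte limit. A quick sanity check (raw bit-widths, before zlib, ignoring the code bytes that share the budget):

```python
# Parameter counts implied by the 16,000,000-byte (decimal) artifact cap.
budget_bits = 16_000_000 * 8

for name, bits_per_param in [("fp32", 32), ("int8", 8), ("ternary", 1.58)]:
    max_params = int(budget_bits / bits_per_param)
    print(f"{name}: ~{max_params:,} params")
# fp32: ~4,000,000 params
# int8: ~16,000,000 params
# ternary: ~81,012,658 params
```

zlib moves these ceilings up a bit in practice, since quantized weights are not incompressible, but the bit-width is the first-order knob.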

Training Curve

[Training curve: train_loss vs. step, this hole vs. baseline]
Post-Round Lesson

The real bottleneck is step budget under a hard wall-clock limit. Any idea that buys more useful steps is immediately interesting.

vs. the Field

+0.0047 vs SOTA (1.2197)
0.0000 vs Baseline (1.2244)
SOTA       1.2197
Baseline   1.2244
This Hole  1.2244
(lower is better)


Model Card

How this hole was run

Run ID: baseline_8xH100
Status: ok
Backend: cuda

Head to the next tee → Round 1, Hole 2: First Contact