Compression score: 1.2244
What this score means
Quick read before we head down the fairway.
Bits per byte is the challenge score: how many bits the model needs, on average, to predict each byte of unseen text. Lower is better.
Start with the untouched official baseline so every future gain has a trustworthy reference point.
Looper’s Pick
No pick yet, boss. This is the practice hole. We’re just here to see what the course plays like — stock clubs, stock swing, stock everything. You don’t change your grip on the first tee.
The Shot — The Practice Round
What is a baseline run, and why do we start here?
In golf, you walk the course before the tournament starts. You note where the bunkers are, how the greens break, what the wind does at the turn. You don’t try anything clever. You just play your natural game and see where you land.
That’s what a baseline run is in machine learning. OpenAI provides a reference training script: a 9-layer transformer with 17 million parameters, trained for 10 minutes on 8 NVIDIA H100 GPUs. It uses a small vocabulary (1,024 tokens), the Muon optimizer for weight matrices, and a technique called grouped-query attention, which shares key and value projections across groups of query heads to save parameters.
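To see where the parameter saving comes from, here is a rough weight count for one attention layer. In grouped-query attention, several query heads share a single key/value head, so the K and V projections shrink. The function name `attention_params` and every dimension below are hypothetical; the baseline's real sizes are not stated here.

```python
def attention_params(d_model, n_heads, n_kv_heads):
    """Weight count for one layer's Q, K, V, and output projections (no biases)."""
    head_dim = d_model // n_heads
    q_proj = d_model * d_model
    kv_proj = 2 * d_model * (n_kv_heads * head_dim)  # K and V shrink with fewer KV heads
    out_proj = d_model * d_model
    return q_proj + kv_proj + out_proj


# Hypothetical sizes for illustration only, not the baseline's configuration.
d_model, n_heads = 512, 8
full = attention_params(d_model, n_heads, n_kv_heads=8)     # standard multi-head attention
grouped = attention_params(d_model, n_heads, n_kv_heads=4)  # two query heads per KV head

print(f"multi-head:    {full:,} weights per layer")
print(f"grouped-query: {grouped:,} weights per layer")
print(f"saved:         {full - grouped:,}")
```

With these made-up sizes, halving the number of KV heads removes a quarter of the attention weights in each layer. Under a 16-megabyte cap, that is room to spend elsewhere.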
The model gets compressed after training: every floating-point weight gets rounded to an 8-bit integer (a process called INT8 quantization), then the whole thing gets zlib-compressed. The final artifact, code plus compressed model, must fit in 16 megabytes. This baseline squeezes in at 15.8 MB.
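A minimal sketch of that quantize-then-compress pipeline, using only the Python standard library. It assumes symmetric per-tensor scaling, which may not match the reference script's actual scheme. The helper names and the toy weights are made up for illustration.

```python
import random
import struct
import zlib


def quantize_int8(weights):
    """Symmetric per-tensor INT8: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate floats from INT8 values and their scale."""
    return [v * scale for v in q]


def pack_and_compress(q, scale):
    """Serialize the scale (float32) plus the INT8 values, then zlib-compress."""
    payload = struct.pack("<f", scale) + struct.pack(f"<{len(q)}b", *q)
    return zlib.compress(payload, 9)


random.seed(0)
# Toy tensor standing in for one weight matrix; not real model weights.
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]

q, scale = quantize_int8(weights)
blob = pack_and_compress(q, scale)
restored = dequantize_int8(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32 bytes:        {4 * len(weights)}")
print(f"compressed bytes:     {len(blob)}")
print(f"max round-trip error: {max_error:.6f}")
```

Most of the saving comes from going from four bytes per weight to one. zlib then trims whatever redundancy remains in the integer stream. The price is a rounding error of at most half a quantization step per weight.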
The score is measured in bits per byte (BPB) — how many bits the model needs, on average, to predict each byte of unseen text. Lower is better. The baseline lands at 1.2244 BPB. That’s our par for the course. Everything we do from here is measured against this number.
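The two validation numbers in the results table can be tied together with a short calculation. This sketch assumes `val_loss` is the mean cross-entropy in nats per token, which is the usual convention but is not confirmed here. Under that assumption, the ratio of the two figures gives the average number of bytes each token covers.

```python
import math

val_loss = 2.0727  # from the results table; assumed to be nats per token
val_bpb = 1.2244   # from the results table


def bits_per_byte(loss_nats_per_token, bytes_per_token):
    """Convert a per-token loss in nats to bits per byte of raw text."""
    bits_per_token = loss_nats_per_token / math.log(2)
    return bits_per_token / bytes_per_token


bits_per_token = val_loss / math.log(2)
implied_bytes_per_token = bits_per_token / val_bpb

print(f"bits per token:          {bits_per_token:.4f}")
print(f"implied bytes per token: {implied_bytes_per_token:.3f}")
print(f"round trip BPB:          {bits_per_byte(val_loss, implied_bytes_per_token):.4f}")
```

If the assumption holds, each token spans roughly 2.44 bytes of text on average, which is plausible for a 1,024-token vocabulary. Scoring per byte rather than per token means a different tokenizer cannot improve the score just by changing how the text is split.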
The key insight from this practice hole: the model trains for 13,780 steps before hitting the 10-minute wall-clock limit, averaging 43.5 ms per step. That step budget, and how efficiently we use it, is the fundamental constraint of this challenge.
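The step budget is simple division, sketched below. It ignores startup and evaluation overhead, so treat it as an approximation. The 40 ms and 50 ms step times are hypothetical, chosen only to show the trade-off.

```python
WALL_CLOCK_SECONDS = 600.0  # the 10-minute limit
BASELINE_STEP_MS = 43.54    # average step time from the training curve


def steps_in_budget(step_ms, budget_s=WALL_CLOCK_SECONDS):
    """Number of whole training steps that fit in the time budget."""
    return int(budget_s * 1000.0 / step_ms)


baseline_steps = steps_in_budget(BASELINE_STEP_MS)
print(f"baseline: {baseline_steps} steps")

# Hypothetical step times, for illustration only.
for step_ms in (40.0, 43.54, 50.0):
    print(f"{step_ms:6.2f} ms/step -> {steps_in_budget(step_ms)} steps")
```

Any change that makes each step slower has to earn back the steps it costs, and any change that makes each step faster buys extra steps at no charge.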
On the Tee
(Whispering) And here we are at the first tee of what promises to be a remarkable tournament. The conditions are standard — eight H100s, ten minutes on the clock, sixteen megabytes in the bag. The model approaches… steps up… and we’re away.
Results
| Metric | Value |
|---|---|
| val_bpb | 1.2244 |
| val_loss | 2.0727 |
| params | 17,059,912 |
| artifact | 15.82 MB (under the 16 MB limit) |
| wall time | 600s |
Training Curve (tail)
| Step | Loss | Avg Step |
|---|---|---|
| 13600 | 2.0234 | 43.54ms |
| 13650 | 2.0316 | 43.54ms |
| 13700 | 2.0323 | 43.55ms |
| 13750 | 1.9910 | 43.54ms |
| 13780 | — (stopped) | 43.54ms |
The Booth Reacts
Trent: And there it is. One-point-two-two-four-four. A perfectly respectable opening hole. Nothing flashy, nothing reckless — just solid, workmanlike golf from the baseline configuration. Fifteen-point-eight megabytes. Comfortably under the limit with a touch of room to spare. One senses there is more to be found here.
Slice: BORING! I mean, is it a fine hole of golf? Sure. Is it the kind of golf that gets you on SportsCenter? Absolutely not. Thirteen thousand steps and this is what we’ve got? When I was qualifying in ’04, I could tell you by step five thousand whether a training run had the juice or not. This one? (makes so-so hand gesture) It’s a rental car. Gets you there. Doesn’t make you feel anything. Let’s get Looper out here with a real club.
The Card
Under par versus baseline
This score sits 0.0000 from the official baseline, because this run is the official baseline. Lower is better: a lower score means the model spends fewer bits to model the same text. The artifact leaves 184,153 bytes in the bag.
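The headroom figure can be turned back into an artifact size. This sketch assumes the 16-megabyte limit means 16,000,000 bytes (decimal megabytes), which is not confirmed here. A binary-megabyte limit would give a different byte count.

```python
LIMIT_BYTES = 16_000_000  # assumption: "16 megabytes" means decimal megabytes
BYTES_LEFT = 184_153      # headroom reported on the card

artifact_bytes = LIMIT_BYTES - BYTES_LEFT
print(f"artifact:    {artifact_bytes:,} bytes = {artifact_bytes / 1e6:.2f} MB")
print(f"budget used: {artifact_bytes / LIMIT_BYTES:.2%}")
```

Under that assumption the baseline already uses nearly 99% of the bag. A larger model would have to pay for itself with better compression.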
Training Curve
The real bottleneck is the step budget under a hard wall-clock limit. Any idea that buys more useful steps is immediately interesting.
vs. the Field
1.2197
1.2244
1.2244
Model Card
How this hole was run
baseline_8xH100 ok cuda