Est. 2026 • Members Welcome

Gradient Descent Country Club

Members-only scorecards from the Parameter Golf circuit

Round 1, Hole 2 · Triple Bogey · March 18, 2026

1.4361

compression score

What this score means

Quick read before we head down the fairway.

Bits per byte is the challenge score: how many bits the model needs, on average, to predict each byte of unseen text. Lower is better.
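In code, the conversion is one line. A minimal sketch (the function name and example numbers are ours, assuming the BPB = (loss / ln 2) × (tokens / bytes) relation given in the scorecard glossary):

```python
import math

def bits_per_byte(ce_loss_nats: float, tokens: int, n_bytes: int) -> float:
    """Token-level cross-entropy (nats) -> bits per byte of raw text.

    Divide by ln(2) to convert nats to bits, then rescale from
    per-token to per-byte using the tokens/bytes ratio of the split.
    """
    return (ce_loss_nats / math.log(2)) * (tokens / n_bytes)

# Sanity check: a loss of ln(2) nats per token on a one-token-per-byte
# stream is exactly 1 bit per byte.
print(bits_per_byte(math.log(2), tokens=1000, n_bytes=1000))  # 1.0
```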

vs baseline: +0.2117 · vs last hole: —
Tee Box: R1 · H2
Artifact: 10.63 MB
Headroom: 5.37 MB (room left under the 16 MB limit)

Technical Read

Before changing the model, we need to know whether our local hardware preserves the baseline learning dynamics at all.


Looper’s Pick

Same as Hole 1, boss. Stock clubs, stock swing — but this time on our hardware. We need to know what the L40S plays like before we start swapping irons. This is us walking the course.

The Shot — First Contact

Why run the same baseline on a different GPU?

In golf, the same course plays completely differently depending on the conditions. A hole that’s a comfortable par 4 in calm weather becomes a nightmare when the wind picks up. The strategy has to adapt to the conditions.

The same is true for our training runs. The official baseline was tuned for 8 NVIDIA H100 GPUs — the Ferraris of the AI compute world, running at about 43 milliseconds per training step. We’re running on a single L40S, a solid workstation GPU that clocks in at about 1,000 milliseconds per step. That’s roughly 23 times slower.

Why does this matter? Because the challenge has a 10-minute wall clock limit. On 8xH100, you get about 13,800 training steps in that window. On our L40S, we get about 600. The model sees 23 times less data, takes 23 times fewer gradient updates, and has 23 times less opportunity to converge.
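The step counts fall straight out of that arithmetic. A quick sketch (hypothetical helper; it ignores anything the run spends outside training steps):

```python
def steps_in_budget(wall_clock_s: float, step_ms: float) -> int:
    """How many training steps fit in a wall-clock budget."""
    return int(wall_clock_s * 1000 / step_ms)

BUDGET_S = 600  # the challenge's 10-minute limit

h100_steps = steps_in_budget(BUDGET_S, 43)    # ~43 ms/step on 8xH100
l40s_steps = steps_in_budget(BUDGET_S, 1002)  # ~1,002 ms/step on our L40S

print(h100_steps, l40s_steps, round(h100_steps / l40s_steps, 1))
# 13953 598 23.3
```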

This means some strategies that work on H100 — like aggressive learning rates that need thousands of steps to settle down — might not work for us during iteration. We need to understand this gap before we start making changes, so we’re not fooled into thinking a good idea is bad just because we tested it on the wrong hardware.

The BPB number from this run (1.4361) is meaningless as a competition score. What matters is that we now have a reference point on our hardware to compare future experiments against.

On the Tee

(Whispering) And now the moment of truth. Our competitor steps up to the tee for the very first time on unfamiliar ground. A single L40S. Ten minutes on the clock. The same clubs as the baseline, but rather… different conditions. One imagines this will be a learning experience.

Results

Metric            Value
val_bpb           1.4361
val_loss          2.4249
params            17,059,912
artifact          10.63 MB (yes, < 16 MB)
wall time         600 s
steps completed   599 / 20,000
step avg          1,002 ms

Training Curve (tail)

Step   Loss            Avg Step
200    2.7427          1,002.5 ms
400    2.3547          1,002.3 ms
599    — (val only)    1,002.1 ms
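A cheap way to use a tail like this is to diff it against the baseline's log at shared steps. A sketch with illustrative numbers (the only baseline point available to us is the ~2.74 train loss at step 200 quoted in the booth commentary, so that is the one real comparison):

```python
# Illustrative logs: our L40S tail from the table above, plus the one
# published baseline point (~2.74 train loss at step 200).
l40s_log = {200: 2.7427, 400: 2.3547}
h100_log = {200: 2.74}  # assumed from the booth's comparison

def same_trajectory(a: dict, b: dict, tol: float = 0.02) -> bool:
    """True if train losses agree within tol at every shared step."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(abs(a[s] - b[s]) <= tol for s in shared)

print(same_trajectory(l40s_log, h100_log))  # True
```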

The Booth Reacts

Trent: And there we have it. One-point-four-three-six-one on the L40S. Now, the astute viewer will note that this is rather distant from the baseline’s one-point-two-two on H100. But one must remember — this is not a reflection of the model’s character, merely its… circumstances. Six hundred steps versus nearly fourteen thousand. It’s rather like judging a golfer’s career from the first two holes of the opening round.

Slice: Look boss, 1.43 BPB is not a number you put on the fridge. My KID could get 1.43 and he’s SEVEN. But here’s the thing — and I hate to be the reasonable one — we’re running on a go-kart when the real race is Formula One. The baseline hit 2.74 train loss at step 200. We hit 2.74 at step 200. SAME TRAJECTORY. The physics is right, we just need more runway. Or a faster car.

Trent: Quite. The trajectory is indeed what matters here. Shall we proceed to something rather more… adventurous?


The Card

Scorecard

Result: Baseline (the baseline still has the honor)

This score sits +0.2117 above the official baseline. Lower is better: a lower BPB means the model spends fewer bits to model the same text. The artifact leaves 5,365,169 bytes of headroom under the 16 MB cap.

val bpb — 1.4361 (+0.2117 vs baseline). Bits per byte, the headline score: how many bits the model needs, on average, to predict each byte of unseen text. This is the challenge's primary metric. It's tokenizer-agnostic, so models with different vocabularies can be compared fairly. Lower is better. The baseline scores 1.2244.

val loss — 2.4249. Validation cross-entropy loss: the model's prediction error on held-out text, measured in nats (natural-log units). This is the raw loss before converting to bits per byte. Related to BPB by: BPB = (val_loss / ln(2)) × (tokens / bytes). Lower is better.

params — 17,059,912. Total trainable parameters: the number of individual weight values in the model. More parameters generally means more capacity to learn, but also a larger artifact. The 16 MB limit constrains how many parameters you can afford — at INT8, roughly 16 million; at ternary (1.58 bits), roughly 80 million.

artifact — 10.63 MB. Compressed model + code size: your training script's code bytes plus the model weights compressed via INT8 quantization and zlib. Must be under 16,000,000 bytes (decimal 16 MB). The model is decompressed and dequantized before evaluation.

wall time — 600 s. Real-world elapsed time for the training run. The challenge caps training at 10 minutes on 8×H100 GPUs. Our L40S iteration runs use shorter time limits since we're just getting directional signal, not final scores.
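The artifact rule above (INT8-quantized weights + zlib, plus code bytes, under 16,000,000 bytes) can be sketched as follows. This is an assumed, simplified packer for illustration, not the challenge's actual one:

```python
import random
import zlib

LIMIT = 16_000_000  # decimal 16 MB, per the rules above

def artifact_bytes(weights, code_bytes):
    """Estimate submission size: symmetric INT8 quantization of the
    weights, zlib compression, plus the training script's code bytes.
    Illustrative sketch only."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    # Two's-complement byte for each quantized value in [-127, 127].
    q = bytes(round(w / scale) & 0xFF for w in weights)
    return code_bytes + len(zlib.compress(q, 9))

# Toy check: 100k random weights stand in for a trained model
# (real trained weights compress far better than noise).
rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
size = artifact_bytes(weights, code_bytes=20_000)
print(size, size < LIMIT)  # size lands well under LIMIT here
```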
Post-Round Lesson

The L40S is directionally faithful but brutally step-starved. From here on out, throughput is the story.

vs. the Field

+0.2164 vs SOTA (1.2197)
+0.2117 vs Baseline (1.2244)
+0.2117 vs Our Best (1.2244)


Model Card

How this hole was run

Run ID: l40s_baseline_10min
Status: ok
Backend: cuda