> Modify one thing at a time
> Change only one variable per ablation while keeping everything else constant. If you change multiple things and performance improves, you won’t know what caused it. Test modifications individually, then combine successful ones and reassess.
This is an unintentional microcosm of what is flawed with the document.
One of the reasons people build one, though, is to learn. Most smart folks are well aware that pre-training a real LLM is going to involve some banging your head against the wall (i.e., things don't go as smoothly as in a "building an LLM from scratch" book), and they want to go through the process.
Tumblr speak has a bunch of wacky things, notably "chimkin nuggers."
And even then: if you’re an IC, the goal is research, and your boss is saying, “incrementalism at the level of planning experiments,” quit, because you will fail.
Or more modern Bayesian methods, if you're more interested in getting the best results for a given hyperparameter sweep.
However, none of that is meant to detract from the excellent effort made here and the great science being investigated. Write-ups like this offer so much gold to the community.