Physics of Language Models: Part 4.2,
Canon Layers at Scale where
Synthetic Pretraining Resonates in Reality