Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers