Physics of Language Models: Part 3.3,
Knowledge Capacity Scaling Laws