Physics of Language Models: Part 1
Learning Hierarchical Language Structures