Physics of Language Models: Part 2.1,
Grade-School Math and the Hidden Reasoning Process
arXiv paper: https://arxiv.org/abs/2407.20311 (last update: July 2024)
Authors: Tian Ye, Zicheng Xu, Yuanzhi Li and Zeyuan Allen-Zhu
Code Release: We believe in the importance of sharing our codebase for the iGSM data generation pipeline. However, we kindly ask for your patience while we take the necessary time for code refactoring and legal review. As a small team with multiple priorities, and one affected by a layoff, we need to manage our time carefully. In the meantime, our paper includes complete pseudocode for the data generation process. Thank you for your understanding.
Zicheng Xu is on the job market: Due to an unexpected layoff, Zicheng Xu is now on the job market. He has my strongest endorsement; I picked him from a pool of ~30 internal candidates based on his coding skills, dedication, enthusiasm, and more. One of our biggest ideas in Part 2.2 also came directly from Zicheng. His performance met or exceeded expectations, so this layoff was sudden and unexpected (he is not the only one affected). If you are interested in this project or in hiring him, please contact him at zichengBxuB42@gmail.com (remove the capital 'B').
A 20-minute video was presented at the ICML 2024 tutorial.
A longer video is planned (targeting early September).
Twitter link for discussions: https://x.com/ZeyuanAllenZhu/status/1818493152711569551
@article{YXLA2024-gsm1,
author = {Ye, Tian and Xu, Zicheng and Li, Yuanzhi and {Allen-Zhu}, Zeyuan},
title = {{Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process}},
journal = {ArXiv e-prints},
year = 2024,
month = jul,
volume = {abs/2407.20311},
note = {Full version available at \url{http://arxiv.org/abs/2407.20311}}
}