Sunny Sanyal Research Blog
Site Archives
2026 (2 posts)
February (1 post)
Curriculum Pretraining Enables 10-Digit Addition for a 296-Parameter GPT with 99% Accuracy
February 27, 2026
GPT
Transformer
pre-training
curriculum learning
arithmetic
weight averaging
deep learning
1774 words
9 min
January (1 post)
Looped-GPT: Looping During Pre-training Improves Generalization
January 12, 2026
language models
Transformer
pre-training
looped transformers
compute efficiency
1414 words
7 min