Sunny Sanyal Research Blog
Tags
arithmetic (1)
compute efficiency (1)
curriculum learning (1)
deep learning (1)
GPT (1)
language models (1)
looped transformers (1)
pre-training (2)
Transformer (2)
weight averaging (1)