Sunny Sanyal



Looped-GPT: a language model trained with depth recurrence that enables iterative activation refinement via a reverse residual connection. During pre-training, Looped-GPT outperforms a standard GPT under comparable settings. In this post, I introduce Looped-GPT, a simple modification to the standard GPT architecture that enables depth recurrence. The key idea is a reverse residual connection that feeds the representation from the final transformer block back into the input, allowing the model to iteratively refine its activations over multiple passes....
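The looped forward pass described above can be sketched in a few lines. This is a minimal illustration of the reverse residual idea under my reading of the summary, not the actual Looped-GPT implementation: `looped_forward`, `block`, and the toy block are all hypothetical names, and a real model would loop a full transformer stack over token embeddings rather than a scalar.

```python
def looped_forward(x_embed, block, n_loops):
    """Run the transformer stack n_loops times, adding the final-block
    output back onto the input embedding before each extra pass
    (the reverse residual connection)."""
    h = block(x_embed)              # first pass: standard forward
    for _ in range(n_loops - 1):
        h = block(x_embed + h)      # final activation re-enters the input
    return h

# Toy demonstration: a scalar "activation" and a linear stand-in block.
toy_block = lambda h: 0.5 * h + 1.0
out = looped_forward(2.0, toy_block, n_loops=3)  # refines 2.0 -> 3.0 -> 3.5
```

With one loop this reduces to a plain forward pass; extra loops reuse the same weights, so refinement adds compute but no parameters.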

A 296-parameter GPT learns to add 10-digit numbers not by changing the architecture, but by changing the training recipe. Can a transformer with fewer than 300 parameters reliably solve 10-digit addition? The answer turns out to be yes, if you train it right. This post describes AdditionGPT, a minimal causal transformer that treats digit-wise addition as a sequence classification task. The key insight is that the architecture need not change at all: a two-stage training recipe (curriculum-style pre-training on variable-length addition, followed by fine-tuning) is sufficient for a sub-300-parameter GPT to achieve 99% test accuracy on 10-digit addition....
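To make "digit-wise addition as sequence classification" concrete, here is one plausible data-formatting sketch. It is an assumption, not the actual AdditionGPT code: `make_example` is a hypothetical helper, and reversing the digits (least-significant first, so carries propagate in reading order) is a common trick for small addition transformers that the post summary does not itself confirm.

```python
def make_example(a: int, b: int, n_digits: int = 10):
    """Format one addition problem as input tokens and per-position
    target digits. Operands are zero-padded to a fixed width and
    digit-reversed so the least-significant digit comes first."""
    rev_pad = lambda n, w: str(n).zfill(w)[::-1]          # zero-pad, then reverse
    inp = list(rev_pad(a, n_digits) + "+" + rev_pad(b, n_digits) + "=")
    target = list(rev_pad(a + b, n_digits + 1))           # sum may carry one extra digit
    return inp, target

inp, tgt = make_example(95, 17, n_digits=2)
# inp = ['5','9','+','7','1','='], tgt = ['2','1','1']  (95+17=112, reversed)
```

Each target position is then a 10-way classification over digits 0-9, which is what lets such a tiny model handle the task.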