Looped-GPT logo

Looped-GPT: Looping During Pre-training Improves Generalization

Looped-GPT — a language model trained with depth recurrence that enables iterative activation refinement via a reverse residual connection. During pre-training, Looped-GPT outperforms a standard GPT under comparable settings. In this post, I introduce Looped-GPT, a simple modification to the standard GPT architecture that enables depth recurrence. The key idea is a reverse residual connection that feeds the representation from the final transformer block back into the input, allowing the model to iteratively refine its activations over multiple passes....
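The looping idea can be sketched in a few lines. This is an illustration only: `blocks` stands in for the transformer stack, the hidden state is a plain float rather than an activation tensor, and all names are assumptions, not the actual Looped-GPT implementation.

```python
# Minimal sketch of depth recurrence with a reverse residual connection.
# The hidden state is a scalar here purely for illustration.

def looped_forward(x, blocks, n_loops):
    """Run the block stack n_loops times; after each full pass, the
    final block's output is combined with the original input again."""
    h = x
    for _ in range(n_loops):
        for block in blocks:
            h = block(h)
        h = h + x  # reverse residual: final-block output + original input
    return h

# Toy usage: a single "block" that halves its input.
looped_forward(1.0, [lambda v: v * 0.5], n_loops=2)  # -> 1.75
```

The point of the sketch is that the stack's weights are reused across passes, so extra refinement steps cost compute but no extra parameters.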

Teacher teaching addition to a child and tiny robot

Curriculum Pretraining Enables 10-Digit Addition for a 296-Parameter GPT with 99% Accuracy

A 296-parameter GPT learns to add 10-digit numbers not by changing the architecture, but by changing the training recipe. Abstract Can a transformer with fewer than 300 parameters reliably solve 10-digit addition? The answer turns out to be yes, if you train it right. This post describes AdditionGPT, a minimal causal transformer that treats digit-wise addition as a sequence classification task. The key insight is that the architecture need not change at all: a two-stage training recipe (curriculum-style pre-training on variable-length addition, followed by fine-tuning) is sufficient for a sub-300-parameter GPT to achieve 99% test accuracy on 10-digit addition....
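The two-stage recipe can be sketched as a data pipeline. This is a hypothetical illustration: the function names, the least-significant-digit-first encoding, and the exact stage split are assumptions for the sketch, not AdditionGPT's actual code.

```python
import random

def make_example(n_digits):
    """One addition problem as (input digit tokens, target digit tokens),
    digits least-significant first (an assumed encoding for the sketch)."""
    a = random.randrange(10 ** n_digits)
    b = random.randrange(10 ** n_digits)
    to_digits = lambda v, w: [int(d) for d in str(v).zfill(w)][::-1]
    x = to_digits(a, n_digits) + to_digits(b, n_digits)
    y = to_digits(a + b, n_digits + 1)  # one extra digit for the carry
    return x, y

def make_batch(stage, size):
    """Stage 1: variable-length curriculum pre-training;
    stage 2: fine-tuning on fixed 10-digit problems."""
    if stage == 1:
        return [make_example(random.randint(1, 10)) for _ in range(size)]
    return [make_example(10) for _ in range(size)]
```

Under this framing the model only ever classifies one output digit per position, which is why the parameter count can stay so small while the curriculum does the heavy lifting.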