Just yesterday, I was thinking about the Transformer paper - ["Attention is All You Need"](https://doi.org/10.48550/arXiv.1706.03762) - and the massive impact it had. That single 2017 paper rewrote the rules of AI and enabled everything from GPT to Claude to Gemini. It got me wondering: maybe someone is publishing a paper right now that will fundamentally alter the tech landscape 5-10 years from now. Which makes analyzing new companies and investing in AI harder than we think. We might be watching that paper emerge.

Samsung SAIL just published something that could shake things up.

A while back, I wrote about [sparsity in AI](https://karanmjpinto.medium.com/how-less-is-becoming-the-new-more-in-ai-sparsity-driving-the-evolution-of-llms-compute-a92531e8081a) - this idea that less is becoming more, that strategically using fewer [[Parameters]] could unlock more capability than simply throwing compute at the problem. It felt like the industry was slowly waking up to the idea that bigger isn't always better. (see: [[Sparsity x LLMs]])

Now SAIL's Alexia Jolicoeur-Martineau has published ["Less is More: Recursive Reasoning with Tiny Networks,"](https://doi.org/10.48550/arXiv.2510.04871) and it takes that same intuition further than I expected. The sparsity insight was about which parameters you use.

> Tiny Recursive Models (TRM) ask something different: what if it's about how many times you use them?

Instead of a billion-parameter model that thinks once and hopes it got the answer right, you get a 7-million-parameter network that thinks recursively, correcting itself iteratively the way you would when working through a tough problem. The thesis is pretty disruptive: for difficult logic problems, how an AI thinks matters more than its size. Large models generate answers token by token in one pass. Make an error early, and everything falls apart.

> We've only begun to tap into the potential of [[Test Time Compute]].
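To make the idea concrete, here is a rough sketch of the recursive loop: one tiny shared network repeatedly refines a latent scratchpad, then updates a running answer guess. The layer sizes, loop counts, and update rules below are my illustrative assumptions, not the paper's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 32  # embedding width (assumed for illustration)

# One shared tiny "network": a single random linear map + tanh.
# A real model would be a small trained MLP/transformer block.
W = rng.normal(scale=0.1, size=(3 * D, D))

def cell(x, y, z):
    """Apply the shared tiny network to (question, answer, scratchpad)."""
    return np.tanh(np.concatenate([x, y, z]) @ W)

def recursive_step(x, y, z, n_inner=6):
    # Inner loop: refine the latent scratchpad with the question
    # and the current answer held fixed.
    for _ in range(n_inner):
        z = cell(x, y, z)
    # Then revise the running answer from the refined scratchpad.
    y = cell(x, y, z)
    return y, z

x = rng.normal(size=D)   # embedded puzzle / question
y = np.zeros(D)          # initial answer guess
z = np.zeros(D)          # initial scratchpad

for _ in range(3):       # outer "think again" loop
    y, z = recursive_step(x, y, z)

print(y.shape)  # (32,)
```

The point of the sketch: compute grows with how many times the loop runs, not with how many parameters `W` holds - the same small network gets reused until the answer stabilizes.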
Tasks like planning, scheduling, and reasoning demand far greater compute at inference. We're now entering a shift - from static Deep Learning to dynamic Reinforcement Learning - where TTC becomes essential.

TRM keeps a running guess and an internal scratchpad, updating both in loops. It turns out smaller networks actually work better here, because they're forced to learn the underlying logic instead of memorizing patterns. To push the boundaries of intelligence, we must enable pattern recognition across domains with shifting rules - where objectives are non-stationary and constantly evolving (see: [[Non-Stationarity of Objectives]]).

The results are hard to ignore: a 7-million-parameter TRM beats Gemini 2.5 Pro on ARC-AGI reasoning benchmarks. That's roughly 0.01% of the parameters. Sudoku-Extreme accuracy jumped from 55% to 87%.

This gets back to something I was exploring in that sparsity piece: we've been treating model capability and model size as the same thing, when they might not be. The architectural insight - how you structure the reasoning - could matter more than raw parameter count.

Which is what makes analyzing the new tech companies emerging right now so difficult. If reasoning architecture matters more than scale, if iteration beats size, what do we make of all those billions going into massive data centers? (Ref: [[Puzzle of low data center utilisation]]) The moat everyone thought was compute and scale could disappear if the right architectural insight gets you better results with a fraction of the resources.

I've found that the rules don't change gradually. They change in jolts. [[Transformers]] were one. Sparsity might be another. TRM could be next. Someone somewhere is probably already working on what comes after.

Which raises an uncomfortable question: when do you pull the plug on autonomous agents? (After watching the latest Mission Impossible, this feels closer than ever.)
Maybe it's when they start speaking to each other in a language you can't understand. Or when they begin [[recursive self-improvement]] - learning things you can't trace, heading toward outcomes you can't control. Maybe it's when they gain direct access to weapons, or when exfiltration and reproducibility become permissionless. That's the terrifying and fascinating part.

![[Screenshot 2025-10-12 at 14.50.22.png]]