# Lottery Ticket Hypothesis

Parent: [[Neural Scaling Laws & the Compression-Quality Tradeoff]]

Frankle and Carbin's 2018 finding: inside a randomly-initialised dense network, there exists a small sparse subnetwork that — if trained in isolation from the same initialisation — matches the accuracy of the full dense network. They called these subnetworks "winning tickets."

The experimental procedure that demonstrates it is almost suspicious in its simplicity:

1. Train the full network.
2. Prune the weights with the smallest final magnitudes.
3. Reset the surviving weights to their original random initialisation values.
4. Train again.

The pruned-and-reset network matches the original network's performance. Randomly re-initialising the surviving weights breaks the effect, which is how we know the specific initial values matter, not just the structure.

The implications are philosophical as well as practical. Overparameterised neural networks are not doing "distributed computation across all their weights." They are, at some level, an architectural search over a distribution of possible sparse networks, and training finds the one that happened to get lucky in initialisation. The extra weights are scaffolding for the search process, not participants in the final function.

Practically, this is a theoretical justification for pruning and compression: if winning tickets exist, compression that finds them is not destroying information, it is revealing the actual computation that was always there. It also hints at why training small models from scratch is often harder than compressing large ones — the small model has fewer lottery tickets to try.

## Related

- [[Effective Rank]]
- [[Overparameterisation and Implicit Regularisation]]
- [[Pruning]]

---
Tags: #ai #theory #compression #kp
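The prune → rewind → retrain procedure described in the note can be sketched in a few lines. This is a minimal illustration on a toy linear model with made-up sizes, learning rate, and prune fraction — a convex toy cannot reproduce the init-sensitivity that distinguishes winning tickets in deep networks, but it shows the mechanics of magnitude pruning with weight rewinding:

```python
import numpy as np

# Toy data: sparse ground-truth weights, so magnitude pruning has a real
# signal to find. All sizes/hyperparameters here are illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 32))
true_w = np.zeros(32)
true_w[:4] = [3.0, -2.0, 1.5, 4.0]
y = X @ true_w + 0.01 * rng.normal(size=256)

def train(w, mask, steps=500, lr=0.05):
    """Gradient descent on MSE; pruned (masked-out) weights stay frozen at zero."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ (w * mask) - y) / len(y)
        w = (w - lr * grad) * mask
    return w

w_init = rng.normal(scale=0.1, size=32)   # keep the ORIGINAL init around
dense_mask = np.ones(32)

# 1. Train the full (dense) model.
w_dense = train(w_init.copy(), dense_mask)

# 2. Prune the smallest-magnitude final weights (keep the top 25%).
k = 8
keep = np.argsort(np.abs(w_dense))[-k:]
mask = np.zeros(32)
mask[keep] = 1.0

# 3. Reset survivors to their original init values and retrain in isolation.
w_ticket = train(w_init.copy(), mask)

# Control: random re-init of the survivors. In deep networks this breaks
# the effect; in this convex toy it will still converge, so the control
# only illustrates the protocol, not the phenomenon.
w_random = train(rng.normal(scale=0.1, size=32), mask)

mse = lambda w: np.mean((X @ w - y) ** 2)
print(f"dense: {mse(w_dense):.4f}  ticket: {mse(w_ticket):.4f}")
```

The key detail is step 3: the ticket is defined by the pair (mask, original initialisation), not by the mask alone — which is exactly what the random re-init control in the paper probes.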