# Effective Rank
Parent: [[Low-Rank Decomposition & Matrix Factorisation]]
Nominal rank is the maximum rank a matrix could have — for a d×d matrix, it is d. Effective rank is the rank it actually has for practical purposes — the number of singular values that carry meaningful energy before the rest decay into noise.
The distinction matters because compression is only possible when effective rank is much smaller than nominal rank. A 1000×1000 matrix has nominal rank 1000, but if 950 of its singular values are close to zero, its effective rank is 50. Storing two thin factors of shape 1000×50 instead of the full matrix then costs 2·1000·50 = 100,000 numbers instead of 1,000,000: a 10x compression without losing anything meaningful.
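A minimal numpy sketch of that argument, assuming a synthetic 1000×1000 matrix built from a rank-50 signal plus small noise (the shapes and noise scale are illustrative, not from any real model):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 1000, 50

# Rank-r signal plus tiny full-rank noise: nominal rank d, effective rank ~r.
A = rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) \
    + 1e-6 * rng.standard_normal((d, d))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Count singular values carrying meaningful energy (relative threshold).
k = int((s > 1e-3 * s[0]).sum())

# Truncate to the top-r singular triplets and measure what was lost.
A_r = (U[:, :r] * s[:r]) @ Vt[:r]
rel_err = np.linalg.norm(A - A_r) / np.linalg.norm(A)

# Storage for two thin factors vs. the dense matrix: 2*d*r / d^2 = 0.1.
ratio = (2 * d * r) / (d * d)
```

Here `k` lands on 50, `rel_err` is negligible, and `ratio` is the 10x figure: the rank-50 truncation keeps everything that matters at a tenth of the storage.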
Several operational definitions exist. A common one is the stable rank, the squared Frobenius norm over the squared spectral norm (Σᵢσᵢ² / σ₁²); another is the number of singular values needed to capture 99% of the total energy. Which definition you use depends on the downstream task: for a dense retrieval embedding you care about preserving angles; for a classifier you care about preserving decision boundaries.
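Both definitions are a few lines of numpy; the function names here are my own, and the 99% energy cutoff is the example figure from above, not a universal constant:

```python
import numpy as np

def stable_rank(A):
    # ||A||_F^2 / ||A||_2^2 : a smooth rank proxy, robust to tiny singular values.
    s = np.linalg.svd(A, compute_uv=False)
    return float((s ** 2).sum() / s[0] ** 2)

def energy_rank(A, energy=0.99):
    # Smallest k such that the top-k singular values hold `energy`
    # of the total squared-singular-value mass.
    s = np.linalg.svd(A, compute_uv=False)
    cum = np.cumsum(s ** 2) / (s ** 2).sum()
    return int(np.searchsorted(cum, energy) + 1)
```

On a rank-1 matrix both return 1; on the d×d identity (all singular values equal) both return d, matching the intuition that effective rank interpolates between "one dominant direction" and "all directions equal".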
In practice, the effective rank of transformer weight matrices is dramatically lower than the nominal rank, often 10-30% of the maximum. This is the empirical fact that makes LoRA, SVD compression, and most low-rank methods work. It is also a fact the field is still trying to explain theoretically; it appears related to an implicit bias of gradient descent on overparameterised models toward low-rank solutions.
## Related
- [[Singular Value Decomposition (SVD)]]
- [[Low-Rank Decomposition]]
- [[Overparameterisation and Implicit Regularisation]]
---
Tags: #ai #linalg #theory #kp