# Low-Rank Decomposition & Matrix Factorisation
Parent: [[Model Compression & Edge AI MOC]]
The central observation: most large weight matrices in neural networks are not actually high-rank. You can approximate a d×d matrix as the product of a d×r matrix and an r×d matrix where r is much smaller than d, and lose almost nothing meaningful. This is why LoRA works, why SVD-based compression works, and why compression ratios of 50x or more can cost only 1–2 percentage points of accuracy.
If the rank were genuinely full, low-rank compression would be impossible. The question is always: how much rank does this layer actually need, and where does effective rank fall below nominal rank?
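A minimal numpy sketch of the central observation (the dimensions, noise level, and rank here are illustrative, not taken from any particular model): build a matrix with low intrinsic rank, truncate its SVD at rank r, and check both the approximation error and the parameter saving.

```python
import numpy as np

rng = np.random.default_rng(0)
d, true_r, r = 256, 16, 16

# A d x d matrix that is secretly rank-16 plus a little noise.
W = rng.standard_normal((d, true_r)) @ rng.standard_normal((true_r, d))
W += 0.01 * rng.standard_normal((d, d))

# Truncated SVD: keep only the top-r singular triplets.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r] * s[:r]   # d x r (singular values folded into the left factor)
B = Vt[:r, :]          # r x d
W_approx = A @ B       # rank-r reconstruction

rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
params_full = d * d
params_lowrank = 2 * d * r
print(f"relative error: {rel_err:.4f}, "
      f"compression: {params_full / params_lowrank:.1f}x")
```

By the Eckart–Young theorem this truncation is the best rank-r approximation in Frobenius norm, so the relative error is bounded by the noise floor while storage drops from d² to 2dr parameters.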
## Key Concepts
- [[Singular Value Decomposition (SVD)]] — the mathematical backbone
- [[LoRA (Low-Rank Adaptation)]] — freezing the base model and training low-rank deltas
- [[Tucker Decomposition]] and [[CP Decomposition]] — tensor generalisations
- [[Effective Rank]] vs. nominal rank
- [[Rank-Constrained Training]] — building low-rank structure in during training, not post-hoc
- [[Weight-Sharing Schemes]] — a close cousin of decomposition
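The LoRA entry above can be made concrete with a forward-pass sketch (shapes and the zero-init of one factor follow the LoRA paper; the variable names and dimensions are illustrative): the d×d base weight W0 stays frozen, and only the low-rank factors A and B are trained.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 512, 8

# Frozen base weight (would come from the pretrained model).
W0 = rng.standard_normal((d, d)) / np.sqrt(d)

# Trainable low-rank delta. LoRA init: A is Gaussian, B is zero,
# so the adapted layer starts out identical to the base layer.
A = rng.standard_normal((r, d)) / np.sqrt(d)  # r x d
B = np.zeros((d, r))                          # d x r

x = rng.standard_normal(d)
# Adapted forward pass: the d x d delta B @ A is never materialised;
# we apply A then B, costing O(dr) extra instead of O(d^2).
y = W0 @ x + B @ (A @ x)

trainable = A.size + B.size
total = W0.size
print(f"trainable fraction: {trainable / total:.4f}")  # = 2r/d
```

With d = 512 and r = 8, the trainable delta is about 3% of the layer's parameters, which is the whole point: task adaptation lives in a tiny low-rank subspace.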
## Key Questions
- Where in the network does rank actually collapse? (Attention projections? FFN layers?)
- What is the rank-vs-accuracy curve for this architecture?
- Is the decomposition applied at init, during training, or post-hoc?
- How does low-rank structure interact with quantisation? (They often compose well.)
- Can the same decomposition be reused across tasks, or is it task-specific?
- Is the decomposition realised on-chip as fewer FLOPs, or only as fewer parameters stored?
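Several of these questions (where rank collapses, the shape of the rank-vs-accuracy curve) come down to measuring effective rank per layer. A sketch using the entropy-based effective rank of Roy & Vetterli (2007), with illustrative matrix sizes; note how a nominally full-size random matrix and a genuinely low-rank one separate cleanly:

```python
import numpy as np

def effective_rank(W: np.ndarray) -> float:
    """Entropy-based effective rank (Roy & Vetterli, 2007):
    exp of the Shannon entropy of the normalised singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop numerically-zero singular values
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(1)
d = 128
full = rng.standard_normal((d, d))                            # nominal rank d
low = rng.standard_normal((d, 8)) @ rng.standard_normal((8, d))  # rank 8

print(f"full: {effective_rank(full):.1f}, low: {effective_rank(low):.1f}")
```

Running a measure like this over each attention projection and FFN matrix gives a direct answer to "where does rank actually collapse", and the gap between effective and nominal rank suggests how aggressively each layer can be factorised.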
## Reading
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021)
- Sainath et al., "Low-Rank Matrix Factorisation for Deep Neural Network Training" (2013) — early DNN application
- FWSVD / ASVD family of papers for modern SVD-based compression
- Dettmers et al., "QLoRA" (2023) — composition with quantisation
---
Tags: #ai #compression #linalg #kp