# Low-Rank Decomposition & Matrix Factorisation

Parent: [[Model Compression & Edge AI MOC]]

The central observation: most large weight matrices in neural networks are not actually high-rank. You can approximate a d×d matrix as the product of a d×r matrix and an r×d matrix, where r is much smaller than d, and lose almost nothing meaningful — storing 2dr parameters instead of d². This is why LoRA works, why SVD-based compression works, and why compression ratios of 50× or more can cost only 1–2 percentage points of accuracy. If these matrices were genuinely full-rank, such compression would be impossible. The question is always: how much rank does this layer actually need, and how far does its effective rank fall below its nominal rank?

## Key Concepts

- [[Singular Value Decomposition (SVD)]] — the mathematical backbone
- [[LoRA (Low-Rank Adaptation)]] — freezing the base model and training low-rank deltas
- [[Tucker Decomposition]] and [[CP Decomposition]] — tensor generalisations
- [[Effective Rank]] vs. nominal rank
- [[Rank-Constrained Training]] — building low-rank structure in during training, not post-hoc
- [[Weight-Sharing Schemes]] — a close cousin of decomposition

## Key Questions

- Where in the network does rank actually collapse? (Attention projections? FFN layers?)
- What is the rank-vs-accuracy curve for this architecture?
- Is the decomposition applied at init, during training, or post-hoc?
- How does low-rank structure interact with quantisation? (They often compose well.)
- Can the same decomposition be reused across tasks, or is it task-specific?
- Is the decomposition realised on-chip as fewer FLOPs, or only as fewer parameters stored?

## Reading

- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021)
- Sainath et al., "Low-Rank Matrix Factorisation for Deep Neural Network Training" (2013) — early DNN application
- FWSVD / ASVD family of papers for modern SVD-based compression
- Dettmers et al., "QLoRA" (2023) — composition with quantisation

---
Tags: #ai #compression #linalg #kp
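The truncated-SVD claim above can be sketched numerically. This is a minimal illustration, not any particular paper's method: the matrix, dimensions (d = 512, kept rank 32), and noise scale are arbitrary assumptions chosen to mimic a weight matrix whose effective rank sits far below its nominal rank, and the entropy-based effective rank is one common definition among several.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r_true, r_keep = 512, 16, 32

# Synthetic "weight matrix": a rank-16 signal plus small noise, so its
# nominal rank is 512 but its effective rank is far lower.
W = rng.standard_normal((d, r_true)) @ rng.standard_normal((r_true, d))
W += 0.01 * rng.standard_normal((d, d))

# Truncated SVD: keep only the top r_keep singular triplets.
U, S, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :r_keep] * S[:r_keep]   # d x r factor (U_r scaled by singular values)
B = Vt[:r_keep, :]               # r x d factor

# Reconstruction error and parameter-count compression (d^2 vs 2dr).
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
compression = (d * d) / (2 * d * r_keep)

# Entropy-based effective rank: exp of the entropy of the normalised
# singular-value distribution.
p = S / S.sum()
eff_rank = np.exp(-(p * np.log(p)).sum())

print(f"relative error: {rel_err:.4f}, compression: {compression:.0f}x, "
      f"effective rank: {eff_rank:.1f}")
```

Because the signal occupies only 16 directions, keeping 32 of 512 singular triplets recovers the matrix almost exactly while storing 8× fewer parameters — the rank-vs-accuracy trade asked about under Key Questions, in miniature.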