# Singular Value Decomposition (SVD)
Parent: [[Low-Rank Decomposition & Matrix Factorisation]]
The canonical way to factor any matrix W into three pieces: W = UΣVᵀ, where U and V are orthogonal and Σ is (rectangular) diagonal with non-negative entries called singular values. The singular values are sorted in descending order, and they measure how much "energy" (squared Frobenius norm) each direction carries.
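A minimal NumPy sketch of the factorisation, using a small random matrix as a stand-in for a weight matrix:

```python
import numpy as np

# Hypothetical 6x4 "weight matrix", purely for illustration.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))

# full_matrices=False gives the thin SVD: U is 6x4, s has 4 entries, Vt is 4x4.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Singular values come back sorted in descending order...
assert np.all(s[:-1] >= s[1:])

# ...and the three pieces reconstruct W (up to floating-point error).
assert np.allclose(W, U @ np.diag(s) @ Vt)
```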
The magic of SVD for compression: if you keep only the top r singular values (and the corresponding columns of U and rows of Vᵀ), you get the best possible rank-r approximation of W, in both the Frobenius and spectral norms. This is the Eckart–Young theorem, and it is the theoretical floor for any low-rank compression scheme. Every other decomposition — Tucker, CP, LoRA — is either a special case, a tensor generalisation, or a learned approximation of what SVD gives you exactly.
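The truncation step can be sketched as follows; a useful sanity check is that the Frobenius error of the rank-r approximation equals the root-sum-square of the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# Keep the top-r singular triplets: the best rank-r approximation (Eckart–Young).
r = 3
W_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# Frobenius error = sqrt of the sum of squared discarded singular values.
err = np.linalg.norm(W - W_r, "fro")
assert np.isclose(err, np.sqrt(np.sum(s[r:] ** 2)))
```

Note the storage saving: an m×n matrix at rank r needs r(m + n) numbers instead of mn, a win whenever r < mn / (m + n).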
In neural network compression, SVD shows up in two places. First, as a diagnostic: compute the singular values of each weight matrix, plot them, and see where they decay. If the top 10% carry 95% of the energy, the layer is a good compression candidate. Second, as a direct method: replace W with its rank-r approximation and fine-tune briefly to recover any lost accuracy.
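The diagnostic amounts to a cumulative-energy curve over the singular values. A small sketch, with `energy_profile` as an illustrative helper name and a synthetic near-rank-2 matrix standing in for a real layer:

```python
import numpy as np

def energy_profile(W):
    """Fraction of total energy (squared Frobenius norm) captured by the top-k singular values."""
    s = np.linalg.svd(W, compute_uv=False)
    return np.cumsum(s ** 2) / np.sum(s ** 2)

# A deliberately compressible matrix: rank-2 signal plus small noise.
rng = np.random.default_rng(1)
signal = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
W = signal + 0.01 * rng.standard_normal((64, 64))

profile = energy_profile(W)
# The top 2 of 64 directions carry nearly all the energy here,
# so this layer would be an excellent compression candidate.
assert profile[1] > 0.99
```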
Full SVD of a d×d matrix is O(d³) and becomes expensive on very large matrices, which is why approximate and randomised SVD variants exist. For inference-time compression this matters less, since you only pay the cost once, offline.
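The randomised variant can be sketched in a few lines of NumPy (a simplified version of the Halko-style range-finder approach; the function name and oversampling parameter are illustrative): project W onto a random low-dimensional subspace, then run a cheap exact SVD there.

```python
import numpy as np

def randomized_svd(W, r, n_oversample=8, seed=0):
    """Sketch of randomised SVD: find an approximate basis Q for the top of
    W's column space, then SVD the small projected matrix Q.T @ W."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    k = min(r + n_oversample, n)
    # Random range finder: Q's columns approximately span W's top-k column space.
    Q, _ = np.linalg.qr(W @ rng.standard_normal((n, k)))
    # Exact SVD of the small k x n matrix, then lift U back up.
    Ub, s, Vt = np.linalg.svd(Q.T @ W, full_matrices=False)
    return (Q @ Ub)[:, :r], s[:r], Vt[:r, :]

# On an exactly rank-3 matrix, the approximation is essentially exact.
rng = np.random.default_rng(2)
W = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 150))
U, s, Vt = randomized_svd(W, r=3)
assert np.linalg.norm(W - U @ np.diag(s) @ Vt) / np.linalg.norm(W) < 1e-6
```

In practice a library routine such as scikit-learn's `randomized_svd` would be the usual choice; the point of the sketch is that the expensive exact SVD only ever runs on a k×n matrix with small k.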
## Related
- [[Low-Rank Decomposition]]
- [[Effective Rank]]
- [[LoRA (Low-Rank Adaptation)]]
---
Tags: #ai #linalg #compression #kp