# Mixture of Experts & Adapter Architectures
Parent: [[Model Compression & Edge AI MOC]]
The shift from "one giant generalist model" to "one base model plus hundreds of small specialists" is the defining architectural move of the last few years. It makes multi-tenant serving tractable, it enables parameter-efficient fine-tuning, and it changes the unit economics of deployment: a single GPU can now serve many tenants with bounded memory overhead, provided the architecture is designed for it.
The interesting questions are always about routing (who decides which expert?), switching cost (how fast can you swap adapters?), and composition (can multiple specialists stack without interfering?).
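The routing question can be made concrete with a minimal sketch of top-k gating, assuming the simplest possible router (a single linear layer scoring all experts); the shapes and names below are illustrative, not from any particular implementation:

```python
import numpy as np

def top_k_gate(x, W_gate, k=2):
    """Route one token: score every expert, keep the top-k, renormalize.

    x: (d,) token hidden state; W_gate: (d, n_experts) router weights.
    Returns (indices, weights) of the k experts that will actually run.
    """
    logits = x @ W_gate                       # one score per expert
    top = np.argsort(logits)[-k:][::-1]       # indices of the k largest scores
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                      # softmax over the selected experts only
    return top, probs

rng = np.random.default_rng(0)
d, n_experts = 8, 4
idx, w = top_k_gate(rng.normal(size=d), rng.normal(size=(d, n_experts)), k=2)
```

The routing cost per token is one small matmul plus a top-k; the compute saving comes from the other `n_experts - k` experts never running.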
## Key Concepts
- [[Mixture of Experts (MoE)]] — sparse routing through specialised sub-networks
- [[Top-k Gating]] and [[Expert Load Balancing]]
- [[LoRA Adapters]] as the dominant specialist format
- [[Adapter Switching and Warm-Swap]] — what it actually costs to change specialists at inference
- [[Parameter-Efficient Fine-Tuning (PEFT)]] — the umbrella family
- [[Prefix Tuning]], [[Prompt Tuning]], [[IA3]] — alternatives to LoRA
- [[Multi-Tenant Model Serving]] — the deployment pattern this architecture enables
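A minimal sketch of the LoRA forward pass, to make the "small specialist" idea concrete: the base weight `W` stays frozen, and each tenant contributes only a rank-`r` update `B @ A`. Dimensions and initialization follow the LoRA paper's convention (`B` starts at zero, so a fresh adapter is a no-op); the example values are hypothetical.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16.0):
    """h = W x + (alpha / r) * B A x.

    W: (d_out, d_in) frozen base weight.
    A: (r, d_in), B: (d_out, r) — the only tenant-specific parameters.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(1)
d_in, d_out, r = 16, 16, 4
x = rng.normal(size=d_in)
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))   # standard LoRA init: B = 0, adapter starts as identity
h = lora_forward(x, W, A, B)
```

Note the parameter count: the adapter holds `r * (d_in + d_out)` values versus `d_in * d_out` for a full fine-tune, which is what makes per-tenant storage and warm-swap cheap.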
## Key Questions
- How many experts or adapters are active per token, and what is the routing cost?
- What is the switching latency when a new specialist is requested? (Claimed vs. measured.)
- Are adapter weights merged into the base matrices ahead of time, or kept as separate tensors and applied at runtime?
- Does adapter composition work additively, or do gains interfere?
- For multi-tenant serving: what is the memory overhead per tenant at steady state?
- What happens to quality when many adapters are stacked vs. one at a time?
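The per-tenant memory question yields to back-of-envelope arithmetic. The sketch below assumes rank-16 LoRA adapters on four projection matrices per layer of a roughly 7B-class model (d_model 4096, 32 layers, fp16 storage); all figures are illustrative assumptions, not measurements of any real system.

```python
def lora_memory_mib(d_model=4096, r=16, n_layers=32, targets_per_layer=4,
                    bytes_per_param=2):
    """Rough per-tenant adapter footprint in MiB.

    Each targeted projection contributes A (r x d) + B (d x r)
    = 2 * r * d parameters; fp16 is 2 bytes per parameter.
    """
    params = n_layers * targets_per_layer * 2 * r * d_model
    return params * bytes_per_param / 2**20

per_tenant = lora_memory_mib()  # tens of MiB per tenant at these assumed dims
```

At these dimensions a tenant costs ~32 MiB, versus ~13 GiB for a full fp16 copy of a 7B model, which is the steady-state overhead argument for the whole architecture.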
## Reading
- Shazeer et al., "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer" (2017) — the seminal MoE paper
- Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models" (2021)
- Houlsby et al., "Parameter-Efficient Transfer Learning for NLP" (2019) — adapter origins
- Liu et al., "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning" (2022) — introduces (IA)³
- Any recent Mixtral / DeepSeek-MoE technical report for current practice
---
Tags: #ai #moe #peft #kp