**Scheduling** in data center compute refers to the system that decides *which* workloads run on *which* resources at *what* time. The scheduler is the brain of a [[Clustering|cluster]] — it allocates CPUs, GPUs, memory, and network bandwidth to jobs while maximising utilisation, meeting SLAs, and handling failures.

---

### First Principle: Resources are finite; demand is not. The scheduler is the arbiter.

In any shared compute environment, demand for resources exceeds supply. The scheduler's job is to make allocation decisions that balance throughput, fairness, priority, and efficiency — all in real time.

---

### Key Considerations

- **Job Types**: Schedulers must handle diverse workload shapes — long-running training jobs (hours/days), bursty inference requests (milliseconds), batch processing, and interactive sessions. Each has different latency and resource requirements.
- **Resource Granularity**: Modern schedulers allocate at multiple levels: whole nodes, individual GPUs, [[MIGs|MIG partitions]], CPU cores, and memory slices. Finer granularity improves utilisation but increases scheduling complexity.
- **Preemption and Priority**: High-priority jobs (production inference) may preempt lower-priority jobs (experimental training). The scheduler must checkpoint and resume preempted work gracefully.
- **Locality Awareness**: For GPU training jobs, the scheduler should place communicating processes on nodes with fast interconnects (same rack, same switch), minimising network latency. See [[Clustering]].
- **Common Schedulers**: **Kubernetes** (containers), **SLURM** (HPC/AI training), **Nomad** (general-purpose), and cloud-native schedulers (AWS Batch, GCP Vertex).

---

### Actionable Insights

For [[Modular Data Center Design Principles|modular data center]] deployments, the choice of scheduler depends on the primary workload.
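The preemption-and-priority behaviour described above can be sketched in a few lines. This is a minimal illustration, not any real scheduler's logic; the `Job`, `Node`, and `schedule` names are hypothetical, and real systems (SLURM, Kubernetes) checkpoint or gracefully evict victims rather than simply removing them.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Job:
    name: str
    gpus: int
    priority: int  # higher value = more important

@dataclass
class Node:
    name: str
    total_gpus: int
    running: list = field(default_factory=list)  # Jobs currently placed here

    @property
    def free_gpus(self) -> int:
        return self.total_gpus - sum(j.gpus for j in self.running)

def schedule(job: Job, nodes: list) -> Optional[str]:
    """Place `job` on the first node that fits; otherwise preempt a
    strictly lower-priority job that would free enough GPUs.
    Returns the chosen node's name, or None if nothing fits."""
    # Pass 1: plain resource fit (basic feasibility).
    for node in nodes:
        if node.free_gpus >= job.gpus:
            node.running.append(job)
            return node.name
    # Pass 2: preemption — evict the lowest-priority victim whose GPUs,
    # once reclaimed, make the new job fit.
    for node in nodes:
        victims = sorted((j for j in node.running if j.priority < job.priority),
                         key=lambda j: j.priority)
        for victim in victims:
            if node.free_gpus + victim.gpus >= job.gpus:
                node.running.remove(victim)  # in practice: checkpoint + requeue
                node.running.append(job)
                return node.name
    return None
```

For example, filling an 8-GPU node with a low-priority training job and then submitting a high-priority inference job triggers the preemption path in pass 2.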
SLURM is the standard for AI training and HPC, where jobs need multi-node GPU allocation with topology awareness. Kubernetes is standard for inference serving and microservices. Many production environments run both — SLURM for training clusters and Kubernetes for inference — with a meta-layer that manages capacity between them.

The scheduler is also the enforcement point for [[multi-tenancy]]: it ensures that one tenant's workloads cannot starve another's.

---

### Scheduling Decision Factors

| Factor | Description | Impact |
|--------|-------------|--------|
| Resource fit | Does the job's request match available resources? | Basic feasibility |
| Topology | Are requested GPUs on the same switch/rack? | Training performance |
| Priority | What is the job's priority class? | Preemption decisions |
| Fairness | Has this tenant used their fair share? | Multi-tenant balance |
| Affinity | Does the job prefer specific hardware? | Placement quality |
| Queue depth | How many jobs are waiting? | Throughput optimisation |

[[Clustering]] | [[MIGs]] | [[Docker Containers]] | [[VMs]] | [[multi-tenancy]] | [[Noisy Neighbour Problem]]
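The decision factors in the table can be combined into a single placement score. The sketch below is an illustrative assumption — the weights, dict keys, and function names are made up for this note and do not correspond to any real scheduler's API; it covers resource fit, topology, priority, and fairness, leaving affinity and queue depth as straightforward extensions.

```python
def placement_score(node, job, tenant_usage, fair_share):
    """Score a (job, node) pairing; higher is better. Returns None if infeasible."""
    # Resource fit: a hard feasibility gate, not a weighted term.
    if node["free_gpus"] < job["gpus"]:
        return None
    score = 0.0
    # Topology: strongly prefer nodes on the job's requested rack/switch.
    if job.get("rack") and node["rack"] == job["rack"]:
        score += 10.0
    # Priority: higher-priority jobs win ties for good placements.
    score += job["priority"]
    # Fairness: penalise tenants already over their fair share.
    if tenant_usage > fair_share:
        score -= 5.0 * (tenant_usage / fair_share - 1.0)
    return score

def pick_node(job, nodes, tenant_usage, fair_share):
    """Return the name of the best-scoring feasible node, or None."""
    scored = [(placement_score(n, job, tenant_usage, fair_share), n["name"])
              for n in nodes]
    scored = [(s, name) for s, name in scored if s is not None]
    return max(scored)[1] if scored else None
```

The design point this illustrates: feasibility (resource fit) is a filter, while topology, priority, and fairness are soft preferences that trade off against each other in the score.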