# Reinforcement Learning for Process Control
Reinforcement learning (RL) trains an agent to make sequential decisions by trial and error, typically against a simulated environment. In industrial settings, the agent learns good operating parameters by running thousands of simulated scenarios, so no physical equipment is put at risk during training.
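A minimal sketch of the trial-and-error loop, assuming a toy simulator in which the temperature lives in 11 discrete buckets and the agent can lower, hold, or raise the setpoint by one bucket. Everything here (`TARGET`, `step`, `train`, `greedy_rollout`, the reward shape, the hyperparameters) is hypothetical and chosen only for illustration; it uses tabular Q-learning, which is one RL method among many and not necessarily what a production system would use.

```python
import random

TARGET = 7             # hypothetical target temperature bucket
N_STATES = 11          # buckets 0..10
ACTIONS = (-1, 0, 1)   # lower, hold, raise the setpoint by one bucket

def step(state, action):
    """Toy simulator: move one bucket (clamped to the valid range).
    Reward is the negative distance of the new state from the target."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -abs(next_state - TARGET)

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)      # each simulated scenario starts somewhere random
        for _ in range(20):                  # 20 decisions per scenario
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))                           # explore
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])   # exploit
            nxt, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            q[state][a] += alpha * (reward + gamma * max(q[nxt]) - q[state][a])
            state = nxt
    return q

def greedy_rollout(q, state, steps=20):
    """Follow the learned policy without exploration and return the final state."""
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
        state, _ = step(state, ACTIONS[a])
    return state
```

All 40,000 simulated decisions happen inside `step`, which is the point: the mistakes made while exploring cost nothing outside the simulator.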
Two distinct problem types:
- **Continuous optimization** (process manufacturing): finding optimal temperature, pressure, flow rate setpoints. Derivative-free methods and RL both work here. The objective is usually smooth and well-behaved: a small change in a setpoint produces a small change in the outcome.
- **Combinatorial optimization** (scheduling, assignment): allocating gates to flights, scheduling maintenance windows. Realistic versions are NP-hard: the solution space is discrete and grows exponentially (or faster) with problem size. Requires fundamentally different algorithms from continuous setpoint tuning.
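The contrast between the two problem types can be sketched side by side. The continuous half uses compass (pattern) search, a simple derivative-free method, on a made-up smooth cost; the combinatorial half brute-forces a tiny gate assignment to show how the candidate count grows. All names and numbers (`process_cost`, the 350 degree / 12 bar optimum, the cost matrix) are assumptions for illustration. Note that this plain one-gate-per-flight form is actually solvable in polynomial time; the hardness comes from realistic constraints such as time windows and gate conflicts, which this sketch leaves out.

```python
import itertools
import math

def compass_search(cost, x, step=10.0, tol=1e-6):
    """Derivative-free pattern search: probe +/- step along each axis,
    keep any improvement, halve the step when nothing improves."""
    x = list(x)
    best = cost(x)
    while step > tol:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                c = cost(trial)
                if c < best:
                    x, best, improved = trial, c, True
        if not improved:
            step /= 2
    return x, best

def process_cost(p):
    """Hypothetical smooth cost with its optimum at 350 degrees and 12 bar."""
    temp, pressure = p
    return (temp - 350.0) ** 2 + 4.0 * (pressure - 12.0) ** 2

def best_assignment(cost_matrix):
    """Brute force: try every permutation of gates over flights (n! candidates).
    cost_matrix[f][g] is the cost of putting flight f at gate g."""
    n = len(cost_matrix)
    return min(
        itertools.permutations(range(n)),
        key=lambda perm: sum(cost_matrix[f][g] for f, g in enumerate(perm)),
    )

# Continuous: local probing walks straight to the optimum.
setpoint, cost = compass_search(process_cost, [300.0, 10.0])

# Combinatorial: fine for 3 flights, hopeless for 20.
gates = best_assignment([[4, 1, 3], [2, 0, 5], [3, 2, 2]])
candidates_for_20_flights = math.factorial(20)
```

Local probing works in the first case because nearby setpoints have nearby costs. Swapping two gates has no such guarantee, which is why the two problems end up with different solvers.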
A company claiming RL-based optimization for both continuous process parameters and discrete scheduling problems is actually running two different systems. Neither is wrong, but calling both "RL optimization" obscures the reality.
The gap between academic RL and production RL is enormous. Lab demos work on clean simulations with known reward functions. Production systems deal with noisy sensors, partial observability, hard safety constraints, and operators who override recommendations. [[Human-in-the-Loop Systems]] create a paradox: every override replaces the agent's action with the operator's, so the resulting outcome cannot be credited to the system. Operator overrides therefore prevent the system from proving its own value, which limits pricing power.
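A rough sketch of how a hard safety constraint and an operator override might sit between the agent and the plant, and why overrides erode the evidence for the agent's value. The names (`Decision`, `decide`, `attributable_fraction`) and the limits are hypothetical, not taken from any particular product.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Decision:
    recommended: float   # raw agent output
    applied: float       # value actually sent to the plant
    overridden: bool     # True if the operator replaced the recommendation

def clamp(value: float, low: float, high: float) -> float:
    """Hard safety envelope: never leave [low, high]."""
    return min(max(value, low), high)

def decide(recommended: float, low: float, high: float,
           operator_value: Optional[float] = None) -> Decision:
    """Clamp the agent's setpoint, then let an operator override it.
    The operator's value is clamped too (an assumption of this sketch)."""
    if operator_value is not None:
        return Decision(recommended, clamp(operator_value, low, high), True)
    return Decision(recommended, clamp(recommended, low, high), False)

def attributable_fraction(log: list[Decision]) -> float:
    """Share of decisions where the agent's (clamped) action actually ran.
    Only these decisions say anything about what the agent is worth."""
    if not log:
        return 0.0
    return sum(not d.overridden for d in log) / len(log)

log = [
    decide(420.0, 300.0, 400.0),                        # clamped to 400
    decide(350.0, 300.0, 400.0, operator_value=340.0),  # operator override
    decide(360.0, 300.0, 400.0),                        # applied as recommended
]
```

The more often `overridden` is true, the smaller the slice of operating history that can be used to argue the system earned its fee.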
Related: [[Simulation-Based Optimization]], [[Surrogate Models]], [[Human-in-the-Loop Systems]], [[Industrial AI MOC]]
---
Tags: #deeptech #firstprinciple