Safe Explicit Model Predictive Control via Constrained Neural Network Training
Fast MPC policies learned offline with safety constraints using a primal-dual (augmented Lagrangian) training loop.
Model Predictive Control (MPC) is great when you can afford to solve an optimization problem online. When you can’t, a common idea is to approximate the MPC policy with a neural network for fast inference. The catch is that approximation error is not “just” suboptimality. It can break feasibility and safety.
This project asks a more pointed question: can we learn an explicit MPC policy that is fast at runtime while enforcing closed-loop safety during training, rather than hoping it generalizes?
Approach
I trained a neural network policy $\pi_\theta(x)$ using two kinds of samples:
- Supervised points: state–action pairs $(x^{(i)}, u^{(i)})$ generated by running implicit MPC offline along closed-loop trajectories.
- Safety points: additional states $\tilde{x}^{(j)}$ sampled in the safe set (especially near the boundary), used only to enforce safety constraints.
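To make the data pipeline concrete, here is a minimal sketch of how the two sample sets could be assembled. The callables `mpc_solve`, `f`, and `sample_safe_states` (an offline MPC solver, the one-step dynamics, and a sampler biased toward the safe-set boundary) are placeholders I'm assuming for illustration, not names from the paper.

```python
import torch

def build_datasets(mpc_solve, f, sample_safe_states, x0_batch, horizon, n_safety):
    """Collect (state, action) pairs along closed-loop implicit-MPC rollouts,
    plus extra safety-only states used solely for the constraint."""
    xs, us = [], []
    for x in x0_batch:                        # roll out implicit MPC from each initial state
        for _ in range(horizon):
            u = mpc_solve(x)                  # expensive solve, done once, offline
            xs.append(x)
            us.append(u)
            x = f(x, u)                       # closed-loop propagation
    supervised = (torch.stack(xs), torch.stack(us))
    safety_states = sample_safe_states(n_safety)   # cheap: no MPC solve required
    return supervised, safety_states
```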
The key constraint is a one-step closed-loop safety condition:
\[V(f(\tilde{x}, \pi_\theta(\tilde{x}))) \le 0,\]
where $f$ is the system dynamics and $V(x) \le 0$ characterizes the safe set. This condition can be evaluated without solving the MPC optimization problem, which means we can scale the number of safety checks cheaply and use them to actively shape the learned policy.
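As an illustration, assuming $f$ and $V$ are available as batched, differentiable PyTorch callables (my assumption for the sketch, not something stated above), the constraint can be checked for many safety points in a single forward pass:

```python
import torch

def safety_violation(policy, x_safe, f, V):
    """Per-sample one-step safety residual g(x) = V(f(x, pi(x))).
    g <= 0 means the constraint is satisfied; positive values are violations."""
    u = policy(x_safe)          # batched policy evaluation, no MPC solve
    x_next = f(x_safe, u)       # one-step closed-loop successor states
    return V(x_next)            # shape: (batch,)
```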
To solve the resulting constrained learning problem, I implemented a first-order primal–dual training loop based on the augmented Lagrangian, designed to work cleanly with modern deep learning tooling (PyTorch + Adam).
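The sketch below shows one way such a loop can look, reusing the `safety_violation` helper above: Adam minimizes an augmented Lagrangian over the network weights, while the multipliers are updated by projected ascent. The specific choices here (one multiplier per safety point, fixed penalty `rho`, dual step size `eta_dual`, a mean-squared imitation loss) are a standard augmented-Lagrangian pattern I'm assuming for illustration, not necessarily the exact scheme from the papers.

```python
import torch

def train(policy, supervised, x_safe, f, V, epochs=500, rho=10.0, eta_dual=10.0):
    """policy: torch.nn.Module mapping states to actions."""
    x_sup, u_sup = supervised
    lam = torch.zeros(len(x_safe))                     # one multiplier per safety sample
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    for _ in range(epochs):
        # Primal step: Adam on the augmented Lagrangian w.r.t. the network weights.
        opt.zero_grad()
        imitation = torch.mean((policy(x_sup) - u_sup) ** 2)
        g = safety_violation(policy, x_safe, f, V)     # g <= 0 means safe
        penalty = torch.clamp(lam / rho + g, min=0.0)  # AL term for inequality constraints
        aug_lagrangian = imitation + 0.5 * rho * torch.sum(penalty ** 2)
        aug_lagrangian.backward()
        opt.step()

        # Dual step: ascent on the multipliers, projected onto lam >= 0.
        # (The classical multiplier update corresponds to eta_dual = rho.)
        with torch.no_grad():
            g = safety_violation(policy, x_safe, f, V)
            lam = torch.clamp(lam + eta_dual * g, min=0.0)
    return policy
```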
In a companion paper, I analyzed this “primal-dual Adam” approach and proved local convergence under explicit step-size conditions, which helped clarify how the hyperparameters interact with constraint geometry.
Results
In simulation across three systems (including nonlinear dynamics and nonconvex safe sets), the constrained training approach achieved 100% safety on both domain and boundary test sets, while a nominal imitation-learned policy violated constraints. Runtime is where the method really pays off: evaluating the learned policy took about 0.035 ms in the worst case, versus up to 479 ms for solving MPC online in the nonlinear example, a reduction of three to four orders of magnitude with essentially constant latency.
References
2024
- Safe Explicit MPC by Training Neural Networks Through Constrained Optimization. In UKACC 14th International Conference on Control (CONTROL), 2024.
- Analysis and Local Convergence Proof of a Constrained Optimization Algorithm for Training Neural Networks. In 2024 IEEE Conference on Control Technology and Applications (CCTA), 2024.