Safe Explicit Model Predictive Control via Constrained Neural Network Training
Fast MPC policies learned offline with safety constraints using a primal-dual (augmented Lagrangian) training loop.
Model Predictive Control (MPC) is great when you can afford to solve an optimization problem online. When you can’t, a common idea is to approximate the MPC policy with a neural network for fast inference. The catch is that approximation error is not “just” suboptimality. It can break feasibility and safety.
This project asks a more pointed question: can we learn an explicit MPC policy that is fast at runtime while enforcing closed-loop safety during training, rather than hoping it generalizes?
Approach
I trained a neural network policy $\pi_\theta(x)$ using two kinds of samples:
- Supervised points: state–action pairs $(x^{(i)}, u^{(i)})$ generated by running implicit MPC offline along closed-loop trajectories.
- Safety points: additional states $\tilde{x}^{(j)}$ sampled in the safe set (especially near the boundary), used only to enforce safety constraints.
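To make the data pipeline concrete, here is a minimal sketch of how the two sample sets could be assembled. The callables `mpc_solve`, `f`, and `sample_safe_states` (an offline MPC solver, the one-step dynamics, and a sampler biased toward the safe-set boundary) are placeholders I'm assuming for illustration, not names from the paper.

```python
import torch

def build_datasets(mpc_solve, f, sample_safe_states, x0_batch, horizon, n_safety):
    """Collect (state, action) pairs along closed-loop implicit-MPC rollouts,
    plus extra safety-only states used solely for the constraint."""
    xs, us = [], []
    for x in x0_batch:                        # roll out implicit MPC from each initial state
        for _ in range(horizon):
            u = mpc_solve(x)                  # expensive solve, done once, offline
            xs.append(x)
            us.append(u)
            x = f(x, u)                       # closed-loop propagation
    supervised = (torch.stack(xs), torch.stack(us))
    safety_states = sample_safe_states(n_safety)   # cheap: no MPC solve required
    return supervised, safety_states
```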
The key constraint is a one-step closed-loop safety condition:
\[V(f(\tilde{x}, \pi_\theta(\tilde{x}))) \le 0,\]
where $f$ is the system dynamics and $V(x) \le 0$ characterizes the safe set. This condition can be evaluated without solving the MPC optimization problem, which means we can scale the number of safety checks cheaply and use them to actively shape the learned policy.
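As an illustration, assuming $f$ and $V$ are available as batched, differentiable PyTorch callables (my assumption for the sketch, not something stated above), the constraint can be checked for many safety points in a single forward pass:

```python
import torch

def safety_violation(policy, x_safe, f, V):
    """Per-sample one-step safety residual g(x) = V(f(x, pi(x))).
    g <= 0 means the constraint is satisfied; positive values are violations."""
    u = policy(x_safe)          # batched policy evaluation, no MPC solve
    x_next = f(x_safe, u)       # one-step closed-loop successor states
    return V(x_next)            # shape: (batch,)
```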
To solve the resulting constrained learning problem, I implemented a first-order primal–dual training loop based on the augmented Lagrangian, designed to work cleanly with modern deep learning tooling (PyTorch + Adam).
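The sketch below shows one way such a loop can look, reusing the `safety_violation` helper above: Adam minimizes an augmented Lagrangian over the network weights, while the multipliers are updated by projected ascent. The specific choices here (one multiplier per safety point, fixed penalty `rho`, dual step size `eta_dual`, a mean-squared imitation loss) are a standard augmented-Lagrangian pattern I'm assuming for illustration, not necessarily the exact scheme from the papers.

```python
import torch

def train(policy, supervised, x_safe, f, V, epochs=500, rho=10.0, eta_dual=10.0):
    """policy: torch.nn.Module mapping states to actions."""
    x_sup, u_sup = supervised
    lam = torch.zeros(len(x_safe))                     # one multiplier per safety sample
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    for _ in range(epochs):
        # Primal step: Adam on the augmented Lagrangian w.r.t. the network weights.
        opt.zero_grad()
        imitation = torch.mean((policy(x_sup) - u_sup) ** 2)
        g = safety_violation(policy, x_safe, f, V)     # g <= 0 means safe
        penalty = torch.clamp(lam / rho + g, min=0.0)  # AL term for inequality constraints
        aug_lagrangian = imitation + 0.5 * rho * torch.sum(penalty ** 2)
        aug_lagrangian.backward()
        opt.step()

        # Dual step: ascent on the multipliers, projected onto lam >= 0.
        # (The classical multiplier update corresponds to eta_dual = rho.)
        with torch.no_grad():
            g = safety_violation(policy, x_safe, f, V)
            lam = torch.clamp(lam + eta_dual * g, min=0.0)
    return policy
```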
In a companion paper, I analyzed this “primal-dual Adam” approach and proved local convergence under explicit step-size conditions, which helped clarify how the hyperparameters interact with constraint geometry.
Results
In simulation across three systems (including nonlinear dynamics and nonconvex safe sets), the constrained training approach achieved 100% safety on both domain and boundary test sets, while a nominal imitation-learned policy violated constraints. Runtime is where the method really pays off: evaluating the learned policy took about 0.035 ms in the worst case, versus up to 479 ms for solving MPC online in the nonlinear example, a reduction of three to four orders of magnitude with essentially constant latency.
References
2024
- Safe Explicit MPC by Training Neural Networks Through Constrained Optimization. In UKACC 14th International Conference on Control (CONTROL), 2024.
- Analysis and Local Convergence Proof of a Constrained Optimization Algorithm for Training Neural Networks. In 2024 IEEE Conference on Control Technology and Applications (CCTA), 2024.