Optimal control and reinforcement learning are often viewed as competing paradigms for sequential decision-making. While model-based control relies on adjoint sensitivities, reinforcement learning derives policies through value-function approximation. However, the Hamiltonian in optimal control and the state–action value function in reinforcement learning play analogous roles in guiding policy improvement.
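The analogy can be sketched in standard notation (the symbols below are the conventional ones, not drawn from this work): Pontryagin's principle selects the control that pointwise minimizes the Hamiltonian, built from the running cost and the adjoint-weighted dynamics, while greedy policy improvement selects the action that maximizes the state–action value function:

```latex
% Optimal control: running cost \ell, dynamics f, adjoint (costate) \lambda
H(x, u, \lambda) = \ell(x, u) + \lambda^{\top} f(x, u),
\qquad
u^{*}(t) = \arg\min_{u} \, H\bigl(x(t), u, \lambda(t)\bigr)

% Reinforcement learning: reward r, discount \gamma, successor value V^{\pi}
Q^{\pi}(s, a) = r(s, a) + \gamma \, \mathbb{E}\bigl[V^{\pi}(s')\bigr],
\qquad
\pi'(s) = \arg\max_{a} \, Q^{\pi}(s, a)
```

In both cases a scalar function of state and action (or control) ranks candidate actions locally, which is the structural correspondence exploited below.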
This work leverages these structural correspondences to develop mechanisms of reciprocal reinforcement between modelling and control within the Reinforcement Twinning framework. In this cyber–physical architecture, a digital twin and a learning agent evolve within a shared feedback loop: the twin provides the predictive structure and sensitivity information necessary to accelerate policy optimization, while the agent generates informative trajectories that improve the model.
Illustrative test cases on nonlinear thermo-fluid systems demonstrate how policies optimized on imperfect twins can extract robust operational knowledge under stochastic disturbances. By explicitly exploiting and refining model uncertainty rather than assuming absolute fidelity, this framework demonstrates that the synergy between predictive modelling and autonomous learning outperforms their independent implementation, providing a more resilient path for the control of complex, nonlinear systems.