Optimal control and reinforcement learning are often viewed as competing paradigms for sequential decision-making. While model-based control relies on adjoint sensitivities, reinforcement learning derives policies through value-function approximation. However, the Hamiltonian in optimal control and the state–action value function in reinforcement learning play analogous roles in guiding policy improvement.
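The analogy can be sketched in standard notation (the symbols below are the conventional ones, not drawn from this work): Pontryagin's principle selects the control that pointwise minimizes the Hamiltonian, built from the running cost and the adjoint-weighted dynamics, while greedy policy improvement selects the action that maximizes the state–action value function:

```latex
% Optimal control: running cost \ell, dynamics f, adjoint (costate) \lambda
H(x, u, \lambda) = \ell(x, u) + \lambda^{\top} f(x, u),
\qquad
u^{*}(t) = \arg\min_{u} \, H\bigl(x(t), u, \lambda(t)\bigr)

% Reinforcement learning: reward r, discount \gamma, successor value V^{\pi}
Q^{\pi}(s, a) = r(s, a) + \gamma \, \mathbb{E}\bigl[V^{\pi}(s')\bigr],
\qquad
\pi'(s) = \arg\max_{a} \, Q^{\pi}(s, a)
```

In both cases a scalar function of state and action (or control) ranks candidate actions locally, which is the structural correspondence exploited below.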
This work leverages these structural correspondences to develop mechanisms of reciprocal reinforcement between modelling and control within the Reinforcement Twinning framework. In this cyber–physical architecture, a digital twin and a learning agent evolve within a shared feedback loop: the twin provides the predictive structure and sensitivity information necessary to accelerate policy optimization, while the agent generates informative trajectories that improve the model.
Illustrative test cases on nonlinear thermo-fluid systems demonstrate how policies optimized on imperfect twins can extract robust operational knowledge under stochastic disturbances. By explicitly exploiting and refining model uncertainty rather than assuming absolute fidelity, this framework demonstrates that the synergy between predictive modelling and autonomous learning outperforms their independent implementation, providing a more resilient path for the control of complex, nonlinear systems.