Combining multiple models for off-policy evaluation of reinforcement learning in healthcare
The recent successes of reinforcement learning (RL) across multiple fields such as robotics, games, and self-driving cars have sparked interest in applying these methods to more challenging and arguably more impactful fields such as healthcare. However, high-stakes real-world applications break the main paradigm that makes RL so powerful: the ability to experiment by interacting with the environment in order to learn an optimal strategy. Thus, in healthcare and other real-world applications we must often use observational data to learn and evaluate policies. I will give an overview of some of the main challenges encountered when trying to evaluate RL methods from observational data, as well as the main methods used to tackle these problems. I will then describe our approaches to combining these different methods to leverage the individual strengths of each.
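As a concrete illustration of combining approaches (the abstract does not specify which methods are combined, so this is only a hedged sketch of one standard option), the doubly robust estimator blends a model-based value estimate with per-decision importance sampling: the model supplies a low-variance baseline, and importance weights correct its bias. The function below is hypothetical, written for one trajectory of `(state, action, reward)` tuples:

```python
def dr_estimate(trajectory, pi_e, pi_b, q_hat, v_hat, gamma=1.0):
    """Per-decision doubly robust off-policy value estimate for one trajectory.

    trajectory: list of (state, action, reward) tuples, in time order.
    pi_e(a, s), pi_b(a, s): action probabilities under the evaluation and
        behaviour policies, respectively.
    q_hat(s, a), v_hat(s): approximate value functions from a learned model
        (the model-based component).
    """
    dr = 0.0
    # Backward recursion over the trajectory:
    # dr_t = v_hat(s_t) + rho_t * (r_t + gamma * dr_{t+1} - q_hat(s_t, a_t))
    for s, a, r in reversed(trajectory):
        rho = pi_e(a, s) / pi_b(a, s)  # per-step importance weight
        # Model baseline plus an importance-weighted correction: if the model
        # is exact the correction term has zero mean, and if the model is
        # identically zero this reduces to per-decision importance sampling.
        dr = v_hat(s) + rho * (r + gamma * dr - q_hat(s, a))
    return dr
```

With `q_hat` and `v_hat` set to zero, the estimate collapses to plain per-decision importance sampling, which makes the role of the model term easy to see: it trades a little bias for a large reduction in the variance that makes pure importance sampling unreliable on long healthcare trajectories.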