Abstract: The gold-standard approach to establishing causality is a randomized controlled trial (RCT), in which participants are randomly assigned to either a treatment or a control group. Unfortunately, in many cases only observational data (e.g., from social media) are available, and these data include potential sources of bias. Several methods have been proposed to mitigate the effect of such biases, for example, the Self-Controlled Case Series method. In our work, we compare three approaches for causal discovery and apply them to social media data (specifically, Reddit) to identify side effects of medical interventions. Our results show that all examined methods replicate known findings from the literature, except in cases where sample sizes are insufficient. Thus, our work demonstrates the usefulness of social media data, analyzed with appropriate algorithmic approaches, for facilitating the discovery of adverse reactions to medical interventions from real-world data.
Assessing causal effects of medical interventions from social media: The contribution of different algorithmic approaches