Causal inference for medical AI: what can we learn from observational data of COVID-19 patients?

Mar 2, 2021

By Giovanni Cinà (Pacmed Labs)

This blog post is for you if: you work in healthcare, you are interested in AI, you are binge-reading content on COVID-19 while splayed on the couch.


At the time of writing, it has been more than a year since the onset of the COVID-19 pandemic. Among all the ways in which our lives have been changed, we have all been presented with a very pressing question: how do we find a cure?

The problem of finding effective medications and vaccines, an issue that used to concern only a restricted group of experts, suddenly took up a considerable amount of airtime on mass media, on social media and in our face-to-face conversations (remember those?). Begrudgingly, we had to come to grips with the idea that proving treatment effectiveness is a tricky process.

The method of choice for establishing the effect of a treatment is a Randomized Controlled Trial (RCT): a population of suitable patients is randomly divided into two groups, one receiving the treatment (the treated group) and one receiving no treatment or a placebo (the control group). The groups are then monitored for a certain time to check whether the desired outcome is different in the two groups.

We will not go into the advantages and disadvantages of RCTs; it suffices to say, as you might have noticed while waiting for a vaccine, that running one can take a long time.

But what about the people that are sick now? What can we do while we wait for the results of the RCTs? The most obvious answer is to start employing treatments that worked for diseases similar to COVID-19. But how do we know if they work well? Read on.

Collecting data of COVID-19 patients

To be able to say anything at all, we must first collect the data of COVID-19 patients. In the Netherlands, an unprecedented collaboration between Intensive Care Units (ICUs) across the country, led by the Amsterdam University Medical Center, resulted in a vast database containing the information of COVID-19 patients from the ICUs of dozens of hospitals. You can find a description of the dataset in this letter, published in the Journal of Intensive Care Medicine.

This mass-scale data gathering allowed us to see that there has been a certain amount of variation in the way treatments have been administered to COVID-19 patients. Doctors have tried different dosages of medications, different settings on the ventilators, and so on. This is not surprising, given that there were no specific guidelines on how to treat COVID-19 patients. These different behaviors might look like a sign of disorganization, but they also make it possible to study the effectiveness of treatments.

Estimating treatment effect

The problem of assessing the efficacy of a treatment is always about answering the “what if” question, aka the counterfactual question. Let’s look at an example.

The headache-aspirin example

(Disclaimer: this example is completely made up for explanatory purposes and does not contain any medical advice.)

Suppose you want to test the effectiveness of aspirin in curing headaches. Perhaps the first sensible thing we can think of is: giving aspirin to someone with headaches and checking if they feel better than before. But that’s not enough.

What we really want to know is: does the person feel better after taking aspirin compared to not taking it? In other words, is treatment better than no treatment? It could be, for example, that the headache disappears after a while and aspirin is not needed. The problem is that for each person we can only observe the outcome of one action; we cannot observe the counterfactual scenario (the one that did not occur). This is sometimes called ‘the fundamental problem’ of causal inference.

What we ideally want to do is to have two (very similar) people with a headache, give treatment to one and no treatment to the other, observe the results and compare. You see that we are moving towards an RCT-like setup. In an RCT, using large groups makes it easier to overcome individual differences and have two relatively similar populations, while the random treatment assignment makes sure that no other factor influences who gets treated.
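To make the role of random assignment concrete, here is a minimal simulation sketch of the headache example. Everything in it is an assumption made up for illustration: the one-hour effect of aspirin, the `severity` variable, and the way people decide to take aspirin. It is not based on real data or on the analysis described in this post. Each simulated person has two hypothetical outcomes, but only the one matching their actual choice is ever used.

```python
import random
from statistics import mean

TRUE_EFFECT = -1.0  # made-up assumption: aspirin shortens a headache by one hour


def make_person(rng):
    """Return (severity, hours if treated, hours if untreated) for one made-up person."""
    severity = rng.random()  # 0 = mild headache, 1 = severe headache
    hours_untreated = 2.0 + 4.0 * severity + rng.gauss(0.0, 0.5)
    hours_treated = hours_untreated + TRUE_EFFECT
    return severity, hours_treated, hours_untreated


def estimate_effect(people, assign, rng):
    """Difference in mean observed duration: treated group minus untreated group."""
    treated, untreated = [], []
    for severity, hours_treated, hours_untreated in people:
        if assign(severity, rng):
            treated.append(hours_treated)      # the untreated outcome is never observed
        else:
            untreated.append(hours_untreated)  # the treated outcome is never observed
    return mean(treated) - mean(untreated)


def randomized(severity, rng):
    """RCT-like assignment: a coin flip that ignores severity."""
    return rng.random() < 0.5


def self_selected(severity, rng):
    """Observational-like assignment: worse headaches make aspirin more likely."""
    return rng.random() < severity


rng = random.Random(0)
people = [make_person(rng) for _ in range(20000)]
rct_estimate = estimate_effect(people, randomized, rng)
naive_estimate = estimate_effect(people, self_selected, rng)

print(f"true effect:             {TRUE_EFFECT:+.2f} hours")
print(f"randomized assignment:   {rct_estimate:+.2f} hours")
print(f"self-selected treatment: {naive_estimate:+.2f} hours")
```

Under these assumptions the randomized comparison should land close to the true effect, while the self-selected comparison makes aspirin look harmful, simply because the people who took it had worse headaches to begin with.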

The subject of this example, headache. Incidentally, also a potential side-effect of this article.

Back to the main story

In summary, to be able to say something reasonable from observational data we need to

  • identify two groups of patients, one that received treatment and one that did not;
  • check that the two groups are very similar, meaning that the distributions of patient characteristics are similar;
  • make sure that all variables influencing treatment have been measured;
  • check that each patient could have received either the treatment or the non-treatment.

The last two are finer technical points and we will not go into the details here; it suffices to say that the last point connects back to the intuitions about treatment variability and the “what if” question. If there is a strict guideline and doctors always assign treatment to (say) a certain kind of patient, then we have no information about the counterfactual scenario and cannot answer the “what if” part of the problem.
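Two of the checks in the list above can be sketched in a few lines of code. This is an illustration under stated assumptions, not the procedure used on the ICU dataset: the patient records, the `age` variable and the age strata are all made up, and the standardized mean difference is just one common way to compare two groups. Note that the third condition (all relevant variables have been measured) cannot be verified from the data itself and has to be argued with domain knowledge.

```python
import math
from statistics import mean, variance


def standardized_mean_difference(treated_values, control_values):
    """Difference in group means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(treated_values) + variance(control_values)) / 2.0)
    if pooled_sd == 0.0:
        return 0.0
    return (mean(treated_values) - mean(control_values)) / pooled_sd


def treatment_rate_by_stratum(records, stratum_of):
    """Share of treated patients per stratum; a rate of 0.0 or 1.0 means no overlap."""
    counts = {}
    for record in records:
        key = stratum_of(record)
        treated, total = counts.get(key, (0, 0))
        counts[key] = (treated + int(record["treated"]), total + 1)
    return {key: treated / total for key, (treated, total) in counts.items()}


# Hypothetical, made-up records purely for illustration.
patients = [
    {"age": 54, "treated": True},
    {"age": 61, "treated": False},
    {"age": 58, "treated": True},
    {"age": 66, "treated": False},
    {"age": 72, "treated": True},
    {"age": 75, "treated": True},
    {"age": 49, "treated": False},
    {"age": 63, "treated": True},
]

# Similarity check: do treated and untreated patients have similar ages?
ages_treated = [p["age"] for p in patients if p["treated"]]
ages_control = [p["age"] for p in patients if not p["treated"]]
smd_age = standardized_mean_difference(ages_treated, ages_control)

# Overlap check: within each age group, were both options actually used?
rates = treatment_rate_by_stratum(
    patients, lambda p: "70+" if p["age"] >= 70 else "under 70"
)
no_overlap = [stratum for stratum, rate in rates.items() if rate in (0.0, 1.0)]

print(f"standardized mean difference for age: {smd_age:.2f}")
print(f"treatment rate per age group: {rates}")
print(f"age groups without overlap: {no_overlap}")
```

A commonly used rule of thumb treats an absolute standardized mean difference above roughly 0.1 as a sign of imbalance. In this toy data every patient aged 70 or older was treated, so for that group the data contain no information about what would have happened without treatment.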

What can we do about COVID-19, then?

Now that the background is out of the way, we can see that there are two key questions remaining:

  1. Is there a treatment for COVID-19 for which the above conditions are fulfilled in the Dutch data, so that we can estimate its efficacy?
  2. What are the best methods to carry out this estimation?

Thanks to the generous funding of SIDN, and under the auspices of the consortium of Dutch ICUs led by the Amsterdam University Medical Center, we were able to investigate the ICU dataset and find a medical procedure for which it seems possible to answer both questions.

The medical procedure in question is called proning, and it is routinely used to treat Acute Respiratory Distress Syndrome and improve patients’ breathing. This maneuver was used often on COVID-19 patients but not always in the same fashion, so there is enough treatment variability in the data.

In conclusion

In this post we went through the basics of treatment effect estimation and described the challenges of carrying out such an estimation from observational data. If you would like to know more about the fascinating field of causal inference, there are several resources available online (here, for example, you can find the online version of a popular textbook).

We are preparing two manuscripts to summarize our findings on the effect of proning on COVID-19 patients and answer the two questions mentioned above. They will be submitted for peer review to ensure the methodology is sound, and subsequently released as open source, along with the code.

Stay tuned!




Pacmed builds decision support tools for doctors based on machine learning that makes sure patients only receive care that has proven to work for them!