Causality refers to the study of causal Bayesian networks (causal models). Causal models are Bayesian networks in which every directed edge between variables denotes a causal, not merely correlational, relationship between the variables. Causal models are useful in situations where actions (interventions) that affect model variables must be taken into account.

A **Bayesian network** is a graph in which each node represents a random variable and directed edges (arrows) connect nodes together. When viewing such a network it is tempting to think of the arrows between nodes as expressing the causal influence of one variable on another variable. For example, the Bayesian network

:$C \rightarrow T\,\!$

can represent the relationship between cavities $(C)\,\!$ and toothaches $(T)\,\!$. The network seems to imply that cavities cause toothaches, which is consistent with intuition. However, Bayesian networks do not *necessarily* show the causal influences between variables. For example, the Bayesian network

:$C \leftarrow T\,\!$

is just as valid as the previous network. But in this case the network would seem to imply that toothaches cause cavities, which is counter to intuition (and reality).

Instead of encoding causal relationships, Bayesian networks encode conditional independence relationships between variables. The first network asserts that $T\,\!$ is conditionally independent of all other variables given $C\,\!$. The second network asserts that $C\,\!$ is conditionally independent of all other variables given $T\,\!$. Both Bayesian networks are valid because both conditional independence assertions are valid.
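To make this concrete, here is a small sketch (with invented numbers) showing that both factorizations, $P(C)P(T|C)$ for $C \rightarrow T$ and $P(T)P(C|T)$ for $C \leftarrow T$, recover exactly the same joint distribution, which is why both orientations are valid Bayesian networks:

```python
# Hypothetical joint distribution over cavity (C) and toothache (T),
# chosen arbitrarily for illustration.
joint = {(0, 0): 0.72, (0, 1): 0.08, (1, 0): 0.05, (1, 1): 0.15}

# Marginals P(C) and P(T).
p_c = {c: sum(p for (ci, _), p in joint.items() if ci == c) for c in (0, 1)}
p_t = {t: sum(p for (_, ti), p in joint.items() if ti == t) for t in (0, 1)}

# Factorization 1: C -> T, i.e. joint = P(C) * P(T|C).
p_t_given_c = {(t, c): joint[(c, t)] / p_c[c] for c in (0, 1) for t in (0, 1)}
# Factorization 2: C <- T, i.e. joint = P(T) * P(C|T).
p_c_given_t = {(c, t): joint[(c, t)] / p_t[t] for c in (0, 1) for t in (0, 1)}

for c in (0, 1):
    for t in (0, 1):
        f1 = p_c[c] * p_t_given_c[(t, c)]   # from C -> T
        f2 = p_t[t] * p_c_given_t[(c, t)]   # from C <- T
        assert abs(f1 - joint[(c, t)]) < 1e-9
        assert abs(f2 - joint[(c, t)]) < 1e-9
```

Both networks carry the same observational information; they differ only in which conditional independence statements they make explicit.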

**Causal models** are Bayesian networks where the directed arrows between nodes *do* represent causal connections between the variables. In a causal model, the parents of a node are the direct causes of the node.

The networks $C \rightarrow T\,\!$ and $C \leftarrow T\,\!$ are both valid Bayesian networks and both can be used to answer traditional probabilistic, or **observational queries** such as $P(C|T)\,\!$, or “what is the probability of a cavity given that we observe the presence of a toothache?” But in some situations it is useful to be able to answer **interventional queries** such as $P(C|do(t))\,\!$, or “what is the probability of a person having a cavity given that we punch the person in the teeth, causing the person to have a toothache?” The notation $do(t)\,\!$ means that the variable $T\,\!$ is manipulated such that it takes on the value $t\,\!$. Causal models allow us to answer such questions whereas Bayesian networks do not.

One reason this is true can be seen by trying to answer the query $P(C|do(t))\,\!$ using the above networks. Let's first assume that cavities cause toothaches and therefore the first network represents the causal relationship between $C\,\!$ and $T\,\!$. In this case intuition asserts correctly that $P(C|do(t))=P(C)\,\!$; if we punch a person in the teeth they will have a toothache whether they have a cavity or not and therefore the toothache gives us no information about the presence of a cavity. Now let's assume that toothaches cause cavities and therefore the second network represents the causal relationship between $C\,\!$ and $T\,\!$. In this case $P(C|do(t))=P(C|t)\,\!$; because toothaches cause cavities, giving someone a toothache will influence the chance that they develop a cavity. From this we see that the answer to an interventional query is dependent on the network. If the network is a causal model we get the correct answer (*e.g.*, $P(C|do(t))=P(C)\,\!$ in this case) and if the model is an arbitrary Bayesian network we (likely) get an incorrect answer (*e.g.*, $P(C|do(t))=P(C|t)\,\!$).
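The reasoning above can be sketched numerically. The numbers below are invented for illustration; the key point is that intervening on $T\,\!$ severs the edges coming into $T\,\!$ ("graph surgery"), so in the causal network $C \rightarrow T$ the intervention leaves $C\,\!$ at its prior, while ordinary conditioning does not:

```python
# Assumed parameters for the causal network C -> T (illustrative numbers).
p_c = {0: 0.8, 1: 0.2}                       # prior P(C)
p_t_given_c = {0: {0: 0.9, 1: 0.1},          # P(T|C)
               1: {0: 0.3, 1: 0.7}}

# Joint distribution under C -> T.
joint = {(c, t): p_c[c] * p_t_given_c[c][t] for c in (0, 1) for t in (0, 1)}

# Observational query P(C=1 | T=1): ordinary conditioning on the joint.
p_t1 = sum(joint[(c, 1)] for c in (0, 1))
obs = joint[(1, 1)] / p_t1

# Interventional query P(C=1 | do(T=1)) when C -> T is causal:
# cutting the edge into T leaves C untouched, so the answer is just P(C=1).
interv = p_c[1]

# If instead T -> C were the causal direction, do(T=1) would give the
# observational answer P(C=1 | T=1), i.e. the value of `obs` above.
assert obs > interv   # observing a toothache is informative; causing one is not
```

Here `obs` is about 0.64 while `interv` is 0.2, which is exactly the gap between learning about a cavity from a toothache and punching someone in the teeth.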

In some situations it is possible to reduce an interventional query to an observational query. We have done that in an ad hoc manner above when we stated that $P(C|do(t))=P(C)\,\!$ in the first network. Koller gives three simplification rules (pgs. 1018-19) that define when these simplifications can be performed. There are situations, however, when an interventional query *cannot* be simplified.

It is important to use a causal model when answering interventional queries. Unfortunately, it is usually very difficult to fully specify a causal model. One difficulty is distinguishing between the causal influence (if any) of one variable on another and the influence of a latent (unobserved) variable on both. For instance, consider the possibility that the tooth fairy's evil twin ($F\,\!$) causes both cavities and toothaches. This is modeled as follows:

:$C \leftarrow F \rightarrow T\,\!$

where $C\,\!$ and $T\,\!$ are observed and $F\,\!$ is unobserved. In this case there is no causal connection between cavities and toothaches. If the tooth fairy's twin is out to confuse us he could cause cavities and toothaches in such a way as to make it impossible to distinguish (from observational data) this model from the original model:

:$C \rightarrow T \,\!$.

Again, in order to answer interventional queries it is important that we have the right model. This is demonstrated in the following table.

Query/Model | $C \rightarrow T \,\!$ | $C \leftarrow F \rightarrow T\,\!$ |
---|---|---|
$P(C|do(t))\,\!$ | $P(C)\,\!$ | $P(C)\,\!$ |
$P(T|do(c))\,\!$ | $P(T|c)\,\!$ | $P(T)\,\!$ |
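The claim that $P(T|do(c)) = P(T)\,\!$ in the confounded model can be checked numerically. In $C \leftarrow F \rightarrow T$ there is no causal path from $C\,\!$ to $T\,\!$, so intervening on $C\,\!$ leaves $T\,\!$ at its marginal, even though observationally $P(T|c) \neq P(T)\,\!$. A sketch with invented numbers:

```python
# Assumed parameters for C <- F -> T, with F latent (illustrative numbers).
p_f = {0: 0.9, 1: 0.1}                                   # confounder F
p_c_given_f = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.2, 1: 0.8}}
p_t_given_f = {0: {0: 0.9, 1: 0.1},   1: {0: 0.25, 1: 0.75}}

joint = {(f, c, t): p_f[f] * p_c_given_f[f][c] * p_t_given_f[f][t]
         for f in (0, 1) for c in (0, 1) for t in (0, 1)}

# Observational quantities.
p_t1 = sum(joint[(f, c, 1)] for f in (0, 1) for c in (0, 1))      # P(T=1)
p_c1 = sum(joint[(f, 1, t)] for f in (0, 1) for t in (0, 1))      # P(C=1)
p_t1_given_c1 = sum(joint[(f, 1, 1)] for f in (0, 1)) / p_c1      # P(T=1|C=1)

# Interventional query: do(C=1) severs F -> C, so T's distribution is
# untouched and P(T=1 | do(C=1)) = sum_f P(f) P(T=1|f) = P(T=1).
p_t1_do_c1 = sum(p_f[f] * p_t_given_f[f][1] for f in (0, 1))

assert abs(p_t1_do_c1 - p_t1) < 1e-9      # intervention equals the marginal
assert p_t1_given_c1 > p_t1               # observation is still informative
```

Observing a cavity raises the probability of a toothache (via the confounder), but causing a cavity does nothing to it, since the model has no $C \rightarrow T$ edge.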

In general, anything that makes it hard to determine the causal structure between variables is called a **confounding factor**. A latent variable such as $F\,\!$ is one example of a confounding factor.

An evil tooth fairy is not the only confounding latent variable we could add to the model. Other factors, such as genetics, certain diseases, and whether a person is a bully who punches other people's teeth or the person being punched, are all latent variables that could influence cavities and toothaches. Instead of trying to model each of these variables, we can use what are called **response variables** to model them all.

We will add a response variable to the cavity/toothache problem. First, we assume that the variables are binary-valued and that cavities really do cause toothaches, leading to the use of this model:

:$C \rightarrow T\,\!$.

We also look at all the possible ways in which latent variables could affect this model.

There are four different effects that latent variables could have on the model. These can be illustrated using four different example patients. The first patient has the nerves in his mouth deadened (through painkillers, for instance) and won't have a toothache no matter what. The second patient is normal. The third patient angered the tooth fairy's twin, who cast a spell on him that causes him to have a toothache when he doesn't have a cavity and no toothache when he does have a cavity. The fourth patient has a disease which causes him to have a toothache no matter what. This is summarized in the table:

Nerves | | Normal | | Trick | | Disease | |
---|---|---|---|---|---|---|---|
$C\,\!$ | $T\,\!$ | $C\,\!$ | $T\,\!$ | $C\,\!$ | $T\,\!$ | $C\,\!$ | $T\,\!$ |
0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
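The table above can be read as a deterministic function $T = f(C, U^T)\,\!$: each of the four patient types picks one of the four possible mappings from $C\,\!$ to $T\,\!$. A minimal sketch:

```python
def toothache(c: int, u_t: str) -> int:
    """Deterministic response: T as a function of cavity C and type U^T."""
    return {
        "Nerves":  0,        # deadened nerves: never a toothache
        "Normal":  c,        # toothache exactly when there is a cavity
        "Trick":   1 - c,    # spell: toothache exactly when there is no cavity
        "Disease": 1,        # disease: always a toothache
    }[u_t]

# Reproduce the outcome rows of the table.
for u in ("Nerves", "Normal", "Trick", "Disease"):
    print(u, toothache(0, u), toothache(1, u))
```

With binary $C\,\!$ and $T\,\!$ there are exactly $2^2 = 4$ functions from $C\,\!$ to $T\,\!$, which is why four response values suffice to cover every possible latent influence.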

We now add the response variable $U^T\,\!$ to the model to get:

:$C \rightarrow T \leftarrow U^T\,\!$.

The response variable $U^T\,\!$ can take on one of four values: Nerves, Normal, Trick, and Disease. $U^T\,\!$ is never observed because it is an abstraction of latent variables (all of the latent variables that could affect $C\,\!$ and $T\,\!$, in fact).

However, if $U^T\,\!$ were observed, then observing $C\,\!$ would allow you to determine the exact value of $T\,\!$ by simply consulting the table above. The value of $T\,\!$ can be written as a deterministic function of $C\,\!$ and $U^T\,\!$. For this reason, causal models that include response variables are called **functional causal models**.

In this model, adding a response variable to $C\,\!$ (which would be named $U^C\,\!$) doesn't add anything because all the effects of latent variables on $C\,\!$ are already summarized in the prior $P(C)\,\!$.

We don't ever observe the value of $U^T\,\!$ and therefore we cannot gather statistics to determine $P(U^T)\,\!$. However, we *can* relate $P(U^T)\,\!$ to the observed probability distribution using the following equations:

:$ \begin{align} P(T=0|C=0) & = P(U^T=Nerves) + P(U^T=Normal) \\ P(T=1|C=0) & = P(U^T=Trick) + P(U^T=Disease) \\ P(T=0|C=1) & = P(U^T=Trick) + P(U^T=Nerves) \\ P(T=1|C=1) & = P(U^T=Normal) + P(U^T=Disease) \end{align} $

Let's look at the first equation. Given $C=0\,\!$, there are two possible ways in which $T=0\,\!$: if the patient is normal ($U^T=Normal\,\!$) or if the patient has deadened nerves ($U^T=Nerves\,\!$). Therefore, given $C=0\,\!$, the conditional probability of $T=0\,\!$ is $P(U^T=Nerves) + P(U^T=Normal)\,\!$. Note that $U^T=Trick\,\!$ and $U^T=Disease\,\!$ are not consistent with the left-hand side of the equation. If $U^T=Trick\,\!$ then the person would have a toothache ($T=1\,\!$) given that $C=0\,\!$. Likewise, if $U^T=Disease\,\!$ then the person would have a toothache ($T=1\,\!$) no matter what. Similar reasoning leads to the other three equations.

This set of equations sets constraints on the probability distribution $P(U^T)\,\!$. So even though we never observe $U^T\,\!$ and therefore cannot compute $P(U^T)\,\!$ directly, we *can* set constraints on what values the distribution can take on. We can also set priors over $P(U^T)\,\!$ and update them (*i.e.*, calculate the posteriors) after observing data.
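Here is a sketch of how these constraints work in practice. Given observed values for $P(T=1|C=0)\,\!$ and $P(T=1|C=1)\,\!$ (the numbers below are invented), the four equations determine $P(U^T)\,\!$ up to a single free parameter, which is confined to an interval:

```python
# Observed conditionals (illustrative numbers).
a = 0.3   # P(T=1 | C=0)
b = 0.8   # P(T=1 | C=1)

# Writing d = P(U^T = Disease), the four equations in the text force
#   P(Trick) = a - d,  P(Normal) = b - d,  P(Nerves) = 1 - a - b + d,
# so non-negativity confines d to an interval:
d_lo = max(0.0, a + b - 1.0)
d_hi = min(a, b)

for d in (d_lo, d_hi):
    dist = {"Nerves": 1 - a - b + d, "Normal": b - d,
            "Trick": a - d, "Disease": d}
    assert abs(sum(dist.values()) - 1.0) < 1e-9     # a valid distribution
    assert all(p >= -1e-9 for p in dist.values())
    # The distribution reproduces the observed conditionals:
    assert abs(dist["Trick"] + dist["Disease"] - a) < 1e-9
    assert abs(dist["Normal"] + dist["Disease"] - b) < 1e-9
```

Any $d$ in $[d_{lo}, d_{hi}]$ yields a distribution over $U^T\,\!$ consistent with the data, which is exactly the sense in which the observations constrain, but do not determine, $P(U^T)\,\!$.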

What does this buy us? Well, in this example it doesn't really buy us anything because any interventional query in this example can be reduced to an observational query. However, it is useful in situations that are (even slightly) more complex in which we cannot reduce interventional queries to observational queries. In these situations we cannot calculate an exact value for the interventional query, but through the use of response variables we *can* calculate bounds on the value (that are sometimes quite tight). Koller has an example of this on pages 1032-33.

**Counterfactual queries** are queries such as “what is the probability of Germany invading England during World War II had Winston Churchill not been born?” Counterfactual queries ask about the probability of an event happening in an alternate universe that is exactly the same as our own universe *except for a given number of details*, details such as “Winston Churchill was never born”. The framework of causality and causal models gives us tools to define these queries very precisely and to answer them, or at least to put bounds on the answers to them.

An important tool in answering counterfactual queries are **counterfactual twinned networks**. These networks model both the real universe *and* the alternate universe that is involved in the counterfactual query. An example is the following:

:$C \rightarrow T \leftarrow U^T \rightarrow T^\prime \leftarrow C^\prime\,\!$

where $C^\prime\,\!$ and $T^\prime\,\!$ refer to the variables in the alternate universe. Answering a counterfactual query is essentially the same thing as answering an interventional query in this new network. We just run an inference algorithm (like Metropolis) to determine the probability of one node given 1) the values of whatever other nodes are observed and 2) the fact that we perform an intervention on one or more other nodes.

Note that the real universe and the alternate universe variables are connected through response variables; this implies that the underlying, unobserved causal mechanisms in the world do not change from the real to the alternate universe. The only thing that changes is the fact that we perform an intervention in the alternate-universe portion of the network.

Here is an example counterfactual query in words: “what is the probability that Jim would have had a toothache had we caused him to have a cavity, given that he didn't have a cavity and doesn't have a toothache?” Formally this would be written as:

:$P(T^\prime|do(c^\prime),C=0, T=0)\,\!$.
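This query can be answered on the twinned network by the three-step recipe of abduction (update beliefs about $U^T\,\!$ from the real-world evidence), action (perform the intervention in the alternate universe), and prediction. A sketch in Python, with an assumed prior over $U^T\,\!$ (the numbers are illustrative, not from the text):

```python
def toothache(c: int, u_t: str) -> int:
    """Deterministic response T = f(C, U^T) from the response-variable table."""
    return {"Nerves": 0, "Normal": c, "Trick": 1 - c, "Disease": 1}[u_t]

# Assumed prior over the response variable (invented for illustration).
prior = {"Nerves": 0.1, "Normal": 0.6, "Trick": 0.05, "Disease": 0.25}

# 1. Abduction: condition on the real-world evidence C=0, T=0.
#    Only response types consistent with the evidence survive.
post = {u: p for u, p in prior.items() if toothache(0, u) == 0}
z = sum(post.values())
post = {u: p / z for u, p in post.items()}

# 2. Action: intervene do(C'=1) in the alternate universe.
# 3. Prediction: U^T is shared between the two universes, so
#    P(T'=1 | do(c'), C=0, T=0) is a sum over the abduced posterior.
p_t_prime_1 = sum(p for u, p in post.items() if toothache(1, u) == 1)
print(p_t_prime_1)    # 0.6 / (0.1 + 0.6)
```

Only Nerves and Normal are consistent with Jim having neither a cavity nor a toothache, so the answer reduces to the posterior weight on Normal, the one surviving type for which a cavity produces a toothache.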

What is the difference between a Bayesian network and a causal model? What is the difference between an interventional query and an observational query?

- D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. London: The MIT Press, 2009, Chapter 21.
- J. Pearl, Causality: Models, Reasoning, and Inference, 2nd Edition. Cambridge University Press, 2009.