In the variable elimination algorithm, the basic computational step is the manipulation of factors. In the clique tree view, a factor $\phi_i$ is viewed as a computational data structure, which receives messages generated by other factors and sends a message that is used by yet another factor.
A cluster graph for a set of factors $F$ is an undirected graph with the following properties:

- Each node $i$ is associated with a cluster $C_i$, a subset of the variables.
- Family preservation: each factor $\phi \in F$ has its scope contained in some cluster $C_i$.
- Each edge between clusters $C_i$ and $C_j$ is labeled with a sepset $S_{i,j} \subseteq C_i \cap C_j$.
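As a minimal sketch of these properties in code, where the cluster contents and factor scopes are illustrative placeholders loosely following the example network used later:

```python
# Sketch of a cluster graph: clusters, sepset-labeled edges, factor scopes.
# All contents below are illustrative placeholders.
clusters = {
    1: {"C", "D"},
    2: {"D", "I", "G"},
}
edges = {(1, 2): {"D"}}                     # edge (i, j) labeled with sepset S_ij
factor_scopes = [{"C"}, {"C", "D"}, {"G", "I", "D"}]

# Family preservation: every factor's scope fits inside some cluster.
assert all(any(scope <= c for c in clusters.values()) for scope in factor_scopes)

# Sepset condition: S_ij must be a subset of the intersection of C_i and C_j.
assert all(s <= clusters[i] & clusters[j] for (i, j), s in edges.items())
```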
Example: from a variable elimination execution to a cluster tree
Fig.1 Original graph
Fig.2 Cluster tree after the variable elimination execution
An execution of variable elimination defines a cluster graph.
A clique tree is a cluster tree that satisfies the running intersection property: whenever a variable $X$ appears in two cliques $C_i$ and $C_j$, it must also appear in every clique on the (unique) path between them.
So the tree shown in Fig.2 is also a clique tree: each box represents a clique $C_i$, and each edge between two cliques is labeled with their sepset $S_{i,j}$.
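The running intersection property can be checked mechanically: for each variable, the cliques containing it must induce a connected subtree. Here is a small sketch; the clique contents follow the cliques named later in this example, and $C_1$'s exact scope is an assumption.

```python
from collections import deque

def connected(nodes, edges):
    """Is the subgraph induced by `nodes` connected? (breadth-first search)"""
    nodes = set(nodes)
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adj[a].add(b)
            adj[b].add(a)
    seen, queue = set(), deque([next(iter(nodes))])
    while queue:
        n = queue.popleft()
        if n in seen:
            continue
        seen.add(n)
        queue.extend(adj[n] - seen)
    return seen == nodes

cliques = {1: {"C", "D"}, 2: {"G", "I", "D"},
           3: {"G", "S", "I"}, 4: {"G", "J", "S", "L"}}
tree_edges = [(1, 2), (2, 3), (3, 4)]

for v in set().union(*cliques.values()):
    holders = [i for i, c in cliques.items() if v in c]
    assert connected(holders, tree_edges), f"running intersection fails for {v}"
print("running intersection property holds")
```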
Given a clique tree as a starting point, we can use this data structure to perform variable elimination. Furthermore, the clique tree allows multiple executions of variable elimination to be performed much more efficiently than simply performing each one separately.
Here is an example of using a clique tree to perform variable elimination.
Fig.3 A possible clique tree given at the start
Notice that this clique tree is different from the one in Fig.2, which was obtained from the variable elimination execution. One can verify that this clique tree satisfies the definition.
Let's define one term before presenting the clique tree algorithm.
The initial potential $\pi^0_i(C_i)$ is computed by multiplying the initial factors assigned to the clique $C_i$. For example, $\pi^0_4(G,J,S,L) = P(L|G)\,P(J|S,L)$.
Fig.4 Initial potentials
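As a sketch of how an initial potential might be computed in code, the following multiplies two CPD factors. The dict-based factor representation, the binary domains, and the uniform 0.5 entries are assumptions for illustration, not the network's real CPDs.

```python
from itertools import product as cartesian

# All variables binary; CPD entries below are made-up placeholders.
domains = {v: (0, 1) for v in "GJSL"}

def factor_product(f1, f2):
    """A factor is (scope, table): scope is a tuple of variable names, and
    table maps assignment tuples (ordered like scope) to reals."""
    s1, t1 = f1
    s2, t2 = f2
    scope = tuple(dict.fromkeys(s1 + s2))        # ordered union of scopes
    table = {}
    for asst in cartesian(*(domains[v] for v in scope)):
        a = dict(zip(scope, asst))
        table[asst] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

# P(L|G) and P(J|S,L) with placeholder numbers.
P_L_given_G = (("L", "G"),
               {(l, g): 0.5 for l in (0, 1) for g in (0, 1)})
P_J_given_SL = (("J", "S", "L"),
                {(j, s, l): 0.5 for j in (0, 1) for s in (0, 1) for l in (0, 1)})

pi0_4 = factor_product(P_L_given_G, P_J_given_SL)
print(pi0_4[0])   # scope of the initial potential: ('L', 'G', 'J', 'S')
```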
Now, assume that we want to compute the probability $P(J)$. We run the variable elimination process in such a way that $J$ is not eliminated: we select a clique containing $J$ as our root clique, for example $C_4$.
Each clique $C_i$, except for the root, performs a message passing computation and sends a message to its upstream neighbor $C_j$. The clique $C_i$ multiplies all incoming messages from its other neighbors with its initial potential, resulting in a factor whose scope is the clique. It then sums out all variables except those in the sepset $S_{i,j}$ between $C_i$ and $C_j$, and sends the resulting factor as a message to $C_j$.
Fig.5 Message passing
In Fig.5, for example, the clique $C_2(G,I,D)$ generates a factor (message) by multiplying the message $\delta_{1 \rightarrow 2}(D)$ from $C_1$ with its initial potential $\pi^0_2(G,I,D)$ and then summing out the variable $D$. The resulting factor (message), whose scope is $\{G, I\}$, is then sent to clique $C_3(G,S,I)$.
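To make this step concrete, here is a small sketch of the single message computation $\delta_{2 \rightarrow 3} = \sum_D \delta_{1 \rightarrow 2} \cdot \pi^0_2$. Binary variables and placeholder numbers are assumptions; `factor_product` is the same assumed helper as in the earlier sketch.

```python
from itertools import product as cartesian

domains = {v: (0, 1) for v in "DIG"}

def factor_product(f1, f2):
    s1, t1 = f1
    s2, t2 = f2
    scope = tuple(dict.fromkeys(s1 + s2))
    table = {}
    for asst in cartesian(*(domains[v] for v in scope)):
        a = dict(zip(scope, asst))
        table[asst] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

def sum_out(f, var):
    """Marginalize `var` out of factor f."""
    scope, table = f
    new_scope = tuple(v for v in scope if v != var)
    out = {}
    for asst, val in table.items():
        key = tuple(x for v, x in zip(scope, asst) if v != var)
        out[key] = out.get(key, 0.0) + val
    return new_scope, out

delta_1_to_2 = (("D",), {(0,): 0.6, (1,): 0.4})          # incoming message
pi0_2 = (("G", "I", "D"),                                # initial potential
         {a: 0.125 for a in cartesian((0, 1), repeat=3)})

delta_2_to_3 = sum_out(factor_product(delta_1_to_2, pi0_2), "D")
print(delta_2_to_3[0])   # scope of the outgoing message: ('G', 'I')
```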
We say that a clique is ready when it has received all of its incoming messages. Thus, in this example, the leaf clique $C_1$ is ready at the very start of the algorithm, and the computation associated with it can be performed at any point in the execution.
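This readiness condition induces a valid message schedule. The sketch below, using the clique indices of this example, repeatedly picks a non-root clique that has heard from all neighbors but one:

```python
# Schedule messages toward a chosen root: a clique is "ready" once it has
# heard from all neighbors except the one it will send to.
tree_edges = [(1, 2), (2, 3), (3, 4)]
root = 4

def upward_schedule(edges, root):
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    received = {n: set() for n in neighbors}
    pending = set(neighbors) - {root}
    order = []
    while pending:
        # a non-root clique i is ready when all neighbors but one have sent to it
        i = next(i for i in pending if len(neighbors[i] - received[i]) == 1)
        (j,) = neighbors[i] - received[i]    # the remaining neighbor is upstream
        order.append((i, j))
        received[j].add(i)
        pending.remove(i)
    return order

print(upward_schedule(tree_edges, root))     # e.g. [(1, 2), (2, 3), (3, 4)]
```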
If we want to compute the posterior distribution over every random variable in the network with the least computation, we can use the Shafer-Shenoy algorithm: pick a root, pass messages upward from the leaves to the root, and then pass messages downward from the root back to the leaves. After these two passes, every clique has received all of its incoming messages.
Fig.6 Message passing in two passes
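Below is a compact sketch of this two-pass calibration on the chain-shaped clique tree of this example. The initial potentials are uniform placeholders rather than the network's actual CPDs, all variables are treated as binary, and the helper names (`fproduct`, `marginalize_to`, `uniform`) are my own.

```python
from itertools import product as cartesian

domains = {v: (0, 1) for v in "CDIGSLJ"}

def fproduct(f1, f2):
    s1, t1 = f1; s2, t2 = f2
    scope = tuple(dict.fromkeys(s1 + s2))
    table = {}
    for asst in cartesian(*(domains[v] for v in scope)):
        a = dict(zip(scope, asst))
        table[asst] = t1[tuple(a[v] for v in s1)] * t2[tuple(a[v] for v in s2)]
    return scope, table

def marginalize_to(f, keep):
    """Sum out every variable of f that is not in `keep`."""
    scope, table = f
    new_scope = tuple(v for v in scope if v in keep)
    out = {}
    for asst, val in table.items():
        key = tuple(x for v, x in zip(scope, asst) if v in keep)
        out[key] = out.get(key, 0.0) + val
    return new_scope, out

def uniform(scope):
    return scope, {a: 1.0 for a in cartesian(*(domains[v] for v in scope))}

cliques = {1: ("C", "D"), 2: ("G", "I", "D"),
           3: ("G", "S", "I"), 4: ("G", "J", "S", "L")}
edges = [(1, 2), (2, 3), (3, 4)]
pi0 = {i: uniform(c) for i, c in cliques.items()}   # placeholder potentials

neighbors = {i: set() for i in cliques}
for a, b in edges:
    neighbors[a].add(b); neighbors[b].add(a)

def message(i, j, delta):
    """delta_{i->j}: multiply pi0_i by messages from all neighbors but j,
    then marginalize onto the sepset (intersection of C_i and C_j)."""
    f = pi0[i]
    for k in neighbors[i] - {j}:
        f = fproduct(delta[(k, i)], f)
    return marginalize_to(f, set(cliques[i]) & set(cliques[j]))

delta = {}
# upward pass (leaves to root C4), then downward pass (root back to leaves);
# the order is hard-coded for this chain-shaped tree
for i, j in [(1, 2), (2, 3), (3, 4)] + [(4, 3), (3, 2), (2, 1)]:
    delta[(i, j)] = message(i, j, delta)

# beliefs: each clique's initial potential times all incoming messages
beliefs = {}
for i in cliques:
    f = pi0[i]
    for k in neighbors[i]:
        f = fproduct(delta[(k, i)], f)
    beliefs[i] = f

# e.g. the (unnormalized) marginal of J, read off the root clique
print(marginalize_to(beliefs[4], {"J"}))
```

With the real initial potentials in place of the uniform placeholders, each `beliefs[i]` would be proportional to the joint marginal over $C_i$, so any variable's marginal can be read off any clique containing it.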
A key point is that the resulting marginal distribution does not depend on which clique we select. That is, if a variable $X$ appears in two cliques, they must agree on its marginal. This leads to the definition of a calibrated clique tree: two adjacent cliques $C_i$ and $C_j$ are calibrated if $\sum_{C_i \setminus S_{i,j}} \beta_i(C_i) = \sum_{C_j \setminus S_{i,j}} \beta_j(C_j)$, where the belief $\beta_i(C_i)$ is the product of the initial potential $\pi^0_i$ with all incoming messages; a clique tree is calibrated when every pair of adjacent cliques is calibrated.
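As a tiny numeric illustration of this agreement condition, assume two neighboring beliefs over $(A,B)$ and $(B,C)$ with sepset $\{B\}$; the numbers below are made up so that calibration holds.

```python
# Calibration check: neighboring beliefs must agree on the sepset marginal.
beta_i = {(0, 0): .1, (0, 1): .2, (1, 0): .3, (1, 1): .4}    # over (A, B)
beta_j = {(0, 0): .25, (0, 1): .15, (1, 0): .35, (1, 1): .25} # over (B, C)

marg_i = {b: sum(p for (a, bb), p in beta_i.items() if bb == b) for b in (0, 1)}
marg_j = {b: sum(p for (bb, c), p in beta_j.items() if bb == b) for b in (0, 1)}
assert marg_i == marg_j   # both give the sepset marginal over B: {0: 0.4, 1: 0.6}
```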
An alternative approach to computing the message $\delta_{i \rightarrow j}$ is to multiply in all of the incoming messages, yielding the belief $\beta_i$, and then divide the marginalized result by the message coming back from $C_j$:

$$\delta_{i \rightarrow j} = \dfrac{\sum_{C_i \setminus S_{i,j}} \beta_i(C_i)}{\delta_{j \rightarrow i}}$$
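A minimal sketch of this division-based update over a single binary sepset variable, with placeholder numbers:

```python
# Marginalize the current belief onto the sepset, then divide out the
# message previously received from the neighbor j.
sepset_marginal = {(0,): 0.3, (1,): 0.42}   # sum over C_i \ S_ij of beta_i
delta_j_to_i = {(0,): 0.6, (1,): 0.7}       # message previously sent j -> i

delta_i_to_j = {a: sepset_marginal[a] / delta_j_to_i[a] for a in sepset_marginal}
print(delta_i_to_j)   # approximately {(0,): 0.5, (1,): 0.6}
```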
In principle, both use the same basic operations of multiplying factors and summing out variables, and the cliques in a clique tree essentially play the role of the factors in variable elimination. In practice, clique trees have a key advantage: a single calibration (two passes of messages) yields the marginals of all variables at once, and the computed messages can be cached and reused, whereas plain variable elimination must be rerun from scratch for each query.
Given a Bayesian network, construct your own clique tree and compute the posterior marginal distribution of each node in the network, using the calibrated clique tree algorithm.