The following equation provides the distance or divergence measure needed to compare distributions:

[[media:​cs-677sp10:​KL_distance_equation.png|200px]]

This equation may also be seen in the form: [[media:​cs-677sp10:​KL_distance_equation2.png|100px]]

The Kullback-Leibler divergence does not meet all the requirements to be called a distance metric – namely symmetry and the triangle inequality. However, it does have the positivity property as well as other useful features, such as the ability to include conditionals and to apply the chain rule.
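The equation images above define the divergence; as a minimal sketch, the discrete form of KL divergence can be computed directly. The distributions `p` and `q` below are made-up examples, chosen only to illustrate positivity and the lack of symmetry noted above:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions.

    Terms where p_i == 0 contribute zero by convention.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]    # hypothetical example distributions
q = [0.25, 0.25, 0.5]

print(kl_divergence(p, q))   # positive whenever p != q
print(kl_divergence(q, p))   # generally a different value: KL is not symmetric
```

Note that `kl_divergence(p, q)` and `kl_divergence(q, p)` disagree, which is exactly why KL fails the symmetry requirement of a true metric.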
For example:

[[media:​cs-677sp10:​KL_distance_equation_conditional.png]]

[[media:​cs-677sp10:​KL_distance_equation_chain_rule.png]]

==Formulating the Problem==

Starting with the joint distribution and the chain rule we have:

[[media:​cs-677sp10:​VB_problem_formulation.png]]

Because we’re in log land, we can add the same term to the denominator of the two terms in a subtraction and not change the result:

[[media:​cs-677sp10:​VB_problem_formulation2.png]]

[[media:​cs-677sp10:​VB_problem_formulation3.png]] contains the free distribution of our choosing, and because it is a pdf, it will integrate to 1 on the left hand side. The other side can be re-written in terms of the KL-divergence function and the lower bound function.
First, the second half of the right hand side is negative; the negative sign can be thought of as coming from flipping the inside of the log upside down and extracting the negative out to the front:

[[media:​cs-677sp10:​KL_distance_rewrite.png]]

This becomes the second half of the right hand side of the equation, and because it is the distance between our distribution and the true distribution, it is the part we will try to minimize. Conversely, the first half of the right hand side must be the lower bound for our posterior approximation, which will be shorthanded to [[media:​cs-677sp10:​Lower_bound_rewrite.png]]. The final equation looks like:

[[media:​cs-677sp10:​Problem_Formulated.png]]

The two parts work together to reproduce the left hand side.
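This decomposition can be sanity-checked numerically. The sketch below uses a hypothetical two-state model (a binary latent variable z and one fixed observation x; all probabilities are invented for illustration) and verifies that the lower bound and the KL term add up to the log evidence:

```python
import numpy as np

# Hypothetical two-state model: latent z in {0, 1}, one fixed observation x.
prior = np.array([0.6, 0.4])      # p(z)
lik = np.array([0.2, 0.7])        # p(x | z) evaluated at the observed x
joint = prior * lik               # p(x, z)
evidence = joint.sum()            # p(x)
posterior = joint / evidence      # p(z | x)

q = np.array([0.5, 0.5])          # an arbitrary free distribution q(z)

lower_bound = float(np.sum(q * np.log(joint / q)))      # the lower bound L(q)
kl_term = float(np.sum(q * np.log(q / posterior)))      # KL(q || p(z|x))

# The two parts reproduce the left hand side: log p(x) = L(q) + KL(q || p(z|x))
print(np.log(evidence), lower_bound + kl_term)
```

Because the KL term is non-negative, the first term really is a lower bound on the log evidence for any choice of q.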
However, we’d like to get our lower bound as close to the true distribution as possible, so maximizing [[media:​cs-677sp10:​Lower_Bound.png]] also equates to minimizing [[media:​cs-677sp10:​KL_distance_equation_half.png]], and can be visualized as follows:

[[media:​cs-677sp10:​Problem_Formulation_Visual.png]]

==Choosing the Right Distribution==

…

1)

[[media:​cs-677sp10:​Simple_structure.PNG|200px]]

2)

[[media:​cs-677sp10:​Simple_structure2.PNG|200px]]

==Iterate Till Convergence==
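The claim above – that maximizing the lower bound is the same as minimizing the KL term – can be illustrated by iterating over a family of candidate distributions. This is only a brute-force sketch on the same hypothetical two-state model (all numbers invented), not the coordinate-update scheme itself:

```python
import numpy as np

# Hypothetical two-state model (numbers invented for illustration).
prior = np.array([0.6, 0.4])
lik = np.array([0.2, 0.7])
joint = prior * lik
evidence = joint.sum()
posterior = joint / evidence      # true p(z|x)

# Sweep a one-parameter family of free distributions q(z) = [t, 1 - t].
thetas = np.linspace(0.01, 0.99, 99)
bounds, kls = [], []
for t in thetas:
    q = np.array([t, 1 - t])
    bounds.append(float(np.sum(q * np.log(joint / q))))       # L(q)
    kls.append(float(np.sum(q * np.log(q / posterior))))      # KL(q || p(z|x))

# The q that maximizes the lower bound is the q that minimizes the KL term,
# and it coincides with the true posterior; the bound becomes tight there.
print(thetas[np.argmax(bounds)], thetas[np.argmin(kls)], posterior[0])
```

At the maximizing q the KL term vanishes and the lower bound equals the log evidence, which is exactly the picture the visualization above describes.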