
cs-677:continuous-evsi, 2015/01/06 21:22 (current), created by ryancha

==Simple Decisions in the Continuous Case==

Suppose we have a normally distributed node:

$\theta \sim N(\theta_0, \sigma_0^2)$ whose PDF will be referred to as $g(\theta)\,$.

We will make a decision between two choices. Our utility depends on the value of $\theta$ and on the choice we make. We assume that, for each choice, the utility function is linear in $\theta$:

$U(1, \theta) = m_1\theta + b_1\,$

$U(2, \theta) = m_2\theta + b_2\,$

===Expected Prior Utility===

The expected utility associated with the "best" decision in this case is given by:

$\max_i E_\theta[U(i, \theta)]\,$

$= \max_i E_\theta[m_i\theta + b_i]\,$

$= \max_i (m_i E_\theta[\theta] + b_i)\,$ (using the linearity property of Expected Values)

$= \max_i (m_i \theta_0 + b_i)\,$

where the subscript in $E_\theta[]$ means to integrate out $\theta$ when taking the Expected Value.

===Breakeven Point===

Note that, as long as $m_1 \neq m_2$, there is a point $\theta_b$ such that $U(1, \theta_b) = U(2, \theta_b)$. When $\theta$ holds this value, we are indifferent to decision 1 vs. decision 2. That is:

$U(1, \theta_b) = U(2, \theta_b)$

$m_1\theta_b + b_1 = m_2\theta_b + b_2$

$(m_1-m_2)\theta_b = b_2 - b_1$

so the breakeven point $\theta_b$ is given by:

$\theta_b = \frac{b_1 - b_2}{m_2 - m_1}$

or, equivalently,

$\theta_b = \frac{b_2 - b_1}{m_1 - m_2}$

==Optional Sample Information==

There is another node $y$ which may or may not be observed:

$y|\theta \sim N(\theta, \sigma_y^2)$

Some cost is associated with observing $y$, so it may or may not be worthwhile to make the observation. We will find the Expected Value of Sample Information (EVSI) to help us make this decision. If EVSI is greater than the cost of observing $y$, then we should choose to observe $y$.
If it is less than the cost, we should not make the observation.

===Posterior Mean===

We define $\tilde{\theta}(y)$ to be the posterior mean of $\theta|y$. Recall from earlier class discussions that the posterior is distributed:

$\theta|y \sim N\left(\frac{\sigma_y^2 \theta_0 + \sigma_0^2 y}{\sigma_0^2 + \sigma_y^2}, \frac{\sigma_0^2 \sigma_y^2}{\sigma_0^2 + \sigma_y^2}\right)$

Thus the posterior mean of $\theta|y$ is a linear function of $y$, which we will call $\tilde{\theta}(y)$:

$E_\theta[\theta|y]=\tilde{\theta}(y) = \frac{\sigma_y^2 \theta_0 + \sigma_0^2 y}{\sigma_0^2 + \sigma_y^2} = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} y + \frac{\sigma_y^2}{\sigma_0^2 +\sigma_y^2} \theta_0$

This function can be inverted:

$\tilde{\theta}(y) - \frac{\sigma_y^2}{\sigma_0^2 +\sigma_y^2} \theta_0 = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} y$

$\frac{\tilde{\theta}(y) - \frac{\sigma_y^2}{\sigma_0^2 +\sigma_y^2} \theta_0} {\frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2}}= y$

So

$y= \tilde{\theta}^{-1}(\theta) = \frac{(\sigma_0^2 + \sigma_y^2) \theta - \sigma_y^2 \theta_0}{\sigma_0^2}$

===Breakeven Observation===

The breakeven observation $y_b$ is the observation that moves the posterior mean $\tilde{\theta}(y_b)$ to the breakeven point:

$y_b = \tilde{\theta}^{-1}(\theta_b)$

If we observe the value $y_b$, we will be indifferent to decision 1 vs. decision 2. Since the utility functions are linear and $\tilde{\theta}(\cdot)$ is increasing in $y$, we will always prefer one of the decisions if we observe a value less than $y_b$, and we will always prefer the other decision if we observe a value greater than $y_b$.
We can calculate $y_b$:

$y_b = \frac{(\sigma_0^2 + \sigma_y^2) \theta_b - \sigma_y^2 \theta_0}{\sigma_0^2} = \theta_b + \frac{\sigma_y^2}{\sigma_0^2} (\theta_b - \theta_0)$

==Expected Value of Sample Information==

Recall the definition of the Expected Value of Sample Information and the discussion about the Expected Prior Utility:

$EVSI = E_y\left[\max_i E_\theta[U(i, \theta)|y]\right] - \max_j E_\theta[U(j, \theta)]\,$

$= E_y\left[\max_i (m_i E_\theta[\theta|y] + b_i)\right] - \max_j (m_j E_\theta[\theta] + b_j)\,$

$= E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i)\right] - \max_j (m_j \theta_0 + b_j)\,$

===Rewrite as One Expectation===

Without loss of generality, assume that decision 1 is better than decision 2 under the prior; re-label choices 1 and 2 if this is not the case. That is, re-label such that:

$E[U(1, \theta)] > E[U(2, \theta)]\,$

This lets us write:

$EVSI = E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i)\right] - (m_1 \theta_0 + b_1)\,$

The next step is tricky.
First, we show that $E_y\left[\tilde{\theta}(y)\right] = \theta_0$:

$E_y[\tilde{\theta}(y)] = E_y\left[\frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} y + \frac{\sigma_y^2}{\sigma_0^2 + \sigma_y^2} \theta_0\right]\,$

$= \frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} E_y[y] + \frac{\sigma_y^2}{\sigma_0^2 + \sigma_y^2} \theta_0\,$

$= \frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} \theta_0 + \frac{\sigma_y^2}{\sigma_0^2 + \sigma_y^2} \theta_0 = \theta_0\,$

where the last step is possible because:

$Y \sim N(\theta_0, \sigma_0^2 + \sigma_y^2)\,$

Now we can use this fact (in reverse) to make a clever substitution for $\theta_0$ and rewrite the EVSI equation as follows:

$EVSI = E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i)\right] - (m_1 E_y[\tilde{\theta}(y)] + b_1)\,$

$= E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i)\right] - E_y[m_1 \tilde{\theta}(y) + b_1]\,$

$= E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i) - (m_1 \tilde{\theta}(y) + b_1)\right]\,$

===Rewrite as an Integral===

Using the definition of the expected value, this becomes:

$EVSI = \int_{-\infty}^{\infty}\left[\max_i (m_i \tilde{\theta}(y) + b_i) - (m_1 \tilde{\theta}(y) + b_1)\right] \bar{f}(y) dy$

$= \int_{-\infty}^{y_b}\left[\max_i (m_i \tilde{\theta}(y) + b_i) - (m_1 \tilde{\theta}(y) + b_1)\right] \bar{f}(y) dy$
$+ \int_{y_b}^{\infty}\left[\max_i (m_i \tilde{\theta}(y) + b_i) - (m_1 \tilde{\theta}(y) + b_1)\right] \bar{f}(y) dy$

where $\bar{f}(y)$ is the marginal pdf of $Y$ and $Y \sim N(\theta_0, \sigma_0^2 + \sigma_y^2)\,$ as noted above.

===Case 1: $\theta_0 > \theta_b$===

To get rid of the max in the EVSI equation, we will look at two separate cases. In the first case, we suppose that $\theta_0 > \theta_b$ and note that $\tilde{\theta}(\cdot)$ is monotone increasing. When the observation $y > y_b$, $\tilde{\theta}(y) > \theta_b$. We don't change our mind, and decision 1 is still the best choice.
However, when $y < y_b$, decision 2 is the best choice, given the observation.

$EVSI = \int_{-\infty}^{y_b}[(m_2 \tilde{\theta}(y) + b_2) - (m_1 \tilde{\theta}(y) + b_1)] \bar{f}(y) dy$
$+ \int_{y_b}^{\infty}[(m_1 \tilde{\theta}(y) + b_1) - (m_1 \tilde{\theta}(y) + b_1)] \bar{f}(y) dy$

$= \int_{-\infty}^{y_b}[(m_2 - m_1) \tilde{\theta}(y) + (b_2 - b_1)] \bar{f}(y) dy + 0$

$= (m_2 - m_1) \int_{-\infty}^{y_b} \left[\tilde{\theta}(y) + \frac{b_2 - b_1}{m_2 - m_1}\right] \bar{f}(y) dy$

$= (m_2 - m_1) \int_{-\infty}^{y_b} \left[\tilde{\theta}(y) - \theta_b\right] \bar{f}(y) dy$

Note that $(m_2 - m_1)$ must be negative in this case, because choice 1 is preferred in the prior and $\theta_0 > \theta_b$. The integrand $\tilde{\theta}(y) - \theta_b$ is also negative for $y < y_b$. Let's flip the sign of both factors so that they are positive, and then write the leading factor as an absolute value. Using the absolute value allows us to drop the requirement that choice 1 is preferred in the prior.

$= |m_1 - m_2| \int_{-\infty}^{y_b} \left[\theta_b - \tilde{\theta}(y)\right] \bar{f}(y) dy$

===Case 2: $\theta_0 < \theta_b$===

In the second case, we suppose that $\theta_0 < \theta_b$. Since $\tilde{\theta}(\cdot)$ is monotone increasing, $\tilde{\theta}(y) < \theta_b$ when $y < y_b$ and $\tilde{\theta}(y) > \theta_b$ when $y > y_b$. Because decision 1 is preferred in the prior and $\theta_0 < \theta_b$, we must have $m_1 < m_2$. Thus, decision 1 remains the best choice when $y < y_b$, but decision 2 becomes the best choice when $y > y_b$.
Splitting the EVSI integral at $y_b$ for this case gives:

$EVSI = \int_{-\infty}^{y_b}[(m_1 \tilde{\theta}(y) + b_1) - (m_1 \tilde{\theta}(y) + b_1)] \bar{f}(y) dy$
$+ \int_{y_b}^{\infty}[(m_2 \tilde{\theta}(y) + b_2) - (m_1 \tilde{\theta}(y) + b_1)] \bar{f}(y) dy$

$= 0 + \int_{y_b}^{\infty}[(m_2 - m_1) \tilde{\theta}(y) + (b_2 - b_1)] \bar{f}(y) dy$

$= (m_2 - m_1) \int_{y_b}^{\infty} \left[\tilde{\theta}(y) + \frac{b_2 - b_1}{m_2 - m_1}\right] \bar{f}(y) dy$

$= |m_1 - m_2| \int_{y_b}^{\infty} \left[\tilde{\theta}(y) - \theta_b\right] \bar{f}(y) dy$

Here $(m_2 - m_1)$ is positive, so it can be replaced by $|m_1 - m_2|$ directly.

===Pause and Reflect===

We have found a formula for EVSI when $\theta_0 > \theta_b$, and another formula for when $\theta_0 < \theta_b$. To get to this point, we assumed that decision 1 is preferred in the prior. However, at this point we can drop the assumption, because $m_1$ and $m_2$ only appear in the scaling factor $|m_1 - m_2|$ in front of the integral, and $\theta_b$ does not change when the labels are swapped.

When $\theta_0 > \theta_b$,

$EVSI = |m_1 - m_2| \int_{-\infty}^{y_b} \left[\theta_b - \tilde{\theta}(y)\right] \bar{f}(y) dy$

and when $\theta_0 < \theta_b$,

$EVSI = |m_1 - m_2| \int_{y_b}^{\infty} \left[\tilde{\theta}(y) - \theta_b\right] \bar{f}(y) dy$

These equations differ only in the limits of integration and in the sign of the integrand; in both, the integrand is non-negative over the region of integration. The first integral is called the "left-hand linear loss integral," and the second is the "right-hand linear loss integral." They look messy, but we can simplify them. For now, let's say:

$EVSI_{\theta_0 > \theta_b} = |m_1 - m_2| L_l(y_b)$

$EVSI_{\theta_0 < \theta_b} = |m_1 - m_2| L_r(y_b)$

===Standardized Normal Transformation===

Before we get started on the simplification of the linear loss integrals, recall that the standard normal (usually named "z") has mean 0 and standard deviation 1. The standard normal PDF is $\phi(\cdot)$, and the standard normal CDF is $\Phi(\cdot)$. Standardized values let us plug values into $\phi$ and $\Phi$.
To standardize a value from another normal distribution, subtract the mean and divide by the standard deviation:

$z = \frac{y-\mu}{\sigma}$

Recall that $Y$ is distributed as:

$y \sim N(\theta_0, \sigma_0^2 + \sigma_y^2)$

Therefore, the standardized value is:

$z = \frac{y - \theta_0}{\sqrt{\sigma_0^2 + \sigma_y^2}}$

Inverting this, we find that:

$y = z \sqrt{\sigma_0^2 + \sigma_y^2} + \theta_0$

We can simplify the calculations by defining a helper variable:

$t = \sqrt{\sigma_0^2 + \sigma_y^2}$

Using $t$, we can simplify the following equations:

$y = t z + \theta_0\,$

$\tilde{\theta}(y) = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_y^2} y + \frac{\sigma_y^2}{\sigma_0^2 +\sigma_y^2} \theta_0 = \frac{\sigma_0^2}{t^2} y + \frac{\sigma_y^2}{t^2} \theta_0$

Then we substitute in for $y$ in terms of $z$:

$\tilde{\theta}(tz + \theta_0) = \frac{\sigma_0^2}{t^2} (tz + \theta_0) + \frac{\sigma_y^2}{t^2} \theta_0 = \frac{\sigma_0^2}{t} z + \frac{\sigma_0^2 + \sigma_y^2}{t^2} \theta_0 = \frac{\sigma_0^2}{t} z + \theta_0$

===Left Hand Linear Loss Integral===

We begin simplifying the left hand linear loss integral:

$L_l(y_b) = \int_{-\infty}^{y_b} \left[\theta_b - \tilde{\theta}(y)\right] \bar{f}(y) dy$

$= \theta_b \int_{-\infty}^{y_b} \bar{f}(y) dy - \int_{-\infty}^{y_b} \tilde{\theta}(y) \bar{f}(y) dy$

We change variables from $y$ to $z$:

$dy = t dz\,$

$\bar{f}(y) dy = \phi(z) dz$ (the $t$ cancels out with the standard deviation in the normal pdf)

The upper limit $y_b$ becomes $z_b = \frac{y_b - \theta_0}{t}$. Substituting this in, we get:

$L_l(y_b) = \theta_b \int_{-\infty}^{z_b} \phi(z) dz - \int_{-\infty}^{z_b} \tilde{\theta}(tz + \theta_0) \phi(z) dz$

$= \theta_b \Phi(z_b) - \int_{-\infty}^{z_b} \left(\frac{\sigma_0^2}{t} z + \theta_0\right) \phi(z) dz$

$= \theta_b \Phi(z_b) - \theta_0 \int_{-\infty}^{z_b} \phi(z) dz - \frac{\sigma_0^2}{t} \int_{-\infty}^{z_b} z \phi(z) dz$

$= (\theta_b - \theta_0) \Phi(z_b) - \frac{\sigma_0^2}{t} \int_{-\infty}^{z_b} z \phi(z) dz$

The last messy bit left is the integral, which turns out to be very simple:

$\int_{-\infty}^{z_b} z \phi(z) dz$

$= \int_{-\infty}^{z_b} z \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2} dz$

$= \frac{1}{\sqrt{2\pi}} \left[ -e^{-\frac{1}{2} z^2} \right]_{-\infty}^{z_b}$

$= - \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z_b^2}$

$= - \phi(z_b)\,$

Now we can substitute this back in:

$L_l(y_b) = (\theta_b - \theta_0) \Phi(z_b) + \frac{\sigma_0^2}{t} \phi(z_b)$

From the formula for $y_b$, we have $y_b - \theta_0 = \frac{t^2}{\sigma_0^2}(\theta_b - \theta_0)$, so $\theta_b - \theta_0 = \frac{\sigma_0^2}{t} z_b$ and

$L_l(y_b) = \frac{\sigma_0^2}{t} \left[\phi(z_b) + z_b \Phi(z_b)\right]$

The right-hand integral $L_r(y_b) = \int_{y_b}^{\infty} \left[\tilde{\theta}(y) - \theta_b\right] \bar{f}(y) dy$ simplifies in the same way to:

$L_r(y_b) = \frac{\sigma_0^2}{t} \left[\phi(z_b) - z_b (1 - \Phi(z_b))\right]$

==Formula for the Expected Value of Sample Information==

Breakeven point of $\theta$:

$\theta_b = \frac{b_1 - b_2}{m_2 - m_1}$

Breakeven observation:

$y_b = \theta_b + \frac{\sigma_y^2}{\sigma_0^2} (\theta_b - \theta_0)$

The normalized breakeven point is:

$z_b = \frac{y_b - \theta_0}{\sqrt{\sigma_0^2 + \sigma_y^2}}$

Formula for EVSI:

$EVSI_{\theta_0 > \theta_b} = |m_1 - m_2| \frac{\sigma_0^2}{\sqrt{\sigma_0^2 + \sigma_y^2}} L_N(-z_b)$

$EVSI_{\theta_0 < \theta_b} = |m_1 - m_2| \frac{\sigma_0^2}{\sqrt{\sigma_0^2 + \sigma_y^2}} L_N(z_b)$

When $\theta_0 = \theta_b$, we have $z_b = 0$ and the two formulas agree. The EVSI is not zero in this case; it takes its largest value:

$EVSI_{\theta_0 = \theta_b} = |m_1 - m_2| \frac{\sigma_0^2}{\sqrt{\sigma_0^2 + \sigma_y^2}} \phi(0)$

$L_N$ in these equations is the linear loss integral for the normal distribution:

$L_N(x) = \phi(x) - x (1 - \Phi(x))\,$

where $\phi$ is the standard normal PDF and $\Phi$ is the standard normal CDF.
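
The formulas above can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names (`evsi_closed_form`, `evsi_numeric`, `linear_loss`) and the choice to pass variances rather than standard deviations are assumptions made for illustration. It evaluates the closed form and compares it with a direct midpoint-rule integration of $E_y\left[\max_i (m_i \tilde{\theta}(y) + b_i)\right] - \max_j (m_j \theta_0 + b_j)$ over the standardized variable $z$.

```python
import math


def phi(x):
    """Standard normal PDF."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear_loss(x):
    """Normal linear loss integral L_N(x) = phi(x) - x * (1 - Phi(x))."""
    return phi(x) - x * (1.0 - Phi(x))


def evsi_closed_form(m1, b1, m2, b2, theta0, var0, var_y):
    """EVSI from the summary formulas (hypothetical helper; assumes m1 != m2).

    var0 is the prior variance sigma_0^2, var_y the observation variance sigma_y^2.
    """
    theta_b = (b2 - b1) / (m1 - m2)                      # breakeven point
    t = math.sqrt(var0 + var_y)                          # std. dev. of marginal of y
    y_b = theta_b + (var_y / var0) * (theta_b - theta0)  # breakeven observation
    z_b = (y_b - theta0) / t                             # standardized breakeven
    # L_N(-z_b) when theta0 > theta_b, L_N(z_b) otherwise (both are L_N(|z_b|)).
    arg = -z_b if theta0 > theta_b else z_b
    return abs(m1 - m2) * (var0 / t) * linear_loss(arg)


def evsi_numeric(m1, b1, m2, b2, theta0, var0, var_y, n=100_000, width=10.0):
    """Midpoint-rule estimate of E_y[max_i(...)] - max_j(...), integrating over z."""
    t = math.sqrt(var0 + var_y)
    dz = 2.0 * width / n
    total = 0.0
    for k in range(n):
        z = -width + (k + 0.5) * dz
        post_mean = (var0 / t) * z + theta0              # posterior mean at y = t*z + theta0
        best = max(m1 * post_mean + b1, m2 * post_mean + b2)
        total += best * phi(z) * dz
    return total - max(m1 * theta0 + b1, m2 * theta0 + b2)


if __name__ == "__main__":
    # Illustrative parameters only: U(1, theta) = 2*theta, U(2, theta) = 1.
    for theta0 in (1.0, 0.0, 0.5):
        args = (2.0, 0.0, 0.0, 1.0, theta0, 1.0, 1.0)
        print(theta0, evsi_closed_form(*args), evsi_numeric(*args))
```

The numeric version does not rely on the case split or on re-labeling the decisions, so agreement between the two columns is a reasonable sanity check on the derivation, including the $\theta_0 = \theta_b$ case where $z_b = 0$.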