If I am slightly risk averse, which lottery do I prefer in each of the following pairs?
[0.5,$900;0.5,$800] or [0.1,$8750;0.9,$0]
[0.6,$100;0.4,$90] or [0.6001,$100;0.3999,$90]
[0.5,$110;0.5,$90] or [0.5001,$90;0.4999,$150]
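One way to explore these comparisons is to compute each lottery's expected monetary value and its expected utility under a concave utility function. The sketch below is only illustrative: the function `mildly_concave` (with exponent 0.9) is a hypothetical stand-in for "slightly risk averse" and is not given in the problem, so the preference it produces depends on that assumption.

```python
# Each lottery is a list of (probability, payoff in dollars) pairs,
# mirroring the [p, $x; q, $y] notation used above.
pairs = [
    ([(0.5, 900), (0.5, 800)], [(0.1, 8750), (0.9, 0)]),
    ([(0.6, 100), (0.4, 90)], [(0.6001, 100), (0.3999, 90)]),
    ([(0.5, 110), (0.5, 90)], [(0.5001, 90), (0.4999, 150)]),
]


def expected_value(lottery):
    """Expected monetary value of a lottery."""
    return sum(p * x for p, x in lottery)


def expected_utility(lottery, utility):
    """Expected utility of a lottery under the given utility function."""
    return sum(p * utility(x) for p, x in lottery)


def mildly_concave(x):
    # ASSUMPTION: hypothetical utility for a "slightly risk averse" agent.
    # Any concave increasing function could be substituted here.
    return x ** 0.9


for first, second in pairs:
    print(
        "EV:", round(expected_value(first), 4), round(expected_value(second), 4),
        "| EU:", round(expected_utility(first, mildly_concave), 4),
        round(expected_utility(second, mildly_concave), 4),
    )
```

Comparing the EV column with the EU column shows where risk aversion can change the ranking that expected money alone would give.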
This section follows the flow of the oil example given in class.
Let $P(x) = 0.2$ for a Boolean Random Variable X.
Assume that you have to make a decision D=1 or 2, leading to different utilities, both functions of X:
$U(D=1,x) = 400$ and $U(D=1,\neg x) = 2$
$U(D=2,x) = 20$ and $U(D=2,\neg x) = 100$
Compute the Expected Utilities and state what choice you would make.
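The expected utility of each decision is the probability-weighted sum of its utilities, $EU(D=d) = P(x)\,U(D=d,x) + P(\neg x)\,U(D=d,\neg x)$. A minimal sketch of that computation follows; the variable and function names are illustrative choices, not part of the assignment.

```python
# Prior probability that X is true, as given in the problem.
p_x = 0.2

# utility[decision][x_value], taken from the two lines defining U above.
utility = {
    1: {True: 400, False: 2},
    2: {True: 20, False: 100},
}


def expected_utility(decision, p_true):
    """Expected utility of a decision when P(X = true) = p_true."""
    u = utility[decision]
    return p_true * u[True] + (1 - p_true) * u[False]


prior_eu = {d: expected_utility(d, p_x) for d in utility}
best_decision = max(prior_eu, key=prior_eu.get)

print({d: round(v, 4) for d, v in prior_eu.items()}, "best:", best_decision)
```

With the numbers above, the two expected utilities are 81.6 for D=1 and 84 for D=2, so the maximum-expected-utility choice is D=2.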
Suppose that X (Boolean: true or false) influences another random variable, Y (three-valued: 1, 2, or 3), in the following way:
$P(Y=1|x) = 0.2$
$P(Y=2|x) = 0.4$
$P(Y=1|\neg x) = 0.6$
$P(Y=2|\neg x) = 0.3$
Compute the posterior probabilities:
Note that we will be computing many probabilities in this and subsequent sections. You may either compute the joint distribution of X and Y and sum the needed values out of it, or use Bayes' rule directly.
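The joint-table route can be sketched as follows. The code assumes that the unlisted entries $P(Y=3|x)$ and $P(Y=3|\neg x)$ are whatever is needed for each conditional distribution to sum to 1, which follows from Y having exactly three values; all names are illustrative.

```python
p_x = 0.2  # prior P(X = true)

# Conditional distributions P(Y | X) as listed above.
p_y_given = {
    True: {1: 0.2, 2: 0.4},
    False: {1: 0.6, 2: 0.3},
}
# ASSUMPTION: Y=3 takes the remaining probability mass in each row.
for x_val in p_y_given:
    p_y_given[x_val][3] = 1 - sum(p_y_given[x_val].values())

p_x_dist = {True: p_x, False: 1 - p_x}

# Joint distribution P(X, Y) = P(X) * P(Y | X).
joint = {
    (x_val, y): p_x_dist[x_val] * p_y_given[x_val][y]
    for x_val in (True, False)
    for y in (1, 2, 3)
}

# Marginal P(Y) by summing X out of the joint.
p_y = {y: joint[(True, y)] + joint[(False, y)] for y in (1, 2, 3)}

# Posterior P(X = true | Y = y) by normalizing the joint.
posterior_x = {y: joint[(True, y)] / p_y[y] for y in (1, 2, 3)}

for y in (1, 2, 3):
    print(f"P(Y={y}) = {p_y[y]:.4f}, P(x | Y={y}) = {posterior_x[y]:.4f}")
```

Dividing a joint entry by the marginal is exactly Bayes' rule, so both routes mentioned above give the same posteriors.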
Use these probabilities to compute the posterior expected utilities:
What choice would you make in each of the following cases? What utility would you expect in each case, given your choice? Use these three conditional expected utilities in the next section.
Compute the following probabilities:
What is the expected posterior utility? That is, multiply (pairwise) the probabilities you just computed by the conditional expected utilities you computed at the end of the previous section, then add them up.
What is the Expected Value of Sample Information (EVSI)? It is the expected posterior utility you just computed, minus the maximum expected utility computed at the beginning of this section.
This sequence of questions led you through the computations needed for EVSI.
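The whole chain of steps can be sketched end to end as below. As before, the code assumes the $Y=3$ conditionals are the remaining probability mass, and every name is an illustrative choice rather than notation from class.

```python
p_x = 0.2
utility = {1: {True: 400, False: 2}, 2: {True: 20, False: 100}}
p_y_given = {True: {1: 0.2, 2: 0.4}, False: {1: 0.6, 2: 0.3}}
# ASSUMPTION: Y=3 takes the remaining probability mass in each row.
for x_val in p_y_given:
    p_y_given[x_val][3] = 1 - sum(p_y_given[x_val].values())


def best_expected_utility(p_true):
    """Maximum over decisions of the expected utility when P(x) = p_true."""
    return max(
        p_true * u[True] + (1 - p_true) * u[False] for u in utility.values()
    )


# Step 1: best expected utility using only the prior.
prior_best = best_expected_utility(p_x)

# Step 2: marginal P(Y) and posterior P(x | Y) for each observation.
p_y = {
    y: p_x * p_y_given[True][y] + (1 - p_x) * p_y_given[False][y]
    for y in (1, 2, 3)
}
posterior_x = {y: p_x * p_y_given[True][y] / p_y[y] for y in (1, 2, 3)}

# Step 3: best expected utility after each observation, weighted by P(Y).
expected_posterior_utility = sum(
    p_y[y] * best_expected_utility(posterior_x[y]) for y in (1, 2, 3)
)

# Step 4: EVSI is the gain from observing Y before deciding.
evsi = expected_posterior_utility - prior_best
print(round(prior_best, 4), round(expected_posterior_utility, 4), round(evsi, 4))
```

Because the decision is re-optimized for each value of Y, the expected posterior utility can never fall below the prior maximum, so EVSI is non-negative.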
Consider the following simple MDP:
<table border=1> <tr> <td>A</td> <td>B</td> <td>C</td>
<tr> <td>D</td> <td>E</td> <td>F</td>
</table>
Assume that:
Do two full (all states) iterations of each of the following:
You might also want to satisfy yourself that you could do policy iteration too, but don't turn it in. Policy iteration would require that you have access to a linear equations solver.
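For orientation, a full synchronous sweep of value iteration on a 2×3 grid like the one above might look like the sketch below. The problem's actual assumptions are not reproduced here, so every parameter in the code is hypothetical: deterministic moves U/D/L/R, bumping into a wall leaves the state unchanged, a reward of 1 for entering C, C treated as terminal, and a discount of 0.9. Substitute the real transition model, rewards, and discount before using it.

```python
GRID = [["A", "B", "C"], ["D", "E", "F"]]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}
GAMMA = 0.9  # ASSUMPTION: hypothetical discount factor
GOAL = "C"   # ASSUMPTION: hypothetical terminal state with entry reward 1

# Map each state name to its (row, column) position in the grid.
POS = {name: (r, c) for r, row in enumerate(GRID) for c, name in enumerate(row)}


def step(state, action):
    """Deterministic move; leaving the grid keeps the agent in place."""
    r, c = POS[state]
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
        return GRID[nr][nc]
    return state


def reward(state, next_state):
    """Hypothetical reward: 1 for entering the goal, 0 otherwise."""
    return 1.0 if next_state == GOAL and state != GOAL else 0.0


def value_iteration_sweep(values):
    """One full sweep: update every state from the previous value table."""
    new_values = {}
    for s in POS:
        if s == GOAL:
            new_values[s] = 0.0  # terminal state has no future value
            continue
        new_values[s] = max(
            reward(s, step(s, a)) + GAMMA * values[step(s, a)] for a in MOVES
        )
    return new_values


values = {s: 0.0 for s in POS}
for sweep in range(1, 3):  # two full iterations
    values = value_iteration_sweep(values)
    print(sweep, {s: round(v, 3) for s, v in values.items()})
```

Each sweep reads only the previous table, so the order in which states are visited does not affect the result. An in-place variant that reuses freshly updated values would be a different schedule.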
<!--
Using the modified version of the example used in class, shown below
<table border=1> <tr> <td>A</td> <td>B</td> <td>C</td>
</table>
where:
Compute the updates for the following trace:
<table border=1> <tr> <th>Step</th><th>State</th><th>Action</th><th>Result</th><th>Reward</th>
<tr> <th>1</th> <td>A</td> <td>U</td> <td>A</td> <td>0.0</td>
<tr> <th>2</th> <td>A</td> <td>R</td> <td>B</td> <td>0.0</td>
<tr> <th>3</th> <td>B</td> <td>U</td> <td>A</td> <td>0.0</td>
<tr> <th>4</th> <td>A</td> <td>R</td> <td>B</td> <td>0.0</td>
<tr> <th>5</th> <td>B</td> <td>R</td> <td>C → B</td> <td>1.0</td>
<tr> <th>6</th> <td>B</td> <td>R</td> <td>B</td> <td>0.0</td>
</table>-->