When working with very small probabilities, it is easy to encounter problems with underflow in the floating point representation. To avoid this problem, it is customary to work in the Logarithmic Domain (or “log space”). Logarithms scale the quantities in question into a workable range where machine precision issues are not as detrimental. The base of the logarithm, once fixed, can be ignored.

Working in the log domain requires that we recall a couple of identities from algebra:

- $log(a \cdot b) = log(a) + log(b)$
- $log(a^b) = b \cdot log(a)$

Consequently,

- $log(1/a) = log(a^{-1}) = - log(a)$

Hence,

- $log(a / b) = log(a) - log(b)$

Multiplicative identity:

- $log 1 = 0$

Additive identity:

- $log 0 = -Infinity$

In other words, we can replace multiplication, exponentiation, and division by simple operations in the log domain.

A more subtle technique is required to replace addition (and subtraction). In other words, if we wish to replace $log(a + b)$ by some operation involving quantities $log(a)$ and $log(b)$ already in log space, then we require the $logAdd()$ method.

Conceptually, $logAdd(log(a),log(b))$ takes two arguments in log space, adds them in regular space, and returns a result in log space. Naively this could be implemented as log (e^{log(a)} + e^{log(b)}). However, when we are dealing with numbers $a$ and $b$ that are so small that they can't be represented in regular space using floating point representation (e^{log(a)} or e^{log(b)} would underflow to 0), so the implementation must play some math tricks to avoid actually doing the exponentiation.

The $logAdd()$ method below employs those tricks. First off, logAdd() takes two arguments of type double in log space, logX and logY. It returns a double value which closely approximates log(X + Y). Consequently, the inputs and output are in log space, and we need not convert in and out of log space to complete this operation.

Steps 1-4 are commented in the code. We will prove why step 4 is correct after the code.

double logAdd(double logX, double logY) // 1. make X the max if (logY > logX) double temp = logX logX = logY logY = temp // 2. now X is bigger if (logX == Double.NEGATIVE_INFINITY) return logX // 3. how far "down" (think decibels) is logY from logX? // if it's really small (20 orders of magnitude smaller), then ignore double negDiff = logY - logX if (negDiff < -20) return logX // 4. otherwise use some nice algebra to stay in the log domain // (except for negDiff) return logX + log(1.0 + exp(negDiff))

$log X + log(1.0 + e^{logY-logX})$

$= log(X \cdot (1.0 + e^{logY-logX}))$

$= log(X + X \cdot e^{logY/X})$

$= log(X + X \cdot Y / X)$

$= log(X + Y)$

Couldn't be nicer.

To log-sum several logarithmic quantities, simply represent those quantities in a list L. Call logAdd() to log-accumulate each of the quantities in the list, one at a time. Thus, you could simply implement logSum() using the following pseudo-code:

double logSum(a list of logarithmic quantities L) logResult = Double.NEGATIVE_INFINITY foreach logX in L: logResult = logAdd(logResult, logX) return logResult