Survival Times and Probabilities

My acquaintance gwern had a question: given the current total lifetime of a certain web service, how to infer the probable future lifetime?

Quick Estimate

A quick and dirty intuitive method of calculating this uses Laplace’s rule of succession: treat survival or failure as a Bernoulli trial per unit time, so that given that the subject has survived for N days without a closure on any of those days, we could say that the probability of surviving the next day is as given:

\[ p(N) = \frac{1}{N+2} \]

In that case the probability that we survive at least the next \(n\) days could be \(p^n\), or

\[ P(n | N) = \left(\frac{1}{N+2}\right)^n \]

But of course that’s not really right, because after the first of those \(n\) days we have one more day of information, so we’re a little bit more likely to survive the next. Therefore it can’t just be \(p^n\). We really want:

\[ P(n | N) = \prod_{d=0}^{n-1} \left(\frac{1}{N+d+2}\right) \]

Problem?

The above solutions have two big problems. The first is somewhat obvious: the second, more correct solution requires \(n\) to be an integral multiple of your unit of time. You can’t ask about \(n=1.618\) days, because you can’t take the product of discrete steps from \(d=0\) to \(1.618\).

The second problem is more subtle: neither of these solutions are invariant with respect to the base unit of time. This is easiest to see with the first solution – changing from measuring in days to months turns \(\left(\frac{1}{N+2}\right)^n\) into \(\left(\frac{1}{N/30+2}\right)^{(n/30)}\) which is not identically equal. This is a problem because our units of time are an arbitrary convention and should make no difference to the result.

A True Solution

The resolution to both those problems lies in observing that the rule of succession isn’t really aware of time as a continuous (as opposed to discrete) notion. We need a proper model. As it turns out, the usual model for ‘lifetimes’ of things is a negative exponential:

\[ P(T=t | a) = a e^{-at} \]

\[ P(T>t | a) = e^{-at} \]

This could perhaps be called the continuous generalization of a Bernoulli trial. The event has a constant probability per unit time given that it hasn’t happened yet, described by \(a\), which is called the “hazard rate”. \(a\) has units of inverse time, or “events per unit time”, as you’d expect.

In our case, \(a\) is an unknown variable. The only input we’ve been given is the survival time so far, \(Y\), from which we must calculate the distribution for the future survival time, \(X\) (switching to variables \(X\) and \(Y\) here rather than \(n,N\) since we’re no longer thinking discretely). Our goal is to calculate the posterior distribution:

\[ P(X=x | Y=y) \]

or

\[ P(T=(x+y) | T>y) \]

We can start by calculating the posterior distribution on \(a\):

\[ \begin{aligned} P(a | T>y) &= \frac{P(T>y|a) P(a)}{P(T>y)} \\ &= \frac{P(T>y|a) P(a)}{\int da' ~ P(T>y|a') P(a')} \end{aligned} \]

A reasonable choice of prior \(P(a)\) is also a negative exponential \(P(a) = w e^{-wa}\). We can take the limit \(w \to 0\) for the flat (improper) prior of maximum ignorance. Then,

\[ \begin{aligned} P(T>y | a) P(a) &= e^{-ay} w e^{-wa} \\ &= w e^{-a(y+w)} \\ P(T>y) &= w \int_0^\infty e^{-a(y+w)} ~ da \\ &= \frac{w}{y+w} \end{aligned} \]

So we have our posterior distribution on \(a\):

\[ \begin{aligned} P(a | T>y) &= (y + w) e^{-a(y+w)} \end{aligned} \]

With this we can calculate the posterior pdf on \(X\), by marginalizing over \(a\):

\[ \begin{aligned} P(T=(x+y) | T>y) &= \int_0^∞ a e^{-ax} P(a|T>y) ~ da \\ &= \int_0^∞ a e^{-ax} (y + w) e^{-a(y+w)} ~ da \\ &= (y + w) \int_0^∞ a e^{-a(w + y + x)} ~ da \\ &= \frac{w + y}{(w + y + x)^2} \end{aligned} \]

And the posterior reverse cumulative distribution:

\[ \begin{aligned} P(T>(x+y) | T>y) &= \int_0^∞ { e^{-ax} ~ (y + w) e^{-a(y+w)} } ~ da \\ &= \int_0^∞ { (y + w) e^{-a(w+y+x)} } ~ da \\ &= \frac{w + y}{w + y + x} \end{aligned} \]

Notice that \(w\) (the prior) seems to affect the result in the same way as would \(w\) units of previously observed survival time. Taking the ignorance prior \(w=0\), the results are:

\[ \begin{aligned} P(a | Y=y) &= y e^{-ay} \\ P(X=x | Y=y) &= \frac{y}{(y + x)^2} \\ P(X>x | Y=y) &= \frac{y}{y + x} \end{aligned} \]

The expected value of \(a\) is \(1/y\) which is quite reasonable, but notice that since \(x P(X=x | Y=y)\) goes as \(1/x\) asymptotically, the expectation \(E[x|y]\) doesn’t converge.

Also notice that these results are properly invariant with respect to the unit of time. \(P(X>x | Y=y)\) is a unitless probability, and therefore is completely invariant as it should be, while \(P(X=x | Y=y)\) and \(P(a | Y=y)\) are probability densities with respect to time and inverse time, and therefore have units of inverse time / time as they should.