Parameter inference using Markov chain Monte Carlo algorithms and Haskell
Jun 28, 2022 · 8 minute read · Coding

We analyze the number of worldwide airline fatal accidents:
| Year | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 |
|---|---|---|---|---|---|---|---|---|---|---|
| Fatalities | 24 | 25 | 31 | 31 | 22 | 21 | 26 | 20 | 16 | 22 |
This table is an excerpt of Table 2.2 in Gelman et al. (2014).
We assume that the number of fatal accidents \(X\) is Poisson distributed with fatal accident rate \(\lambda\)
\begin{align} Pr(X=k|\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}. \end{align}
The maximum likelihood estimate of \(\lambda\) is the mean of the fatalities, which is \(23.8\). However, instead of a point estimate, we infer the probability distribution of \(\lambda\) given the observed fatalities. More precisely, we infer a function that is proportional to the probability distribution of \(\lambda\) but is not itself a distribution because it does not integrate to \(1.0\). We call this function the posterior function.
We know that the true mode of the posterior function should be at \(\lambda = 23.8\), and so the mode of our estimate of the posterior function should be close. We use a Markov chain Monte Carlo (MCMC) sampler and Haskell.
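As a quick sanity check of the claim above: for a Poisson model, the maximum likelihood estimate of the rate is the sample mean of the observations. A minimal, base-only snippet (the names `fatalities` and `mleLambda` are mine, not part of the code developed below):

```haskell
fatalities :: [Double]
fatalities = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]

-- The maximum likelihood estimate of a Poisson rate is the sample mean.
mleLambda :: Double
mleLambda = sum fatalities / fromIntegral (length fatalities)

main :: IO ()
main = print mleLambda -- prints 23.8
```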
Algorithm
Markov chains are sequences of events with a peculiar property: the probability of each possible next event only depends on the state of the previous event. In particular, the probability does not depend on any events before the previous event. Markov chains are widely used in statistics, for example, to predict the weather. An interesting application of Markov chains is MCMC, a class of algorithms used to effectively sample from probability distributions.
This sounds theoretical and over-complicated. Why do we need a dedicated algorithm to sample from a probability distribution? Isn’t it easy to just pick more probable values more often than less probable values, and to do so with the correct ratio?
The answer is: yes, and this is exactly what MCMC samplers do. For example, the famous Metropolis-Hastings-Green algorithm (Geyer, 2011; an excellent introduction to MCMC, by the way) samples new values from given, well-behaved proposal distributions, only to accept or reject these new values according to the actual probability distribution of interest.
Note that standard probability distributions are heavily studied and there exist much faster, dedicated sampling methods. For example, see the computational methods to sample from the Poisson distribution or the normal distribution. Actually, the Metropolis-Hastings-Green algorithm makes use of these dedicated methods.
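To make the accept/reject idea concrete, here is a minimal, self-contained Metropolis sketch for our Poisson rate. It uses only base and a hand-rolled linear congruential generator; all names (`logLh`, `lcg`, `step`, `posteriorMean`), the proposal width, and the chain lengths are illustrative choices of mine and unrelated to the internals of the mcmc library used below.

```haskell
fatalities :: [Double]
fatalities = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]

-- Log likelihood of the rate l; the constant log(k!) terms cancel in the
-- acceptance ratio, so we drop them.
logLh :: Double -> Double
logLh l = sum [k * log l - l | k <- fatalities]

-- A tiny linear congruential generator yielding a uniform value in [0, 1).
lcg :: Int -> (Double, Int)
lcg s = (fromIntegral s' / 2147483648, s')
  where s' = (1103515245 * s + 12345) `mod` 2147483648

-- One Metropolis step with a symmetric sliding proposal of width 1: propose,
-- then accept with probability min(1, likelihood ratio).
step :: (Double, Int) -> (Double, Int)
step (l, s) = (if accept then l' else l, s2)
  where
    (u1, s1) = lcg s
    (u2, s2) = lcg s1
    l' = l + (u1 - 0.5)
    accept = l' > 0 && u2 < exp (logLh l' - logLh l)

-- Run the chain, discard a burn in phase, and average the samples.
posteriorMean :: Double
posteriorMean = sum sample / fromIntegral (length sample)
  where
    chain = map fst $ iterate step (10, 42)
    sample = drop 1000 $ take 6000 chain

main :: IO ()
main = print posteriorMean -- roughly 23.8
```

Everything else in this post is bookkeeping around this core loop: better proposals, tuning, monitoring, and logging.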
Implementation
In the following, we implement the sampler using the mcmc library, which I am developing. Here, I only briefly present the essential steps to run an MCMC sampler; please have a look at the documentation on Hackage if you want to gain a deeper understanding of the internals. Further, we use the random library, as well as the following imports:
-- We need 'void'.
import Control.Monad
-- I am developing the 'mcmc' library.
import Mcmc
-- We need to sample random numbers; requires the 'random' library.
import System.Random
The state space \(I\) of an MCMC sampler defines the possible values the Markov chain can attain. In our case \(I\) is the fatal accident rate \(\lambda\), a floating point number:
type I = Double
For a given fatal accident rate \(\lambda\), the likelihood function is a product of Poisson probabilities with the observed fatalities:
lh :: LikelihoodFunction I
lh x = product [poisson x y | y <- fatalities]
  where
    fatalities = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]
`LikelihoodFunction I` is a type synonym for `I -> Log Double`. Internally, we use probabilities in the log domain to avoid numerical underflow; see log-domain.
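To see why the log domain matters, consider a larger, hypothetical data set: our ten observations repeated 100 times. The direct product of the 1000 probabilities underflows to zero, while the sum of the log probabilities stays finite (a base-only illustration; `logPoisson`, `obs`, `productP`, and `logSum` are my names, not part of the mcmc or log-domain libraries):

```haskell
-- Log of the Poisson probability mass function.
logPoisson :: Double -> Double -> Double
logPoisson l k = k * log l - l - sum [log i | i <- [1 .. k]]

-- A larger, hypothetical data set: the ten observations repeated 100 times.
obs :: [Double]
obs = concat $ replicate 100 [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]

-- The direct product of 1000 small probabilities underflows to zero ...
productP :: Double
productP = product [exp (logPoisson 23.8 k) | k <- obs]

-- ... but the sum of the log probabilities is perfectly representable.
logSum :: Double
logSum = sum [logPoisson 23.8 k | k <- obs]

main :: IO ()
main = do
  print productP -- prints 0.0
  print logSum
```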
We need to tell the sampler how to propose new values. We do this using proposals. Here, we use two types:
- a sliding proposal which adds random numbers to the current value of \(\lambda\), and
- a scaling proposal which multiplies the current value of \(\lambda\) by random factors.
ppSl :: Proposal I
ppSl = slideSymmetric 0.1 (PName "lambda") (pWeight 1) Tune
ppSc :: Proposal I
ppSc = scaleUnbiased 0.1 (PName "lambda") (pWeight 1) Tune
There is some necessary boilerplate code specifying the proposal sizes, the proposal names, and the weights we assign to the proposals. `Tune` tells the MCMC sampler to tune the proposal size according to established optimization criteria. In particular, the acceptance rates of proposals have optimal values that depend on the proposal dimension, which roughly corresponds to the number of independent parameters manipulated by the proposal. Tuning only happens during the first phase of the sampler, which is called burn in (more about that below).
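The idea behind such tuning can be sketched in a few lines. The multiplicative rule and the classical one-dimensional target acceptance rate of 0.44 below are common choices in the MCMC literature, not necessarily the rule the mcmc library applies internally; `tuneStepSize` is a hypothetical name:

```haskell
-- Multiplicatively adjust a proposal step size based on the observed
-- acceptance rate: too many acceptances mean the steps are too timid,
-- too few mean the steps are too bold.
tuneStepSize :: Double -> Double -> Double
tuneStepSize stepSize acceptanceRate = stepSize * exp (acceptanceRate - 0.44)

main :: IO ()
main = print $ tuneStepSize 0.1 0.8
```

For example, an acceptance rate of 0.8 grows the step size by a factor of roughly 1.4, while a rate of 0.1 shrinks it by roughly 30 percent; at the target rate of 0.44, the step size is left unchanged.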
We collect the proposals in a cycle:
cc :: Cycle I
cc = cycleFromList [ppSl, ppSc]
This modular definition of proposals, that is, of how to traverse the state space, is one of the big strengths of the mcmc library. For complicated state spaces, we can use `liftProposal` and lenses to inform proposals about which part of the state they should change.
Now, we define some monitors, so that we can observe the values of \(\lambda\) attained by the Markov chain:
-- 'monitorDouble' is a simple monitor printing the value of a 'Double'.
monLambda :: MonitorParameter I
monLambda = monitorDouble "lambda"
-- We print the value of lambda to the standard output every 100 iterations.
monStdOut :: MonitorStdOut I
monStdOut = monitorStdOut [monLambda] 100
-- We log the value of lambda to a file more often.
monFile :: MonitorFile I
monFile = monitorFile "lambda" [monLambda] 3
mon :: Monitor I
mon = Monitor monStdOut [monFile] []
We do not use batch monitors, so the last list of `mon` is empty.
Before running the chain, we need to provide some required settings:
ss :: Settings
ss =
  Settings
    -- Provide an analysis name.
    (AnalysisName "poisson")
    -- Burn in for 2000 generations. During burn in, the proposals are tuned
    -- automatically. This is called "auto tuning". Here, auto tuning is
    -- performed every 200 iterations.
    (BurnInWithAutoTuning 2000 200)
    -- Number of actual iterations after burn in.
    (Iterations 30000)
    -- The trace of the Markov chain contains the attained values. In our case,
    -- it is a vector of fatal accident rates. Here, we tell the sampler to use
    -- the shortest trace possible. In our case, this will be a single value.
    -- However, when using batch monitors, or when auto tuning the masses of
    -- proposals based on Hamiltonian dynamics, the required length of the
    -- trace is larger than 1. The trace length can also be set manually.
    TraceAuto
    -- Overwrite files created by a possible previous analysis.
    Overwrite
    -- Do not run chains in parallel. For the standard Metropolis-Hastings-Green
    -- algorithm, this has no effect. However, there are algorithms such as the
    -- MC3 algorithm with multiple chains that can run in parallel.
    Sequential
    -- Save the chain so that it can be continued (see 'mcmcContinue').
    Save
    -- Log to standard output and save the log to a file.
    LogStdOutAndFile
    -- Verbosity.
    Info
Finally, we instantiate a chain using the Metropolis-Hastings-Green algorithm (MHG, `mhg` function) and run the MCMC sampler with the `mcmc` function:
main :: IO ()
main = do
  let g = mkStdGen 0
  -- Set up the Markov chain. For computational efficiency (mutable vectors),
  -- this requires IO.
  al <- mhg ss noPrior lh cc mon 1.0 g
  -- We ignore the actual return value, which is the complete Markov chain
  -- object, using 'void'.
  void $ mcmc ss al
The complete code is available in the mcmc Git repository, see also the accompanying Cabal file.
Results
Using the above-mentioned Git repository, you can run the code with

cabal run poisson

There is a lot of informative output. Further, the log files `poisson.mcmc.*` and the monitor file `poisson.lambda.monitor` are created. Here, I will have a look at the posterior function of \(\lambda\). To this end, I use Tracer to inspect the monitor file `poisson.lambda.monitor`:
We see in the summary statistics that the estimated median is \(23.88\), which is close to the theoretical optimum of \(23.8\). We also observe that the posterior function is approximately normally distributed.
Summary and outlook
We inferred the posterior function of the fatal accident rate of airlines in the 1970s and 1980s using a simple Poisson distribution and an MCMC sampler in Haskell. There is a lot more we could do here. For example, we could improve our model using Poisson regression, we could look at advanced proposals such as proposals using Hamiltonian dynamics, or we could look at algorithms using parallel chains such as the Metropolis-coupled MCMC (MC3) algorithm.
If this post spurred your interest and you want to have a look at a real-life project: we use the mcmc library to perform phylogenetic dating. With McmcDate, we infer the ages of speciations using molecular sequence data (DNA), molecular clocks, the ages of fossils, gene transfers, and much more! For example, we apply McmcDate to data from land plants (Harris et al., 2021).
The mcmc library is under development, and I am happy about your suggestions and comments; drop them on the repository on GitHub!
References
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014). Bayesian Data Analysis. CRC Press.
Geyer, C. J. (2011). Introduction to Markov Chain Monte Carlo. In Handbook of Markov Chain Monte Carlo. CRC Press.
Harris, B. J., Clark, J. W., Schrempf, D., Szöllősi, G. J., Donoghue, P. C. J., Hetherington, A. M., and Williams, T. A. (2021). Divergent evolutionary trajectories of bryophytes and tracheophytes from a complex common ancestor of land plants.