# Parameter inference using Markov chain Monte Carlo algorithms and Haskell

Jun 28, 2022 · 8 minute read · CodingWe analyze the number of worldwide airline fatal accidents:

Year | 1976 | 1977 | 1978 | 1979 | 1980 | 1981 | 1982 | 1983 | 1984 | 1985 |
---|---|---|---|---|---|---|---|---|---|---|

Fatalities | 24 | 25 | 31 | 31 | 22 | 21 | 26 | 20 | 16 | 22 |

This table is an excerpt of Table 2.2 in Gelman, Andrew and Carlin, John B and Stern, Hal S and Rubin, Donald B (2014).

We assume that the number of fatal accidents \(X\) is Poisson distributed with fatal accident rate \(\lambda\)

\begin{align} Pr(X=k|\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}. \end{align}

The maximum likelihood estimate of \(\lambda\) is the mean of the fatalities,
which is \(23.8\). However, we infer the probability distribution of \(\lambda\)
given the observed fatalities. Actually, we infer a function that is
proportional to the *probability distribution* of \(\lambda\) but is not a
distribution because it does not integrate to \(1.0\). We call this function
*posterior function*.

We know that the true mode of the posterior function should be at \(\lambda = 23.8\), and so the mode of our estimate of the posterior function should be close. We use a Markov chain Monte Carlo (MCMC) sampler and Haskell.

## Algorithm

Markov chains are sequences of events with a peculiar property: the probability of each possible next event only depends on the state of the previous event. In particular, the probability does not depend on any events before the previous event. Markov chains are widely used in statistics, for example, to predict the weather. An interesting application of Markov chains is MCMC, a class of algorithms used to effectively sample from probability distributions.

This sounds theoretical and over-complicated. Why do we need a dedicated algorithm to sample from a probability distribution? Isn’t it easy to just pick more probable values more often than less probably values, and do so with the correct ratio?

The answer is: Yes, and this is exactly what MCMC samplers do. For example, the most famous Metropolis-Hastings-Green algorithm (Geyer, Charles J, 2011, an excellent introduction to MCMC by the way) samples new values from given, well-behaved proposal distributions, only to accept or reject these new values according to the actual probability distribution of interest.

Note that standard probability distributions are heavily studied and there exist much faster, dedicated sampling methods. For example, see the computational methods to sample from the Poisson distribution or the normal distribution. Actually, the Metropolis-Hastings-Green algorithm makes use of these dedicated methods.

## Implementation

In the following, we will implement the sampler using the mcmc library which I am developing. Here, I only briefly present the essential steps to run an MCMC sampler; please have a look at the documentation on Hackage, if you want to get a deeper understanding of the internals. Further, we use the random library, as well as the following imports:

```
-- We need 'void'.
import Control.Monad
-- I am developing the 'mcmc' library.
import Mcmc
-- We need to sample random numbers; requires the 'random' library.
import System.Random
```

The state space \(I\) of an MCMC sampler defines the possible values the Markov chain can attain. In our case \(I\) is the fatal accident rate \(\lambda\), a floating point number:

```
type I = Double
```

For a given fatal accident rate \(\lambda\), the likelihood function is a product of Poisson probabilities with the observed fatalities:

```
lh :: LikelihoodFunction I
lh x = product [poisson x y | y <- fatalities]
where
fatalities = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]
```

The `LikehoodFunction I`

is a type synonym for `I -> Log Double`

. Internally, we
use probabilities in the log domain to avoid numerical underflow; see
log-domain.

We need to tell the sampler how to propose new values. We do this using proposals. Here, we use two types:

- a sliding proposal which adds random numbers to the current value of \(\lambda\), and
- a scaling proposal which multiplies random values with the current value of \(\lambda\).

```
ppSl :: Proposal I
ppSl = slideSymmetric 0.1 (PName "lambda") (pWeight 1) Tune
ppSc :: Proposal I
ppSc = scaleUnbiased 0.1 (PName "lambda") (pWeight 1) Tune
```

There is some necessary boiler plate code about how large the proposal sizes
are, how we name the proposals, or what weight we assign to the proposals.
`Tune`

tells the MCMC sampler to tune the proposal size according to established
optimization criteria. In particular, the acceptance rates of proposals have
optimal values that depend on the proposal dimension. The proposal dimension
roughly corresponds to the number of independent parameters manipulated by the
proposal. Tuning only happens during the first phase of the sampler which is
called burn in (more about that below).

We collect the proposals in a cycle:

```
cc :: Cycle I
cc = cycleFromList [ppSl, ppSc]
```

This modular definition of proposals, that is, of *how to traverse the state
space* is one of the big strengths of the `mcmc`

library. For complicated state
spaces, we can use `liftProposal`

and lenses to inform proposals about what part
of the state they should change.

Now, we define some monitors, so that we can observe the values of \(\lambda\) attained by the Markov chain:

```
-- 'monitorDouble' is a simple monitor printing the value of a 'Double'.
monLambda :: MonitorParameter I
monLambda = monitorDouble "lambda"
-- We print the value of lambda to the standard output every 100 iterations.
monStdOut :: MonitorStdOut I
monStdOut = monitorStdOut [monLambda] 100
-- We log the value of lambda to a file more often.
monFile :: MonitorFile I
monFile = monitorFile "lambda" [monLambda] 3
mon :: Monitor I
mon = Monitor monStdOut [monFile] []
```

We do not use batch monitors, so the last list of `mon`

is empty.

Before running the chain, we need to provide some required settings:

```
ss :: Settings
ss =
Settings
-- Provide an analysis name.
(AnalysisName "poisson")
-- Burn in for 2000 generations. During burn in, the proposals are tuned
-- automatically. This is called "auto tuning". Here, auto tuning is
-- performed every 200 iterations.
(BurnInWithAutoTuning 2000 200)
-- Number of actual iterations after burn in.
(Iterations 30000)
-- The trace of the Markov chain contains the attained values. In our case,
-- it is a vector of fatal accident rates. Here, we tell the sampler to use
-- the shortest trace possible. In our case, this will be a single value.
-- However, when using batch monitors, or when auto tuning the masses of
-- proposals based on Hamiltonian dynamics, the required length of the trace
-- is larger than 1. masses. The trace length can also be set manually.
TraceAuto
-- Overwrite files created by a possible previous analysis.
Overwrite
-- Do not run chains in parallel. For the standard Metropolis-Hastings-Green
-- algorithm, this has no effect. However, there are algorithms such as the
-- MC3 algorithm with multiple chains that can run in parallel.
Sequential
-- Save the chain so that it can be continued (see 'mcmcContinue').
Save
-- Log to standard output and save the log to a file.
LogStdOutAndFile
-- Verbosity.
Info
```

Finally, we instantiate a chain using the Metropolis-Hastings-Green algorithm
(MHG, `mhg`

function) and run the MCMC sampler with the `mcmc`

function:

```
main :: IO ()
main = do
let g = mkStdGen 0
-- Set up the Markov chain. For computational efficiency (mutable vectors),
-- this requires IO.
al <- mhg ss noPrior lh cc mon 1.0 g
-- We ignore the actual return value which is the complete Markov chain object
-- using 'void'.
void $ mcmc ss al
```

The complete code is available in the mcmc Git repository, see also the accompanying Cabal file.

## Results

Using the above mentioned Git repository, you can run the code with

```
cabal run poisson
```

There is a lot of informative output. Further, log files `poisson.mcmc.*`

and
the monitor file `poisson.lambda.monitor`

are created. Here, I will have a look
at the posterior function of \(\lambda\). To this end, I use Tracer to inspect
the monitor file `poisson.lambda.monitor`

:

We see in the summary statistics, that the estimated median is \(23.88\) which is close to the theoretical optimum of \(23.8\). We also observe that the posterior function is somewhat normal distributed.

## Summary and outlook

We inferred the posterior function of the fatal accident rate of airlines in the 70s and 80s using a simple Poisson distribution, and an MCMC sampler in Haskell. There is a lot more we could do here. For example, we could improve our model using Poisson regression, we could look at advanced proposals such as proposals using Hamiltonian dynamics, or we could look at algorithms using parallel chains such as the Metropolic coupled MCMC (MC3) algorithm.

If this post spurred your interest, and you want to have a look at a real-life
project: We use the `mcmc`

library to perform *phylogenetic dating*. With
McmcDate we infer the ages of speciations using molecular sequence data (DNA),
molecular clocks, the age of fossils, gene transfers and much more! For example,
we apply `McmcDate`

to data from land plants (Harris, Brogan J. and Clark, James W. and Schrempf, Dominik and Szöllősi, Gergely J. and Donoghue, Philip C.J. and Hetherington, Alistair M. and Williams, Tom A., 2021).

The `mcmc`

library is under development, and I am happy about your suggestions
or comments; drop them on the repository on GitHub!

## References

Gelman, Andrew and Carlin, John B and Stern, Hal S and Rubin, Donald B (2014). *Bayesian data analysis*, CRC Press.

Geyer, Charles J (2011). *{Introduction to Markov Chain Monte Carlo}*, CRC press.

Harris, Brogan J. and Clark, James W. and Schrempf, Dominik and Szöllősi, Gergely J. and Donoghue, Philip C.J. and Hetherington, Alistair M. and Williams, Tom A. (2021). *Divergent evolutionary trajectories of bryophytes and tracheophytes from a complex common ancestor of land plants*.