The chance element. 28.06.20

Welcome to The Plague Pit.

This issue – number 35, ‘The Element of Chance in Epidemiological Modelling’ – comes from another talented student contributor – Ed Lucas. Ed is a sixth-form pupil of Dr John Cullerne.

Dr Cullerne is Undermaster at Winchester College, where he teaches Physics and Maths. He’s a longstanding friend of The Plague Pit, and his inspired reading suggestions appear regularly on our Military Intelligence page. Another pupil of his, Adrian Tsui, has written several pieces for us, and Dr Cullerne himself was one of our first expert contributors back in April. Ed refers to some of these articles in his excellent work, below.

When I attended a lecture by Dr Cullerne on epidemiology, I saw he had physically run simulations of the Reed-Frost beads model, a mathematical model of epidemics, using a wooden ‘gutter’ and marbles. This is what is called a Stochastic Model because it models the element of chance in infection transmission.

Dr Cullerne’s demonstration was of course designed to teach rather than to actually model. It was very time-consuming and thus used small sample sizes that gave poor results. I wrote a program that runs the model virtually on a computer with far larger sample sizes and logs the data. I coded the model in C#, partly because of my proficiency in the language and partly because it is well suited to mathematical modelling.

This was the beginning of a series of increasingly sophisticated and complex models that sought to simulate the spread of real-world diseases more accurately. The next models I programmed were all based on the SIR compartmental model (Susceptibles, Infectives and Removed), which is essentially a set of differential equations that model the spread of infection (Dr Cullerne’s and Adrian Tsui’s Plague Pit articles introduce this idea).

https://plaguepit.com/modelling/

https://plaguepit.com/calculating-the-curves/

The Reed-Frost model (the bead model described in Dr Cullerne’s article), like most others, splits the population into groups: infectives, susceptibles and removed, with a select number of blockers. A blocker is a physical barrier to infection, like social distancing or creating “households” or “bubbles”. Each “compartment” is assigned a different colour of bead/marble, and the beads are then randomly ordered in a line. Any susceptible not separated from an infective by a blocker is infected, then all the old infectives acquire immunity (an assumption for this particular model). The beads are reordered and the process repeats.
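The bead procedure described above can be sketched in a few lines. This is a minimal illustration in Python rather than the C# program used for this article, and it rests on assumptions: the function names `reed_frost_step` and `run_epidemic` are hypothetical, and removed beads are assumed to neither block nor pass on infection, so only blocker beads divide the line.

```python
import random


def reed_frost_step(s, i, r, blockers, rng):
    """One generation of the bead model: shuffle the line, infect, remove."""
    beads = ["S"] * s + ["I"] * i + ["R"] * r + ["B"] * blockers
    rng.shuffle(beads)
    new_infectives = 0
    # Blockers cut the line into segments. Within a segment that holds
    # at least one infective, every susceptible is infected.
    # Assumption: removed beads ("R") do not act as barriers.
    for segment in "".join(beads).split("B"):
        if "I" in segment:
            new_infectives += segment.count("S")
    # All of the old infectives acquire immunity and join the removed group.
    return s - new_infectives, new_infectives, r + i


def run_epidemic(s, i, blockers, seed=None):
    """Repeat generations until no infectives remain; return the history."""
    rng = random.Random(seed)
    r = 0
    history = [(s, i, r)]
    while i > 0:
        s, i, r = reed_frost_step(s, i, r, blockers, rng)
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    for generation, counts in enumerate(run_epidemic(50, 2, 30, seed=1)):
        print(generation, counts)
```

Running many epidemics with different seeds and more blockers should show, on average, fewer susceptibles infected, which is the effect the physical demonstration is meant to teach.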

Next I programmed a deterministic (no randomness) model based on the SIR equations. This is more like Dr Cullerne’s water model, with continuous flows of individuals from one group to another. The SIR equations give more parameters to change the disease’s properties, with one relating to how infectious the disease is (called Alpha) and another relating to the recovery time (called Beta).
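As a rough illustration of a deterministic model of this kind, here is a minimal Python sketch (not the C# program used for this article). It assumes the standard form of the SIR equations, in which the infection rate is Alpha × S × I and the removal rate is Beta × I, and it steps them forward with simple Euler time steps. The function name `sir_deterministic` and the parameter values are hypothetical.

```python
def sir_deterministic(s0, i0, r0, alpha, beta, dt, steps):
    """Step the SIR equations forward with fixed Euler time steps.

    Assumed form of the equations:
        dS/dt = -alpha * S * I
        dI/dt =  alpha * S * I - beta * I
        dR/dt =  beta * I
    """
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = alpha * s * i * dt  # flow from S to I
        new_removals = beta * i * dt         # flow from I to R
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        history.append((s, i, r))
    return history


if __name__ == "__main__":
    curve = sir_deterministic(990, 10, 0, alpha=0.0005, beta=0.1,
                              dt=0.1, steps=1000)
    peak = max(i for _, i, _ in curve)
    print("peak number of infectives:", round(peak, 1))
```

Because the same inputs always give the same curve, this version makes a convenient reference against which a model with randomness can be compared.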

However, the real world isn’t deterministic; it is stochastic (has randomness). So next I programmed a stochastic model based on the SIR equations.

This model has some flaws, though. The populations can only change by 1 in each time step, which limits rapid changes over time, even when the time step is small. It also determines whether an infection occurs rather arbitrarily, by checking if a random number is over a certain threshold. As a result the model resembles the deterministic model poorly. The stochastic model’s infective peak is far flatter than that of the deterministic model, despite each having the same properties. This is a problem, as they are both based on the same SIR equations, which suggests something is wrong in the stochastic simulation.
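To make the limitation concrete, here is one possible reading of such a model as a minimal Python sketch. The exact threshold rule used in the original C# program is not described, so the rule below is an assumption: at each time step a single random number is compared with a probability derived from the SIR infection rate, and at most one infection and one removal can happen. The name `sir_one_event_per_step` is hypothetical.

```python
import random


def sir_one_event_per_step(s, i, r, alpha, beta, dt, steps, seed=None):
    """Stochastic SIR in which each group changes by at most 1 per step."""
    rng = random.Random(seed)
    history = [(s, i, r)]
    for _ in range(steps):
        # Assumed rule: the SIR rates are treated as probabilities,
        # capped at 1 because at most one event is allowed per step.
        p_infect = min(1.0, alpha * s * i * dt)
        p_remove = min(1.0, beta * i * dt)
        infect = rng.random() < p_infect
        remove = rng.random() < p_remove
        if infect:
            s -= 1
            i += 1
        if remove:
            i -= 1
            r += 1
        history.append((s, i, r))
    return history
```

In this sketch, whenever Alpha × S × I × dt is larger than 1 the probability is capped, so infections cannot keep pace with the SIR equations. That is one way a flattened infective peak could arise, although it may not be the only cause.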

My next step was to skew the odds of the random number being generated using a cumulative Poisson distribution. A Poisson distribution makes certain numbers more likely to be generated than others, as seen below. Lambda is a variable that controls the properties of the distribution and determines where its peak is. Lambda changes as the simulation progresses, which allows the disease to spread and die with greater dynamism, and the result is a lot closer to the SIR curves. To ensure the result’s validity given the inherent randomness at the heart of the algorithm, many simulations are run, with the mean and the standard deviation of each population calculated at each time step. As you can see, the results match those of the deterministic model far better.
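A minimal Python sketch of this approach follows (again, not the original C# code). It assumes that the number of new infections in a time step is drawn from a Poisson distribution with Lambda = Alpha × S × I × dt, and the number of removals from one with Lambda = Beta × I × dt. The helper names `poisson_sample`, `sir_poisson` and `ensemble_stats` are hypothetical. The sampler walks up the cumulative distribution until it passes a uniform random number, which is a simple method suited to the small Lambdas produced by small time steps.

```python
import math
import random
import statistics


def poisson_sample(lam, rng, cap):
    """Draw from Poisson(lam) via its cumulative distribution, at most cap."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)  # probability of exactly 0 events
    cumulative = p
    while u > cumulative and k < cap:
        k += 1
        p *= lam / k    # probability of exactly k events
        cumulative += p
    return k


def sir_poisson(s, i, r, alpha, beta, dt, steps, rng):
    """One stochastic run; returns the number of infectives at each step."""
    infectives = [i]
    for _ in range(steps):
        # Lambda is recomputed every step from the current populations.
        new_inf = poisson_sample(alpha * s * i * dt, rng, cap=s)
        new_rem = poisson_sample(beta * i * dt, rng, cap=i)
        s -= new_inf
        i += new_inf - new_rem
        r += new_rem
        infectives.append(i)
    return infectives


def ensemble_stats(n_runs, seed=None, **params):
    """Mean and standard deviation of the infectives at each time step."""
    rng = random.Random(seed)
    runs = [sir_poisson(rng=rng, **params) for _ in range(n_runs)]
    means = [statistics.mean(column) for column in zip(*runs)]
    stdevs = [statistics.pstdev(column) for column in zip(*runs)]
    return means, stdevs
```

A call such as `ensemble_stats(200, s=200, i=5, r=0, alpha=0.002, beta=0.1, dt=0.1, steps=500)` gives a mean curve that can be set against a deterministic SIR curve with the same parameters.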

However, while these results look similar to those given by the pure SIR curves (which involve no random numbers, being just mathematical curves), there is a clear difference between them.

Even when thousands of simulations are run with a tiny time step (delta time), the stochastic model is always slightly ‘overeager’ despite the parameters being the same.

To fix this problem I tried an additional method of averaging results which generates multiple different changes in populations at each time step and averages them. However, this only made the ‘over-eagerness’ problem worse, and the standard deviation greater.

Next I thought the random number generator might be to blame for this difference. Over time the generated numbers might follow a trend or get more volatile. I tested this hypothesis with another program that tracked the mean and standard deviation of groups of randomly generated numbers.

However, even when 10,000 groups of 10,000 random numbers were generated, both the mean and the standard deviation remained roughly constant. This disproves my theory that the generated numbers follow a trend over time, but bad generation may still be the problem.
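A test of this kind can be sketched in Python as follows (the original test program was separate and is not reproduced here). The function name `group_stats` is hypothetical, and the sketch assumes the numbers are drawn uniformly between 0 and 1, for which the mean should sit near 0.5 and the standard deviation near 0.289 (the square root of 1/12).

```python
import random
import statistics


def group_stats(n_groups, group_size, seed=None):
    """Mean and standard deviation of successive groups of random numbers."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_groups):
        group = [rng.random() for _ in range(group_size)]
        results.append((statistics.fmean(group), statistics.pstdev(group)))
    return results


if __name__ == "__main__":
    results = group_stats(n_groups=100, group_size=10_000)
    means = [mean for mean, _ in results]
    stdevs = [stdev for _, stdev in results]
    # If the generator drifted over time, later groups would differ
    # from earlier ones and these ranges would be wide.
    print("group means range from", min(means), "to", max(means))
    print("group standard deviations range from", min(stdevs), "to", max(stdevs))
```

If the generator drifted or became more volatile, the later groups would show means or standard deviations that wander away from the earlier ones.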

Random numbers are generated using a ‘seed’, a starting value that an algorithm turns into a long sequence of random numbers. In my case I used a predictable seed, my system clock time. This made my random numbers statistically random, as seen in the test above, but not truly random, as they were predictable. If you knew the time at which I ran my generator program, you could work out the value of every number I generated.

There is another way to generate random numbers that are truly, not just statistically, random. Instead of using a predictable metric like the time, the seed is formed by gathering my computer system’s ‘entropy’, e.g. the times at which keys are pressed, atmospheric pressure, fan noise and speed, and more. This makes my seed impossible to find and therefore the random numbers impossible to predict.
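The difference between the two kinds of generator can be shown with Python’s standard library (the article’s own code was in C#, so this is an analogous sketch rather than the method actually used). `random.Random(seed)` is a seeded generator whose output is fully determined by the seed, while `random.SystemRandom` draws on the operating system’s entropy source and cannot be seeded. The function names below are hypothetical.

```python
import random
import time


def seeded_numbers(seed, count):
    """Numbers from a seeded generator: the same seed gives the same numbers."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


def entropy_numbers(count):
    """Numbers drawn from the operating system's entropy source."""
    rng = random.SystemRandom()
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    clock_seed = int(time.time())  # a predictable seed: the system clock
    first = seeded_numbers(clock_seed, 3)
    second = seeded_numbers(clock_seed, 3)
    print("seeded runs identical:", first == second)
    print("entropy-based numbers:", entropy_numbers(3))
```

Anyone who knows `clock_seed` can regenerate the first list exactly, whereas the second list has no seed to recover.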

I edited my code to use this method to generate the random numbers. The random number generator produced almost identical results, but this is not surprising, as the method used previously did output statistically random numbers.

But when the results are compared with those of the deterministic model, you can see the problem persists: the stochastic model is always ‘overeager’. This shows that the cause of the problem is not the random number generation but something else.

This journey exploring the modelling of epidemics was as fascinating and informative as it was inconclusive. The SIR model, as established and simple as it is, is still not fully understood, which helps show how difficult it is to control epidemics.

Ed Lucas
