Posts Tagged ‘Markov transition matrix’

On Patchixes and Patches - or Pasqualian Matrixes - (RWLA,MCT,GT,AM Part II)

October 10th, 2010

For the last few months I have been thinking about several very curious properties of patchixes and patches (mentioned here); in particular, having studied patch behavior in a "continuous Markov chain" context, and while eating a bowl of cereal and observing the interesting movement of the remaining milk, it hit me: a patch could certainly describe milk movement at particular time steps.  It is my hope to elucidate this concept a little better here today.  In particular, I think I have discovered a new way to describe waves and oscillations, or rather, "cumulative movement where the amount of liquid is constant" in general.  I honestly believe this new way and the old way converge in the limit (based on my studies, here and here, of discrete Markov chains at the limit of tiny time steps, so that time is continuous), although how they do is a little unclear to me at the moment.  It is my hope that this new way not only paves the way for a new and rich field of research; I also foresee it clarifying studies in, for example, turbulence, and, maybe one day, Navier-Stokes related concepts.  This last part may sound lofty and ambitious, but an approach in which, for example, vector fields of force or velocity must be described for every particle and position of space, with complicated second and third order partial derivatives, is itself somewhat ambitious and lofty, and often prohibitive for finding exact solutions; perhaps studying particle accumulations through a method of approximation, rather than individual particles, is the answer.

I want to attempt to describe the roadmap that led me to the concept of a patchix (pasqualian matrix) in the first place; it was in the context of discrete Markov chains.  Specifically, I thought that, as we study in linear algebra, a function or transformation  T(\textbf{v}) , where  \textbf{v} is an n-vector with  n entries (finite), can be described succinctly by an  n \times n matrix.  Such a matrix then converts  \textbf{v} into another n-dimensional vector, say  \textbf{w} .  This field is very well studied, of course: in particular, invertible transformations are very useful, and many matrixes can be used to describe symmetries, so that they underlie Group Theory:

 \textbf{v} \xrightarrow{T} \textbf{w}

Another useful transformation concept resides in  l_2 , the space of square-summable sequences (those whose dot product with themselves converges), which was used, for example, by Heisenberg in quantum mechanics, as I understand it.  For example, the sequence  (x_1, x_2, \ldots) can be transported to another  (y_1, y_2, \ldots) via  T , as  T(x_1, x_2, \ldots) = (y_1, y_2, \ldots) .  Key here was the fact that  x_1^2 + x_2^2 + \ldots converges, so that  \sqrt{x_1^2 + x_2^2 + \ldots} , the norm, is defined.  The dot product  x_1 y_1 + x_2 y_2 + \ldots also converges, by the Cauchy-Schwarz inequality.  Importantly, this information points in the direction that a transformation matrix could be created for  T to facilitate computation, with an infinite number of entries, so that indeed a sequence is taken into another in this space in a manner that is easy and convenient.  I think this concept was used by Kolmogorov in extending Markov matrices as well, but I freely admit I am not very versed in mathematical history.  Help in this regard is much appreciated.

In a function space such as  C^{\infty}[0,1] , the inner product of, say,  f(x) with  g(x) is also defined, as  \langle f(x), g(x) \rangle = \int_0^{1} f(x) \cdot g(x) dx : the point-wise products of the functions, summed continuously by the integral.  Then the norm of  f(x) is  \sqrt{\langle f(x), f(x) \rangle} = \sqrt{\int_0^{1} f(x)^2 dx} .  The problem is, of course, that there is no convenient "continuous matrix" that results in the transform  T(f(x)) = g(x) , although transforms of a kind can be achieved through a discrete matrix, if its coefficients represent, say, the coefficients of a (finite) polynomial.  Thus we can transform polynomials into other polynomials, but this is limiting in scope in many ways.

The idea is that we transform a function to another by point-wise reassignment: continuously.  Thus the concept of a patchix (pasqualian matrix) emerges; we need only mimic the mechanical motions we go through when conveniently calculating any other matrix product.  Take a function  f(x) defined continuously on  [0,1] , and send  x \rightsquigarrow 1-y so that  f(1-y) is now aligned with the y-axis.  From another viewpoint, consider  f(1-y) as  f(1-y,t) so that, at any value of  t , the cross-section looks like  f .  Define a patchix  p(x,y) on  [0,1] \times [0,1] .  Now "multiply" the function (actually a patchix itself from this different viewpoint) with the patchix as  \int_{0}^{1} f(1-y) \cdot p(x,y) dy = g(x) to obtain  g(x) .  The patchix has transformed  f(x) \rightsquigarrow g(x) as we wanted.  I think there are profound implications from this simple observation; one may now consider, for example, inverse patchixes (or how to get from  g(x) back to  f(x) ), identity patchixes, and, along with these, what it may mean, as crazy as it sounds, to solve an infinite (dense) system of equations; powers of patchixes and what they represent; eigenpatchixvalues and eigenfunctionvectors; group theoretical concepts such as the symmetry groups the patchixes may give rise to, etc.
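Since the patchix transform is just an integral against a kernel, it is easy to play with numerically.  Here is a minimal Python sketch (the grid size and quadrature rule are my choices, and the particular  f and  p are borrowed from the examples later in this post):

```python
import numpy as np

# Apply a patchix p(x, y) to a function f on [0, 1]:
# g(x) = ∫ f(1 - y) · p(x, y) dy, approximated with the trapezoid rule.
n = 1001
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)

f = lambda t: 6 * t * (1 - t)     # the B(2,2) density used later in this post
p = lambda xx, yy: xx + 0.5       # the example patch from later in this post

X, Y = np.meshgrid(x, y, indexing="ij")
g = np.trapz(f(1 - Y) * p(X, Y), y, axis=1)   # one integral in y per value of x

print(np.allclose(g, x + 0.5, atol=1e-4))     # True: matches the closed form x + 1/2
print(np.trapz(g, x))                         # ~1.0: the image is again a density
```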

As much as that is extremely interesting to me, and I plan on continuing with my own investigations, my previous post and informal paper considered the implications of multiplying functions by functions, functions by patchixes, and patchixes by patchixes.  Actually I considered special kinds of patchixes  p(x,y) , those having the property that for any specific value  y_c \in [0,1] , then  \int_0^1 p(x,y_c) dx = 1 .  Such special patchixes I dubbed patches (pasqualian special matrixes), and I went on to attempt an extension of a Markov matrix and its concept into a Continuous Markov Patch, along with the logical extension of the Chapman-Kolmogorov equation by first defining patch (discrete) powers (this basically means "patchix multiplying" a patch with itself).  The post can be found here.

So today what I want to do is continue the characterization of patches that I started.  First of all, emulating some properties of the Markov treatment, I want to show how we can multiply a probability distribution (function) "vector" by a patch to obtain another probability distribution function vector. Now this probability distribution is special, in the sense that it doesn't live in all of  \mathbb{R} but in  [0,1] .  A beta distribution, such as  B(2,2) = 6(x)(1-x) , is the type that I'm specifically thinking about. So suppose we have a function  b(x) , which we must convert first to  b(1-y) in preparation to multiply by the patch.  Suppose then the patch is  p(x,y) with the property that, for any specific  y_c , then  \int_0^1 p(x,y_c) dx = 1 .  Now, the "patchix multiplication" is done by

 \int_0^1 b(1-y) \cdot p(x,y) dy

and is a function of  x .  We can show that this is indeed a probability distribution function vector by integrating over every infinitesimal change in  x and seeing if the total adds up to one, like this:

 \int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dy dx

If there is no issue with absolute convergence of the integrals, there is no issue with the order of integration by the Fubini theorem, so we have:

 \int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dx dy = \int_0^1 b(1-y) \int_0^1 p(x,y) dx dy

Now, for the inner integral:  p(x,y) adds up to 1 for any choice of  y , so the inner integral is in effect a uniform distribution on  [0,1] with value 1 (i.e., for any choice of  y \in [0,1] , the value of the integral is 1).  Thus we have, in effect,

 \int_0^1 b(1-y) \int_0^1 p(x,y) dx dy = \int_0^1 b(1-y) \cdot u(y) dy = \int_0^1 b(1-y) (1) dy

since  u(y) = 1 for any choice of  y in  [0,1] ; and that last integral we know is 1 by hypothesis, because  b is a probability density.

Here's a specific example:  Let's declare  b(x) = 6(x)(1-x) and  p(x,y) = x + \frac{1}{2} .  Of course, as required,  \int_0^1 p(x,y) dx = \int_0^1 x + \frac{1}{2} dx = (\frac{x^2}{2} + \frac{x}{2}) \vert^1_0 = 1 .  So then  b(1-y) = 6(1-y)(y) , and by "patchix multiplication"

 \int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 6(1-y)(y) \cdot \left(x + \frac{1}{2} \right) dy = x + \frac{1}{2}

Thus, via this particular patch, the function  b(x) = 6(x)(1-x) \rightsquigarrow c(x) = x + \frac{1}{2} , point by point.
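Incidentally, this computation is easy to verify symbolically; here is a quick sympy sketch (the tool choice and variable names are mine):

```python
import sympy as sp

# Symbolic check of the example: b(x) = 6x(1-x), patch p(x, y) = x + 1/2.
x, y = sp.symbols("x y")
b_flipped = 6 * (1 - y) * y                # b(1 - y)
p = x + sp.Rational(1, 2)

g = sp.integrate(b_flipped * p, (y, 0, 1))
print(g)                                   # -> x + 1/2, as computed above
print(sp.integrate(g, (x, 0, 1)))          # -> 1: the image is again a density
```

Which brings me to my next point.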

If  p(x,y) is really solely a function of  x , then it follows that  b(x) \rightsquigarrow p(x) : any initial probability distribution becomes the patch's own function distribution (viewed as a function of a single dimension, rather than two).  Here's why:

 \int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 b(1-y) \cdot p(x) dy = p(x) \int_0^1 b(1-y) dy = p(x)

Of course, I think patches that are in fact functions of both  x and  y are a lot more interesting.  There arises a problem in constructing them, however.  For example, let's assume that we can split  p(x,y) = f(x) + g(y) .  Forcing our requirement that  \int_0^1 p(x,y) dx = 1 for any  y \in [0,1] means:

 \int_0^1 p(x,y) dx = \int_0^1 f(x) dx + g(y) \int_0^1 dx = \int_0^1 f(x) dx + g(y) = 1

which certainly implies that   g(y) = 1 - \int_0^1 f(x) dx is a constant, since the integral is a constant.  Thus it follows that  p(x,y) = p(x) is a function of  x alone.  Then we may try  p(x,y) = f(x) \cdot g(y) .  Forcing our requirement again,

 \int_0^1 p(x,y) dx = \int_0^1 f(x) \cdot g(y) dx = g(y) \int_0^1 f(x) dx = 1

means that  g(y) = \frac{1}{\int_0^1 f(x) dx} , again a constant, and  p(x,y) = p(x) once more.  Clearly the function interactions should be more complex; let's try something like  p(x,y) = f_1(x) \cdot g_1(y) + f_2(x) \cdot g_2(y) .  Forcing our requirement once more,

 \int_0^1 p(x,y) dx = g_1(y) \int_0^1 f_1(x) dx + g_2(y) \int_0^1 f_2(x) dx = 1

so that determining three of the functions determines the last one, say

 g_2(y) = \frac{1-g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx} is in fact, a function of  y .

Let's construct a patch in this manner and see its effect on a  B(2,2) .  Let  f_1(x) = x^2 , and  g_1(y) = y^3 , and  f_2(x) = x , so that

 g_2(y) = \frac{1 - g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx} = \frac{1 - y^3 \int_0^1 x^2 dx}{\int_0^1 x dx} = \frac{1 - \frac{y^3}{3}}{\frac{1}{2}} = 2 - \frac{2y^3}{3}

and  p(x,y) = x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) .

So now the "patchix product" is

 \int_0^1 6(1-y)(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{5} + \frac{28x}{15}

which is a probability distribution on the interval  [0,1] ; as a check, we can integrate it with respect to  x to obtain 1.  Thus the probability distribution function  6(x)(1-x) is carried, point by point, as  6(x)(1-x) \rightsquigarrow \frac{x^2}{5} + \frac{28x}{15} which, quite frankly, is very amusing to me!

From an analytical point of view, it may be interesting or useful to see what happens to the uniform distribution on  [0,1] when it's "patchix multiplied" by the patch above.  We would have:

 \int_0^1 u(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \int_0^1 (1) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{4} + \frac{11x}{6}

so that  u(x) \rightsquigarrow \frac{x^2}{4} + \frac{11x}{6} .
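Both of these patch products, and the patch property itself, are easy to verify symbolically; again, a quick sympy sketch of my own:

```python
import sympy as sp

# Checks for the constructed patch p(x, y) = x^2 y^3 + x(2 - 2y^3/3).
x, y = sp.symbols("x y")
p = x**2 * y**3 + x * (2 - sp.Rational(2, 3) * y**3)

print(sp.integrate(p, (x, 0, 1)))                               # -> 1 for every y: p is a patch
print(sp.expand(sp.integrate(6 * (1 - y) * y * p, (y, 0, 1))))  # -> x**2/5 + 28*x/15
print(sp.expand(sp.integrate(1 * p, (y, 0, 1))))                # -> x**2/4 + 11*x/6
```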

In my next post, I want to talk in more detail about "patchix multiplication", not of a probability distribution vector on [0,1] by a patch, but of a patch by a patch, which is the basis of (self) patch powers: with this I want to begin a discussion of how we can map oscillations and movement in a different way, so that perhaps we can trace my cereal milk's movement in time.

On a Battle Markov Chain Model

October 1st, 2010

About a month ago I befriended Dan, from Giant Battling Robots, online, as he was doing investigations into Lanchester's differential equations to model the outcome of battles, particularly those of videogames.  He sent me this very interesting link that describes a Markov chain model for battles, since he saw that I was trying to figure out a way to coalesce Markov reasoning with Lanchesterian DEs.  The pdf describes quite a different approach from the one I've been taking or investigating, but I thought it would be interesting nevertheless to attempt to illustrate it by example.  The subject of this post is therefore expository, based on Dan's link, but the example and explanation are entirely mine, and so any error in concept is my own.

Suppose BLUE and GREEN are feuding - in fact, battling to the death - each possessing the same or differing technologies for their engagements.  The particular technologies translate to a probability of killing the other's units.  For example, GREEN may have bazooka technology that assures that, when launched, the probability of killing/disabling its enemy unit (perhaps a tank) is 0.45, while BLUE's tanks may kill bazooka-launchers with probability 0.55.  These probabilities represent the fact that, if an encounter occurs at a time step, in 55 percent of encounters the tank (BLUE) is successful in killing or disabling GREEN's unit; in the other 45 percent it is unsuccessful.

For a battlefield in a given area, depending on hiding places like trees, forests, or caves (terrain), the probability that a tank and a bazooka-launcher encounter each other at a given time step is an exponentially distributed random variable.  What this means is basically this: a single BLUE and a single GREEN will find each other with a given (expected) probability at a given time-step, and higher probabilities of encounter are less likely than lesser ones at any time-step (I may be having a conceptual problem here, since an exponentially distributed random variable has a domain from 0 to infinity, whereas of course the probability space is 0 to 1).  So, in other words, the probability of encounter of a GREEN with a BLUE, let's say, is 0.87 on expectation.  It's a fuzzy 0.87, because at the next time step it may be less or more, but it's 0.87 on average.

Now, since the probability of finding each other and the probability of winning an encounter (we assume 1-on-1) are independent, the probability of killing the opposing unit is simply the probability of encounter times the probability of succeeding in the kill.  If BLUE has 4 units and GREEN 3 (our initial state), the probability of transitioning to state (3,3) is 0.39: 0.87, the (expected) encounter probability at the time step, times 0.45, the success probability of GREEN.  Notice we allow for only ONE encounter (or NONE) at any time step: this makes sense if the time-steps are sufficiently small and resolution of the encounter, if there is one, is instantaneous.  The probability of transitioning to state (4,2) is 0.48: the expected probability of encounter, 0.87, times the probability of success by BLUE, 0.55.  Finally, the probability of staying in state (4,3) is 1 - 0.39 - 0.48 = 0.13.

Neglecting the natural variations in our expected encounter probability, we can now craft the following chart, which describes the transition probabilities between states succinctly.  Yellow cells are possible states.  Directly to the left of a state is the probability of BLUE losing one unit, and directly below it the probability of GREEN losing one unit at the next time-step.  The diagonal probability is the probability of remaining in that state at the next time-step:

The battle ends whenever one of the coordinates of our state is 0: for example, (0,3), or (0,1), or (2,0)...  Clearly, state (0,0) is unreachable: there must be at least one unit that remains in the end.  Of course, in this model, who wins in the end does not only depend on the technology constants (probability of kill or success), but on numerical superiority: namely, initial conditions or the initial state coordinate.

It's difficult to order the states in any meaningful way if they are Cartesian coordinates like in this example.  I used a diagonal method to craft the transition probabilities in matrix form.  Orange states are absorbing:

As in the usual Markov chain treatment, the nth power of this matrix gives the transition probabilities after n time steps.  If the transition probability matrix possesses the right properties (here it is an absorbing chain: from every state, some absorbing state is reachable), then after many time steps the difference between the matrix at the nth step and at the (n+1)th is negligible: all entries are "stable."  At the same time, the initial state vector (in our case 0 everywhere except 1 at state (4,3)) morphs into (converges to) a stable steady-state vector, while passing through several intermediate vectors.  This stable vector indicates the probability of each outcome after a very long time: it indicates the proportion of battles GREEN or BLUE will have won after considering every possible battle, even the very long ones; so it is indeed a true probability of win.  For example, the following chart

shows that, if we simulate battles with identical initial states and transition probabilities for up to 20 time-steps, then after 20 time-steps BLUE is more likely to have won the majority of simulations (by obliterating the opposition and remaining with 1, 2, 3 or 4 units up in the end).  Notice the fixed proportions of wins for BLUE and for GREEN.  Or think about it this way.  Let U represent that the battle is undecided at a time-step, and B that BLUE has won at that time-step.  B will continue to be the state of the battle after it has become B for the first time (the battle is now over, so B remains the state of the system).  G indicates likewise that GREEN has won at that time-step, and an infinite string of Gs after the first G indicates that G will continue to be the state of the battle after GREEN has won:

Simulation 1:  UUBBBBBBBBBBB....

Simulation 2: UUUUUUUGGGGG.....

etc.

We will have strings of Us followed by either Bs or Gs, but not both, infinitely.  If we count the proportion of Us at the first time-step (1st position of the string) across all simulations, this corresponds to the chart's proportions at step 1.  If we count the proportions of Gs and Bs at the 20th position across all simulations, these correspond to the chart's 20th-step proportions.  Higher time-steps approach the true probability of a win, once most battles are (long) over, and this probability is stable/convergent.  In my particular example, most battles are over by the time they reach the 20th time-step, and BLUE has won most of them, no matter how short or long they took.
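For the curious, here is a minimal Python sketch of the chain just described (the state ordering and variable names are mine; the encounter and kill probabilities are the ones from this example).  It builds the transition matrix, takes its 20th power, and reads off the win proportions:

```python
import numpy as np
from itertools import product

pe, pb, pg = 0.87, 0.55, 0.45        # encounter prob; BLUE and GREEN kill probs
B, G = 4, 3                          # initial strengths: state (4,3)

states = list(product(range(B + 1), range(G + 1)))   # all (blue, green) pairs
idx = {s: k for k, s in enumerate(states)}
P = np.zeros((len(states), len(states)))

for (i, j), k in idx.items():
    if i == 0 or j == 0:
        P[k, k] = 1.0                          # battle over: absorbing state
    else:
        P[k, idx[(i - 1, j)]] = pe * pg        # GREEN kills a BLUE unit
        P[k, idx[(i, j - 1)]] = pe * pb        # BLUE kills a GREEN unit
        P[k, k] = 1.0 - pe * pg - pe * pb      # no decisive encounter

v = np.zeros(len(states))
v[idx[(B, G)]] = 1.0                           # start at (4,3) with certainty
v = v @ np.linalg.matrix_power(P, 20)          # distribution after 20 steps

blue_win = sum(v[idx[(i, 0)]] for i in range(1, B + 1))
green_win = sum(v[idx[(0, j)]] for j in range(1, G + 1))
print(f"after 20 steps: P(BLUE won) ~ {blue_win:.3f}, P(GREEN won) ~ {green_win:.3f}")
```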

I have crafted an .xls file that might make things clearer.  The items that you may want to modify are in grey, and the chart is there too, in a different sheet.  For the (grey) state vector, put a 1 in the initial state and 0 everywhere else; otherwise, make sure you enter proportions that add up to 1.  I only did the chart with states up to (4,3), but you can probably extrapolate the idea for states with more than 4 BLUE units and 3 GREEN ones.  You can probably use this chart for smaller-dimension games without much effort, too.

As always, comments or questions are welcome!

Enjoy!

MarkovBattle

On Lanchester’s Differential Equations and WWII: Modeling the Iwo Jima Battle

August 18th, 2010

Almost at the end of World War II, a battle to the death between US and Japanese forces took place on the Japanese island of Iwo Jima.  We can definitely apply a Lanchesterian analysis to this encounter; in fact, it has already been done by Martin Braun in Differential Equations and their Applications, 2nd ed. (New York: Springer-Verlag, 1975).  If  x represents the number of US soldiers and  y represents the number of Japanese troops, Braun estimates that  a = 0.05 and  b = 0.01 , approximately.  Assuming then that the initial strength of the US forces hovered around 54,000, and that of the Japanese was 21,500, we'd be interested in knowing the trajectory of the battle and seeing who wins.  I've used the method described in my previous post, namely the Markovization or discretization (or Eulerization) method, to obtain estimates for the proportion of soldiers alive in  X = US and in  Y = Japan , and this is what we come up with.  We assume the battle is fought to the death and there are no reinforcements from either side:

 L = \left[ \begin{array}{ccc} 1 - 0.05 \cdot \frac{p_{alive,y}}{p_{alive,x}} & 0 & 0.05 \cdot \frac{p_{alive,y}}{p_{alive,x}} \\ 0 & 1 - 0.01 \cdot \frac{p_{alive,x}}{p_{alive,y}} & 0.01 \cdot \frac{p_{alive,x}}{p_{alive,y}} \\ 0 & 0 & 1 \end{array} \right]

The attrition constants are within the range such that all entries are less than one, as required for a Markov transition matrix, provided the excess ratio of one army's population to the other's is not too large (also, each row sums to 1).  The matrix is time-dependent, however: at each time step, the proportion of alive soldiers in each army changes according to the strength of the other army (proportional to both numerical superiority and technology, as can be seen in the matrix above; thus, from the Markov chain vantage point, the probability of death at each time step changes), as modeled by Lanchester's continuous differential equations.  Here are the proportion trajectories I came up with, computing the above matrix at each time step and pulling the original vector through:

Iwo Jima battle

It's pretty clear that, despite having the technology advantage, the Japanese lose because of numerical inferiority after about 59 time-steps.  The approximate proportion of US soldiers at the end of the battle is 0.35 (of the total soldier population), or about 26,276 soldiers, a HUGE decline from the initial 54,000 (roughly 0.72 of the total soldier population).  The Japanese army essentially obliterated half the soldier population of the US, a testament to their ferocity in battle.

My Calculus book (Deborah Hughes-Hallett, Andrew M. Gleason, et al., New York: John Wiley & Sons, Inc., 1994, p. 551) suggests that the US in fact had 19,000 soldiers that could have stepped in as reinforcements.  This does not really matter if the reinforcements come at the end of the battle, since the battle has already been won with 54,000 soldiers, but it does alter the outcome significantly if they are added to the original soldiers at the beginning of the battle (namely, the US has 73,000 troops, or 0.78 of the total soldier population): the Japanese army would lose in roughly 34 time-steps (to the death), and the number of US soldiers left standing would be about 55,420, or 0.59 of the total soldier population: the Japanese obliterate only about 25% of the US troops.
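For reference, here is a small Python sketch of the computation behind these numbers (the loop is mine; the constants and initial strengths are Braun's, quoted above).  Expanding the matrix product p(t+1) = p(t) · L with Δt = 1 gives exactly the updates in the loop:

```python
# Markovized (Eulerized) Iwo Jima battle, no reinforcements.
a, b = 0.05, 0.01                          # attrition constants from Braun
N = 54_000 + 21_500                        # total soldier population
px, py, pd = 54_000 / N, 21_500 / N, 0.0   # proportions: US alive, Japan alive, dead

t = 0
while px > 0 and py > 0:
    dx, dy = a * py, b * px                # proportions killed this step
    px, py, pd = px - dx, py - dy, pd + dx + dy
    t += 1

print(f"battle decided after {t} steps")           # ~59 steps, as quoted above
print(f"US survivors ~ {max(px, 0.0) * N:,.0f}")   # ~26,000, or ~0.35 of the total
```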

In my next post, I want to discuss Dan's (from Giant Battling Robots) pdf document that he provided, essentially looking at a different stochastic model related to warfare and battle.  It's pretty neat, although it counts each potential outcome as a Markovian state (and each state is linked "downward" by a transition probability so that a given probability governs the death of one soldier in either army).  Thanks Dan!

On Lanchester's Differential Equations and their Transform into a Markov Transition Matrix (or, the Markovization of the Lanchester Equations)

August 10th, 2010

I have been very curious as to whether the Markovization of differential equations is applicable to all systems, linear or not.  In the case of the SIR differential equations, Markovization was possible even though the system is both non-linear and coupled.  I think that perhaps there is an algorithm to frame differential equations thusly... I haven't fleshed it out completely (other than by using quite a bit of intuition, as in the SIR case).  Markovization, I think, is distinct from putting differential equations in state-space form: the vector pulled through the state-space matrix represents the trajectories of a function and its derivatives, not the trajectories of a set of functions related in that their sum is one, i.e., probabilities or proportions.  Who knows, though; a happy mix may be possible.

I was examining a Calculus book, specifically by Deborah Hughes-Hallett, Andrew M. Gleason, et al. (New York, John Wiley & Sons, Inc., 1994), and came across a very interesting problem that models how armies of soldiers fight it out... and tries to predict who wins.  I hope you don't judge: even after having read the most complicated books in mathematics, I honestly believe that revisiting more basic books can bring about insights that one didn't think of before; maybe even ground the complicated knowledge one has acquired, or refine it.  Such has been my own experience... so I think it's a useful exercise, for all of those who have resisted "going back to the basics."  If you are one, perhaps you are missing the opportunity to bridge the gaps you may (or may not) still have!  :-\

The problem on page 550 says that, if  x represents the number of soldiers from army  X and  y represents the number of soldiers from army  Y , the rate at which soldiers in one army are injured or die can be thought of as proportional to the amount of "firepower" the other army can concentrate on them.  The model was first thought of by F.W. Lanchester:

 \frac{dx}{dt} = -ay

 \frac{dy}{dt} = -bx with  a, b > 0 .

In order to Markovize Lanchester's differential equations, we need to flesh out what the states are: clearly, you can be alive and belong to army  X , or army  Y , or be dead (from a different vantage point, we could have "Dead from  X " and "Dead from  Y " if we wanted to count each individually).  The Markov transition matrix should look something like this, then:

 L = \left[ \begin{array}{ccc} something & 0 & something \\ 0 & something & something \\ 0 & 0 & 1 \end{array} \right]

where  L_{1,1} represents the probability of being alive in army  X (if you are from army  X ) after a time step (or the proportion of alive soldiers in  X after a time step),  L_{1,2} represents the probability of "defecting" to army  Y if you are in army  X (I assume such doesn't happen),  L_{1,3} represents the probability of dying if you are in army  X .  Then  L_{2,1} represents the probability of defecting to army  X if you are in army  Y (I also assume this doesn't happen),  L_{2,2} represents the probability of being alive in army  Y , if you are from army  Y , and  L_{2,3} is the probability of dying if you are in army  Y .  The third row of  L simply represents the fact that one cannot come back from the dead once dead:  there are no zombies (such may be in fact possible in videogames, for example, hence why I make a point of saying this).

The next order of business is to note that the total population does not change (alives and deads).  Thus we would have that:

 p_{alive,x} = x/N

 p_{alive,y} = y/N , and

 p_{alive,x} + p_{alive,y} + p_{dead} = 1 .  Rewriting Lanchester's equations in terms of proportions, we would have that:

 \frac{d p_{alive,x}}{dt} = -a p_{alive,y}

 \frac{d p_{alive,y}}{dt} = -b p_{alive,x}

To be complete, we must model what the rate of change of the dead is, which we can figure out by differentiating in time the equation  p_{alive,x} + p_{alive,y} + p_{dead} = 1 .  In effect, what I'm saying is that, Lanchester's equations are incomplete - they are missing the key information that:

 \frac{d p_{dead}}{dt} = a p_{alive,y} + b p_{alive,x}

This equation basically states that the dead increase at the rate at which soldiers in army  X are put down plus the rate at which soldiers in army  Y are put down.

Euler's method suggests that, to write an approximation to these differential equations, we need to think of the derivative not at the limit, but as taking tiny time steps.  So in the following equation

 \lim_{\Delta t \rightarrow 0} \frac{\Delta p_{alive,x}}{\Delta t} = -a p_{alive,y}

let's forget about the limit for a second, and solve thus:

 \Delta p_{alive,x} \approx -a p_{alive,y} \Delta t

and

 p_{alive,x} (t+\Delta t) \approx p_{alive,x} (t) - a p_{alive,y} \Delta t .  Writing all differential equations thus:

 p_{alive,y} (t+\Delta t) \approx p_{alive,y} (t) -b p_{alive,x} \Delta t and

 p_{dead} (t+\Delta t) \approx p_{dead} (t) + a p_{alive,y} \Delta t + b p_{alive,x} \Delta t

Euler's method gives us just what we need for our transition matrix construct.  If the approximations are taken as equalities, then, from the point of view of Markov chain theory, the left-hand side is the next-step vector, and the right-hand side explains how this vector is obtained by a transformation of the at-step vector.  In terms of Markov dynamics:

 \textbf{p(t +} \Delta t \textbf{)} = \textbf{p(t)} \cdot L

Playing a bit with the matrix  L , it should look like:

 L = \left[ \begin{array}{ccc} 1 - a \cdot \frac{p_{alive,y}}{p_{alive,x}} \Delta t & 0 & a \cdot \frac{p_{alive,y}}{p_{alive,x}} \Delta t \\ 0 & 1 - b \cdot \frac{p_{alive,x}}{p_{alive,y}} \Delta t & b \cdot \frac{p_{alive,x}}{p_{alive,y}} \Delta t \\ 0 & 0 & 1 \end{array} \right]

Letting  \Delta t be unity,

 \textbf{p(t +1)} = \textbf{p(t)} \cdot L

with

 L = \left[ \begin{array}{ccc} 1 - a \cdot \frac{p_{alive,y}}{p_{alive,x}} & 0 & a \cdot \frac{p_{alive,y}}{p_{alive,x}} \\ 0 & 1 - b \cdot \frac{p_{alive,x}}{p_{alive,y}} & b \cdot \frac{p_{alive,x}}{p_{alive,y}} \\ 0 & 0 & 1 \end{array} \right]

and, I think, we have successfully Markovized the differential equations, in effect discretizing them at unit time steps.  We should be cautious, though, because now both  a, b carry additional restrictions in order that the entries of  L remain between 0 and 1 (each row already sums to one by construction).

Euler's method in effect is the link here between the continuous differential equations and the Markov transition matrix.  In the SIR discretization of my previous post, I went backwards, showing how we can write continuous differential equations from the Markov transition matrix.

Several interesting things to note about the matrix  L : the probability of death, if one is in army  X , is proportional to the attrition  a and to the ratio by which army  Y 's population exceeds  X 's at the time.  A similar argument works for the death probability of soldiers in  Y , with attrition  b and the excess ratio of  X to  Y .

As with the SIR equations, since  L is really changing every time step, it is  L(t) .  So, from initial conditions, the amount of soldiers alive in army  X , army  Y , and dead at time step  n can be calculated using the discretized matrix by pulling the original vector through, as:

 \textbf{p(t +} n \Delta t \textbf{)} = \textbf{p(t)} \cdot L(t) \cdot L(t + \Delta t) \cdot \ldots \cdot L(t + (n-1) \Delta t)
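To make this concrete, here is a small numpy sketch that rebuilds  L(t) from the current proportions at every step and pulls the vector through, exactly as in the product above (the function name and the sample initial proportions are my own):

```python
import numpy as np

def lanchester_step(p, a, b, dt=1.0):
    # One Markovized Lanchester step: returns p(t + dt) = p(t) . L(t).
    px, py, pd = p
    L = np.array([
        [1 - a * (py / px) * dt, 0.0,                    a * (py / px) * dt],
        [0.0,                    1 - b * (px / py) * dt, b * (px / py) * dt],
        [0.0,                    0.0,                    1.0],
    ])
    return p @ L

p = np.array([0.6, 0.4, 0.0])     # hypothetical initial proportions (my example)
for _ in range(10):               # in practice, stop if either proportion nears 0
    p = lanchester_step(p, a=0.05, b=0.01)
print(p, p.sum())                 # the sum stays 1: each row of L sums to one
```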

In my next post, I want to model, as in the Calculus book I mentioned, the Iwo Jima encounter of World War II between Japan and the US, solving the trajectory numerically using the discretization above.

An additional note: from examining our assumptions about the matrix  L , we now see simple ways in which we could extend Lanchester's differential equations, by filling in "probabilities that a soldier will defect to the other army."  We have indeed enriched the model, in the sense that we can now take into account other complicated dynamics (the least of which is the "creation of zombies").  I think this is very useful in any model, because it allows us to contemplate transitions into states we wouldn't have considered otherwise and, more powerfully, allows us to move our mindset to the realm of differential equations, where we can have DE-related insights, and back again, where we can have Markovian insights.

On the Swine Flu (a Year-and-a-Half Later)

August 8th, 2010

This week I have been revisiting the swine flu scare of April 2009.  The Mexican government then was particularly anxious: it suspended regular classes, and many kids were just thrilled that they found themselves suddenly on vacation.  Some schools even advised kids that if they felt ill, they should excuse themselves from attending classes for at least a week.  Many took the slightest symptom as an excuse to take a short vacation... to the beach, no less.

A year ago in April I attempted to simulate, roughly speaking, not the time progress of the disease, but some of its generalities.  I estimated, for example, that if the swine flu were concentrated in Mexico City, with its sizable population of 20 million, at the outbreak's rate of infection, and by clever deduction of the SIR differential-equation constant of proportionality  a (and a wild guess of  b ), the total of infected individuals at any one time would come to about 500,000, or 2.5% of the city's population.

One thing that struck me from Mexico's Secretaria de Salud infomercials at the time was that they graphed an exponential progression of the disease (which, after government intervention, supposedly decreased exponentially again). It was unrealistic and, whether the authorities minded or not, misinforming.

With the SIR model I recently calculated what could have been the time progress of the disease: I used the method that I developed in my previous post to facilitate the calculation, after I realized that my (Markovian) matrix formulation was equivalent to Euler's method: expanding out the matrix product one obtains, indeed, Euler's estimation formulas (which we thought of, from the point of view of a Markov chain, as the one-step-forward transformation of a vector pulled through, or propagated by, the Markov matrix).  Interestingly, then, one can say that what a Markov transition matrix really does is use Euler's method to estimate the next step.  This is kind of an intriguing insight to me, because I never understood a transition matrix in this way.

The matrix formulation therefore also carries the same error Euler's method does: picking a point away from the initial condition, if the number of steps used to get there is  n , the error (the difference between the approximate value and the true value) is approximately proportional to  1/n .

The matrix formulation, then, required that the differential equations of the SIR model be recast not in absolute terms but in relative proportions, as Duke University's website proposes from the beginning.  I didn't do this originally, and so my calculation of the constant  a must be modified.  I said that  dS/dt = -400 .  In terms of  s(t) , which is  S(t)/N where  N is the total population,  ds/dt = -400/20,000,000 .  The constant of proportionality  a , then, can be calculated as  a = \frac{-ds/dt}{s \cdot i} \approx 0.25 if, from initial conditions,  s = \frac{19,998,400}{20,000,000} and  i = \frac{1,600}{20,000,000} .  My calculation of the constant of proportionality  b remains unmodified; it was a guesstimate based in part on CDC records of disease recovery, if I remember correctly.  Recall that this  b , from the POV of the Markov chain matrix, is the transition probability of going from the infected state to the recovered state.  The initial conditions  S_0 = 19,998,400 ,  I_0 = 1,600 and  R_0 = 0 need to be recast in terms of proportions, too, as  s_0, i_0 above and  r_0 = 0 .  Strictly speaking, we had data at the time that deaths were about 149 people (which I attempted to use to estimate the constant  b ).  The initial condition vector would have to be modified to reflect this forgotten piece of information, but the truth is that the difference in trajectory is negligible, so, to remain consistent with the analysis of a year ago, I will leave  s_0 and  r_0 as they are.
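As a quick check of that arithmetic (a three-line sketch; the numbers are the ones quoted above):

```python
# Recomputing the SIR constant of proportionality a from the quoted figures.
N = 20_000_000
s0 = 19_998_400 / N
i0 = 1_600 / N
ds_dt = -400 / N                  # dS/dt = -400, recast in proportions

a = -ds_dt / (s0 * i0)
print(round(a, 4))                # ~0.25, matching the value above
```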

Here's the timed graph I came up with.  It states that the time at which the number of infecteds is at a maximum (about 500,000 as per the post of April 2009; the new numerical estimate using the matrix transform is 440,000) is approximately 145 days after the outbreak (April 27ish): maybe around September 19, 2009.  It also states that, at the end of the disease, roughly 8,000,000 people recover (or die).  This I know was not the case (not even thousands of people died, for example), in large part because of medication, excellent medical care and, admittedly, competent government response (but it puts things in perspective, doesn't it?  No wonder it was a "scare.").

Swine Flu SIR April 27, 2009