## On a Battle Markov Chain Model

About a month ago I befriended Dan, from Giant Battling Robots, online, as he was investigating Lanchester's differential equations to model the outcome of battles, particularly those of videogames. He sent me this very interesting link describing a Markov chain model for battles, since he saw that I was trying to figure out a way to coalesce Markov reasoning with Lanchesterian DEs. The pdf describes quite a different approach from the one I've been taking, but I thought it would be interesting nevertheless to attempt to illustrate it by example. The subject of this post is therefore expository, based on Dan's link, but the example and explanation are entirely mine, and so any error in concept is my own.

Suppose BLUE and GREEN are feuding - in fact, battling to the death - each possessing the same or differing technologies for their engagements. The particular technologies translate to a probability of killing the other's units. For example, GREEN may have bazooka technology that ensures that, when a rocket is launched, the probability of killing or disabling an enemy unit (perhaps a tank) is 0.45, while BLUE's tanks may kill bazooka-launchers with probability 0.55. These probabilities represent the fact that, if an encounter occurs at a time step, in 55 percent of encounters the tanker (BLUE) is successful in killing or disabling GREEN's unit; in the other 45 percent of encounters, it is unsuccessful.

For a battlefield in a given area, depending on hiding places, like trees, forests, or caves (terrain), the probability that a tank and a bazooka-launcher encounter each other at a given time step is an exponentially distributed random variable. What this means is basically this: a single BLUE and a single GREEN will find each other with a given (expected) probability at a given time step, and higher encounter probabilities are less likely than lower ones at any time step (I may have a conceptual problem here, since an exponentially distributed random variable has a domain from 0 to infinity, whereas of course a probability must lie between 0 and 1). In other words, the probability of encounter of a GREEN with a BLUE, let's say, is 0.87 in expectation. It's a fuzzy 0.87, because at the next time step it may be less or more, but it's 0.87 on average.
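One way to get such a "fuzzy" per-step probability while keeping it inside [0, 1] - my own patch for the domain problem I just admitted to, not something from the pdf - is to draw it from a Beta distribution with the desired mean instead of an exponential one:

```python
import random

rng = random.Random(1)

# Beta(8.7, 1.3) has mean 8.7 / (8.7 + 1.3) = 0.87 and support [0, 1],
# so every draw is a valid probability that fluctuates around 0.87.
draws = [rng.betavariate(8.7, 1.3) for _ in range(100_000)]

print(min(draws) >= 0.0 and max(draws) <= 1.0)  # each draw is a probability
print(sum(draws) / len(draws))                  # close to 0.87
```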

Now, since the probability of finding each other and the probability of winning an encounter (we assume 1-on-1) are independent, the probability of killing the opposing unit is simply the probability of encounter times the probability of succeeding in the kill. If BLUE has 4 units and GREEN 3 - our initial state - the probability of transitioning to state (3,3) is 0.39: 0.87, the (expected) encounter probability at the time step, times 0.45, the success probability of GREEN. Notice we allow for only ONE encounter (or NONE) at any time step: this makes sense if the time steps are sufficiently small and resolution of the encounter, if there is one, is instantaneous. The probability of transitioning to state (4,2) is 0.48: the expected probability of encounter, 0.87, times the probability of success by BLUE, 0.55. Finally, the probability of staying in state (4,3) is 1 - 0.39 - 0.48 = 0.13.
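The arithmetic for this first step can be checked directly (0.87, 0.45, and 0.55 are just the example's illustrative numbers):

```python
p_enc   = 0.87  # expected encounter probability per time step
p_green = 0.45  # GREEN kills a BLUE unit, given an encounter
p_blue  = 0.55  # BLUE kills a GREEN unit, given an encounter

# One-step transitions out of the initial state (4, 3):
to_3_3 = p_enc * p_green      # BLUE loses a unit: about 0.39
to_4_2 = p_enc * p_blue       # GREEN loses a unit: about 0.48
stay   = 1 - to_3_3 - to_4_2  # no encounter this step: about 0.13

print(to_3_3, to_4_2, stay)
```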

Neglecting the natural variations in our expected encounter probability, we can now craft the following chart, which describes the transition probabilities to each state succinctly. Yellow cells are possible states. Directly to the left of a state is the probability that BLUE loses one unit, and directly below, the probability that GREEN loses one unit at the next time step. The diagonal probability is the probability of remaining in that state at the next time step:

The battle ends whenever one of the coordinates of our state is 0: for example, (0,3), or (0,1), or (2,0)... Clearly, state (0,0) is unreachable: there must be at least one unit that remains in the end. Of course, in this model, who wins in the end depends not only on the technology constants (probabilities of kill or success), but also on numerical superiority: namely, the initial conditions or initial state coordinates.

It's difficult to order the states in any meaningful way when they are Cartesian coordinates, as in this example. I used a diagonal method to order the states and craft the transition probabilities in matrix form. Orange states are absorbing:
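For concreteness, here is a sketch of how such a matrix could be assembled and iterated in code. The state ordering below is a simple lexicographic one rather than the diagonal one, but the resulting probabilities are the same; the parameter values are the example's illustrative numbers:

```python
import numpy as np

p_enc, p_green, p_blue = 0.87, 0.45, 0.55  # the example's numbers
B0, G0 = 4, 3                              # initial state (4, 3)

# All states (b, g); (0, 0) is unreachable and omitted.
states = [(b, g) for b in range(B0 + 1) for g in range(G0 + 1) if (b, g) != (0, 0)]
idx = {s: i for i, s in enumerate(states)}

P = np.zeros((len(states), len(states)))
for b, g in states:
    i = idx[(b, g)]
    if b == 0 or g == 0:                          # absorbing: battle is over
        P[i, i] = 1.0
    else:
        P[i, idx[(b - 1, g)]] = p_enc * p_green   # BLUE loses a unit
        P[i, idx[(b, g - 1)]] = p_enc * p_blue    # GREEN loses a unit
        P[i, i] = 1 - p_enc * (p_green + p_blue)  # no encounter this step

v = np.zeros(len(states))
v[idx[(B0, G0)]] = 1.0                   # start at (4, 3) with certainty
v20 = v @ np.linalg.matrix_power(P, 20)  # distribution after 20 steps

blue_win  = sum(v20[idx[(b, 0)]] for b in range(1, B0 + 1))
green_win = sum(v20[idx[(0, g)]] for g in range(1, G0 + 1))
print(blue_win, green_win)  # BLUE's absorbed mass is the larger one
```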

As in the usual Markov chain treatment, the nth power of this matrix gives the transition probabilities after n time steps. After many time steps, if the chain has the right structure (ours is an absorbing chain: from every state some absorbing state is reachable), the difference between the matrix at the nth step and at the (n+1)th becomes negligible: all entries are "stable." At the same time, the initial state vector (in our case 0 everywhere except a 1 at state (4,3)) converges to a stable steady-state vector, passing through several intermediate vectors. This stable vector indicates the probability of each outcome after a very long time: it gives the proportion of times we will see GREEN or BLUE win over every possible battle, even the very long ones, so it is indeed a true probability of winning. For example, the following chart

shows that, if we simulate battles with identical initial states and transition probabilities for up to 20 time steps, then after 20 time steps BLUE is more likely to have won the majority of simulations (by obliterating the opposition and remaining with 1, 2, 3, or 4 units in the end). Notice the fixed proportions of wins for BLUE and for GREEN. Or think about it this way. Let U represent that the battle is undecided at a given time step, and B that BLUE has won by that time step. B will continue to be the state of the battle after it becomes B for the first time (the battle is now over, so B remains the state of the system). G indicates likewise that GREEN has won by that time step, and an infinite string of Gs after the first G indicates that GREEN's win is permanent:

Simulation 1: UUBBBBBBBBBBB....

Simulation 2: UUUUUUUGGGGG.....

etc.

We will have strings of Us followed by either Bs or Gs, but not both, infinitely. If we count the proportion of Us at the first time step (1st position of the string) across all simulations, this corresponds to the chart's proportions at step 1. If we count the proportions of Gs and Bs at the 20th step across all simulations, this corresponds to the chart's proportions at step 20. The higher the time step, the closer these proportions come to the true probability of winning, once most battles are (long) over, and this probability is stable/convergent. In my particular example, most battles are over by the time they reach the 20th time step, and BLUE has won most of them, no matter how short or long they took.
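The string-counting argument can be mimicked with a quick Monte Carlo sketch, again using the example's illustrative numbers (and neglecting the fuzziness of the encounter probability). Note that here p_blue = 1 - p_green, so each encounter is decided by a single coin flip:

```python
import random

rng = random.Random(42)
p_enc, p_blue = 0.87, 0.55  # encounter and BLUE-success probabilities

def simulate_battle(b=4, g=3):
    """Run one battle to absorption; return 'B' if BLUE wins, 'G' otherwise."""
    while b > 0 and g > 0:
        if rng.random() < p_enc:       # an encounter happens this time step
            if rng.random() < p_blue:  # BLUE kills a GREEN unit...
                g -= 1
            else:                      # ...else GREEN kills a BLUE unit
                b -= 1
    return 'B' if g == 0 else 'G'

n = 20_000
blue_wins = sum(simulate_battle() == 'B' for _ in range(n))
print(f"estimated P(BLUE wins) = {blue_wins / n:.3f}")  # well above one half
```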

I have crafted an .xls file that might make things clearer. The items that you may want to modify are in grey, and the chart is there too, in a different sheet. For the (grey) state vector, put a 1 in the initial state and 0 everywhere else; otherwise make sure you enter proportions and that they add up to 1 in the end. I only did the chart with states up to (4,3), but you can probably extrapolate the idea for states that have more than 4 BLUE units and 3 GREEN ones. You can probably use this chart for smaller dimension games without much effort, too.

As always, comments or questions are welcome!

Enjoy!