
Posts Tagged ‘Linear Algebra’

On Patchix by Patchix Products – Tying Up Loose Ends - (RWLA,MCT,GT,AM Part V)

October 17th, 2010 No comments

In this post I want to "tie up a few loose ends."  For example, in my last post I stated that the patch powers follow the pattern

 \begin{array}{ccc} p_1(x,y) & = & 1 - cos(2 \pi x) cos(2 \pi y) \\ p_2(x,y) & = & 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{2} \\ p_3(x,y) & = & 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{4} \\ p_4(x,y) & = & 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{8} \\ \vdots \\ p_t(x,y) & = & 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{t-1}} \end{array}

for  t \in \mathbb{Z^+} , but I didn't prove it.  It's simple to do by induction.  For the base case,

 p_1(x,y) = 1 - cos(2 \pi x) cos(2 \pi y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{1-1}}

For the inductive hypothesis, assume

 p_k(x,y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}}

Then,

 \begin{array}{ccc} p_{k+1}(x,t) & = & \int_0^1 p_1(1-y,t) \cdot p_k(x,y) dy \\ & = & \int_0^1 \left( 1 - cos(2 \pi (1-y))cos(2 \pi t) \right) \cdot \left( 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}} \right) dy \end{array}

Now, if one dislikes shortcuts one can expand the product and integrate term by term to one's heart's content.  The "shorter" version is this: notice that the product of 1 with itself is 1, which integrates to 1 over the unit interval.  So we save it.  The integrals  \int_0^1 cos(2 \pi y) dy and  \int_0^1 cos(2 \pi - 2\pi y) dy both evaluate to zero, so we are left only with the task of evaluating the cross term:

 \begin{array}{ccc} && \int_0^1 cos(2 \pi (1-y))cos(2 \pi t) \cdot \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}} dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \int_0^1 cos(2 \pi - 2 \pi y) cos(2 \pi y) dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \int_0^1 cos^2(2 \pi y) dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \cdot \frac{1}{2} \\ & = & -\frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k}} \end{array}

Let's not forget the 1 we had saved, so:

 p_{k+1}(x,t) = 1 - \frac{cos(2 \pi x) cos(2 \pi t)}{(-2)^{k}} \rightsquigarrow 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k}} = p_{k+1}(x,y)

as we wanted to show.
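As a sanity check, here is a minimal symbolic sketch of this induction (a sketch of mine using sympy; the helper name patchix_product and the loop bound are my own choices): it recomputes the first few powers with the star product and compares them against the closed form above.

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def patchix_product(p, q):
    # p "star" q: send p(x,y) -> p(1-y,t), integrate against q(x,y) over y, relabel t -> y
    p_shifted = p.subs([(x, 1 - y), (y, t)], simultaneous=True)
    return sp.integrate(p_shifted * q, (y, 0, 1)).subs(t, y)

p1 = 1 - sp.cos(2 * sp.pi * x) * sp.cos(2 * sp.pi * y)

pt = p1
for n in range(1, 6):
    claimed = 1 - sp.cos(2 * sp.pi * x) * sp.cos(2 * sp.pi * y) / (-2) ** (n - 1)
    assert sp.simplify(pt - claimed) == 0      # matches the closed form for p_n
    pt = patchix_product(p1, pt)               # next power: p_{n+1} = p_1 star p_n
```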

So finally notice that, of course, as  t approaches infinity, the patch tends to 1, the uniform distribution:

 \lim_{t \rightarrow \infty} p_t(x,y) = 1 = u(x,y)

From here on out, I want to set up the operative framework of patchixes, in analogy with discrete matrices.  I want to show that in general, patchix products are non-commutative.  This is easily done by counterexample:

We want to show that  p(x,y) \star q(x,y) \neq q(x,y) \star p(x,y) . So suppose the patchixes  p(x,y) = x and  q(x,y) = y . Then

 p(x,y) \star q(x,y) = \int_0^1 p(1-y,t) \cdot q(x,y) dy = \int_0^1 (1-y) y dy = \int_0^1 y - y^2 dy = \frac{1}{6}

and

 q(x,y) \star p(x,y) = \int_0^1 q(1-y,t) \cdot p(x,y) dy = \int_0^1 (t \cdot x) dy = t \cdot x \rightsquigarrow x \cdot y

are clearly not equal.  It would be great to say that, because patchixes are non-commutative, patches are too, but non-commutativity of patchixes by itself doesn't tell us whether patches, as a subset of patchixes, commute among themselves, so let's disprove that by counterexample as well.  Now suppose the patches  p(x,y) = x + \frac{1}{2} and  q(x,y) = 1 + xy - \frac{y}{2} .  Then

 \begin{array}{ccc} p(x,y) \star q(x,y) & = & \int_0^1 p(1-y,t) \cdot q(x,y) dy \\ & = & \int_0^1 \left( \frac{3}{2} - y \right) \cdot \left( 1 + xy - \frac{y}{2} \right) dy \\ & = & \frac{5x}{12} + \frac{19}{24} \end{array}

where

 \begin{array}{ccc} q(x,y) \star p(x,y) & = & \int_0^1 q(1-y,t) \cdot p(x,y) dy \\ & = & \int_0^1 q(1-y,t) \cdot p(x) dy \\ & = & p(x) \int_0^1 q(1-y,t) dy \\ & = & p(x) \cdot u(t) = p(x) \\ & = & x + \frac{1}{2} \end{array}

By refraining from calculating this last bit explicitly, we have (serendipitously) proved that any patch times a patch that is solely a function of  x returns that latter patch, a result which recalls the analogous distribution-by-patch result I showed in my previous post (a distribution on [0,1] times a patch that is solely a function of  x returns the patch, which, viewed as a function, is itself a distribution on [0,1]).  A quick note: the integral  \int_0^1 q(1-y,t) dy is the unit distribution because  \int_0^1 q(x,y) dx = u(y) and  x \rightsquigarrow (1-y) and  dx \rightsquigarrow -dy .

The end result of these observations is that patches are also, in general, non-commutative.
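Both counterexamples are quick to re-check symbolically; here is a small sketch (sympy; patchix_product is my own name for the star product defined above):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def patchix_product(p, q):
    # p "star" q: send p(x,y) -> p(1-y,t), integrate against q(x,y) over y, relabel t -> y
    p_shifted = p.subs([(x, 1 - y), (y, t)], simultaneous=True)
    return sp.integrate(p_shifted * q, (y, 0, 1)).subs(t, y)

# patchix counterexample: p = x, q = y
print(patchix_product(x, y), patchix_product(y, x))     # 1/6 versus x*y

# patch counterexample: p = x + 1/2, q = 1 + x*y - y/2
p = x + sp.Rational(1, 2)
q = 1 + x * y - y / 2
print(sp.expand(patchix_product(p, q)))                 # 5*x/12 + 19/24
print(sp.expand(patchix_product(q, p)))                 # x + 1/2
```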

Next, I want to show that patchixes in general are associative.  This is a bit tricky because of the "after integral" transformations we have to do, but it is doable if we keep careful track of our accounting.  We want to show that  [p(x,y) \star q(x,y)] \star r(x,y) = p(x,y) \star [q(x,y) \star r(x,y)] .  Let's begin with the left hand side.

 \begin{array}{ccc} [p(x,y) \star q(x,y)] \star r(x,y) & \rightsquigarrow & [p(x,w) \star q(x,w)] \star r(x,y) \\ & = & \left( \int_0^1 p(1-w, y) \cdot q(x, w) dw \right) \star r(x, y) \\ & = & \int_0^1 \left( \int_0^1 p(1-w, t) \cdot q(1-y, w) dw \right) \cdot r(x, y) dy \\ & = & \int_0^1 \int_0^1 p(1-w, t) \cdot q(1-y, w) \cdot r(x, y) dw dy \\ & = & s(x,t) \rightsquigarrow s(x,y) \end{array}

Now the right hand side

 \begin{array}{ccc} p(x,y) \star [q(x,y) \star r(x,y)] & \rightsquigarrow & p(x,w) \star \left( \int_0^1 q(1-y, w) \cdot r(x,y) dy \right) \\ & = & \int_0^1 p(1-w, t) \cdot \left( \int_0^1 q(1-y, w) \cdot r(x,y) dy \right ) dw \\ & = & \int_0^1 \int_0^1 p(1-w,t) \cdot q(1-y, w) \cdot r(x,y) dy dw \\ & = & s(x,t) \rightsquigarrow s(x,y) \end{array}

The two sides are equal when we can apply the Fubini theorem to exchange the order of integration.

Of course, patches, being a subset of patchixes, inherit associativity.
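A quick symbolic spot-check of associativity on three arbitrary polynomial patchixes of my own choosing (a sketch, not a proof; it simply confirms that both groupings give the same function):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def patchix_product(p, q):
    # p "star" q: send p(x,y) -> p(1-y,t), integrate against q(x,y) over y, relabel t -> y
    p_shifted = p.subs([(x, 1 - y), (y, t)], simultaneous=True)
    return sp.integrate(p_shifted * q, (y, 0, 1)).subs(t, y)

p, q, r = x + y, x * y + 1, x**2 + y                    # three arbitrary patchixes (my picks)
lhs = patchix_product(patchix_product(p, q), r)
rhs = patchix_product(p, patchix_product(q, r))
print(sp.simplify(lhs - rhs) == 0)                      # True: (p*q)*r equals p*(q*r)
```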

Defining a patchix left and right identity is extremely difficult, in the sense that, if we take a hint from discrete matrices, we'd be looking at a very special function on the  xy plane, so that  i(1-y,y) = i(x,1-x) = 1 and  0 everywhere else.  Because there is no "pretty" way to define this as a function of  x and  y both, showing that multiplying a patchix by this function on either the right or the left returns the original patchix requires elaborate explication.  Unless we take the axiomatic high ground, postulating the existence of an identity function  i(x,y) so that  i(x,y) \star p(x,y) = p(x,y) = p(x,y) \star i(x,y) to make the framework work, there is no easy way out.  Let's give it a shot then.

Left identity:

 i(x,y) \star p(x,y) = \int_0^1 i(1-y,t) \cdot p(x,y) dy

Now  i(1-y,t) = 1 only where  t = y , as we've defined it; otherwise the integrand is zero and there is nothing to compute.  So then we've got

 \int_0^1 i(1-t,t) \cdot p(x,t) dy = \int_0^1 (1) \cdot p(x,t) dy = p(x,t) \rightsquigarrow p(x,y)

which is essentially the argument I make for the zero patch power in my informal paper on continuous Markov transition matrices or patches (however, there's a problem with this definition on patches, more on this below).  There's the question of why we didn't force the change of  dy \rightsquigarrow dt , and this is because the only way to obtain a function of both  x and  t is to force the patchix to the  x t plane and let the integral be taken in the  x y plane.  If this argument is unsatisfactory, consider this one:  at  t = 0 = y the patchix takes the values  p(x, 0) , which is a function of  x alone.  Thus,

 \int_0^1 i(1,0) \cdot p(x,0) dy = p(x,0) \int_0^1 (1) dy = p(x,0)

If we do this for all  t \in [0,1] , we are certainly left with  p(x,t) .  We may raise the objection that, if we create a mental picture of the situation, at  t = 0 ,  i takes a value of 1 only at  y = 0 , so that, on the  x y plane, all values of  p(x, y) are zeroed except those at  y = 0 .  Thinking about it this way creates the difficulty of the integral with respect to  y : it evaluates to zero (there is no "area" in the  x y plane anymore, only a filament or fiber at  y=0 ), and we would be left with the zero patchix.  There are only two ways to resolve this: send the patchix  p(x,y) to  p(x,t) before we take the integral in the  x,y plane, and then toss the integral out the window (or take it on the uniform distribution), or think of the filament  p(x,0) = p_0(x) as  p_0(x) \times [0,1] = p_0(x,y) , integrate in the  x y plane to obtain  p_0(x) \rightsquigarrow p(x,0) , and do this for all  t to get  p(x,t) .  Hence, yes, the difficulty of defining the identity function on "surface" matrices (because it is not smooth like they are and because it is defined piece-wise).

Right identity:

 p(x,y) \star i(x,y) = \int_0^1 p(1-y,t) \cdot i(x,y) dy

Here we remind ourselves that  i(x,1-x) = 1 and zero otherwise, so that we can make the substitution

 \int_0^1 p(x,t) \cdot i(x,1-x) dy = \int_0^1 p(x,t) \cdot (1) dy = p(x,t) \rightsquigarrow p(x,y)

We of course have issues: it may seem redundant to send  x \rightsquigarrow 1-y \rightsquigarrow x , sending  x back to itself, but again this is the only way to remain consistent and get back the original function.  Again there's the issue of why we didn't send the integral  dy \rightsquigarrow -dx , but this has to remain in the  x y plane for the mechanics to work.  Other objections are likewise not easily resolved; the argument works out algebraically if we concede on a few things, but otherwise we cannot but shrug at the fact that it is, indeed, a little bit of hocus pocus, and we return to our suggestion to postulate the identity function as an axiom.  Perhaps these issues can be resolved or elucidated a little later; I don't lose hope.

Defining inverse patchixes will also present a great difficulty, particularly because they have to produce the identity function when we "patchix multiply" two mutually inverse patchixes together.  I was thinking that we could perhaps determine whether a particular patchix has one by extending Sarrus's rule (for determinants) to be continuous, which would involve, I'm sure, multiple integrations.  This will be a topic of further investigation for me.  The cool thing is, if we can elucidate how this "continuous version" of the determinant works, many different results from Linear Algebra could follow.  I am also trying to figure out what two inverse patchixes would look like, and whether I can produce an example (at all), virtually from thin air.  If I can, then perhaps we're on our way to constructing patchix groups of all flavors.

Unfortunately, patches can't inherit the identity as we've defined it: the integral with respect to  x of  i(x,y) is zero for all  y .  Thus  i(x,y) is not a patch.

This problem makes us want to think of the uniform distribution  u(x,y) as another possible candidate for the identity for all patchixes, and it might just work if we agree that, when we don't have a function of  t or of  x after doing the setup transformations for the integral, we send whatever function remains there before taking the integral.

Left identity:

 u(x,y) \star p(x,y) = \int_0^1 u(1-y,t) \cdot p(x,y) dy \rightsquigarrow \int_0^1 (1) \cdot p(x,t) dy = p(x,t) \rightsquigarrow p(x,y)

Right identity:

 p(x,y) \star u(x,y) = \int_0^1 p(1-y,t) \cdot u(x,y) dy \rightsquigarrow \int_0^1 p(x,t) \cdot (1) dy = p(x,t) \rightsquigarrow p(x,y)

This has several happy consequences: we avoid dealing with a piece-wise defined function  i(x,y) which is zero everywhere except on  y = 1-x , the uniform distribution is smooth, we can now more easily define inverses (by finding multiplicative inverse functions, more on this below), and, specifically regarding patches,  \int_0^1 u(x,y) dx = u(y) = 1 so the uniform distribution is indeed a patch.

In my mental picture, the "patchix product" of the uniform distribution with a patchix (and vice versa) doesn't "add up" (pun intended), but the algebraic trickery would seem to be the same even when using the alternative  i(x,y) .  So.  At this point I sort of have to convince myself into accepting this for now.

On Patch by Patch Products, Part II - (RWLA,MCT,GT,AM Part IV)

October 15th, 2010 3 comments

Last time I talked about a concept I invented based on my studies of Markov chains.  Patches are, essentially, "continuous matrices" (a surface on  [0,1] \times [0,1] ) with the property that they add to 1 if we take the integral with respect to  x for any  y , in analogy to the requirement in the usual Markov matrix treatment.  I dubbed such "patches," and explained a way to construct them.  In my previous post, I began thinking that patches seem to be very special, in the sense that self patch powers can represent the state of a liquid in time, if we allow ourselves to be a little imaginative.  Let's say that we disturb a uniformly distributed patch into an initial state, the initial-state patch, like this:

 p(x,y) = 1 - cos(2 \pi x) cos(2 \pi y)

It is easy to see that if we integrate with respect to  x our result is 1, so that it is indeed a patch. (I also constructed this function by letting  f_1(x) = -cos(2 \pi x), g_1(y) = cos(2 \pi y), f_2(x) = 1 and calculated that  g_2(y) = 1 using the technique I talked about here).

Let's say we have depressed the liquid at the four corners and center of the confined space (necessarily a box of dimensions  1 \times 1 \times h ), essentially giving it energy.  Next we calculate the patch powers (as described in my previous posts).  Interestingly, if we map the patch powers of such a liquid, they converge to a steady state, just like Markov matrixes would:

 p(x,y) = 1 - cos(2 \pi x) cos(2 \pi y)

 p_2(x,y) = 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{2}

 p_3(x,y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{4}

 p_4(x,y) = 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{8}

 p_5(x,y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{16}

 p_6(x,y) = 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{32}

The evolution in time of this particular patch is easy to guess (although I should, technically, prove this by induction... I do in my next post):

 p_t(x,y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{t-1}}

for  t \in \mathbb{Z^+} , letting this parameter represent both time and the patch power.  Of course, if we integrate any of these with respect to  x , the result is 1, and so they are, indeed, patches.
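To make the Markov-matrix analogy concrete, here is a rough numerical sketch (the grid size, trapezoid weights, and names are my own choices): sampling the patch on a grid turns the patch product into an ordinary matrix product, and the deviation from the uniform surface roughly halves with every power, as the closed form above predicts.

```python
import numpy as np

n = 401
s = np.linspace(0.0, 1.0, n)                       # one shared grid for x, y and t
w = np.full(n, 1.0 / (n - 1))                      # trapezoid quadrature weights on [0,1]
w[0] = w[-1] = 0.5 / (n - 1)

def patch(x, y):
    return 1 - np.cos(2 * np.pi * x) * np.cos(2 * np.pi * y)

X, Y = np.meshgrid(s, s, indexing='ij')
Q = patch(X, Y)                                    # Q[i, j] = p(x_i, y_j), the first power
K = w[:, None] * patch(1 - X, Y)                   # K[j, k] = w_j * p(1 - y_j, t_k)

for power in range(1, 7):
    print(power, np.max(np.abs(Q - 1.0)))          # max deviation from the uniform surface
    Q = Q @ K                                      # next power: integrate p(1-y,t) q(x,y) dy
```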

I would like to state in a different post the conditions under which a steady state is achievable; my suspicion is that, in analogy to Markov chains, a steady state is reached if the patch is non-zero for some power (and above) on  [0,1] \times [0,1] , a property that is called regularity in that context, and of course, I would like to be able to calculate the steady state as easily as it can be done with discrete Markov chains (I was afraid that, in this particular example, I wouldn't be able to achieve steady state because the initial patch has zeroes at the corners and center).  It's pinned as one of my to-dos.  At any rate, the fact that there are patches that converge to a state (a 2D surface), and, specifically, that can converge to the uniform distribution surface, suggests that such systems, from the viewpoint of Physics, must dissipate energy, and it is there that the concept of entropy comes in.  Of course from a probabilistic point of view, entropy in this sense is non-existent; patches merely describe the probability of movement to another "position" on each  y fiber.

There are of course patches that do not converge to the uniform surface distribution, but to other types: in my previous post, the patch I constructed converged to a plane that is tilted in the unit cube.  I wonder whether such a case might have a physical interpretation that relates it to gravity: the liquid experiences a uniform acceleration (gravity) normal to the (converging) plane, which of course says, from a physical interpretation, "the cube is tilted."  Again from the probabilistic point of view, the concept of gravity is an explanatory link to Physics, but the end state arises without its action on the fluid at all!

There are fun topological considerations too: the fact that we can do this on the unit cube does not preclude us from doing it in, for example, a unit cylinder (a cup or mug!), provided we can find the appropriate retract-into-the-square function and vice-versa.  This, I think, might make it very interesting to map movement in all kinds of containers.

I have already talked about a couple potential venues in Group Theory, which I really would also like to go into further at some point.

As in other posts, another possible area of investigation is the evolution of the surface in smaller bits of time.  I was able to link, in previous Markov treatments, discrete representations of Markov chains to continuous-time differential equations.  Here it would be immensely interesting to see whether patches, under this light, converge to partial differential equation representations.  Which leads me to the last point, regarding Navier-Stokes turbulent flow (which I admit I know very little about), and a potential link to its differential equation representation:

Here is why I think that turbulent flow could be explained by generalizing patches a little, to "megapatches" (essentially 3D patches, or tensors): now we can think not of a 2D surface converging, but of a 3D one in time.  A water sphere in space (I once saw a cool video on this and was left thoroughly fascinated) or a water balloon being poked could be understood this way, for example, so that the movement of water throughout the flexible container could be similarly traced (by mapping the probability of movement in the container)!  I need to flesh this out a little more, but I think it's also very interesting, potentially.

These studies make me ask myself, again: what is the relationship between stochastic processes and deterministic representations?  They seem to be too intimately linked to be considered separate.

On Patch by Patch Products - (RWLA,MCT,GT,AM Part III)

October 12th, 2010 No comments

In my previous post, I described the concept of a "patchix" and of a special kind, the "patch." I described how to multiply a continuous function on  [0,1] by a patch(ix). Today I want to talk about how to multiply a patch by a patch and certain properties of it.

In my description in my informal paper, I basically said that in order to (right) multiply a patchix by a patchix, say  p(x,y) with  q(x,y) , we would have to send  p(x,y) \rightsquigarrow p(1-y,t) and then integrate as:

 r(x,t) = \int_0^1 p(1-y,t) \cdot q(x,y) dy \rightsquigarrow r(x,y)

If the patchixes are furthermore special, so that both  \int_0^1 q(x,y) dx = \int_0^1 p(x,y) dx = u(y) = 1 , the uniform distribution on [0,1] (so that  u of any fiber is 1), then  p(x,y) and  q(x,y) are "patches."  I want to show that when we "patchix multiply" two patches we obtain another one.  Here's why: assume  p(x,y) and  q(x,y) are "patches."  Then the resultant  r(x,t) is a patch too if  \int_0^1 r(x,t) dx = u(t) = 1 . Thus:

 \int_0^1 r(x,t) dx = \int_0^1 \int_0^1 p(1-y,t) \cdot q(x,y) dy dx

If there is no issue with absolute convergence of the integrals (as there shouldn't be in a patch), by the Fubini theorem we can exchange the order of integration:

 \int_0^1 \int_0^1 p(1-y,t) \cdot q(x,y) dx dy = \int_0^1 p(1-y,t) \int_0^1 q(x,y) dx dy

The inner integral evaluates to  u(y) = 1 by hypothesis. Then we have  \int_0^1 p(1-y,t) dy = u(t) = 1 because the transformation  x \rightsquigarrow 1-y changes the orientation so that the direction of integrating to 1 now changes to be in the  y direction ( dx \rightsquigarrow -dy ). Nicely, we have just proven closure of patches.

Because this strongly suggests that patches may form a group (as may patchixes with other properties), I want to attempt to show associativity, identity and inverses of patches in my next post (and of other patchixes with particular properties).

For now, I'm a little more interested in solving a concrete example by calculating self-powers. In my last post, I constructed the following patch:

 p(x,y) = x^2 y^3 + x \left( 2 - \frac{2 y^3}{3} \right)

To calculate the second power, send  p(x,y) \rightsquigarrow p(1-y,t) . I get, in expanded form, from my calculator:

 p(1-y,t) = t^3 y^2 - \frac{4 t^3 y}{3} + \frac{t^3}{3} - 2y + 2

so that

 p_2(x,t) = \int_0^1 p(1-y,t) \cdot p(x,y) dy = \frac{29 x}{15} + \frac{t^3 x}{90} + \frac{x^2}{10} - \frac{t^3 x^2}{60}

Last, let's send  p_2(x,t) \rightsquigarrow p_2(x,y) = \frac{29 x}{15} + \frac{y^3 x}{90} + \frac{x^2}{10} - \frac{y^3 x^2}{60}

We can corroborate that this is a patch by integrating

 \int_0^1 p_2(x,y) dx = \int_0^1 \frac{29 x}{15} + \frac{y^3 x}{90} + \frac{x^2}{10} - \frac{y^3 x^2}{60} dx = 1 which is indeed the case.

To calculate the third power, we compute:

 p_3(x,t) = \int_0^1 p(1-y,t) \cdot p_2(x,y) dy = \frac{1741 x}{900} - \frac{t^3 x}{5400} + \frac{59 x^2}{600} + \frac{t^3 x^2}{3600}

Then, send  p_3(x,t) \rightsquigarrow p_3(x,y) = \frac{1741 x}{900} - \frac{y^3 x}{5400} + \frac{59 x^2}{600} + \frac{y^3 x^2}{3600}

Again, we can corroborate that this is a patch by

 \int_0^1 p_3(x,y) dx = \int_0^1 \frac{1741 x}{900} - \frac{y^3 x}{5400} + \frac{59 x^2}{600} + \frac{y^3 x^2}{3600} dx = 1

which is the case.
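For completeness, a short symbolic sketch (sympy; patchix_product is my own name for the star product) that reproduces  p_2 and  p_3 above and re-checks that both integrate to 1 in  x :

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def patchix_product(p, q):
    # p "star" q: send p(x,y) -> p(1-y,t), integrate against q(x,y) over y, relabel t -> y
    p_shifted = p.subs([(x, 1 - y), (y, t)], simultaneous=True)
    return sp.expand(sp.integrate(p_shifted * q, (y, 0, 1))).subs(t, y)

p = x**2 * y**3 + x * (2 - sp.Rational(2, 3) * y**3)
p2 = patchix_product(p, p)
p3 = patchix_product(p, p2)

print(p2)   # 29*x/15 + x*y**3/90 + x**2/10 - x**2*y**3/60
print(p3)   # 1741*x/900 - x*y**3/5400 + 59*x**2/600 + x**2*y**3/3600
print(sp.integrate(p2, (x, 0, 1)), sp.integrate(p3, (x, 0, 1)))    # 1 and 1
```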

Here's a contour 3D-plot of  p(x,y), p_2(x,y) \ldots p_7(x,y) : in other words, the 7-step time evolution of the patch (the "brane").  By looking at the plot, you can probably begin to tell what I'm getting at: the patch evolution shows how a fluid could evolve in time (its movement, oscillation), provided the appropriate first-patch generator can be found for a particular movement.  If patches mirror Markov "thinking," the fact that a patch eventually settles to its long-term stable distribution means that this (patch) treatment, when applied to the physical world, takes into account some sort of "entropy," or loss of energy of the system.  Also, I believe, some sort of "viscosity."  The patch evolution catches nicely and inherently all the relevant physical properties.  I will continue to explore this in my next post, I think.

The above image has been scaled differently for the different functions so that they can be better seen as they converge.  In my next post, I would like to expound on the evolution of the first-patch  p(x,y) = 1 - cos(2 \pi x) cos(2 \pi y) .

On Patchixes and Patches - or Pasqualian Matrixes - (RWLA,MCT,GT,AM Part II)

October 10th, 2010 3 comments

For the last few months I have been thinking about several very curious properties of patchixes and patches (mentioned here); in particular, having studied patch behavior in a "continuous Markov chain" context, and while drinking a bowl of cereal and observing the interesting movement of the remaining milk, it hit me: a patch could certainly describe milk movement at particular time steps.  It is my hope to try to elucidate this concept a little better here today.  In particular, I think I have discovered a new way to describe waves and oscillations, or rather, "cumulative movement where the amount of liquid is constant" in general; in my honest belief, this new way and the old way converge in the limit (this based on my studies, here and here, of discrete Markov chains at the limit of tiny time steps, so that time is continuous), although it is a little bit unclear to me how at the moment.  It is my hope that this new way not only paves the way for a new and rich field of research, but I foresee it clarifying studies in, for example, turbulence, and, maybe one day, Navier-Stokes related concepts.  This last part may sound a little lofty and ambitious, but an approach in which, for example, vector fields of force or velocity need to be described for every particle and position of space, with overcomplicated second and third order partial derivatives, is in itself somewhat ambitious and lofty, and often prohibitive for finding exact solutions; perhaps studying particle accumulations through a method of approximation, rather than individual particles, is the answer.

I want to attempt to describe the roadmap that led me to the concept of a patchix (pasqualian matrix) in the first place; it was in the context of discrete Markov chains.  Specifically, I thought that, as we learn in linear algebra, for a function or transformation  T(\textbf{v}) , with  \textbf{v} an n-vector with  n (finite) entries,  T can be described succinctly by an  n \times n matrix.  Such a matrix then converts  \textbf{v} into another n-dimensional vector, say  \textbf{w} .  This field is very well studied of course: in particular, invertible transformations are very useful, and many matrixes can be used to describe symmetries, so that they underlie Group Theory:

 \textbf{v} \underrightarrow{T} \textbf{w}

Another useful transformation concept resides in  l_2 , the space of sequences whose squared lengths (dot product with itself) converge, which was used, for example, by Heisenberg in quantum mechanics, as I understand it.  For example, the sequence  x_1 + x_2 + \ldots can be transported to another  y_1 + y_2 + \ldots via  T , as by  T(x_1 + x_2 + \ldots) = y_1 + y_2 + \ldots .  Key here then was the fact that  x_1^2 + x_2^2 + \ldots converged, so that  \sqrt{x_1^2 + x_2^2 + \ldots} , the norm, is defined.  Also the dot product  x_1 y_1 + x_2 y_2 + \ldots converges, by the Cauchy-Schwarz inequality (it is bounded in absolute value by the product of the two norms).  Importantly, however, this information points in the direction that a transformation matrix could be created for  T to facilitate computation, with an infinite number of entries, so that indeed a sequence is taken into another in this space in a manner that is easy and convenient.  I think this concept was used by Kolmogorov in extending Markov matrices as well, but I freely admit I am not very versed in mathematical history.  Help in this regard is much appreciated.

In a function space such as  C^{\infty}[0,1] , the inner product of, say,  f(x) with  g(x) is also defined, as  \langle f(x), g(x) \rangle = \int_0^{1} f(x) \cdot g(x) dx : point-wise multiplication of the functions, summed (absolutely convergently) by the integral.  Then the norm of  f(x) is  \sqrt{\langle f(x), f(x) \rangle} = \sqrt{\int_0^{1} f(x)^2 dx} .  The problem is that there is, of course, no convenient "continuous matrix" that results in the transform  T(f(x)) = g(x) , although transforms of a kind can be achieved through a discrete matrix, if its coefficients represent, say, the coefficients of a (finite) polynomial.  Thus, we can transform polynomials into other polynomials, but this is limiting in scope in many ways.

The idea is that we transform a function to another by point-wise reassignment: continuously.  Thus the concept of a patchix (pasqualian matrix) emerges; we need only mimic the mechanical motions we go through when conveniently calculating any other matrix product.  Take a function  f(x) defined continuously on  [0,1] , send  x \rightsquigarrow 1-y so that  f(1-y) is now aligned with the y-axis. From another viewpoint, consider  f(1-y) as  f(1-y,t) so that, at any value of  t , the cross-section looks like  f .  Define a patchix  p(x,y) on  [0,1] \times [0,1] .  Now "multiply" the function (actually a patchix itself from this different viewpoint) with the patchix as  \int_{0}^{1} f(1-y) \cdot p(x,y) dy = g(x) to obtain  g(x) .  The patchix has transformed  f(x) \rightsquigarrow g(x) as we wanted.  I think there are profound implications from this simple observation; one may now consider, for example, inverse patchixes (or how to get  g(x) back to  f(x) ), identity patchixes, and along with these one must consider what it may mean, as crazy as it sounds, to solve an infinite (dense) system of equations; powers of patchixes and what they represent; eigenpatchixvalues and eigenfunctionvectors; group theoretical concepts such as symmetry groups the patchixes may give rise to, etc.
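Here is a tiny symbolic illustration of this "function times patchix" transform (the particular  f and  p are arbitrary examples of mine, chosen only to show the mechanics):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2                      # an arbitrary function on [0,1] (my example)
p = x + y                     # an arbitrary patchix on [0,1] x [0,1] (my example)

g = sp.integrate(f.subs(x, 1 - y) * p, (y, 0, 1))     # send x ~> 1-y, integrate over y
print(sp.expand(g))                                   # x/3 + 1/12: f has been carried to g
```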

As much as that is extremely interesting to me, and I plan on continuing with my own investigations, my previous post and informal paper considered the implications of multiplying functions by functions, functions by patchixes, and patchixes by patchixes.  Actually I considered special kinds of patchixes  p(x,y) , those having the property that for any specific value  y_c \in [0,1] , we have  \int_0^1 p(x,y_c) dx = 1 .  Such special patchixes I dubbed patches (pasqualian special matrixes), and I went on to attempt an extension of a Markov matrix and its concept into a Continuous Markov Patch, along with the logical extension of the Chapman-Kolmogorov equation by first defining patch (discrete) powers (this basically means "patchix multiplying" a patch with itself).  The post can be found here.

So today what I want to do is continue the characterization of patches that I started.  First of all, emulating some properties of the Markov treatment, I want to show how we can multiply a probability distribution (function) "vector" by a patch to obtain another probability distribution function vector. Now this probability distribution is special, in the sense that it doesn't live in all of  \mathbb{R} but in  [0,1] .  A beta distribution, such as  B(2,2) = 6(x)(1-x) , is the type that I'm specifically thinking about. So suppose we have a function  b(x) , which we must convert first to  b(1-y) in preparation to multiply by the patch.  Suppose then the patch is  p(x,y) with the property that, for any specific  y_c , then  \int_0^1 p(x,y_c) dx = 1 .  Now, the "patchix multiplication" is done by

 \int_0^1 b(1-y) \cdot p(x,y) dy

and is a function of  x .  We can show that this is indeed a probability distribution function vector by taking the integral for every infinitesimal change in  x and seeing if it adds up to one, like this:

 \int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dy dx

If there is no issue with absolute convergence of the integrals, there is no issue with the order of integration by the Fubini theorem, so we have:

 \int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dx dy = \int_0^1 b(1-y) \int_0^1 p(x,y) dx dy

Now for the inner integral,  p(x,y) adds up to 1 for any choice of  y , so the whole artifact is in effect a uniform distribution in  [0,1] with value 1 (i.e., for any choice of  y \in [0,1] , the value of the integral is 1).  Thus we have, in effect,

 \int_0^1 b(1-y) \int_0^1 p(x,y) dx dy = \int_0^1 b(1-y) \cdot u(y) dy = \int_0^1 b(1-y) (1) dy

since  u(y) = 1 for any choice of  y in  [0,1] ; and that last integral we know is 1 by hypothesis.

Here's a specific example:  Let's declare  b(x) = 6(x)(1-x) and  p(x,y) = x + \frac{1}{2} .  Of course, as required,  \int_0^1 p(x,y) dx = \int_0^1 x + \frac{1}{2} dx = (\frac{x^2}{2} + \frac{x}{2}) \vert^1_0 = 1 .  So then  b(1-y) = 6(1-y)(y) , and by "patchix multiplication"

 \int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 6(1-y)(y) \cdot \left(x + \frac{1}{2} \right) dy = x + \frac{1}{2}

Thus, via this particular patch, the function  b(x) = 6(x)(1-x) \rightsquigarrow c(x) = x + \frac{1}{2} , point by point.  Which brings me to my next point.

If  p(x,y) is really solely a function of  x , then it follows that  b(x) \rightsquigarrow p(x) : any initial probability distribution becomes the patch's own function distribution (viewed in a single dimension, rather than two).  Here's why:

 \int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 b(1-y) \cdot p(x) dy = p(x) \int_0^1 b(1-y) dy = p(x)

Of course, I think patches that are in fact functions of both  x and  y are a lot more interesting.  There arises a problem in constructing them.  For example, let's assume that we can split  p(x,y) = f(x) + g(y) .  Forcing our requirement that  \int_0^1 p(x,y) dx = 1 for any  y \in [0,1] means:

 \int_0^1 p(x,y) dx = \int_0^1 f(x) dx + g(y) \int_0^1 dx = \int_0^1 f(x) dx + g(y) = 1

which certainly implies that  g(y) = 1 - \int_0^1 f(x) dx is a constant, since the integral is a constant.  Thus it follows that  p(x,y) = p(x) is a function of  x alone.  Then we may try  p(x,y) = f(x) \cdot g(y) .  Forcing our requirement again,

 \int_0^1 p(x,y) dx = \int_0^1 f(x) \cdot g(y) dx = g(y) \int_0^1 f(x) dx = 1

means that  g(y) = \frac{1}{\int_0^1 f(x) dx} , again, a constant, and  p(x,y) = p(x) once more.  Clearly the function interactions should be more complex; let's say something like  p(x,y) = f_1(x) \cdot g_1(y) + f_2(x) \cdot g_2(y) .

 \int_0^1 p(x,y) dx = g_1(y) \int_0^1 f_1(x) dx + g_2(y) \int_0^1 f_2(x) dx = 1

so that determining three of the functions determines the last one, say

 g_2(y) = \frac{1-g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx} is, in fact, a function of  y .

Let's construct a patch in this manner and see its effect on a  B(2,2) .  Let  f_1(x) = x^2 , and  g_1(y) = y^3 , and  f_2(x) = x , so that

 g_2(y) = \frac{1 - g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx} = \frac{1 - y^3 \int_0^1 x^2 dx}{\int_0^1 x dx} = \frac{1 - \frac{y^3}{3}}{\frac{1}{2}} = 2 - \frac{2y^3}{3}

and  p(x,y) = x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) .
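A tiny symbolic sketch of this construction (sympy; it simply re-does the computation above and confirms the result is a patch):

```python
import sympy as sp

x, y = sp.symbols('x y')
f1, g1, f2 = x**2, y**3, x                          # the three chosen functions
g2 = (1 - g1 * sp.integrate(f1, (x, 0, 1))) / sp.integrate(f2, (x, 0, 1))
p = sp.expand(f1 * g1 + f2 * g2)

print(p)                                            # x**2*y**3 + 2*x - 2*x*y**3/3
print(sp.simplify(sp.integrate(p, (x, 0, 1))))      # 1 for every y, so p is a patch
```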

So now the "patchix product" is

 \int_0^1 6(1-y)(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{5} + \frac{28x}{15} which is a probability distribution on the interval  [0,1] and, as a matter of check, we can integrate with respect to  x to obtain 1.  Thus the probability distribution function  6(x)(1-x) is carried, point by point, as  6(x)(1-x) \rightsquigarrow \frac{x^2}{5} + \frac{28x}{15} which, quite frankly, is very amusing to me!

From an analytical point of view, it may be interesting or useful to see what happens to the uniform distribution on  [0,1] when it's "patchix multiplied" by the patch above.  We would have:

 \int_0^1 u(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \int_0^1 (1) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{4} + \frac{11x}{6}

so that  u(x) \rightsquigarrow \frac{x^2}{4} + \frac{11x}{6} .
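Both products are easy to re-check symbolically; here is a small sketch (sympy), using exactly the  B(2,2) distribution and the patch constructed above:

```python
import sympy as sp

x, y = sp.symbols('x y')
b = 6 * x * (1 - x)                                     # the B(2,2) distribution
p = x**2 * y**3 + x * (2 - sp.Rational(2, 3) * y**3)    # the patch constructed above

g = sp.integrate(b.subs(x, 1 - y) * p, (y, 0, 1))       # b(1-y) "patchix multiplied" by p
h = sp.integrate(1 * p, (y, 0, 1))                      # the uniform distribution u(y) = 1

print(sp.expand(g), sp.integrate(g, (x, 0, 1)))         # x**2/5 + 28*x/15, integrates to 1
print(sp.expand(h), sp.integrate(h, (x, 0, 1)))         # x**2/4 + 11*x/6, integrates to 1
```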

In my next post, I want to talk in more detail about "patchix multiplication" not of a probability distribution vector on [0,1] by a patch, but of a patch by a patch, which is the basis of (self) patch powers: with this I want to begin a discussion of how we can map oscillations and movement in a different way, so that perhaps we can trace my cereal milk's movement in time.

On the SIR Equations and Their Transform into a Markov Transition Matrix (or, the Markovization of the SIR Model)

August 6th, 2010 No comments

It's been a while since I've had the time to write here, mostly for personal reasons.  However this period has not been without very exciting "discoveries," at least in my mathematics understanding.

Last year in April, with the Swine flu scare, I attempted, at the very outbreak, to model the progression of the disease directly using the differential equations and data provided by online articles and publications.  I'm not too sure the raw data at the beginning of the scare was refined or accurate enough, but it was definitely enough to estimate some numbers and obtain a rough idea of the magnitude of the problem.  I suggested that, if anyone was interested, we could attempt to use Euler's method to compute a time evolution of the disease based on the coupled differential equations the model suggests.

Having read the excellent SIR explanation from Duke University, which differed slightly from another Calculus book in my possession in that it suggested using the proportions of susceptibles, infecteds, and recovereds as the basis for the calculations (rather than actual raw numbers), I began to think that, indeed, the SIR coupled-differential-equation formulation was suspiciously similar to a Markovian formulation, and that, perhaps, there was an interesting link between the two.  If we think about it this way, a susceptible can (only) transition to being infected or stay a susceptible, an infected can only transition to recovery or stay infected, and, once recovered, one stays recovered.  In other words, S, I, and R are in effect Markovian "states."

Imagine for a moment that indeed we could somehow create an equivalence between a state-transition matrix and the actual differential equations, which, apparently, we can, as I will explain shortly.  I couldn't at this point help but remember the Schrodinger vs. Heisenberg debate, in which Schrodinger's differential-equation formulation of quantum mechanics (which I know nothing about, by the way) was shown equivalent to Heisenberg's discrete-matrix approach (matrix mechanics).  At this point I began to wonder: can a system described by continuous differential equations (always) be transformed into matrix form?  I'm not sure the answer is always "yes," but I think I have succeeded in transforming the SIR equations into matrix form, through Markovian reasoning.  Not that this is the way matrix mechanics and wave mechanics were shown to be equivalent: that is definitely not my claim, nor do I know how they were shown equivalent.

I started out reasoning that perhaps I could model the dynamics of susceptibles, infecteds, and recovereds by building a transition matrix as follows:

 A = \left[ \begin{array}{ccc} (1-a) & a & 0 \\ 0 & (1-b) & b \\ 0 & 0 & 1 \end{array} \right]

where  a represents the probability (or, from another POV, the proportion) of susceptibles becoming infecteds, and  b represents the probability (proportion) of infecteds becoming recovereds.  From the Markovian point-of-view, the entry  A_{1,1} is the probability/proportion of susceptibles who stay susceptibles (at the next step),  A_{1,2} the proportion of susceptibles who become infecteds (at the next step), and, naturally,  A_{1,3} the proportion of susceptibles who, at the next step, become recovereds: zero.  Then  A_{2,1} is the proportion of infecteds who become susceptibles (zero),  A_{2,2} is the proportion of infecteds that stay infected in the next time period, and  A_{2,3} is the proportion of  infecteds recovering.  Finally,  A_{3,1} and  A_{3,2} represent the proportion of recovereds that become susceptible or infected (zero) respectively, and  A_{3,3} represents the proportion of recovereds who stay recovered.  In other words, the transition matrix represents a birth process of susceptibles into infecteds and infecteds into recovereds.

Following the typical Markov chain understanding, let the vector

 \textbf{p(0)} = \left[ \begin{array}{ccc} p_s(0) & p_i(0) & p_r(0) \end{array} \right]

represent the initial proportions of susceptibles, infecteds, and recovereds.  Then the one-step forward proportion vector,  \textbf{p(1)} , is given by

 \textbf{p(1)} = \textbf{p(0)} \cdot A

or, following the argument through,

 \textbf{p(t+1)} = \textbf{p(t)} \cdot A

where we necessarily assume that  t \in \mathbb{Z_+} \cup \{0\} , or that time is discrete, which means we can only know or calculate the proportion vector at discrete time intervals.  Such is not necessarily what we want (we may want to know the proportion vector at times on the real continuum), but we can resolve this objection presently, by taking a hint from Queueing Theory.

At any rate, by writing the equations explicitly, we have:

 p_s(t+1) = p_s(t) \cdot (1-a) = p_s(t) - a \cdot p_s(t)

 p_i(t+1) = a \cdot p_s(t) + (1-b) \cdot p_i(t) = a \cdot p_s(t) + p_i(t) - b \cdot p_i(t)

 p_r(t+1) = b \cdot p_i(t) + p_r(t)

Here's the amazing thing: we can write these as difference equations that are VERY much like the SIR differential equations (a discretized version, I guess).  This would have been a first step toward obtaining the steady state of the transition matrix (assuming the transition matrix has one: regularity, and other conditions).

 p_s(t+1) - p_s(t) = - a \cdot p_s(t)

 p_i(t+1) - p_i(t) = a \cdot p_s(t) - b \cdot p_i(t)

 p_r(t+1) - p_r(t) = b \cdot p_i(t)

Notice then the similarity (from Duke's website, note that the SIR differential equations are formulated in terms of proportions, and from my previous post, take note of the notation so that the analogy is clearer):

 \frac{ds}{dt} = -a s(t) i(t)

 \frac{di}{dt} = a s(t) i(t) - b i(t)

 \frac{dr}{dt} = b i(t)

They are not exactly the same, of course: one is continuous, the other is not; and the rate of change of susceptibles in the SIR equations also depends on the proportion of infecteds at the time, which in turn affects the rate of change of the infecteds.

Before taking on these objections, setting the difference equations equal to zero, in the usual Markov-chain context, gives the steady-state probability or proportion vector.  Thus, in the very long-run, it must be true that

 0 = - a \cdot p_s(\infty) and, with  a nonzero,  p_s(\infty) = 0

 0 = a \cdot p_s(\infty) - b \cdot p_i(\infty) \Rightarrow p_i(\infty) = 0 and

 0 = b \cdot p_i(\infty) \Rightarrow p_i(\infty) = 0 . This last bit is redundant, but the proportion  p_r(\infty) must be one (because the sum of all proportions must be one).

Now let's take on the objections: the SIR differential equations suggest that how susceptibles change in time is proportional to both the susceptibles and the infecteds at the time; this furthermore affects the rate of change of the infecteds.  Taking a cue, we can manipulate the "transition matrix" to incorporate this term.  Thus:

 B = \left[ \begin{array}{ccc} (1 - a \cdot p_i(t)) & a \cdot p_i(t) & 0 \\ 0 & (1-b) & b \\ 0 & 0 & 1 \end{array} \right]

Notice that every row sums to 1 (provided appropriate conditions on  a, b ), as it should if this is a Markov chain transition matrix! This strongly suggests that we can follow a Markov chain treatment, and everything Markovian applies (particularly steady-state proportions). And so, the next-step equation is given by the usual Markov context as:

 \textbf{p(t+1)} = \textbf{p(t)} \cdot B

or, explicitly, as

 p_s(t+1) = p_s(t) \cdot (1 - a \cdot p_i(t)) = p_s(t) - a \cdot p_i(t) \cdot p_s(t)

 p_i(t+1) = a \cdot p_s(t) \cdot p_i(t) + (1-b) \cdot p_i(t) = a \cdot p_s(t) \cdot p_i(t) + p_i(t) - b \cdot p_i(t)

 p_r(t+1) = b \cdot p_i(t) + p_r(t)

with difference equations (now analogous to SIR differential equations)

 p_s(t+1) - p_s(t) = - a \cdot p_i(t) \cdot p_s(t)

 p_i(t+1) - p_i(t) = a \cdot p_s(t) \cdot p_i(t) - b \cdot p_i(t)

 p_r(t+1) - p_r(t) = b \cdot p_i(t)

and steady state proportions

 0 = - a \cdot p_i(\infty) \cdot p_s(\infty)

 0 = a \cdot p_s(\infty) \cdot p_i(\infty) - b \cdot p_i(\infty)

 0 = b \cdot p_i(\infty)

From the third equation it follows that  p_i(\infty) = 0 , which now implies that both  p_s(\infty) and  p_r(\infty) can be anything at steady-state (as long, of course, as  p_s(\infty) + p_r(\infty) = 1 ).

Examining the Duke University link to a SIR progression graph, this vector tendency can be seen pretty clearly at large  t . Notice, for example, how  i(t) goes to zero while both  s(t), r(t) settle to values that are otherwise unconstrained.  We have, in effect, proved that under the SIR model the proportion of infecteds is asymptotic to 0. (Neat-o).

On to the continuity question: using the difference equations we are in effect assuming that the state vector  \textbf{p(t)} is known at discrete times.  In Queueing Theory, when discussing birth-death processes, it is apparently conventional (in the standard treatment) to think of the probability in a given unit time interval as accumulating uniformly.  Thus, we can rewrite the transition matrix as:

 C = \left[ \begin{array}{ccc} (1 - a \cdot p_i(t) \cdot \Delta t) & a \cdot p_i(t) \cdot \Delta t & 0 \\ 0 & (1-b \cdot \Delta t) & b \cdot \Delta t \\ 0 & 0 & 1 \end{array} \right]

We want to write out the explicit equations of

 \textbf{p(t+} \Delta t \textbf{)} = \textbf{p(t)} \cdot C

as

 p_s(t + \Delta t) = p_s(t) - a \cdot p_i(t) \cdot p_s(t) \cdot \Delta t

 p_i(t + \Delta t) = a \cdot p_s(t) \cdot p_i(t) \cdot \Delta t + p_i(t) - b \cdot p_i(t) \cdot \Delta t

 p_r(t + \Delta t) = b \cdot p_i(t) \cdot \Delta t + p_r(t)

By further algebraic manipulation,

 \frac{p_s(t + \Delta t) - p_s(t)}{\Delta t} = - a \cdot p_i(t) \cdot p_s(t)

 \frac{p_i(t + \Delta t) - p_i(t)}{\Delta t} = a \cdot p_s(t) \cdot p_i(t) - b \cdot p_i(t)

 \frac{p_r(t + \Delta t) - p_r(t)}{\Delta t} = b \cdot p_i(t)

Taking the limit as  \Delta t \rightarrow 0 , the difference quotient describes the very definition of the derivative, and we obtain the continuous SIR differential equations.  We have assumed, in effect, the vector  p(t) is known at all times  t \in \mathbb{R}^+ \cup \{0\} .

Perhaps an obvious thing to note, yet worth mentioning now that we have shown the equivalence between the SIR model and the Markov transition matrix, is that in the SIR model the proportionality constant  a is now shown to be related to the probability of transitioning from the susceptible state to the infected state, and the proportionality constant  b is now shown to be the probability of transitioning from the infected state to the recovered state.  This, in my opinion, is an amazing insight!

Before, since the SIR differential equations could not be solved explicitly, we would rely on Euler's method to get a numerical approximation of the solution curves  s(t), i(t), r(t) .  The present analysis suggests we need only calculate powers of matrix  C to obtain a numerical sample of all three solution curves concurrently (spaced as  \Delta t ), since, in the Markov chain context, the power of the matrix represents the  n \cdot \Delta t -step forward evolution of the initial (or current) state vector (in the context of a time-dependent matrix  C(t) , I actually mean by "power" consecutive matrix multiplications of  C , as by

 p(t + n \Delta t) = p(t) \cdot C(t) \cdot C(t + \Delta t) \cdot C(t + 2 \Delta t) \cdot \ldots \cdot C(t + (n-1) \Delta t) ,

not actual powers of  C(t) ).  Higher-rate samplings (higher-resolution solution curves) can be achieved by reducing  \Delta t .  It may be worth exploring the link between Euler's method and this sampling-method (see my next post), in terms of computational requirements, or in terms of a mathematical equivalence.
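Since the difference equations above are exactly an Euler discretization, stepping the proportion vector with  C(t) and running Euler's method on the SIR equations give the same numbers.  Here is a small numerical sketch of that equivalence (the parameter values  a ,  b , the time step, and the initial proportions are hypothetical choices of mine):

```python
import numpy as np

a, b, dt = 0.8, 0.2, 0.01                  # hypothetical rates and time step
p = np.array([0.99, 0.01, 0.0])            # proportions [susceptible, infected, recovered]
q = p.copy()                               # a second copy, stepped with plain Euler

def C(p_i):
    # the time-dependent transition matrix with infection probability a * p_i * dt
    return np.array([[1 - a * p_i * dt, a * p_i * dt, 0.0],
                     [0.0,              1 - b * dt,   b * dt],
                     [0.0,              0.0,          1.0]])

for _ in range(int(60 / dt)):              # evolve for 60 time units
    p = p @ C(p[1])                        # Markov step: p(t + dt) = p(t) . C(t)
    s, i, r = q
    q = q + dt * np.array([-a * s * i, a * s * i - b * i, b * i])   # Euler step on SIR

print(p)                                   # infecteds have decayed essentially to zero
print(np.allclose(p, q))                   # True: the two schemes agree (up to rounding)
```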

Needless to say, this method can probably be extended to "Markovizate" (Markovize?) other differential equations, such as SIRD or Lotka-Volterra.  I can only speculate as to the limits of this process, but I am certainly cognizant of two things: several, if not all, regular Markov chains can be formulated as continuous differential equations, and several, if not all, continuous differential equations can be formulated as Markov transition matrices.  I'd love to explore more of these, and probably will in the future.