Archive

Posts Tagged ‘Markov chain’

On Eigen(patch(ix))values, II - (RWLA,MCT,GT,AM Part IX)

March 22nd, 2011

So remember my little conjecture from last time, that the number of patch(ix) (kernel) eigenvalues would depend on the number of x terms that composed it?  I started working it out by writing out all the expressions and trying to substitute them, but I got sums of sums of sums and it became nightmarish; since math is supposed to be elegant, I opted for a different track.  A little proposition did result, but I'm not sure yet if it means what I want it to mean. Haha.

If you recall, last time we figured that

 B_1 = \frac{B_2 \sum_{i=0}^\infty f_2^i(1-y)G_1^{i+1}(y)\vert_0^1}{\lambda - \sum_{i=0}^\infty f_1^i(1-y)G_1^{i+1}(y)\vert_0^1}

and

 B_2 = \frac{B_1 \sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y)\vert_0^1}{\lambda - \sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y)\vert_0^1}

Let's rename the sums by indexing over the subscripts, so that

 \begin{array}{ccc} C_{1,1} & = &\sum_{i=0}^\infty f_1^i(1-y)G_1^{i+1}(y)\vert_0^1 \\ C_{1,2} & = &\sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y)\vert_0^1 \\ C_{2,1} & = &\sum_{i=0}^\infty f_2^i(1-y)G_1^{i+1}(y)\vert_0^1 \\ C_{2,2} & = &\sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y)\vert_0^1 \end{array}

Renaming therefore the constants we get:

 B_1 = \frac{B_2 C_{2,1}}{\lambda - C_{1,1}}

and

 B_2 = \frac{B_1 C_{1,2}}{\lambda - C_{2,2}}

Last time we substituted one equation into the other to figure out additional restrictions on  \lambda .  A faster way to do this is to notice:

 \left( \lambda - C_{1,1} \right)B_1 = B_2 C_{2,1}

and

 \left( \lambda - C_{2,2} \right) B_2 = B_1 C_{1,2}

If we multiply these two expressions we get

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{2,2} \right) B_1 B_2 = B_1 B_2 C_{1,2} C_{2,1}

Finally, dividing out both  B_1, B_2 we arrive at the quadratic expression in  \lambda from before:

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{2,2} \right) = C_{1,2} C_{2,1}

Now.  Let's posit that, instead of  a(x) = B_1 f_1(x) + B_2 f_2(x) we have  a^*(x) = B_1 f_1(x) + B_3 f_3(x) .  Then by all the same arguments we should have an expression for  B_1 that is the same, and an expression for  B_3 that is:

 \left( \lambda - C_{3,3} \right) B_3 = B_1 C_{1,3}

with the similar implication that

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{3,3} \right) = C_{1,3} C_{3,1}

An  a^{**}(x) = B_2 f_2(x) + B_3 f_3(x) would give the implication

 \left( \lambda - C_{2,2} \right)\left( \lambda - C_{3,3} \right) = C_{2,3} C_{3,2}

If we are to multiply all similar expressions, we get

 \left( \lambda - C_{1,1} \right)^2\left( \lambda - C_{2,2} \right)^2 \left( \lambda - C_{3,3} \right)^2 = C_{1,2} C_{2,1}C_{1,3} C_{3,1}C_{2,3} C_{3,2}

or

 \left( \lambda - C_{1,1} \right) \left( \lambda - C_{2,2} \right) \left( \lambda - C_{3,3} \right) = \sqrt{C_{1,2} C_{2,1}C_{1,3} C_{3,1}C_{2,3} C_{3,2}}

In other words, we want to make a pairwise argument to obtain the product of the  \lambda -expressions as a polynomial in  \lambda .  Next I'd like to show the proposition:

 \left( \lambda - C_{1,1}\right) \cdot \left( \lambda - C_{2,2} \right) \cdot \ldots \cdot \left( \lambda - C_{n,n} \right) = \sqrt[n-1]{\prod_{\forall i, \forall j, i \neq j}^n C_{i,j}}

and for this I want to begin with a combinatorial argument.  On the left hand side, the number of pairwise comparisons we can make depends on the number of  \lambda factors of the  \lambda polynomial (or, the highest degree of the  \lambda polynomial).  That is to say, we can make  \binom{n}{2} pairwise comparisons, or  \frac{n!}{(n-2)!2!} = \frac{n (n-1)}{2} comparisons.  Now, I don't know whether anyone has ever noticed this, but this last simplified part looks exceptionally like Gauss's sum of consecutive integers (the triangular numbers), so in other words, this last part is in effect  \sum_{i=1}^{n-1} i which I find very cool, because we have just shown, quite accidentally, the equivalence:

 \binom{n}{2} = \binom{n}{n-2} = \sum_{i=1}^{n-1} i

The way I actually figured this out is by noticing that, in our pairwise comparisons, say for the 3rd-degree-polynomial-in- \lambda case, by writing the pairwise comparisons first of the  (\lambda - C_{1,1}) products, then of the  (\lambda - C_{2,2}) (in other words, ordering logically all  \binom{3}{2} products), there were 2 of the first and 1 of the second (and none of the  (\lambda - C_{3,3}) ).  If we do the same for the 4th-degree, there are 3 of the  (\lambda - C_{1,1}) , 2 of the  (\lambda - C_{2,2}) , and 1 of the  (\lambda - C_{3,3}) , with none of the  (\lambda - C_{4,4}) .  In other words, the  \binom{4}{2} pair-products could be written as the sum of the cardinality of the groupings:  3 + 2 + 1 .

Now Gauss's sum of integers formula is already known to work in the general case (just use an inductive proof, e.g.), so the substitution of it into the binomial equivalence needs no further elaboration: it generalizes automatically for all  n .

So if we are to multiply all pairwise comparisons, notice there will be  n - 1 products of each  \lambda -factor: there are  n - 1 products belonging to the  (\lambda - C_{1,1}) grouping (because this first grouping has n-1 entries, from the Gauss formula equivalence), there are  n - 2 products belonging to the  (\lambda - C_{2,2}) PLUS the one already counted in the  (\lambda - C_{1,1}) grouping, for a total of, again,  n - 1 .  The  kth grouping  (\lambda - C_{k,k}) has  n - k products listed for itself PLUS one for each of the previous k - 1 groupings, for a total of  n - k + k - 1 = n - 1, and the k+1th grouping  (\lambda - C_{k+1, k+1}) has  n - (k+1) products listed for itself PLUS one for each of the previous  k groupings, for a total of n - (k+1) + k = n - 1.  We are left in effect with:

 (\lambda - C_{1,1})^{n-1} \cdot(\lambda - C_{2,2})^{n-1} \cdot \ldots \cdot (\lambda - C_{n,n})^{n-1}

The right hand side of each pairwise comparison was nothing more than the simple product on the cross indexes of  C , so it's not difficult to argue then that, if we multiply  \binom{n}{2} such pairs, we get  \prod_{\forall i, \forall j, i \neq j}^n C_{i,j} .   We then take the  n-1 th root on both sides of the equation.

Since the  n + 1 case follows the same basic structure of the argument, we are done with proving our proposition.
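As a quick mechanical sanity check of the counting argument, here is a small Python sketch (standard library only): for each  n , the number of pairwise comparisons is  \frac{n(n-1)}{2} = 1 + 2 + \ldots + (n-1) , and each factor  (\lambda - C_{i,i}) shows up in exactly  n-1 of the pairs.

```python
from itertools import combinations

def check_counts(n):
    pairs = list(combinations(range(1, n + 1), 2))      # all pairwise comparisons
    # number of pairs = n(n-1)/2 = 1 + 2 + ... + (n-1)
    assert len(pairs) == n * (n - 1) // 2 == sum(range(1, n))
    # each index i, i.e. each factor (lambda - C_{i,i}), appears in exactly n-1 pairs
    for i in range(1, n + 1):
        assert sum(1 for pair in pairs if i in pair) == n - 1

for n in range(2, 10):
    check_counts(n)
print("counting argument verified for n = 2, ..., 9")
```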

What I want this proposition to mean may be very different from what it actually means; I'm hopeful nevertheless, though I agree that it requires a bit of further investigation.  As I hinted before, I would like that

 \left( \lambda - C_{1,1}\right) \cdot \left( \lambda - C_{2,2} \right) \cdot \ldots \cdot \left( \lambda - C_{n,n} \right) = \sqrt[n-1]{\prod_{\forall i, \forall j, i \neq j}^n C_{i,j}}

with, for example,  n = 3 represent the constraint on the eigen(patch(ix))values of  a^\circ = B_1 f_1(x) + B_2 f_2(x) + B_3 f_3(x) or, if not that, maybe  a^\circ_\star = a(x) + a^*(x) + a^{**}(x) = 2B_1 f_1(x) + 2 B_2 f_2(x) + 2 B_3 f_3(x) , which brings into question the superposition of functions and their effect on the eigenvalues.  I may be wildly speculating, but hey!  I don't really know better!  I'll do a few experiments and see what shows.

On Eigen(patch(ix))values - (RWLA,MCT,GT,AM Part VIII)

March 16th, 2011

So in the continuation of this series, I have been thinking long and hard about the curious property of the existence of eigen(patch(ix))values that I have talked about in a previous post.  I began to question whether such eigen(patch(ix))values are limited to a finite set (much as in finite matrices) or whether there was some other fundamental insight, like, if 1 is an eigen(patch(ix))value, then all elements of  \mathbb{R} are too (or all of  \mathbb{R} minus a finite set).  In my latest attempt to understand this, the question comes down to, using the "star" operator, whether

 a(x) \star p(x,y) = \lambda a(x)

has discrete values of  \lambda or, "what values can lambda take for the equation to be true," in direct analogy with eigenvalues when we're dealing with discrete matrices.  I am not yet using "integral transform notation" because this development seemed more intuitive to me, and thus I'm also limiting the treatment to "surfaces" that are smooth and defined on  [0,1] \times [0,1] , as I first thought of them. Thus, the above equation translates to:

 \int_0^1 a(1-y) p(x,y) dy = \lambda a(x)

and, if we recall our construction of the patch (or patchix, if we relax the assumption that the integral with respect to x is 1)  p(x,y) = f_1(x) g_1(y) + f_2(x) g_2(y) :

 \begin{array}{ccc} \lambda a(x) & = &\int_0^1 a(1-y) \left(f_1(x) g_1(y) + f_2(x) g_2(y) \right) dy \\ & = & f_1(x) \int_0^1 a(1-y) g_1(y) dy + f_2(x) \int_0^1 a(1-y) g_2(y) dy \\ & = & B_1 f_1(x) + B_2 f_2(x) \end{array}

where  B_1, B_2 are constants.  It is very tempting to divide by  \lambda , so that

 a(x) = \frac{B_1}{\lambda} f_1(x) + \frac{B_2}{\lambda} f_2(x)

which must hold provided  \lambda \neq 0 .  So we have excluded an eigen(patch(ix))value right from the start, which is interesting.

We can systematically write the derivatives of  a(x) , as we're going to need them if we follow the algorithm I delineated in one of my previous posts (NB: we assume  a(x) has finitely many nonvanishing derivatives, or periodic ones, or infinitely many derivatives such that the subsequent sums we'll write are convergent):

 \begin{array}{ccc} a(x) & = & \frac{B_1}{\lambda} f_1(x) + \frac{B_2}{\lambda} f_2(x) \\ a'(x) & = & \frac{B_1}{\lambda} f'_1(x) + \frac{B_2}{\lambda} f'_2(x) \\ a''(x) & = & \frac{B_1}{\lambda} f''_1(x) + \frac{B_2}{\lambda} f''_2(x) \\ \vdots & \vdots & \vdots \\ a^k(x) & = & \frac{B_1}{\lambda} f^k_1(x) + \frac{B_2}{\lambda} f^k_2(x) \\ \vdots & \vdots & \vdots \end{array}

provided, as before,  \lambda \neq 0 .  We want to calculate the constants  B_1, B_2 , to see if they are restricted in some way by a formula, and we do this by integrating by parts as we did in a previous post to obtain the cool "pasquali series." Thus, we have that if  B_1 = \int_0^1 a(1-y) g_1(y) dy , the tabular method gives:

 \begin{array}{ccccc} \vert & Derivatives & \vert & Integrals & \vert \\ \vert & a(1-y) & \vert & g_1(y) & \vert \\ \vert & -a'(1-y) & \vert & G_1^1(y) & \vert \\ \vert & a''(1-y) & \vert & G_1^2(y) & \vert \\ \vert & \vdots & \vert & \vdots & \vert \end{array}

and so,

 \begin{array}{ccc} B_1 & = & \int_0^1 a(1-y) g_1(y) dy \\ & = & a(1-y) G_1^1(y) \vert_0^1 + a'(1-y) G_1^2(y) \vert_0^1 + \ldots \\ & = & \sum_{i = 0}^\infty a^i(1-y) G_1^{i + 1} \vert_0^1 \end{array}

if we remember the alternating sign of the multiplications, and we are allowed some leeway in notation.  Ultimately, this last bit means:  \sum_{i=0}^\infty a^i(0) G_1^{i+1}(1) - \sum_{i=0}^\infty a^i(1) G_1^{i+1}(0) .

Since we have already explicitly written the derivatives of  a(x) , the  a^i(0), a^i(1) derivatives can be written as  \frac{B_1}{\lambda} f_1^i(0) + \frac{B_2}{\lambda} f_2^i(0) and  \frac{B_1}{\lambda} f_1^i(1) + \frac{B_2}{\lambda} f_2^i(1) respectively.

We have then:

 B_1 = \sum_{i=0}^\infty \left( \frac{B_1}{\lambda} f_1^i(0) + \frac{B_2}{\lambda} f_2^i(0) \right) G_1^{i+1}(1) - \sum_{i=0}^\infty \left( \frac{B_1}{\lambda} f_1^i(1) + \frac{B_2}{\lambda} f_2^i(1) \right) G_1^{i+1}(0)

Since we aim to solve for  B_1 , multiplying by  \lambda makes things easier, and also we must rearrange all elements with  B_1 in them, so we get:

 \lambda B_1 = B_1 \sum_{i=0}^\infty \left( f_1^i(0) G_1^{i+1}(1) - f_1^i(1) G_1^{i+1}(0) \right) + B_2 \sum_{i=0}^\infty \left( f_2^i(0) G_1^{i+1}(1) - f_2^i(1) G_1^{i+1}(0) \right)

Subtracting the common term from both sides and factoring out the constant we endeavor to solve for, we get:

 \left( \lambda - \sum_{i=0}^\infty \left( f_1^i(0) G_1^{i+1}(1) - f_1^i(1) G_1^{i+1}(0) \right) \right) B_1 = B_2 \sum_{i=0}^\infty \left(f_2^i(0) G_1^{i+1}(1) - f_2^i(1) G_1^{i+1}(0) \right)

or

 B_1 = \frac{B_2 \sum_{i=0}^\infty f_2^i(1-y) G_1^{i+1}(y) \vert_0^1}{\lambda - \sum_{i=0}^\infty f_1^i(1-y) G_1^{i+1}(y) \vert_0^1} = \frac{B_2 D}{\lambda - C}

A similar argument for  B_2 suggests

 B_2 = \frac{B_1 \sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y) \vert_0^1}{\lambda - \sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y) \vert_0^1} = \frac{B_1 E}{\lambda - F}

where the new constants introduced emphasize the expectation that the sums converge.  Plugging one into the other we get:

 B_1 = \frac{\left( \frac{B_1 E}{\lambda - F} \right) D}{\lambda - C} = \frac{B_1 E D}{(\lambda - F) (\lambda - C)}

and now we seem to have additional restrictions on lambda:  \lambda \neq F and  \lambda \neq C .  Furthermore, the constant  B_1 drops out of the equation, suggesting these constants can be anything we can imagine (all of  \mathbb{R} without restriction), but then we have the constraint:

 (\lambda - F)(\lambda - C) = ED

which is extraordinarily similar to its analogue in finite matrix or linear algebra contexts.  Expanding suggests:

 \lambda^2 - (F + C) \lambda + (CF - ED) = 0

which we can solve with the quadratic formula of course, as:

 \lambda_{1,2} = \frac{(F + C) \pm \sqrt{(F-C)^2 + 4ED} }{2}

So not only is  \lambda not equal to a few values, it is incredibly restricted to two of them.
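As a numerical sanity check of this two-eigenvalue claim, here is a small Python sketch (assuming numpy and midpoint-rule quadrature; the concrete patch  p(x,y) = 2x - \frac{2 x y^3}{3} + x^2 y^3 is the one from Part VI below, split as  f_1(x) = x ,  g_1(y) = 2 - \frac{2 y^3}{3} ,  f_2(x) = x^2 ,  g_2(y) = y^3 ).  Since the summed series  C_{k,j} is just the integration-by-parts expansion of  \int_0^1 f_k(1-y) g_j(y) dy , the sketch evaluates those integrals directly and compares the two resulting roots against the nonzero eigenvalues of a discretized version of the  \star operator:

```python
import numpy as np

# A sketch, assuming numpy and midpoint quadrature; the patch and its split into
# f1, g1, f2, g2 are as described in the lead-in above.
f1 = lambda x: x
f2 = lambda x: x**2
g1 = lambda y: 2 - (2.0 / 3.0) * y**3
g2 = lambda y: y**3
p  = lambda x, y: f1(x) * g1(y) + f2(x) * g2(y)

N = 500
h = 1.0 / N
t = (np.arange(N) + 0.5) * h                       # midpoint grid on [0,1]

# Predicted eigen(patch(ix))values: eigenvalues of [[C11, C21], [C12, C22]],
# i.e. the roots of (lambda - C11)(lambda - C22) = C12 * C21.
C = np.array([[np.sum(f1(1 - t) * g1(t)) * h, np.sum(f2(1 - t) * g1(t)) * h],
              [np.sum(f1(1 - t) * g2(t)) * h, np.sum(f2(1 - t) * g2(t)) * h]])
print("predicted:", np.linalg.eigvals(C))          # roughly 1 and -1/60

# Direct discretization of a(x) star p(x,y) = integral_0^1 a(1-y) p(x,y) dy:
# flipping the columns implements the a(1-y) evaluation on the midpoint grid.
X, Y = np.meshgrid(t, t, indexing="ij")
K = p(X, Y)[:, ::-1] * h
eig = np.linalg.eigvals(K)
print("discretized:", eig[np.abs(eig) > 1e-6].real)  # only two nonzero eigenvalues
```

The two values come out as approximately 1 and  -\frac{1}{60} ; the eigenvalue 1 corresponds to the stationary distribution found for this same patch in Part VI below.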

So here's a sort of conjecture, and a plan for the proof.  The number of allowable values of  \lambda is equal to the number of x terms  a(x) (or  p(x,y) ) carries.  We have already shown the base case; we need only show the induction step, that if it works for  k terms it works for  k+1 .

On Patch(ix)es as Kernels of Integral Transforms (RWLA,MCT,GT,AM Part VII)

February 7th, 2011

[This post is ongoing, as I think of a few things I will write them down too]

So just a couple of days ago I was asked by a student to give a class on DEs using Laplace transforms, and it was in my research that I realized that what I've been describing, converting a probability distribution on [0,1] to another, is in effect a transform (minus the transform pair, which it was unclear to me how to obtain, and which corresponds perhaps to inverting the patch(ix)).  The general form of integral transforms is, according to my book Advanced Engineering Mathematics, 2nd ed., by Michael Greenberg, p. 247:

 F(s) = \int_a^b f(t) K(t,s) dt , where  K(t,s) is called the kernel of the transform.  This looks an awful lot like a function by patch(ix) "multiplication," which I described as:

 b(x) = \int_0^1 a(1-y) p(x,y) dy you may recall.  In the former context  p(x,y) looks like a kernel, but here  a(1-y) is a function of  y rather than of  x , and I integrate across  y .  To rewrite patch(ix)-multiplication as an integral transform, it would seem we need to rethink the patch position on the xy plane, but it seems easy to do (and we do in number 1 below!).

In this post I want to (eventually be able to):

1. Formally rewrite my function-by-patch(ix) multiplication as a "Pasquali" integral transform.

If we are to modify patch multiplication to match the integral transform guideline, simply think of  p(t,s) as oriented a bit differently, yielding the fact that  \int_0^1 p(t,s) ds = 1 for any choice of  t .  Then, for a probability distribution  b(t) in [0,1], the integral transform is  B(s) = \int_0^1 b(t) p(t,s) dt .  Now  p(t,s) is indeed then a kernel.
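For a concrete illustration, here is a minimal numerical sketch (assuming numpy and midpoint quadrature); the kernel  p(t,s) = 1 + (2t-1)(2s-1) and the input distribution  b(t) = 2t are hypothetical choices, picked only so that  p is nonnegative and  \int_0^1 p(t,s) ds = 1 for every  t :

```python
import numpy as np

# Hypothetical kernel and input distribution, chosen only to satisfy the
# normalization integral_0^1 p(t,s) ds = 1 for every t.
p = lambda t, s: 1 + (2 * t - 1) * (2 * s - 1)
b = lambda t: 2 * t                                   # a probability distribution on [0,1]

N = 1000
h = 1.0 / N
grid = (np.arange(N) + 0.5) * h                       # midpoint grid on [0,1]

# The "Pasquali" transform B(s) = integral_0^1 b(t) p(t,s) dt, sampled on the grid.
B = np.array([np.sum(b(grid) * p(grid, s)) * h for s in grid])

print(np.sum(B) * h)                                  # ~1: the output is again a distribution
print(np.allclose(B, 1 + (2 * grid - 1) / 3, atol=1e-3))  # matches the hand-computed closed form
```

By hand,  B(s) = 1 + \frac{2s-1}{3} here, and since the kernel's  s -integral is 1 for every  t , the transform of a distribution is again a distribution.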

2. Extend a function-by-patch multiplication to probability distributions and patches on all  \mathbb{R} and  \mathbb{R}^2 , respectively.

When I began thinking about probability distributions, I restricted them to the interval [0,1] and a patch on  [0,1] \times [0,1] , to try to obtain a strict analogy of (continuous) matrices with discrete matrices.  I had been thinking for a while that this need not be the case, but when I glanced at the discussion of integral transforms on my Greenberg book, and particularly the one on the Laplace transform, I realized I could have done it right away.  Thus, we can redefine patch multiplication as

 B(s) = \int_{-\infty}^{\infty} b(t) p(t,s) dt

with

 \int_{-\infty}^{\infty} p(t,s) ds = 1

3. Explore the possibility of an inverse-patch via studying inverse-transforms.

3a. Write the patch-inverse-patch relation as a transform pair.

4. Take a hint from the Laplace and Fourier transforms to see what new insights can be obtained on patch(ix)es (or vice-versa).

Vice-versa: Well one of the things we realize first and foremost, is that integral transforms are really an extension of the concept of matrix multiplication: if we create a matrix "surface" and multiply it by a "function" vector we obtain another "function," and the kernel (truly a continuous matrix) is exactly our path connecting the two.  Can we not think now of discrete matrices (finite, infinite) as "samplings" of such surfaces?  I think so.  We can also combine kernels with kernels (as I have done in previous posts) much as we can combine matrices with matrices.  I haven't really seen a discussion exploring this in books, which is perhaps a bit surprising.  At any rate, recasting this "combination" shouldn't be much of a problem, and the theorems I proved in previous posts should still hold, because the new notation represents rigid motions of the kernel, yielding new kernel spaces that are isomorphic to the original.

On Patch Stationariness (RWLA,MCT,GT,AM Part VI)

January 16th, 2011

In my previous posts, I have been discussing how we can extend functional analysis a little bit by "inventing" continuous matrices (surfaces) which contain all the information we may want on how to transform, in a special case, probability distributions from one to another, and I have tried, by reason of analogy, to extend Markov theory as well.  In this special case, I have been talking about how a surface "continuous collection of distributions" can reach steady-state: by self-combining these surfaces over and over; I even showed how to obtain a couple of steady-states empirically by calculating patch powers specifically and then attempting to infer the time evolution, quite successfully in one case. The usual Markov treatment suggests another way to obtain the steady-state (the limiting transition probability matrix), by finding a stationary distribution  \mathbf{\widehat p} so that multiplying the transition probability matrix  P on the left by the row vector  \mathbf{\widehat p} gives back  \mathbf{\widehat p} .  Within the discrete transition probability matrix context, a vector  \mathbf{\widehat p} with this property is also a (left) eigenvector of  P with eigenvalue 1.  See for example Schaum's series Probability, Random Variables, and Random Processes p. 169, as well as Laurie Snell's chapter 11 on Markov Chains in his online Probability book. An important theorem says that the limiting transition probability matrix  \lim_{n \rightarrow \infty} P^n = \mathbf{\widehat P} is a matrix whose rows are identical and equal to the stationary distribution  \mathbf{\widehat p} .  To calculate the stationary distribution (and the limiting transition probability matrix) one would usually solve a system of equations. For example, if:

 P = \left[ \begin{array}{cc} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{array} \right]

the stationary distribution

 \mathbf{\widehat p} P = \mathbf{\widehat p}

looks explicitly like:

 \left[ \begin{array}{cc} p_1 & p_2 \end{array} \right] \left[ \begin{array}{cc} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{array} \right] = \left[ \begin{array}{cc} p_1 & p_2 \end{array} \right]

in other words, the system:

 \frac{3}{4} p_1 + \frac{1}{2} p_2 = p_1

 \frac{1}{4} p_1 + \frac{1}{2} p_2 = p_2

each of which gives  p_1 = 2 p_2 and is solvable if we notice that  p_1 + p_2 = 1 , yielding  \mathbf{\widehat p} = \left[ \begin{array}{cc} \frac{2}{3} & \frac{1}{3} \end{array} \right] , and

 \mathbf{\widehat P} = \left[ \begin{array}{c} \mathbf{\widehat p} \\ \mathbf{\widehat p} \end{array} \right]
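Here is a quick numerical confirmation of this example (a sketch, assuming numpy):

```python
import numpy as np

# The stationary distribution is a left eigenvector of P with eigenvalue 1,
# normalized so its entries sum to 1.
P = np.array([[0.75, 0.25],
              [0.50, 0.50]])

vals, vecs = np.linalg.eig(P.T)                 # left eigenvectors of P = eigenvectors of P^T
p_hat = vecs[:, np.argmin(np.abs(vals - 1))].real
p_hat /= p_hat.sum()

print(p_hat)                                    # [0.6667 0.3333], i.e. [2/3, 1/3]
print(np.linalg.matrix_power(P, 50))            # every row converges to p_hat
```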

In this post, I want to set up an algorithm to calculate the stationary surface (steady-state) of patches as I've defined them, following in analogy the above argument.  To do so, I revisit both of my previous examples, now calculating the steady state from this vantage point.  The fact that we can define such an algorithm in the first place has ginormous implications, in the sense that we can define stationary function distributions that would seem therefore to be eigen(patch(ix))vectors (corresponding to eigen(patch(ix))values) of surface distributions, and we can seemingly also solve a continuously infinite quantity of independent equations, however strange this actually sounds.

Example 1, calculating the stationary patch  p_{\infty}(x,y) when  p_1(x,y) = 2 x - \frac{2 x y^3}{3} + x^2 y^3 .

I have already shown that  p_1(x,y) is indeed a patch because  \int_0^1 p_1(x,y) dx = 1 , for any choice of  y .

Suppose there exists a distribution defined as always on  x \in [0,1] , say  a(x) , so that

 a(x) \star p(x,y) = a(x) .  Explicitly,  \int_0^1 a(1 - y) \cdot \left( 2 x - \frac{2 x y^3}{3} + x^2 y^3 \right) dy = a(x) .  We can break up the integral as:

 2 x \int_0^1 a(1-y) dy - \frac{2 x}{3} \int_0^1 y^3 a(1-y) dy + x^2 \int_0^1 y^3 a(1-y) dy = a(x)

The first part, we've seen many times, adds up to one because  a(x) is a probability distribution, so let's rewrite the whole thing as:

 2 x - \left( \frac{2 x}{3} - x^2 \right) \int_0^1 y^3 a(1 - y) dy = a(x)

The integral is in reality just a constant, so we have that a(x) looks something like:

 2 x - \left( \frac{2 x}{3} - x^2 \right) B = a(x) if we let

 B = \int_0^1 y^3 a(1 - y) dy

Now this integral in  y , though it is a constant, is seemingly impossible to solve without more knowledge of  a(1-y) ; but the truth of the matter is we have everything we need because we have a specification of  a(x) .  The crucial thing to notice is that derivatives of  a(x) do not exist "eternally," because  a(x) is a polynomial of maximal degree 2.  Thus we can attempt integration by parts and try to see where this takes us.  The tabular method gives us an organized way to write this out:

 \begin{array}{ccccc} | & Derivatives & | & Integrals & | \\ | & a(1-y) & | & y^3 & | \\ | & -a'(1-y) & | & \frac{y^4}{4} & | \\ | & a''(1-y) & | & \frac{y^5}{20} & | \\ | & 0 & | & \frac{y^6}{120} & | \\ | & \vdots & | & \vdots & | \end{array}

and, remembering the alternating sign when we multiply, we get the series:

 \frac{a(1-y) y^4}{4} + \frac{a'(1-y) y^5}{20} + \frac{a''(1-y) y^6}{120} + 0 + \ldots \arrowvert_0^1

Substituting the lower limit of the integral gives us all zeroes, but substituting the upper limit (one) gives us the interesting "pasquali series":

 \frac{a(0) }{4} + \frac{a'(0)}{20} + \frac{a''(0)}{120} + 0 + \ldots

which asks of us to evaluate  a(x) and its derivatives (until just before it vanishes) at zero:

 \begin{array}{ccc} a(x) & = & 2 x - \left( \frac{2 x}{3} - x^2 \right) B \\ a'(x) & = & 2 - \frac{2 B}{3} + 2 B x \\ a''(x) & = & 2 B \end{array}

 \begin{array}{ccc} a(0) & = & 0 \\ a'(0) & = & 2 - \frac{2 B}{3} = \frac{6 - 2 B}{3}\\ a''(0) & = & 2 B \end{array}

All that's left now is to substitute back into the series:

 B = \frac{\frac{6 - 2 B}{3}}{20} + \frac{2 B}{120} = \frac{1}{10} - \frac{B}{60} solves to  B = \frac{6}{61} which is what we want (I tested the following code with Wolfram Alpha: "integrate [[2(1-y) - (.0983607)(((2(1-y))/3) - (1-y)^2)]*[2x - (2 x y^3)/3 + x^2 y^3]] dy from y = 0 to 1" and obtained the same numeric decimal value at the output).

We have therefore that  a(x) = 2 x - \left( \frac{2 x}{3} - x^2 \right) \frac{6}{61} is a stationary distribution, and the steady-state patch would seem to be  p_\infty(x,y) = a(x) .  I personally think this is very cool, because it validates several propositions: that we can find steady-state patches analytically (even when we may think we have a (continuously) infinite system to solve, it will reduce essentially to a (countable!) series, estimable provided the "pasquali" series converges) by a means other than finding the patch powers, attempting to see a pattern, proving it perhaps by induction, and then taking the limit as patch powers go to infinity, much as I did in my previous post.  It also validates the "crazy" idea that (certain?) special surfaces like patches have eigen(patch(ix))vectors, as arguing  a(x) would suggest exist, and which we would have to obtain, in discrete matrices, by solving a finite system of equations (and which we did here, again, by solving the "pasquali" series).
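Here is the same check the Wolfram Alpha line above performs, as a small numerical sketch (assuming numpy and midpoint quadrature):

```python
import numpy as np

# With B = 6/61, a(x) = 2x - (2x/3 - x^2) B should satisfy a(x) star p(x,y) = a(x)
# for the patch p(x,y) = 2x - (2/3) x y^3 + x^2 y^3.
B = 6.0 / 61.0
a = lambda x: 2 * x - (2 * x / 3 - x**2) * B
p = lambda x, y: 2 * x - (2 * x * y**3) / 3 + x**2 * y**3

N = 2000
h = 1.0 / N
y = (np.arange(N) + 0.5) * h
x = y

lhs = np.array([np.sum(a(1 - y) * p(xi, y)) * h for xi in x])  # a star p at each x
print(np.max(np.abs(lhs - a(x))))   # should be tiny (quadrature error only)
print(np.sum(a(x)) * h)             # ~1: a(x) is indeed a probability distribution on [0,1]
```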

Example 2.  In my second example, take the patch  1 - cos(2 \pi x) cos( 2 \pi y) . Again we are looking at a patch because  \int_0^1 p(x,y) dx = 1 for any value of  y .  To establish the steady-state surface, or  p_\infty(x,y) , we proceed as before and write

 \int_0^1 a(1-y) \left( 1 - cos(2 \pi x) cos(2 \pi y) \right)dy = a(x) , or, explicitly,

 \int_0^1 a(1-y) dy - cos(2 \pi x) \int_0^1 a(1-y) cos(2 \pi y) dy = a(x)

The first integral adds up to 1 by hypothesis, while the second one turns out to be zero after integrating by parts:

 \begin{array}{ccccc} | & Derivatives & | & Integrals & | \\ | & a(1-y) & | & cos(2 \pi y) & | \\ | & -a'(1-y) & | & \frac{sin(2 \pi y)}{2 \pi} & | \\ | & a''(1-y) & | & \frac{-cos(2 \pi y)}{4 \pi^2} & | \\ | & -a'''(1-y) & | & \frac{-sin(2 \pi y)}{8 \pi^3} & | \\ | & \vdots & | & \vdots & | \end{array}

so we have:

 \frac{a(1-y) sin(2 \pi y)}{2 \pi} - \frac{a'(1-y) cos(2 \pi y)}{4 \pi^2} - \frac{a''(1-y) sin (2 \pi y)}{8 \pi^3} + \ldots \vert_0^1 and the awesome-slash-interesting "pasquali series"

 -\frac{a'(0)}{4 \pi^2} + \frac{a'''(0)}{16 \pi^4} - \frac{a^v (0)}{64 \pi^6} + \ldots from which we must subtract by the Fundamental Theorem of Calculus

 -\frac{a'(1)}{4 \pi^2} + \frac{a'''(1)}{16 \pi^4} - \frac{a^v(1)}{64 \pi^6} + \ldots

So we are left with  B = \frac{a'(1)}{4 \pi^2} - \frac{a'(0)}{4 \pi^2} + \frac{a'''(0)}{16 \pi^4} - \frac{a'''(1)}{16 \pi^4} + \frac{a^v(1)}{64 \pi^6} - \frac{a^v(0)}{64 \pi^6} + \ldots and also with

 \begin{array}{ccc} a(x) & = & 1 - cos(2 \pi x) B \\ a'(x) & = & 2 \pi sin(2 \pi x) B \\ a''(x) & = & 4 \pi^2 cos(2 \pi x) B \\ a'''(x) & = & -8 \pi^3 sin(2 \pi x) B \\ \vdots & \vdots & \vdots \end{array}

To show this thoroughly, we should prove by induction that every odd derivative of  a(x) contains a  sin term (or we can attempt an argument by periodicity of the derivative, as we do), so that evaluating it at 0 and at 1 literally causes the term to vanish, leaving us with the fact that  B = 0 and that  a(x) = 1 .  Therefore, as before,  p_\infty(x, y) = a(x) = 1 , and this is consistent with my derivation in the previous post, too.

On Patchix by Patchix Products – Tying Up Loose Ends - (RWLA,MCT,GT,AM Part V)

October 17th, 2010

In this post I want to "tie up a few loose ends."  For example, in my last post I stated that the patchix powers follow the pattern

 \begin{array}{ccc} p_1(x,y) & = & 1 - cos(2 \pi x) cos(2 \pi y) \\ p_2(x,y) & = & 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{2} \\ p_3(x,y) & = & 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{4} \\ p_4(x,y) & = & 1 + \frac{cos(2 \pi x) cos(2 \pi y)}{8} \\ \vdots \\ p_t(x,y) & = & 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{t-1}} \end{array}

for  t \in \mathbb{Z^+} , but I didn't prove it.  It's simple to do by induction: the base case is

 p_1(x,y) = 1 - cos(2 \pi x) cos(2 \pi y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{1-1}}

For the inductive step, assume

 p_k(x,y) = 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}}

Then,

 \begin{array}{ccc} p_{k+1}(x,t) & = & \int_0^1 p_1(1-y,t) \cdot p_k(x,y) dy \\ & = & \int_0^1 \left( 1 - cos(2 \pi (1-y))cos(2 \pi t) \right) \cdot \left( 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}} \right) dy \end{array}

Now, if one dislikes shortcuts one can expand the product and integrate term by term to one's heart's content.  The "shorter" version is to tell the story: the product of 1 with itself is 1, which integrates to 1 on the unit interval, so we save it.  The integrals  \int_0^1 cos(2 \pi y) dy and  \int_0^1 cos(2 \pi - 2\pi y) dy both evaluate to zero, so we are left only with the task of evaluating the cross term:

 \begin{array}{ccc} && \int_0^1 cos(2 \pi (1-y))cos(2 \pi t) \cdot \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k-1}} dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \int_0^1 cos(2 \pi - 2 \pi y) cos(2 \pi y) dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \int_0^1 cos^2(2 \pi y) dy \\ & = & \frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k-1}} \cdot \frac{1}{2} \\ & = & -\frac{cos(2 \pi t) cos (2 \pi x)}{(-2)^{k}} \end{array}

Let's not forget the 1 we had saved, so:

 p_{k+1}(x,t) = 1 - \frac{cos(2 \pi x) cos(2 \pi t)}{(-2)^{k}} \rightsquigarrow 1 - \frac{cos(2 \pi x) cos(2 \pi y)}{(-2)^{k}} = p_{k+1}(x,y)

as we wanted to show.

So finally notice that, of course, if we take the limit as  t approaches infinity, the patch evolution tendency is to become 1, the uniform distribution:

 \lim_{t \rightarrow \infty} p_t(x,y) = 1 = u(x,y)
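Here is a numerical double-check of the pattern (a sketch assuming numpy; midpoint quadrature, which happens to be extremely accurate for these trigonometric integrands), iterating the patchix product and comparing against the closed form:

```python
import numpy as np

# Iterate p_{k+1}(x,t) = integral_0^1 p_1(1-y,t) p_k(x,y) dy on a grid and compare
# with the closed form p_t(x,y) = 1 - cos(2 pi x) cos(2 pi y) / (-2)^(t-1).
N = 400
h = 1.0 / N
g = (np.arange(N) + 0.5) * h                    # one midpoint grid shared by x, y, t
X, Y = np.meshgrid(g, g, indexing="ij")

p1 = lambda x, y: 1 - np.cos(2 * np.pi * x) * np.cos(2 * np.pi * y)
P1 = p1(X, Y)                                   # p_1 sampled on the grid
A  = p1(1 - Y, X)                               # A[m, j] = p_1(1 - y_j, t_m)

Pk = P1.copy()
for k in range(1, 6):                           # build p_2, ..., p_6
    Pk = Pk @ A.T * h                           # the patchix product, discretized
    closed = 1 - np.cos(2 * np.pi * X) * np.cos(2 * np.pi * Y) / (-2.0) ** k
    print(k + 1, np.max(np.abs(Pk - closed)))   # differences should be tiny
```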

From here on out, I want to set up the operative framework of patchixes, in analogy with discrete matrices.  I want to show that in general, patchix products are non-commutative.  This is easily done by counterexample:

We want to show that  p(x,y) \star q(x,y) \neq q(x,y) \star p(x,y) . So suppose the patchixes  p(x,y) = x and  q(x,y) = y . Then

 p(x,y) \star q(x,y) = \int_0^1 p(1-y,t) \cdot q(x,y) dy = \int_0^1 (1-y) y dy = \int_0^1 y - y^2 dy = \frac{1}{6}

and

 q(x,y) \star p(x,y) = \int_0^1 q(1-y,t) \cdot p(x,y) dy = \int_0^1 (t \cdot x) dy = t \cdot x \rightsquigarrow x \cdot y

are clearly not equal.  It would be great to say that, because patchixes are non-commutative, patches are too, but the general non-commutativity of patchixes doesn't tell us whether the subset of patches commutes, so let's disprove that as well.  Now suppose the patches  p(x,y) = x + \frac{1}{2} and  q(x,y) = 1 + xy - \frac{y}{2} .  Then

 \begin{array}{ccc} p(x,y) \star q(x,y) & = & \int_0^1 p(1-y,t) \cdot q(x,y) dy \\ & = & \int_0^1 \left( \frac{3}{2} - y \right) \cdot \left( 1 + xy - \frac{y}{2} \right) dy \\ & = & \frac{5x}{12} + \frac{19}{24} \end{array}

where

 \begin{array}{ccc} q(x,y) \star p(x,y) & = & \int_0^1 q(1-y,t) \cdot p(x,y) dy \\ & = & \int_0^1 q(1-y,t) \cdot p(x) dy \\ & = & p(x) \int_0^1 q(1-y,t) dy \\ & = & p(x) \cdot u(t) = p(x) \\ & = & x + \frac{1}{2} \end{array}

By refraining from calculating this last bit explicitly, we have (serendipitously) proved that any patch times a patch that is solely a function of  x returns the latter patch, a result which reminds us of the analogous distribution-by-patch result I showed in my previous post (a distribution on [0,1] times a patch that is solely a function of  x returns the patch, which, viewed as a function, is itself a distribution on [0,1]).  A quick note: the integral  \int_0^1 q(1-y,t) dy is the unit distribution because  \int_0^1 q(x,y) dx = u(y) and  x \rightsquigarrow (1-y) and  dx \rightsquigarrow -dy .

The end result of these observations is that patches are also, in general, non-commutative.
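Both counterexamples can be verified symbolically; here is a small sketch assuming sympy, with the star product written as  \int_0^1 p(1-y,t) q(x,y) dy and the rename  t \rightsquigarrow y applied afterwards:

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def star(p, q):
    # (p star q)(x,t) = integral_0^1 p(1-y,t) q(x,y) dy, then rename t back to y
    P = sp.Lambda((x, y), p)
    Q = sp.Lambda((x, y), q)
    return sp.integrate(P(1 - y, t) * Q(x, y), (y, 0, 1)).subs(t, y)

# Patchix counterexample: p = x, q = y.
print(star(x, y), "vs", star(y, x))             # 1/6 vs x*y

# Patch counterexample: p = x + 1/2, q = 1 + x*y - y/2.
p = x + sp.Rational(1, 2)
q = 1 + x * y - y / 2
print(sp.expand(star(p, q)), "vs", star(q, p))  # 5*x/12 + 19/24 vs x + 1/2
```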

Next, I want to show that patchixes in general are associative.  This is a bit tricky because of the "after integral" transformations we have to do, but it is doable if we keep careful track of our accounting.  We want to show that  [p(x,y) \star q(x,y)] \star r(x,y) = p(x,y) \star [q(x,y) \star r(x,y)] .  Let's begin with the left hand side.

 \begin{array}{ccc} [p(x,y) \star q(x,y)] \star r(x,y) & \rightsquigarrow & [p(x,w) \star q(x,w)] \star r(x,y) \\ & = & \left( \int_0^1 p(1-w, y) \cdot q(x, w) dw \right) \star r(x, y) \\ & = & \int_0^1 \left( \int_0^1 p(1-w, t) \cdot q(1-y, w) dw \right) \cdot r(x, y) dy \\ & = & \int_0^1 \int_0^1 p(1-w, t) \cdot q(1-y, w) \cdot r(x, y) dw dy \\ & = & s(x,t) \rightsquigarrow s(x,y) \end{array}

Now the right hand side

 \begin{array}{ccc} p(x,y) \star [q(x,y) \star r(x,y)] & \rightsquigarrow & p(x,w) \star \left( \int_0^1 q(1-y, w) \cdot r(x,y) dy \right) \\ & = & \int_0^1 p(1-w, t) \cdot \left( \int_0^1 q(1-y, w) \cdot r(x,y) dy \right ) dw \\ & = & \int_0^1 \int_0^1 p(1-w,t) \cdot q(1-y, w) \cdot r(x,y) dy dw \\ & = & s(x,t) \rightsquigarrow s(x,y) \end{array}

The two sides are equal whenever we can apply Fubini's theorem to exchange the order of integration.

Of course, patches, being a subset of patchixes, inherit associativity.
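The same symbolic setup also gives a quick spot-check of associativity (a sketch assuming sympy; the patchixes  p = x ,  q = y ,  r = x + y are hypothetical choices):

```python
import sympy as sp

x, y, t = sp.symbols('x y t')

def star(p, q):
    # same star product as above, with the after-integral rename t -> y included
    P = sp.Lambda((x, y), p)
    Q = sp.Lambda((x, y), q)
    return sp.integrate(P(1 - y, t) * Q(x, y), (y, 0, 1)).subs(t, y)

p, q, r = x, y, x + y                # hypothetical patchixes
lhs = star(star(p, q), r)
rhs = star(p, star(q, r))
print(sp.simplify(lhs - rhs))        # 0: both groupings give x/6 + 1/12
```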

Defining a patchix left and right identity is extremely difficult, in the sense that, if we take a hint from discrete matrices, we'd be looking at a very special function on the  xy plane, so that  i(1-y,y) = i(x,1-x) = 1 and  0 everywhere else.  Because there is no "pretty" way to define this as a function of  x and  y both, showing that multiplying a patchix by this function on either the right or the left returns the patchix requires elaborate explication. Unless we take it as axiomatic high ground, postulating the existence of an identity function  i(x,y) so that  i(x,y) \star p(x,y) = p(x,y) = p(x,y) \star i(x,y) to make the framework work, there is no easy way out.  Let's give it a shot then.

Left identity:

 i(x,y) \star p(x,y) = \int_0^1 i(1-y,t) \cdot p(x,y) dy

Now  i(1-y,t) = 1 only for values where  t = y , as we've defined it, otherwise the integral is zero and there is nothing to solve.  So then we've got

 \int_0^1 i(1-t,t) \cdot p(x,t) dy = \int_0^1 (1) \cdot p(x,t) dy = p(x,t) \rightsquigarrow p(x,y)

which is essentially the argument I make for the zero patch power in my informal paper on continuous Markov transition matrices or patches (however, there's a problem with this definition on patches, more of this below).  There's the question of why we didn't force the change of  dy \rightsquigarrow dt , and this is because the only way to obtain a function of both  x and  t is to force the patchix to the  x t plane and let the integral be taken in the  x y plane.  If this argument is unsatisfactory, consider this one:  at  t = 0 = y the patchix takes the values  p(x, 0) which is a function of  x alone.  Thus,

 \int_0^1 i(1,0) \cdot p(x,0) dy = p(x,0) \int_0^1 (1) dy = p(x,0)

If we do this for all  t \in [0,1] , we are certainly left with  p(x,t) .  We may raise the objection that, if we create a mental picture of the situation, at  t = 0 ,  i takes a value of 1 only at  y = 0 , so that, on the  x y plane, all values of  p(x, y) are zeroed except those at  y = 0 .  Thinking about it this way creates the difficulty of the integral with respect to  y : it evaluates to zero (there is no "area" in the  x y plane anymore, only a filament or fiber at  y=0 ), and we would be left with the zero patchix.  There are only two ways to resolve this: to send the patchix  p(x,y) to  p(x,t) before we take the integral in the  x,y plane, and then toss the integral out the window (or take it on the uniform distribution), or, to think of the filament  p(x,0) = p_0(x) as  p_0(x) \times [0,1] = p_0(x,y) and then integrate in the  x y plane to obtain  p_0(x) \rightsquigarrow p(x,0) and do this for all  t to get  p(x,t) .  Hence yes, the difficulty of defining the identity function on "surface" matrices (because it is not smooth like they are and because it is defined piece-wise).

Right identity:

 p(x,y) \star i(x,y) = \int_0^1 p(1-y,t) \cdot i(x,y) dy

Here we remind ourselves that  i(x,1-x) = 1 and zero otherwise, so that we can make the substitution

 \int_0^1 p(x,t) \cdot i(x,1-x) dy = \int_0^1 p(x,t) \cdot (1) dy = p(x,t) \rightsquigarrow p(x,y)

We of course have issues: it may seem redundant to send  x \rightsquigarrow 1-y \rightsquigarrow x , sending  x back to itself, but again this is the only way to remain consistent and get back the original function.  Again there's an issue of why we didn't send the integral  dy \rightsquigarrow -dx , but this has to remain in the  x y plane for the mechanics to work.  Other objections are likewise not easily resolved; the argument would work out algebraically if we concede on a few things, but otherwise we cannot but shrug at the fact that it is, indeed, a little bit of hocus pocus, and we return to our suggestion to postulate the identity function as an axiom. Perhaps these issues can be resolved or elucidated a little later; I don't lose hope.

Defining inverse patchixes will also present a great difficulty, particularly because they have to produce the identity function when we "patchix multiply" two mutually inverse patchixes together.  I was thinking that we could perhaps determine whether a particular patchix has one by extending Sarrus's rule (for determinants) to be continuous, which would involve, I'm sure, multiple integrations.  This will be a topic of further investigation for me. The cool thing is, if we can elucidate how this "continuous version" of the determinant works, many different results from Linear Algebra could follow.  I am also trying to figure out what two inverse patchixes would look like, and if I can produce an example (at all), virtually from thin air.  If I can, then perhaps we're on our way to constructing patchix groups of all flavors.

Unfortunately, patches can't inherit the identity as we've defined it: the integral with respect to  x of  i(x,y) is zero for all  y .  Thus  i(x,y) is not a patch.

This problem makes us want to think of the uniform distribution  u(x,y) as another possible candidate for the identity of all patchixes, and it might just work if we agree that, when we don't have a function of  t or of  x after doing the setup transformations for the integral, we send whatever function remains there before taking the integral.

Left identity:

 u(x,y) \star p(x,y) = \int_0^1 u(1-y,t) \cdot p(x,y) dy \rightsquigarrow \int_0^1 (1) \cdot p(x,t) dy = p(x,t) \rightsquigarrow p(x,y)

Right identity:

 p(x,y) \star u(x,y) = \int_0^1 p(1-y,t) \cdot u(x,y) dy \rightsquigarrow \int_0^1 p(x,t) \cdot (1) dy = p(x,t) \rightsquigarrow p(x,y)

This has several happy consequences: we avoid dealing with a piece-wise defined function  i(x,y) which is zero everywhere except on  y = 1-x , the uniform distribution is smooth, we can now more easily define inverses (by finding multiplicative inverse functions, more on this below), and, specifically regarding patches,  \int_0^1 u(x,y) dx = u(y) = 1 so the uniform distribution is indeed a patch.

In my mental picture, the "patchix product" of the uniform distribution with a patchix (and vice versa) doesn't "add up" (pun intended), but the algebraic trickery would seem to be the same even when using the alternative  i(x,y) .  So.  At this point I sort of have to convince myself into accepting this for now.