
Posts Tagged ‘Markov’

On Optimizing the Traffic Systems of the World Using Markov Chains (How to Predict Traffic Jams)

May 3rd, 2011

So it has been a recent dream of mine to optimize the traffic system of my hometown, of Mexico in general, and even more generally of the world.  I seldom drive; people always seem to be on the cellphone (which, by law, they apparently shouldn't use at all when driving), running red lights, and so on.  More tragically in Guadalajara, the civil engineering of bridges and crossways intended to improve circulation flow is often done seemingly with little thought (a bridge intended for two-way circulation is now one-way, and to make it so they closed the entryway coming this way and modified it to go that way, so that a column splits the throughway in wedge form and it's infinitely more prone to accidents than otherwise... another bridge is "closed" on traffic-heavy Sundays for bicycles... and another bridge-pass literally zigzags on entry to one major artery of the city at Los Cubos, again increasing accidents in the zigzag portion).  There may be people that benefit, don't get me wrong: perhaps those that get the concession to build the bridge in the first place (often the high-school cousin of the governor or mayor).

So as I walk to work, a little office but two blocks from my house, purposely chosen to minimize my having to use a car or transport to get there, I have to cross a street that has been fitted with a light.  It gets chaotic at certain times of the day and jams because of some 4 or 5 kindergarten schools (which shouldn't be there in a residentially zoned area).  A cycle of the four installed semaphores, from what I gather, lasts around 2 minutes, and it is apparently tuned so that the light for the heaviest-traffic lane lasts a little longer than the others.  I imagine this is the case for any crossroad's lights, but it works optimally only at the heaviest traffic (and assuming that the lane that is heaviest at high-traffic times is always the heaviest); when there is little traffic, cars still have to wait the better part of the 2-minute cycle for their turn.  In other words, the way the traffic lights are timed works for a specific time of day and assumes the heaviest traffic in a particular direction, which means it's suboptimal in all other cases.

So for the better part of a year now, maybe around 8 months, I have been thinking that lights should be adaptive, and of course I realize this is easier said than done.  In fact I have gone to friends at the local government to try to find ways to put my ideas into action, but with no satisfying results (the idea was actually dismissed outright).  I also tried entering a contest with this traffic idea (and a couple of unrelated others).  The contest was called "Iniciativa Mexico," but it turned out to be a silly PR ploy that left many of us contestants further saddened and disillusioned at the state of our country (and questioning the lasting impact of the projects that did win).

Since it is my belief that this project could benefit all countries around the world, I'm making it available to the public through my weblog, on a couple of conditions:

1.  That the use of these ideas and of ideas derived from them be free, in that they generate no fee, because they are of benefit to the public and our world (these ideas intend to reduce CO2 and smog levels, for example), even if this means installing special equipment on traffic lights and cars in general (such costs should be absorbed by the government without raising taxes for the purpose, by optimizing traffic budgets already in place).

2.  It is the understanding of private companies interested in implementing this system in cities, towns, or traffic systems that they will obtain zero profit, or will operate at a loss, in the production of traffic equipment that uses these ideas or ideas derived from them.  Likewise, consulting services that use these ideas or ideas derived from them will be at zero profit for the private company, including analyses of traffic that use the following ideas or ideas derived from them.

3.  Effectively, these ideas and ideas derived from them should generate no monetary profit for anyone who wishes to use them, implement them, or otherwise benefit from their functioning or infrastructure, beyond the optimization of traffic flow and the reduction of CO2 and smog levels, EXCEPT when a traffic system optimized with these ideas or derived ideas is further proven impossible, mathematically and through simulations and to the satisfaction of a majority of a committee of 7 Markov-trained and renowned statisticians (who ask no money for this purpose), and the implementation of a fee system, for example in high-density areas, is a last resort to regulate heavy traffic.

4. If any company or country or government, local or otherwise, obtains profit from these ideas or derived ideas, other companies or countries or governments should exert pressure so that the program remains free and of benefit to humanity.  The spirit of these clauses is that these ideas and derived ideas benefit the people of a country as a whole without incurring extra cost to the people.  Private companies provide infrastructure and consulting services because they are of benefit, and likewise for governments: there is no advantage to be taken except that which is beneficial for everyone who uses a car or other transport, in the form of better traffic flow, avoidance of traffic jams, and quicker or optimized movement to their destination.

5.  That being said, I reserve the right to modify the clauses as I see fit, in keeping with my original vision and a spirit of good-naturedness toward humanity and this world.

Our Earth needs a little breathing room, and this is my contribution to reducing the carbon dioxide signature, as best I know how currently.

We must think of the traffic network of a city as a system.  Each car enters a "state" when it enters a new, directed BLOCK (not street: one direction of a street within a block is one state, and the opposite direction of the same street is another).  We can number the blocks of a city in any way we like, but it's best that we do so consistently: for example, suppose we are at a crossroads; there are 8 states if all streets are bidirectional, and so, going, say, clockwise, each state receives a number 1 through 8.

If the optimal light cycle length is about two minutes (I surmise it is, I'm not sure; the optimal value should derive from, for example, psychological considerations on the amount of waiting time that reduces road rage), let us declare this time as "universal time," in the sense that every two minutes we will monitor the current state of the system by sampling or calculating exactly how many cars are in which state (this will require cars to be fitted with, for example, RFID tags, and scanners at the entrance of each street block; not necessarily all cars, but a substantial sample perhaps).  We seek to adjust, at every universal time-step, at an intersection, the proportion of time a light in a particular direction should stay green based on need.  This need could be something like this:

 L_1 = w_1 \cdot J + w_2 \cdot R + w_3 \cdot A

Where J is the possibility of a jam in the current (directed) state, R is the city-set preference (North-South streets may have preference, or toward-downtown streets may have the right-of-way, e.g.), and A is the amount by which the average waiting time in a state differs from the expected traversal time (the length of the block divided by the speed limit), corrected for outliers (one could use several techniques, such as a filter, to eliminate or down-weight parked vehicles that would otherwise contribute significantly to the waiting-time average).  Other terms could be added, but these seem to me the most significant.

The first term necessitates that we learn how to predict traffic jams.  But note that we know how many cars are in what state every 2 minutes or so.  The only thing we have to define is when a traffic jam occurs: perhaps when 80% of the (directed) street block's capacity is exceeded.  So we need to know what the "working capacity" of such a block is: it's the length of the street divided by the average length of a car (buses and trucks could be excluded, or counted as two cars), times the number of lanes that actually serve to move traffic: in most instances, two (the third lane may be reserved for parking; even a single parked car in a lane may disrupt flow enough to consider that lane not useful for moving traffic).

 W_c = \frac{S_L}{C_L} \cdot N_L , where  S_L is the (directed) block length,  C_L the average car length, and  N_L the number of usable lanes.

So let's therefore define a jam as  Load \geq 0.8 \cdot W_c .  We can also now define the "total working capacity of the system" as  \sum_i W_{c_i} , which is independent of time because it's defined by static properties of the (directed) street blocks.  Each state now has a working capacity, so create the vector M with all capacity entries ordered by state, and let m be the normalized vector: divide all entries by the total working capacity of the system.  This vector gives us a threshold on the proportion of cars that each state can afford!  If we exceed it, the system breaks because the capacity of that street is exceeded.
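To make the bookkeeping concrete, here's a rough sketch in Python (the block lengths, car length, lane counts, and loads below are made-up numbers, and the per-state indexing is hypothetical):

import numpy as np

# Hypothetical per-state data: block length in meters and usable lanes.
block_length_m = np.array([120.0, 95.0, 140.0, 80.0])   # S_L per state
usable_lanes   = np.array([2, 2, 1, 2])                  # N_L per state
avg_car_length_m = 5.0                                    # C_L, an assumed average

# Working capacity per state: (block length / car length) * usable lanes.
Wc = (block_length_m / avg_car_length_m) * usable_lanes

# Total working capacity of the system, and the normalized threshold vector m.
total_Wc = Wc.sum()
m = Wc / total_Wc

# A state is jammed when its load reaches 80% of its working capacity.
def is_jammed(load_per_state):
    return load_per_state >= 0.8 * Wc

print(Wc)                                    # capacities per state
print(m)                                     # proportion of cars each state can afford
print(is_jammed(np.array([40, 10, 25, 5])))  # which states are currently jammed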

Now on to the prediction bit.  Since we are sampling the state of the system every two minutes, we have a history in which we can count how many cars were in each state at each time-step.  Let's suppose for now that we can count exactly where all cars are: we may have information on how many cars are in states 1-8 at  \ldots, t_{-2}, t_{-1} .  We may not have knowledge of  t_0 because that is information of the current, ongoing cycle.  At any rate, we can form vectors  \ldots V(-2), V(-1) that describe the quantity of cars in each state.  In other words, we have the "total occupancy" of the system at any point in the past.  Let's take a look at the trajectory of a single car, C1.  Such a car may have been in state 3 at t_{-2} and in state 1 at t_{-1} .  In other words, we can create the historical progress of C1 by tracing its state at each 2-minute interval:

C1 historical progress:  \{ \ldots, 3, 1 \} .

In reality, we don't need the whole historical progress, we just need the t_{-2} and t_{-1} state, because what we seek to count to produce a Markov chain matrix is how many cars jumped from what state to what state during the latest cycles.  For example, say we have several cars and three states:

C1: {1,2}

C2: {2,1}

C3: {1,1}

C4: {3,1}

C5: {1,2}

If we are to order by t_{-2} entries, we can quickly see how cars in state 1, say, have transitioned (moved) in traffic: 1/3 stayed in the same spot (awaiting the next cycle idly or parked; but let's say that there are no parked cars at the moment, for the sake of simplicity), 2/3 jumped to state 2.  We can do the same for the other states and obtain the following Markov matrix: the rows are the states, ordered in the standard way, and the columns are the states the cars transitioned to, also ordered in the standard way.

 P = \left( \begin{array}{ccc} \frac{1}{3} & \frac{2}{3} & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{array} \right)
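A minimal sketch of this counting in Python, assuming we already have each car's (t_{-2}, t_{-1}) state pair (the five example cars above):

import numpy as np

n_states = 3
# (state at t_-2, state at t_-1) for each observed car, as in the example.
histories = [(1, 2), (2, 1), (1, 1), (3, 1), (1, 2)]

counts = np.zeros((n_states, n_states))
for prev, curr in histories:
    counts[prev - 1, curr - 1] += 1          # states are 1-indexed in the text

# Row-normalize the counts to obtain the Markov (transition) matrix P.
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)
print(P)    # [[1/3, 2/3, 0], [1, 0, 0], [1, 0, 0]]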

So if cars move in the same proportions as they did in the previous couple of light cycles, the powers of the matrix let us project those proportions forward: applied to the latest observed proportions of cars across states, the first power gives the current light cycle's system-wide state (the proportion of cars we can expect to see in each street-block state), the second power gives the next cycle, the third the one after, and so on.  Let's say we can calculate up to the 11th power, that is, 10 cycles beyond the current one.  This is a prediction of the state of the system 20 minutes forward in time!  (Recall that a Markov matrix is really a convenient way of describing a system of linear difference equations.)  We can do this for more steps, but now I want to explain what I meant by the J variable in the "need" calculation of the lights.

We said that at time  t_{-1} we can count the number of cars to obtain the vector V(-1), which tells us how many cars are in each state at that point in time, right?  If we normalize this vector by summing all entries, then dividing each entry by the total, we obtain the initial state vector v(-1).  So now

 v(-1) \cdot P = v(0)

gives the current proportion across states.  Similarly,  v(-1) \cdot P^2 = v(1), v(-1) \cdot P^3 = v(2), \ldots and so on.  If any entry of a vector satisfies  v_i(t) \geq 0.8 \, m_i , we have a jam, because that's how we defined a jam (exceeding 80% of the working capacity, the threshold).  Indeed, the jam is at state  i and at time  t .
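Putting these pieces together, a sketch of the jam forecast (the count vector V(-1) and the capacity vector m below are invented; P is the matrix from the earlier sketch):

import numpy as np

def predict_jams(P, V_last, m, steps=10, threshold=0.8):
    """Return (cycles-ahead, state) pairs where the predicted proportion of
    cars meets or exceeds the per-state threshold 0.8 * m_i."""
    v = V_last / V_last.sum()            # normalized state vector v(-1)
    jams = []
    for t in range(steps + 1):           # v(0), v(1), ..., v(steps)
        v = v @ P                        # advance one light cycle
        for i, (share, cap) in enumerate(zip(v, m)):
            if share >= threshold * cap:
                jams.append((t, i + 1))  # report 1-indexed states
    return jams

P = np.array([[1/3, 2/3, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 0.0, 0.0]])
V = np.array([30.0, 20.0, 10.0])         # hypothetical car counts at t_-1
m = np.array([0.5, 0.3, 0.2])            # hypothetical normalized capacities
print(predict_jams(P, V, m))             # e.g. a jam in state 1 right away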

Since we can now see a jam at any state up to (say) 20 minutes in advance (on a per-cycle basis of the semaphore lights), we can now qualify the need of traffic light  L_1 with respect to potential traffic jams.  Say, if a traffic jam is expected in the current about-to-begin time-cycle, give it a grade of 10.  If it's expected in the next cycle, give it a 9, and so on, so that a jam expected 9 cycles out (roughly 20 minutes from now) gets a grade of 1.  If no jam is expected within those cycles, give it a grade of 0.  Since we are grading on a scale of 10, let the variable J be the proportion of the grade over the maximal score.  Thus J is higher if a jam is imminent, and lower the farther out (toward 20 minutes from now) it is predicted.  Each traffic light at an intersection can therefore be given a "jam need" score, which is to say, when will the state it controls exceed 80 percent of the working capacity, i.e., the threshold value.

The "preference" or right-of-way score can be simply 1 or 0 depending.

The variable measuring how the average traversal time differs from the proposed traversal time, which I gleefully assigned the letter A, may be a bit controversial, in the sense that, if you notice, it's not necessarily orthogonal to the jam variable J: the jam variable J measures predictively how a state or street block is going to jam over a longer horizon; that is, it measures the accumulation of the proportion of cars in that state after a time period, so it indirectly measures the waiting time at that state.  But explicitly calculating the deviation from the "city-proposed time" it should take to traverse a street block, this "proposed time" being the length of the street times the reciprocal of the speed limit, may give additional (rather than redundant) information about the immediate waiting times (within a couple of traffic-light cycles, rather than 10).  The stronger the deviation, the greater the need of the traffic light to stay green longer.  J also doesn't really take into account "empty" streets that may suddenly fill up within a cycle, which A might just catch.  So then let A be the percent deviation from the "proposed time."

How can we calculate the deviation from the "city-proposed time"?  If each car history contains the additional information of when the car transitioned to a given state (recall each position contains the state, and the order tells us which 2-minute universal time interval it belongs to, up to the current one), then we can enrich our knowledge, as follows (assume universal time ticks every two minutes starting on the hour):

Car 1: {... 1 at 12:01:00 (uni 12:00:00 - 12:01:59), 2 at 12:02:05 (uni 12:02:00 - 12:03:59)}

Car 2: {... 1 at 12:05:40 (uni 12:04:00 - 12:05:59),  1 still (uni 12:06:00 - 12:07:59), 1 still (uni 12:08:00 - 12:09:59), 5 at 12:10:00 (uni 12:10:00 -12:11:59)}, etc.,

In this case, when the current time measurement needs to be done, we look at a 1-step-lagged transition time, so for Car 1 it would be: 1 minute 5 seconds spent in state 1 (available for uni step 12:04:00), and for Car 2 we would measure, for uni step 12:12:00, 4 minutes 20 seconds in state 1.  We must look at when an actual transition has occurred for this, as we are looking for the time it takes to "jump" to another state.  There is the issue of getting rid of outlier influence, as by parked cars, but I suppose there are statistical methods to do so: using robust statistics, like the median time, or other Winsorized or trimmed estimators.
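As a small sketch of the A computation for a single state (the observed transition times are invented, and clipping A to [0,1] is a choice I make here so it is comparable with J and R; the 20% trimming fraction is likewise arbitrary):

import numpy as np
from scipy import stats

block_length_m = 120.0                                # hypothetical block length
speed_limit_m_s = 11.1                                # ~40 km/h in meters per second
proposed_time_s = block_length_m / speed_limit_m_s    # the "city-proposed time"

# Observed seconds spent in the state before jumping, one entry per sampled car.
# The 1800 s entry plays the role of a parked-car outlier.
observed_s = np.array([65.0, 72.0, 80.0, 58.0, 1800.0, 90.0])

robust_time  = np.median(observed_s)                  # robust to the outlier
trimmed_time = stats.trim_mean(observed_s, 0.2)       # or drop 20% from each tail

# A: percent deviation of the robust waiting time from the proposed time,
# clipped to [0, 1] so it can be averaged with J and R.
A = min(1.0, max(0.0, (robust_time - proposed_time_s) / proposed_time_s))
print(proposed_time_s, robust_time, trimmed_time, A)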

Once we calculate the need for each traffic light at an intersection (say 4), we need only assign the proportional amount of time each light should be green.  I assume a circular transition: a green light means go forward, left, right, or U-turn, and no other light can be green at the same time.  If we weigh each variable J, A, R (jar-jar!) equally, then each light has need:

L_i = \frac{1}{3} J + \frac{1}{3} A + \frac{1}{3} R , where J, A, and R each range between zero and one.

The proportion of a cycle a particular light should be green is thus  \frac{L_i}{\sum_i L_i} , where i ranges over the traffic lights at an intersection, of which there are usually 4.  Higher-need lights will stay green longer (and it's possible for a light to stay green for longer than one cycle).  This has the happy consequence of maximizing the probability that a car finds a green light along its trajectory if: 1) a jam is prognosticated, for the purpose of avoiding it, 2) the car is on a right-of-way street, and/or 3) there's a relatively short-lived blockage that is disrupting the traffic flow.
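And the allocation itself, as a sketch (the per-light J, A, R values are invented; the function just normalizes the equally weighted needs into green-time shares):

def green_time_shares(needs):
    """Given the need L_i of each light at an intersection, return the
    fraction of the cycle each light should be green."""
    total = sum(needs)
    if total == 0:                            # no need anywhere: split evenly
        return [1.0 / len(needs)] * len(needs)
    return [need / total for need in needs]

# Hypothetical (J, A, R) per light at a 4-light intersection.
lights = [(0.9, 0.4, 1.0),   # about to jam, on a right-of-way street
          (0.1, 0.2, 0.0),
          (0.0, 0.1, 0.0),
          (0.3, 0.5, 1.0)]

needs = [(J + A + R) / 3.0 for J, A, R in lights]
shares = green_time_shares(needs)
cycle_seconds = 120
print([round(s * cycle_seconds) for s in shares])   # green seconds per light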

Having the ability to forecast jams lets us alert the driver that one may happen and convey to him, from BLOCKS AWAY, that he may want to choose an alternative route.  Personally, I would add informative LEDs under every light that say "jam ahead," "jam east," "jam west."  The Markov matrix is recalculated, of course, at every 2-minute interval, and thus the information is updated continually.

This sort of analysis with a Markov transition matrix has several cool consequences, in addition to those stated here.  Of course there aren't lights at every intersection, but inspection of the system could show systematic jam accumulations at intersections with no lights.  One could therefore install a traffic light at that intersection, or, better yet, run a simulation of what would happen if a light were there.  Perhaps the light is not needed at exactly that intersection but nearby, thereby relieving jams at other intersections down the street.  In any case, the Markov transition matrix allows us to simulate "what if" scenarios in flexible ways.  Another example: one now has criteria to evaluate whether speed bumps are necessary in certain streets: their net effect is to slow traffic so that proportionately more cars remain in a particular state or street block.  One can now understand whether putting in a speed bump, or removing one, will have positive or detrimental effects on the system overall.  It could be that more than one is needed!  In yet another example, it is often claimed that adding a roundabout (traffic circle, rotary) makes traffic more efficient.  Segmenting the roundabout into states can show this to be true or false within a given traffic system.  It may be true in particular places, and false in others.  Adding a bridge or another traffic level to alleviate traffic problems can now also be simulated with additional entries in the Markov matrix: will it in effect improve the flow, or make it worse?  This method gives us the systematic analysis that traffic needs in many cities around the world.

The traffic system as a whole can now function adaptively, in the sense that each light competes with the others at its intersection for "green time," and this will depend largely on the time of day (more or fewer cars) and the cars' aggregate movement around the city.  The adaptivity of the system-wide traffic lights to their need gives them flexibility, and the system becomes a moving "organism" that, in adapting to the traffic, will move cars around more efficiently than the current system does.  Dynamic programming methods and theorems suggest that, by optimizing each intersection and each traffic cycle individually, we can hope to approach a globally optimal flow of traffic system-wide.

If this method of analyzing traffic reminds you of quantum mechanics, do tell.  We move proportions around in this method, but they correlate with actual physical objects (cars) in aggregate.  I think this is very cool and wonder if we cannot apply this in other subjects where flow across a network is important (but I'm not too learned about networks in general :( ).

In a next post, I think I may want to post an Excel sheet with an 8-state and 22-state system based on nearby blocks to my house, to make things a bit clearer.

On Eigen(patch(ix))values, II - (RWLA,MCT,GT,AM Part IX)

March 22nd, 2011

So remember my little conjecture from last time, that the number of patch(ix) (kernel) eigenvalues would depend on the number of x terms that compose it?  I started working it out by writing out all the expressions and trying to substitute them, and I got sums of sums of sums and it became nightmarish; since math is supposed to be elegant, I opted for a different track.  A little proposition did result, but I'm not sure yet if it means what I want it to mean. Haha.

If you recall, last time we figured that

 B_1 = \frac{B_2 \sum_{i=0}^\infty f_2^i(1-y)G_1^{i+1}(y)\vert_0^1}{\lambda - \sum_{i=0}^\infty f_1^i(1-y)G_1^{i+1}(y)\vert_0^1}

and

 B_2 = \frac{B_1 \sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y)\vert_0^1}{\lambda - \sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y)\vert_0^1}

Let's rename the sums by indexing over the subscripts, so that

 \begin{array}{ccc} C_{1,1} & = &\sum_{i=0}^\infty f_1^i(1-y)G_1^{i+1}(y)\vert_0^1 \\ C_{1,2} & = &\sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y)\vert_0^1 \\ C_{2,1} & = &\sum_{i=0}^\infty f_2^i(1-y)G_1^{i+1}(y)\vert_0^1 \\ C_{2,2} & = &\sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y)\vert_0^1 \end{array}

Renaming therefore the constants we get:

 B_1 = \frac{B_2 C_{2,1}}{\lambda - C_{1,1}}

and

 B_2 = \frac{B_1 C_{1,2}}{\lambda - C_{2,2}}

Last time we substituted one equation into the other to figure additional restrictions on  \lambda .  A faster way to do this is to  notice:

 \left( \lambda - C_{1,1} \right)B_1 = B_2 C_{2,1}

and

 \left( \lambda - C_{2,2} \right) B_2 = B_1 C_{1,2}

If we multiply these two expressions we get

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{2,2} \right) B_1 B_2 = B_1 B_2 C_{1,2} C_{2,1}

Finally, dividing out both  B_1, B_2 we arrive at the quadratic expression in  \lambda from before:

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{2,2} \right) = C_{1,2} C_{2,1}

Now.  Let's posit that, instead of  a(x) = B_1 f_1(x) + B_2 f_2(x) we have  a^*(x) = B_1 f_1(x) + B_3 f_3(x) .  Then by all the same arguments we should have an expression of  B_1 that is the same, and an expression of  B_3 that is:

 \left( \lambda - C_{3,3} \right) B_3 = B_1 C_{1,3}

with the similar implication that

 \left( \lambda - C_{1,1} \right)\left( \lambda - C_{3,3} \right) = C_{1,3} C_{3,1}

An  a^{**}(x) = B_2 f_2(x) + B_3 f_3(x) would give the implication

 \left( \lambda - C_{2,2} \right)\left( \lambda - C_{3,3} \right) = C_{2,3} C_{3,2}

If we are to multiply all similar expressions, we get

 \left( \lambda - C_{1,1} \right)^2\left( \lambda - C_{2,2} \right)^2 \left( \lambda - C_{3,3} \right)^2 = C_{1,2} C_{2,1}C_{1,3} C_{3,1}C_{2,3} C_{3,2}

or

 \left( \lambda - C_{1,1} \right) \left( \lambda - C_{2,2} \right) \left( \lambda - C_{3,3} \right) = \sqrt{C_{1,2} C_{2,1}C_{1,3} C_{3,1}C_{2,3} C_{3,2}}

In other words, we want to make a pairwise argument to obtain the product of the  \lambda -expressions, a polynomial in  \lambda .  Next I'd like to show the proposition:

 \left( \lambda - C_{1,1}\right) \cdot \left( \lambda - C_{2,2} \right) \cdot \ldots \cdot \left( \lambda - C_{n,n} \right) = \sqrt[n-1]{\prod_{\forall i, \forall j, i \neq j}^n C_{i,j}}

and for this I want to begin with a combinatorial argument.  On the left-hand side, the number of pairwise comparisons we can make depends on the number of  \lambda factors of the  \lambda polynomial (or, the highest degree of the  \lambda polynomial).  That is to say, we can make  \binom{n}{2} pairwise comparisons, or  \frac{n!}{(n-2)!2!} = \frac{n (n-1)}{2} comparisons.  Now, I don't know whether anyone has ever noticed this, but this last simplified expression looks exceptionally like Gauss's sum of consecutive integers (the triangular numbers), so in other words, this last part is in effect  \sum_{i=1}^{n-1} i , which I find very cool, because we have just shown, quite accidentally, the equivalence:

 \binom{n}{2} = \binom{n}{n-2} = \sum_{i=1}^{n-1} i

The way I actually figured this out is by noticing that, in our pairwise comparisons, say for the 3rd-degree-polynomial-in- \lambda case, by writing the pairwise comparisons first of the  (\lambda - C_{1,1}) products, then of the  (\lambda - C_{2,2}) (in other words, ordering logically all  \binom{3}{2} products), there were 2 of the first and 1 of the second (and none of the  (\lambda - C_{3,3}) ).  If we do the same for the 4th-degree, there are 3 of the  (\lambda - C_{1,1}) , 2 of the  (\lambda - C_{2,2}) , and 1 of the  (\lambda - C_{3,3}) , with none of the  (\lambda - C_{4,4}) .  In other words, the  \binom{4}{2} pair-products could be written as the sum of the cardinality of the groupings:  3 + 2 + 1 .

Now Gauss's sum of integers formula is already known to work in the general case (just use an inductive proof, e.g.), so the substitution of it into the binomial equivalence needs no further elaboration: it generalizes automatically for all  n .

So if we are to multiply all pairwise comparisons, notice there will be  n - 1 products of each  \lambda -factor: there are  n - 1 products belonging to the  (\lambda - C_{1,1}) grouping (because this first grouping has n-1 entries, from the Gauss formula equivalence), there are  n - 2 products belonging to the  (\lambda - C_{2,2}) PLUS the one already counted in the  (\lambda - C_{1,1}) grouping, for a total of, again,  n - 1 .  The  kth grouping  (\lambda - C_{k,k}) has  n - k products listed for itself PLUS one for each of the previous k - 1 groupings, for a total of  n - k + k - 1 = n - 1, and the k+1th grouping  (\lambda - C_{k+1, k+1}) has  n - (k+1) products listed for itself PLUS one for each of the previous  k groupings, for a total of n - (k+1) + k = n - 1.  We are left in effect with:

 (\lambda - C_{1,1})^{n-1} \cdot(\lambda - C_{2,2})^{n-1} \cdot \ldots \cdot (\lambda - C_{n,n})^{n-1}

The right hand side of each pairwise comparison was nothing more than the simple product on the cross indexes of  C , so it's not difficult to argue then that, if we multiply  \binom{n}{2} such pairs, we get  \prod_{\forall i, \forall j, i \neq j}^n C_{i,j} .   We then take the  n-1 th root on both sides of the equation.

Since the  n + 1 case follows the same basic structure of the argument, we are done with proving our proposition.
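The counting above is easy to sanity-check symbolically: multiplying all  \binom{n}{2} pairwise left-hand sides should produce each  \lambda -factor to the power  n-1 , and multiplying the right-hand sides should give the product over all ordered cross-indices.  A small sketch with sympy (this checks only the combinatorial bookkeeping, not whether the pairwise relations themselves hold):

import sympy as sp
from itertools import combinations, permutations

n = 4
lam = sp.symbols('lambda')
C = {(i, j): sp.symbols(f'C_{i}{j}', positive=True)
     for i in range(1, n + 1) for j in range(1, n + 1)}

# Product over all pairs {i, j} of the left-hand sides (lam - C_ii)(lam - C_jj),
# and of the right-hand sides C_ij * C_ji.
left = sp.Mul(*[(lam - C[i, i]) * (lam - C[j, j])
                for i, j in combinations(range(1, n + 1), 2)])
right = sp.Mul(*[C[i, j] * C[j, i]
                for i, j in combinations(range(1, n + 1), 2)])

# Expected: each lambda-factor appears n-1 times; every off-diagonal C_ij appears once.
expected_left = sp.Mul(*[(lam - C[k, k]) ** (n - 1) for k in range(1, n + 1)])
expected_right = sp.Mul(*[C[i, j] for i, j in permutations(range(1, n + 1), 2)])

print(sp.simplify(left - expected_left) == 0)     # True
print(sp.simplify(right - expected_right) == 0)   # True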

What I want this proposition to mean may be very different than what it actually is, I'm hopeful nevertheless but I agree that it requires a bit of further investigation.  As I hinted before, I would like that

 \left( \lambda - C_{1,1}\right) \cdot \left( \lambda - C_{2,2} \right) \cdot \ldots \cdot \left( \lambda - C_{n,n} \right) = \sqrt[n-1]{\prod_{\forall i, \forall j, i \neq j}^n C_{i,j}}

with, for example,  n = 3 represent the constraint on the eigen(patch(ix))values of  a^\circ = B_1 f_1(x) + B_2 f_2(x) + B_3 f_3(x) or, if not that, maybe  a^\circ_\star = a(x) + a^*(x) + a^{**}(x) = 2B_1 f_1(x) + 2 B_2 f_2(x) + 2 B_3 f_3(x) , which brings into question the superposition of functions and their effect on the eigenvalues.  I may be wildly speculating, but hey!  I don't really know better!  I'll do a few experiments and see what shows.

On Eigen(patch(ix))values - (RWLA,MCT,GT,AM Part VIII)

March 16th, 2011

So in the continuation of this series, I have been thinking long and hard about the curious property of the existence of eigen(patch(ix))values that I have talked about in a previous post.  I began to question whether such eigen(patch(ix))values are limited to a finite set (much as in finite matrices) or whether there was some other fundamental insight, like, if 1 is an eigen(patch(ix))value, then all elements of  \mathbb{R} are too (or all of  \mathbb{R} minus a finite set).  In my latest attempt to understand this, the question comes down to, using the "star" operator, whether

 a(x) \star p(x,y) = \lambda a(x)

has discrete values of  \lambda or, "what values can lambda take for the equation to be true," in direct analogy with eigenvalues when we're dealing with discrete matrices.  I am not using yet "integral transform notation" because this development seemed more intuitive to me, and thus I'm also limiting the treatment to "surfaces" that are smooth and defined on  [0,1] \times [0,1] , like I first thought of them. Thus, the above equation translates to:

 \int_0^1 a(1-y) p(x,y) dy = \lambda a(x)

and, if we recall our construction of the patch (or patchix if we relax the assumption that integrating with respect to x is 1)  p(x,y) = f_1(x) g_1(y) + f_2(x) g_2(y) :

 \begin{array}{ccc} \lambda a(x) & = &\int_0^1 a(1-y) \left(f_1(x) g_1(y) + f_2(x) g_2(y) \right) dy \\ & = & f_1(x) \int_0^1 a(1-y) g_1(y) dy + f_2(x) \int_0^1 a(1-y) g_2(y) dy \\ & = & B_1 f_1(x) + B_2 f_2(x) \end{array}

where  B_1, B_2 are constants.  It is very tempting to divide by  \lambda , so that

 a(x) = \frac{B_1}{\lambda} f_1(x) + \frac{B_2}{\lambda} f_2(x)

must hold provided  \lambda \neq 0 .  So we have excluded an eigen(patch(ix))value right from the start, which is interesting.

We can systematically write the derivatives of  a(x) , as we're going to need them if we follow the algorithm I delineated in one of my previous posts (NB: we assume a finite number of derivatives or periodic ones, or infinite derivatives such that the subsequent sums we'll write are convergent):

 \begin{array}{ccc} a(x) & = & \frac{B_1}{\lambda} f_1(x) + \frac{B_2}{\lambda} f_2(x) \\ a'(x) & = & \frac{B_1}{\lambda} f'_1(x) + \frac{B_2}{\lambda} f'_2(x) \\ a''(x) & = & \frac{B_1}{\lambda} f''_1(x) + \frac{B_2}{\lambda} f''_2(x) \\ \vdots & \vdots & \vdots \\ a^k(x) & = & \frac{B_1}{\lambda} f^k_1(x) + \frac{B_2}{\lambda} f^k_2(x) \\ \vdots & \vdots & \vdots \end{array}

provided, as before,  \lambda \neq 0 .  We want to calculate the constants  B_1, B_2 , to see if they are restricted in some way by a formula, and we do this by integrating by parts as we did in a previous post to obtain the cool "pasquali series." Thus, we have that if  B_1 = \int_0^1 a(1-y) g_1(y) dy , the tabular method gives:

 \begin{array}{ccccc} \vert & Derivatives & \vert & Integrals & \vert \\ \vert & a(1-y) & \vert & g_1(y) & \vert \\ \vert & -a'(1-y) & \vert & G_1^1(y) & \vert \\ \vert & a''(1-y) & \vert & G_1^2(y) & \vert \\ \vert & \vdots & \vert & \vdots & \vert \end{array}

and so,

 \begin{array}{ccc} B_1 & = & \int_0^1 a(1-y) g_1(y) dy \\ & = & a(1-y) G_1^1(y) \vert_0^1 + a'(1-y) G_1^2(y) \vert_0^1 + \ldots \\ & = & \sum_{i = 0}^\infty a^i(1-y) G_1^{i + 1} \vert_0^1 \end{array}

if we remember the alternating sign of the multiplications, and we are allowed some leeway in notation.  Ultimately, this last bit means:  \sum_{i=0}^\infty a^i(0) G_1^{i+1}(1) - \sum_{i=0}^\infty a^i(1) G_1^{i+1}(0) .

Since we have already explicitly written the derivatives of  a(x) , the  a^i(0), a^i(1) derivatives can be written as  \frac{B_1}{\lambda} f_1^i(0) + \frac{B_2}{\lambda} f_2^i(0) and  \frac{B_1}{\lambda} f_1^i(1) + \frac{B_2}{\lambda} f_2^i(1) respectively.

We have then:

 B_1 = \sum_{i=0}^\infty \left( \frac{B_1}{\lambda} f_1^i(0) + \frac{B_2}{\lambda} f_2^i(0) \right) G_1^{i+1}(1) - \sum_{i=0}^\infty \left( \frac{B_1}{\lambda} f_1^i(1) + \frac{B_2}{\lambda} f_2^i(1) \right) G_1^{i+1}(0)

Since we aim to solve for  B_1 , multiplying by  \lambda makes things easier, and also we must rearrange all elements with  B_1 in them, so we get:

 \lambda B_1 = B_1 \sum_{i=0}^\infty \left( f_1^i(0) G_1^{i+1}(1) - f_1^i(1) G_1^{i+1}(0) \right) + B_2 \sum_{i=0}^\infty \left( f_2^i(0) G_1^{i+1}(1) - f_2^i(1) G_1^{i+1}(0) \right)

Subtracting both sides the common term and factoring the constant we endeavor to solve for, we get:

 \left( \lambda - \sum_{i=0}^\infty \left( f_1^i(0) G_1^{i+1}(1) - f_1^i(1) G_1^{i+1}(0) \right) \right) B_1 = B_2 \sum_{i=0}^\infty \left(f_2^i(0) G_1^{i+1}(1) - f_2^i(1) G_1^{i+1}(0) \right)

or

 B_1 = \frac{B_2 \sum_{i=0}^\infty f_2^i(1-y) G_1^{i+1}(y) \vert_0^1}{\lambda - \sum_{i=0}^\infty f_1^i(1-y) G_1^{i+1}(y) \vert_0^1} = \frac{B_2 D}{\lambda - C}

A similar argument for  B_2 suggests

 B_2 = \frac{B_1 \sum_{i=0}^\infty f_1^i(1-y) G_2^{i+1}(y) \vert_0^1}{\lambda - \sum_{i=0}^\infty f_2^i(1-y) G_2^{i+1}(y) \vert_0^1} = \frac{B_1 E}{\lambda - F}

where the new constants introduced emphasize the expectation that the sums converge.  Plugging the one into the other we get:

 B_1 = \frac{\left( \frac{B_1 E}{\lambda - F} \right) D}{\lambda - C} = \frac{B_1 E D}{(\lambda - F) (\lambda - C)}

and now we seem to have additional restrictions on lambda:  \lambda \neq F and  \lambda \neq C .  Furthermore, the constant  B_1 drops out of the equation, suggesting these constants can be anything we can imagine (all of  \mathbb{R} without restriction), but then we have the constraint:

 (\lambda - F)(\lambda - C) = ED

which is extraordinarily similar to its analogue in finite matrix or linear algebra contexts.  Expanding suggests:

 \lambda^2 - (F + C) \lambda + (CF - ED) = 0

which we can solve by the quadratic equation of course, as:

 \lambda_{1,2} = \frac{(F + C) \pm \sqrt{(F-C)^2 + 4ED} }{2}

So not only is  \lambda not equal to a few values, it is incredibly restricted to two of them.
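This can be checked numerically on a concrete two-term kernel.  Note that C, D, E, F are, by construction, just the integrals  \int_0^1 f_j(1-y) g_i(y) dy (the sums above are their integration-by-parts expansions), so the two allowed values of  \lambda should match the nonzero eigenvalues of a discretized version of the operator  a(x) \mapsto \int_0^1 a(1-y) p(x,y) dy .  A sketch with made-up  f_1, f_2, g_1, g_2 (a smooth rank-2 kernel, not necessarily a probability patch):

import numpy as np
from scipy.integrate import quad

# Made-up two-term kernel p(x,y) = f1(x) g1(y) + f2(x) g2(y).
f1 = lambda x: 1.0 + x
f2 = lambda x: x**2
g1 = lambda y: np.cos(np.pi * y)
g2 = lambda y: y

# C, D, E, F as the integrals that the series in the post expand.
C = quad(lambda y: f1(1 - y) * g1(y), 0, 1)[0]
D = quad(lambda y: f2(1 - y) * g1(y), 0, 1)[0]
E = quad(lambda y: f1(1 - y) * g2(y), 0, 1)[0]
F = quad(lambda y: f2(1 - y) * g2(y), 0, 1)[0]

disc = np.sqrt((F - C) ** 2 + 4 * E * D)
lam_quadratic = np.sort([(F + C - disc) / 2, (F + C + disc) / 2])

# Discretize the operator a(x) -> int_0^1 a(1-y) p(x,y) dy on a midpoint grid;
# substituting u = 1 - y gives the matrix K[i, j] = p(x_i, 1 - x_j) / N.
N = 400
x = (np.arange(N) + 0.5) / N
p = lambda X, Y: f1(X) * g1(Y) + f2(X) * g2(Y)
K = p(x[:, None], 1 - x[None, :]) / N

eigs = np.linalg.eigvals(K)
nonzero = np.sort(np.real(eigs[np.abs(eigs) > 1e-6]))   # the rank-2 part
print(lam_quadratic)
print(nonzero)    # should agree with lam_quadratic up to quadrature error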

So here's a sort of conjecture, and a plan for the proof.  The number of allowable values of  \lambda is equal to the number of x terms that  a(x) (or  p(x,y) ) carries.  We have already shown the base case; we need only show the induction step, that if the claim works for  k terms it works for  k+1 .

On Patch(ix)es as Kernels of Integral Transforms (RWLA,MCT,GT,AM Part VII)

February 7th, 2011

[This post is ongoing, as I think of a few things I will write them down too]

So just a couple of days ago I was asked by a student to give a class on DEs using Laplace transforms, and it was in researching this that I realized that what I've been describing, converting one probability distribution on [0,1] into another, is in effect a transform (minus the transform pair, which it was unclear to me how to obtain, corresponding perhaps to inverting the patch(ix)).  The general form of integral transforms is, according to my book Advanced Engineering Mathematics, 2nd ed., by Michael Greenberg, p. 247:

 F(s) = \int_a^b f(t) K(t,s) dt , where  K(t,s) is called the kernel of the transform, and looks an awful lot like a function by patch(ix) "multiplication," which I described as:

 b(x) = \int_0^1 a(1-y) p(x,y) dy , you may recall.  In the former context  p(x,y) looks like a kernel, but here  a(1-y) is a function of  y rather than of  x , and I sum across  y .  To rewrite patch(ix)-multiplication as an integral transform, it would seem we need to rethink the patch's position on the xy plane, but that seems easy to do (and we do it in number 1 below!).

In this post I want to (eventually be able to):

1. Formally rewrite my function-by-patch(ix) multiplication as a "Pasquali" integral transform.

If we are to modify patch multiplication to match the integral transform guideline, simply think of  p(t,s) as oriented a bit differently, yielding the fact that  \int_0^1 p(t,s) ds = 1 for any choice of  t .  Then, for a probability distribution  b(t) in [0,1], the integral transform is  B(s) = \int_0^1 b(t) p(t,s) dt .  Now  p(t,s) is indeed then a kernel.

2. Extend a function-by-patch multiplication to probability distributions and patches on all  \mathbb{R} and  \mathbb{R}^2 , respectively.

When I began thinking about probability distributions, I restricted them to the interval [0,1] and a patch on  [0,1] \times [0,1] , to try to obtain a strict analogy of (continuous) matrices with discrete matrices.  I had been thinking for a while that this need not be the case, but when I glanced at the discussion of integral transforms on my Greenberg book, and particularly the one on the Laplace transform, I realized I could have done it right away.  Thus, we can redefine patch multiplication as

 B(s) = \int_{-\infty}^{\infty} b(t) p(t,s) dt

with

 \int_{-\infty}^{\infty} p(t,s) ds = 1

3. Explore the possibility of an inverse-patch via studying inverse-transforms.

3a. Write the patch-inverse-patch relation as a transform pair.

4. Take a hint from the Laplace and Fourier transforms to see what new insights can be obtained on patch(ix)es (or vice-versa).

Vice versa: well, one of the things we realize first and foremost is that integral transforms are really an extension of the concept of matrix multiplication: if we create a matrix "surface" and multiply it by a "function" vector we obtain another "function," and the kernel (truly a continuous matrix) is exactly our patch connecting the two.  Can we not think now of discrete matrices (finite, infinite) as "samplings" of such surfaces?  I think so.  We can also combine kernels with kernels (as I have done in previous posts) much as we can combine matrices with matrices.  I haven't really seen a discussion exploring this in books, which is perhaps a bit surprising.  At any rate, recasting this "combination" shouldn't be much of a problem, and the theorems I proved in previous posts should still hold, because the new notation represents rigid motions of the kernel, yielding new kernel spaces that are isomorphic to the original.
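To make number 1 above concrete, here is a small numerical sketch of the rewritten transform  B(s) = \int_0^1 b(t) p(t,s) dt , using a made-up kernel whose slices integrate to 1 in s; it also illustrates the "sampling" remark, since the kernel becomes an ordinary matrix once discretized:

import numpy as np

# Made-up kernel p(t,s) = 1 + (2t-1)(2s-1): nonnegative on the unit square,
# and every slice integrates to 1 in s, so it plays the role of a patch.
p = lambda t, s: 1.0 + (2 * t - 1) * (2 * s - 1)
b = lambda t: 2.0 * t            # a probability density on [0, 1]

# Midpoint grid: sampling the kernel on the grid gives an ordinary matrix.
N = 500
t = (np.arange(N) + 0.5) / N
s = t.copy()
P = p(t[:, None], s[None, :])    # P[i, j] ~ p(t_i, s_j)

# The transform B(s) = int_0^1 b(t) p(t,s) dt as a vector-matrix product.
B = (b(t) @ P) / N

print(B.sum() / N)    # ~1: the output is again a probability density
print(B[:3])          # samples of B(s) near s = 0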

On Patch Stationariness (RWLA,MCT,GT,AM Part VI)

January 16th, 2011

In my previous posts, I have been discussing how we can extend functional analysis a little bit by "inventing" continuous matrices (surfaces) which contain all the information we may want on how to transform, in a special case, probability distributions from one to another, and I have tried, by analogy, to extend Markov theory as well.  In this special case, I have been talking about how a surface, a "continuous collection of distributions," can reach steady-state: by combining these surfaces with themselves over and over; I even showed how to obtain a couple of steady-states empirically by calculating patch powers explicitly and then attempting to infer the time evolution, quite successfully in one case.  The usual Markov treatment suggests another way to obtain the steady-state (the limiting transition probability matrix): by finding a stationary distribution, so that multiplying the row vector  \mathbf{\widehat p} on the right by the transition probability matrix  P gives us  \mathbf{\widehat p} back.  Within the discrete transition probability matrix context, a vector  \mathbf{\widehat p} with this property is also a (left) eigenvector of  P with eigenvalue 1.  See for example Schaum's Probability, Random Variables, and Random Processes, p. 169, as well as Laurie Snell's chapter 11 on Markov chains in his online Probability book.  An important theorem says that the limiting transition probability matrix  \lim_{n \rightarrow \infty} P^n = \mathbf{\widehat P} is a matrix whose rows are identical and equal to the stationary distribution  \mathbf{\widehat p} .  To calculate the stationary distribution (and the limiting transition probability matrix) one would usually solve a system of equations.  For example, if:

 P = \left[ \begin{array}{cc} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{array} \right]

the stationary distribution

 \mathbf{\widehat p} P = \mathbf{\widehat p}

looks explicitly like:

 \left[ \begin{array}{cc} p_1 & p_2 \end{array} \right] \left[ \begin{array}{cc} \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} \end{array} \right] = \left[ \begin{array}{cc} p_1 & p_2 \end{array} \right]

in other words, the system:

 \frac{3}{4} p_1 + \frac{1}{2} p_2 = p_1

 \frac{1}{4} p_1 + \frac{1}{2} p_2 = p_2

each of which gives  p_1 = 2 p_2 and is solvable if we notice that  p_1 + p_2 = 1 , yielding  \mathbf{\widehat p} = \left[ \begin{array}{cc} \frac{2}{3} & \frac{1}{3} \end{array} \right] , and

 \mathbf{\widehat P} = \left[ \begin{array}{c} \mathbf{\widehat p} \\ \mathbf{\widehat p} \end{array} \right]
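For reference, the same discrete computation can be done mechanically; a quick sketch that finds the stationary distribution of the 2x2 example as the left eigenvector of eigenvalue 1, and checks the limiting matrix:

import numpy as np

P = np.array([[3/4, 1/4],
              [1/2, 1/2]])

# Solve p P = p with p summing to 1, i.e., find the eigenvector of P transposed
# that belongs to eigenvalue 1, then normalize it.
vals, vecs = np.linalg.eig(P.T)
p_hat = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
p_hat = p_hat / p_hat.sum()
print(p_hat)                           # [2/3, 1/3]

# The limiting matrix: rows identical and equal to the stationary distribution.
print(np.linalg.matrix_power(P, 50))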

In this post, I want to set up an algorithm to calculate the stationary surface (steady-state) of patches as I've defined them, following the above argument by analogy.  To do so, I revisit both of my previous examples, now calculating the steady state from this vantage point.  The fact that we can define such an algorithm in the first place has ginormous implications, in the sense that we can define stationary function distributions that would therefore seem to be eigen(patch(ix))vectors (corresponding to eigen(patch(ix))values) of surface distributions, and we can seemingly also solve a continuously infinite collection of independent equations, however strange this actually sounds.

Example 1, calculating the stationary patch  p_{\infty}(x,y) when  p_1(x,y) = 2 x - \frac{2 x y^3}{3} + x^2 y^3 .

I have already shown that  p_1(x,y) is indeed a patch because  \int_0^1 p_1(x,y) dx = 1 , for any choice of  y .

Suppose there exists a distribution defined as always on  x \in [0,1] , say  a(x) , so that

 a(x) \star p(x,y) = a(x) .  Explicitly,  \int_0^1 a(1 - y) \cdot \left( 2 x - \frac{2 x y^3}{3} + x^2 y^3 \right) dy = a(x) .  We can break up the integral as:

 2 x \int_0^1 a(1-y) dy - \frac{2 x}{3} \int_0^1 y^3 a(1-y) dy + x^2 \int_0^1 y^3 a(1-y) dy = a(x)

The first part, we've seen many times, adds up to one because  a(x) is a probability distribution, so let's rewrite the whole thing as:

 2 x - \left( \frac{2 x}{3} - x^2 \right) \int_0^1 y^3 a(1 - y) dy = a(x)

The integral is in reality just a constant, so we have that a(x) looks something like:

 2 x - \left( \frac{2 x}{3} - x^2 \right) B = a(x) if we let

 B = \int_0^1 y^3 a(1 - y) dy

Now this integral in  y , though it is a constant, is seemingly impossible to solve without more knowledge of  a(1-y) ; but the truth of the matter is we have everything we need, because we have a specification of  a(x) .  The crucial thing to notice is that the derivatives of  a(x) do not go on "eternally," because  a(x) is a polynomial of maximal degree 2.  Thus we can attempt integration by parts and see where this takes us.  The tabular method gives us an organized way to write this out:

 \begin{array}{ccccc} | & Derivatives & | & Integrals & | \\ | & a(1-y) & | & y^3 & | \\ | & -a'(1-y) & | & \frac{y^4}{4} & | \\ | & a''(1-y) & | & \frac{y^5}{20} & | \\ | & 0 & | & \frac{y^6}{120} & | \\ | & \vdots & | & \vdots & | \end{array}

and, remembering the alternating sign when we multiply, we get the series:

 \frac{a(1-y) y^4}{4} + \frac{a'(1-y) y^5}{20} + \frac{a''(1-y) y^6}{120} + 0 + \ldots \arrowvert_0^1

The zeroth substitution of the lower limit of the integral gives us all zeroes, but the one-substitution gives us the interesting "pasquali series":

 \frac{a(0) }{4} + \frac{a'(0)}{20} + \frac{a''(0)}{120} + 0 + \ldots

which asks of us to evaluate  a(x) and its derivatives (until just before it vanishes) at zero:

 \begin{array}{ccc} a(x) & = & 2 x - \left( \frac{2 x}{3} - x^2 \right) B \\ a'(x) & = & 2 - \frac{2 B}{3} + 2 B x \\ a''(x) & = & 2 B \end{array}

 \begin{array}{ccc} a(0) & = & 0 \\ a'(0) & = & 2 - \frac{2 B}{3} = \frac{6 - 2 B}{3}\\ a''(0) & = & 2 B \end{array}

All that's left now is to substitute back into the series:

 B = \frac{\frac{6 - 2 B}{3}}{20} + \frac{2 B}{120} = \frac{1}{10} - \frac{B}{60} solves to  B = \frac{6}{61} which is what we want (I tested the following code with Wolfram Alpha: "integrate [[2(1-y) - (.0983607)(((2(1-y))/3) - (1-y)^2)]*[2x - (2 x y^3)/3 + x^2 y^3]] dy from y = 0 to 1" and obtained the same numeric decimal value at the output).

We have therefore that  a(x) = 2 x - \left( \frac{2 x}{3} - x^2 \right) \frac{6}{61} is a stationary distribution, and the steady-state patch would seem to be  p_\infty(x,y) = a(x) .  I personally think this is very cool, because it validates several propositions: that we can find steady-state patches analytically (even when we may think we have a (continuously) infinite system to solve, it reduces essentially to a (countable!) series, estimable provided the "pasquali" series converges), by a means other than finding the patch powers, attempting to see a pattern, proving it perhaps by induction, and then taking the limit as the patch powers go to infinity, much as I did in my previous post.  It also validates the "crazy" idea that (certain?) special surfaces like patches have eigen(patch(ix))vectors, as arguing for the existence of  a(x) suggests, and which, in the discrete-matrix case, we would have to obtain by solving a finite system of equations (and which we did here, again, by solving the "pasquali" series).
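A quick numerical check of this stationary distribution (essentially the Wolfram Alpha test above, written as a sketch in code): evaluate  \int_0^1 a(1-y) p_1(x,y) dy at a few values of x and compare with  a(x) .

import numpy as np
from scipy.integrate import quad

B = 6 / 61
a = lambda x: 2 * x - (2 * x / 3 - x**2) * B
p1 = lambda x, y: 2 * x - 2 * x * y**3 / 3 + x**2 * y**3

for x0 in [0.1, 0.5, 0.9]:
    lhs = quad(lambda y: a(1 - y) * p1(x0, y), 0, 1)[0]
    print(x0, lhs, a(x0))    # the two values should agree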

Example 2.  In my second example, take the patch  1 - cos(2 \pi x) cos( 2 \pi y) . Again we are looking at a patch because  \int_0^1 p(x,y) dx = 1 for any value of  y .  To establish the steady-state surface, or  p_\infty(x,y) , we proceed as before and write

 \int_0^1 a(1-y) \left( 1 - cos(2 \pi x) cos(2 \pi y) \right)dy = a(x) , or, explicitly,

 \int_0^1 a(1-y) dy - cos(2 \pi x) \int_0^1 a(1-y) cos(2 \pi y) dy = a(x)

The first integral adds up to 1 by hypothesis, while the second one turns out to be zero after integrating by parts:

 \begin{array}{ccccc} | & Derivatives & | & Integrals & | \\ | & a(1-y) & | & cos(2 \pi y) & | \\ | & -a'(1-y) & | & \frac{sin(2 \pi y)}{2 \pi} & | \\ | & a''(1-y) & | & \frac{-cos(2 \pi y)}{4 \pi^2} & | \\ | & -a'''(1-y) & | & \frac{-sin(2 \pi y)}{8 \pi^3} & | \\ | & \vdots & | & \vdots & | \end{array}

so we have:

 \frac{a(1-y) sin(2 \pi y)}{2 \pi} - \frac{a'(1-y) cos(2 \pi y)}{4 \pi^2} - \frac{a''(1-y) sin (2 \pi y)}{8 \pi^3} + \ldots \vert_0^1 and the awesome-slash-interesting "pasquali series"

 -\frac{a'(0)}{4 \pi^2} + \frac{a'''(0)}{16 \pi^4} - \frac{a^v (0)}{64 \pi^6} + \ldots from which we must subtract, by the Fundamental Theorem of Calculus,

 -\frac{a'(1)}{4 \pi^2} + \frac{a'''(1)}{16 \pi^4} - \frac{a^v(1)}{64 \pi^6} + \ldots

So we are left with  B = \frac{a'(1)}{4 \pi^2} - \frac{a'(0)}{4 \pi^2} + \frac{a'''(0)}{16 \pi^4} - \frac{a'''(1)}{16 \pi^4} + \frac{a^v(1)}{64 \pi^6} - \frac{a^v(0)}{64 \pi^6} + \ldots and also with

 \begin{array}{ccc} a(x) & = & 1 - cos(2 \pi x) B \\ a'(x) & = & 2 \pi sin(2 \pi x) B \\ a''(x) & = & 4 \pi^2 cos(2 \pi x) B \\ a'''(x) & = & -8 \pi^3 sin(2 \pi x) B \\ \vdots & \vdots & \vdots \end{array}

To show this thoroughly, we should prove by induction that every odd derivative of  a(x) contains a  sin term (or we can argue by the periodicity of the derivatives, as we do), so that evaluating it at 0 and at 1 literally causes the term to vanish, leaving us with the fact that  B = 0 and that  a(x) = 1 .  Therefore, as before,  p_\infty(x, y) = a(x) = 1 , and this is consistent with my derivation in the previous post, too.
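And the same quick check for this second example, as a sketch: with  a(x) = 1 , the integral  \int_0^1 a(1-y) \left( 1 - cos(2 \pi x) cos(2 \pi y) \right) dy should return 1 for every x.

import numpy as np
from scipy.integrate import quad

p2 = lambda x, y: 1 - np.cos(2 * np.pi * x) * np.cos(2 * np.pi * y)
a = lambda x: 1.0

for x0 in [0.1, 0.25, 0.7]:
    lhs = quad(lambda y: a(1 - y) * p2(x0, y), 0, 1)[0]
    print(x0, lhs)    # ~1.0 at every x0, so a(x) = 1 is indeed stationary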