
## On Scaled Patches and Time-Warping

Suppose we have an idealized canal of width $1$, in which a fluid flow was established at some point in the remote past. Let us focus solely on the dynamics of the surface. Pick a spot along the canal, which we will call $t_0$. Next pick a spot $y_0 \in [0,1]$ across the width of the canal, which we will monitor. Pick a second spot $t_1$ farther down the canal, some distance from the first. Now suppose that a paper boat is released at some remote point up the canal. We care about the boat only if it passes through the point $(t_0, y_0)$ we picked, and in that case we write down its position across the width at $t_1$. Let us repeat this many times, with as many boats as needed, to obtain the distribution of the boat's position at $t_1$, and save it. Next let us repeat the experiment, this time focusing on $y_0^\prime$, save the resulting distribution at $t_1$, and so on, until we are comfortable that we have mapped all positions at $t_0$. Finally, let us put together (stack, respecting the natural order of $y$) all the distributions we obtained at $t_1$. We now have a discrete surface, which we can smooth to obtain a Pasquali patch.

Let us now look at position $t_2$, which is the same distance from $t_1$ as $t_1$ is from $t_0$. Having defined the dynamics of the system by a single Pasquali patch, call it $P$, the dynamics at $t_2$ can in theory be described by $P^2$. We can therefore ascertain the probability of finding the boat at any position across the width of the canal at $t_2$. In fact, at $t_n$, with $n$ very large, we can ascertain the probability that the boat will be at any position across the width; it should be close to $P^\infty$. More importantly, at a great distance from the origin (any distance, not necessarily a distance $n \cdot \Delta t_n$), the position probability is aptly described by $P^\infty$. See Figure 1 and Figure 2.

We can experimentally create a Pasquali patch and use it for prediction. We can perform the measurement at an arbitrary distance.

We can then use powers of the Pasquali patch for position prediction down the canal, at a distance $n \cdot \Delta t$ from the origin.
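A discrete stand-in can make the powers $P^n$ concrete. This is a sketch under stated assumptions, not the construction in the text: the patch $p(x, y) = 1 + (2x-1)(2y-1)$ is a hypothetical choice (nonnegative, and integrating to 1 over $x$ for every $y$), the width of the canal is split into 10 bins, and ordinary powers of the resulting row-stochastic matrix stand in for the continuous patch powers.

```python
# Hypothetical patch, not measured from any canal:
#   p(x, y) = 1 + (2x - 1)(2y - 1), which is >= 0 and integrates to 1 over x for each y.
def bin_mass(y, x_lo, x_hi):
    """Exact integral of p(x, y) over x in [x_lo, x_hi] (antiderivative x + (x^2 - x)(2y - 1))."""
    antiderivative = lambda x: x + (x * x - x) * (2 * y - 1)
    return antiderivative(x_hi) - antiderivative(x_lo)

def discretize(n_bins):
    """Row i: probability of landing in each width bin at the next station, from bin i's midpoint."""
    edges = [k / n_bins for k in range(n_bins + 1)]
    matrix = []
    for i in range(n_bins):
        y_mid = (edges[i] + edges[i + 1]) / 2
        matrix.append([bin_mass(y_mid, edges[j], edges[j + 1]) for j in range(n_bins)])
    return matrix

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_pow(m, power):
    result = [[1.0 if i == j else 0.0 for j in range(len(m))] for i in range(len(m))]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

P = discretize(10)
P2 = mat_pow(P, 2)    # from station t_0 to station t_2
P50 = mat_pow(P, 50)  # a stand-in for P^infinity: every row is (nearly) the same
```

For this particular patch every row of `P50` is essentially uniform (0.1 per bin): far down the canal, the boat's starting position across the width is forgotten. A different patch would have a different limiting row.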

This simple thought experiment raises several questions. What if the dynamics of the surface system are described by the Pasquali patch, but at points that are not a distance $\Delta t_n$ apart? In other words, what if the description is apt, but only at points that are not equally spaced along the canal? This curious situation suggests a time anomaly, and therefore a way in which we can measure time warps (by measuring the actual time differences between Pasquali patches). See Figure 3.

In this schematic a Pasquali patch and its powers do describe the system, but at non-equidistant points. The arrow of time is warped.

So far, we have looked only at the surface dynamics of the system. If we add a depth variable to the canal, we can in theory produce a Pasquali cube, which would describe the dynamics of any point on the $[0,1] \times [0,1]$ cross-section a discrete distance down the canal (and at any distance very far from our origin).

A third question arises when we consider the same canal, but with a width that grows by a scalar (linear) amount with distance from our chosen origin. There is no reason we cannot “renormalize” the width (set it equal to 1 again) at a point some set distance from our chosen origin, and proceed with our analysis as before. See Figure 4.

In this schematic the width of the canal grows linearly, but Pasquali patch dynamics are conserved, suitably scaled.

In a subsequent post I'd like to reconsider the slit experiment in this new light and see where it takes us.

## On Patchixes and Patches - or Pasqualian Matrixes - (RWLA,MCT,GT,AM Part II)

For the last few months I have been thinking about several very curious properties of patchixes and patches (mentioned here). In particular, having studied patch behavior in a "continuous Markov chain" context, it hit me while I was eating a bowl of cereal and watching the interesting movement of the leftover milk: a patch could certainly describe the milk's movement at particular time steps. I hope to elucidate this concept a little better here today. In particular, I think I have discovered a new way to describe waves and oscillations, or rather, "cumulative movement where the amount of liquid is constant" in general. I honestly believe that this new way and the old way converge in the limit (based on my studies, here and here, of discrete Markov chains in the limit of tiny time steps, so that time is continuous), although exactly how is a little unclear to me at the moment. I hope that this new way not only opens a new and rich field of research, but also clarifies studies in, for example, turbulence and, maybe one day, concepts related to Navier-Stokes. This last part may sound a little lofty and ambitious, but an approach in which, for example, vector fields of force or velocity must be described for every particle and position in space, with overcomplicated second- and third-order partial derivatives, is itself somewhat ambitious and lofty, and often prohibitive for finding exact solutions; perhaps studying particle accumulations through a method of approximation, rather than individual particles, is the answer.

I want to describe the roadmap that led me to the concept of a patchix (pasqualian matrix) in the first place; it was in the context of discrete Markov chains. Specifically, I thought about how, as we learn in linear algebra, a linear function or transformation $T(\textbf{v})$, where $\textbf{v}$ is an $n$-vector with $n$ (finite) entries, can be described succinctly by an $n \times n$ matrix. Such a matrix then converts $\textbf{v}$ into another $n$-dimensional vector, say $\textbf{w}$. This field is, of course, very well studied: in particular, invertible transformations are very useful, and many matrixes can be used to describe symmetries, so that they underlie Group Theory:

$\textbf{v} \xrightarrow{T} \textbf{w}$

Another useful transformation concept resides in $l_2$, the space of sequences whose sums of squares (the dot product of a sequence with itself) converge, which was used, for example, by Heisenberg in quantum mechanics, as I understand it. For example, the sequence $(x_1, x_2, \ldots)$ can be carried to another sequence $(y_1, y_2, \ldots)$ via $T$, as in $T(x_1, x_2, \ldots) = (y_1, y_2, \ldots)$. Key here is the fact that $x_1^2 + x_2^2 + \ldots$ converges, so that the norm $\sqrt{x_1^2 + x_2^2 + \ldots}$ is defined. The dot product $x_1 y_1 + x_2 y_2 + \ldots$ also converges (absolutely, in fact, by the Cauchy-Schwarz inequality). Importantly, this suggests that a transformation matrix with an infinite number of entries could be created for $T$ to facilitate computation, so that one sequence is taken into another in this space in an easy and convenient manner. I think Kolmogorov used this concept in extending Markov matrices as well, but I freely admit I am not very well versed in mathematical history. Help in this regard is much appreciated.

In a function space such as $C^{\infty}[0,1]$, the inner product of, say, $f(x)$ with $g(x)$ is also defined, as $\langle f(x), g(x) \rangle = \int_0^{1} f(x) \cdot g(x) dx$: the point-wise products of the two functions, summed continuously by the integral. The norm of $f(x)$ is then $\sqrt{\langle f(x), f(x) \rangle} = \sqrt{\int_0^{1} f(x)^2 dx}$. The problem, of course, is that there is no convenient "continuous matrix" that produces the transform $T(f(x)) = g(x)$, although transforms of a kind can be achieved through a discrete matrix if its entries act on, say, the coefficients of a (finite) polynomial. Thus, we can transform polynomials into other polynomials, but this is limiting in scope in many ways.

The idea is to transform one function into another by point-wise reassignment: continuously. Thus the concept of a patchix (pasqualian matrix) emerges; we need only mimic the mechanical motions we go through when calculating any other matrix product. Take a function $f(x)$ defined continuously on $[0,1]$, and send $x \rightsquigarrow 1-y$ so that $f(1-y)$ is now aligned with the y-axis. From another viewpoint, consider $f(1-y)$ as $f(1-y,t)$ so that, at any value of $t$, the cross-section looks like $f$. Define a patchix $p(x,y)$ on $[0,1] \times [0,1]$. Now "multiply" the function (itself a patchix from the second viewpoint) by the patchix as $\int_{0}^{1} f(1-y) \cdot p(x,y) dy = g(x)$ to obtain $g(x)$. The patchix has transformed $f(x) \rightsquigarrow g(x)$, as we wanted. I think there are profound implications to this simple observation; one may now consider, for example, inverse patchixes (or how to get $g(x)$ back to $f(x)$) and identity patchixes, and along with these one must consider what it may mean, as crazy as it sounds, to solve an infinite (dense) system of equations; powers of patchixes and what they represent; eigenpatchixvalues and eigenfunctionvectors; group theoretical concepts such as the symmetry groups the patchixes may give rise to, etc.
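The "multiplication" above is easy to approximate numerically. Here is a minimal sketch, assuming composite Simpson's rule is accurate enough for smooth integrands; the helper names `simpson` and `patchix_multiply` and the example patchix $p(x,y) = xy$ are hypothetical, chosen only because the answer for $f(x) = x$, namely $g(x) = x \int_0^1 (1-y)y \, dy = x/6$, is easy to check by hand.

```python
def simpson(func, a=0.0, b=1.0, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def patchix_multiply(f, p, n=200):
    """Return g with g(x) = integral over y in [0, 1] of f(1 - y) * p(x, y)."""
    def g(x):
        return simpson(lambda y: f(1 - y) * p(x, y), n=n)
    return g

# Hypothetical example: f(x) = x and patchix p(x, y) = x * y, so g(x) = x / 6.
g = patchix_multiply(lambda x: x, lambda x, y: x * y)
```

Simpson's rule is exact for polynomials of degree at most three, so here `g(x)` matches $x/6$ up to floating-point rounding; for less tame patchixes one would increase `n`.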

As extremely interesting as that is to me (and I plan to continue my own investigations), my previous post and informal paper considered the implications of multiplying functions by functions, functions by patchixes, and patchixes by patchixes. Specifically, I considered special kinds of patchixes $p(x,y)$ with the property that, for any specific value $y_c \in [0,1]$, $\int_0^1 p(x,y_c) dx = 1$. I dubbed such special patchixes patches (pasqualian special matrixes), and I went on to attempt an extension of a Markov matrix and its concept into a Continuous Markov Patch, along with the logical extension of the Chapman-Kolmogorov equation, by first defining (discrete) patch powers (this basically means "patchix multiplying" a patch by itself). The post can be found here.

So today I want to continue the characterization of patches that I started. First of all, emulating some properties of the Markov treatment, I want to show how we can multiply a probability distribution (function) "vector" by a patch to obtain another probability distribution function vector. This probability distribution is special, in the sense that it doesn't live on all of $\mathbb{R}$ but on $[0,1]$. A beta distribution, such as $B(2,2)$ with density $6(x)(1-x)$, is the type I'm specifically thinking about. So suppose we have a function $b(x)$, which we must first convert to $b(1-y)$ in preparation for multiplying by the patch. Suppose the patch is $p(x,y)$ with the property that, for any specific $y_c$, $\int_0^1 p(x,y_c) dx = 1$. Now the "patchix multiplication" is done by

$\int_0^1 b(1-y) \cdot p(x,y) dy$

and is a function of $x$. We can show that this is indeed a probability distribution function vector by integrating it over $x$ and checking that the result is one:

$\int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dy dx$

Provided the integrand is absolutely integrable, Fubini's theorem lets us exchange the order of integration, so we have:

$\int_0^1 \int_0^1 b(1-y) \cdot p(x,y) dx dy = \int_0^1 b(1-y) \int_0^1 p(x,y) dx dy$

For the inner integral, $p(x,y)$ integrates to 1 for any choice of $y$, so the whole inner integral is in effect the uniform density $u(y) = 1$ on $[0,1]$ (i.e., for any choice of $y \in [0,1]$, its value is 1). Thus we have, in effect,

$\int_0^1 b(1-y) \int_0^1 p(x,y) dx dy = \int_0^1 b(1-y) \cdot u(y) dy = \int_0^1 b(1-y) (1) dy$

and that last integral is 1 by hypothesis, since $b$ is a probability density on $[0,1]$ (the substitution $y \mapsto 1-y$ does not change its integral).
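A quick numerical check of this argument, as a sketch under assumptions: the patch $p(x,y) = 1 + (2x-1)(2y-1)$ and the starting density $30x(1-x)^4$ (the $B(2,5)$ density) are hypothetical choices, and Simpson's rule stands in for exact integration. Integrating in either order should give 1.

```python
def simpson(func, a=0.0, b=1.0, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def p(x, y):
    # Hypothetical patch: nonnegative on the unit square, integrates to 1 over x for every y.
    return 1 + (2 * x - 1) * (2 * y - 1)

def density(x):
    # B(2, 5) density on [0, 1]; any probability density on [0, 1] would do.
    return 30 * x * (1 - x) ** 4

# y first (this inner integral is the patchix product g(x)), then x ...
total_y_then_x = simpson(lambda x: simpson(lambda y: density(1 - y) * p(x, y)))
# ... and x first, as in the Fubini step above.
total_x_then_y = simpson(lambda y: density(1 - y) * simpson(lambda x: p(x, y)))
```

For these choices the patchix product works out by hand to $g(x) = 1 + \frac{3}{7}(2x - 1)$, which indeed integrates to 1.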

Here's a specific example: let's declare $b(x) = 6(x)(1-x)$ and $p(x,y) = x + \frac{1}{2}$. Of course, as required, $\int_0^1 p(x,y) dx = \int_0^1 \left(x + \frac{1}{2}\right) dx = (\frac{x^2}{2} + \frac{x}{2}) \vert^1_0 = 1$. So then $b(1-y) = 6(1-y)(y)$, and by "patchix multiplication"

$\int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 6(1-y)(y) \cdot \left(x + \frac{1}{2} \right) dy = x + \frac{1}{2}$

Thus, via this particular patch, the function $b(x) = 6(x)(1-x) \rightsquigarrow c(x) = x + \frac{1}{2}$, point by point.
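The same example can be reproduced numerically. A sketch, again assuming Simpson's rule (the helper names `simpson`, `b22`, and `c` are hypothetical); it also checks the patch property for one row and that the image $c(x)$ is again a density.

```python
def simpson(func, a=0.0, b=1.0, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def b22(x):
    return 6 * x * (1 - x)  # the B(2, 2) density used in the text

def p(x, y):
    return x + 0.5  # the patch from the text (it happens not to depend on y)

def c(x):
    """Patchix product: integral over y of b22(1 - y) * p(x, y)."""
    return simpson(lambda y: b22(1 - y) * p(x, y))

row_total = simpson(lambda x: p(x, 0.3))  # patch property: should be 1 for any y
image_total = simpson(c)                  # the image c(x) should again integrate to 1
```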

If $p(x,y)$ is really a function of $x$ alone, then $b(x) \rightsquigarrow p(x)$: any initial probability distribution becomes the patch's own distribution (viewed in one dimension rather than two). Here's why:

$\int_0^1 b(1-y) \cdot p(x,y) dy = \int_0^1 b(1-y) \cdot p(x) dy = p(x) \int_0^1 b(1-y) dy = p(x)$
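To see this numerically, the sketch below uses a hypothetical patch that ignores $y$ (every row is the $B(2,2)$ density) and three hypothetical starting densities, and checks that each one is carried to $p(x)$ itself; Simpson's rule is again an assumption standing in for exact integration.

```python
def simpson(func, a=0.0, b=1.0, n=400):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def p(x, y):
    # Hypothetical patch that ignores y: every row is the B(2, 2) density in x.
    return 6 * x * (1 - x)

starting_densities = {
    "uniform": lambda x: 1.0,
    "B(2,5)": lambda x: 30 * x * (1 - x) ** 4,
    "B(3,1)": lambda x: 3 * x ** 2,
}

def patchix_product(density, x):
    return simpson(lambda y: density(1 - y) * p(x, y))

# Every starting density should be carried to p(x) itself.
max_error = max(
    abs(patchix_product(density, k / 10) - p(k / 10, 0.0))
    for density in starting_densities.values()
    for k in range(11)
)
```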

Of course, I think patches that are in fact functions of both $x$ and $y$ are a lot more interesting, but constructing them poses a problem. For example, let's assume that we can split $p(x,y) = f(x) + g(y)$. Enforcing our requirement that $\int_0^1 p(x,y) dx = 1$ for any $y \in [0,1]$ means:

$\int_0^1 p(x,y) dx = \int_0^1 f(x) dx + g(y) \int_0^1 dx = \int_0^1 f(x) dx + g(y) = 1$

which certainly implies that $g(y) = 1 - \int_0^1 f(x) dx$ is a constant, since the integral is a constant. Thus it follows that $p(x,y) = p(x)$ is a function of $x$ alone. Next we may try $p(x,y) = f(x) \cdot g(y)$. Enforcing our requirement again,

$\int_0^1 p(x,y) dx = \int_0^1 f(x) \cdot g(y) dx = g(y) \int_0^1 f(x) dx = 1$

means that $g(y) = \frac{1}{\int_0^1 f(x) dx}$ is, again, a constant, and $p(x,y) = p(x)$ once more. Clearly the interaction between the functions must be more complex; let's try something like $p(x,y) = f_1(x) \cdot g_1(y) + f_2(x) \cdot g_2(y)$.

$\int_0^1 p(x,y) dx = g_1(y) \int_0^1 f_1(x) dx + g_2(y) \int_0^1 f_2(x) dx = 1$

so that choosing three of the functions determines the fourth, say

$g_2(y) = \frac{1-g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx}$ is in fact, a function of $y$.

Let's construct a patch in this manner and see its effect on a $B(2,2)$.  Let $f_1(x) = x^2$, and $g_1(y) = y^3$, and $f_2(x) = x$, so that

$g_2(y) = \frac{1 - g_1(y) \int_0^1 f_1(x) dx}{\int_0^1 f_2(x) dx} = \frac{1 - y^3 \int_0^1 x^2 dx}{\int_0^1 x dx} = \frac{1 - \frac{y^3}{3}}{\frac{1}{2}} = 2 - \frac{2y^3}{3}$

and $p(x,y) = x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right)$.
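The construction can be written as a small function that returns both $g_2$ and the patch. This is a sketch under assumptions (hypothetical helper names; Simpson's rule for the two integrals); with the choices made in the text it should reproduce $g_2(y) = 2 - \frac{2y^3}{3}$ and rows that integrate to 1.

```python
def simpson(func, a=0.0, b=1.0, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def make_patch(f1, g1, f2):
    """Return (p, g2) with p(x, y) = f1(x) g1(y) + f2(x) g2(y) and g2 from the formula above.

    Only the row condition (integral over x equals 1) is enforced; p >= 0, which each
    row needs in order to be a probability density, has to be checked separately.
    """
    i1 = simpson(f1)
    i2 = simpson(f2)
    if i2 == 0:
        raise ValueError("f2 must have a nonzero integral on [0, 1]")

    def g2(y):
        return (1 - g1(y) * i1) / i2

    def p(x, y):
        return f1(x) * g1(y) + f2(x) * g2(y)

    return p, g2

# The choices made in the text: f1(x) = x^2, g1(y) = y^3, f2(x) = x.
p, g2 = make_patch(lambda x: x ** 2, lambda y: y ** 3, lambda x: x)
```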

So now the "patchix product" is

$\int_0^1 6(1-y)(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{5} + \frac{28x}{15}$, which is a probability distribution on the interval $[0,1]$; as a check, integrating with respect to $x$ gives 1. Thus the probability distribution function $6(x)(1-x)$ is carried, point by point, as $6(x)(1-x) \rightsquigarrow \frac{x^2}{5} + \frac{28x}{15}$, which, quite frankly, is very amusing to me!
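Because everything here is a polynomial, the patchix product can be computed exactly with rational arithmetic. A sketch with hypothetical helper names, storing polynomials as coefficient dictionaries and using $\int_0^1 y^j \, dy = \frac{1}{j+1}$:

```python
from fractions import Fraction as Fr

# Polynomials in x and y are stored as {(i, j): coefficient} for the monomial x^i y^j.
def poly_mul(a, b):
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {k: v for k, v in out.items() if v != 0}

def integrate_over_y(a):
    """Exact integral over y in [0, 1]; returns {i: coefficient of x^i}."""
    out = {}
    for (i, j), c in a.items():
        out[i] = out.get(i, 0) + c * Fr(1, j + 1)  # integral of y^j on [0, 1] is 1/(j + 1)
    return {k: v for k, v in out.items() if v != 0}

# b(1 - y) = 6(1 - y)y = 6y - 6y^2, the flipped B(2, 2) density
b_flipped = {(0, 1): Fr(6), (0, 2): Fr(-6)}
# p(x, y) = x^2 y^3 + 2x - (2/3) x y^3, the patch built above
patch = {(2, 3): Fr(1), (1, 0): Fr(2), (1, 3): Fr(-2, 3)}

image = integrate_over_y(poly_mul(b_flipped, patch))  # {power of x: coefficient}
```

The result is `{2: Fraction(1, 5), 1: Fraction(28, 15)}`, i.e. $\frac{x^2}{5} + \frac{28x}{15}$.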

From an analytical point of view, it may be interesting or useful to see what happens to the uniform distribution on $[0,1]$ when it's "patchix multiplied" by the patch above.  We would have:

$\int_0^1 u(y) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \int_0^1 (1) \cdot \left(x^2 y^3 + x \left(2 - \frac{2y^3}{3} \right) \right) dy = \frac{x^2}{4} + \frac{11x}{6}$

so that $u(x) \rightsquigarrow \frac{x^2}{4} + \frac{11x}{6}$, which, as a check, again integrates to 1.
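Exact rational arithmetic gives the image of the uniform density directly. The sketch below (hypothetical helper names, polynomials stored as coefficient dictionaries) also checks that the image integrates to 1.

```python
from fractions import Fraction as Fr

# Polynomials in x and y are stored as {(i, j): coefficient} for the monomial x^i y^j.
def poly_mul(a, b):
    out = {}
    for (i1, j1), c1 in a.items():
        for (i2, j2), c2 in b.items():
            key = (i1 + i2, j1 + j2)
            out[key] = out.get(key, 0) + c1 * c2
    return {k: v for k, v in out.items() if v != 0}

def integrate_over_y(a):
    """Exact integral over y in [0, 1]; returns {i: coefficient of x^i}."""
    out = {}
    for (i, j), c in a.items():
        out[i] = out.get(i, 0) + c * Fr(1, j + 1)
    return {k: v for k, v in out.items() if v != 0}

uniform = {(0, 0): Fr(1)}                                   # u(1 - y) = 1
patch = {(2, 3): Fr(1), (1, 0): Fr(2), (1, 3): Fr(-2, 3)}   # x^2 y^3 + x (2 - 2y^3/3)

image = integrate_over_y(poly_mul(uniform, patch))
total = sum(c / (i + 1) for i, c in image.items())  # integral of the image over x
```

Here `image` is `{2: Fraction(1, 4), 1: Fraction(11, 6)}`, i.e. $\frac{x^2}{4} + \frac{11x}{6}$, and `total` is exactly 1.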

In my next post, I want to talk in more detail about "patchix multiplication," not of a probability distribution vector on $[0,1]$ by a patch, but of a patch by a patch, which is the basis of (self) patch powers. With this I want to begin a discussion of how we can map oscillations and movement in a different way, so that perhaps we can trace the movement of my cereal milk in time.


## On revolutionizing the whole of Linear Algebra, Markov Chain Theory, Group Theory... and all of Mathematics

I have been so remiss about writing here lately! I'm so sorry! There are several good reasons for this, believe me. Among them: (1) I have been enthralled with deciphering a two-hundred-year-old code, the Beale cipher part I, with no substantial results except several good ideas that I may yet pursue and expound on here soon. But this post is not intended to be about that. (2) My computer died around December and I got a new one, and I hadn't downloaded TeX; I used this as an excuse not to write proofs for chapter 1 of Munkres's Topology, and so I have added none. I slap myself for this (some of the problems are really boring, although, I have to admit, they are enlightening in some ways, which is part of the reason I began doing them in the first place). (3) The drudgery of day-to-day work, which is soooo utterly boring that it leaves me little time for "fun," or math stuff, and my attention being constantly hogged by every possible distraction, at home, etc. Anyway.

For a few months now I have been reading a lot on Markov chains because they have captured my fancy recently (they are so cool), and in fact they tie in to a couple of projects I've been working on or thinking about. I even wrote to J. Laurie Snell because a chapter in his book (the one on Markov chains) was excellent, with plenty of amazing exercises that I really enjoyed. In looking over that book and a Schaum's outline, a couple of questions came to mind and I just couldn't let go of these thoughts; I even sort of had to invent a concept that I want to describe here.

So in my interpretation of what a Markov chain is, and really with zero rigor: suppose you have $n < \infty$ states, and position yourself at state $i$. In the next time period, you are allowed to change state if you want, and you will jump to another state $j$ (possibly $i$ itself) with probability $p_{ij}$. These probabilities can be neatly summarized in a finite $n \times n$ matrix, with each row being the discrete distribution of your jumping probabilities, so that each row sums to 1. I think it was Kolmogorov who extended the idea to an infinite matrix, but we must be careful with the word "infinite," as the number of states is still countable, so they are summarized by a countably infinite $\infty \times \infty$ matrix. Keen as you are, dear reader, you know I'm setting up this question: what would an uncountably infinite transition probability matrix look like? No one seems to be thinking about this, or at least I couldn't find any literature on the subject. So here are my thoughts:

The easiest answer is to consider a state $i$ to be any of the real numbers in an interval, say $[0,1]$, and to imagine that such a state can change to any other state in that real interval (which is homeomorphic to any other closed interval $[a,b]$ with $a < b$, as we know from analysis). This is summarized by a continuous probability distribution on $[0,1]$, whose integral is again 1; a good candidate is a beta distribution, such as $6 x (1-x)$, with parameters (2,2). I think we can "collect" such probability distributions continuously on $[0,1] \times [0,1]$: a transition probability patch, as I've been calling it. It turns out to be important, if patches are going to be of any use in the theory, to be able to raise a patch to powers (akin to raising matrixes to powers), to multiply patches by (function) vectors and other tensors, and to extend the common matrix algebra to patches; but this is merely a mechanical problem, as I describe in the following pdf. (Comments are very welcome, preferably here on the site!)

CSCIMCR
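One way to "collect" densities continuously is to let the beta parameters vary with the starting state. The sketch below is hypothetical and is not the construction in the pdf: from state $y$ the chain jumps according to $\mathrm{Beta}(2 + 2y, 4 - 2y)$, so each row of the patch is a genuine density on $[0,1]$, and Simpson's rule checks the rows numerically.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x in [0, 1]."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)

def patch(x, y):
    # Hypothetical transition patch: from state y, jump to x ~ Beta(2 + 2y, 4 - 2y).
    # y = 0 gives Beta(2, 4), y = 1/2 gives Beta(3, 3), y = 1 gives Beta(4, 2).
    return beta_pdf(x, 2 + 2 * y, 4 - 2 * y)

def simpson(func, lo=0.0, hi=1.0, n=1000):
    """Composite Simpson's rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = func(lo) + func(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(lo + k * h)
    return total * h / 3

# Each row (fixed starting state y) should integrate to 1 over x.
row_integrals = {y: simpson(lambda x: patch(x, y)) for y in (0.0, 0.25, 0.5, 0.75, 1.0)}
```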

As you may be able to tell, I've managed to go quite a long way with this, so that patches conform reasonably well to a number of discrete Markov chain concepts, including a patch version of the Chapman-Kolmogorov equations. But having created patches, there is no reason why we cannot extend the idea to "patchixes," or continuous matrixes on $[0,1] \times [0,1]$, without the restriction that each row cross-section integrate to 1; in fact it seems possible to define identity patchixes (patches) and, in further work (hopefully I'll be involved in it), kernels, images, eigenvalues and eigenvectors of patchixes, commuting patchixes, commutator patchixes, and a slew of group theoretical concepts.
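For comparison, the discrete Chapman-Kolmogorov equations that the patch version extends, $P^{m+n} = P^m P^n$, can be checked exactly on a small matrix. The 3-state chain below is hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction as Fr

# A hypothetical 3-state chain; row i is the distribution of the next state given state i.
P = [
    [Fr(1, 2), Fr(1, 4), Fr(1, 4)],
    [Fr(1, 3), Fr(1, 3), Fr(1, 3)],
    [Fr(0), Fr(1, 2), Fr(1, 2)],
]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def mat_pow(m, power):
    n = len(m)
    result = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result

# Chapman-Kolmogorov: 5-step probabilities factor through the intermediate time 2.
five_steps = mat_pow(P, 5)
two_then_three = mat_mul(mat_pow(P, 2), mat_pow(P, 3))

# A distribution "vector" times the matrix is again a distribution.
start = [[Fr(1), Fr(0), Fr(0)]]          # certainly in state 0
after_five = mat_mul(start, five_steps)  # 1 x 3 row vector
```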

Having defined a patchix, if we think of the values of the patchix as the coefficients in front of, say, a polynomial, can we not imagine a new "polynomial" object that runs through exponents of say $x$ continuously between $[0,1]$ with each term being "added" to another? (Consider for example something like $\sum_i g(i)x^i, i \in [0,1]$?)  I think these are questions worth asking, even if they are a little bit crazy, and I do intend to explore them some, even if it later turns out it's a waste of time.


## On the National Mexican Lottery, II

The jackpot this week is up to 300 million (MXP)! Roughly another 190 million and this game turns into a fair or favorable game, how cool!

In my last post, I mentioned that some of my friends and students said that one could be really lucky and win the jackpot if one bought the first few tickets and won:

"Unless he got truly lucky and won the jackpot before he spent too much, as in buying the first few tickets, and then quit, they argued!"

And there wasn't really much I could say about it.  It is true after all that one could be so lucky, with minuscule probability.  However, "what is the average number of tickets you have to buy before you win the first time, if you buy the 6-choice/7-choice/etc. repeatedly?" was a question that people kept asking me with some insistence.  Although to me it seemed somewhat evident that you needed to buy about $\binom{56}{6}$ tickets on average for the 6-choice, my friends and students weren't convinced until I showed them the mathematics that supported this.

It is really not difficult to calculate this if one understands what expected value is. So let us consider the words

LLLLLLLLLW

LW

W

LLLLLLLLLLLLLLW,

etc. These are Bernoulli-trial strings (there really are only two possibilities, the binary win or lose), and they are allowable if we stop after the first win. For the 6-choice, a word of length $n$ has probability $(1 / \binom{56}{6}) \cdot (1 - (1 / \binom{56}{6}))^{n-1}$, because the win on the $n$th ticket is preceded by $n - 1$ losses.

The expected value is:

$(1 / \binom{56}{6}) \sum_{n=1}^{\infty} n \cdot (1 - (1 / \binom{56}{6}))^{n-1}$

One recognizes the sum as the series for $\frac{1}{(1-x)^2}$, the derivative of a convergent geometric series*, evaluated at $x = 1 - 1/\binom{56}{6}$, which lies inside the interval of convergence. Thus the above equals

$(1 / \binom{56}{6}) \cdot \frac{1}{(1 / \binom{56}{6})^2} = \binom{56}{6}$,

with the closed form substituted for the sum. Confirming my "far-out" claim (to me, really unsurprisingly), you have to wait an average of $\binom{56}{6}$, or about thirty-two million, tickets before you'll see the first win.
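A simulation can back up the closed form, as a sketch under assumptions: simulating the real game would take tens of millions of tickets per run, so the code uses a hypothetical 10-ball, choose-3 lottery, for which the same argument predicts an average of $\binom{10}{3} = 120$ tickets. The seed and trial count are arbitrary.

```python
import math
import random

def tickets_until_first_win(p_win, rng):
    """Buy tickets (Bernoulli trials) until the first win; return how many were bought."""
    tickets = 1
    while rng.random() >= p_win:
        tickets += 1
    return tickets

# Hypothetical scaled-down lottery: 10 balls, choose 3, so the win probability is 1 / C(10, 3).
p_small = 1 / math.comb(10, 3)
rng = random.Random(12345)  # fixed seed for reproducibility
trials = 10_000
mean_tickets = sum(tickets_until_first_win(p_small, rng) for _ in range(trials)) / trials

predicted_small = math.comb(10, 3)  # the derivation predicts 1 / p = 120
predicted_real = math.comb(56, 6)   # the same formula for the real 6-choice game
```

The simulated mean lands within a few tickets of 120; for the real game the same formula gives `math.comb(56, 6)`, which is 32,468,436.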

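This idea can be extended to the 7-, 8-, 9-, 10-choice and so on: in each case, the expected number of tickets until the first win is the reciprocal of the probability of winning with a single ticket.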

*NB

The series representation of the function

$\frac{1}{(1-x)^2} = \sum_{n=1}^{\infty} n \cdot x^{n-1}$, valid for $-1 < x < 1$ (radius of convergence 1).

This is obtained by differentiating, term by term, the series representation

$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$, valid for $-1 < x < 1$ (radius of convergence 1).
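The NB identity is easy to sanity-check numerically. A short sketch (the sample points and the 2000-term cutoff are arbitrary choices) compares partial sums of $\sum n x^{n-1}$ with $\frac{1}{(1-x)^2}$ inside the interval of convergence.

```python
def partial_sum(x, terms):
    """Sum of n * x**(n - 1) for n = 1, ..., terms."""
    return sum(n * x ** (n - 1) for n in range(1, terms + 1))

for x in (-0.5, 0.5, 0.9):
    print(x, partial_sum(x, 2000), 1 / (1 - x) ** 2)
```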


## On the National Mexican Lottery, I (a Cool Combinatorial Identity)

I sometimes help people to prepare for any of the plethora of standardized tests required for everything academic, and one of my students (now applying for a Fulbright), also a high school friend, while on the improbable topic (since it scarcely appears in the general exams) of probability, asked me about the likelihood of winning the most popular game of chance by the National Mexican Lottery (Julio is naturally curious, but these hard times surely provide an additional motivation!).  In the beginning he phrased it thusly: you have a piece of paper with fifty-six numbers and you can pick six of them.  Then, at the lottery, if the six balls match your chosen six, you win the grand prize.  Naturally, I replied almost without thinking, that the probability of winning was

$\frac{1}{\binom{56}{6}}$,

or one in about thirty-two million.

This follows from basic considerations in combinatorics and probability: suppose you can fill six slots. In the first slot, you can place any one of the 56 numbers. Having chosen one number for the first slot, there are 55 left that can go into the second slot, and so on... 54... 53, 52, and finally 51 remaining numbers can go in the last slot. By the counting principle, the number of arrangements is then simply $56 \cdot 55 \cdot ... \cdot 51 = \frac{56!}{50!}$. Since the ordering doesn't matter, and any particular choice of six numbers can be ordered in 6! different ways, the total number of arrangements must be divided by 6!. Thus, we obtain $\frac{56!}{50!6!}$ possible outcomes. This is really the definition of the combinatorial operation "choose": $\binom{n}{s} = \frac{n!}{s! (n-s)!}$.
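The counting argument translates directly into a few lines of Python; a sketch using only the standard library's `math.factorial` and `math.comb`.

```python
import math

ordered = 56 * 55 * 54 * 53 * 52 * 51     # fill six slots in order
unordered = ordered // math.factorial(6)  # each set of six appears in 6! orders

# The same count via the factorial formula for "choose".
by_formula = math.factorial(56) // (math.factorial(50) * math.factorial(6))
```

All three routes agree with `math.comb(56, 6)`, which is 32,468,436.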

My friend Julio then told me that there was the option of buying an extra choice. In other words, rather than the 6 basic choices available, you could purchase 7 (the lottery would still pick 6 of 56 balls). In fact, he said, you can purchase 8, 9, and even 10 choices. He wanted to know how much his probability of winning increased in each case. He was thinking of buying a ticket or several, especially because at the time (last week or so) the jackpot was 206 million pesos, or about 18 million dollars (now it's at 240 mill MXP, or about 21 mill USD, since there has been no winner), and he was interested in knowing what strategy maximized his probability of winning. I thought to myself: "Hmm... lotteries aren't usually fair games... but let's try it out and see if we can figure something out with the power of mathematics." Admittedly, at first I was stumped... I had to think this through a bit! It wasn't as obvious as you might think from the first derivation. In the end I reasoned as follows:

The fact that I can now purchase 7 options means, by the above reasoning, that I can choose seven numbers in $\binom{56}{7}$ ways, ordering not mattering. This is my sample space. Now, out of these, there are six balls set or marked to win, so if I assume I have actually chosen them and I will win, there is one remaining number that can be wrong, and it can be any of the other 50 numbers. There are $\binom{50}{1}$ ways to choose it. My probability of winning is therefore $\frac{\binom{50}{1}}{\binom{56}{7}}$.

For eight options, a similar argument means that my sample space has $\binom{56}{8}$ outcomes. With six balls set to win, I have two balls that can be wrong, so there are $\binom{50}{2}$ ways in which I can be right. The probability of winning in this case is $\binom{50}{2}/\binom{56}{8}$.

For nine options, the probability is $\binom{50}{3}/\binom{56}{9}$, and for ten it is $\binom{50}{4}/\binom{56}{10}$.

Generally speaking, and by the above argument, I thought that if I have a set of $n$ balls to choose from, with $s$ marked to win, and the possibility of choosing $r$, with $r \geq s$, the probability of winning must be:

$\binom{n-s}{r-s} / \binom{n}{r}$        (A)
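Formula (A) can be checked against brute-force enumeration, as a sketch: enumeration is only feasible for a small, hypothetical lottery (9 balls, 3 drawn), and by symmetry it suffices to fix one winning draw and count the tickets that contain it.

```python
import math
from fractions import Fraction
from itertools import combinations

def win_prob_A(n, s, r):
    """Formula (A): choose r of n balls; win if all s winning balls are among them."""
    return Fraction(math.comb(n - s, r - s), math.comb(n, r))

def win_prob_brute(n, s, r):
    """Enumerate every possible r-number ticket against one fixed winning draw."""
    winning = set(range(s))  # by symmetry, any fixed draw of s balls gives the same answer
    tickets = list(combinations(range(n), r))
    wins = sum(1 for ticket in tickets if winning <= set(ticket))
    return Fraction(wins, len(tickets))

agrees = all(win_prob_A(9, 3, r) == win_prob_brute(9, 3, r) for r in range(3, 10))

# The real game: 56 balls, 6 drawn, tickets with 6 to 10 numbers.
for r in range(6, 11):
    print(r, win_prob_A(56, 6, r))
```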

Happy at my apparent triumph, I never imagined that there was another way to argue the matter at all. In fact, I would soon discover there was a simpler way to think about it! I began concocting a table of probabilities to show Julio when my sister, as she does, meandered circuitously into my room, finally asking what I was doing.

A genius engineer, my sister tends to think of things in much simpler and more efficient ways than I ever could. I think it is a blessing to have someone like my sister. In so many ways she's very much like me, but also so dissimilar, and so she comes up with different considerations on a problem... which sometimes lead to tiny discoveries, like the identity I'll be proving in a bit. Working on the problem of winning probabilities, she argued the following: there are $\binom{56}{6}$ possible outcomes of choosing six balls from fifty-six. If I have seven slots to choose six correct balls, then there are $\binom{7}{6}$ ways I can do this. If there are eight slots, there are $\binom{8}{6}$ ways to do this, and so on.

In general, my sister's argument for the probability of win can be expressed as:

$\binom{r}{s} / \binom{n}{s}$       (B)
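The two counts can be compared exactly with `Fraction`; a short sketch checking that (A) and (B) agree for the 6- to 10-choice tickets.

```python
import math
from fractions import Fraction

def win_prob_A(n, s, r):
    """(A): C(n - s, r - s) / C(n, r)."""
    return Fraction(math.comb(n - s, r - s), math.comb(n, r))

def win_prob_B(n, s, r):
    """(B): C(r, s) / C(n, s)."""
    return Fraction(math.comb(r, s), math.comb(n, s))

same_answer = all(win_prob_A(56, 6, r) == win_prob_B(56, 6, r) for r in range(6, 11))
```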

The most interesting thing about this exchange is that it happened in a matter of minutes... such is opportunity.  Thrilling, amusing, and... evanescent.  Kind of like life.

The Pasquali-Pasquali combinatorial identity. I'm calling it this temporarily because, despite my efforts, I have been unable to find it explicitly in this form in either combinatorics or probability texts. Unfortunately, I don't have access to scholarly mathematics journals, so I would be very grateful if my readers could point me to a proper reference. In the meantime, it's nice that this identity has its motivation in a real-world combinatorial argument.

$\binom{n-s}{r-s} \cdot \binom{n}{s} = \binom{n}{r} \cdot \binom{r}{s}$, with $n \geq r \geq s \geq 0$.

Proof.

By the definition of the choice operation,

$\binom{n-s}{r-s} \cdot \binom{n}{s} = \frac{(n-s)!}{(r-s)!(n-r)!} \cdot \frac{n!}{s!(n-s)!} = \frac{n!}{(r-s)!(n-r)!s!}$

$= \frac{n! r!}{r!(n-r)!s!(r-s)!} = \frac{n!}{r!(n-r)!} \cdot \frac{r!}{s!(r-s)!} = \binom{n}{r} \cdot \binom{r}{s} \quad \Box$
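Alongside the algebraic proof, the identity can be checked exhaustively for small values; a sketch using `math.comb`, where the bound $n \le 30$ is an arbitrary choice.

```python
import math

def identity_holds(n, r, s):
    """C(n - s, r - s) C(n, s) == C(n, r) C(r, s)."""
    return math.comb(n - s, r - s) * math.comb(n, s) == math.comb(n, r) * math.comb(r, s)

# Every triple with n >= r >= s >= 0 and n <= 30.
cases = [(n, r, s) for n in range(31) for r in range(n + 1) for s in range(r + 1)]
all_hold = all(identity_holds(n, r, s) for n, r, s in cases)
```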

I am sure I have read this interesting datum about the National Mexican Lottery somewhere, but I cannot pinpoint exactly from what book: it is the number one (or two) source of income of the Mexican government.  Everybody plays this game of chance, in the hopes of becoming millionaires from one day to the next; such is the Mexican inclination, such is the Mexican character: sensation-craving.  A real desire to change an otherwise ordinary existence.

Although it is true, as will be seen in the file I'll be linking to, that the probability of winning increases 7 times if one purchases the 7th extra choice (as compared to the base case of 6 choices), 28 times if one purchases the 8th choice, 84 times for the 9th, 210 times for the 10th, and so on... these are merely enticing pieces of information (in fact the National Lottery publishes these probabilities in the hopes of persuading people to play). Firstly, winning is still extremely improbable. Secondly, what one must really focus on is expected profit. Negative expected profit means that, if you play repeatedly, on average you will lose money despite the occasional win (!). Such games are called "unfair," because money goes out of your pocket and into the coffers of the House. "Fair" games are those in which the expected profit is zero, and "favorable" games those in which the expected profit is positive, as you are in effect winning money on average.
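By formula (B), the improvement factor over the 6-choice ticket is simply $\binom{r}{6}$; a quick sketch reproduces the 7, 28, 84 and 210 quoted above.

```python
import math
from fractions import Fraction

N = math.comb(56, 6)
base = Fraction(1, N)  # win probability of the 6-choice ticket
improvement = {r: Fraction(math.comb(r, 6), N) / base for r in range(6, 11)}
```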

The point at which this particular game is "fair" is in reality determined by the cost of the ticket. The normal 6-choice ticket is 15 pesos, but, with the jackpot at 206 million pesos, the expected profit on the ticket is about -9 pesos:

$P = (1 / \binom{56}{6}) \cdot (206,000,000 - 15) + (1 - (1 / \binom{56}{6})) \cdot (-15)$
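The expected-profit expression simplifies: the $-15/\binom{56}{6}$ terms cancel, leaving $P = J/\binom{56}{6} - 15$. A sketch evaluating both forms for the 206 million peso jackpot.

```python
import math

N = math.comb(56, 6)     # number of equally likely 6-number outcomes
ticket = 15              # pesos, 6-choice ticket
jackpot = 206_000_000    # pesos, the jackpot at the time

p_win = 1 / N
expected_profit = p_win * (jackpot - ticket) + (1 - p_win) * (-ticket)
simplified = jackpot / N - ticket  # the same quantity after cancelling the -ticket/N terms
```

Both come to roughly -8.66 pesos per ticket, the "about -9 pesos" above.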

To be fair, the jackpot would have to be around 488 million pesos:

$0 = (1 / \binom{56}{6}) \cdot (J - 15) + (1 - (1 / \binom{56}{6})) \cdot (-15)$
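Solving that equation for $J$ is a one-liner, since it collapses to $0 = J/\binom{56}{6} - 15$, i.e. $J = 15\binom{56}{6}$. The sketch below computes this exactly: 487,026,540 pesos, which the text rounds to about 488 million.

```python
import math
from fractions import Fraction

N = math.comb(56, 6)
ticket = 15

# 0 = (J - ticket)/N - ticket (1 - 1/N) simplifies to J = ticket * N.
fair_jackpot = ticket * N

# Check in exact arithmetic that this jackpot gives zero expected profit.
p_win = Fraction(1, N)
profit_at_fair = p_win * (fair_jackpot - ticket) + (1 - p_win) * (-ticket)
```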

Above 488 million, it really is to your advantage to buy as many (different-combination) tickets as possible. My suggestion to Julio and some other very interested friends was to wait until the jackpot reached about 488 million pesos, so that they would have a real chance of earning some money.

Julio conceded, but my not very mathematically inclined friends and students complained.

First of all, of course it would never reach that much! It had never been as high as 206 mill before (let alone the 240 mill MXP it is now), and someone was SURE to buy tickets until they got it. This was a golden opportunity, you see. I replied that they were forgetting what expected profit meant: the individual in question would most probably spend more money than the jackpot before he won, and if he kept at it, provided the jackpot stayed the same on average, then no matter how many times he won he would still be losing money. Unless he got truly lucky and won the jackpot before he spent too much, as in buying the first few tickets, and then quit, they argued! I had to concede, but I also offered another solution: boycott this particular game of chance until the ticket price drops to one that would make the game fair. In this case, with the jackpot at 206 mill, the ticket price would have to go down to about 6 (and something) pesos (or about 7.4 pesos at this week's 240 mill).
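The fair ticket price for a given jackpot comes from the same equation with the price $T$ as the unknown: $0 = (J - T)/\binom{56}{6} - T(1 - 1/\binom{56}{6})$ collapses to $T = J/\binom{56}{6}$. A sketch for the two jackpots mentioned (the function name is hypothetical).

```python
import math

N = math.comb(56, 6)

def fair_ticket_price(jackpot):
    """Price T with zero expected profit: 0 = (J - T)/N - T (1 - 1/N), i.e. T = J / N."""
    return jackpot / N

price_206 = fair_ticket_price(206_000_000)  # about 6.34 pesos
price_240 = fair_ticket_price(240_000_000)  # about 7.39 pesos
```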

Everyone groaned!  I was in effect suggesting not to play the game, but that was not it at all.  I was merely suggesting playing the game when circumstances were more favorable, or to go into the game with the mindset of not winning.  "The game is a game of chance you are sure to lose.  Buy the ticket for social reasons, because your friends are doing it, because you like the thrill of choosing 6 numbers... or what have you.  But not because you think you're lucky and you are sure you will win, because in effect it's exactly the opposite.  The National Lottery is smart!"  My statement and somewhat lopsided grin allowed my friends a way out.

They played, and lost.

Would it have been better if they had purchased 7, 8, 9, or 10 options? The answer is no.  The price of buying an extra option is determined by the size of the fair jackpot, in this case about 488 mill (based on the 15 peso 6 choice ticket), and is proportional to the probability of winning:

Example, determining the 7 choice fair price at 488 mill (based on the 15 peso 6 choice ticket):

$0 = (\binom{7}{6} / \binom{56}{6}) \cdot (488,000,000 - T_7) + (1 - (\binom{7}{6} / \binom{56}{6})) \cdot (-T_7)$
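Solving for $T_7$ works the same way: the equation collapses to $T_7 = \binom{7}{6} \cdot 488{,}000{,}000 / \binom{56}{6}$, roughly 105 pesos. A sketch computing the analogous prices for 7 to 10 choices, assuming (as the text does) that each option is priced to be fair at the same 488 million peso jackpot; the function name is hypothetical.

```python
import math

N = math.comb(56, 6)
fair_jackpot = 488_000_000  # the text's rounded fair jackpot for the 15-peso ticket

def fair_price(r, jackpot=fair_jackpot):
    """r-choice price T_r solving 0 = q (J - T_r) + (1 - q)(-T_r), with q = C(r, 6) / N.

    The equation simplifies to T_r = q * J.
    """
    q = math.comb(r, 6) / N
    return q * jackpot

prices = {r: fair_price(r) for r in range(7, 11)}
```

Because $T_r$ is proportional to $\binom{r}{6}$, the 8-choice ticket costs 4 times the 7-choice one, and the 10-choice ticket 30 times.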

Buying 7, 8, 9 or 10 options has a respectively escalating negative expected profit at 206 mill (really, at anything less than 488 mill). It's like being penalized more and more harshly for wanting to better your chances of winning!

Example, determining the expected profit having solved for $T_7$ above, 7 choice ticket:

$P = (\binom{7}{6} / \binom{56}{6}) \cdot (206,000,000 - T_7) + (1 - (\binom{7}{6} / \binom{56}{6})) \cdot (-T_7)$
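With those prices, the expected profit at the actual jackpot simplifies to $\binom{r}{6}(206{,}000{,}000 - 488{,}000{,}000)/\binom{56}{6}$, so the loss scales with $\binom{r}{6}$. A sketch, under the same pricing assumption as above, showing the escalation for 7 to 10 choices.

```python
import math

N = math.comb(56, 6)
fair_jackpot = 488_000_000    # jackpot at which the option prices below are fair
actual_jackpot = 206_000_000

def expected_profit(r, jackpot):
    q = math.comb(r, 6) / N   # probability that an r-choice ticket wins
    price = q * fair_jackpot  # T_r from the fair-price equation
    return q * (jackpot - price) + (1 - q) * (-price)

profits = {r: expected_profit(r, actual_jackpot) for r in range(7, 11)}
```

The 7-choice ticket loses about 61 pesos on average, and each additional choice loses more.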

So you really are better off, and lose less money, if you play the 6-choice for fun (actually, as infrequently as possible). Since you are going to lose anyway, better to lose less money than more, is what I say!
