### Archive

Posts Tagged ‘statistics’

## On Scaled Patches and Time-Warping

Suppose we have an idealized canal of width $1$, on which a fluid flow has been established in some remote past. Let us focus solely on the dynamics of the surface.  Pick a spot along the canal which we will call $t_0$.  Next pick a spot $y_0 \in [0,1]$ along the width of the canal, which we will monitor.  Pick a second spot $t_1$ down the canal, some distance from the original spot we picked.  Now let us assume that, up the canal at some remote point, a paper boat has been released.  We will only care about the boat if the boat passes through $(t_0, y_0)$ which we have picked, and we will write down the resultant position at $t_1$.  Let us do this a number of times with any number of boats, and obtain a distribution of the position of the boat at $t_1$, saving it. Next let us repeat the experiment, this time focusing on $y_0^\prime$, save the resultant distribution at $t_1$, and so on and so forth, until we are comfortable having mapped the totality of positions at $t_0$.  Let us next put together (stack, respecting the natural order of $y$) all the distributions we obtained at $t_1$.  We now have a discrete surface which we can smooth to obtain a Pasquali patch.

Let us now look at position $t_2$, which is the same distance from $t_1$ as $t_1$ is from $t_0$.  Having defined the dynamics of the system (from a single Pasquali patch, call it $P$), the dynamics at $t_2$ can be theoretically described by $P^2$.  We can therefore ascertain the probability that we will find the boat at any position along the width of the canal at $t_2$.  In fact, at $t_n$, with $n$ very large, we can ascertain the probability that the boat will be at any position along the width: it should be close to $P^\infty$.  More importantly, a great distance from the origin (any distance, not necessarily a distance $n \cdot \Delta t_n$), the position probability is aptly described by $P^\infty$.  See Figure 1 and Figure 2.

We can experimentally create a Pasquali patch and use it for prediction. The measurement can be performed at an arbitrary distance.


We can then use Pasquali patch powers for position prediction down the canal, at a distance $n \cdot \Delta t_n$ from the origin.
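The patch-power idea can be sketched numerically. This is only an illustration under stated assumptions: a Pasquali patch is taken to be a non-negative surface $p(x, y)$ on $[0,1] \times [0,1]$ with $\int_0^1 p(x, y)\, dy = 1$ for every $x$, and $P^2(x, y) = \int_0^1 p(x, z)\, p(z, y)\, dz$. The surface `example_patch` and the helper `discretize_patch` are hypothetical choices made for the sketch, not measured canal data.

```python
import numpy as np


def discretize_patch(p, n_cells=200):
    """Approximate a patch p(x, y) by a row-stochastic matrix (midpoint rule)."""
    h = 1.0 / n_cells
    mid = (np.arange(n_cells) + 0.5) * h           # cell midpoints in [0, 1]
    x, y = np.meshgrid(mid, mid, indexing="ij")    # rows index x, columns index y
    m = p(x, y) * h                                # probability mass per cell
    m /= m.sum(axis=1, keepdims=True)              # guard against quadrature error
    return mid, m


def example_patch(x, y):
    """Hypothetical patch: non-negative, and integrates to 1 over y for every x."""
    return 1.0 + (x - 0.5) * (y - 0.5)


mid, step = discretize_patch(example_patch)

two_steps = step @ step                            # discrete analogue of P^2
many_steps = np.linalg.matrix_power(step, 50)      # stand-in for P^infinity

# Densities across the width for a boat that started near y_0 = 0.
start_row = 0
density_t1 = step[start_row] * len(mid)
density_t2 = two_steps[start_row] * len(mid)
density_far = many_steps[start_row] * len(mid)
```

For this particular patch the density flattens quickly: `density_t2` is already much closer to uniform than `density_t1`, and `density_far` no longer depends on the starting row, which is the discrete counterpart of the statement about $P^\infty$.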

This simple thought experiment brings about several questions. What if the dynamics of the surface system are described by the Pasquali patch, but at points which are not a distance $\Delta t_n$ apart? In other words, what if the description is apt but at points that are not linear in distance? This curious situation suggests a time anomaly, and therefore a manner in which we can measure time warps (by measuring the actual time differences between Pasquali patches). See Figure 3.

In this schematic a Pasquali patch and its powers do describe the system, but at non-equidistant points. The arrow of time is warped.

Next, note that so far we have looked only at the surface dynamics of the system. If we add a depth variable to the canal, we can in theory produce a Pasquali cube, which would measure the dynamics of any point on the $[0,1] \times [0,1]$ cross-section a discrete distance down the canal (and any distance very far from our origin).

A third question arises when we consider the same canal, but whose width opens by a scalar (linear) amount a distance from our chosen origin.  There is no reason we cannot “renormalize” the width (set it equal to 1 again) at a point some set distance from our chosen origin, and proceed with our analysis as before.  See Figure 4.

In this schematic the width of the canal grows linearly, but Pasquali patch dynamics are conserved, suitably scaled.

In a subsequent post I'd like to reconsider the slit experiment under this new light, and see where it takes us.

## On Utilities Consumption I: Water

So more or less since I came back to Mexico I've been saving my water bills, just for fun.  "Eventually," I thought to myself, "I may be able to do something with them."   Of course by that I meant that sixty years later I would have enough data points to discern some fascinating trends, including those detailing droughts and relative water abundance, and I would absolutely be able to use Viterbi algorithms or some Markovian insight to predict next year's weather, oh, and the price of potable water.  I'm the impatient kind, and have accumulated only about four (sometimes five) years' worth of data.  Here it is; I have plotted my cubic-meter consumption on a month-to-month basis (the bills come in monthly).  I've assumed that I use the water that I need.  Also that the trends represent a fairly typical (there may be some argument here, I predict) three-person household consumption (I don't live alone).  And yes, admittedly, sometimes there are sisterly visits in December and there's that factor to take into account, but oh well!  Let's just hypothesize my water consumption is fairly typical for a 3 or 4 person household in this geographical region in Mexico, shall we?

I've plotted the average consumption, and the two-sigma (roughly 95%) bounds I calculated using the sample standard deviation, that is, the standard deviation with Bessel's correction: $\sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(x_i - \overline{x} \right)^2 }$.  The correction gives an unbiased estimate of the variance, even though the resulting standard deviation is still biased slightly low... not that it matters much anyway.
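As a small sketch of the calculation for a single calendar month: the `readings` below are hypothetical numbers invented for the example, not my actual bills, and `sample_std` is just an illustrative helper name.

```python
import math
import statistics

# Hypothetical readings (cubic meters) for one calendar month across five years.
readings = [14, 16, 15, 13, 17]


def sample_std(xs):
    """Standard deviation with Bessel's correction (divide by N - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))


mean = statistics.mean(readings)
sd = sample_std(readings)

# Two-sigma bounds around the monthly average.
lower = mean - 2 * sd
upper = mean + 2 * sd

# The standard library's stdev() applies the same N - 1 correction,
# while pstdev() divides by N and therefore comes out smaller.
sd_library = statistics.stdev(readings)
sd_population = statistics.pstdev(readings)
```

With only four or five points per month the difference between dividing by $N$ and by $N-1$ is noticeable, which is why the corrected version is the one plotted.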

An interesting detail is that September seems to me the most precise.  My friend Ben objects to my use of the word "precise" in such a way.  He's a physicist. He pointed out to me that what I meant was variance.  He kept going on about how precision applies to instrumentation, and how the readings would have a precision estimate that would be reported alongside.  I countered that it probably did have an implied precision, because the measure is (surely) to significant figures, and so, a reading of 34, really meant anywhere between 33 and 35, as per the usual rules. Also, I told him his interpretation of precision did not matter in my particular case.  A not so interesting debate ensued, culminating in our agreeing that precision is dependent on context. My argument was that if my monthly data points represent estimates of an "actual" (fictitious) consumption for that month, then indeed my spread indicates precision (where accuracy would indicate how close I came to the "actual" consumption). Anyway.  The debate was illuminating in some ways, but banal in many others.

The wide fluctuation in March-April was probably due to a small leak I had in 2008 that I corrected immediately, although I did interpolate some values that were missing for 2007 and 2008 right around that time too. I'm estimating my consumption for October to be about 15 cubic meters, the (unbiased) sample (arithmetic) average for that month.

## On Auctions, Part III and On Pricing, Part VI - (On Diversity)

Before going into details respecting the weighted function $\mu_*$ and the variance $\sigma^2_*$, I was thinking of going a little bit into the mix of individuals at an auction or several auctions.  I've been loosely categorizing the types as clueless or "laymen," "in-betweens," and "experts".  The number of subdivisions is up to anybody, but three is a practical and manageable number to me.  Let's suppose I have access to the data as before for P1...Pn individuals at auction A1: $\{\mu_i, \sigma^2_i\}$.  Let's suppose then that you can be called an "expert" at any auction if you believe your quote $\mu_i$ is correct to within plus or minus 10% (an error in $(0, 10]$%), an "in-between" if you think it is correct to within $(10, 50]$% above or below, and a "layman" if you think your quote is only correct to within $(50, 100]$% above or below.  These percentages can be translated back to appropriate bounds on the variances, and so we can place each individual's variance in one of the three categories.  If we count up the proportion of variances lying in each "box" ($p_k, k=1...3; \sum p_k=1$) we can then borrow from Information Theory the measure of surprise or entropy as an indicator of diversity!  This has already been done in Biological Information Theory to see how diverse in species an area is (link or reference forthcoming):

$H = -\sum_{k=1}^3 p_k \log_2 p_k$

where conventionally $p_k \log_2 p_k = 0$ whenever $p_k = 0$.  $H$ is maximal if the proportions across each box are equal: $p_1 = p_2 = p_3 = \frac{1}{3}$ and zero or close to zero whenever the proportion of one box is 1 or close to 1.
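A minimal sketch of the bookkeeping, assuming each bidder's self-assessed uncertainty is already expressed as a fraction of the quote (0.05 meaning plus or minus 5%). The list `auction_a1` is hypothetical data invented for the example, and the function names are illustrative only.

```python
import math
from collections import Counter


def categorize(rel_uncertainty):
    """Place a bidder in one of the three boxes by relative uncertainty."""
    if not 0 < rel_uncertainty <= 1.0:
        raise ValueError("relative uncertainty must lie in (0, 1]")
    if rel_uncertainty <= 0.10:
        return "expert"
    if rel_uncertainty <= 0.50:
        return "in-between"
    return "layman"


def diversity_entropy(proportions):
    """Shannon entropy in bits, with 0 * log2(0) taken to be 0."""
    if not math.isclose(sum(proportions), 1.0):
        raise ValueError("proportions must sum to 1")
    return sum(-p * math.log2(p) for p in proportions if p > 0)


# Hypothetical relative uncertainties for eight bidders at one auction.
auction_a1 = [0.05, 0.08, 0.30, 0.45, 0.20, 0.75, 0.90, 0.60]

counts = Counter(categorize(u) for u in auction_a1)
boxes = ["expert", "in-between", "layman"]
proportions = [counts[b] / len(auction_a1) for b in boxes]

h = diversity_entropy(proportions)
h_max = math.log2(len(boxes))   # reached when all three boxes are equally full
```

Dividing `h` by `h_max` gives a number between 0 and 1, which may be handier when comparing auctions that were binned into different numbers of boxes.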

Therefore, we can compare several different auctions' diversity or population mix and determine whether it's attended by mostly experts, in-betweens, or laymen (by proportion) or whether there is a happy jumble of all (how "ordered" the mix is).

In fact, why the method of measuring diversity (by measuring information-theoretical entropy) is not more greatly exploited by Mankinde is really a bugging question in my mind: it can be applied everywhere!  For example, I was just at the mall and thought "Hmm, this winter season seems to be only purples and blues.  I wonder what the most diverse season in terms of color in men's shirts is... probably summer?" I also thought at the time of my visiting the mall "This measure of diversity could really be applied to scale countries in terms of mix of ethnicity or nationality - is the US most diverse because it is (purportedly) a melting pot?  Has it gotten less diverse post-9/11 with all the added restrictions on foreign nationals?" or something similar for a social-networking site or a school/university (I do wonder about my alma mater - e.g.?), or if I'm a company producing a number of different SKUs, "did I produce many different SKUs or more of a specific type?" or for a chain of restaurants one could determine whether the population is within an age range at a location compared to another (others) more uniform, or does the population at Chain 1 request more of a particular kind of menu item or it's more or less evenly distributed amongst all menu items, or does it peak by days or months of the year, etc.  These questions answered numerically can then help decide whether "I should buy more ingredients of this specific type, during such-and-such period."


## Some Thoughts On Pricing, Part IV

On the other hand, if instead of making the variance or standard deviation tight we allow it to be relaxed, the same Gaussian distribution becomes more and more like a uniform distribution over the entire real domain.  If there is one other company competing against me, and it's a real coin toss regarding how F1 will price, it pays for me to price above the mean given my belief of the mean and standard deviation.  Recall that for a uniform distribution expected profit looks like an inverted parabola, and great uncertainty around ten pesos will make my expected profit curve look much the same.

Something similar happens when there are two or three companies competing against me, except the maximum of this "inverted parabola" (it actually isn't parabolic, but sort of) is closer to 10 (the mean) and my maximum expected profit is lower.  When there is a lot of competition pricing all over the place it really becomes a coin toss as to where I should price, as maximum expected profit will be more or less the same regardless (in fact, close to zero).

The graphs (soon-coming) perhaps will make this more obvious.


## Some Thoughts On Pricing, Part III

What happens when the competition decides to become organized and the price of a product is exactly 10 pesos?  If I am bound by the price too due to politics (perhaps the government itself sets the price because it has such powers) or some other factor, and I have to price myself at 10 pesos, then the rational consumer is faced with identical products at identical prices to choose from.  Perhaps he will then choose at random.  If F1 is my only competitor, he will choose me at the shops half the time.  If instead there are $n$ competing companies F1...Fn, I will be chosen perhaps $\frac{1}{n+1}$ of the time.  My expected profit in such a situation is easily calculated as:

$E_p(10) = \frac{1}{n+1} \cdot (9) + \frac{n}{n+1} \cdot (-1) = \frac{9 - n}{n+1} = \text{constant for a given } n$

Having established such, let us assume that I am not bound by any politics.  Then it is only obvious that I would want to price at 9.99, since this virtually guarantees that the rational consumer choosing as by the C1 axiom will pick me over any other product: I am guaranteed in effect selling 100 percent of my product, and furthermore at maximal expected profit, since selling at anything less than 9.99 would mean obtaining less for product I am sure to give away.  In terms of expected profit, we can make

$E_p(x) = x - 1$

if I price at less than ten ($x < 10$), or

$E_p(x) = -1$

if I price at more than ten ($x > 10$), for any amount of competition against me (does not depend on $n$, since everyone is pricing at 10).  Maximal expected profit is at a price of 9.99 in these cases.
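The three cases can be put together in one small function. This is a sketch under the assumptions used in this post: every competitor prices at exactly 10 pesos, my unit cost is 1 peso, and the consumer buys the cheapest product, choosing at random on ties. The function and parameter names are illustrative.

```python
import math


def expected_profit_fixed(x, n, market_price=10.0, unit_cost=1.0):
    """Expected profit per unit when all n competitors price at market_price."""
    if x < market_price:
        prob_sale = 1.0              # I undercut everyone, so I always sell
    elif x == market_price:
        prob_sale = 1.0 / (n + 1)    # consumer picks one of n + 1 at random
    else:
        prob_sale = 0.0              # I am the most expensive, so I never sell
    # Sell: earn x minus cost.  Fail to sell: lose the cost of the unit.
    return prob_sale * (x - unit_cost) + (1.0 - prob_sale) * (-unit_cost)


profit_undercut = expected_profit_fixed(9.99, n=40)   # does not depend on n
profit_tied_one = expected_profit_fixed(10.0, n=1)    # one competitor, F1
profit_tied_nine = expected_profit_fixed(10.0, n=9)   # nine competitors
profit_above = expected_profit_fixed(10.01, n=40)     # priced out entirely
```

Only the tie at exactly 10 pesos depends on $n$; with nine competitors the expected profit there is already down to about zero, while undercutting by a centavo keeps it near 8.99.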

I like to consider this particular example the limiting case in which the Gaussian distribution is tightly wound around 10 pesos. The tighter the certainty around ten pesos, the closer I am to the above expected-profit function (except for the value at exactly 10 pesos, which we reasoned in a different way).  This is because the probability of selling my product at a price below 10 pesos is essentially 1, whereas the probability of selling at a price above 10 pesos is essentially zero.  Perhaps this can be more easily seen upon inspection of the following graphs.

First, I have graphed what happens as the certainty of F1's pricing becomes tighter and tighter (one other competing firm).

In this next graph, I have shown what happens as the certainty of F1...F5's pricing becomes tighter.

For 40 competing firms, this is what happens.

All these graphs show that indeed the limiting distribution will not depend on the number of companies competing against me as they converge or stack upon a single price quote.  The more in agreement companies are about what the price should be, the less their ability to sell (oppositely for me) and the more of the pie I can take, and my expected profit per product will balloon to an absolute maximum of 8.99 pesos (9.99 revenue - 1 cost).
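The limiting behavior can also be checked numerically. The sketch below assumes that each of the $n$ competitors prices independently from the same Gaussian with mean 10, that my unit cost is 1 peso, and that I sell only when every competitor prices above me; that model and the function names are my assumptions for the illustration, not a reproduction of the code behind the graphs.

```python
from statistics import NormalDist


def expected_profit_gaussian(x, n, mu=10.0, sigma=1.0, unit_cost=1.0):
    """Expected profit per unit against n independent Gaussian-priced rivals."""
    prob_one_above = 1.0 - NormalDist(mu, sigma).cdf(x)  # one rival prices above x
    prob_sale = prob_one_above ** n                      # all n rivals price above x
    return prob_sale * (x - unit_cost) - (1.0 - prob_sale) * unit_cost


def best_price(n, sigma, grid):
    """Price on the grid with the highest expected profit."""
    return max(grid, key=lambda x: expected_profit_gaussian(x, n, sigma=sigma))


# Candidate prices from 5.00 to 15.00 pesos in one-centavo steps.
grid = [5.0 + 0.01 * k for k in range(1001)]

tight_sigma = 0.001   # near-total agreement on the 10-peso price
best_by_n = {n: best_price(n, tight_sigma, grid) for n in (1, 5, 40)}
profit_by_n = {
    n: expected_profit_gaussian(best_by_n[n], n, sigma=tight_sigma)
    for n in (1, 5, 40)
}
```

With `tight_sigma` this small, the best grid price comes out at 9.99 and the expected profit at about 8.99 for 1, 5, and 40 competitors alike; rerunning with a larger `sigma` shows how the dependence on $n$ reappears.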

For a company thinking this way we have differences of pennies across comparable products, much perhaps as Elisa remarks in her comments about pricing in Switzerland.
