
Archive for February, 2011

## On Generalizing Voting Systems and Methods

A while ago my mathematician friend Jim and I got into an email volley trying to decide which was the best voting method, and we touched upon many interesting ideas, including Arrow's theorem.  I never knew the theory of voting was so developed, much less "complicated," but it is.  Many people have come up with different ways to rank candidates as "fairly" as possible, for example, using the Schulze method.

Just recently I figured out on my own that a few voting methods can be summarized by a Markov chain, and it would also seem I have invented a ranking method that uses a dynamical equilibrium to rank candidates: in other words, the social "consensus" is a steady-state vector extracted from a stationary Markov matrix.

The idea goes like this (really more of an example):

Imagine for a second that there are $n$ voters and $p$ candidates.  These $n$ voters have a definite amount of "voting stock."  It doesn't matter how much each actually owns; what matters is that at each "round" they will always trade a fixed proportion of it and give it to the candidates (who, in this model, either abstain from voting because they are being elected, or write themselves in as one of the voters).  If you are keen, you'll notice I am setting this up so that what we care about is the steady-state Markov matrix (and so the actual amounts of voting stock have no effect).  Now at the first "round" the voters vote only for the candidates, giving each the proportion of stock they allotted to it, and the candidates on their part "return" the stock they currently hold equally among all the voters, abstaining from giving stock to any of the candidates (including themselves).  What results is an $(n + p) \times (n + p)$ matrix whose entries are zero for "states" that relate voters to voters and candidates to candidates.  An interesting oscillating Markov matrix results: high even powers show the proportion of stock that the candidates hold (steadily) when the stock is handed to them, while high odd powers show what the voters hold steadily.  The ranking, the result of the voting, should be read off any candidate row of the steady-state even power.

Now the proportion handed to a candidate can be fractional, or 1 (100 percent of the vote goes to that candidate).  If all voters vote with 1, a majority ranking results.  Fractional votes, on the other hand, accumulate in an interesting manner, as my friend Ben points out: at first I thought they accumulated as per the usual geometric/convergent Markov rules, but Ben noticed that squaring the matrix actually stabilizes it (for all subsequent even powers of the matrix).  This wasn't immediately obvious to me, and the reason why was obscure.

More specifically, suppose there are 3 voters, each giving 100 percent of their voting stock (their one vote) to one of 2 candidates.  Let's say voter 1 votes for candidate A, voter 2 for candidate A, and voter 3 for candidate B.  The vector $\left[ \begin{array}{ccccc} V_1 & V_2 & V_3 & A & B \end{array} \right]$ and its transpose label the columns and rows of the matrix:

$M = \left[ \begin{array}{ccccc} 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \end{array} \right]$

Note how the candidate rows return an "even" fraction of the vote back to the voters.  The idea again is that we are looking for the steady-state of a dynamical system, or, how the voting stock will be traded after a very long while: the ranking will be the fraction each candidate holds at equilibrium.

I used Octave (as you probably already know, I'm a big fan) to calculate matrix powers, and obtained, in fractions of a second, that

$M^{100} = \left( M^2 \right) = \left[ \begin{array}{ccccc} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 \\ 0 & 0 & 0 & \frac{2}{3} & \frac{1}{3} \\ 0 & 0 & 0 & \frac{2}{3} & \frac{1}{3} \end{array} \right]$

Note this is an even-power matrix, and what we are interested in is one of the candidate rows (which are identical).  Candidate A receives $\frac{2}{3}$ of the vote and Candidate B receives $\frac{1}{3}$, as a simple count of the votes confirms.
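The calculation above was done in Octave; here is a minimal sketch of the same thing in plain Python (no external libraries), so you can reproduce the even-power candidate row.  The state ordering $[V_1, V_2, V_3, A, B]$ is as in the matrix above.

```python
# The 3-voter, 2-candidate example: everyone trades 100 percent of their
# voting stock each round.  State order: [V1, V2, V3, A, B].

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

M = [
    [0, 0, 0, 1, 0],        # voter 1 gives everything to A
    [0, 0, 0, 1, 0],        # voter 2 gives everything to A
    [0, 0, 0, 0, 1],        # voter 3 gives everything to B
    [1/3, 1/3, 1/3, 0, 0],  # A returns its stock equally to the voters
    [1/3, 1/3, 1/3, 0, 0],  # B returns its stock equally to the voters
]

M2 = matmul(M, M)
# Any candidate row of the even-power matrix carries the ranking:
print(M2[3])  # ≈ [0, 0, 0, 2/3, 1/3]: A holds 2/3, B holds 1/3
```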

Now, one issue with the Schulze method is that it asks voters for an actual ranking as input (rather than individual votes), so I've reinterpreted this to mean that, if you are a voter, you divide your voting stock and trade in, at each round, a different fixed percentage to each candidate. For example, say there are candidates A, B, C, D, and E, and the preference of a particular voter is C>A>B>E>D.  One issue is that we have no idea how much more one candidate is preferred over another (a question that belongs to the tricky subject of range voting, though I think Markov thinking summarizes most methods rather nicely), so what I've done is say: OK, the voter has expressed his preference as CABED; if we divide up his 100 percent vote proportionately among the five candidates, perhaps the voter would agree to assign the vector $\left[ \begin{array}{ccccc} \frac{4}{15} & \frac{3}{15} & \frac{5}{15} & \frac{1}{15} & \frac{2}{15} \end{array} \right]$ to $\left[ \begin{array}{ccccc} A & B & C & D & E \end{array} \right]$: his highest vote goes to C, and his lowest to D.
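The conversion from a stated preference order to the fractional vote vector can be sketched as follows.  The helper name `rank_to_weights` is my own; the rule is just the Borda-like one described above: with $p$ candidates, the top choice gets $p$ out of $\frac{p(p+1)}{2}$ shares and the bottom choice gets 1.

```python
# Turn a best-to-worst preference order (e.g. "CABED") into the fractional
# vote vector listed in candidate order (e.g. "ABCDE").

def rank_to_weights(preference, candidates):
    """preference: best-to-worst list; returns weights in `candidates` order."""
    p = len(candidates)
    total = p * (p + 1) // 2                              # 1 + 2 + ... + p
    score = {c: p - i for i, c in enumerate(preference)}  # best -> p, worst -> 1
    return [score[c] / total for c in candidates]

weights = rank_to_weights(list("CABED"), list("ABCDE"))
print(weights)  # [4/15, 3/15, 5/15, 1/15, 2/15] as floats
```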

I tried this out on Wikipedia's example of Schulze-method ranking, and obtained results similar to the Schulze rank except for close calls.  There are thus two major differences with the Schulze method: at steady state, my ranking method (the "Pasquali method") provides an actual number that can be renormalized on a scale (say, 1 to 5 for 5 candidates), so that close calls can be differentiated; and close calls may come out differently from the Schulze rank, though they can perhaps be made identical by appropriate tweaking of each voter's fractional-vote vector.

I think a social rank function like the one I have invented (the "Pasquali social function," "Pasquali rank") is quite novel in considering "rank" a steady-state vector of a dynamical system, the "votes" being an allotment of "percents" carried out in time until a steady equilibrium is reached, at which point the candidates "hold," during their part of the round (even powers of the Markov matrix), the equilibrium fractional (percent) vote.  Many other methods seek, for example, "centers of mass" for consensus rankings or estimates, whereas I look for the equilibrium trade at steady state, at infinite time or a long time in the future.

Using the Wikipedia example:

Example (45 voters; 5 candidates):

• 5 ACBED (that is, 5 voters have order of preference: A > C > B > E > D)
• 5 ADECB
• 8 BEDAC
• 3 CABED
• 7 CAEBD
• 2 CBADE
• 7 DCEBA
• 8 EBADC

the Schulze preference is E>A>C>B>D.  Expressed with the (above) fractional preference "strengths" and the appropriate vector for each voter, and then calculating, say, the 100th matrix power (or just the second power), my numbers suggest the social preference E>A>B>C>D, with fractional equilibrium vector E = 0.2178, A = 0.2119, B = 0.2030, C = 0.1985, D = 0.1689.
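These numbers can be reproduced with a short script.  Since (as the lemma further below shows) the candidate block of $M^2$ is already stationary, each candidate's equilibrium share is simply its total Borda-style weight divided by $n \cdot \frac{p(p+1)}{2}$; the ballot counts below are the full Wikipedia Schulze-method example (45 voters over 8 preference orders).

```python
# Equilibrium candidate shares for the Wikipedia Schulze example.
profiles = [  # (count, best-to-worst preference order)
    (5, "ACBED"), (5, "ADECB"), (8, "BEDAC"), (3, "CABED"),
    (7, "CAEBD"), (2, "CBADE"), (7, "DCEBA"), (8, "EBADC"),
]
candidates = "ABCDE"
p = len(candidates)
n = sum(count for count, _ in profiles)      # 45 voters

totals = {c: 0 for c in candidates}
for count, order in profiles:
    for position, c in enumerate(order):
        totals[c] += count * (p - position)  # top choice earns p, last earns 1

shares = {c: totals[c] / (n * p * (p + 1) / 2) for c in candidates}
print(sorted(shares, key=shares.get, reverse=True))  # ['E', 'A', 'B', 'C', 'D']
print({c: round(shares[c], 4) for c in candidates})
# {'A': 0.2119, 'B': 0.203, 'C': 0.1985, 'D': 0.1689, 'E': 0.2178}
```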

[**In talking to my friend Ben after I posted the original article, he made a very interesting observation: "This method you describe," he noted, "is much like one I studied freshman year in Game Theory that involves assigning $\frac{(p+1) p}{2}$ votes to $p$ candidates, and is well-researched."  The cool thing, in my opinion, is the fact that we can perhaps begin to understand these methods within the context of a dynamical system or Markov chain, and this brings together these methods under one umbrella of understanding.**]

[**My friend Ben also pointed out that squaring the matrix as I defined it actually stabilizes it (at even powers), hence the parenthesis in the above matrix-power calculations.  Since this was not obvious to me, I started working it out; here is the little lemma that resulted.  Create the augmented Markov matrix (so that every row sums to 1) $M = \left[ \begin{array}{cc} 0 & A \\ B & 0 \end{array} \right]$, which is $(n+p) \times (n+p)$: the upper-left zero matrix is $n \times n$, matrix $A$ is $n \times p$, $B$ is $p \times n$, and the lower-right zero matrix is $p \times p$.  The rows of $A$ and $B$ must each sum to one by requirement (though they are not square matrices).  Note that we can pull the normalizing constant $\frac{p(p+1)}{2}$ out of every entry of $A$, so that $A = \frac{2}{p(p+1)} A^*$, where each row of $A^*$ contains that voter's profile assignment of the integers $1, \ldots, p$ to the candidates.  Similarly we can pull the normalizing constant out of every entry of $B$, so that $B = \frac{1}{n} B^*$, where $B^*$ contains a 1 at every entry.  Next calculate $M^2 = \left[ \begin{array}{cc} AB & 0 \\ 0 & BA \end{array} \right]$, products we can form because $AB$ is $(n \times p)(p \times n) = n \times n$ and $BA$ is $(p \times n)(n \times p) = p \times p$.  Now $AB = \frac{2}{p(p+1)} A^* \cdot \frac{1}{n} B^*$, but $A^* B^*$ is a matrix whose every entry is the sum of a row of $A^*$, namely $\frac{p(p+1)}{2}$, so pulling this out we are left with $\frac{2}{p(p+1)} \cdot \frac{p(p+1)}{2} \cdot \frac{1}{n} C = \frac{1}{n} C$, where $C$ is an $n \times n$ matrix whose entries are all ones.  $BA$ is trickier to calculate, because it is what actually assigns the fraction of votes to each candidate.
In effect we have $BA = \frac{2}{p(p+1)} \frac{1}{n} B^* A^*$, where $D = B^* A^*$ is a $p \times p$ matrix each of whose columns is constant: entry $j$ of any row is the (unnormalized) total vote candidate $j$ received, repeated down the column.  We thus have $M^2 = \frac{1}{n} \left[ \begin{array}{cc} C & 0 \\ 0 & \frac{2}{p(p+1)} D \end{array} \right]$.  Now we want to show that $M^{2m}$ is this same matrix, so let's calculate $M^4 = M^2 M^2$.  We have $M^4 = \frac{1}{n^2} \left[ \begin{array}{cc} C & 0 \\ 0 & \frac{2}{p(p+1)} D \end{array} \right] \cdot \left[ \begin{array}{cc} C & 0 \\ 0 & \frac{2}{p(p+1)} D \end{array} \right] = \frac{1}{n^2} \left[ \begin{array}{cc} C^2 & 0 \\ 0 & \frac{2^2}{p^2(p+1)^2} D^2 \end{array} \right]$.  Note that $C^2$ is an $n \times n$ matrix whose entries are all $n$; in other words, $C^2 = n C$.  The difficult bit is calculating $D^2$.  First, note that every row of $D$ has the same integer profile, and every column has the same entry (different columns having different entries): $D = \left[ \begin{array}{cccc} a_1 & a_2 & \ldots & a_p \\ \vdots & \vdots & \vdots & \vdots \\ a_1 & a_2 & \ldots & a_p \end{array} \right]$, so that $D^2 = \left[ \begin{array}{cccc} a_1 \sum_{i = 1}^p a_i & a_2 \sum_{i = 1}^p a_i & \ldots & a_p \sum_{i = 1}^p a_i \\ \vdots & \vdots & \vdots & \vdots \\ a_1 \sum_{i = 1}^p a_i & a_2 \sum_{i = 1}^p a_i & \ldots & a_p \sum_{i = 1}^p a_i \end{array} \right]$, or $D^2 = \left( \sum_{i = 1}^{p} a_i \right) D$.  We must understand the sum correctly: it asks us to sum a row of $D$ or, from another perspective, to sum down the columns of $A^*$ and then across the resulting row (summing down is how $D$ was produced in the first place).  There is no reason we can't exchange the order of summation, since $A^*$ is finite, so let's add its rows (across) first and then add down the resulting column.
We already know that every row of $A^*$ adds up to $\frac{p(p+1)}{2}$, and there are $n$ of them, so the sum is $n \cdot \frac{p(p+1)}{2}$.  Thus $D^2 = n \cdot \frac{p(p+1)}{2} D$.  If we rewrite $M^4$ as $\frac{1}{n^2} \left[ \begin{array}{cc} nC & 0 \\ 0 & \frac{2^2}{p^2(p+1)^2} \cdot n \cdot \frac{p(p+1)}{2} D \end{array} \right]$ and factor out the $n$, we see that indeed $M^4 = M^2$.  Now suppose $M^{2k} = M^2$.  Then $M^{2(k+1)} = M^{2k} M^2 = M^2 M^2 = M^2$, as Ben rightly and keenly observed.**]
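The lemma can also be checked numerically (a sanity check, not a proof): build $M$ for an arbitrary small profile with exact rational arithmetic, square it, and confirm $M^4 = M^2$ entry for entry.  The profile `A_star` below is my own made-up example with $n = 4$ voters and $p = 3$ candidates.

```python
# Exact-arithmetic check that even powers of M stabilize at M^2.
from fractions import Fraction

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

n, p = 4, 3
total = Fraction(p * (p + 1), 2)
# Each voter row of A*: some permutation of 1..p (arbitrary choices).
A_star = [[3, 2, 1], [1, 3, 2], [3, 1, 2], [2, 3, 1]]
A = [[Fraction(w) / total for w in row] for row in A_star]  # n x p, rows sum to 1
B = [[Fraction(1, n)] * n for _ in range(p)]                # p x n, rows sum to 1

# Assemble M = [[0, A], [B, 0]], an (n+p) x (n+p) Markov matrix.
M = [[Fraction(0)] * n + A[i] for i in range(n)] + \
    [B[j] + [Fraction(0)] * p for j in range(p)]

M2 = matmul(M, M)
M4 = matmul(M2, M2)
print(M2 == M4)  # True: the even powers stabilize immediately
```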

[**Lastly, to address Ben's correct observation that the method I've described is the normalized version of assigning $\frac{p(p+1)}{2}$ votes to $p$ candidates, which he studied thoroughly in his Harvard freshman-year Game Theory course, let us not forget that we can create a different dynamical assignment that stabilizes in the usual geometric manner of a Markov chain, for example by relaxing the supposition that the candidates give back 100 percent of the vote they receive at each turn to the $n$ voters.  Perhaps they keep a fraction of the vote for themselves, say $\frac{1}{n+1}$, and return the remaining $\frac{n}{n+1}$ of the vote equally to the $n$ voters, making the lower-right $p \times p$ block the matrix with $\frac{1}{n+1}$ on the diagonal and 0 elsewhere.  Since this kind of system describes a different, richer scenario, I venture to disagree with Ben, who concludes this method doesn't really generalize known voting systems, and aver that my calculation method does actually generalize or absorb several others, conventional and unconventional.**]
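This modified scenario can be sketched by reusing the earlier 3-voter, 2-candidate example; the $\frac{1}{4}$ entries below are just the $n = 3$ case of $\frac{1}{n+1}$.  Because the candidates now hold some stock every round, the chain is no longer periodic, and high powers converge geometrically to a genuine steady state rather than oscillating.

```python
# Candidates keep 1/(n+1) of the stock they hold and return n/(n+1)
# equally to the voters.  State order: [V1, V2, V3, A, B].

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

M = [
    [0, 0, 0, 1, 0],            # voter 1 -> A
    [0, 0, 0, 1, 0],            # voter 2 -> A
    [0, 0, 0, 0, 1],            # voter 3 -> B
    [1/4, 1/4, 1/4, 1/4, 0],    # A keeps 1/4, splits 3/4 among the voters
    [1/4, 1/4, 1/4, 0, 1/4],    # B likewise
]

P = M
for _ in range(200):            # raise M to a high power
    P = matmul(P, M)

# Every row now approximates the same stationary vector; note the
# candidates' relative shares are still 2:1 in favor of A.
print([round(x, 4) for x in P[0]])
```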

Whether this Pasqualian method is Condorcet, how it squares with Arrow's theorem, and which other standard criteria it passes is something I would still like to analyze, perhaps in a future post.

An observation: mixed votes (some voters voting with 100 percent, others with fractions) are also possible.

Now, there is no reason why the matrix should be limited to a finite number of voters or candidates: by extending it to a countably-infinite-by-countably-infinite matrix one could, theoretically, rank a countably infinite number of candidates with a finite number of voters or, on the other hand, rank a finite number of candidates with a countably infinite number of voters, as by a Kolmogorovian or Heisenbergian extension (since at the matrix-power level we would be dealing with sequences in $l^2$, which converge).  Exactly how this would be done is a little fuzzy to me at the moment, but that's probably because the call of the beautiful beach with its gamut of turquoise colors is kind of irresistible right now... Caribbean tan... here I come!

View from Tulum, Quintana Roo


## On Patch(ix)es as Kernels of Integral Transforms (RWLA,MCT,GT,AM Part VII)

[This post is ongoing; as I think of a few more things I will write them down too]

So just a couple of days ago I was asked by a student to give a class on DEs using Laplace transforms, and it was in my research for it that I realized that what I've been describing, converting one probability distribution on [0,1] into another, is in effect a transform (minus the transform pair, which it was unclear to me how to obtain, corresponding perhaps to inverting the patch(ix)).  The general form of integral transforms is, according to my book Advanced Engineering Mathematics, 2nd ed., by Michael Greenberg, p. 247:

$F(s) = \int_a^b f(t) K(t,s) dt$, where $K(t,s)$ is called the kernel of the transform, and looks an awful lot like a function by patch(ix) "multiplication," which I described as:

$b(x) = \int_0^1 a(1-y) p(x,y) dy$, you may recall.  In the former context $p(x,y)$ looks like a kernel, but here $a(1-y)$ is a function of $y$ rather than of $x$, and I sum across $y$.  To rewrite patch(ix) multiplication as an integral transform, it would seem we need to rethink the patch's position on the $xy$ plane, but this is easy to do (and we do it in number 1 below!).

In this post I want to (eventually be able to):

1. Formally rewrite my function-by-patch(ix) multiplication as a "Pasquali" integral transform.

If we are to modify patch multiplication to match the integral transform guideline, simply think of $p(t,s)$ as oriented a bit differently, so that $\int_0^1 p(t,s) ds = 1$ for any choice of $t$.  Then, for a probability distribution $b(t)$ on [0,1], the integral transform is $B(s) = \int_0^1 b(t) p(t,s) dt$.  Now $p(t,s)$ is indeed a kernel.
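A quick numeric sanity check of this orientation (my own toy example): the kernel $p(t,s) = 1 + \frac{(2t-1)(2s-1)}{2}$ satisfies $\int_0^1 p(t,s)\,ds = 1$ for every $t$, and pushing the density $b(t) = 2t$ through it should yield an output $B(s)$ that again integrates to 1, i.e. the transform carries probability distributions to probability distributions.

```python
# Midpoint-rule check that the transform B(s) = ∫ b(t) p(t,s) dt of a
# density is again a density.

def integrate(f, a, b, steps=2000):
    """Plain midpoint rule, accurate enough for these smooth integrands."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

p = lambda t, s: 1 + (2 * t - 1) * (2 * s - 1) / 2  # rows integrate to 1 in s
b = lambda t: 2 * t                                  # a density on [0,1]

def B(s):
    return integrate(lambda t: b(t) * p(t, s), 0, 1)

total = integrate(B, 0, 1, steps=200)
print(round(total, 6))  # ~1.0
```

(Analytically, $B(s) = 1 + \frac{2s-1}{6}$ here, which indeed integrates to 1.)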

2. Extend a function-by-patch multiplication to probability distributions and patches on all $\mathbb{R}$ and $\mathbb{R}^2$, respectively.

When I began thinking about probability distributions, I restricted them to the interval [0,1] and patches to $[0,1] \times [0,1]$, to obtain a strict analogy between (continuous) matrices and discrete matrices.  I had been thinking for a while that this need not be the case, and when I glanced at the discussion of integral transforms in my Greenberg book, particularly the one on the Laplace transform, I realized I could have dropped the restriction right away.  Thus, we can redefine patch multiplication as

$B(s) = \int_{-\infty}^{\infty} b(t) p(t,s) dt$

with

$\int_{-\infty}^{\infty} p(t,s) ds = 1$
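A concrete patch on all of $\mathbb{R}$ (my own example, not from the original derivation) is the Gaussian kernel $p(t,s) = \frac{1}{\sqrt{2\pi}} e^{-(s-t)^2/2}$, which integrates to 1 in $s$ for every $t$.  Pushing the standard normal density through it is just convolution, so the output should be the $N(0,2)$ density:

```python
# Transform of a standard normal density through a Gaussian kernel on R.
import math

def integrate(f, a, b, steps=4000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def gauss(x, mu=0.0, var=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

p = lambda t, s: gauss(s, mu=t)  # ∫ p(t,s) ds = 1 for every t
b = lambda t: gauss(t)           # standard normal input density

def B(s):  # truncate the infinite integral; the tails are negligible
    return integrate(lambda t: b(t) * p(t, s), -10, 10)

print(round(B(0.0), 6), round(gauss(0.0, var=2.0), 6))  # both ≈ 0.282095
```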

3. Explore the possibility of an inverse-patch via studying inverse-transforms.

3a. Write the patch-inverse-patch relation as a transform pair.

4. Take a hint from the Laplace and Fourier transforms to see what new insights can be obtained on patch(ix)es (or vice-versa).

Vice-versa: well, one of the things we realize first and foremost is that integral transforms are really an extension of the concept of matrix multiplication: if we create a matrix "surface" and multiply it by a "function" vector we obtain another "function," and the kernel (truly a continuous matrix) is exactly our patch connecting the two.  Can we not think now of discrete matrices (finite, infinite) as "samplings" of such surfaces?  I think so.  We can also combine kernels with kernels (as I have done in previous posts) much as we combine matrices with matrices.  I haven't really seen a discussion exploring this in books, which is perhaps a bit surprising.  At any rate, recasting this "combination" shouldn't be much of a problem, and the theorems I proved in previous posts should still hold, because the new notation represents rigid motions of the kernel, yielding new kernel spaces that are isomorphic to the original.
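The "sampling" remark can be illustrated concretely (my own toy construction, using the same family of kernels as the check in part 1): compose two kernels continuously via $\int p(t,u) q(u,s)\,du$, then sample both kernels into discrete row-stochastic matrices and multiply those; the matrix product approximates the continuous composition, just as discrete matrices approximate the continuous ones.

```python
# Discrete matrix product vs continuous kernel composition.

def integrate(f, a, b, steps=2000):
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

p = lambda t, s: 1 + (2 * t - 1) * (2 * s - 1) / 2
q = lambda t, s: 1 + (2 * t - 1) * (2 * s - 1) / 3

def composed(t, s):  # (p ∘ q)(t,s) = ∫ p(t,u) q(u,s) du
    return integrate(lambda u: p(t, u) * q(u, s), 0, 1)

# Sample both kernels on an m-point midpoint grid (rows then sum to ~1),
# and multiply the resulting discrete matrices.
m = 100
grid = [(i + 0.5) / m for i in range(m)]
P = [[p(t, s) / m for s in grid] for t in grid]
Q = [[q(t, s) / m for s in grid] for t in grid]
PQ = [[sum(P[i][k] * Q[k][j] for k in range(m)) for j in range(m)]
      for i in range(m)]

# Compare one entry of the discrete product against the composition:
t, s = grid[10], grid[80]
print(abs(PQ[10][80] * m - composed(t, s)) < 1e-3)  # True
```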