This is just to get into the habit again. I haven’t posted anything for some time, due to pressures of work and study. With this post, I intend to become a regular once more.
I made a presentation yesterday, Feb. 27, at the Institute of Mathematics in U.P. Diliman, before some 15 mathematicians. The presentation was arranged by Dr. Jose Ma. Balmaceda, director of the Math Institute, where I am also a lecturer. Former COMELEC Commissioner Mehol Sadain and former COMELEC IT Department head Ernie del Rosario also attended. My topic was the determination of sample sizes for a post-election audit. Halalang Marangal (HALAL), of which I am secretary-general, is recommending that the COMELEC adopt Confidence Level Targetting (CLT) instead of Fixed Percentage of Precincts (FPP) for setting the sample size in the random manual audit (RMA) that is required by law for the May 2010 Philippine elections. Current proposals for the sample size range from 200-plus (the number of legislative districts, which is what the law says) to 1,600-plus (the number of councilor districts). All proposals, except ours, are based on fixed audits.
The biggest problem with fixed audits in setting the sample size is that the confidence level (one minus the significance level) of the result will be unpredictable. If the win is a landslide, the sample size may be unnecessarily large. But if it is a close contest, the sample size may be too small.
The advantage of Confidence Level Targetting is that it takes the winning margin into account when setting the sample size, so that the confidence level reaches the desired level. To give an extreme example: if the winning margin is just one vote, then even if only one precinct is left out of the sample (that is, all precincts except one are audited), the result of the audit will remain inconclusive.
How is Confidence Level Targetting done?
Here’s the basic process:
1. Adopt a target confidence level L. HALAL recommends 95% (the same level which is typically used in establishing scientific “truths”). The American Statistical Association, on the other hand, recommends 99% (which will take longer and will also be more expensive). I can be happy with either. Let us say 95%.
2. Determine the winning margin M. This is the difference in the votes received by the winner and the nearest loser. In case of multi-slot contests like senatorial or council elections, use the difference in votes received by the last among the winners and the first among the losers. The audit starts with the hypothesis that this margin is the result of cheating.
3. Estimate the average number of false votes V a presumed cheat will try to gain in a single precinct. The idea is: any gain higher than V will be too obvious. Lower than V, the results are still plausible enough. So, V is the highest false gain that a cheat will dare attain in one precinct. Of course, this will still vary from precinct to precinct, so we take V as the average target of the cheat per precinct. In my presentation, I assumed that V = 500. That is, the cheats will target a false gain of 500 votes in the precincts where they will operate.
4. From M and V, we can compute the minimum number of precincts P that the cheat must have operated in, to get a total false gain of M+1 (that is, to overcome the lead of the presumed true winner):
P = (M+1) / V
P*V = M+1
That is, the cheat targets an average false gain of V in P precincts, in order to get a total false gain of M+1.
This false gain may be attained in several ways: by simply adding zeroes to the votes of the favored candidate, by removing digits from the votes of the true winner, or by vote shifting. In local parlance, “dagdag-bawas” or padding-shaving.
Now, we have an estimate P of the number of bad precincts.
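To make step 4 concrete, here is a minimal Python sketch. The function name and the margin of 1,000,000 votes are hypothetical, chosen only for illustration. Rounding P up to a whole precinct is also an assumption of this sketch, since the formula above gives P as a plain quotient.

```python
def min_bad_precincts(margin: int, avg_false_gain: int) -> int:
    """Smallest whole number of precincts P such that P * V >= M + 1.

    margin         -- winning margin M, in votes
    avg_false_gain -- assumed average false gain V per bad precinct
    """
    if margin < 0 or avg_false_gain <= 0:
        raise ValueError("margin must be >= 0 and avg_false_gain must be > 0")
    # Integer ceiling of (M + 1) / V, done without floating point.
    return (margin + avg_false_gain) // avg_false_gain


# Hypothetical margin of 1,000,000 votes, with V = 500 as in the post.
print(min_bad_precincts(1_000_000, 500))
```

The narrower the margin, the smaller P becomes, and the harder it is for a random sample to hit one of those few bad precincts.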
5. We must now compute the sample size which will give us a very high probability (95%, if a target confidence level of 0.95 is adopted) of drawing at least one of these bad precincts. The formula is:
S = [N – (P – 1)/2] * [1 – (1 – L)^(1/P)], where
S is the sample size
N is the total number of precincts (75,471 clusters for the May 2010 elections)
L is the target confidence level (and 1-L is the significance level of the test)
P is the estimated number of bad precincts
I took this formula from existing literature on the mathematics of election audits, specifically from an article by Aslam, Popa and Rivest (2007). I can explain the details if there is any interest. The whole procedure is explained by Dopp (2009).
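The formula in step 5 can be evaluated with a few lines of Python. This is only a sketch: the function name is hypothetical, rounding the result up to a whole precinct is my assumption, and the value P = 2001 is an illustrative figure (it corresponds to a hypothetical margin of 1,000,000 votes with V = 500).

```python
import math


def sample_size(n_precincts: int, n_bad: int, confidence: float = 0.95) -> int:
    """Sample size S = [N - (P - 1)/2] * [1 - (1 - L)^(1/P)], rounded up."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    if not 1 <= n_bad <= n_precincts:
        raise ValueError("n_bad must be between 1 and n_precincts")
    s = (n_precincts - (n_bad - 1) / 2) * (1 - (1 - confidence) ** (1 / n_bad))
    # Round up to a whole precinct, and never ask for more than N precincts.
    return min(n_precincts, math.ceil(s))


N = 75_471  # clustered precincts for May 2010, from the post
print(sample_size(N, 2001))        # landslide-like case: many bad precincts needed
print(sample_size(N, 200))         # closer contest: fewer bad precincts, bigger sample
print(sample_size(N, 2001, 0.99))  # 99% target instead of 95%
```

Running it with different values of P shows the main point of Confidence Level Targetting: the closer the contest, the larger the sample has to be.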
6. We now have the desired sample size. If we draw this number of precincts randomly (with emphasis on randomly, that is, every precinct has an equal chance of being selected as every other precinct) from the total number of precincts, we are 95% sure we will get at least one bad precinct. The ballots in the drawn precincts are then counted manually, the votes tallied, and the results of the audit compared with the machine results for discrepancies.
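The random draw in step 6, and the claim that the sample catches at least one bad precinct with the target probability, can both be sketched in Python. The function names, the seed, and the figures P = 2001 and S = 112 are illustrative assumptions. A real audit would need a publicly verifiable source of randomness rather than a software seed.

```python
import random


def draw_sample(n_precincts: int, sample_size: int, seed=None) -> list:
    """Draw precinct numbers 1..N without replacement, each equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, n_precincts + 1), sample_size))


def detection_probability(n_precincts: int, n_bad: int, sample_size: int) -> float:
    """Exact chance that the sample contains at least one of n_bad bad precincts."""
    if sample_size > n_precincts - n_bad:
        return 1.0  # too few good precincts to fill the whole sample
    miss = 1.0
    for i in range(sample_size):
        # Probability that draw i is also a good precinct, given all earlier draws were.
        miss *= (n_precincts - n_bad - i) / (n_precincts - i)
    return 1.0 - miss


N, P, S = 75_471, 2001, 112  # P and S are illustrative values
print(draw_sample(N, S, seed=2010)[:10])
print(detection_probability(N, P, S))
```

The second function does not use the approximation formula at all. It multiplies out the exact probability of missing every bad precinct, so it serves as an independent check on whatever sample size the formula produces.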
7. There are two possibilities, and at this point, language becomes important. So, note carefully the words and phrases I use.
The first possibility is that we don’t find any precinct with a discrepancy as large as or larger than V. None. Then, we can now conclude (note the language now!) with 95% confidence that there was no cheating that was significant enough to change the outcome of the election. In the parlance of statistics, we have rejected the null hypothesis (that the cheating changed the outcome of the elections).
The second possibility is that we find at least one precinct with a discrepancy as large as or larger than V. In this case, the audit results are inconclusive. (Sorry if that sounds a little bit counter-intuitive, but that’s statistics.) Then, our conclusion will be (note the language!) that we cannot confidently assert (at the 95% confidence level, if you insist on being quantitative) that there was no cheating significant enough to change the election outcome.
In other words: we had started with the hypothesis that the cheating was significant enough to change the election outcome. If we find no bad precinct in the audit, as described above, then we can confidently conclude that the hypothesis was false, the winner is the true winner, and s/he can be proclaimed. But if we find at least one bad precinct, we cannot confidently conclude that significant cheating occurred.
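The decision rule in step 7 is simple enough to write down as a small Python function. This is a sketch only. The function name, the return strings, and the idea of representing each audited precinct by a single discrepancy figure (machine count minus manual count for the candidate in question) are assumptions made for illustration.

```python
def audit_conclusion(discrepancies, v_threshold: int) -> str:
    """Apply the step 7 rule to the per-precinct discrepancies found in the audit.

    discrepancies -- one number per audited precinct (hypothetical representation)
    v_threshold   -- V, the assumed false gain per bad precinct
    """
    found_bad = any(abs(d) >= v_threshold for d in discrepancies)
    if found_bad:
        # At least one bad precinct: we cannot rule out outcome-changing cheating.
        return "inconclusive"
    # No bad precinct in the sample: outcome-changing cheating is ruled out
    # at the target confidence level.
    return "confirmed"


print(audit_conclusion([0, 3, -2, 1], 500))   # small counting noise only
print(audit_conclusion([0, 3, 510, 1], 500))  # one precinct at or above V
```

Note that "inconclusive" is not the same as "cheating proven", which is exactly the asymmetry described above.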
8. To confidently conclude that the cheating was significant enough to change the outcome, we need a different approach. More on this later.