Disclaimer: This is an apolitical post, aimed at explaining the math behind the sample count performed during Singapore General Elections. No political parties will be mentioned, and names of electoral divisions (e.g. SMCs/GRCs) are mentioned only to illustrate the math with actual numbers. Also, this is our own independent analysis of the math behind the sample count. We do not work at the Elections Department Singapore.

What is a sample count?

Implemented in the Singapore General Elections (GE), a sample count is performed at the start of the vote counting process. Briefly, for each electoral division, a small number of votes is sampled from each polling station to provide a statistical estimate of the eventual election result. The sample count helps prevent speculation and misinformation from unofficial sources while counting is underway, and also serves as a check against the final election results.

In detail, 100 votes are randomly drawn from each polling station and the proportion of votes for each candidate (or group of candidates) is computed. Within each electoral division, the vote proportions are then averaged across polling stations, with each station weighted by the number of votes cast there.
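The weighted average described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the station figures below are made up, and the function name `sample_count` is hypothetical rather than anything used by the Elections Department.

```python
# Hypothetical data for one electoral division (not real figures):
# (votes cast at the station, votes for candidate A among the sampled ballots)
stations = [
    (2400, 55),
    (3100, 48),
    (1900, 61),
]

SAMPLE_SIZE = 100  # ballots drawn per polling station, as described in the text


def sample_count(stations, sample_size=SAMPLE_SIZE):
    """Weighted average of per-station sample proportions.

    Each station's sample proportion is weighted by the votes cast there.
    """
    total_votes = sum(votes for votes, _ in stations)
    weighted_sum = sum(votes * (hits / sample_size) for votes, hits in stations)
    return weighted_sum / total_votes


estimate = sample_count(stations)
print(f"Sample count estimate: {estimate:.1%}")
```

Larger stations pull the estimate towards their own sample proportion, which is why the result differs slightly from a plain average of the three proportions.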

Figure 1: Infographic illustrating the sample count process. Votes are sampled from each polling station and a weighted average of the samples (accounting for number of votes) is then calculated to derive the sample count. Image credit: Elections Department Singapore.

When the sample count was first made public in the 2015 GE, its estimate was widely quoted as having an error margin of ±4% (with 95% confidence). This means that, for about 95% of the estimates made, the sample count should differ from the actual vote proportion by no more than 4 percentage points. Here, we explore where this number comes from.

Math behind sample count

First, let us restate the problem. The votes drawn for the sample count are essentially samples drawn from a population (i.e. all votes), and we are trying to use the sample proportion to estimate the true population proportion. As with all sample estimates, there is some degree of uncertainty involved. This uncertainty is often represented in the form of a confidence interval, a range of values that the true proportion is likely to lie in. The confidence interval also has an associated confidence level, often set to 95%. This means that if the sampling were repeated many times, about 95% of the resulting confidence intervals would contain the true proportion.
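To make the meaning of the 95% confidence level concrete, here is a small simulation sketch. It assumes a true proportion of 0.6 and a sample of 600 votes (both arbitrary choices of ours), builds the usual normal-approximation interval for a proportion in each trial, and counts how often that interval contains the true value.

```python
import math
import random


def coverage(p_true=0.6, n=600, trials=2000, z=1.96, seed=0):
    """Fraction of simulated 95% intervals that contain p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw n ballots; each is for the candidate with probability p_true.
        votes = sum(rng.random() < p_true for _ in range(n))
        p_hat = votes / n
        # Half-width of the normal-approximation interval around p_hat.
        half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - half_width <= p_true <= p_hat + half_width:
            hits += 1
    return hits / trials


print(f"Intervals containing the true proportion: {coverage():.1%}")
```

The printed fraction should land close to 95%, though it will not be exactly 95% because of simulation noise and the approximation itself.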

Next, we can derive the confidence interval from the standard deviation (SD) of the estimate. If we assume that the sample proportion follows a normal (“bell curve”) distribution, the two-sided 95% confidence interval is the estimate ±1.96 SD. Fortunately, the SD of the sample proportion is given by this simple formula:

\[\begin{equation*} \sigma_{p} = \sqrt{\dfrac{p(1-p)}{n}} \end{equation*}\]

where \(p\) is the true population proportion and \(n\) is the sample size, in this case the number of sample count votes. There are two points to note here. First, the SD is largest when \(p\) is 0.5, due to the \(p(1-p)\) term in the numerator. Second and more importantly, the SD is large when the sample size \(n\) is small, which makes sense given that a smaller sample size introduces more uncertainty.
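Both points can be checked numerically with a short sketch. The helper name `sd_of_proportion` and the particular values of \(p\) and \(n\) are our own choices for illustration.

```python
import math


def sd_of_proportion(p, n):
    """Standard deviation of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


# Point 1: for a fixed n, the SD peaks at p = 0.5.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}, n = 600 -> SD = {sd_of_proportion(p, 600):.4f}")

# Point 2: for a fixed p, the SD shrinks as n grows.
for n in (100, 600, 2400, 9600):
    print(f"p = 0.5, n = {n:>4} -> SD = {sd_of_proportion(0.5, n):.4f}")
```

Note that quadrupling \(n\) only halves the SD, since \(n\) sits under a square root.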

In the context of the GE, this implies that the SMC with the fewest polling stations will have the largest margin of error in its sample count. In the 2015 GE, Potong Pasir SMC had the smallest number of votes cast, at 17,407 votes. We also estimate that there was one polling station for roughly every 2,600 voters, given that 2,304,331 votes were cast across 880 polling stations. Thus, we estimate that Potong Pasir SMC had about 6-7 polling stations (17,407 / 2,600 ~ 6.7). Assuming that there were only six polling stations, \(n\) would be 600, since 100 votes are sampled randomly from each polling station. Plugging \(n=600\) and \(p=0.5\) into the formula above, the half-width of the 95% confidence interval would be:

\[\begin{equation*} 1.96 \times \sqrt{\dfrac{0.5 \times (1-0.5)}{600}} = 0.0400 \end{equation*}\]

which is exactly the 4% margin of error reported! At this point, it is important to realise that we have calculated the worst possible margin of error. In contrast, larger electoral divisions, e.g. the GRCs, have many more polling stations and thus more sample count votes, which reduces the margin of error considerably. For instance, the largest GRC in GE2015, Ang Mo Kio GRC, had 187,771 votes cast, corresponding to ~72 polling stations, so \(n\) goes up to 7,200. The corresponding margin of error then drops to 1.15%.
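As a check on that last figure, the same calculation with \(n=7200\) and the worst-case \(p=0.5\) works out to:

\[\begin{equation*} 1.96 \times \sqrt{\dfrac{0.5 \times (1-0.5)}{7200}} \approx 0.0115 \end{equation*}\]

that is, roughly 1.15%. Going from 6 to 72 polling stations multiplies \(n\) by 12, so the margin shrinks by a factor of \(\sqrt{12} \approx 3.5\).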

Numbers for GE 2025

For this GE, we managed to get the exact number of polling stations for each SMC/GRC. The SMCs have 9 to 17 polling stations, corresponding to a margin of error of 2.38% to 3.27%, with Mountbatten SMC having the smallest number of polling stations. For the GRCs, the number of polling stations ranges from 48 to 81, bringing the margin of error down to 1.09% to 1.41%, with Pasir Ris-Changi GRC having the smallest number of polling stations.
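The margins quoted above follow directly from the number of polling stations. A minimal sketch, assuming 100 sampled votes per station and the worst-case \(p=0.5\) (the function name `margin_of_error` is ours):

```python
import math

Z_95 = 1.96              # two-sided 95% critical value of the normal distribution
VOTES_PER_STATION = 100  # ballots sampled at each polling station


def margin_of_error(num_stations, p=0.5):
    """95% margin of error for a division (worst case p = 0.5 by default)."""
    n = num_stations * VOTES_PER_STATION
    return Z_95 * math.sqrt(p * (1 - p) / n)


# Station counts quoted in the text for GE 2025.
for label, stations in [("smallest SMC", 9), ("largest SMC", 17),
                        ("smallest GRC", 48), ("largest GRC", 81)]:
    print(f"{label:>12}: {stations:>2} stations -> +/-{margin_of_error(stations):.2%}")
```

Passing a different `p` (for example, the sample count proportion itself) gives a slightly narrower margin than the worst case.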

Figure 2: Sample count and actual election results for the 2025 GE. The 95 percent CI (i.e. error margin) of the sample count is shown in red, while the actual results are shown as black dots.

Sample vote counting was completed as of 2255H. As expected, the SMCs have larger error margins, with Sembawang West SMC’s interval sitting very close to the 50% mark while Jalan Kayu SMC’s interval extends below 50%. For the other electoral divisions, the sample counts are very likely to reflect the final outcome, so voters probably do not need to wait for the actual counts to be announced!

With all the results in, only one electoral division has its actual vote share lying outside the error margin:

  • Yio Chu Kang SMC (Actual: 78.73%) (Estimate: 76±2.52 = 73.48–78.52%)
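The check behind this bullet is a simple range test. A small sketch using the figures quoted above (the helper name `within_margin` is hypothetical):

```python
def within_margin(actual, estimate, margin):
    """True if the actual result lies inside estimate +/- margin (all in %)."""
    return estimate - margin <= actual <= estimate + margin


# Figures quoted in the text for Yio Chu Kang SMC (percentages).
estimate, margin, actual = 76.0, 2.52, 78.73

low, high = estimate - margin, estimate + margin
print(f"Interval: {low:.2f}% to {high:.2f}%")
print("Inside interval:", within_margin(actual, estimate, margin))
```

The actual result exceeds the upper end of the interval by about 0.2 percentage points, so this is a narrow miss rather than a large one.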

This is expected: the error margins carry 95% confidence, so each interval has a 1-in-20 chance of missing the actual result, which works out to around 1 to 2 constituencies. In conclusion, statistics works (again)!

For the numerically inclined, the actual numbers for 2025 GE are presented below:

Numbers for GE 2020

Due to COVID-19, the number of polling stations was increased from 880 to 1,100, which translates to an average of ~2,400 voters per polling station (total electorate of 2,653,942 voters). Using this estimate, the SMCs have 8 to 14 polling stations, corresponding to a margin of error of 2.62% to 3.46%, with Mountbatten SMC having the smallest number of polling stations. For the GRCs, the number of polling stations ranges from 39 to 72, bringing the margin of error down to 1.15% to 1.57%, with Pasir Ris-Changi GRC having the smallest number of polling stations.
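The station estimate used here is plain division. A rough sketch follows; the 24,000-voter division is a made-up example, and rounding to the nearest whole station is our own assumption, since the exact rounding rule is not stated.

```python
TOTAL_VOTERS = 2_653_942   # total electorate for GE 2020, from the text
TOTAL_STATIONS = 1_100     # polling stations for GE 2020, from the text

voters_per_station = TOTAL_VOTERS / TOTAL_STATIONS
print(f"Average voters per station: {voters_per_station:.0f}")


def estimate_stations(division_voters, per_station=2_400):
    """Rough number of polling stations for a division of a given size.

    Rounds to the nearest whole station (an assumption of this sketch).
    """
    return round(division_voters / per_station)


# Hypothetical division of 24,000 voters, for illustration only.
print(estimate_stations(24_000))
```

Because the true station counts may differ from this average-based estimate, the resulting margins for 2020 are approximate.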

Figure 3: Sample count and actual election results for the 2020 GE. The 95 percent CI (i.e. error margin) of the sample count is shown in red, while the actual results are shown as black dots.

For the 2020 GE, we expected lower error margins due to the increase in the number of polling stations. However, we observed that 7 out of 31 electoral divisions had their actual vote proportions lying outside the sample count error margin (outside ±1.96 SD). This is unexpected, as 95% confidence means that we should see only one or two electoral divisions lying outside the margin. Also, for six of these electoral divisions, the sample count overestimated the majority vote proportion. One possible explanation is that the sample count process was not entirely random. Due to COVID-19, the Elections Department Singapore prescribed timebands for voters to stagger the flow of voters into polling stations. This staggering could have decreased the randomness of the order in which votes were cast and possibly affected the randomness of the sample count.
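To put a number on how unexpected 7 misses out of 31 is, we can treat each division as an independent trial with a 5% chance of falling outside its interval. Both the independence and the exact 5% miss rate are simplifying assumptions of this sketch, not properties established above.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_DIVISIONS = 31   # electoral divisions in GE 2020, from the text
MISS_RATE = 0.05   # chance that a 95% interval misses the actual result

expected_misses = N_DIVISIONS * MISS_RATE
p_seven_or_more = prob_at_least(7, N_DIVISIONS, MISS_RATE)

print(f"Expected misses: {expected_misses:.2f}")
print(f"P(7 or more misses): {p_seven_or_more:.5f}")
```

Under these assumptions the probability of seeing seven or more misses is well under 1%, which supports the suspicion that something other than pure sampling noise was at work.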

For the numerically inclined, the actual numbers for 2020 GE are presented below:

Numbers for GE 2015

For the 2015 GE, we estimate an average of ~2,600 voters per polling station. Using this estimate, the SMCs have 6 to 13 polling stations, corresponding to a margin of error of 2.72% to 4.00%, with Potong Pasir SMC having the smallest number of polling stations. For the GRCs, the number of polling stations ranges from 35 to 66, bringing the margin of error down to 1.21% to 1.66%, with Pasir Ris-Changi GRC having the smallest number of polling stations.

Figure 4: Sample count and actual election results for the 2015 GE. The 95 percent CI (i.e. error margin) of the sample count is shown in red, while the actual results are shown as black dots. Note that we have used the sample count proportion for \(p\) instead of the worst-case \(p\) of 0.5. Clearly, most of the actual results fall within the error margin, with the exception of Jurong GRC. This is also expected, since the estimates have 95 percent confidence.

For the 2015 GE, only Jurong GRC’s actual result falls outside the 95% confidence margin of error! Statistics works!

Finally, for the numerically inclined, the actual numbers for 2015 GE are presented below:

Concluding Remarks

In conclusion, we hope this post has given a better understanding of the sample count process and the margin of error involved in the estimates. It is also remarkable that we can make rather accurate estimates of the population proportion with a relatively small number of samples.