When we first thought of starting this website, our primary goal was to collect all election-related data in one place, clean it, understand it, and eventually use it to predict future election outcomes. Our research so far indicates that a large and varied set of datasets is already available (albeit in disparate locations) with the essential ingredients for building an election prediction algorithm. And we’re not talking about predicting which party will win the elections overall. That’s valuable too. However, what’s more interesting, and has not been done before, is forecasting the winner in each of the 272 National Assembly seats up for grabs in the 2018 elections. This is the ultimate goal of the Pakistan Election Watch project. We hope to leverage the many datasets out there, augment them with our own polling and research, and build a robust statistical model that provides meaningful insights into who might win a particular National Assembly constituency in the upcoming 2018 Pakistan Elections.
Our first step toward building such a model will be to categorize each of the 272 National Assembly seats as either a “Safe” seat or a “Swing” seat based on different factors. A safe seat here is defined as a constituency with a high likelihood of being won by a particular party or candidate that has a strong vote bank there. A swing seat, on the other hand, is one where either no single party has clear dominance, or where opinions and preferences have shifted away from the dominant party/candidate over time. Predicting a winner in a swing seat will thus require a more in-depth analysis of the constituency’s characteristics (e.g. census demographics, opinion polling, and candidate strength). In the section below we share some of the factors and data features we’ll use to make the safe vs swing categorizations, and to predict which way a swing seat might go.
Our modeling approach is driven by the hypothesis that certain factors available in the data are good predictors of whether a constituency is a safe seat or a swing seat (as defined above). Some of the factors we’ll be exploring are the following:
Winning Streaks: By analyzing the last three elections (2002, 2008, and 2013), we can isolate seats where a particular party or candidate won in all three (streak of 3) or in the most recent two (streak of 2) elections. Such a winning streak would indicate the presence of a strong vote bank in that constituency that’s loyal to the party/candidate.
Victory Margin: The margin of votes by which a candidate won a seat, in both absolute and percentage terms, is an important metric. A seat where the victory margin was well above the average could be thought of as a safe seat, while a narrow margin (victory margin < 5%) would indicate a possible swing seat. We will also analyze the trend in victory margins over the past three election cycles. Seats where margins are shrinking could point to a swing in voter preferences.
Turnout Rate: Pakistan has historically had a consistently low turnout rate. We will review turnout rates by constituency to reveal areas where both victory margins and turnout were low in 2013. These are essentially swing seats where parties could tip results in their favor by increasing turnout among their voters.
Candidate Strength: We may have many political parties, but the influence of “electables” in our elections remains stronger than ever. These are personalities whose vote bank is tied to them rather than to any political party they belong to. Their power stems from their feudal or tribal clans, because feudal lordship and tribal allegiances hold more sway in the rural heartland of Pakistan than policy positions and manifestos. We will track who these “electables” are, and factor that into who might win a constituency they are contesting.
Demographic Shifts: These are exciting times on the data front because Pakistan just went through a census exercise after almost 20 years. The latest census results, once released, will point us to areas where demographics have changed considerably over time. To predict how these demographic shifts will impact voter preferences, we’ll make use of opinion polls that break out voting preferences by demographics.
These are just a few of the many features we plan to explore and use in building our election prediction model.
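To make the first two factors concrete, here is a minimal Python sketch of how a winning streak and a percentage victory margin could be computed for one constituency, and how they might feed a first-pass safe vs swing label. This is an illustration only, not our final model: the `Result` record, the function names, and the rule for combining the two factors are assumptions made for this example, and the vote counts are made up. The 5% margin cutoff and the streak of 2 come from the descriptions above.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One constituency result in one election (hypothetical record layout)."""
    year: int
    winner: str            # winning party (or candidate)
    winner_votes: int
    runner_up_votes: int
    total_votes: int       # all valid votes cast in the constituency


def winning_streak(results):
    """Count how many of the most recent elections the latest winner won in a row."""
    ordered = sorted(results, key=lambda r: r.year, reverse=True)
    streak = 0
    for r in ordered:
        if r.winner != ordered[0].winner:
            break
        streak += 1
    return streak


def margin_pct(result):
    """Victory margin as a percentage of all votes cast."""
    return 100.0 * (result.winner_votes - result.runner_up_votes) / result.total_votes


def categorize(results, min_streak=2, swing_margin_pct=5.0):
    """First-pass label; the way the two factors are combined is an assumption."""
    latest = max(results, key=lambda r: r.year)
    if margin_pct(latest) < swing_margin_pct:
        return "Swing"     # narrow latest margin overrides any streak
    if winning_streak(results) >= min_streak:
        return "Safe"
    return "Swing"


# Made-up numbers for a single constituency, for illustration only.
example = [
    Result(2002, "Party A", 60000, 40000, 120000),
    Result(2008, "Party A", 55000, 45000, 125000),
    Result(2013, "Party A", 52000, 50000, 130000),
]
print(winning_streak(example), round(margin_pct(example[-1]), 2), categorize(example))
```

In this made-up example the party has a streak of 3, but its 2013 margin is only about 1.5% of votes cast, so the sketch labels the seat as a swing seat. A real categorization would also need to weigh turnout, candidate strength, and demographic shifts.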
Challenges & Caveats
We’d be remiss if we didn’t point out the many challenges and caveats that come with building a model with the lofty aim of predicting constituency-wise election results.
Using the factors outlined above, our model will assign each NA constituency the following:
Seat Safety Score, indicating the likelihood of the seat being won by the same party that won it last time round in 2013
Predicted Winner Party, indicating which of the three major political parties (PML-N, PPP, and PTI) is the likely winner of the seat in 2018
Probability/Confidence Score, a measure of how confident we are in our prediction of who will win the seat in the upcoming election
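As a rough sketch of how these three outputs relate to one another, suppose the model produced a win probability for each party in a constituency. That is an assumption made for illustration; the actual form of the model's output is not settled here. Under that assumption, the three values could be derived as follows. The function name `summarize_seat` and the probabilities in the example are hypothetical.

```python
def summarize_seat(party_probs, incumbent):
    """Derive the three per-seat outputs from hypothetical per-party win probabilities.

    party_probs: dict mapping party name to its estimated probability of winning.
    incumbent:   the party that won the seat in 2013.
    """
    predicted_winner = max(party_probs, key=party_probs.get)
    return {
        # Likelihood that the 2013 winner holds the seat.
        "seat_safety_score": party_probs.get(incumbent, 0.0),
        # Party with the highest estimated win probability.
        "predicted_winner": predicted_winner,
        # How strongly the model favors that party.
        "confidence": party_probs[predicted_winner],
    }


# Made-up probabilities, for illustration only.
probs = {"PML-N": 0.55, "PPP": 0.15, "PTI": 0.30}
print(summarize_seat(probs, incumbent="PML-N"))
```

When the predicted winner is also the 2013 winner, the seat safety score and the confidence score coincide. They differ only when the model expects the seat to change hands.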
While this is the eventual goal, along the way we’ll explore each factor we want to include in our model and see how good a predictor it is. Below are links to posts detailing our findings and taking a deep dive into each individual factor: