[These refer to the polling averages displayed on the Upshot’s 2020 Presidential Polling Diary.]
At first glance, a poll average is one of the easiest ways to make sense of the dozens of polls that can come out on a single day. But it’s really not so easy. Some polls persistently lean to one party or the other. Some states don’t have enough recent polls. And some polls are of such uncertain or outright dubious quality that perhaps they should be tossed out entirely.
Below, the methodological decisions we made in doing ours. (Our decisions, and most kinds of public poll averages, owe a debt to FiveThirtyEight, which collected the polling data used here, and which has been publishing poll averages with an extensive methodology for more than a decade.)
Much of our approach will be familiar to poll watchers:
We adjust polls to the universe of likely voters, based on polls that publish a result for both registered and likely voters. In practice, this means we would shift a poll of registered voters, who tend to be a bit younger and more diverse, about one percentage point toward the president.
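To make the idea concrete, here is a minimal sketch in Python of how such a shift might be applied. The function and the fixed one-point figure are illustrative only; in practice the shift is estimated from polls that report both registered- and likely-voter results.

```python
def adjust_to_likely_voters(dem_margin_rv, shift_toward_president=1.0):
    """Shift a registered-voter margin toward the president (illustrative).

    dem_margin_rv: Democratic margin (Dem % minus Rep %) among registered voters.
    shift_toward_president: roughly one point in 2020, per the description above.
    """
    return dem_margin_rv - shift_toward_president

# A registered-voter poll showing the Democrat up 8 would be treated
# as roughly +7 among likely voters.
print(adjust_to_likely_voters(8.0))  # 7.0
```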
We adjust for frequency. If a pollster conducts many polls, each poll receives less weight. And if a pollster releases multiple versions of the same poll — for example, the results under different assumptions about turnout — the versions are averaged together.
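A rough sketch of both steps, with made-up data; the equal-split weighting rule is purely illustrative, since the actual down-weighting formula isn't described here.

```python
from collections import defaultdict
from statistics import mean

# Made-up polls: margins are Dem minus Rep, in percentage points.
polls = [
    {"pollster": "Pollster A", "poll_id": "A-1", "margin": 5.0},
    {"pollster": "Pollster A", "poll_id": "A-1", "margin": 3.0},  # same poll, different turnout model
    {"pollster": "Pollster A", "poll_id": "A-2", "margin": 4.0},
    {"pollster": "Pollster B", "poll_id": "B-1", "margin": 6.0},
]

# First, average multiple versions of the same poll into one result.
versions = defaultdict(list)
for p in polls:
    versions[(p["pollster"], p["poll_id"])].append(p["margin"])
collapsed = {key: mean(values) for key, values in versions.items()}

# Then give each of a pollster's polls less weight the more polls it has
# (an illustrative 1/n rule; the real down-weighting is presumably gentler).
poll_counts = defaultdict(int)
for pollster, _ in collapsed:
    poll_counts[pollster] += 1
weights = {key: 1.0 / poll_counts[key[0]] for key in collapsed}

print(collapsed)  # the two versions of A-1 collapse to a single 4.0
print(weights)    # Pollster A's two polls get weight 0.5 each; Pollster B's gets 1.0
```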
We adjust for change over time. If new polls show a shift in the race, compared with older polls by the same pollster, then other old polls from different pollsters will be adjusted accordingly. This adjustment is modest far from the election but aggressive as the election gets closer. This adjustment does not account for demographic similarities between states or for the possibility that some states are more sensitive to changes in the political environment than others are. Taking account of this so-called elasticity in a state can be helpful, but it comes close enough to the line between forecasting and poll averaging that we’re content to wait for new data rather than to try to model out how the polls are shifting.
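One way to picture this, as a sketch rather than the actual model: the shift observed by pollsters that have surveyed the race both before and after a change is applied to other pollsters' older polls, gently far out and more aggressively near Election Day. The 21-day window below is invented for illustration.

```python
def trend_adjust(old_margin, observed_shift, days_to_election, full_strength_days=21):
    """Nudge an older poll toward the newer political environment.

    observed_shift: the average movement seen by pollsters that have polled
    the race both before and after the apparent shift.
    The adjustment ramps up as the election approaches; the 21-day window
    is an illustrative choice, not the actual schedule.
    """
    strength = min(1.0, full_strength_days / max(days_to_election, 1))
    return old_margin + strength * observed_shift

# A two-point move toward the challenger is applied softly in the summer,
# and at full strength in the campaign's final weeks.
print(trend_adjust(old_margin=5.0, observed_shift=-2.0, days_to_election=90))  # ~4.5
print(trend_adjust(old_margin=5.0, observed_shift=-2.0, days_to_election=7))   # 3.0
```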
We adjust for house effects. If a pollster consistently leans one way with respect to the average, we adjust its results back toward the average. Of course, this raises a question: What’s the average? Here, the average is weighted by pollster quality, described below.
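In sketch form, with invented numbers: compute each pollster's average deviation from the quality-weighted average, then subtract that lean from its results.

```python
from statistics import mean

def house_effect(pollster_margins, weighted_average):
    """The pollster's average lean relative to the overall (quality-weighted) average."""
    return mean(m - weighted_average for m in pollster_margins)

def remove_house_effect(margin, effect):
    """Pull a result from this pollster back toward the average."""
    return margin - effect

# A pollster that has shown Dem +7, +6 and +8 while the weighted average sat at +5
# carries a two-point Democratic lean; its next +7 reads as roughly +5 after adjustment.
effect = house_effect([7.0, 6.0, 8.0], weighted_average=5.0)
print(effect)                            # 2.0
print(remove_house_effect(7.0, effect))  # 5.0
```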
We weight by recency and sample size. More recent and larger polls get more weight, though there are diminishing returns to larger sample size, and older polls still get some weight.
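A sketch of what such a weight might look like; the half-life and the square-root cap are invented to illustrate "more recent, bigger, but with diminishing returns," not the actual parameters.

```python
import math

def poll_weight(days_old, sample_size, half_life_days=14, reference_n=600):
    """Illustrative weight for a single poll.

    Recency: the weight halves every `half_life_days`, so old polls still count a little.
    Size: grows with the square root of the sample, and is capped so that
    a huge sample can't dominate the average.
    """
    recency = 0.5 ** (days_old / half_life_days)
    size = math.sqrt(min(sample_size, 4 * reference_n) / reference_n)
    return recency * size

print(round(poll_weight(days_old=2, sample_size=900), 2))    # fresh, mid-sized poll
print(round(poll_weight(days_old=30, sample_size=2500), 2))  # huge but month-old poll
```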
We weight by pollster quality. Not surprisingly, pollsters with rigorous methodologies and a strong track record tend to produce poll results that come closer to the outcome of the election. A pollster's record is evaluated on several measures (a rough sense of how they might combine into a single score is sketched after the list):
— A pollster’s average error compared with that of other pollsters in similar kinds of races, like House races.
— A pollster’s average error compared with other pollsters in the exact same race.
— Whether a pollster is committed to transparency by publishing microdata or is a member of the American Association for Public Opinion Research Transparency Initiative, which is a strong predictor of lower error and bias in poll results.
— Whether a pollster conducts live interviews via cellphones (it’s better if it does).
— Whether a pollster is prone to systematic bias: a tendency to lean toward a party across all of its polls in a cycle or race. It does not matter whether the pollster is biased toward the same party from year to year or race to race; it matters only whether the results are usually biased, one way or another.
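Here is a rough sketch of how these measures might be folded into one number. The description above lists the inputs but not the formula, so the coefficients below are invented for illustration.

```python
def quality_score(error_vs_peers_similar_races, error_vs_peers_same_race,
                  is_transparent, live_cellphone_interviews, bias_tendency):
    """Combine the rating criteria into one score (higher is better); coefficients are invented."""
    score = 0.0
    score -= error_vs_peers_similar_races    # average error vs. other pollsters, similar races
    score -= error_vs_peers_same_race        # average error vs. other pollsters, same race
    score += 1.0 if is_transparent else 0.0  # publishes microdata / AAPOR Transparency Initiative
    score += 0.5 if live_cellphone_interviews else 0.0
    score -= abs(bias_tendency)              # systematic lean, penalized regardless of direction
    return score

# A transparent live-caller with middling error and a one-point habitual lean:
print(quality_score(0.5, 0.3, True, True, bias_tendency=1.0))
```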
The systematic bias point is a little complicated, but it’s important: A poll average needs unbiased polls a lot more than it needs accurate polls.
Imagine two pollsters. One conducts three polls and shows Hillary Clinton doing exactly three percentage points better than the final result each time. That’s an average error of three points. The other also conducts three surveys, with one showing her doing five points better than the final result, one showing her doing two points better and one showing her opponent doing five points better than the result. That’s an average error of four points. But a poll average consisting only of the more accurate pollster would, oddly, fare much worse. It would show Mrs. Clinton leading by three, while a poll average consisting of the less accurate pollster would show a dead heat.
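The arithmetic of that example, in a few lines of Python; the point is that average error and average bias are different quantities, and it is the bias that an average inherits.

```python
from statistics import mean

# Each number is how far a poll landed from the final result,
# in points toward Mrs. Clinton (negative = toward her opponent).
accurate_but_biased = [3, 3, 3]
noisier_but_unbiased = [5, 2, -5]

for errors in (accurate_but_biased, noisier_but_unbiased):
    avg_error = mean(abs(e) for e in errors)  # 3 for the first pollster, 4 for the second
    avg_bias = mean(errors)                   # 3 for the first, about 0.7 for the second
    print(avg_error, avg_bias)

# An average built only from the "more accurate" pollster lands three points off the
# result; an average built from the noisier pollster lands within a point of it.
```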
This is not an abstract problem. Pollsters increasingly negotiate what’s known as the bias-variance trade-off. They consider steps that might reduce error or costs, like weighting on party identification or sampling from an online panel, but that might introduce bias if they’re wrong about the partisan makeup of the electorate or if their sample is no longer representative of the population. Here, the incentives of the pollster and the poll aggregator simply do not align: A pollster wants to reduce error and save money; a poll aggregator would gladly take higher error in exchange for unbiased samples, which usually come at great cost if they’re possible at all. There are also cases where systematic bias can arise even when a pollster makes every effort to produce an unbiased result. Failing to weight by education is a good example.
If it were possible, systematic bias would be the most important factor to consider in evaluating a pollster. The catch is that systematic bias is hard to measure and harder to predict. It’s hard to measure because most pollsters don’t conduct enough surveys in a cycle to identify a clear bias, and because sometimes bias can vary quite a bit by state or region. A pollster weighting on party identification, for instance, might badly underestimate the Democratic advantage in party identification in one state and overestimate it in the next. It’s hard to predict because the methodological choices that might lead to bias in one year might not lead to bias in the next. Alternatively, a choice that was unbiased in one year might be biased in the next.
Take Monmouth University, a high-profile, high-quality pollster that has actually been quite prone to systematic bias in recent years. In 2016, it didn’t weight by education, and in part as a result its polls tilted heavily toward Mrs. Clinton. In the 2016 and 2020 Iowa caucuses, Monmouth polled only recent voters and underestimated support for Bernie Sanders as a result. But Monmouth is polling a broad universe of voters in 2020, and it’s now weighting by education. Perhaps it will be systematically biased yet again, but there’s no obvious reason to expect it.
A pollster’s record of systematic bias does help predict future systematic bias, but to a very limited extent. By far the strongest predictor of whether a pollster is prone to systematic bias is whether it’s a member of an organization committed to transparency in public opinion research. There are many possible reasons for this connection: Perhaps transparency makes it harder for pollsters to put their thumb on the scale, or maybe involvement in a scientific community indicates an effort to identify and follow best practices, and makes those practices easier to pick up in the first place.
Of course, there’s no guarantee that the pollsters who follow best practices will have the best results. They have over the longer run, but in any given election they might be wrong. The same can be said for this polling average. Over the long run, the choices laid out here would do a good job of anticipating final election results. In any given election, they might not.