In a recent CommonWealth article by Lawrence DiCara and James Sutherland, “A Tale of Two Cities: Boston Mayoral Vote Shows Big Split on Education and Income,” the authors note two continuing empirical trends in municipal voting in Boston: (1) northern neighborhoods continue to exhibit lower turnout than southern neighborhoods, and (2) voting behavior appears to be predicted by levels of income and education when examined in the aggregate.

The authors go on to largely dismiss the value of polling in elections. Specifically, they point to a UMass Lowell Center for Public Opinion poll that I directed, fielded October 2 through 7, which had John Connolly leading Marty Walsh by 8 points among likely voters. On Election Day (November 5), Walsh won 52-48. In the article, they wrote:

“A second observation stems from the horse race narrative of the election perpetuated by media outlets and polling data. As both campaigns surely understood, polling data at the municipal level can be very inaccurate and quite volatile. In all modesty, pollsters did not read our previous analysis showing that upwards of 100,000 Bostonian voters – most of them younger and residing in the city’s northern neighborhoods – only vote every four years.

“For instance, the October 10 UMass Lowell poll benchmarked 18-29 year old voters for 37.6 percent of their sample pool. While this demographic surely makes up that percent of the city’s total population, the number of people from that age group who actually vote is substantially lower during municipal elections. Similarly, as we have alluded to in previous articles, the 11.6 percent benchmark for voters over the age of 65 is also exceptionally low. This same poll also benchmarked Allston-Brighton and downtown residents to cast 13.4 percent and 17.2 percent of the citywide vote, respectively. In fact, Allston-Brighton ended up contributing only 6.1 percent of the citywide votes, while the downtown precincts accounted for only 12 percent.

“… It is also worth noting that each poll conducted in the month of October put both mayoral candidates within the margin of error and thus the results were inconclusive, despite what was reported by media outlets.”

This unfairly characterizes the methodology employed in our poll and, more broadly, mischaracterizes the value of public opinion research to election narratives. The mission of UMass Lowell’s Center for Public Opinion is to produce high-quality information that enriches our learning objectives as a public research university and provides valuable information to the public and government. The central tenets of this mission are that our data collection be guided by rigorous methodological standards and that our decision-making be transparent. Because of random sampling error, one out of every 20 properly conducted polls will produce an estimate whose margin of error does not capture the true value we are trying to measure. We will not always be right, but we always want it to be explicitly clear what we have done.

Judging from the quote above, DiCara and Sutherland appear to have three main criticisms of polling, particularly as it relates to the recent Boston mayoral election: (1) polling entities (and especially UMass Lowell) made obviously false assumptions about the nature of the electorate, which by implication led to inaccurate results; (2) polling this cycle was volatile, unpredictable, and therefore unhelpful; and (3) every poll conducted in this election was within the margin of error, and therefore every result was inconclusive.

The basis of any sampling technique is to collect a random sample of individuals from an underlying population of interest so that the laws of probability and statistics can be used to say something meaningful about the population as a whole. Everything a pollster reports will include a level of confidence (how sure we are that what is being said about the data is right) and a margin of error (the range in which we believe the true value falls, given the stated level of confidence). In collecting that sample, the pollster is trying to eliminate bias by ensuring that everyone in the target population has an equal probability of being selected, that people outside the target population don’t make it into the sample, and that the questions we ask do not systematically bias the results.
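For readers who want to see the arithmetic, here is a minimal sketch of how a 95 percent margin of error for a single proportion is commonly computed. The code is in Python, and the sample size and proportion are purely illustrative; they are not figures from our poll.

import math

def margin_of_error(p, n, z=1.96):
    # 95 percent margin of error for a proportion p from a simple random sample of size n
    return z * math.sqrt(p * (1 - p) / n)

# Illustrative numbers only (not from the UMass Lowell poll):
# a candidate at 45 percent support in a sample of 500 likely voters.
p, n = 0.45, 500
print(f"Estimate: {p:.0%} +/- {margin_of_error(p, n):.1%} at 95% confidence")
# Prints: Estimate: 45% +/- 4.4% at 95% confidence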

The two most common methods of telephone sampling are random-digit dialing (RDD), in which a list of phone numbers is generated at random, and registration-based sampling (RBS), in which the sample is drawn from the list of registered voters. The two techniques have performed similarly in side-by-side testing, and each has its strengths and weaknesses.

In both cases, pollsters must contend with the fact that some people are more likely to answer their phones than others and that some people may have more than one phone number. Two especially well-known instances of these problems are that those age 65 and older are far more likely to participate in surveys than those 18 to 29, and that some people have only a landline or only a cell phone, while others have both. To correct for any bias introduced in generating the sample, it is common to use known information about the population at large from the census to reweight the results.
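As an illustration of that reweighting step, here is a simple post-stratification sketch on age alone. The census shares for 18-29 and 65-plus mirror the benchmarks discussed elsewhere in this piece, while the raw sample counts are invented for the example; each respondent's weight is their group's population share divided by its share of the raw sample.

# Hypothetical raw sample counts by age group, and census population shares.
sample_counts = {"18-29": 80, "30-64": 520, "65+": 400}        # who actually answered
census_shares = {"18-29": 0.376, "30-64": 0.508, "65+": 0.116}  # population targets

n = sum(sample_counts.values())

# Post-stratification weight: population share divided by raw sample share.
weights = {group: census_shares[group] / (count / n)
           for group, count in sample_counts.items()}

for group, w in weights.items():
    print(f"{group}: weight = {w:.2f}")
# Underrepresented young respondents get weights well above 1;
# overrepresented respondents 65 and older get weights below 1.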

In an RDD survey, which is what we used in the 2013 Boston mayoral election, this is done by collecting demographic information about everyone who answered our phone calls, including those who indicated that they were not registered to vote or were unlikely to vote. We therefore have three samples nested in our survey: (1) a random sample of all Boston adults, (2) a subsample of registered voters, and (3) a further subsample of those deemed most likely to vote based on their answers to a few questions about vote intention and interest in the election.

DiCara and Sutherland claim that the UMass Lowell survey of Boston adults, released on October 10, mistakenly benchmarked 37.6 percent of the sample of likely voters to be age 18 to 29, and overestimated the number of likely voters who would come from northern precincts, especially Allston-Brighton and downtown. This reflects a misunderstanding of how a representative sample is achieved in RDD surveys. It is true that we benchmarked young people (18 to 29) to make up 37.6 percent of the total sample, and Allston-Brighton and downtown residents to make up 13.4 percent and 17.2 percent, respectively. However, these are population benchmarks used to construct survey weights. Survey weights give greater relative importance to cases that are underrepresented in the sample as collected, so that our total population sample of Boston adults matches the US Census margins for age, gender, race/ethnicity, education, and geography. From that sample, however, we draw a smaller sample of registered voters (RV) and likely voters (LV). Note that a census target of 37.6 percent of the population being age 18 to 29 translates into a prediction that just 19.3 percent of the electorate will be 18 to 29. The table below shows how this process unfolds for the three relevant categories.
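To make the interaction between weighting and the likely-voter screen concrete, here is a rough simulation in Python. The screen pass rates by age are entirely hypothetical (they are not parameters from our survey); the point is simply that a group can match its census share in the full adult sample yet make up a much smaller share of the likely electorate.

import random
random.seed(0)

# Census age shares (the population benchmarks) and hypothetical rates at which
# each group passes a likely-voter screen in a municipal election.
CENSUS = {"18-29": 0.376, "30-64": 0.508, "65+": 0.116}
PASS_RATE = {"18-29": 0.25, "30-64": 0.55, "65+": 0.75}   # invented for illustration

# Build an adult sample of 1,000 that already matches the census on age.
respondents = []
for group, share in CENSUS.items():
    for _ in range(int(share * 1000)):
        respondents.append({"age": group,
                            "likely_voter": random.random() < PASS_RATE[group]})

# Age composition of the likely-voter subsample.
lv = [r for r in respondents if r["likely_voter"]]
for group in CENSUS:
    share = sum(r["age"] == group for r in lv) / len(lv)
    print(f"{group}: {share:.1%} of likely voters")
# The 18-29 share of likely voters lands far below its 37.6 percent population
# benchmark, because fewer young adults pass the screen.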

In fairness, DiCara and Sutherland are correct that the LV population contains too many voters from Allston-Brighton compared to the final election returns, where they were 6.1 percent, not 12.5 percent, of the electorate. However, outside of Allston-Brighton, Connolly still led Walsh by 9 points among likely voters (46 to 37) in this poll; the overrepresentation of Allston-Brighton residents therefore did not skew our data enough to create a drastically different picture of the horse race. One of two things happened: either our poll was wildly inaccurate for a different reason than the one identified by DiCara and Sutherland, or something changed in this election between October 7 and November 5. We are very confident that the latter explanation has far greater validity.

One useful check is to look at what other pollsters found. DiCara and Sutherland claim that polls had little to say in the Boston mayoral election since they were all within the margin of error. While each poll carries a margin of error, a collection of polls can give us considerably more information. A margin of error is a range in which we are 95 percent confident that the population value lies. Therefore, if a candidate's lead is greater than roughly two times the margin of error, we can say that the lead is statistically significant, meaning there is only about a 1-in-20 chance we would be wrong to trust the poll. However, it is a mistake to say that a poll within the margin of error means that the race is tied. If one candidate is leading, the poll is telling us that candidate has the greater probability of winning.
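Here is a worked illustration of that rule of thumb, again with invented numbers: a two-candidate poll of 400 likely voters in which the leader is up 46 to 38. None of these figures come from a specific poll; they are chosen to show that a sizable lead can fall short of statistical significance without implying the race is a coin flip.

import math

def moe_single(p, n, z=1.96):
    # 95 percent margin of error on one candidate's share
    return z * math.sqrt(p * (1 - p) / n)

def lead_is_significant(p1, p2, n):
    # Rule of thumb: the lead is significant if it exceeds roughly
    # twice the single-candidate margin of error.
    return abs(p1 - p2) > 2 * moe_single(max(p1, p2), n)

p1, p2, n = 0.46, 0.38, 400   # hypothetical poll
print(f"Single-candidate margin of error: {moe_single(p1, n):.1%}")
print("Lead statistically significant at 95%?", lead_is_significant(p1, p2, n))
# Prints a margin of error of about 4.9%, so the 8-point lead is not
# statistically significant, even though it clearly favors the leader.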

If you think about this in terms of betting, a bet with a probability of winning of 50 percent is not the same as a bet with a probability of winning of 75 percent. Every poll gives us some information and each reveals a degree of uncertainty. This is what national polling aggregators like the Huffington Post’s pollster.com and Nate Silver’s FiveThirtyEight.com blog have taught us about elections more generally.
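One hedged way to put a number on the betting analogy is a simple normal approximation: treat the poll's lead as the center of a bell curve whose spread comes from the margin of error, and ask how much of that curve sits above zero. This is a sketch with illustrative figures, not how any particular aggregator builds its forecasts.

import math

def prob_leader_ahead(lead, moe_of_lead):
    # Approximate probability that the leading candidate is truly ahead,
    # treating the lead as normal with a 95 percent interval of lead +/- moe_of_lead.
    sd = moe_of_lead / 1.96   # convert the 95 percent margin to a standard deviation
    return 0.5 * (1 + math.erf((lead / sd) / math.sqrt(2)))   # standard normal CDF

# Illustrative: a 4-point lead with a 10-point margin of error on the lead
# is far from conclusive, but it still clearly favors the leader.
print(f"{prob_leader_ahead(0.04, 0.10):.0%} chance the leader is really ahead")
# Prints roughly 78%.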

DiCara and Sutherland warn us to dismiss the polls because they were volatile. But polling in this cycle was not especially volatile. The story that the polls collectively tell is that Connolly began with a lead and that Walsh surged late. Here are the likely voter estimates from every poll conducted by an unaffiliated organization between the preliminary election on September 24 and the general election on November 5.

What were the major events in this election cycle? Given the vast literature on municipal elections, I would point to two especially important factors in this campaign. The first is that in city elections, as Karen Kaufmann argues in The Urban Voter, partisan cues are often supplanted by group-centric cues. Our early October poll showed that African-Americans and Latinos were, on the whole, more undecided and less firm in their support of the two candidates. Furthermore, in a field period that ended October 7, Connolly led among African-American likely voters 52 percent to 38 percent, and among Latino likely voters 34 percent to 21 percent.

The endorsements of Walsh by John Barros (October 8), Felix Arroyo (October 8), and Charlotte Golar Richie (October 11) provided an easy cue for many voters. This is not to say that all minority voters moved because of these endorsements, but the shift was enough to change the generalization: later polls showed Walsh leading among African-American and Latino voters, while early polls showed the opposite. The second factor is that most election observers agree Walsh won the actual campaign on every metric by which we can measure campaigns. He not only built a larger ground organization that contacted more voters, but he also outspent Connolly by about $1 million when you factor in outside money. Given these campaign dynamics, had Walsh been leading in our October poll (Connolly led by 8), it would have been shocking for him to win the election by only 3 points on Election Day.

Polls conducted by non-partisan organizations collectively conveyed the trajectory of this election accurately; only two organizations (Suffolk University and UMass Amherst) fielded polls in the last two weeks of the campaign, and both had Walsh winning. While it is true that Walsh’s lead in these last two polls was within the margin of error, both results gave him the higher probability of winning, since a result within the margin of error is not the same as a tie.

One last point is worth considering. I would strongly caution against dismissing polling in favor of analyzing only precinct-level data to draw inferences about human behavior. This type of data can reveal trends, but drawing inferences about individual-level phenomena from aggregate-level data is a common research mistake called the ecological fallacy. If we ultimately want to say something about individual-level behavior, we need individual-level data. Polls are a complement to observed trends in aggregate data.

I greatly appreciate the kind of data-driven journalism offered by DiCara and Sutherland. I believe that the advent of outlets like FiveThirtyEight.com and especially the Monkey Cage blog (now at the Washington Post) has transformed the way that high-quality empirical research gets reported in conventional media outlets. My goal here has been to provide some insight into the strengths and limitations of polling. Polling as an industry does best when pollsters go the extra mile to explain their methodology, provide detailed cross-tabs, and answer questions. This is one of the central commitments of the UMass Lowell Center for Public Opinion.

Joshua J. Dyck is an associate professor of political science and co-director of the UMass Lowell Center for Public Opinion. He is the co-editor of The Guide to State Politics and Policy and has published nearly 20 peer-reviewed journal articles on participation, turnout, public opinion, state politics and the ballot initiative process in numerous outlets, including Public Opinion Quarterly and the Journal of Politics.