In the US presidential election, nearly every state casts all of its electoral votes for the candidate who wins the most votes in that state. As a result, the political polarity of states (as opposed to individual voters) ultimately decides the outcome of the presidential election. In this analysis, we set out to answer two questions:
We turned to Twitter for insight. We wrote a computer program to trawl through the 10+ million tweets that would roll in between October 31 and November 6. The program identifies tweets that discuss topics related to the election: the candidates, employment, taxes, debt, education, healthcare, and foreign policy. Since some people have suggested that superstorm Sandy may influence the election, we tossed her in as a topic as well. We kept only the tweets for which we could determine the state of origin.
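To make the topic-identification step concrete, here is a minimal sketch of keyword-based tagging. It is an illustration under stated assumptions, not the program we ran: the keyword lists and the names `TOPIC_KEYWORDS` and `topics_for` are hypothetical.

```python
import re

# Hypothetical keyword lists, chosen for illustration only.
# The terms used by the original program are not listed in this write-up.
TOPIC_KEYWORDS = {
    "candidates": ["obama", "romney"],
    "employment": ["jobs", "unemployment"],
    "taxes": ["tax", "taxes"],
    "debt": ["debt", "deficit"],
    "education": ["education", "schools"],
    "healthcare": ["healthcare", "obamacare"],
    "foreign policy": ["foreign policy", "iran"],
    "sandy": ["sandy"],
}


def topics_for(text):
    """Return the sorted topics whose keywords appear as whole words in text."""
    lowered = text.lower()
    found = []
    for topic, words in TOPIC_KEYWORDS.items():
        # \b anchors keep "tax" from matching inside unrelated words like "taxi".
        pattern = r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"
        if re.search(pattern, lowered):
            found.append(topic)
    return sorted(found)
```

A single tweet can count toward several topics, so a tweet mentioning both a candidate and taxes would be tallied under each.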
Once we had all these state-specific tweets, we visualized their relative abundance on a map of the US. All the topics were measured simply in terms of popularity: what percentage of people in a given state were tweeting about that particular issue? It's worth noting that the idea of "percentage of people" is a tricky one that required adjusting the measurements for each state. If you're interested in this detail, see our discussion of per capita below.
This project is the result of a collaboration between Ben Deverett, Wendy Liu, and Professor Derek Ruths. Ben and Wendy, undergraduate researchers in the Network Dynamics Lab, deserve the lion's share of the credit for writing the vast majority of the code: Ben wrote the backend Twitter analysis code and Wendy constructed the interactive web interface. For more information on the project, please send us an email!
To a certain extent, the colors may indicate the leanings of a state. However, there are a couple of important factors to keep in mind when reading these results.
So overall, we would caution you about using the results we're showing as a way of predicting the election. But, that said, we'd caution you about considering the results of any poll to be predictive.
States like New York and Texas have much larger populations than others like Oregon and West Virginia. As a result, in raw tweet counts, just a small percentage of Texans tweeting about jobs could make Texas look much more interested in jobs than Arizona, even if every Arizonan were tweeting about jobs as well: there just aren't enough Arizonans. Since we wanted to assess the relative importance of a topic to the population of an entire state, we divided the number of tweets for a given topic by the population of that state. This gives a sort of per capita popularity of the topic, reducing the distortion that states with big populations might otherwise introduce.
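The per capita adjustment amounts to a single division per state. The sketch below shows the idea with made-up numbers; the function name `per_capita`, the scaling to tweets per 100,000 residents, and the counts and populations are all assumptions for illustration, not our actual data.

```python
def per_capita(tweet_counts, populations, per=100_000):
    """Return tweets per `per` residents for each state with a known population."""
    return {
        state: per * count / populations[state]
        for state, count in tweet_counts.items()
        if populations.get(state)  # skip states with missing or zero population
    }


# Made-up counts and populations, for illustration only.
counts = {"TX": 5000, "AZ": 2000}
pops = {"TX": 25_000_000, "AZ": 6_400_000}
rates = per_capita(counts, pops)
```

With these invented numbers, Texas has more tweets in absolute terms (5000 versus 2000), but Arizona comes out ahead per capita (31.25 versus 20.0 tweets per 100,000 residents).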
In general, it can be pretty difficult (i.e., impossible) to figure out where any given tweet is from. Sometimes (about 1% of the time, to be precise) tweets are geotagged, meaning that they are marked with the exact latitude-longitude coordinates at which the user generated them. Most of the time, however, this information isn't available. We can overcome the lack of geotagging in about 10% of cases with some additional trickery. We first look at the location that the user reports in her profile. If this doesn't read "The moon" or "Middle earth", then we have a shot at resolving the user down to a state or, often, a city. Another source of information is the timezone the user reports, since people often choose a timezone designation incorporating a nearby city (e.g., "Eastern Time/Pittsburgh"). By combining all these hints about where a person is located, we can obtain a pretty good idea of which state a tweet originates from.
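As a rough sketch of how these hints might be applied, the code below tries each signal in turn: a geotag first, then the profile location, then the timezone. It is a simplified stand-in, not our actual resolver. It uses a priority order rather than truly combining the hints, it assumes the geotag has already been converted to a state by some other step, and the lookup tables and function names are hypothetical and far too small for real use.

```python
import re

# Tiny illustrative lookup tables (hypothetical); a real resolver needs full gazetteers.
STATE_ABBREVS = {"PA", "NY", "TX", "CA"}
STATE_NAMES = {"pennsylvania": "PA", "new york": "NY", "texas": "TX", "california": "CA"}
CITY_TO_STATE = {"pittsburgh": "PA", "houston": "TX", "los angeles": "CA"}


def state_from_text(text):
    """Guess a state from free text such as a profile location or timezone label."""
    if not text:
        return None
    # Pattern like "Pittsburgh, PA": a trailing two-letter state abbreviation.
    match = re.search(r",\s*([A-Za-z]{2})\s*$", text)
    if match and match.group(1).upper() in STATE_ABBREVS:
        return match.group(1).upper()
    lowered = text.lower()
    for name, abbrev in STATE_NAMES.items():
        if name in lowered:
            return abbrev
    for city, abbrev in CITY_TO_STATE.items():
        if city in lowered:
            return abbrev
    return None  # e.g. "The moon" or "Middle earth"


def resolve_state(geotag_state=None, profile_location=None, timezone=None):
    """Try the signals from most to least reliable; return None if all fail."""
    if geotag_state:  # assumed to be already reverse-geocoded to a state
        return geotag_state
    return state_from_text(profile_location) or state_from_text(timezone)
```

For example, a user whose profile reads "The moon" but whose timezone mentions Pittsburgh would be resolved to Pennsylvania, while a user offering only "Middle earth" would be dropped from the analysis.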
We would love to share our data. However, we're limited in what we can share by Twitter's Terms of Usage (which we consider quite reasonable, as they're designed to protect the privacy of Twitter users). As a result, the best we can do is offer a list of the ID numbers of the tweets that we used in our analysis. From this list, you can download the individual tweets yourself. If you're interested in this data, please send us an email.
Brought to you by the Network Dynamics Lab at McGill University.