# US Voter Turnout from 1980 to 2014

October 11, 2018

For this Tidy Tuesday challenge we'll use R to take a look at how voter turnout varies by state for presidential and midterm elections from 1980 to 2014.

library(tidyverse)
library(geofacet)

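# The file name below is an assumption; point read_csv() at your copy of the Tidy Tuesday data.
turnout <- read_csv("voter_turnout.csv") %>%
  select(-X1, -icpsr_state_code, -alphanumeric_state_code)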

turnout
# A tibble: 936 x 4
    year state                   votes eligible_voters
   <dbl> <chr>                   <dbl>           <dbl>
 1  2014 United States        83262122       227157964
 2  2014 Alabama               1191274         3588783
 4  2014 Arizona               1537671         4510186
 5  2014 Arkansas               852642         2117881
 6  2014 California            7513972        24440416
 8  2014 Connecticut           1096556         2577311
 9  2014 Delaware               238110          681526
10  2014 District of Columbia   177176          495899
# … with 926 more rows

Are we missing voting data?

turnout %>%
  summarize(n_missing_votes = sum(is.na(votes)),
            n_missing_eligible_voters = sum(is.na(eligible_voters)))
# A tibble: 1 x 2
  n_missing_votes n_missing_eligible_voters
            <int>                     <int>
1             223                         0

Yikes, 223 rows in our dataset are missing values for votes. That's... not great. In a more rigorous analysis we would need to get to the bottom of this, but since we are making an exploratory data visualization, we'll simply make a note of it on the final graphic.
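
If you did want to start digging, a first step might be to count the missing values per state. Here is a minimal sketch on a made-up tibble (the `toy` data and its state labels are invented for illustration and are not taken from the dataset):

```r
library(dplyr)

# Toy stand-in for `turnout`; every value here is invented for illustration.
toy <- tibble(
  year            = c(2012, 2014, 2012, 2014),
  state           = c("A", "A", "B", "B"),
  votes           = c(100, NA, NA, NA),
  eligible_voters = c(200, 210, 300, 310)
)

# Count the rows with missing votes in each state, largest gaps first.
missing_by_state <- toy %>%
  group_by(state) %>%
  summarize(n_missing_votes = sum(is.na(votes))) %>%
  arrange(desc(n_missing_votes))

missing_by_state
```

Running the same pipeline on the real `turnout` tibble would show whether the gaps cluster in a handful of states or are spread evenly.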

Let's calculate a few new variables:

1. Percent of eligible voters who voted in a given year in each state.
2. Percent of eligible voters who voted in a given year in the US, pooled over the states with recorded votes.
3. Difference between the voter turnout of each state and the national average in a given year.
4. Categorical version of these differences, which will be easier to interpret and will allow us to use a discrete color palette rather than a continuous one.

turnout <- turnout %>%
  filter(!is.na(votes), state %in% c(state.name, "District of Columbia")) %>%  # state.name is a base R constant
  mutate(percent_voted = 100 * votes / eligible_voters) %>%
  group_by(year) %>%
  mutate(national_percent_voted = 100 * sum(votes) / sum(eligible_voters)) %>%
  ungroup() %>%
  mutate(state_vs_national = percent_voted - national_percent_voted,
         state_vs_national_category = cut(state_vs_national,
                                          breaks = c(-Inf, -20, -15, -10, -5, -2, 2, 5, 10, 15, 20, Inf),
                                          ordered_result = TRUE)) %>%
  select(-votes, -eligible_voters)  # drop the raw counts, leaving the six columns printed below

turnout
# A tibble: 704 x 6
    year state                percent_voted national_percent_voted state_vs_national state_vs_national_category
   <dbl> <chr>                        <dbl>                  <dbl>             <dbl> <ord>
 1  2014 Alabama                       33.2                   38.3             -5.11 (-10,-5]
 2  2014 Alaska                        54.8                   38.3             16.5  (15,20]
 3  2014 Arizona                       34.1                   38.3             -4.21 (-5,-2]
 4  2014 Arkansas                      40.3                   38.3              1.95 (-2,2]
 5  2014 California                    30.7                   38.3             -7.56 (-10,-5]
 6  2014 Colorado                      54.7                   38.3             16.4  (15,20]
 7  2014 Connecticut                   42.5                   38.3              4.24 (2,5]
 8  2014 Delaware                      34.9                   38.3             -3.37 (-5,-2]
 9  2014 District of Columbia          35.7                   38.3             -2.58 (-5,-2]
10  2014 Florida                       43.3                   38.3              5.01 (5,10]
# … with 694 more rows
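
As a sanity check, we can reproduce a few of these values by hand. The sketch below uses only base R, takes Alabama's 2014 counts from the first printout, and feeds some of the printed differences back through `cut` with the same breaks (the variable names are just for illustration):

```r
# Counts for Alabama in 2014, copied from the first printout.
votes <- 1191274
eligible_voters <- 3588783
percent_voted <- 100 * votes / eligible_voters
round(percent_voted, 1)  # 33.2, matching the Alabama row

# The same breaks used in the pipeline.
breaks <- c(-Inf, -20, -15, -10, -5, -2, 2, 5, 10, 15, 20, Inf)

# Differences printed for Alabama, Alaska, Arkansas, and Florida.
diffs <- c(-5.11, 16.5, 1.95, 5.01)
categories <- cut(diffs, breaks = breaks, ordered_result = TRUE)
as.character(categories)  # "(-10,-5]" "(15,20]" "(-2,2]" "(5,10]"
```

By default `cut` builds intervals that are closed on the right, so a difference of exactly 5 would land in (2,5] rather than (5,10].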

With the data in this form, we can make a bar chart for each state and Washington DC with election year on the x-axis and votes cast per 100 eligible voters on the y-axis.

We'll use state_vs_national_category to color each bar by how far the state's turnout sits above or below the national average in that year.

We'll also use facet_geo from the geofacet package to arrange the state bar charts in the shape of the US.
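
facet_geo positions each panel by matching the faceting variable against a grid of state names or codes, so it is worth confirming that every state name in our data appears in that grid. A quick sketch, assuming the default grid is geofacet's `us_state_grid1` dataset and that its `name` column holds full state names:

```r
library(geofacet)

# The states kept by the filter earlier: the 50 states plus DC.
states_in_data <- c(state.name, "District of Columbia")

# Any names listed here would have no panel position in the grid.
missing_from_grid <- setdiff(states_in_data, us_state_grid1$name)
missing_from_grid
```

An empty result means every state has a slot in the grid; anything else would need renaming before plotting.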

ggplot(turnout, aes(year, percent_voted, fill = state_vs_national_category)) +
  geom_col(width = 1.7, size = 0) +
  facet_geo(~ state) +
  scale_fill_brewer(labels = c("-20", "-15", "-10", " -5", " -2", " 0 ", " +2", " +5", "+10", "+15", "+20"),
                    type = "div", palette = "PuOr", direction = 1) +
  scale_x_continuous(breaks = c(1980, 1990, 2000, 2010), labels = c("'80", "'90", "'00", "'10")) +
  labs(title = "Minnesota leads the nation in voter turnout in presidential and midterm elections",
       subtitle = "Votes cast per 100 eligible voters in each state in presidential (1980, '84, '88, '92, '96, '00, '04, '08, 2012) and midterm (1982, '86, '90, '94, '98, '02, '06, '10, 2014) elections*",
       caption = "*223 election years in this dataset are missing the number of votes cast, leading to missing bars in many states\nSource: data.world | Graphic: nsgrantham.com/voter-turnout",
       fill = "Votes cast relative to the national average in a given year",
       x = "Election year", y = "Votes cast per 100 eligible voters") +
  guides(fill = guide_legend(title.position = "top", label.position = "bottom", nrow = 1)) +
  theme_minimal(base_family = "Fira Sans Extra Condensed Light", base_size = 14) +
  theme(plot.title = element_text(family = "Fira Sans Extra Condensed", face = "bold", size = 22),
        plot.margin = margin(0.5, 0.5, 0.5, 0.5, "cm"),
        legend.direction = "horizontal",
        legend.position = c(0.2, 0.97),
        legend.spacing.x = unit(0.59, "lines"),
        legend.title = element_text(size = 13),
        panel.grid.major.x = element_line(size = 0.2),
        panel.grid.minor.x = element_blank(),
        panel.grid.major.y = element_line(size = 0.2),
        panel.grid.minor.y = element_blank())

ggsave("voter-turnout.png", width = 14, height = 7)

Nice job Minnesota!