Global Mortality Rates

Global Mortality Rates

May 07, 2018
Medium: R and ggplot2
Large: JPEG

Death comes for us all. And what a splendid variety of forms it can take! Cardiovascular disease, cancer, natural disasters, malaria, infections — the list is almost endless.

This week we’ll look at mortality data collected by the World Health Organization (WHO) to see how cause of death varies by the socio-demographic index of the country in which you live.

library(tidyverse)
library(readxl)

download.file("https://github.com/rfordatascience/tidytuesday/blob/master/data/2018/2018-04-16/global_mortality.xlsx?raw=true", "global_mortality.xlsx")

mortality <- read_xlsx("global_mortality.xlsx") %>%
  filter(str_detect(country, "SDI")) %>%
  rename(sociodemographic_index = country) %>%
  mutate(sociodemographic_index = sub(" SDI", "", sociodemographic_index)) %>%
  mutate(sociodemographic_index = fct_relevel(sociodemographic_index, !! c("High", "High-middle", "Middle", "Low-middle", "Low")))

mortality
# A tibble: 135 x 35
   sociodemographi… country_code  year `Cardiovascular… `Cancers (%)` `Respiratory di…
   <chr>            <chr>        <dbl>            <dbl>         <dbl>            <dbl>
 1 High             NA            1990             42.7          24.2             4.45
 2 High             NA            1991             42.3          24.4             4.47
 3 High             NA            1992             41.8          24.6             4.50
 4 High             NA            1993             41.4          24.8             4.56
 5 High             NA            1994             40.9          25.0             4.59
 6 High             NA            1995             40.5          25.1             4.65
 7 High             NA            1996             40.1          25.4             4.70
 8 High             NA            1997             39.7          25.6             4.76
 9 High             NA            1998             39.2          25.8             4.81
10 High             NA            1999             38.8          25.9             4.89
# … with 125 more rows, and 29 more variables: `Diabetes (%)` <dbl>, `Dementia (%)` <dbl>,
#   `Lower respiratory infections (%)` <dbl>, `Neonatal deaths (%)` <dbl>, `Diarrheal diseases
#   (%)` <dbl>, `Road accidents (%)` <dbl>, `Liver disease (%)` <dbl>, `Tuberculosis
#   (%)` <dbl>, `Kidney disease (%)` <dbl>, `Digestive diseases (%)` <dbl>, `HIV/AIDS
#   (%)` <dbl>, `Suicide (%)` <dbl>, `Malaria (%)` <dbl>, `Homicide (%)` <dbl>, `Nutritional
#   deficiencies (%)` <dbl>, `Meningitis (%)` <dbl>, `Protein-energy malnutrition (%)` <dbl>,
#   `Drowning (%)` <dbl>, `Maternal deaths (%)` <dbl>, `Parkinson disease (%)` <dbl>, `Alcohol
#   disorders (%)` <dbl>, `Intestinal infectious diseases (%)` <dbl>, `Drug disorders
#   (%)` <dbl>, `Hepatitis (%)` <dbl>, `Fire (%)` <dbl>, `Heat-related (hot and cold exposure)
#   (%)` <dbl>, `Natural disasters (%)` <dbl>, `Conflict (%)` <dbl>, `Terrorism (%)` <dbl>

The dataset is in a wide format, so we’ll need to pivot into a longer, tidy format.

mortality <- mortality %>%
  select_if(~ all(!is.na(.))) %>%
  rename_all(~ sub(" [(]%[)]", "", .)) %>%
  gather(cause_of_death, deaths_per_1k, -sociodemographic_index, -year) %>%
  mutate(deaths_per_1k = 10 * deaths_per_1k,
         cause_of_death = replace(cause_of_death, cause_of_death == "Parkinson disease", "Parkinson's disease"),
         cause_of_death = replace(cause_of_death, cause_of_death == "Heat-related (hot and cold exposure)", "Heat-related"))

mortality
# A tibble: 4,050 x 4
   sociodemographic_index  year cause_of_death          deaths_per_1k
   <chr>                  <dbl> <chr>                           <dbl>
 1 High                    1990 Cardiovascular diseases          427.
 2 High                    1991 Cardiovascular diseases          423.
 3 High                    1992 Cardiovascular diseases          418.
 4 High                    1993 Cardiovascular diseases          414.
 5 High                    1994 Cardiovascular diseases          409.
 6 High                    1995 Cardiovascular diseases          405.
 7 High                    1996 Cardiovascular diseases          401.
 8 High                    1997 Cardiovascular diseases          397.
 9 High                    1998 Cardiovascular diseases          392.
10 High                    1999 Cardiovascular diseases          388.
# … with 4,040 more rows

This part looks a little esoteric at first, but the point of it is to calculate an invisible upper limit on a series of facetted plots.

parse_factor_to_numeric <- function(x) as.numeric(levels(x))[x]

yvals <- mortality %>%
  group_by(cause_of_death) %>%
  summarize(ymax = max(deaths_per_1k),
            yupper = cut(max(deaths_per_1k), c(0, 5, 15, 30, 50, 150, 300, 500), 
                         labels = c(5, 15, 30, 50, 150, 300, 500))) %>%
  mutate(yupper = parse_factor_to_numeric(yupper)) %>%
  arrange(desc(ymax))

mortality <- mortality %>%
  left_join(yvals, by = "cause_of_death") %>%
  mutate(cause_of_death = factor(cause_of_death, levels = yvals$cause_of_death))

mortality
# A tibble: 4,050 x 6
   sociodemographic_index  year cause_of_death          deaths_per_1k  ymax yupper
   <chr>                  <dbl> <fct>                           <dbl> <dbl>  <dbl>
 1 High                    1990 Cardiovascular diseases          427.  459.    500
 2 High                    1991 Cardiovascular diseases          423.  459.    500
 3 High                    1992 Cardiovascular diseases          418.  459.    500
 4 High                    1993 Cardiovascular diseases          414.  459.    500
 5 High                    1994 Cardiovascular diseases          409.  459.    500
 6 High                    1995 Cardiovascular diseases          405.  459.    500
 7 High                    1996 Cardiovascular diseases          401.  459.    500
 8 High                    1997 Cardiovascular diseases          397.  459.    500
 9 High                    1998 Cardiovascular diseases          392.  459.    500
10 High                    1999 Cardiovascular diseases          388.  459.    500
# … with 4,040 more rows

Lookin’ good.

Now we’ll make line graphs with year on the x-axis, deaths per 1,000 people on the y-axis, and lines colored by the socio-demoographic index (SDI) from ‘Low’ to ‘High’.

ggplot(mortality, aes(year, deaths_per_1k, color = sociodemographic_index)) +
  geom_hline(yintercept = 0,  size = 0.3, alpha = 0.5) +
  geom_hline(aes(yintercept = yupper), size = 0.3, alpha = 0) +
  geom_line() +
  facet_wrap(~ cause_of_death, scales = "free_y", ncol = 3) +
  scale_color_manual(values = c("#ffd700", "#fa8775", "#ea5f94", "#cd34b5", "#0000ff")) +
  labs(title = "Global mortality rates by Socio-demographic Index (SDI) from 1990 to 2016",
       subtitle = "",
       caption = "Source: ourworldindata.org | Graphic: nsgrantham.com/global-mortality",
       x = NULL, y = "Deaths per 1,000 people") +
  theme_minimal(base_family = "Fira Sans Extra Condensed Light", base_size = 12) +
  theme(plot.title = element_text(family = "Fira Sans Extra Condensed"),
        legend.position = c(0.33, 1.0358),
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.text = element_text(size = 7),
        panel.grid.major.y = element_blank(),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_line(size = 0.4),
        panel.grid.minor.x = element_blank())

ggsave("global-mortality.png", width = 6.5, height = 12)

Boy, what a heart-warming graphic.