
Week 6: Data Aggregation in R

Introduction

The summarize() function from the dplyr package is a powerful tool for creating summary statistics of your data. It collapses a data frame into a single row or, combined with group_by(), into one row per group of observations. In this tutorial, we'll explore the basic and advanced uses of summarize().

library(tidyverse)
# Read the gapminder data from a Stata (.dta) file with the haven package
gapminder <- haven::read_dta("gapminder.dta")
head(gapminder)

Basic Usage of summarize()

The basic syntax of summarize() is straightforward. You provide it with a dataset and specify the summary statistics you want to compute.

gapminder %>%
  summarize(global_avg_lifeExp = mean(lifeExp, na.rm = TRUE),  # mean over all rows
            n = n())                                           # number of rows
Explanation of na.rm = TRUE

When working with data in R, it's common to encounter missing values (NAs). Most summary functions in R, such as mean(), sum(), and median(), return NA if any of the values being summarized is missing, so a single missing value hides the result for the whole column or group.

To handle this, many R functions include an argument called na.rm. The argument stands for “remove NAs” and is a logical value (TRUE or FALSE). When set to TRUE, the function ignores any NA values and proceeds with the calculation using only the non-missing values.

In today's data we know there are no missing values, so na.rm = TRUE is omitted in most of the examples that follow.
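To see exactly what na.rm changes, here is a minimal sketch with a made-up vector x (hypothetical values, not from gapminder). anyNA() is a quick check before deciding to leave na.rm out; with the real data you could run anyNA(gapminder$lifeExp).

```r
# Hypothetical vector with one missing value
x <- c(4, NA, 8)

mean(x)                # NA: one missing value makes the whole result missing
mean(x, na.rm = TRUE)  # 6: the mean of the two non-missing values
sum(is.na(x))          # 1: how many values are missing
anyNA(x)               # TRUE: at least one value is missing
```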

Grouped Summaries with group_by()

Often, you want to compute summaries for subgroups within your data. This is where group_by() comes into play.

gapminder %>%
  group_by(country) %>%
  summarize(avg_lifeExp = mean(lifeExp),  # one mean per country
            n = n())                      # years observed per country
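The same pattern on a tiny hypothetical panel (the toy tibble below is invented for illustration) makes it easy to check by hand that group_by() plus summarize() returns one row per group:

```r
library(dplyr)

# Hypothetical panel: two countries, each observed in two years
toy <- tibble(
  country = c("A", "A", "B", "B"),
  year    = c(1952, 2007, 1952, 2007),
  lifeExp = c(40, 60, 50, 70)
)

toy_summary <- toy %>%
  group_by(country) %>%
  summarize(avg_lifeExp = mean(lifeExp),  # one mean per country
            n = n())                      # rows (years) per country

toy_summary  # country A: 50 and 2; country B: 60 and 2
```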

Calculate the total population growth for each country over the years (1952-2007).

# Example: Summarizing Population Growth
population_growth <- gapminder %>%
  arrange(country, year) %>%   # first()/last() follow row order, so sort by year
  group_by(country) %>%
  summarize(
    from = first(year),
    pop1952 = first(pop),
    to = last(year),
    pop2007 = last(pop),
    pop_growth = last(pop) - first(pop))

head(population_growth)
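first() and last() simply take whichever rows come first and last in the current row order, so the growth figure is only correct if each country's rows are sorted by year. A minimal sketch with made-up numbers (pop_toy is hypothetical) shows the difference:

```r
library(dplyr)

# Hypothetical rows stored out of chronological order
pop_toy <- tibble(
  country = c("A", "A", "A"),
  year    = c(2007, 1952, 1980),
  pop     = c(300, 100, 200)
)

# Unsorted: first() is the 2007 row and last() is the 1980 row
unsorted <- pop_toy %>%
  group_by(country) %>%
  summarize(pop_growth = last(pop) - first(pop))

# Sorted by year: first() is 1952 and last() is 2007
sorted <- pop_toy %>%
  arrange(country, year) %>%
  group_by(country) %>%
  summarize(pop_growth = last(pop) - first(pop))

unsorted$pop_growth  # -100 (200 - 300): wrong
sorted$pop_growth    # 200 (300 - 100): growth from 1952 to 2007
```

dplyr's first() and last() also accept an order_by argument, e.g. first(pop, order_by = year), as an alternative to sorting the rows.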

Creating Cross-Sectional Data from Longitudinal Data

By summarizing longitudinal data, you can create new cross-sectional datasets for further analysis.

Create a cross-sectional dataset that includes the average life expectancy, average GDP per capita, and population growth for each continent.

cross_sectional_data <- gapminder %>%
  group_by(continent) %>%
  summarize(
    avg_lifeExp = mean(lifeExp),
    avg_gdpPercap = mean(gdpPercap),
    # growth = continent total in the last year minus total in the first year
    pop_growth = sum(pop[year == max(year)]) - sum(pop[year == min(year)])
  )

head(cross_sectional_data)
Why Summarizing Longitudinal Data into Cross-Sectional Data Can Be Useful

Longitudinal data tracks the same subjects (e.g., countries or individuals) across multiple time points. This is useful for analyzing trends over time, but sometimes it is necessary to condense the data into a cross-sectional format, where each subject occupies a single row. Cross-sectional data is either a “snapshot” of each subject at one moment or an aggregation over time, and it is often used for comparative or overview analyses.

Benefits of Summarizing Longitudinal Data:

  1. Simplification: Summarizing longitudinal data into cross-sectional form simplifies the dataset, making it easier to analyze, visualize, or compare.

  2. Comparative Analysis: By reducing data over time into key metrics (like averages, sums, or differences), we can compare entities (e.g., countries, individuals) in a more direct manner.

  3. Data Reduction: Summarizing data reduces the number of rows and complexity, which can be helpful when analyzing or visualizing large datasets.

Advanced Usage

Summarizing with Multiple Grouping Variables

You can summarize data using multiple grouping variables to get more granular insights.

# Example: Average Life Expectancy, GDP per Capita, and Population by Continent and Year
by_continent_year <- gapminder %>%
  group_by(continent, year) %>%
  summarize(
    avg_lifeExp = mean(lifeExp),
    avg_gdpPercap = mean(gdpPercap),
    continent_pop = sum(pop),
    .groups = "drop")   # return an ungrouped result

head(by_continent_year)
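With two grouping variables, summarize() by default removes only the last one (year), so the result stays grouped by continent and dplyr prints a message about it. Leftover grouping silently changes later mutate() or lag() calls. A sketch on hypothetical toy data (the cy tibble is invented) compares the default with .groups = "drop":

```r
library(dplyr)

# Hypothetical continent-year data
cy <- tibble(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(1952, 2007, 1952, 2007),
  lifeExp   = c(42, 70, 64, 78)
)

# Default: only the last grouping variable (year) is dropped
still_grouped <- cy %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp))

# .groups = "drop" removes all grouping
ungrouped <- cy %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp), .groups = "drop")

group_vars(still_grouped)  # "continent"
group_vars(ungrouped)      # character(0)
```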

Counts and proportions of logical values, such as sum(x > 10) or mean(y == 0), are another useful kind of summary. When a logical vector is passed to a numeric function, TRUE is converted to 1 and FALSE to 0. This makes sum() and mean() very useful: for a logical vector x, sum(x) gives the number of TRUEs and mean(x) gives the proportion of TRUEs.

gapminder %>%
  group_by(continent, year) %>%
  summarize(
    # percentage of countries with GDP per capita below 1000
    prop_1000 = mean(gdpPercap < 1000) * 100
  )
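The TRUE-as-1 rule can be checked directly in base R. This is a minimal sketch with a hypothetical vector gdp, not gapminder values:

```r
# Hypothetical GDP per capita values
gdp <- c(500, 1200, 800, 3000)

below <- gdp < 1000   # TRUE FALSE TRUE FALSE
sum(below)            # 2: number of values below 1000
mean(below)           # 0.5: share of values below 1000
mean(below) * 100     # 50: the same share as a percentage
```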

Merging Summaries with Original Data

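You can merge the summarized data back into the original dataset so that each country-year row carries its continent-year averages, which makes comparing a country with its continent straightforward.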

# Example: Merging Average Life Expectancy with Original Data
gapminder_with_summary <- gapminder %>%
  left_join(by_continent_year, by = c("continent","year"))

head(gapminder_with_summary)
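Once the averages are attached, each country's gap from its continent-year average is one mutate() away. The sketch below uses invented data; countries, summary_cy, and gap_from_avg are hypothetical names chosen for illustration:

```r
library(dplyr)

# Hypothetical country rows for a single year
countries <- tibble(
  country   = c("A", "B", "C"),
  continent = c("Asia", "Asia", "Europe"),
  year      = 2007,
  lifeExp   = c(60, 70, 80)
)

# Continent-year averages (Asia 65, Europe 80)
summary_cy <- countries %>%
  group_by(continent, year) %>%
  summarize(avg_lifeExp = mean(lifeExp), .groups = "drop")

# Attach each row's continent-year average, then compute the gap
compared <- countries %>%
  left_join(summary_cy, by = c("continent", "year")) %>%
  mutate(gap_from_avg = lifeExp - avg_lifeExp)

compared$gap_from_avg  # -5 5 0
```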

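Working with Window Functions

Window functions such as lag() return one value per input row: lag(x) shifts x down by one position, so each row receives the previous row's value. The result therefore depends on how the rows are grouped and ordered.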

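# Lag within each country, in year order, so values don't leak across countries
gapminder_with_summary <- gapminder_with_summary %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(lag_avg_GDPpc = lag(avg_gdpPercap)) %>%
  ungroup()

head(gapminder_with_summary)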
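A sketch on a hypothetical two-country panel shows why grouping and sorting matter: the first year of each country gets NA instead of borrowing the previous country's last value. dplyr::lag() is called explicitly because stats::lag() is meant for time-series objects and does not shift a plain vector.

```r
library(dplyr)

# Hypothetical panel: two countries, two years each
panel <- tibble(
  country   = c("A", "A", "B", "B"),
  year      = c(1952, 1957, 1952, 1957),
  gdpPercap = c(100, 110, 200, 220)
)

lagged <- panel %>%
  group_by(country) %>%
  arrange(year, .by_group = TRUE) %>%              # chronological order per country
  mutate(lag_gdpPercap = dplyr::lag(gdpPercap),    # previous year's value
         growth = gdpPercap - lag_gdpPercap) %>%   # change since previous year
  ungroup()

lagged$lag_gdpPercap  # NA 100 NA 200
lagged$growth         # NA 10 NA 20
```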

Reshaping Data to Wide Format

pivot_wider() spreads each year into its own columns, producing one row per continent.

# One row per continent; column names combine variable and year, e.g. avg_lifeExp_1952
by_continent_year_wide <- by_continent_year %>%
  pivot_wider(names_from = year, values_from = c(avg_lifeExp, avg_gdpPercap, continent_pop))

head(by_continent_year_wide)
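The naming rule is easiest to see on a small table. This sketch uses an invented long-format summary (all values hypothetical); with more than one values_from column, pivot_wider() joins the variable name and the year with "_":

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format summary: one row per continent-year
long <- tibble(
  continent     = c("Asia", "Asia", "Europe", "Europe"),
  year          = c(1952, 2007, 1952, 2007),
  avg_lifeExp   = c(46, 70, 64, 78),
  continent_pop = c(1400, 3800, 420, 590)
)

# Each year becomes its own column for every value column
wide <- long %>%
  pivot_wider(names_from = year, values_from = c(avg_lifeExp, continent_pop))

names(wide)  # continent plus avg_lifeExp_1952, avg_lifeExp_2007, continent_pop_1952, continent_pop_2007
```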

Using across() for Summarizing Multiple Columns

The across() helper applies the same summary function to several columns inside a single summarize() call.

# Example: Calculate the mean of multiple numeric columns
gapminder %>%
  group_by(continent) %>%
  summarize(across(c(lifeExp, gdpPercap), mean))
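Instead of listing columns by name, across() also accepts selection helpers. A minimal sketch on a hypothetical toy tibble, using where(is.numeric) to pick every numeric column (grouping columns are excluded automatically):

```r
library(dplyr)

# Hypothetical data
toy <- tibble(
  continent = c("Asia", "Asia", "Europe"),
  lifeExp   = c(60, 70, 80),
  gdpPercap = c(1000, 3000, 20000)
)

means <- toy %>%
  group_by(continent) %>%
  summarize(across(where(is.numeric), mean))  # every numeric, non-grouping column

means  # Asia: 65 and 2000; Europe: 80 and 20000
```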

Applying Multiple Functions with across()

You can also apply several functions to the same columns within a single summarize() call; the .names argument controls the names of the resulting columns.

# Example: Apply the mean and the median to the same columns
gapminder %>%
  group_by(continent) %>%
  summarize(
    across(c(lifeExp,gdpPercap), mean, .names = "avg_{col}"),
    across(c(lifeExp,gdpPercap), median, .names = "median_{col}")
  )
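The two across() calls can also be merged by passing a named list of functions; in .names, {.fn} stands for the function's name in the list and {.col} for the column. A sketch on hypothetical data:

```r
library(dplyr)

# Hypothetical data: Asia has a skewed distribution
toy <- tibble(
  continent = c("Asia", "Asia", "Asia", "Europe"),
  lifeExp   = c(40, 50, 90, 80)
)

stats <- toy %>%
  group_by(continent) %>%
  summarize(across(lifeExp,
                   list(avg = mean, median = median),  # list names feed {.fn}
                   .names = "{.fn}_{.col}"))

names(stats)  # "continent" "avg_lifeExp" "median_lifeExp"
# Asia: avg 60, median 50
```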

Bonus: Mapping Your Data

Make sure you have the necessary packages installed:

# Install once if needed:
# install.packages(c("rnaturalearth", "rnaturalearthdata", "sf"))
library(tidyverse)          # loads ggplot2 and dplyr
library(rnaturalearth)      # ne_countries()
library(rnaturalearthdata)  # map data used by ne_countries()
library(sf)                 # sf objects drawn by geom_sf()

We will summarize the gapminder data by country to calculate the average life expectancy for each country.

# Summarizing data by country
cross_sectional_data <- gapminder %>%
  group_by(country) %>%
  summarize(
    avg_lifeExp = mean(lifeExp, na.rm = TRUE)
  )

Next, use ne_countries() from the rnaturalearth package to load country boundaries as an sf object.

# Getting world map data
world_map <- ne_countries(scale = "medium", returnclass = "sf")

Next, we merge cross_sectional_data (average life expectancy by country) with the world_map dataset. Because world_map stores country names in its name column, we use left_join() to match on country name.

# Merging the country-level life expectancy with the world map.
# left_join() needs key columns of compatible types, so both keys are made character.
cross_sectional_data$country <- as.character(cross_sectional_data$country)
world_map$name <- as.character(world_map$name)
world_map_data <- world_map %>%
  left_join(cross_sectional_data, by = c("name" = "country"))

# Why convert first? Without it, left_join() can stop with an error, because a
# numeric key (such as country codes) cannot be matched against a character key.

Now we can create the map using ggplot2. We will use geom_sf() to plot the map, and scale_fill_viridis_c() to color the countries based on life expectancy.

# Plotting the map: countries without a match in the data are drawn gray
ggplot(data = world_map_data) +
  geom_sf(aes(fill = avg_lifeExp)) +
  scale_fill_viridis_c(option = "plasma", na.value = "gray50") +
  labs(title = "Average Life Expectancy by Country",
       fill = "Life Expectancy") +
  theme_minimal()

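# Why are some countries gray (NA)? A polygon stays gray when its map name has no
# exact match in cross_sectional_data$country, so compare the keys on both sides.
# Note that the country column printed below holds numeric codes, not names; if those
# codes carry Stata value labels, haven::as_factor() should recover the names.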

table(cross_sectional_data$country)
## 
##   1  10 100 101 102 103 104 105 106 107 108 109  11 110 111 112 113 114 115 116 117 118 119  12 120 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 121 122 123 124 125 126 127 128 129  13 130 131 132 133 134 135 136 137 138 139  14 140 141 142  15 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  16  17  18  19   2  20  21  22  23  24  25  26  27  28  29   3  30  31  32  33  34  35  36  37  38 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  39   4  40  41  42  43  44  45  46  47  48  49   5  50  51  52  53  54  55  56  57  58  59   6  60 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  61  62  63  64  65  66  67  68  69   7  70  71  72  73  74  75  76  77  78  79   8  80  81  82  83 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
##  84  85  86  87  88  89   9  90  91  92  93  94  95  96  97  98  99 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1
table(world_map$name)
## 
##               Afghanistan                     Åland                   Albania 
##                         1                         1                         1 
##                   Algeria            American Samoa                   Andorra 
##                         1                         1                         1 
##                    Angola                  Anguilla                Antarctica 
##                         1                         1                         1 
##         Antigua and Barb.                 Argentina                   Armenia 
##                         1                         1                         1 
##                     Aruba   Ashmore and Cartier Is.                 Australia 
##                         1                         1                         1 
##                   Austria                Azerbaijan                   Bahamas 
##                         1                         1                         1 
##                   Bahrain                Bangladesh                  Barbados 
##                         1                         1                         1 
##                   Belarus                   Belgium                    Belize 
##                         1                         1                         1 
##                     Benin                   Bermuda                    Bhutan 
##                         1                         1                         1 
##                   Bolivia          Bosnia and Herz.                  Botswana 
##                         1                         1                         1 
##     Br. Indian Ocean Ter.                    Brazil        British Virgin Is. 
##                         1                         1                         1 
##                    Brunei                  Bulgaria              Burkina Faso 
##                         1                         1                         1 
##                   Burundi                Cabo Verde                  Cambodia 
##                         1                         1                         1 
##                  Cameroon                    Canada                Cayman Is. 
##                         1                         1                         1 
##      Central African Rep.                      Chad                     Chile 
##                         1                         1                         1 
##                     China                  Colombia                   Comoros 
##                         1                         1                         1 
##                     Congo                  Cook Is.                Costa Rica 
##                         1                         1                         1 
##             Côte d'Ivoire                   Croatia                      Cuba 
##                         1                         1                         1 
##                   Curaçao                    Cyprus                   Czechia 
##                         1                         1                         1 
##           Dem. Rep. Congo                   Denmark                  Djibouti 
##                         1                         1                         1 
##                  Dominica            Dominican Rep.                   Ecuador 
##                         1                         1                         1 
##                     Egypt               El Salvador                Eq. Guinea 
##                         1                         1                         1 
##                   Eritrea                   Estonia                  eSwatini 
##                         1                         1                         1 
##                  Ethiopia                Faeroe Is.              Falkland Is. 
##                         1                         1                         1 
##                      Fiji                   Finland             Fr. Polynesia 
##                         1                         1                         1 
##    Fr. S. Antarctic Lands                    France                     Gabon 
##                         1                         1                         1 
##                    Gambia                   Georgia                   Germany 
##                         1                         1                         1 
##                     Ghana                    Greece                 Greenland 
##                         1                         1                         1 
##                   Grenada                      Guam                 Guatemala 
##                         1                         1                         1 
##                  Guernsey                    Guinea             Guinea-Bissau 
##                         1                         1                         1 
##                    Guyana                     Haiti Heard I. and McDonald Is. 
##                         1                         1                         1 
##                  Honduras                 Hong Kong                   Hungary 
##                         1                         1                         1 
##                   Iceland                     India         Indian Ocean Ter. 
##                         1                         1                         1 
##                 Indonesia                      Iran                      Iraq 
##                         1                         1                         1 
##                   Ireland               Isle of Man                    Israel 
##                         1                         1                         1 
##                     Italy                   Jamaica                     Japan 
##                         1                         1                         1 
##                    Jersey                    Jordan                Kazakhstan 
##                         1                         1                         1 
##                     Kenya                  Kiribati                    Kosovo 
##                         1                         1                         1 
##                    Kuwait                Kyrgyzstan                      Laos 
##                         1                         1                         1 
##                    Latvia                   Lebanon                   Lesotho 
##                         1                         1                         1 
##                   Liberia                     Libya             Liechtenstein 
##                         1                         1                         1 
##                 Lithuania                Luxembourg                     Macao 
##                         1                         1                         1 
##                Madagascar                    Malawi                  Malaysia 
##                         1                         1                         1 
##                  Maldives                      Mali                     Malta 
##                         1                         1                         1 
##              Marshall Is.                Mauritania                 Mauritius 
##                         1                         1                         1 
##                    Mexico                Micronesia                   Moldova 
##                         1                         1                         1 
##                    Monaco                  Mongolia                Montenegro 
##                         1                         1                         1 
##                Montserrat                   Morocco                Mozambique 
##                         1                         1                         1 
##                   Myanmar                 N. Cyprus            N. Mariana Is. 
##                         1                         1                         1 
##                   Namibia                     Nauru                     Nepal 
##                         1                         1                         1 
##               Netherlands             New Caledonia               New Zealand 
##                         1                         1                         1 
##                 Nicaragua                     Niger                   Nigeria 
##                         1                         1                         1 
##                      Niue            Norfolk Island               North Korea 
##                         1                         1                         1 
##           North Macedonia                    Norway                      Oman 
##                         1                         1                         1 
##                  Pakistan                     Palau                 Palestine 
##                         1                         1                         1 
##                    Panama          Papua New Guinea                  Paraguay 
##                         1                         1                         1 
##                      Peru               Philippines              Pitcairn Is. 
##                         1                         1                         1 
##                    Poland                  Portugal               Puerto Rico 
##                         1                         1                         1 
##                     Qatar                   Romania                    Russia 
##                         1                         1                         1 
##                    Rwanda       S. Geo. and the Is.                  S. Sudan 
##                         1                         1                         1 
##              Saint Helena               Saint Lucia                     Samoa 
##                         1                         1                         1 
##                San Marino     São Tomé and Principe              Saudi Arabia 
##                         1                         1                         1 
##                   Senegal                    Serbia                Seychelles 
##                         1                         1                         1 
##           Siachen Glacier              Sierra Leone                 Singapore 
##                         1                         1                         1 
##              Sint Maarten                  Slovakia                  Slovenia 
##                         1                         1                         1 
##               Solomon Is.                   Somalia                Somaliland 
##                         1                         1                         1 
##              South Africa               South Korea                     Spain 
##                         1                         1                         1 
##                 Sri Lanka             St-Barthélemy                 St-Martin 
##                         1                         1                         1 
##       St. Kitts and Nevis   St. Pierre and Miquelon        St. Vin. and Gren. 
##                         1                         1                         1 
##                     Sudan                  Suriname                    Sweden 
##                         1                         1                         1 
##               Switzerland                     Syria                    Taiwan 
##                         1                         1                         1 
##                Tajikistan                  Tanzania                  Thailand 
##                         1                         1                         1 
##               Timor-Leste                      Togo                     Tonga 
##                         1                         1                         1 
##       Trinidad and Tobago                   Tunisia                    Turkey 
##                         1                         1                         1 
##              Turkmenistan      Turks and Caicos Is.                    Tuvalu 
##                         1                         1                         1 
##           U.S. Virgin Is.                    Uganda                   Ukraine 
##                         1                         1                         1 
##      United Arab Emirates            United Kingdom  United States of America 
##                         1                         1                         1 
##                   Uruguay                Uzbekistan                   Vanuatu 
##                         1                         1                         1 
##                   Vatican                 Venezuela                   Vietnam 
##                         1                         1                         1 
##                 W. Sahara     Wallis and Futuna Is.                     Yemen 
##                         1                         1                         1 
##                    Zambia                  Zimbabwe 
##                         1                         1
# Recode the map's name to the spelling used in the gapminder data
world_map <- world_map %>%
  mutate(
    name = case_when(
      name == "United States of America" ~ "United States",
      TRUE ~ name
    ))

table(world_map$name)
## 
##               Afghanistan                     Åland                   Albania 
##                         1                         1                         1 
##                   Algeria            American Samoa                   Andorra 
##                         1                         1                         1 
##                    Angola                  Anguilla                Antarctica 
##                         1                         1                         1 
##         Antigua and Barb.                 Argentina                   Armenia 
##                         1                         1                         1 
##                     Aruba   Ashmore and Cartier Is.                 Australia 
##                         1                         1                         1 
##                   Austria                Azerbaijan                   Bahamas 
##                         1                         1                         1 
##                   Bahrain                Bangladesh                  Barbados 
##                         1                         1                         1 
##                   Belarus                   Belgium                    Belize 
##                         1                         1                         1 
##                     Benin                   Bermuda                    Bhutan 
##                         1                         1                         1 
##                   Bolivia          Bosnia and Herz.                  Botswana 
##                         1                         1                         1 
##     Br. Indian Ocean Ter.                    Brazil        British Virgin Is. 
##                         1                         1                         1 
##                    Brunei                  Bulgaria              Burkina Faso 
##                         1                         1                         1 
##                   Burundi                Cabo Verde                  Cambodia 
##                         1                         1                         1 
##                  Cameroon                    Canada                Cayman Is. 
##                         1                         1                         1 
##      Central African Rep.                      Chad                     Chile 
##                         1                         1                         1 
##                     China                  Colombia                   Comoros 
##                         1                         1                         1 
##                     Congo                  Cook Is.                Costa Rica 
##                         1                         1                         1 
##             Côte d'Ivoire                   Croatia                      Cuba 
##                         1                         1                         1 
##                   Curaçao                    Cyprus                   Czechia 
##                         1                         1                         1 
##           Dem. Rep. Congo                   Denmark                  Djibouti 
##                         1                         1                         1 
##                  Dominica            Dominican Rep.                   Ecuador 
##                         1                         1                         1 
##                     Egypt               El Salvador                Eq. Guinea 
##                         1                         1                         1 
##                   Eritrea                   Estonia                  eSwatini 
##                         1                         1                         1 
##                  Ethiopia                Faeroe Is.              Falkland Is. 
##                         1                         1                         1 
##                      Fiji                   Finland             Fr. Polynesia 
##                         1                         1                         1 
##    Fr. S. Antarctic Lands                    France                     Gabon 
##                         1                         1                         1 
##                    Gambia                   Georgia                   Germany 
##                         1                         1                         1 
##                     Ghana                    Greece                 Greenland 
##                         1                         1                         1 
##                   Grenada                      Guam                 Guatemala 
##                         1                         1                         1 
##                  Guernsey                    Guinea             Guinea-Bissau 
##                         1                         1                         1 
##                    Guyana                     Haiti Heard I. and McDonald Is. 
##                         1                         1                         1 
##                  Honduras                 Hong Kong                   Hungary 
##                         1                         1                         1 
##                   Iceland                     India         Indian Ocean Ter. 
##                         1                         1                         1 
##                 Indonesia                      Iran                      Iraq 
##                         1                         1                         1 
##                   Ireland               Isle of Man                    Israel 
##                         1                         1                         1 
##                     Italy                   Jamaica                     Japan 
##                         1                         1                         1 
##                    Jersey                    Jordan                Kazakhstan 
##                         1                         1                         1 
##                     Kenya                  Kiribati                    Kosovo 
##                         1                         1                         1 
##                    Kuwait                Kyrgyzstan                      Laos 
##                         1                         1                         1 
##                    Latvia                   Lebanon                   Lesotho 
##                         1                         1                         1 
##                   Liberia                     Libya             Liechtenstein 
##                         1                         1                         1 
##                 Lithuania                Luxembourg                     Macao 
##                         1                         1                         1 
##                Madagascar                    Malawi                  Malaysia 
##                         1                         1                         1 
##                  Maldives                      Mali                     Malta 
##                         1                         1                         1 
##              Marshall Is.                Mauritania                 Mauritius 
##                         1                         1                         1 
##                    Mexico                Micronesia                   Moldova 
##                         1                         1                         1 
##                    Monaco                  Mongolia                Montenegro 
##                         1                         1                         1 
##                Montserrat                   Morocco                Mozambique 
##                         1                         1                         1 
##                   Myanmar                 N. Cyprus            N. Mariana Is. 
##                         1                         1                         1 
##                   Namibia                     Nauru                     Nepal 
##                         1                         1                         1 
##               Netherlands             New Caledonia               New Zealand 
##                         1                         1                         1 
##                 Nicaragua                     Niger                   Nigeria 
##                         1                         1                         1 
##                      Niue            Norfolk Island               North Korea 
##                         1                         1                         1 
##           North Macedonia                    Norway                      Oman 
##                         1                         1                         1 
##                  Pakistan                     Palau                 Palestine 
##                         1                         1                         1 
##                    Panama          Papua New Guinea                  Paraguay 
##                         1                         1                         1 
##                      Peru               Philippines              Pitcairn Is. 
##                         1                         1                         1 
##                    Poland                  Portugal               Puerto Rico 
##                         1                         1                         1 
##                     Qatar                   Romania                    Russia 
##                         1                         1                         1 
##                    Rwanda       S. Geo. and the Is.                  S. Sudan 
##                         1                         1                         1 
##              Saint Helena               Saint Lucia                     Samoa 
##                         1                         1                         1 
##                San Marino     São Tomé and Principe              Saudi Arabia 
##                         1                         1                         1 
##                   Senegal                    Serbia                Seychelles 
##                         1                         1                         1 
##           Siachen Glacier              Sierra Leone                 Singapore 
##                         1                         1                         1 
##              Sint Maarten                  Slovakia                  Slovenia 
##                         1                         1                         1 
##               Solomon Is.                   Somalia                Somaliland 
##                         1                         1                         1 
##              South Africa               South Korea                     Spain 
##                         1                         1                         1 
##                 Sri Lanka             St-Barthélemy                 St-Martin 
##                         1                         1                         1 
##       St. Kitts and Nevis   St. Pierre and Miquelon        St. Vin. and Gren. 
##                         1                         1                         1 
##                     Sudan                  Suriname                    Sweden 
##                         1                         1                         1 
##               Switzerland                     Syria                    Taiwan 
##                         1                         1                         1 
##                Tajikistan                  Tanzania                  Thailand 
##                         1                         1                         1 
##               Timor-Leste                      Togo                     Tonga 
##                         1                         1                         1 
##       Trinidad and Tobago                   Tunisia                    Turkey 
##                         1                         1                         1 
##              Turkmenistan      Turks and Caicos Is.                    Tuvalu 
##                         1                         1                         1 
##           U.S. Virgin Is.                    Uganda                   Ukraine 
##                         1                         1                         1 
##      United Arab Emirates            United Kingdom             United States 
##                         1                         1                         1 
##                   Uruguay                Uzbekistan                   Vanuatu 
##                         1                         1                         1 
##                   Vatican                 Venezuela                   Vietnam 
##                         1                         1                         1 
##                 W. Sahara     Wallis and Futuna Is.                     Yemen 
##                         1                         1                         1 
##                    Zambia                  Zimbabwe 
##                         1                         1
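Scanning two long tables by eye is error-prone. anti_join() lists the map rows that have no partner in the data, i.e. the polygons that will be drawn gray. The sketch below uses hypothetical key tables (map_names and data_names, with invented spellings and values) and then recodes the mismatches with case_when():

```r
library(dplyr)

# Hypothetical keys as spelled in a map table and in a data table
map_names  <- tibble(name = c("United States of America", "France", "Dem. Rep. Congo"))
data_names <- tibble(country = c("United States", "France", "Congo, Dem. Rep."),
                     avg_lifeExp = c(73, 74, 45))

# Map rows with no matching country in the data
unmatched <- anti_join(map_names, data_names, by = c("name" = "country"))
unmatched$name  # "United States of America" "Dem. Rep. Congo"

# Recode the map spellings to the data's spellings, then re-check
map_names <- map_names %>%
  mutate(name = case_when(
    name == "United States of America" ~ "United States",
    name == "Dem. Rep. Congo"          ~ "Congo, Dem. Rep.",
    TRUE                               ~ name
  ))
remaining <- anti_join(map_names, data_names, by = c("name" = "country"))
nrow(remaining)  # 0
```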