Session 6: Reporting Tables and Results

Week 7 Reporting Tables and Results

Introduction: `stargazer`

`stargazer` is one of the most commonly used R packages for producing publication-quality summary and regression tables. It can export tables as LaTeX, HTML, or plain text, and the HTML output can be opened in or copied to Word.

Installation:

# install.packages("stargazer")  # Run once if the package is not installed

# Load necessary libraries
library(stargazer)
library(tidyverse)

Example: Summary Table

You can summarize the key variables used in the regression models to give a sense of the sample distribution. This step is important for providing context before presenting regression results.

Passing a data frame to `stargazer()` produces a summary table that reports the number of observations (N), mean, standard deviation (St. Dev.), minimum (Min), and maximum (Max) for each numeric variable.

# Example data
data(mtcars)

# Summary table
stargazer(mtcars, type = "text")  # You can change 'text' to 'html' or 'latex' to suit your output needs

## 
## ============================================
## Statistic N   Mean   St. Dev.  Min     Max  
## --------------------------------------------
## mpg       32 20.091   6.027   10.400 33.900 
## cyl       32  6.188   1.786     4       8   
## disp      32 230.722 123.939  71.100 472.000
## hp        32 146.688  68.563    52     335  
## drat      32  3.597   0.535   2.760   4.930 
## wt        32  3.217   0.978   1.513   5.424 
## qsec      32 17.849   1.787   14.500 22.900 
## vs        32  0.438   0.504     0       1   
## am        32  0.406   0.499     0       1   
## gear      32  3.688   0.738     3       5   
## carb      32  2.812   1.615     1       8   
## --------------------------------------------

# Summary table with selected variables
stargazer(mtcars[,c("cyl","disp","hp")], type = "text")  # You can change 'text' to 'html' or 'latex' to suit your output needs

## 
## ============================================
## Statistic N   Mean   St. Dev.  Min     Max  
## --------------------------------------------
## cyl       32  6.188   1.786     4       8   
## disp      32 230.722 123.939  71.100 472.000
## hp        32 146.688  68.563    52     335  
## --------------------------------------------

# Same table, selecting the variables with a dplyr pipeline
mtcars %>%
  select(cyl, disp, hp) %>%
  stargazer(type = "text")

## 
## ============================================
## Statistic N   Mean   St. Dev.  Min     Max  
## --------------------------------------------
## cyl       32  6.188   1.786     4       8   
## disp      32 230.722 123.939  71.100 472.000
## hp        32 146.688  68.563    52     335  
## --------------------------------------------
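The reported statistics can also be changed. As a minimal sketch, assuming the `summary.stat` and `digits` arguments behave as described in the `stargazer` documentation, the call below adds the median and rounds to two decimals. The choice of variables is only illustrative.

```r
library(stargazer)

# Choose which statistics to report and how many decimals to show
stargazer(mtcars[, c("mpg", "cyl", "disp", "hp")],
          type = "text",
          summary.stat = c("n", "mean", "median", "sd"),
          digits = 2)
```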

Example: Regression Table

# Fit two nested regression models
model1 <- lm(mpg ~ cyl + disp, data = mtcars)
model2 <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Regression table with both models side by side
stargazer(model1, model2, type = "text")  # Change 'text' to 'html' for Word compatibility

## 
## =================================================================
##                                  Dependent variable:             
##                     ---------------------------------------------
##                                          mpg                     
##                              (1)                    (2)          
## -----------------------------------------------------------------
## cyl                        -1.587**                -1.227        
##                            (0.712)                (0.797)        
##                                                                  
## disp                       -0.021*                -0.019*        
##                            (0.010)                (0.010)        
##                                                                  
## hp                                                 -0.015        
##                                                   (0.015)        
##                                                                  
## Constant                  34.661***              34.185***       
##                            (2.547)                (2.591)        
##                                                                  
## -----------------------------------------------------------------
## Observations                  32                     32          
## R2                          0.760                  0.768         
## Adjusted R2                 0.743                  0.743         
## Residual Std. Error    3.055 (df = 29)        3.055 (df = 28)    
## F Statistic         45.808*** (df = 2; 29) 30.877*** (df = 3; 28)
## =================================================================
## Note:                                 *p<0.1; **p<0.05; ***p<0.01
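The stars in the table are derived from each coefficient's p-value. The sketch below, using only base R, recomputes them for the first model with the default cutoffs shown in the table note. The `cut()` mapping is an illustration of the rule, not `stargazer`'s internal code.

```r
# Refit the first model from the example above
model1 <- lm(mpg ~ cyl + disp, data = mtcars)

# Coefficient matrix: estimate, standard error, t value, p-value
coefs <- summary(model1)$coefficients
p_values <- coefs[, "Pr(>|t|)"]

# Map p-values to the default stars (*p<0.1; **p<0.05; ***p<0.01)
stars <- cut(p_values,
             breaks = c(0, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ""),
             include.lowest = TRUE, right = FALSE)

data.frame(term = rownames(coefs),
           estimate = round(coefs[, "Estimate"], 3),
           p_value = round(p_values, 3),
           stars = stars,
           row.names = NULL)
```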

Exporting to Word or PDF

# Export to a file Word can open: HTML content saved with a .doc extension
stargazer(mtcars, type = "html", out = "summary_table.doc")

## 
## <table style="text-align:center"><tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Statistic</td><td>N</td><td>Mean</td><td>St. Dev.</td><td>Min</td><td>Max</td></tr>
## <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">mpg</td><td>32</td><td>20.091</td><td>6.027</td><td>10.400</td><td>33.900</td></tr>
## <tr><td style="text-align:left">cyl</td><td>32</td><td>6.188</td><td>1.786</td><td>4</td><td>8</td></tr>
## <tr><td style="text-align:left">disp</td><td>32</td><td>230.722</td><td>123.939</td><td>71.100</td><td>472.000</td></tr>
## <tr><td style="text-align:left">hp</td><td>32</td><td>146.688</td><td>68.563</td><td>52</td><td>335</td></tr>
## <tr><td style="text-align:left">drat</td><td>32</td><td>3.597</td><td>0.535</td><td>2.760</td><td>4.930</td></tr>
## <tr><td style="text-align:left">wt</td><td>32</td><td>3.217</td><td>0.978</td><td>1.513</td><td>5.424</td></tr>
## <tr><td style="text-align:left">qsec</td><td>32</td><td>17.849</td><td>1.787</td><td>14.500</td><td>22.900</td></tr>
## <tr><td style="text-align:left">vs</td><td>32</td><td>0.438</td><td>0.504</td><td>0</td><td>1</td></tr>
## <tr><td style="text-align:left">am</td><td>32</td><td>0.406</td><td>0.499</td><td>0</td><td>1</td></tr>
## <tr><td style="text-align:left">gear</td><td>32</td><td>3.688</td><td>0.738</td><td>3</td><td>5</td></tr>
## <tr><td style="text-align:left">carb</td><td>32</td><td>2.812</td><td>1.615</td><td>1</td><td>8</td></tr>
## <tr><td colspan="6" style="border-bottom: 1px solid black"></td></tr></table>

# Export to HTML, then convert the HTML file to PDF with pandoc
stargazer(model1, model2, type = "html", out = "regression_table.html")

## 
## <table style="text-align:center"><tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="2"><em>Dependent variable:</em></td></tr>
## <tr><td></td><td colspan="2" style="border-bottom: 1px solid black"></td></tr>
## <tr><td style="text-align:left"></td><td colspan="2">mpg</td></tr>
## <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td></tr>
## <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">cyl</td><td>-1.587<sup>**</sup></td><td>-1.227</td></tr>
## <tr><td style="text-align:left"></td><td>(0.712)</td><td>(0.797)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td></tr>
## <tr><td style="text-align:left">disp</td><td>-0.021<sup>*</sup></td><td>-0.019<sup>*</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.010)</td><td>(0.010)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td></tr>
## <tr><td style="text-align:left">hp</td><td></td><td>-0.015</td></tr>
## <tr><td style="text-align:left"></td><td></td><td>(0.015)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Constant</td><td>34.661<sup>***</sup></td><td>34.185<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(2.547)</td><td>(2.591)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td></tr>
## <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Observations</td><td>32</td><td>32</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.760</td><td>0.768</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.743</td><td>0.743</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>3.055 (df = 29)</td><td>3.055 (df = 28)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>45.808<sup>***</sup> (df = 2; 29)</td><td>30.877<sup>***</sup> (df = 3; 28)</td></tr>
## <tr><td colspan="3" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Note:</em></td><td colspan="2" style="text-align:right"><sup>*</sup>p<0.1; <sup>**</sup>p<0.05; <sup>***</sup>p<0.01</td></tr>
## </table>

# Convert the HTML table to PDF only if pandoc is installed
if (nzchar(Sys.which("pandoc"))) {
  system("pandoc -s regression_table.html -o regression_table.pdf")
} else {
  message("pandoc not available in this runtime; skipping PDF conversion.")
}
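If a LaTeX installation is available, another route to PDF is to write the table as LaTeX and pull it into a document with `\input{}`. The sketch below writes to a temporary file. The path is a placeholder, and any `.tex` path would work the same way.

```r
library(stargazer)

model1 <- lm(mpg ~ cyl + disp, data = mtcars)
model2 <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Write the table to a temporary .tex file (placeholder location)
tex_file <- tempfile(fileext = ".tex")
stargazer(model1, model2, type = "latex", out = tex_file)

# Confirm that the file was written
file.exists(tex_file)
```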

Exercise: Wages

# Install necessary packages (run once if they are not installed)
# install.packages("stargazer")
# install.packages("Ecdat")     # Contains the Wages dataset

# Load required libraries
library(stargazer)
library(Ecdat)

# Load the wages dataset
data(Wages)

head(Wages)

Sample Statistics

Question: Why are we missing some variables in the table? How would you fix it?

# Create a summary table of sample statistics
stargazer(Wages, type = "text", title = "Sample Statistics for Wages Dataset",
          summary.stat = c("mean", "sd", "min", "max", "n"))

## 
## Sample Statistics for Wages Dataset
## ===========================================
## Statistic  Mean  St. Dev.  Min   Max    N  
## -------------------------------------------
## exp       19.854  10.966    1    51   4,165
## wks       46.812  5.129     5    52   4,165
## ind       0.395   0.489     0     1   4,165
## ed        12.845  2.788     4    17   4,165
## lwage     6.676   0.462   4.605 8.537 4,165
## -------------------------------------------

class(Wages$sex)

## [1] "factor"

# Convert the factor to its numeric level codes (1, 2) so that stargazer treats it as numeric
Wages$sex <- as.numeric(Wages$sex)

class(Wages$sex)

## [1] "numeric"
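Note that `as.numeric()` on a factor returns the internal level codes (1, 2, ...), not a 0/1 indicator. The slope estimate is unaffected because the two codes still differ by one, but the intercept shifts, and a label such as "Male = 1" is only literally accurate under 0/1 coding. A minimal sketch on a hypothetical toy factor (not the `Wages` data) shows two ways to build a 0/1 dummy:

```r
# Hypothetical toy factor standing in for a two-level variable
sex_toy <- factor(c("female", "male", "male", "female"))

# as.numeric() returns the internal level codes, here 1 and 2
codes <- as.numeric(sex_toy)

# Subtracting 1 gives a 0/1 dummy in which the second level is coded 1
dummy <- as.numeric(sex_toy) - 1

# Equivalent and more explicit: name the level that should equal 1
male_dummy <- as.integer(sex_toy == "male")

data.frame(sex_toy, codes, dummy, male_dummy)
```

Checking `levels()` on the real variable before converting shows which category ends up coded as 1.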

Regression Tables

# Fit multiple regression models using the wages dataset
model1 <- lm(lwage ~ ed + exp, data = Wages)
model2 <- lm(lwage ~ ed + exp + sex, data = Wages)
model3 <- lm(lwage ~ ed + exp + sex + union, data = Wages)

# Create a basic regression table with multiple models quickly in console
stargazer(model1, model2, model3, type = "text")

## 
## =================================================================================================
##                                                  Dependent variable:                             
##                     -----------------------------------------------------------------------------
##                                                         lwage                                    
##                                (1)                       (2)                       (3)           
## -------------------------------------------------------------------------------------------------
## ed                          0.076***                  0.075***                  0.079***         
##                              (0.002)                   (0.002)                   (0.002)         
##                                                                                                  
## exp                         0.013***                  0.012***                  0.012***         
##                              (0.001)                   (0.001)                   (0.001)         
##                                                                                                  
## sex                                                   0.436***                  0.421***         
##                                                        (0.019)                   (0.019)         
##                                                                                                  
## unionyes                                                                        0.085***         
##                                                                                  (0.013)         
##                                                                                                  
## Constant                    5.436***                  4.652***                  4.597***         
##                              (0.034)                   (0.046)                   (0.047)         
##                                                                                                  
## -------------------------------------------------------------------------------------------------
## Observations                  4,165                     4,165                     4,165          
## R2                            0.247                     0.335                     0.342          
## Adjusted R2                   0.246                     0.335                     0.342          
## Residual Std. Error     0.401 (df = 4162)         0.376 (df = 4161)         0.374 (df = 4160)    
## F Statistic         681.552*** (df = 2; 4162) 698.837*** (df = 3; 4161) 541.007*** (df = 4; 4160)
## =================================================================================================
## Note:                                                                 *p<0.1; **p<0.05; ***p<0.01

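# You can also add more details: a style, a title, and custom labels and notes
# Note: lwage is the logarithm of the wage, so a label such as "Log Wage" would describe it more accurately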
stargazer(model1, model2, model3, type = "text",
          style = 'aer',
          title = "Basic Regression Results Using Wages Dataset",
          column.labels = c("Model 1", "Model 2", "Model 3"),
          dep.var.labels = "Hourly Wage",
          covariate.labels = c("Years of Education", "Years of Experience", "Gender (Male = 1)", "Union Membership (Yes = 1)"),
          notes = "Standard errors in parentheses")

## 
## Basic Regression Results Using Wages Dataset
## ========================================================================================================
##                                                             Hourly Wage                                 
##                                     Model 1                   Model 2                   Model 3         
##                                       (1)                       (2)                       (3)           
## --------------------------------------------------------------------------------------------------------
## Years of Education                 0.076***                  0.075***                  0.079***         
##                                     (0.002)                   (0.002)                   (0.002)         
##                                                                                                         
## Years of Experience                0.013***                  0.012***                  0.012***         
##                                     (0.001)                   (0.001)                   (0.001)         
##                                                                                                         
## Gender (Male = 1)                                            0.436***                  0.421***         
##                                                               (0.019)                   (0.019)         
##                                                                                                         
## Union Membership (Yes = 1)                                                             0.085***         
##                                                                                         (0.013)         
##                                                                                                         
## Constant                           5.436***                  4.652***                  4.597***         
##                                     (0.034)                   (0.046)                   (0.047)         
##                                                                                                         
## Observations                         4,165                     4,165                     4,165          
## R2                                   0.247                     0.335                     0.342          
## Adjusted R2                          0.246                     0.335                     0.342          
## Residual Std. Error            0.401 (df = 4162)         0.376 (df = 4161)         0.374 (df = 4160)    
## F Statistic                681.552*** (df = 2; 4162) 698.837*** (df = 3; 4161) 541.007*** (df = 4; 4160)
## --------------------------------------------------------------------------------------------------------
## Notes:                     ***Significant at the 1 percent level.                                       
##                            **Significant at the 5 percent level.                                        
##                            *Significant at the 10 percent level.                                        
##                            Standard errors in parentheses

# Export the same table to a file Word can open
stargazer(model1, model2, model3, type = "html", out = "3models.doc",
          style = "aer",
          title = "Basic Regression Results Using Wages Dataset",
          column.labels = c("Model 1", "Model 2", "Model 3"),
          dep.var.labels = "Hourly Wage",
          covariate.labels = c("Years of Education", "Years of Experience", "Gender (Male = 1)", "Union Membership (Yes = 1)"),
          notes = "Standard errors in parentheses")

## 
## <table style="text-align:center"><caption><strong>Basic Regression Results Using Wages Dataset</strong></caption>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"></td><td colspan="3">Hourly Wage</td></tr>
## <tr><td style="text-align:left"></td><td>Model 1</td><td>Model 2</td><td>Model 3</td></tr>
## <tr><td style="text-align:left"></td><td>(1)</td><td>(2)</td><td>(3)</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left">Years of Education</td><td>0.076<sup>***</sup></td><td>0.075<sup>***</sup></td><td>0.079<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.002)</td><td>(0.002)</td><td>(0.002)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Years of Experience</td><td>0.013<sup>***</sup></td><td>0.012<sup>***</sup></td><td>0.012<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.001)</td><td>(0.001)</td><td>(0.001)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Gender (Male = 1)</td><td></td><td>0.436<sup>***</sup></td><td>0.421<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td></td><td>(0.019)</td><td>(0.019)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Union Membership (Yes = 1)</td><td></td><td></td><td>0.085<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td>(0.013)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Constant</td><td>5.436<sup>***</sup></td><td>4.652<sup>***</sup></td><td>4.597<sup>***</sup></td></tr>
## <tr><td style="text-align:left"></td><td>(0.034)</td><td>(0.046)</td><td>(0.047)</td></tr>
## <tr><td style="text-align:left"></td><td></td><td></td><td></td></tr>
## <tr><td style="text-align:left">Observations</td><td>4,165</td><td>4,165</td><td>4,165</td></tr>
## <tr><td style="text-align:left">R<sup>2</sup></td><td>0.247</td><td>0.335</td><td>0.342</td></tr>
## <tr><td style="text-align:left">Adjusted R<sup>2</sup></td><td>0.246</td><td>0.335</td><td>0.342</td></tr>
## <tr><td style="text-align:left">Residual Std. Error</td><td>0.401 (df = 4162)</td><td>0.376 (df = 4161)</td><td>0.374 (df = 4160)</td></tr>
## <tr><td style="text-align:left">F Statistic</td><td>681.552<sup>***</sup> (df = 2; 4162)</td><td>698.837<sup>***</sup> (df = 3; 4161)</td><td>541.007<sup>***</sup> (df = 4; 4160)</td></tr>
## <tr><td colspan="4" style="border-bottom: 1px solid black"></td></tr><tr><td style="text-align:left"><em>Notes:</em></td><td colspan="3" style="text-align:left"><sup>***</sup>Significant at the 1 percent level.</td></tr>
## <tr><td style="text-align:left"></td><td colspan="3" style="text-align:left"><sup>**</sup>Significant at the 5 percent level.</td></tr>
## <tr><td style="text-align:left"></td><td colspan="3" style="text-align:left"><sup>*</sup>Significant at the 10 percent level.</td></tr>
## <tr><td style="text-align:left"></td><td colspan="3" style="text-align:left">Standard errors in parentheses</td></tr>
## </table>

Extra

Including Robust Standard Errors:

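This section shows how to report heteroskedasticity-robust standard errors in a `stargazer` table. Clustered standard errors can be supplied in the same way, through the `se` argument.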

library(sandwich)
robust_se <- list(sqrt(diag(vcovHC(model1, type = "HC1"))),
                  sqrt(diag(vcovHC(model2, type = "HC1"))),
                  sqrt(diag(vcovHC(model3, type = "HC1"))))

stargazer(model1, model2, model3, type = "text", 
          se = robust_se, 
          title = "Regression with Robust Standard Errors")

## 
## Regression with Robust Standard Errors
## =================================================================================================
##                                                  Dependent variable:                             
##                     -----------------------------------------------------------------------------
##                                                         lwage                                    
##                                (1)                       (2)                       (3)           
## -------------------------------------------------------------------------------------------------
## ed                          0.076***                  0.075***                  0.079***         
##                              (0.002)                   (0.002)                   (0.002)         
##                                                                                                  
## exp                         0.013***                  0.012***                  0.012***         
##                              (0.001)                   (0.001)                   (0.001)         
##                                                                                                  
## sex                                                   0.436***                  0.421***         
##                                                        (0.018)                   (0.018)         
##                                                                                                  
## unionyes                                                                        0.085***         
##                                                                                  (0.012)         
##                                                                                                  
## Constant                    5.436***                  4.652***                  4.597***         
##                              (0.037)                   (0.046)                   (0.046)         
##                                                                                                  
## -------------------------------------------------------------------------------------------------
## Observations                  4,165                     4,165                     4,165          
## R2                            0.247                     0.335                     0.342          
## Adjusted R2                   0.246                     0.335                     0.342          
## Residual Std. Error     0.401 (df = 4162)         0.376 (df = 4161)         0.374 (df = 4160)    
## F Statistic         681.552*** (df = 2; 4162) 698.837*** (df = 3; 4161) 541.007*** (df = 4; 4160)
## =================================================================================================
## Note:                                                                 *p<0.1; **p<0.05; ***p<0.01
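Clustered standard errors can be passed through the same `se` argument. The sketch below uses simulated data with a hypothetical `group` identifier, because it is only an illustration of the mechanics, and computes the clustered covariance matrix with `vcovCL()` from `sandwich`.

```r
library(sandwich)
library(stargazer)

set.seed(42)  # Arbitrary seed for reproducibility

# Simulated data: 50 hypothetical groups of 10 observations that share a common shock
n_groups <- 50
per_group <- 10
d <- data.frame(group = rep(seq_len(n_groups), each = per_group))
group_shock <- rnorm(n_groups)
d$x <- rnorm(nrow(d))
d$y <- 1 + 0.5 * d$x + group_shock[d$group] + rnorm(nrow(d))

fit <- lm(y ~ x, data = d)

# Standard errors clustered by group
clustered_se <- sqrt(diag(vcovCL(fit, cluster = d$group)))

# Supply them as a list with one vector per model
stargazer(fit, type = "text", se = list(clustered_se),
          title = "Regression with Clustered Standard Errors")
```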

Why Robust Standard Errors?
In regression analysis, the standard errors of the coefficients are used to compute test statistics and confidence intervals. Conventional OLS standard errors assume that the error terms are homoskedastic, meaning they have constant variance.

Heteroskedasticity: If this assumption is violated, the conventional standard errors are incorrect, which invalidates inference. The coefficient estimates remain unbiased, but the standard errors may be too small or too large, so p-values and confidence intervals become unreliable.
Robust Standard Errors: Heteroskedasticity-robust standard errors, such as the HC1 estimator used above, are adjusted for non-constant error variance. Hypothesis tests based on them remain valid in large samples even when the homoskedasticity assumption fails.
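To make the adjustment concrete, the sketch below computes HC1 standard errors by hand in base R on simulated heteroskedastic data. The data-generating process is invented for illustration. The formula used is the usual sandwich form, (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k).

```r
set.seed(1)  # Arbitrary seed for reproducibility

# Simulated data in which the error variance grows with x
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.3 * x)

fit <- lm(y ~ x)

X <- model.matrix(fit)   # Design matrix, including the intercept column
e <- residuals(fit)      # OLS residuals
k <- ncol(X)             # Number of estimated coefficients

# Sandwich estimator with the HC1 degrees-of-freedom correction
XtX_inv <- solve(crossprod(X))
meat <- crossprod(X * e)  # Equals X' diag(e^2) X
vcov_hc1 <- (n / (n - k)) * XtX_inv %*% meat %*% XtX_inv

conventional_se <- sqrt(diag(vcov(fit)))
robust_se_manual <- sqrt(diag(vcov_hc1))

round(cbind(conventional_se, robust_se_manual), 4)
```

The manual result should agree with `sqrt(diag(vcovHC(fit, type = "HC1")))` from `sandwich`.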

Coefficient Plot

Install and Load the Package:

# Install the package if it's not already installed
if (FALSE) install.packages("coefplot")

# Load the package
library(coefplot)

Fit a Regression Model:

# Example regression model
model3 <- lm(lwage ~ ed + exp + sex + union, data = Wages)

Plot Coefficients:

# Create a coefficient plot
coefplot(model3,intercept=FALSE)

Customization:

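# Customized coefficient plot
# Store the plot under its own name rather than overwriting the name coefplot
custom_plot <- coefplot(model3,
         title = "Coefficient Plot with Customization",  # Add a title
         xlab = "Coefficient Estimates",                 # Label for x-axis
         ylab = "Variables",                             # Label for y-axis
         color = "black",
         intercept = FALSE,
         innerCI = 1.96,                                 # Inner interval of +/- 1.96 standard errors
         grid = FALSE)                                   # Remove gridlines for a cleaner look
custom_plot  # Display the plot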
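Because `coefplot()` returns a `ggplot2` object, the plot can be styled further with `ggplot2` layers and saved with `ggsave()`. The sketch below uses `mtcars` so that it runs on its own. The theme choices and the output path are illustrative assumptions.

```r
library(coefplot)
library(ggplot2)

fit <- lm(mpg ~ cyl + disp + hp, data = mtcars)

# Build the plot, then adjust its appearance with ggplot2 theme layers
p <- coefplot(fit, intercept = FALSE) +
  theme_minimal() +
  theme(panel.grid.minor = element_blank())  # Drop minor gridlines

# Save to a temporary file (placeholder location)
out_file <- file.path(tempdir(), "coefplot_example.png")
ggsave(out_file, plot = p, width = 6, height = 4)
```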