Regression Coefficients with Interactions and Clustering

60 views Asked by At

I am running a linear regression model that has interaction terms and double clustering, and am having a hard time with getting the combined regression coefficients and 95% confidence intervals in R. I am wondering if anyone knows of an R package or another workaround to get this information.

Here is some dummy data and an example of my base model:

dat <- data.frame(a=rnorm(100, 0, 1),
                  b=as.factor(rbinom(100, 1, 0.2)),
                  c=as.factor(rbinom(100, 1, 0.6)),
                  y=rnorm(100, 0, 1),
                  clust1=as.factor(rbinom(100, 10, 0.1)),
                  clust2=as.factor(rbinom(100, 5, 0.5)))

mod <- lm(y ~ a*b*c, data=dat)

In my data, y and a are continuous while b and c are binary. I then account for the double clustering with:

library(lmtest)
library(sandwich)

coeftest(mod, vcov=vcovCL, cluster= ~clust1 + clust2)

Which gives me the estimates and standard errors. I need to know the combined regression parameters and 95% confidence intervals for all combinations of the binary predictors:

When b=0 and c=0; when b=1 and c=1; when b=0 and c=1; and when b=1 and c=0.

I have tried using lincom from the biostat3 package: https://www.rdocumentation.org/packages/biostat3/versions/0.1.9/topics/lincom, which is based on linearHypothesis from car: https://www.rdocumentation.org/packages/car/versions/3.1-2/topics/linearHypothesis. This example is for those with b=0 and c=1:

library(biostat3)

lincom(mod, vcov=vcovCL, cluster= ~clust1 + clust2, "a+c+a:c")

This seems promising, but there is not a ton of documentation on this command so I am not clear if it is doing exactly what I want.

Does anyone know of a way to do this? Or, can confirm that lincom from the biostat3 package is correct? Thank you in advance, and please let me know if I can expand on anything.

1

There are 1 answers

0
Vincent On

I’m not exactly sure what you mean by “the combined regression parameters”. In any case, you can almost certainly compute the quantities of interest using the marginaleffects package (disclaimer: I am the maintainer). There are over 30 chapters of detailed tutorials at this website: https://marginaleffects.com

The obvious place to begin is the “Get Started” page here: https://marginaleffects.com/vignettes/get_started.html

Data and model

options(width = 10000)
dat <- data.frame(a=rnorm(100, 0, 1),
                  b=as.factor(rbinom(100, 1, 0.2)),
                  c=as.factor(rbinom(100, 1, 0.6)),
                  y=rnorm(100, 0, 1),
                  clust1=as.factor(rbinom(100, 10, 0.1)),
                  clust2=as.factor(rbinom(100, 5, 0.5)))

mod <- lm(y ~ a*b*c, data=dat)

Predictions

You can get fitted values for every combination of the predictors in your model. In marginaleffects, we call this a “prediction”:

https://marginaleffects.com/vignettes/predictions.html

library(marginaleffects)

predictions(
    mod,
    newdata = datagrid(
        a = mean,
        b = unique,
        c = unique)
)
# 
#       a b c Estimate Std. Error       z Pr(>|z|)   S  2.5 % 97.5 %
#  0.0194 0 0 -0.18668      0.157 -1.1863    0.235 2.1 -0.495  0.122
#  0.0194 0 1 -0.00692      0.138 -0.0501    0.960 0.1 -0.278  0.264
#  0.0194 1 0  0.02691      0.415  0.0648    0.948 0.1 -0.787  0.841
#  0.0194 1 1 -0.15562      0.223 -0.6985    0.485 1.0 -0.592  0.281
# 
# Columns: rowid, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, y, a, b, c 
# Type:  response

Comparisons

You can measure the “effect” of changing a by 1 unit, holding constant other predictors at every one of their combinations. In the marginaleffects package, we call this a “comparison”:

https://marginaleffects.com/vignettes/comparisons.html

comparisons(
    mod,
    variables = "a",
    newdata = datagrid(
        b = unique,
        c = unique)
)
# 
#  Term Contrast b c Estimate Std. Error      z Pr(>|z|)   S  2.5 % 97.5 %      a
#     a       +1 0 0  -0.1190      0.153 -0.777    0.437 1.2 -0.419  0.181 0.0194
#     a       +1 0 1  -0.0413      0.114 -0.361    0.718 0.5 -0.265  0.183 0.0194
#     a       +1 1 0   0.1566      0.440  0.356    0.722 0.5 -0.705  1.018 0.0194
#     a       +1 1 1   0.0315      0.219  0.144    0.886 0.2 -0.398  0.461 0.0194
# 
# Columns: rowid, term, contrast, estimate, std.error, statistic, p.value, s.value, conf.low, conf.high, b, c, predicted_lo, predicted_hi, predicted, y, a 
# Type:  response