model.matrix for full model

205 views Asked by At

I'm modeling "full linear model" (main effects, interactions of the second order and quadratic terms) in R but wondering on how to create all of these possible combinations in the simplest way, e.g., by model.matrix. For example, model with main effects and all possible interactions of the second order can be done such as

M1<-matrix(rnorm(36),nrow=6)
colnames(M1)<-LETTERS[1:6]

X<-as.data.frame(M1[,1:5])
Y<-M1[,6]

f <- as.formula( ~ .^2)
MM<-model.matrix(f, X)[,-1]

lin_model<-lm(Y ~ MM)

Is there any simple way how to adjust f formula in the code above to be able to add quadratic terms into formula? E.g. if I have >10 feature I'd like to avoid explicit way of doing that.

1

There are 1 answers

5
jay.sf On BEST ANSWER

Maybe you are looking for this.

set.seed(42)
M1 <- matrix(rnorm(36), ncol=6, dimnames=list(NULL, LETTERS[1:6]))

X <- as.data.frame(M1[,1:5])
Y <- M1[,6]

(f <- as.formula(paste0(' ~ .^2 + ', paste(sapply(colnames(M1)[-6], \(x) paste0('I(', x, '^2)')), collapse=' + '))))
# ~.^2 + I(A^2) + I(B^2) + I(C^2) + I(D^2) + I(E^2)

(MM <- model.matrix(f, X)[,-1])
#            A          B          C          D           E     I(A^2)     I(B^2)     I(C^2)    I(D^2)      I(E^2)         A:B
# 1  1.3709584  1.51152200 -1.3888607 -2.4404669  1.8951935 1.87952706 2.284698749 1.92893405 5.95587883 3.59175826  2.07223385
# 2 -0.5646982 -0.09465904 -0.2787888  1.3201133 -0.4304691 0.31888402 0.008960334 0.07772318 1.74269925 0.18530367  0.05345379
# 3  0.3631284  2.01842371 -0.1333213 -0.3066386 -0.2572694 0.13186224 4.074034289 0.01777458 0.09402723 0.06618754  0.73294700
# 4  0.6328626 -0.06271410  0.6359504 -1.7813084 -1.7631631 0.40051508 0.003933058 0.40443291 3.17305974 3.10874406 -0.03968941
# 5  0.4042683  1.30486965 -0.2842529 -0.1719174  0.4600974 0.16343288 1.702684815 0.08079972 0.02955558 0.21168958  0.52751747
# 6 -0.1061245  2.28664539 -2.6564554  1.2146747 -0.6399949 0.01126241 5.228747152 7.05675540 1.47543462 0.40959344 -0.24266914#           A:C        A:D         A:E        B:C        B:D         B:E         C:D         C:E         D:E
#           A:C         A:D         A:E         B:C        B:D         B:E         C:D        C:E         D:E
# 1 -1.90407031 -3.34577875  2.59823148 -2.09929350 -3.6888194  2.86462661  3.38946861 -2.6321597 -4.62515697
# 2  0.15743151 -0.74546559  0.24308513  0.02638988 -0.1249607  0.04074779 -0.36803277  0.1200100 -0.56826805
# 3 -0.04841277 -0.11134919 -0.09342182 -0.26909895 -0.6189266 -0.51927862  0.04088147  0.0342995  0.07888872
# 4  0.40246923 -1.12732350 -1.11583998 -0.03988306  0.1117132  0.11057518 -1.13282381 -1.1212843  3.14073727
# 5 -0.11491445 -0.06950074  0.18600279 -0.37091301 -0.2243297  0.60036708  0.04886801 -0.1307840 -0.07909872
# 6  0.28191505 -0.12890676  0.06791915 -6.07437155  2.7775303 -1.46344133 -3.22672919  1.7001179 -0.77738558

This might look more appealing, but as noted by @onyambu actually it's the same what you get with poly.

PM <- poly(as.matrix(X), degree=2, raw=TRUE)


## show column key of PM, corresponding to MM
colnames(MM)[match(PM[1, ], MM[1, ])]
# [1] "A"      "I(A^2)" "B"      "A:B"    "I(B^2)" "C"      "A:C"    "B:C"    "I(C^2)" "D"      "A:D"   
# [12] "B:D"    "C:D"    "I(D^2)" "E"      "A:E"    "B:E"    "C:E"    "D:E"    "I(E^2)"