In the provided article, the author proposes a method for assessing the relative importance of predictors in a regression model. A common technique is to standardize each variable by subtracting its mean and dividing by its standard deviation. That only works for numeric predictors, though, since a mean and a standard deviation must exist. What about non-numeric predictors, such as categorical ones? How can we standardize them?
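For reference, here is a minimal sketch of what standardizing means in R, using `wt` from `mtcars` purely as an illustration. The variable names `wt_manual`, `wt_scaled` and `cyl_factor` are made up for this example.

```r
# Standardize a numeric predictor by hand and compare with scale().
data(mtcars)

wt_manual <- (mtcars$wt - mean(mtcars$wt)) / sd(mtcars$wt)
wt_scaled <- as.numeric(scale(mtcars$wt))  # scale() returns a one-column matrix

# Both approaches should agree up to floating-point tolerance.
all.equal(wt_manual, wt_scaled)

# After standardizing, the variable has mean 0 and standard deviation 1.
round(c(mean = mean(wt_scaled), sd = sd(wt_scaled)), 10)

# A factor is not numeric, so the same arithmetic is not defined for it.
cyl_factor <- as.factor(mtcars$cyl)
is.numeric(cyl_factor)
```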
Below, model is a regular regression and model2 is the same regression with the continuous variables standardized:
data(mtcars)
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- as.factor(mtcars$am)
mtcars$gear <- as.factor(mtcars$gear)
mtcars$carb <- as.factor(mtcars$carb)
model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb, mtcars)
summary(model)
----
Call:
lm(formula = mpg ~ cyl + disp + hp + drat + wt + qsec + vs +
am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5087 -1.3584 -0.0948 0.7745 4.6251
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.87913 20.06582 1.190 0.2525
cyl6 -2.64870 3.04089 -0.871 0.3975
cyl8 -0.33616 7.15954 -0.047 0.9632
disp 0.03555 0.03190 1.114 0.2827
hp -0.07051 0.03943 -1.788 0.0939 .
drat 1.18283 2.48348 0.476 0.6407
wt -4.52978 2.53875 -1.784 0.0946 .
qsec 0.36784 0.93540 0.393 0.6997
vs1 1.93085 2.87126 0.672 0.5115
am1 1.21212 3.21355 0.377 0.7113
gear4 1.11435 3.79952 0.293 0.7733
gear5 2.52840 3.73636 0.677 0.5089
carb2 -0.97935 2.31797 -0.423 0.6787
carb3 2.99964 4.29355 0.699 0.4955
carb4 1.09142 4.44962 0.245 0.8096
carb6 4.47757 6.38406 0.701 0.4938
carb8 7.25041 8.36057 0.867 0.3995
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
----
model2 <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) + scale(qsec) + vs + am + gear + carb, mtcars)
summary(model2)
----
Call:
lm(formula = mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) +
scale(wt) + scale(qsec) + vs + am + gear + carb, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.5087 -1.3584 -0.0948 0.7745 4.6251
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.9842 5.3241 3.378 0.00414 **
cyl6 -2.6487 3.0409 -0.871 0.39747
cyl8 -0.3362 7.1595 -0.047 0.96317
scale(disp) 4.4056 3.9535 1.114 0.28267
scale(hp) -4.8342 2.7031 -1.788 0.09393 .
scale(drat) 0.6324 1.3279 0.476 0.64074
scale(wt) -4.4322 2.4841 -1.784 0.09462 .
scale(qsec) 0.6573 1.6715 0.393 0.69967
vs1 1.9309 2.8713 0.672 0.51151
am1 1.2121 3.2135 0.377 0.71132
gear4 1.1144 3.7995 0.293 0.77332
gear5 2.5284 3.7364 0.677 0.50890
carb2 -0.9794 2.3180 -0.423 0.67865
carb3 2.9996 4.2935 0.699 0.49547
carb4 1.0914 4.4496 0.245 0.80956
carb6 4.4776 6.3841 0.701 0.49381
carb8 7.2504 8.3606 0.867 0.39948
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.833 on 15 degrees of freedom
Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
----
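Comparing the two outputs, the residuals, t values, p-values and R-squared are identical; only the coefficients of the scaled terms (and the intercept) change. A quick sketch to confirm that each standardized coefficient is just the raw coefficient multiplied by that predictor's standard deviation. The helper names `cars`, `num_vars` and `sds` are introduced here for illustration only, and a copy of `mtcars` is used so the original data frame is left untouched.

```r
# Refit both models on a copy of mtcars with the same factor conversions.
data(mtcars)
cars <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], as.factor)

model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
            data = cars)
model2 <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) +
               scale(qsec) + vs + am + gear + carb, data = cars)

num_vars <- c("disp", "hp", "drat", "wt", "qsec")
sds <- sapply(cars[num_vars], sd)

# Raw slope times sd(x) should reproduce the standardized slope.
raw_times_sd <- coef(model)[num_vars] * sds
scaled_coefs <- coef(model2)[paste0("scale(", num_vars, ")")]
all.equal(unname(raw_times_sd), unname(scaled_coefs))

# The fit itself is unchanged by rescaling the predictors.
all.equal(fitted(model), fitted(model2))
```

So scaling changes the units in which the slopes are reported (mpg per one standard deviation instead of mpg per one raw unit), not the model.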
Based on the model2 output, sorting the standardized coefficients from largest to smallest gives this order of importance for the continuous predictors: disp > qsec > drat > wt > hp
Is this interpretation correct?
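One thing worth checking here: the ordering above treats negative coefficients as "less important" than positive ones. If importance is taken to mean the size of the effect per standard deviation, a common convention is to compare absolute values instead. A minimal sketch under that assumption (the names `cars`, `std_coefs` and `ranking` are illustrative):

```r
# Rank the standardized slopes by magnitude rather than by signed value.
data(mtcars)
cars <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], as.factor)

model2 <- lm(mpg ~ cyl + scale(disp) + scale(hp) + scale(drat) + scale(wt) +
               scale(qsec) + vs + am + gear + carb, data = cars)

# Keep only the coefficients of the scale(...) terms.
std_coefs <- coef(model2)[grep("^scale\\(", names(coef(model2)))]

# Sort by absolute value, largest first.
ranking <- sort(abs(std_coefs), decreasing = TRUE)
ranking
```

With the coefficients printed in the model2 summary, this gives hp > wt > disp > qsec > drat. Note that none of these terms is significant at the 0.05 level in this fit and the standard errors are large, so the ordering may not be stable.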
How do we assess the order of importance for the categorical predictors (cyl, vs, am, gear, carb)?
An illustrative example in R, in addition to the explanation, would help me immensely.
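As a starting point for the categorical predictors, one possible approach (an assumption on my part, not the article's method) is to test each term as a whole rather than looking at individual dummy coefficients. Base R's `drop1()` with `test = "F"` removes one term at a time and reports a single F-test per term, so a factor such as carb is judged on all of its levels jointly. The `partial_r2` column below is a hypothetical helper computed as SS_term / (SS_term + RSS_full).

```r
# Term-level F-tests: each factor is assessed across all its dummy columns.
data(mtcars)
cars <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], as.factor)

model <- lm(mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb,
            data = cars)

# drop1() refits the model without each term in turn.
term_tests <- as.data.frame(drop1(model, test = "F"))
term_tests <- term_tests[-1, ]  # remove the "<none>" reference row

# Share of otherwise-unexplained variation attributable to each term.
term_tests$partial_r2 <- term_tests[["Sum of Sq"]] /
  (term_tests[["Sum of Sq"]] + deviance(model))

# Order terms by F statistic, numeric and categorical on the same footing.
term_tests[order(term_tests[["F value"]], decreasing = TRUE),
           c("Df", "F value", "Pr(>F)", "partial_r2")]
```

These term-level tests do not depend on whether the numeric predictors were scaled, and for a one-degree-of-freedom term the F value is the square of the t value in the summary table. Whether F statistics or partial R-squared is the right notion of "importance" is a judgment call.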