I am trying to use speedglm to get faster GLM estimation than glm, but why is it even slower?
set.seed(0)
n <- 1e3
p <- 1e3
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)
ptm <- proc.time()
fit=glm(y~x,family=binomial())
print(proc.time() - ptm)
# user system elapsed
# 10.71 0.07 10.78
library(speedglm)
ptm <- proc.time()
fit=speedglm(y~x,family=binomial())
print(proc.time() - ptm)
# user system elapsed
# 15.11 0.12 15.25
The efficiency of speedglm over glm lies in the way it reduces the n * p model matrix to a p * p matrix. However, if you have n = p, there is no effective reduction. What you really want to check is the n >> p case.

More insight comes from computational complexity, in each iteration of Fisher scoring. glm, using a QR factorization of an n * p matrix, takes 2np^2 - (2/3)p^3 FLOP, while speedglm forms the matrix cross product of an n * p matrix, followed by a QR factorization of a p * p matrix, involving np^2 + (4/3)p^3 FLOP. So as n >> p, speedglm does only half the computation of glm. Furthermore, the blocking and caching strategy used by speedglm makes better use of computer hardware, giving high performance.

If you have n = p, you immediately see that glm takes (4/3)p^3 FLOP, but speedglm takes p^3 + (4/3)p^3 FLOP, which is more expensive! In fact, the matrix cross product becomes sheer overhead in this case!
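To see the two strategies side by side, here is a minimal sketch of one weighted least-squares step as it appears inside Fisher scoring, solved both ways. The sizes and the working response z are made up for illustration; the point is only the difference in data touched, not a faithful reimplementation of either package.

```r
set.seed(1)
n <- 1e5  # assumed: n >> p
p <- 50
X <- matrix(runif(n * p), nrow = n)
z <- runif(n)  # stand-in for the working response in one scoring step

# glm-style: QR factorization of the full n * p matrix (~2np^2 FLOP)
b_qr <- qr.solve(X, z)

# speedglm-style: form the p * p cross product first (~np^2 FLOP),
# then solve the small p * p system (~O(p^3) FLOP)
b_cp <- solve(crossprod(X), crossprod(X, z))

# Both routes give the same least-squares solution
all.equal(b_qr, drop(b_cp))
```

With n >> p the cross-product route does roughly half the arithmetic and solves only a tiny p * p system at the end, which is exactly the saving described above.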
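As a quick check, you can rerun the question's benchmark with n >> p (the sizes n = 1e5 and p = 50 here are just an example); on such a problem speedglm should come out ahead, though exact timings depend on your machine and BLAS.

```r
set.seed(0)
n <- 1e5  # n >> p, the regime speedglm is designed for
p <- 50
x <- matrix(runif(n * p), nrow = n)
y <- sample(0:1, n, replace = TRUE)

t_glm <- system.time(fit1 <- glm(y ~ x, family = binomial()))
print(t_glm)

if (requireNamespace("speedglm", quietly = TRUE)) {
  t_speedglm <- system.time(fit2 <- speedglm::speedglm(y ~ x, family = binomial()))
  print(t_speedglm)
}
```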