If I fit a logistic regression model with SparkR::spark.logit():
library(SparkR)
mtcars_sdf <- createDataFrame(mtcars)
model <- spark.logit(mtcars_sdf, am ~ wt, family = 'binomial')
summary(model)
calling summary() on the model displays only the coefficient estimates:
$coefficients
Estimate
(Intercept) 12.04037
wt -4.02397
How can I view the additional model statistics, such as standard errors and p-values, as I can with glm()?
base_glm <- glm(am ~ wt, family = 'binomial', mtcars)
summary(base_glm)
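For reference, the full coefficient table from base glm() can also be pulled programmatically; coef(summary(...)) returns the Estimate, Std. Error, z value, and Pr(>|z|) columns as a matrix:

```r
# Base-R logistic regression on the local mtcars data frame;
# coef(summary(...)) exposes the full Wald coefficient table.
base_glm <- glm(am ~ wt, family = "binomial", data = mtcars)
coef(summary(base_glm))
```

This matrix is exactly the set of statistics missing from the spark.logit() summary.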
I know that I can fit the model with glm() on my Spark DataFrame, but model fitting takes roughly 3x longer with glm() on a Spark DataFrame than with spark.logit() on the same data.
model_s <- glm(am ~ wt, family = 'binomial', mtcars_sdf)
summary(model_s)
(The summary below comes from my actual dataset rather than the mtcars example, but it shows the additional statistics I'm after.)
Deviance Residuals:
(Note: These are approximate quantiles with relative error <= 0.01)
     Min       1Q   Median       3Q      Max
-1.70604  0.72898  0.72898  0.83613  0.83613

Coefficients:
                Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -2.6851e+00  2.42620590  -1.1067   0.26841
recipe_id     7.6825e-05  0.00004985   1.5411   0.12329

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 520.25  on 499  degrees of freedom
Residual deviance: 568.59  on 498  degrees of freedom
AIC: 572.6

Number of Fisher Scoring iterations: 10
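One workaround I have been sketching (untested, and it assumes the design matrix fits in driver memory, which defeats part of the point of Spark): plug the spark.logit() estimates into the usual observed-information formula for Wald standard errors, computing X'WX locally on the collected data. `model` and `mtcars_sdf` here are the objects from the code above.

```r
# Rough sketch: recover Wald standard errors and p-values for
# spark.logit coefficients by collecting the data to the driver.
coefs <- summary(model)$coefficients[, "Estimate"]

# Build the local design matrix for the same formula
local_df <- collect(mtcars_sdf)
X <- model.matrix(am ~ wt, data = local_df)

# Fitted probabilities under the Spark-estimated coefficients
p <- plogis(drop(X %*% coefs))

# Observed information X'WX with W = diag(p * (1 - p));
# its inverse is the asymptotic covariance of the estimates
W <- p * (1 - p)
cov_beta <- solve(t(X) %*% (W * X))

se <- sqrt(diag(cov_beta))
z  <- coefs / se
p_value <- 2 * pnorm(-abs(z))

cbind(Estimate = coefs, `Std. Error` = se,
      `z value` = z, `Pr(>|z|)` = p_value)
```

Is there a way to get these statistics directly from spark.logit(), without collecting the data or falling back to the slower glm() path?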