I have an lm model in R that I trained and serialized. I pass the serialized model and a single feature vector (one array) to a PL/R function:
CREATE OR REPLACE FUNCTION lm_predict(
feat_vec float[],
model bytea
)
RETURNS float
AS
$$
#R-code goes here.
mdl <- unserialize(model)
# class(feat_vec) outputs "array"
y_hat <- predict.lm(mdl, newdata = as.data.frame.list(feat_vec))
return (y_hat)
$$ LANGUAGE 'plr';
This returns the wrong y_hat. I know this because the following alternative works (its inputs are still the model as a bytea and one feat_vec array):
CREATE OR REPLACE FUNCTION lm_predict(
feat_vec float[],
model bytea
)
RETURNS float
AS
$$
#R-code goes here.
mdl <- unserialize(model)
coef = mdl$coefficients
y_hat = coef[1] + as.numeric(coef[-1]%*%feat_vec)
return (y_hat)
$$ LANGUAGE 'plr';
What am I doing wrong? It is the same unserialized model, so the first version should give me the right answer as well.
The problem seems to be the use of newdata = as.data.frame.list(feat_vec). As discussed in your previous question, this returns ugly column names, whereas when you call predict, newdata must have column names consistent with the covariate names in your model formula. You should get a warning message when you call predict. What you need is a newdata whose column names match those covariates.
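A minimal sketch of that fix, written as plain R so it can be tried outside the database. The function name lm_predict_r, the covariates x1 and x2, and the toy data are made up for illustration. It also assumes the model formula contains only plain main-effect terms and that feat_vec lists the features in the same order as those terms.

```r
# Hypothetical plain-R stand-in for the body of the PL/R function.
lm_predict_r <- function(feat_vec, model) {
  mdl <- unserialize(model)

  # One-row data frame: one column per feature.
  newdat <- as.data.frame(t(as.numeric(feat_vec)))

  # Name the columns after the model's covariates
  # (assumes feat_vec follows the order of the formula terms).
  names(newdat) <- attr(terms(mdl), "term.labels")

  unname(predict(mdl, newdata = newdat))
}

# Toy model, for illustration only: y = 1 + 2 * x1 - 3 * x2.
train <- data.frame(x1 = c(1, 2, 3, 4, 5), x2 = c(2, 1, 4, 3, 6))
train$y <- 1 + 2 * train$x1 - 3 * train$x2
model_bytes <- serialize(lm(y ~ x1 + x2, data = train), connection = NULL)

# feat_vec arrives as an "array" in PL/R, so an array is used here too.
y_hat <- lm_predict_r(array(c(0.5, -1.2)), model_bytes)
```

Inside the PL/R function only the three lines that build newdat and call predict would replace the original predict.lm call.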
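With correctly named columns, predict gives the same result as what you can compute manually, i.e. coef[1] + as.numeric(coef[-1] %*% feat_vec) from your second function.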
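As a rough check of that equivalence, the sketch below fits a toy model and compares the two computations. The data, the covariate names x1 and x2, and the feature values are assumptions chosen for illustration, not taken from the question.

```r
# Toy data with some noise; names and values are illustrative only.
set.seed(42)
train <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
train$y <- 0.5 + 1.5 * train$x1 - 2 * train$x2 + rnorm(30, sd = 0.2)
mdl <- lm(y ~ x1 + x2, data = train)

feat_vec <- c(0.3, 2.0)

# Manual computation: intercept plus the inner product of slopes and features.
coefs <- coef(mdl)
manual <- unname(coefs[1] + as.numeric(coefs[-1] %*% feat_vec))

# predict() with newdata columns named after the covariates.
newdat <- setNames(as.data.frame(t(feat_vec)), names(coefs)[-1])
via_predict <- unname(predict(mdl, newdata = newdat))

# TRUE when the two values agree up to numerical tolerance.
agree <- isTRUE(all.equal(manual, via_predict))
```

Taking the column names from names(coefs)[-1] only works when every covariate is a plain numeric variable. With factors or interactions the coefficient names no longer match the variable names in the formula.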