Phylogenetic comparative binomial models - What's the best tool to build them?


I have species-specific disease status for hundreds of species. Some species have 10 sampled individuals and others have 100, each individual scored as disease-positive or disease-negative (1/0). I want to build phylogenetic models in R to explore what predicts species-specific disease risk, and I'm a bit lost about what to do. I found the following tools:

  1. compar.gee:
library(ape)
# GEE with a phylogenetic correlation structure derived from 'phylogeny'
model1 <- compar.gee(cbind(positive, negative) ~ pred1 + pred2,
                     phy = phylogeny, family = "binomial", data = data)
  • This seems very easy to work with, but the documentation is scarce; e.g., I don't see whether the strength of phylogenetic dependence can be estimated with this model (similarly to how lambda is estimated for gls models).
  • Second, I do not understand why there are so few articles using this tool. Is there a reason to avoid it? Are type I or type II error rates too high? Is the model too rigid to incorporate phylogenetic dependence?
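One partial workaround I'm considering (a sketch only, with toy simulated data standing in for mine; I haven't verified that compar.gee orders the data correctly against the tree in this usage): compar.gee() does not estimate lambda itself, but its corStruct argument accepts any ape correlation structure, so lambda could be profiled "by hand" by fixing it at several values via corPagel(..., fixed = TRUE) and comparing the QIC of the fits.

```r
library(ape)

set.seed(1)
# Toy data in the same shape as mine, simulated so the sketch runs
phylogeny <- rtree(40)
data <- data.frame(pred1    = rnorm(40),
                   positive = rbinom(40, 20, 0.3),
                   row.names = phylogeny$tip.label)
data$negative <- 20 - data$positive

# Fix lambda at several values and compare QIC across the fits
lambdas <- seq(0, 1, by = 0.5)
fits <- lapply(lambdas, function(l)
  compar.gee(cbind(positive, negative) ~ pred1,
             data = data, family = "binomial",
             corStruct = corPagel(l, phylogeny, fixed = TRUE)))
data.frame(lambda = lambdas, QIC = sapply(fits, function(f) f$QIC))
```

This only compares a grid of fixed lambda values rather than estimating lambda jointly with the regression, so it's cruder than what gls + corPagel does.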
  2. GLS regression with logit-transformed proportions
library(ape)   # corPagel()
library(nlme)
# logit() is not in base R: use car::logit() or stats::qlogis()
model2 <- gls(logit(proportion) ~ pred1 + pred2,
              correlation = corPagel(1, phylogeny, form = ~species),
              data = data)
  • Problematic, as logit(0) is undefined and I have taxa where no positive cases were found. I know car::logit(x) has a way around this, but I'm not a big fan of it, since the transformed values change depending on the range of the original values.
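One alternative I'm aware of (a sketch; `emp_logit` is just a name I made up) is the empirical logit, which adds 0.5 to both counts. It stays finite for species with zero positives (or zero negatives), and the adjustment depends only on each species' own counts, not on the range of the whole variable as with car::logit.

```r
# Empirical logit: log((x + 0.5) / (n - x + 0.5)) is finite at x = 0 and x = n
emp_logit <- function(positive, negative) {
  log((positive + 0.5) / (negative + 0.5))
}

emp_logit(0, 10)   # finite instead of -Inf
qlogis(0 / 10)     # plain logit of the same proportion: -Inf
```

This could then be used directly as the gls response, e.g. `emp_logit(positive, negative) ~ pred1 + pred2`, though it still ignores the differing sample sizes across species.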
  3. MCMCglmm
library(MCMCglmm)
# 'data' needs a 'phylo' column duplicating the species names matched to the tree tips
inv.phylo <- inverseA(phylogeny, nodes = "TIPS", scale = TRUE)
model3 <- MCMCglmm(cbind(positive, negative) ~ pred1 + pred2,
                   random = ~ phylo + species,
                   family = "multinomial2",  # MCMCglmm's family for a two-column binomial response
                   ginverse = list(phylo = inv.phylo$Ainv),
                   prior = list(G = list(G1 = list(V = 1, nu = 0.02),
                                         G2 = list(V = 1, nu = 0.02)),
                                R = list(V = 1, nu = 0.02)),
                   data = data,
                   nitt = 10000, thin = 5, burnin = 1000)
  • Can be problematic due to the difficulty of visualizing the results, time-consuming computation (especially if multiple priors are tested), and potential convergence problems.
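For what it's worth, the convergence checks I've seen recommended for MCMCglmm are fairly short (a sketch assuming `model3` above has been fitted; coda is loaded with MCMCglmm, but I make it explicit here):

```r
library(coda)

# Trace and density plots for fixed effects and variance components
plot(model3$Sol)
plot(model3$VCV)

# Effective sample sizes: aim for at least a few hundred per parameter
effectiveSize(model3$Sol)
effectiveSize(model3$VCV)

# Autocorrelation between stored samples should drop off quickly
autocorr.diag(model3$VCV)

# Phylogenetic signal on the latent scale, analogous to lambda:
# phylogenetic variance over total variance (some authors also add the
# logit link variance pi^2 / 3 to the denominator)
lambda <- model3$VCV[, "phylo"] /
  (model3$VCV[, "phylo"] + model3$VCV[, "species"] + model3$VCV[, "units"])
posterior.mode(lambda)
HPDinterval(lambda)
```

So the diagnostics themselves are not the bottleneck; it's the repeated refitting under different priors that takes the time.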

Which tool (OR OTHER) would be best? Which of these should be avoided (if any)?

Thanks a lot for reading this and thinking about the answer!

Cheers,
Jasmine
