I am working through the Andy Field textbook Discovering Statistics Using R and on p.899 came to a point where my R code was not returning the same results as shown in the book.
Specifically, the example deals with fitting a multilevel linear model to repeated-measures data, and builds up to the final model gradually, adding one parameter at a time.
My issue stems from coercing the Time variable from a factor into a numeric. The coercion itself is straightforward, but I chose to code Time in months (the data file uses column names like "Satisfaction_6_Months" and "Satisfaction_12_Months"), so the four Time levels became c(0, 6, 12, 18).
The text (actually the errata online) takes a different approach, coding the four Time levels as c(0, 1, 2, 3). I would have expected this difference in Time coding to have little impact on the results, but apparently I am quite wrong.
The code below demonstrates how the final line in the anova (containing ARModel) is very different in these two situations.
I am trying to understand 1) why this difference is observed at all, 2) why it only shows up at this last step of the model-building exercise, and 3) what it means in practice if I have a good reason to adopt a month-wise coding of Time.
First, run the full script to replicate the results. Then, uncomment the indicated line and run again.
library(reshape2)
library(nlme)   # for gls(), lme(), and corAR1(), used below
data_url <- "https://studysites.sagepub.com/dsur/study/DSUR%20Data%20Files/Chapter%2019/Honeymoon%20Period.dat"
satisfactionData <- read.delim(data_url, header = TRUE)
# reshape from wide (one satisfaction column per timepoint) to long format
restructuredData <- melt(satisfactionData,
                         id = c("Person", "Gender"),
                         measured = c("Satisfaction_Base", "Satisfaction_6_Months",
                                      "Satisfaction_12_Months", "Satisfaction_18_Months"))
names(restructuredData) <- c("Person", "Gender", "Time", "Life_Satisfaction")
# convert the Time factor (Base, 6, 12, 18 months) to the numeric codes 0, 1, 2, 3
restructuredData$Time <- as.numeric(restructuredData$Time) - 1
# ***************************************************************
# On second pass, uncomment the line below and rerun whole script
# ***************************************************************
# restructuredData$Time <- restructuredData$Time * 6
# create models one parameter at a time
# baseline model: intercept only, no random effects
intercept <- gls(Life_Satisfaction ~ 1,
                 data = restructuredData,
                 method = "ML",
                 na.action = na.exclude)
# add a random intercept for each Person
randomIntercept <- lme(Life_Satisfaction ~ 1,
                       data = restructuredData,
                       random = ~ 1 | Person,
                       method = "ML",
                       na.action = na.exclude,
                       control = list(opt = "optim"))
timeRI  <- update(randomIntercept, . ~ . + Time)                            # add Time as a fixed effect
timeRS  <- update(timeRI, random = ~ Time | Person)                         # add a random slope for Time
ARModel <- update(timeRS, correlation = corAR1(0, form = ~ Time | Person))  # add AR(1) error correlation
anova(intercept, randomIntercept, timeRI, timeRS, ARModel)
# Pass #1 - with Time as 0,1,2,3
# Model df AIC BIC logLik Test L.Ratio p-value
# intercept 1 2 2064.053 2072.217 -1030.0263
# randomIntercept 2 3 1991.396 2003.642 -992.6978 1 vs 2 74.65704 <.0001
# timeRI 3 4 1871.728 1888.057 -931.8642 2 vs 3 121.66714 <.0001
# timeRS 4 6 1874.626 1899.120 -931.3131 3 vs 4 1.10224 0.5763
# ARModel 5 7 1872.891 1901.466 -929.4453 4 vs 5 3.73564 0.0533
# Pass #2 - with Time as 0,6,12,18
# Model df AIC BIC logLik Test L.Ratio p-value
# intercept 1 2 2064.053 2072.217 -1030.0263
# randomIntercept 2 3 1991.396 2003.642 -992.6978 1 vs 2 74.65704 <.0001
# timeRI 3 4 1871.728 1888.057 -931.8642 2 vs 3 121.66714 <.0001
# timeRS 4 6 1874.627 1899.120 -931.3135 3 vs 4 1.10151 0.5765
# ARModel 5 7 1876.627 1905.203 -931.3135 4 vs 5 0.00001 0.9978
This isn't really a programming question, but let's go.
The assumption behind a first-order autoregressive covariance is that each timepoint is correlated with the others via a single parameter (Phi in the output of nlme::lme), and this correlation decays geometrically with the distance in time. Specifically, if the correlation between t and t+1 is Phi, the correlation between t and t+2 is Phi^2 (smaller, because Phi is almost always < 1). This generalizes to cor(t, t+x) = Phi^x.

You can actually see this in action if you look at the (initial) correlation matrices of these two models. I'm not using the default Phi = 0 here, because then you'd just end up with a matrix full of zeroes:
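Here is a minimal sketch of that check (not verbatim output), assuming the restructuredData object from your script is still in memory with Time coded 0:3; the monthData copy, cs1 and cs6 are just illustrative names:

library(nlme)

# Time coded 0, 1, 2, 3
cs1 <- Initialize(corAR1(0.8, form = ~ Time | Person), data = restructuredData)
corMatrix(cs1)[[1]]   # one person's matrix: off-diagonals 0.8, 0.8^2 = 0.64, 0.8^3 = 0.512

# Time coded 0, 6, 12, 18 (monthData is an illustrative copy of the data)
monthData <- transform(restructuredData, Time = Time * 6)
cs6 <- Initialize(corAR1(0.8, form = ~ Time | Person), data = monthData)
corMatrix(cs6)[[1]]   # off-diagonals now 0.8^6 ~ 0.26, 0.8^12, 0.8^18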
Notice how, for the same Phi, increasing the time scale by a factor of 6 reduces the (assumed) correlation between otherwise identical observations to the sixth power: 0.80^6 ~ 0.26. This is different from the effects you added in previous steps, because there the fitted parameters and their standard errors simply scale with the change of unit (the Time slope becomes six times smaller); the t/F statistics remain identical.
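If you want to convince yourself of that, a quick sketch is to refit your timeRI model with Time in months and compare the fixed-effect tables (timeRI_months and the inline transform are just for illustration):

# the slope and its SE shrink by a factor of 6, but the t-value is unchanged
timeRI_months <- update(timeRI, data = transform(restructuredData, Time = Time * 6))
summary(timeRI)$tTable
summary(timeRI_months)$tTable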
The fitted Phi on the 0:3 scale is 0.215, so on the six-fold time scale that would correspond to 0.215^6 ~ 1e-4, very close to zero. Combined with the default initial value of exactly zero, the model on the six-fold time scale fits a correlation of zero. This is why only the last step differs, and why timeRS and ARModel come out as essentially identical in the ANOVA on the six-fold scale.
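For reference, you can read the fitted Phi off the first-pass model; a sketch, assuming ARModel from the 0:3 coding:

summary(ARModel)               # the "Correlation Structure: AR(1)" block reports the fitted Phi
ARModel$modelStruct$corStruct  # the corAR1 component on its own
0.215^6                        # roughly 1e-4: what that Phi implies per six-unit step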
I also want to highlight that this is quite sensitive to the initialization value in corAR1: making it non-zero allows the second model to fit a non-zero correlation, though the estimate seems quite sensitive to the exact value you specify. All of this might point towards the correlation actually being quite low and/or AR1 not being the most appropriate covariance structure.
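On the second pass of your script (Time in months), that would look something like the sketch below; ARModel_nonzero is an illustrative name and 0.2 is an arbitrary starting value, not a recommendation:

# same model-building step, but starting the AR(1) search at 0.2 instead of 0
ARModel_nonzero <- update(timeRS, correlation = corAR1(0.2, form = ~ Time | Person))
summary(ARModel_nonzero)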
Finally, note that this is also a consequence of how the correlation structure was specified; another way would be to let Phi be the correlation between two adjacent timepoints regardless of their distance in time (i.e., only assuming that they are equidistant, which is effectively what the original model did). You can do this by changing form = ~Time|Person to form = ~1|Person. The fitting routine then assumes that the data are supplied in time order within each group, and probably also that all groups contain an equal number of observations, or at least explicit rows for intermediate missing values. Each record within a group then takes the next row/column of the correlation matrix, without taking into account how far apart in time the observations are (other than the chronology you set by sorting). With this specification the two models are again identical.
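As a sketch, that variant of the last model-building step (assuming the rows are in chronological order within each Person, which they are after the melt() call in your script; ARModel_pos is an illustrative name) would be:

# position-based AR(1): within each Person, consecutive rows are treated as adjacent,
# regardless of how far apart in time they are
ARModel_pos <- update(timeRS, correlation = corAR1(0, form = ~ 1 | Person))
anova(timeRS, ARModel_pos)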