Plotting a structural topic model - how to allow for discontinuity over time


I am running a structural topic model using the stm package in R. My model includes an interaction effect between faction_id and numeric_date (a measure of time). I am using the following code to first estimate and then plot the proportion of topic 9 over time for both factions present in the data (see plot).

# Model fit
model.fac.dat.int <- stm(
  documents = docs,
  vocab = vocab,
  K = 20,
  prevalence = ~ faction_id * s(numeric_date),
  max.em.its = 75,
  data = meta,
  reportevery = 50,
  verbose = TRUE,
  init.type = "Spectral"
)

# Estimate effect for topic 9
est.fac.dat.int <-
  estimateEffect(
    formula = c(9) ~ faction_id * numeric_date,
    stmobj = model.fac.dat.int,
    metadata = meta,
    uncertainty = "None"
  )

# Plot for faction_id = Greens
plot(
  est.fac.dat.int,
  covariate = "numeric_date",
  model = model.fac.dat.int,
  method = "continuous",
  moderator = "faction_id",
  moderator.value = "Greens",
  linecol = "green",
  xlab = "Time",
  ylim = c(0, 0.1),
  printlegend = FALSE
)

# Add plot for faction_id = CDU
plot(
  est.fac.dat.int,
  covariate = "numeric_date",
  model = model.fac.dat.int,
  method = "continuous",
  moderator = "faction_id",
  moderator.value = "CDU",
  linecol = "red",
  add = TRUE,
  printlegend = FALSE
)

As a next step, I would like to allow for a break in the fitted lines at numeric_date = 7000 (the date of the elections). I have theoretical reasons to believe that the lines shift to a lower level after this cutoff, and I suspect the current plot hides that shift because it forces a single continuous line per faction. So essentially, I would like to create an RDD-like plot.
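To make the target concrete, here is a minimal sketch of the kind of plot I have in mind. It uses simulated data and plain lm(), not my stm model: the data frame sim, the columns post, date_c and topic9, and the size of the jump are all made up for illustration.

```r
# Simulated stand-in for per-document proportions of topic 9 (not my real data)
set.seed(1)
n <- 400
cutoff <- 7000
sim <- data.frame(
  numeric_date = runif(n, 6000, 8000),
  faction_id   = factor(sample(c("Greens", "CDU"), n, replace = TRUE))
)
sim$post   <- as.numeric(sim$numeric_date >= cutoff)  # 1 on or after the election
sim$date_c <- sim$numeric_date - cutoff               # time centred at the cutoff

# Assumed process: a downward jump of 0.02 at the cutoff for both factions
sim$topic9 <- 0.06 + 0.01 * (sim$faction_id == "Greens") -
  0.02 * sim$post + 0.00001 * sim$date_c + rnorm(n, sd = 0.005)

# Separate intercept and slope on each side of the cutoff, per faction
fit <- lm(topic9 ~ faction_id * post * date_c, data = sim)

# Draw the fitted lines separately before and after the cutoff
plot(sim$numeric_date, sim$topic9, col = "grey80", pch = 16, cex = 0.5,
     xlab = "Time", ylab = "Proportion of topic 9")
cols <- c(Greens = "green", CDU = "red")
for (f in names(cols)) {
  for (p in 0:1) {
    x <- if (p == 0) seq(6000, cutoff, length.out = 50) else
      seq(cutoff, 8000, length.out = 50)
    nd <- data.frame(
      faction_id = factor(f, levels = levels(sim$faction_id)),
      post       = p,
      date_c     = x - cutoff
    )
    lines(x, predict(fit, newdata = nd), col = cols[[f]], lwd = 2)
  }
}
abline(v = cutoff, lty = 2)  # election date
```

Because date_c is centred at the cutoff, the coefficient on post is the estimated jump for the reference faction (CDU here, since factor levels are alphabetical). What I do not know is how to get the equivalent out of estimateEffect() and plot().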

I am not sure how to go about this, as the stm package does not specifically provide a function for this scenario. I have also considered using the rdd package, but I do not know how to combine it with my stm setup.

Would it make more sense to simply estimate the effect for numeric_date < 7000 and > 7000 separately and then add the corresponding two plots together?
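For what it is worth, with ordinary least squares the split-sample approach gives the same point estimates as one pooled fit in which a post-election indicator is interacted with time. The sketch below checks this on simulated data with lm(); the names sim, post, date_c and topic9 are made up, and I am only assuming (not sure) that the same logic carries over to estimateEffect().

```r
# Simulated data with a jump at the cutoff (illustration only)
set.seed(2)
n <- 300
cutoff <- 7000
sim <- data.frame(numeric_date = runif(n, 6000, 8000))
sim$post   <- as.numeric(sim$numeric_date >= cutoff)
sim$date_c <- sim$numeric_date - cutoff
sim$topic9 <- 0.06 - 0.02 * sim$post + rnorm(n, sd = 0.005)

# (a) Two separate fits, one on each side of the cutoff
fit_pre  <- lm(topic9 ~ date_c, data = sim, subset = post == 0)
fit_post <- lm(topic9 ~ date_c, data = sim, subset = post == 1)

# (b) One pooled fit with the indicator fully interacted with time
fit_pooled <- lm(topic9 ~ post * date_c, data = sim)

# Post-election intercept and slope implied by the pooled fit
b <- coef(fit_pooled)
pooled_post <- c(b[["(Intercept)"]] + b[["post"]],
                 b[["date_c"]] + b[["post:date_c"]])

# Compare with the separate post-election fit
same_estimates <- isTRUE(all.equal(unname(coef(fit_post)), pooled_post))
print(same_estimates)
```

The point estimates agree, but the standard errors generally do not, because the pooled fit uses a single residual variance for both periods. The pooled version also gives the size of the jump directly as the coefficient on post.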

Thank you, and feel free to ask if you need me to explain more.
