Is there a way for TraMineR to find the first time leaving a state for each observation?

I am analysing data on the UK labour market and have my data in wide form, with each column corresponding to a particular month, and each row corresponding to a particular individual.

Is there a way to determine the first month in which each individual left full-time education (one of my states), and create a new variable containing this information for each individual?

My columns are simply: ID, month 1, month 2 etc. Full-time education is represented by "1".

Any ideas on whether this is possible?

There are 2 answers

Answer by maraab

To my knowledge, TraMineR has no dedicated function for this (but I might be wrong). However, you can identify the first occurrence of a state with TraMineR::seqfpos. If all of your sequences start in full-time education, you could compute the first occurrence of every other state of the alphabet with TraMineR::seqfpos and take the earliest of these positions as the first transition out of education.

In the example data below, all sequences do start in the same state ("0"), so the seqfpos approach is applicable (see the bottom of the code).

When not all cases start in the same state, transitions have to be identified more explicitly. This is the first approach shown in the code below.

Note that I tend to use tidyverse functions for data management. I am pretty sure there are other, better solutions to your problem, but this one works.

library(TraMineR)
#> 
#> TraMineR stable version 2.2-4 (Built: 2022-06-27)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.
library(tidyverse)

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Example data from TraMineR

## biofam data set
data(biofam)
## We use only a sample of 300 cases
set.seed(10)
biofam <- biofam[sample(nrow(biofam),300),]
biofam.lab <- c("Parent", "Left", "Married", "Left+Marr",
                "Child", "Left+Child", "Left+Marr+Child", "Divorced")
biofam.seq <- seqdef(biofam, 10:25, labels=biofam.lab)

# for our purpose 10 cases are enough
# note that we generate an id variable based on the row number, which
# we will also use for joining the new variable later
biofam <- rowid_to_column(biofam[1:10, ], var = "id") 
biofam.seq <- biofam.seq[1:10, ]


# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Reshape sequence data into long format with pivot_longer and identify
# transitions out of state "0" by id. Save the position (time) at which this
# transition is observed; only keep cases with such transitions

biofam2 <- biofam.seq |> 
  rowid_to_column(var = "id") |> 
  pivot_longer(-id) |> 
  group_by(id) |>
  mutate(first_trans = ifelse(lag(value) == "0" & value != "0", 
                              row_number(), 
                              NA)) |>
  filter(!is.na(first_trans))

# inspect outcome
biofam2
#> # A tibble: 7 × 4
#> # Groups:   id [7]
#>      id name  value first_trans
#>   <int> <chr> <fct>       <int>
#> 1     1 a23   2               9
#> 2     2 a16   1               2
#> 3     4 a28   3              14
#> 4     5 a23   3               9
#> 5     7 a21   2               7
#> 6     9 a23   1               9
#> 7    10 a17   1               3

# continue: only keep each case's first transition (minimum)
# and merge this information into our original data (using our id indicator)
biofam2 <- biofam2 |> 
  summarize(first_trans = min(first_trans)) |> 
  right_join(biofam, by = "id") |> 
  arrange(id)

# inspect again (just a selection of variables)
biofam2 |> 
  select(1:2, sex, starts_with("a"))
#> # A tibble: 10 × 19
#>       id first_trans sex     a15   a16   a17   a18   a19   a20   a21   a22   a23
#>    <int>       <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1           9 woman     0     0     0     0     0     0     0     0     2
#>  2     2           2 man       0     1     1     1     1     1     1     1     6
#>  3     3          NA man       0     0     0     0     0     0     0     0     0
#>  4     4          14 man       0     0     0     0     0     0     0     0     0
#>  5     5           9 man       0     0     0     0     0     0     0     0     3
#>  6     6          NA man       0     0     0     0     0     0     0     0     0
#>  7     7           7 man       0     0     0     0     0     0     2     2     2
#>  8     8          NA woman     0     0     0     0     0     0     0     0     0
#>  9     9           9 man       0     0     0     0     0     0     0     0     1
#> 10    10           3 woman     0     0     1     1     1     1     1     1     1
#> # … with 7 more variables: a24 <dbl>, a25 <dbl>, a26 <dbl>, a27 <dbl>,
#> #   a28 <dbl>, a29 <dbl>, a30 <dbl>

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Alternative approach using TraMineR::seqfpos
# identify the first occurrence of each state except state "0"
# then identify the earliest of all those positions (min)
# merge this information into the original data

biofam3 <- map_dfc(1:7, ~seqfpos(biofam.seq, .x)) |>
  rowid_to_column(var = "id") |> 
  rowwise() |>  
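  # min() returns Inf (with a warning) for cases that never leave state "0";
  # these are dropped by the is.finite() filter below
  mutate(first_trans = min(c_across(-id), na.rm = TRUE)) |> 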
  ungroup() |> 
  filter(is.finite(first_trans)) |> 
  select(id, first_trans) |> 
  right_join(biofam, by = "id") |> 
  arrange(id)

# inspect again (just a selection of variables)
biofam3 |> 
  select(1:2, sex, starts_with("a"))
#> # A tibble: 10 × 19
#>       id first_trans sex     a15   a16   a17   a18   a19   a20   a21   a22   a23
#>    <int>       <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     1           9 woman     0     0     0     0     0     0     0     0     2
#>  2     2           2 man       0     1     1     1     1     1     1     1     6
#>  3     3          NA man       0     0     0     0     0     0     0     0     0
#>  4     4          14 man       0     0     0     0     0     0     0     0     0
#>  5     5           9 man       0     0     0     0     0     0     0     0     3
#>  6     6          NA man       0     0     0     0     0     0     0     0     0
#>  7     7           7 man       0     0     0     0     0     0     2     2     2
#>  8     8          NA woman     0     0     0     0     0     0     0     0     0
#>  9     9           9 man       0     0     0     0     0     0     0     0     1
#> 10    10           3 woman     0     0     1     1     1     1     1     1     1
#> # … with 7 more variables: a24 <dbl>, a25 <dbl>, a26 <dbl>, a27 <dbl>,
#> #   a28 <dbl>, a29 <dbl>, a30 <dbl>

# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
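
For data laid out as in the question (an ID column plus one column per month, with full-time education coded "1"), the same idea can also be sketched in base R without TraMineR. This is only a minimal sketch under assumptions: the toy data frame `wide`, its column names, the non-education state codes, and the helper `first_exit` are all made up for illustration. It reports the first month observed outside the state directly after a month spent in it, so a person who starts outside education is only counted once they have entered and then left it.

```r
# Hypothetical toy data in the asker's layout: ID plus one column per month.
# State "1" = full-time education; the other codes are invented.
wide <- data.frame(
  ID     = 1:4,
  month1 = c("1", "1", "2", "1"),
  month2 = c("1", "3", "2", "1"),
  month3 = c("2", "3", "1", "1"),
  month4 = c("2", "1", "3", "1"),
  stringsAsFactors = FALSE
)

# Position (among the month columns) of the first month spent outside `state`
# immediately after a month spent in `state`; NA if no such exit is observed.
first_exit <- function(states, state = "1") {
  n <- length(states)
  if (n < 2) return(NA_integer_)
  left <- states[-n] == state & states[-1] != state
  if (any(left, na.rm = TRUE)) which(left)[1] + 1L else NA_integer_
}

month_cols <- setdiff(names(wide), "ID")
wide$first_left <- apply(wide[month_cols], 1, first_exit, state = "1")
wide$first_left
#> [1]  3  2  4 NA
```

Person 4 never leaves education in the observed window, hence the NA.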

Answer by Gilbert

You can get the time (position in the sequence) of the end of the first spell in a given state by summing the durations of the spells up to and including this spell. You get the durations with seqdur and the position of the first spell in a given state by applying seqfpos to the sequence of distinct successive states (DSS), which you get with seqdss.
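
To make the idea concrete, here is a minimal base R sketch on a single made-up sequence. It uses rle() as a stand-in for seqdss and seqdur; this is an analogy for illustration, not a claim about how TraMineR computes them internally, and the vector `x` and its state labels are invented.

```r
# Toy monthly sequence for one person (state labels are made up)
x <- c("school", "school", "FE", "FE", "FE", "employment", "FE")

spells <- rle(x)            # base R run-length encoding
dss <- spells$values        # distinct successive states: school, FE, employment, FE
dur <- spells$lengths       # duration of each spell: 2, 3, 1, 1
pos <- match("FE", dss)     # index of the first spell in state "FE" (NA if none)

# Last position of the first "FE" spell = summed durations up to that spell;
# 0 when the state never occurs
end_first_fe <- if (is.na(pos)) 0L else cumsum(dur)[pos]
end_first_fe
#> [1] 5
```

The later return to "FE" at position 7 is ignored because only the first spell in that state is used.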

I illustrate using the first 10 sequences of the mvad data, looking for when each student first left further education (FE). Sequences from Sept 93 to June 99 are in columns 17 to 86.

library(TraMineR)
data(mvad)
mvad.seq <- seqdef(mvad[,17:86])
m.seq <- seqdef(mvad[1:10,17:86])
seqiplot(m.seq)

[Figure: index plot of the first 10 mvad sequences]

We define a function that extracts the DSS, finds the position of the first spell in the selected state, gets the spell lengths, computes the cumulated spell lengths, and returns (for each sequence) the end position of the first spell in the selected state. We assign time 0 when there is no spell in the given state.

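tfirst.left <- function(seqdata, state){
  s.dss <- seqdss(seqdata)                # distinct successive states
  pos <- seqfpos(s.dss, state)            # index of first spell in 'state' (NA if none)
  s.dur <- seqdur(seqdata)                # spell durations
  s.cumdur <- t(apply(s.dur, 1, cumsum))  # cumulated durations = spell end positions
  tl <- numeric(nrow(s.dur))              # initialised to 0 = no spell in 'state'
  for (i in seq_len(nrow(s.dur))){
    if (!is.na(pos[i])) tl[i] <- s.cumdur[i, pos[i]]
  }
  return(tl)
}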

Now we apply the function to our set of sequences m.seq:

tfirst.left(m.seq, state="FE")

## [1]  0 36 58  0 25  0 30 22  0  0
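
A possible follow-up, sketched under assumptions: the values above are the last position spent in the first FE spell, so the first month outside FE is one position later. The snippet below hard-codes the output shown above and treats a spell that runs to the final position (70 here, since columns 17 to 86 were used) as not having left FE within the observation window. The names `tl`, `n_months` and `first_out` are made up for illustration.

```r
# Output of tfirst.left(m.seq, state = "FE") as shown above
tl <- c(0, 36, 58, 0, 25, 0, 30, 22, 0, 0)
n_months <- 70  # number of positions in m.seq (columns 17 to 86)

# NA when never in FE (0) or still in FE at the last observed month;
# otherwise the first position spent outside FE
first_out <- ifelse(tl == 0 | tl >= n_months, NA_real_, tl + 1)
first_out
#> [1] NA 37 59 NA 26 NA 31 23 NA NA
```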