Template Lubridate script for duplicating rows for a given start/end date

47 views Asked by At

I have a dataset in Spotfire where each row contains a TASK_START and TASK_END column, as well as several other columns containing values.

As an example input, something like this:

TASK_START TASK_END USER PROJECT
01-JAN-24 01-MAR-24 name1 project1
01-JAN-24 01-APR-24 name1 project2
01-FEB-24 01-APR-24 name2 project3

Where I would like to see an output like this:

MONTH USER PROJECT
JAN-24 name1 project1
FEB-24 name1 project1
JAN-24 name1 project2
FEB-24 name1 project2
MAR-24 name1 project2
FEB-24 name2 project3
MAR-24 name2 project3

I'm considering doing this in the data source using SQL but the person responsible for the source data is reluctant to create the view I want, so am instead looking at doing the transformation with Spotfire. All my research points to R and Lubridate, but I haven't been able to find an example doing quite what I want, and since I'm new to R haven't managed to adapt any of the examples I've found.

I'd be really grateful if anyone could point me in the right direction.

Edit: This is the resource I've been looking at in the Tibco documentation: https://support.tibco.com/s/article/How-to-create-a-continuous-date-range-given-start-and-end-dates-in-TIBCO-Spotfire-with-Data-Functions but this seems to assume the end date is the start of the next block, whereas my dataset as lots of overlapping timespans.

Edit after trying the suggestions from sonshine:

Using this code, where i have added in a couple more columns and used my real column names

library(lubridate)
library(data.table)
# cast to data.table
dat <- data.table::as.data.table(dat)
# cast long
dat <- data.table::melt(
    dat,
    id.vars = c("UC_USER", "PROJECT_ID", "FTE_UC", "FTE_DEMAND"),
    variable.name = "TYPE",
    value.name = "MONTH"
)
# parse time (dmy)
dat[, MONTH := lubridate::dmy(MONTH)]
# if task-end, then -1 month
date[TYPE == "TASK_END", MONTH := MONTH - months(1)]
# expand month if gap
dat <- dat[,
    .(MONTH = seq(min(MONTH), max(MONTH), by = "1 month")),
    by = .(UC_USER, PROJECT_ID, FTE_UC, FTE_DEMAND)
]
# format as month-year
dat[, MONTH := format(MONTH, "%b-%y")]

# set month as first column
data.table::setcolorder(dat, "MONTH")

Spotfire then prompts me to Edit Parameters and define what the input is, I have options for Input handler of Columns, Expression or None. If I choose columns I can select the columns from my source data table. I can then specify I want the output to be a new data table and give it a name.

After correcting the typo in my previous edit, I get this error:

TIBCO Enterprise Runtime for R returned an error

The data function 'expandDateTable (8) (expandDateTable)' could not be executed.

Error in withCallingHandlers({
    library(lubridate)
    library(dat : object 'TYPE' not found
    eval(script, envir = .GlobalEnv)
    eval(script, envir = .GlobalEnv)
    withCallingHandlers({

The columns UC_USER and PROJECT_ID are strings, FTE_UC and FTE_DEMAND are numbers and TASK_START and TASK_END are dates.

1

There are 1 answers

2
sonshine On

Welcome to SO! I'm not sure I follow on the logic exactly but this would be my attempt. Oh and always nice to give data to folks for reproducibility using dput():

# data
dat <- structure(
    list(
        TASK_START = c("01-JAN-24", "01-JAN-24", "01-FEB-24"),
        TASK_END = c("01-MAR-24", "01-APR-24", "01-APR-24"),
        USER = c("name1", "name1", "name2"),
        PROJECT = c("project1", "project2", "project3")
    ),
    row.names = c(NA, -3L), class = "data.frame"
)

I prefer using data.table for data manipulation but there's always a dplyr solution. First I cast to a data.table. You'll need both lubridate and data.table for this solution.

library(lubridate)
library(data.table)
# cast to data.table
dat <- data.table::as.data.table(dat)

Then I cast into a long format and order by user and project:

# cast long
dat <- data.table::melt(
    dat,
    id.vars = c("USER", "PROJECT"),
    variable.name = "TYPE",
    value.name = "MONTH"
)

# arrange by user, project
data.table::setorder(dat, USER, PROJECT)
> dat
     USER  PROJECT       TYPE     MONTH
   <char>   <char>     <fctr>    <char>
1:  name1 project1 TASK_START 01-JAN-24
2:  name1 project1   TASK_END 01-MAR-24
3:  name1 project2 TASK_START 01-JAN-24
4:  name1 project2   TASK_END 01-APR-24
5:  name2 project3 TASK_START 01-FEB-24
6:  name2 project3   TASK_END 01-APR-24

From there, I parse the date time with lubridate:

# parse time (dmy)
dat[, MONTH := lubridate::dmy(MONTH)]

This was where I got a bit confused - in your desired output it looks like you want TASK_END - 1 month so that's what I did here.

# if task-end, then -1 month
dat[TYPE == "TASK_END", MONTH := MONTH - months(1)]
> dat
     USER  PROJECT       TYPE      MONTH
   <char>   <char>     <fctr>     <Date>
1:  name1 project1 TASK_START 2024-01-01
2:  name1 project1   TASK_END 2024-02-01
3:  name1 project2 TASK_START 2024-01-01
4:  name1 project2   TASK_END 2024-03-01
5:  name2 project3 TASK_START 2024-02-01
6:  name2 project3   TASK_END 2024-03-01

Next I sequence between the min month and max month by user and project which will fill in the month if a task took more than 1 month to complete:

# expand month if gap
dat <- dat[,
    .(MONTH = seq(min(MONTH), max(MONTH), by = "1 month")),
    by = .(USER, PROJECT)
]
> dat
     USER  PROJECT      MONTH
   <char>   <char>     <Date>
1:  name1 project1 2024-01-01
2:  name1 project1 2024-02-01
3:  name1 project2 2024-01-01
4:  name1 project2 2024-02-01
5:  name1 project2 2024-03-01
6:  name2 project3 2024-02-01
7:  name2 project3 2024-03-01

Finally, I format as month year (I reference this article for date formatting constantly) and set MONTH as the first column:

# format as month-year
dat[, MONTH := format(MONTH, "%b-%y")]

# set month as first column
data.table::setcolorder(dat, "MONTH")

And we're left with:

> dat
    MONTH   USER  PROJECT
   <char> <char>   <char>
1: Jan-24  name1 project1
2: Feb-24  name1 project1
3: Jan-24  name1 project2
4: Feb-24  name1 project2
5: Mar-24  name1 project2
6: Feb-24  name2 project3
7: Mar-24  name2 project3