I have a dataset in Spotfire where each row contains a TASK_START and TASK_END column, as well as several other columns containing values.
As an example input, something like this:
| TASK_START | TASK_END | USER | PROJECT |
|---|---|---|---|
| 01-JAN-24 | 01-MAR-24 | name1 | project1 |
| 01-JAN-24 | 01-APR-24 | name1 | project2 |
| 01-FEB-24 | 01-APR-24 | name2 | project3 |
Where I would like to see an output like this:
| MONTH | USER | PROJECT |
|---|---|---|
| JAN-24 | name1 | project1 |
| FEB-24 | name1 | project1 |
| JAN-24 | name1 | project2 |
| FEB-24 | name1 | project2 |
| MAR-24 | name1 | project2 |
| FEB-24 | name2 | project3 |
| MAR-24 | name2 | project3 |
I'm considering doing this in the data source using SQL but the person responsible for the source data is reluctant to create the view I want, so am instead looking at doing the transformation with Spotfire. All my research points to R and Lubridate, but I haven't been able to find an example doing quite what I want, and since I'm new to R haven't managed to adapt any of the examples I've found.
I'd be really grateful if anyone could point me in the right direction.
Edit: This is the resource I've been looking at in the Tibco documentation: https://support.tibco.com/s/article/How-to-create-a-continuous-date-range-given-start-and-end-dates-in-TIBCO-Spotfire-with-Data-Functions but this seems to assume the end date is the start of the next block, whereas my dataset as lots of overlapping timespans.
Edit after trying the suggestions from sonshine:
Using this code, where i have added in a couple more columns and used my real column names
library(lubridate)
library(data.table)
# cast to data.table
dat <- data.table::as.data.table(dat)
# cast long
dat <- data.table::melt(
dat,
id.vars = c("UC_USER", "PROJECT_ID", "FTE_UC", "FTE_DEMAND"),
variable.name = "TYPE",
value.name = "MONTH"
)
# parse time (dmy)
dat[, MONTH := lubridate::dmy(MONTH)]
# if task-end, then -1 month
date[TYPE == "TASK_END", MONTH := MONTH - months(1)]
# expand month if gap
dat <- dat[,
.(MONTH = seq(min(MONTH), max(MONTH), by = "1 month")),
by = .(UC_USER, PROJECT_ID, FTE_UC, FTE_DEMAND)
]
# format as month-year
dat[, MONTH := format(MONTH, "%b-%y")]
# set month as first column
data.table::setcolorder(dat, "MONTH")
Spotfire then prompts me to Edit Parameters and define what the input is, I have options for Input handler of Columns, Expression or None. If I choose columns I can select the columns from my source data table. I can then specify I want the output to be a new data table and give it a name.
After correcting the typo in my previous edit, I get this error:
TIBCO Enterprise Runtime for R returned an error
The data function 'expandDateTable (8) (expandDateTable)' could not be executed.
Error in withCallingHandlers({
library(lubridate)
library(dat : object 'TYPE' not found
eval(script, envir = .GlobalEnv)
eval(script, envir = .GlobalEnv)
withCallingHandlers({
The columns UC_USER and PROJECT_ID are strings, FTE_UC and FTE_DEMAND are numbers and TASK_START and TASK_END are dates.
Welcome to SO! I'm not sure I follow on the logic exactly but this would be my attempt. Oh and always nice to give data to folks for reproducibility using
dput():I prefer using
data.tablefor data manipulation but there's always adplyrsolution. First I cast to adata.table. You'll need bothlubridateanddata.tablefor this solution.Then I cast into a long format and order by user and project:
From there, I parse the date time with
lubridate:This was where I got a bit confused - in your desired output it looks like you want TASK_END - 1 month so that's what I did here.
Next I sequence between the min month and max month by user and project which will fill in the month if a task took more than 1 month to complete:
Finally, I format as month year (I reference this article for date formatting constantly) and set MONTH as the first column:
And we're left with: