I have this dataset which contains patient_id, diagnosis_date, diagnosis, trt_date, and drug_code.
# Create the dataset
data <- data.frame(
patient_id = c(1, 1, 1, 1, 5, 5, 7, 7),
diagnosis_date = as.Date(c("1/9/10", "1/9/10", "1/9/10",
"1/9/10", "1/11/10", "1/11/10",
"1/9/10", "1/9/10"), format = "%m/%d/%y"),
diagnosis = c("breast cancer", "breast cancer", "breast cancer",
"breast cancer", "breast cancer", "breast cancer",
"breast cancer", "breast cancer"),
trt_date = as.Date(c("1/20/10", "1/20/10", "1/21/10",
"1/21/10", "1/29/10", "1/30/10",
"1/25/10", "1/26/10"), format = "%m/%d/%y"),
drug_code = c("A", "B", "A", "A", "B", "A", "A", "A")
)
Here are the conditions for filtering:
Patient id 1, has first treatment date of 1/20/10 and received drug A, but also received drug B on the same date (second line of observation). This is called first line combo therapy with drug A and B, which we will need to filter out . So patient id 1 will be filtered out.
Patient id 5, has first treatment date of 1/29/10 and received only drug B, this is called first line monotherapy with drug B. So patient id 5 will be kept.
Patient id 7 has first treatment date of 1/25/10 and received only drug A, this is called first line monotherapy with drug A.
As a result, I want observations that only have first line monotherapy with drug A and first line monotherapy with drug B, but not first line combo therapy with drug A and B. So patient id 7 will be kept.
So the desired output would look like:
patient_id diagnosis_date diagnosis trt_date drug_code
5 2010-01-11 breast cancer 2010-01-29 B
5 2010-01-11 breast cancer 2010-01-30 A
7 2010-01-09 breast cancer 2010-01-25 A
7 2010-01-09 breast cancer 2010-01-26 A
How can I achieve this?