I have a large list of files and want to sort one column from certain files in the list into two tibbles. The file names contain the phrases I need to identify which files go where, but I can't figure out an efficient, general way to do the sorting and appending. All files contain the same columns and column names, but varying numbers of rows. I would prefer tidyverse solutions. Examples of my file names:
A1_SORE6_REEC_T=1.csv
A5_minCMV_noREEC_T=5.csv
Each file, read in as a tibble, looks something like this, plus a few other columns:
file1 <- tibble(ID = 1:50, Ch1_RawIntDen = sample(1:2000, 50), Ch1_IntDen = sample(1:2000, 50), Ch1_IntNorm = sample(1:2000, 50))
I want to be able to filter files by any part, or several parts, of these file names (A#, SORE6/minCMV, REEC/noREEC, and T=#), then extract one column from each filtered file and append them all into one tibble.
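To make the filtering requirement concrete, here is a minimal sketch that works on file names only. It assumes the parts are always separated by "_" as in the examples above; the extra file names and the helper `keep_files()` are made up for illustration. Matching whole fields rather than substrings matters here because a plain search for "REEC" would also match "noREEC".

```r
library(stringr)

# Hypothetical file names following the pattern in the question
files <- c(
  "A1_SORE6_REEC_T=1.csv",
  "A1_SORE6_noREEC_T=1.csv",
  "A5_minCMV_noREEC_T=5.csv",
  "A5_minCMV_REEC_T=1.csv"
)

# keep_files() is a made-up helper: it keeps the names that contain
# every one of the given parts as a complete "_"-separated field
keep_files <- function(files, parts) {
  # Drop the directory and the ".csv" extension, then split on "_"
  fields <- str_split(str_remove(basename(files), "\\.csv$"), "_")
  keep <- vapply(fields, function(f) all(parts %in% f), logical(1))
  files[keep]
}

sore6_reec <- keep_files(files, c("SORE6", "REEC"))
mincmv_t1  <- keep_files(files, c("minCMV", "T=1"))
```

With these made-up names, `sore6_reec` holds only the SORE6 file with REEC (not noREEC), and `mincmv_t1` holds only the minCMV file with T=1.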
I have technically found a way to accomplish this, but it is hideous and very hardcoded. I would also like to be able to filter by multiple phrases within the filename, but right now I can only sort by one phrase. Here is the code I use now, in which I'm grabbing the Ch1_IntNorm column from each file:
library(tidyverse)

filelist <- list.files(path = "data-cleaned", full.names = TRUE)
for (x in seq_along(filelist)) {
  if (str_detect(filelist[[x]], "SORE6")) {
    # The first SORE6 file creates the tibble; later ones are appended to it
    if (!exists("SORE6")) {
      SORE6 <- read_csv(filelist[[x]], col_select = Ch1_IntNorm)
    } else {
      SORE6 <- rbind(SORE6, read_csv(filelist[[x]], col_select = Ch1_IntNorm))
    }
  } else {
    # Anything that is not SORE6 is treated as minCMV
    if (!exists("minCMV")) {
      minCMV <- read_csv(filelist[[x]], col_select = Ch1_IntNorm)
    } else {
      minCMV <- rbind(minCMV, read_csv(filelist[[x]], col_select = Ch1_IntNorm))
    }
  }
}
I couldn't figure out how to append all the files together without using one file as the starting point, hence the check for an existing variable. You can also see that I don't actually select the "minCMV" files based on their file names; I just treat anything that isn't "SORE6" as "minCMV", which will not always hold going forward.
I want the final tibbles to look something like this:
SORE6 <- tibble(Ch1_IntNorm = sample(1:2000, 200))
I'm new to R so I would appreciate an explanation of how any solutions work. Thanks!
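One possible shape for a tidyverse version is sketched below. It is a minimal sketch under stated assumptions, not a tested solution for the real data: the demo folder, the three demo files, and the helper `read_group()` are all hypothetical and exist only so the example runs on its own. The idea is to pick the matching file names first, read the wanted column from each with `map()`, and stack the resulting list of tibbles with `bind_rows()`, so no "first file" is needed as a starting point.

```r
library(tidyverse)

# Build a small throwaway folder of CSVs so the sketch runs on its own;
# with real data, point list.files() at "data-cleaned" instead
demo_dir <- tempfile("data-cleaned-demo")
dir.create(demo_dir)
demo_names <- c("A1_SORE6_REEC_T=1.csv", "A2_SORE6_noREEC_T=1.csv",
                "A5_minCMV_noREEC_T=5.csv")
for (nm in demo_names) {
  write_csv(tibble(ID = 1:3, Ch1_IntNorm = c(10, 20, 30)),
            file.path(demo_dir, nm))
}

filelist <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)

# read_group() is a hypothetical helper: keep the files whose name matches
# the regular expression, read one column from each, and stack the results
read_group <- function(files, pattern, column = "Ch1_IntNorm") {
  matched <- files[str_detect(basename(files), pattern)]
  tibbles <- map(matched, function(f) {
    read_csv(f, col_select = all_of(column), show_col_types = FALSE)
  })
  bind_rows(tibbles)
}

SORE6  <- read_group(filelist, "SORE6")
minCMV <- read_group(filelist, "minCMV")
```

Here `map()` returns a list with one tibble per matched file, and `bind_rows()` appends them row-wise into a single tibble. Because each group is selected by its own pattern, "minCMV" is no longer defined as "everything that is not SORE6". The `pattern` argument is a regular expression, so a stricter expression could be passed to require several parts of the name at once.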