A way to sort files into tibbles based on filename in R?

49 views Asked by gvit At 12 October 2023 at 17:13

I have a large list of files and want to sort columns from certain files in the list into two tibbles. The file names contain the phrases I need to identify which files go where, but I can't figure out an efficient, general way to do the sorting and appending. All files contain the same columns and column names, but varying numbers of rows. I would prefer tidyverse solutions. An example of my file names:

A1_SORE6_REEC_T=1.csv
A5_minCMV_noREEC_T=5.csv

Once read in, each file looks something like this, plus a few other columns:

file1 <- tibble(ID = 1:50, Ch1_RawIntDen = sample(1:2000, 50), Ch1_IntDen = sample(1:2000, 50), Ch1_IntNorm = sample(1:2000, 50))

I want to be able to filter files by any part, or several parts, of these file names (A#, SORE6/minCMV, REEC/noREEC, and T=#), then extract one column from the filtered files and append the results into one tibble.
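One possible approach to this kind of filtering, sketched under the assumption that every file name has exactly four underscore-separated parts in the order shown above: split each name into the columns of a small index tibble, then use filter() on it. The column names well, construct, reec and time are made up for illustration, as are the last two example file names. separate_wider_delim() needs tidyr 1.3.0 or later, and the |> pipe needs R 4.1 or later.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Example paths following the naming pattern above (the last two are invented)
files <- c(
  "data-cleaned/A1_SORE6_REEC_T=1.csv",
  "data-cleaned/A5_minCMV_noREEC_T=5.csv",
  "data-cleaned/A2_SORE6_noREEC_T=1.csv",
  "data-cleaned/A3_SORE6_REEC_T=5.csv"
)

# One row per file: the full path plus one column per part of the file name
file_index <- tibble(path = files) |>
  mutate(name = tools::file_path_sans_ext(basename(path))) |>
  separate_wider_delim(
    name,
    delim = "_",
    names = c("well", "construct", "reec", "time")  # hypothetical column names
  )

# Filter on any combination of parts, then pull out the matching paths
wanted <- file_index |>
  filter(construct == "SORE6", reec == "REEC")

wanted$path
```

Comparing whole parts also avoids a pitfall of plain substring searches: the pattern "REEC" is found inside "noREEC" as well, so str_detect() on the raw file name would match both groups.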

I have technically found a way to accomplish this, but it is hideous and very hardcoded. I would also like to be able to filter by multiple phrases within the filename, but right now I can only sort by one phrase. Here is the code I use now, in which I'm grabbing the Ch1_IntNorm column from each file:

library(readr)    # read_csv()
library(stringr)  # str_detect()

filelist <- list.files(path = "data-cleaned", full.names = TRUE)

for (x in seq_along(filelist)) {
  if (str_detect(filelist[[x]], "SORE6")) {
    # The first SORE6 file creates the tibble; later ones are appended to it
    if (!exists("SORE6")) {
      SORE6 <- read_csv(filelist[[x]], col_select = Ch1_IntNorm)
    } else {
      SORE6 <- rbind(SORE6, read_csv(filelist[[x]], col_select = Ch1_IntNorm))
    }
  } else {
    # Every file that is not SORE6 is treated as minCMV
    if (!exists("minCMV")) {
      minCMV <- read_csv(filelist[[x]], col_select = Ch1_IntNorm)
    } else {
      minCMV <- rbind(minCMV, read_csv(filelist[[x]], col_select = Ch1_IntNorm))
    }
  }
}
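
The loop above tests a single phrase per file. A minimal sketch of testing several phrases at once, assuming the parts of a file name are always separated by underscores; has_parts() is a hypothetical helper written for this example, not a function from any package, and the third file name is invented:

```r
# TRUE for each path whose file name contains every requested part
# as a whole underscore-separated token
has_parts <- function(paths, parts) {
  tokens <- strsplit(tools::file_path_sans_ext(basename(paths)), "_", fixed = TRUE)
  vapply(tokens, function(tok) all(parts %in% tok), logical(1))
}

files <- c(
  "A1_SORE6_REEC_T=1.csv",
  "A5_minCMV_noREEC_T=5.csv",
  "A2_SORE6_noREEC_T=1.csv"
)

keep <- has_parts(files, c("SORE6", "REEC"))
keep         # TRUE FALSE FALSE
files[keep]  # "A1_SORE6_REEC_T=1.csv"
```

Because whole tokens are compared, "REEC" does not match "noREEC". The logical vector can be used to subset filelist before any file is read.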

I couldn't figure out how to append all the files together without using one file as the starting point, hence the check for an existing variable. You can also see that I don't actually select the "minCMV" files based on their file names; I just treat anything that isn't "SORE6" as "minCMV", which will not always be the case going forward.
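
A starting tibble is not strictly needed: since readr 2.0.0, read_csv() accepts a vector of paths and stacks the rows of all the files it is given. The sketch below writes three small made-up files to a temporary folder so that it runs on its own; the folder name, the helper make_file() and the file contents are assumptions for illustration, not the real data.

```r
library(readr)
library(stringr)
library(tibble)

# Write three tiny example files so the sketch is self-contained
tmp <- file.path(tempdir(), "data-cleaned-demo")
dir.create(tmp, showWarnings = FALSE)
make_file <- function(n) {
  tibble(ID = seq_len(n), Ch1_RawIntDen = seq_len(n),
         Ch1_IntDen = seq_len(n), Ch1_IntNorm = seq_len(n))
}
write_csv(make_file(3), file.path(tmp, "A1_SORE6_REEC_T=1.csv"))
write_csv(make_file(4), file.path(tmp, "A2_SORE6_noREEC_T=5.csv"))
write_csv(make_file(5), file.path(tmp, "A5_minCMV_noREEC_T=5.csv"))

filelist <- list.files(path = tmp, pattern = "\\.csv$", full.names = TRUE)

# Select each group explicitly by name instead of using "everything else"
sore6_files  <- filelist[str_detect(basename(filelist), "SORE6")]
mincmv_files <- filelist[str_detect(basename(filelist), "minCMV")]

# read_csv() reads every path in the vector and binds the rows together
SORE6  <- read_csv(sore6_files,  col_select = Ch1_IntNorm, show_col_types = FALSE)
minCMV <- read_csv(mincmv_files, col_select = Ch1_IntNorm, show_col_types = FALSE)

nrow(SORE6)   # 7 rows: 3 + 4
nrow(minCMV)  # 5 rows
```

An equivalent route is to read each file with purrr::map() and combine the results with purrr::list_rbind() (purrr 1.0.0 or later). Either way, each group needs at least one matching file, since there is nothing to read from an empty vector of paths.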

I want the final tibbles to look something like this:

SORE6 <- tibble(Ch1_IntNorm = sample(1:2000, 200))

I'm new to R so I would appreciate an explanation of how any solutions work. Thanks!
