How to change names of files from one csv column to the next


I have a list of files that I downloaded from the internet using an R script that runs through a CSV containing download links. However, when the files download, they are named after the last part of the URL in the CSV column titled external, rather than with the ID that I want.

My data looks like this:

| ID       | external                                      |
| -------- | --------------------------------------------- |
| ABC_101  | https://peaches.com/12345_download            |
| ABC_102  | https://peaches.com/123456_download           |

So when the file downloads, it is named 12345_download, but I want it to be named ABC_101. I am working with over 1,000 files, so ideally I would write an R script that matches each file name to the last part of the external column and then renames the file to the corresponding ID.

library(dplyr)
library(stringdist)
library(writexl)

# set working directory for project to access files 
setwd("/home2/peach/")

# read files in folder and get list of file names
file_names <- list.files(path ="peaches_downloads/downloads/",
                         all.files=TRUE, 
                         full.names=TRUE,
                         recursive=TRUE,
                         pattern=".jpg") %>%
  data.frame(paths = .)

# extract part of file name [remove directory sub strings] that
# comes before .jpg + other parts of naming convention and add a column. 
file_names$match.name <- file_names %>%
  pull(paths) %>%
  basename() %>%
  gsub(pattern = "\\.jpg.*", replacement = "") %>%
  gsub(pattern = "_download", replacement = "")

# read in excel/csv file with names to change to 
name_data <- read.csv("peaches_downloads/xlsx/fruit_full_dump_.csv")

# matching with external 
# extract part of the external name to get external to match path names 
# (refer to the column directly inside mutate() so external is not overwritten)
name_data <- name_data %>%
  mutate(external_match = external %>%
           basename() %>%
           gsub(pattern = "\\.jpg.*", replacement = "") %>%
           gsub(pattern = "_download", replacement = ""))

# check which downloaded files have a match in the csv
check2 <- data.frame(check2 = file_names$match.name %in% name_data$external_match)


There is 1 answer

Mohamed Desouky
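You can try this:
files_dir <- "Your_files_directory"
for (f in list.files(files_dir)) {
    # fixed = TRUE matches the file name literally rather than as a regex
    ind <- which(grepl(f, name_data$external, fixed = TRUE))
    # rename inside files_dir, and only when exactly one row matches
    if (length(ind) == 1) file.rename(file.path(files_dir, f), file.path(files_dir, name_data$ID[ind]))
}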

Here "Your_files_directory" is the directory you downloaded your files to, and name_data is your data, which contains the ID and external columns.
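
With over 1,000 files, the same idea can also be written without a loop. The following is only a sketch under some assumptions: the small name_data below is a stand-in built from the two rows in the question, files_dir is a throwaway temporary directory holding empty files created just for the demonstration, and the downloaded files are assumed to be named exactly like the last segment of the URL, with no extension. It uses match() to look up each file name among the URL basenames and then renames all matched files in one file.rename() call.

```r
# Hypothetical stand-in for name_data, using the two rows from the question
name_data <- data.frame(
  ID = c("ABC_101", "ABC_102"),
  external = c("https://peaches.com/12345_download",
               "https://peaches.com/123456_download"),
  stringsAsFactors = FALSE
)

# Throwaway directory with empty files named like the downloads (illustration only)
files_dir <- file.path(tempdir(), "rename_demo")
dir.create(files_dir, showWarnings = FALSE)
file.create(file.path(files_dir, basename(name_data$external)))

# Look up each file name among the last URL segments; NA means no match
files <- list.files(files_dir)
idx <- match(files, basename(name_data$external))
matched <- !is.na(idx)

# Rename only the matched files, keeping them in the same directory
renamed <- file.rename(
  file.path(files_dir, files[matched]),
  file.path(files_dir, name_data$ID[idx[matched]])
)

# Inspect the result
list.files(files_dir)
```

Because match() compares whole strings, 12345_download cannot accidentally pair with a longer name that merely contains it. If your files carry an extension such as .jpg, you would probably need to strip it before matching (for example with tools::file_path_sans_ext()) and append it again to the new name.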