R loop to extract data from excel files and save outputs in separate dataframes

406 views Asked by At

I would like to loop through a set of excel files in a folder (let's imagine the folder contains 5 excel files). Each excel file contains different data but with the same structure, that looks like this:

Col_01 Col_02 Col_03
0.5 0.2 0.1
0.8 1.2 3.1
0.1 0.9 0.4

I'd like to extract a specific range of data (let's imagine the range A2:C4) from each one of these excels, and then save the outputs in different data frames. I created this code:

library(readxl)

# Take the list of excel files from a folder
list_excelfiles <- list.files(pattern='*.xlsx')

# Define codes that will be used in the output names of the dataframes created
codes <- c("001", "002", "003", "004", "005")

# Start loop
for (x in list_excelfiles) {
  # Define the range of data to extract from the excel files
  range <- c("A2:C4")
  
  # Use readxl package to read the specific range of data from excel, and save the extracted range into a dataframe 
  x <- read_excel(x, sheet = NULL, range = range, col_types = "numeric")

  # Assign a name to the data frames created
  for (i in codes)  {
     assign(paste("Data_", i, sep = "_"), x) 
     }

At this point the loop should create 5 dataframes (named Data_001, Data_002, etc.) containing 5 set of data. Unfortunately the code above creates the dataframes with the correct naming (e.g. Data_001, Data_002, etc.), but the content of all the dataframes is always the same (namely: all the dataframes report the data taken from the last excel). Can anybody help?

1

There are 1 answers

0
goblinshark On

I think the issue is that for every single excel file you are redefining all of the Data_code variables as containing the content for that excel file. So for the final excel file it overwrites all the variables as using that excel file's content.

To get around this you just need to do the assign once in the main for loop e.g. in the code below I edited it so that it goes through each excel file and for each one:

  1. loads the data
  2. creates a new variable for this data called Data_00i where i is some number which starts at 1 and goes up to the number of excel files
library(readxl)

# Take the list of excel files from a folder
list_excelfiles <- list.files(pattern='*.xlsx')

# Start loop
for (i in seq(length(list_excelfiles))) {
  # Define the range of data to extract from the excel files
  range <- c("A2:C4")
  
  # Use readxl package to read the specific range of data from excel, and save the extracted range into a dataframe 
  x <- read_excel(list_excelfiles[i], sheet = NULL, range = range, col_types = "numeric")

  # Assign a name to the data frames created
  assign(paste("Data_00", i), x) 
}