I'm trying to use a triple nested for loop to:
- loop over a list of dataframes
- for each dataframe, loop over each unique value in the "treatment" column
- for each unique treatment value, perform a statistical test (shapiro.test) on each of the 4 variables ("QY_max","FvFo","NPQ_Lss","QY_Lss") of the dataframe.
To do this, I've written the following function:
treatment_normality<-function(df_list){
for (df in df_list){
#df.name <- deparse(substitute(df))
#print(df.name)
for (treat in unique(df$treatment)){
print(treat)
df_treat<-dplyr::filter(df, treatment %in% treat)
for (parameter in c("QY_max","FvFo","NPQ_Lss","QY_Lss"))
print(parameter)
print(shapiro.test(df_treat[[parameter]]))
}
}
}
I regularly print the variables here to know where I'm at in the for loop when I check the output. However, I'm experiencing several issues which I cannot fix. My results look like:
[1] "A"
[1] "QY_max"
[1] "FvFo"
[1] "NPQ_Lss"
[1] "QY_Lss"
Shapiro-Wilk normality test
data: df_treat[[parameter]]
W = 0.94347, p-value = 0.5415
[1] "B"
[1] "QY_max"
[1] "FvFo"
[1] "NPQ_Lss"
[1] "QY_Lss"
Shapiro-Wilk normality test
data: df_treat[[parameter]]
W = 0.93471, p-value = 0.5065
[1] "C"
[1] "QY_max"
[1] "FvFo"
[1] "NPQ_Lss"
[1] "QY_Lss"
I cannot figure out why this loop does not print the treatment (print(treat)), then each parameter (print(parameter)) followed by its test output (print(shapiro.test(...))). Why does it print all parameters one after another, with only a single test result at the end? And why does the output show data: df_treat[[parameter]] rather than the name of my dataframe plus the variable name, e.g. df1[["QY_max"]]? I'd also like to print the name of the dataframe being used. I expected df.name <- deparse(substitute(df)) followed by print(df.name) to do the trick, but it messes up all the results (so I commented it out).
Any idea how I can fix these issues?
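To make the intended behaviour concrete, here is a minimal sketch of the output I'm after. It is not a definitive fix and rests on some assumptions: the body of the innermost loop is wrapped in braces so that both statements run for every parameter, the list is iterated by name so the dataframe name is available as a string, and the label on the data: line is overwritten through the data.name element of the test result. The names treatment_normality_fixed and demo_list are hypothetical, and base subsetting stands in for dplyr::filter so the snippet runs without extra packages.

```r
# Sketch only: treatment_normality_fixed and demo_list are hypothetical names.
treatment_normality_fixed <- function(df_list,
                                      parameters = c("QY_max", "FvFo", "NPQ_Lss", "QY_Lss")) {
  results <- list()
  # Loop over the list names so the dataframe name is available as a string
  for (df_name in names(df_list)) {
    df <- df_list[[df_name]]
    for (treat in unique(df$treatment)) {
      # Base R subsetting used here in place of dplyr::filter
      df_treat <- df[df$treatment %in% treat, , drop = FALSE]
      for (parameter in parameters) {  # braces keep every step inside the loop
        test <- shapiro.test(df_treat[[parameter]])
        # Replace the label that shapiro.test derived from the call expression
        test$data.name <- sprintf('%s[["%s"]] (treatment %s)', df_name, parameter, treat)
        print(test)
        # Collect one row per test so the results can be inspected afterwards
        results[[length(results) + 1]] <- data.frame(
          dataframe = df_name,
          treatment = treat,
          parameter = parameter,
          W = unname(test$statistic),
          p_value = test$p.value
        )
      }
    }
  }
  invisible(do.call(rbind, results))
}

# Small made-up dataset: one dataframe, two treatments, five rows each
set.seed(123)
demo_list <- list(
  df1 = data.frame(
    treatment = rep(c("A", "B"), each = 5),
    QY_max = runif(10), FvFo = runif(10),
    NPQ_Lss = runif(10), QY_Lss = runif(10)
  )
)
res <- treatment_normality_fixed(demo_list)
```

Returning the results as a dataframe (invisibly) in addition to printing them is an extra design choice, not something the original function does. It makes it possible to filter or sort the p-values later instead of reading them off the console.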
Below, you can find an MRE (generated with ChatGPT):
# Set seed for reproducibility
set.seed(123)
# Create a list to store dataframes
list_of_dataframes <- list()
# Define treatments and levels
treatments <- c("A", "B", "C", "D", "E")
num_levels <- length(treatments)
# Number of rows in each dataframe
num_rows <- 3 * num_levels
# Generate data for each dataframe
for (i in 1:3) {
  treatment <- rep(treatments, each = 3) # Each treatment has 3 occurrences
  QY_max <- runif(num_rows, min = 0, max = 100) # Random QY_max values
  FvFo <- runif(num_rows, min = 0, max = 1) # Random FvFo values
  NPQ_Lss <- runif(num_rows, min = 0, max = 2) # Random NPQ_Lss values
  QY_Lss <- runif(num_rows, min = 0, max = 1) # Random QY_Lss values
  # Create dataframe and store it in the list under its name (df1, df2, df3)
  df_name <- paste0("df", i)
  list_of_dataframes[[df_name]] <- data.frame(treatment, QY_max, FvFo, NPQ_Lss, QY_Lss)
}
Thanks!