I am trying to import a long list of tables into RStudio. I want to be able to paste a list of table names into RStudio and then easily import all of the tables (doing so manually would take a very long time).
I am able to paste the list of table names into R and create a data frame and a vector that is a list of table names in the structure: 'databasename.schemaname.tablename1' etc
When I use dbReadTable to import any single table using the below, it works fine.
table1 <- dbReadTable(conn, SQL('databasename.schemaname.tablename1'))
I have tried writing a for loop and a function using dbReadTable() to do this for all tables in the list (input_list, a character vector of length n), which I would like to import into objects with the same name. However, I run into errors. This is what I have tried:
Version 1
# v1
# create a function to bulk import from SQL
bulk_sql_import <- function(t_list) {
for(i in seq_along(t_list)) {
dbReadTable(conn,
SQL(get(t_list[i])))
}
}
# use the function
bulk_sql_import(input_list)
#Error in h(simpleError(msg, call)) :
# error in evaluating the argument 'name' in selecting a method for function 'dbReadTable': invalid first argument
Version 2
v2
# create a function to bulk import from SQL
bulk_sql_import2 <- function(t_list) {
for(i in seq_along(t_list)) {
get(t_list[i]) <- dbReadTable(conn,
SQL(get(t_list[i])))
}
}
# use the function
bulk_sql_import2(input_list)
#Error in h(simpleError(msg, call)) :
# error in evaluating the argument 'name' in selecting a method for function 'dbReadTable': object 'databasename.schemaname.tablename1' not found
This last error is confusing as the table can be found by dbReadTable() normally.
Any help would be greatly appreciated!
Thanks :)
get(..) <- ...isn't a thing.assignis the way to do that, but ... generally usingget/assignare not indicators of solid workflows. Generally it is better to work with a list of frames. (Functions that rely on this break the popular and effective "functional paradigm", where functions return things and do not invoke side effect.)Realize that even if we do
assign(get(t_list[i]), dbReadTable(..))here, it is here, as in "in this function", not in the calling environment. You need to store it in the environment of the calling environment. We'll useenvir=parent.frame()in the function definition to remedy this.More-so in functional programming, your function should never break "scope" and expect a variable to be available that is not explicitly passed to it. Here,
connis mysteriously used without definition. It's bad practice to assume it in the calling environment, much safer to have it passed explicitly (perhaps with a default value) by the caller.Here's the final thing: you are trying to
get(t_list[i]), which means to "find the local variable that is named this" ... but there is no local variable yet, so it will fail withobject '**' not found.As a first cut, this should work for you:
I'll spend a moment to evolve this a little, in a way that makes it a bit more programmatic, flexible, and robust. For instance, we don't need to count with
i, we just need names;lapplyis a little more canonical here, though admittedly it doesn't add a lot but is often a preferred way of doing things in R (and is in alignment with the "list of frames" concept I mentioned earlier).Improvements:
envir=NULLif/when you want it to just return, not assigninvisible(.)so that if you just call it by itself, it will not trash your consoletryCatchso that if one table fails, it will continue; the error text is included in place of the table (as acharacterstring); there are many other ways to approach this, some subjective, feel free to explore this spacePersonally, I would set the default to
envir=NULLand require that the user explicitly elect side-effect by calling it asto indicate the intent to store it in the current (calling) environment.