How to filter a dataframe consisting of some id directly in the Database using dbplyr automatically

51 views Asked by At

I'm designing some analysis in rstudio the analysis will require me to check some failed id directly in the database without manually checking the id and quoting them just like we normally do when we want to filter some IDs in the database and collect any ID that matches the record I searched for further analysis

These are the id's in my df that I want to search in the db using dblyr

result_df <- all.electricity.nov %>%
  filter(merchant.y %in% top.3.merchant & Status != "Successful")%>%
  select("request_id")

After opening a successful connection to the DB, this is the query filter using dbplyr below is this

merchants <- tbl(copect, "elite_day")%>%
  filter(tranx_no %in% result_df$request_id)%>%
  collect()

and I get an error that the result_df column header is not found, see the error below

Error in `filter()`:
ℹ In argument: `tranx_no %in% result_df$request_id`
Caused by error:
! Object `request_id` not found.

My question is

  1. Since I dont want to insert this id's manually, is it possible to pass the selected column that consists of the IDs I want to filter their records in the db without quoting them and adding commas to separate them just like we manually add quotes and comma when filtering ids in the DB.

  2. Is it possible to filter them directly without manual intervention to make the report more intelligent without human intervention?

  3. Why is it not seeing the column named request_id?

I'm new in R and not that good with Database with R, Kindly assist me.

1

There are 1 answers

1
Yomi.blaze93 On

I just got the issue resolved now

so it happens that the filter function from dbplyr expects a vector of values for the filtering condition. When i use select("request_id") in my original code, it creates a data frame with a single column named request_id. However, when i try to use result_df$request_id in the filter function, it doesn't recognize it as a vector.

result_df <- all.electricity.nov %>%
  filter(merchant.y %in% top.3.merchant & Status != "Successful")%>%
  select("request_id")

I resolve this, by using the pull() function to extract the vector of IDs from the data frame. So, instead of passing a data frame with a single column, I am passing a vector of IDs directly to the filter function, and it can work as expected.

This modification allows me to work seamlessly with the vector of IDs in the context of the dbplyr operations in my database table.

# Assuming result_df is a data frame
result_df <- all.electricity.nov %>%
  filter(merchant.y %in% top.3.merchant & Status != "Successful") %>%
  select("request_id")

# Extract the vector of IDs using pull()
request_ids <- pull(result_df, request_id)

# Print the vector to check its content
print(request_ids)

# Filter the database table using dbplyr
merchants <- tbl(copect, "elite_pay") %>%
  filter(tranx_no %in% request_ids) %>%
  collect()

which works well and resolved my error.

Thanks