Overview:
Azure HDInsight
Cluster Type: ML Services (R Server)
Version: R Server 9.1 (HDI 3.6)
I am trying to import a CSV file from an Azure storage blob into the R Server environment, but it's clearly not as easy as I thought it would be, or as easy as doing it locally.
The first thing I tried was installing the sparklyr package and setting up a connection.
#install.packages("devtools")
#devtools::install_github("rstudio/sparklyr")
install.packages("sparklyr")
library(sparklyr)
sc <- spark_connect(master = "yarn")
But because of the older Spark build installed on HDI, I get an error message:
Error in start_shell(master = master, spark_home = spark_home, spark_version = version, :
sparklyr does not currently support Spark version: 2.1.1.2.6.2.38
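One thing that might work (an untested assumption on my part) is pinning sparklyr to the plain upstream Spark version instead of letting it try to parse the HDP-suffixed build string; spark_connect() accepts an explicit version argument. The SPARK_HOME path below is the usual HDI location, but verify it on your cluster:

```r
library(sparklyr)

# Hedged sketch: HDI 3.6 ships Spark 2.1.x, so passing the plain upstream
# version may let sparklyr skip parsing "2.1.1.2.6.2.38".
# Both the version and spark_home values here are assumptions to check.
sc <- spark_connect(master     = "yarn-client",
                    version    = "2.1.0",
                    spark_home = "/usr/hdp/current/spark2-client")
```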
Then I tried to use rxSparkConnect, but that didn't work either.
#Sys.setenv(SPARK_HOME_VERSION="2.1.1.2.6.2.38-1")
cc <- rxSparkConnect(interop = "sparklyr")
sc <- rxGetSparklyrConnection(cc)
origins <- file.path("wasb://[email protected]", "FILENAME.csv")
spark_read_csv(sc, path = origins, name = "df")
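An alternative that sidesteps sparklyr entirely (a sketch on my part, not something the cluster docs guarantee) is to use the RevoScaleR functions that ship with ML Services: point an RxTextData source at the file through RxHdfsFileSystem. On HDInsight the default Hadoop filesystem is the cluster's wasb container, so a bare HDFS-style path resolves into blob storage. The path below is a hypothetical placeholder:

```r
library(RevoScaleR)  # preinstalled on ML Services (R Server) clusters

# Hedged sketch: on HDI, a plain path like this resolves against the
# default wasb container. "/example/data/FILENAME.csv" is a placeholder.
hdfsFS <- RxHdfsFileSystem()
csvSrc <- RxTextData(file = "/example/data/FILENAME.csv",
                     fileSystem = hdfsFS)
df <- rxImport(csvSrc)  # pull the blob-backed CSV into a local data frame
head(df)
```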
How would you read a CSV file from an Azure storage blob into the R Server environment?
I'm a little upset at myself that this is taking so long; it shouldn't be this complicated. Please help me out, guys! Thanks in advance!
related post 1
related post 2
I found an imperfect workaround: upload the data into the "local" environment pane in the bottom-right corner and simply read the CSV file from there.

There's got to be a better way to do it, since that's a lot of manual work, probably impractical if the data is large, and it wastes blob storage.
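If the manual upload is the sticking point, that step can at least be scripted from an R session on the cluster; RevoScaleR's rxHadoopCopyToLocal wraps the Hadoop copy for you (again a sketch, with a placeholder path in the default wasb container):

```r
library(RevoScaleR)

# Hedged sketch: mirror the manual "upload to local" step in code.
# "/example/data/FILENAME.csv" is a hypothetical path; substitute your own.
rxHadoopCopyToLocal(source = "/example/data/FILENAME.csv",
                    dest   = "~/FILENAME.csv")
df <- read.csv("~/FILENAME.csv")  # then read it like any local file
```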