How to load data from Azure Databricks SQL to GCP Databricks SQL


Is there an easy way to load data from an Azure Databricks Spark database into a GCP Databricks Spark database?

1 Answer

Answered by Kombajn zbożowy:
  1. Obtain the JDBC connection details from the Azure workspace and use them in GCP to pull data just as from any other JDBC source:
// This is run in the GCP workspace.
// Requires the Databricks JDBC driver to be available on the cluster.
val some_table = spark.read
  .format("jdbc")
  .option("url", "jdbc:databricks://adb-xxxx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/xxxx;AuthMech=3;UID=token;PWD=xxxx")
  .option("dbtable", "some_table")
  .load()
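To actually land the data in the GCP workspace rather than just query it over JDBC, the pulled DataFrame can be persisted as a Delta table. A minimal sketch, where the target schema and table names are placeholders:

// This is run in the GCP workspace, after the JDBC read above.
// Persist the pulled data as a managed Delta table; target names are placeholders.
some_table.write
  .format("delta")
  .mode("overwrite")   // or "append", depending on the load strategy
  .saveAsTable("some_schema.some_table")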
  2. Assuming the Azure data is stored in Blob Storage / ADLS Gen2, mount it in the GCP workspace's DBFS and read the data directly:
// This is run in the GCP workspace.
// Assuming ADLS Gen2 on the Azure side.
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

// Read the Delta table directly from the mounted path.
val some_data = spark.read
  .format("delta")
  .load("/mnt/<mount-name>/<some_schema>/<some_table>")