I followed this Microsoft documentation to connect to my ADLS Gen2 storage account: https://learn.microsoft.com/en-gb/azure/databricks/connect/storage/tutorial-azure-storage
and used the following to authenticate, as described in step 6:
service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>")
spark.conf.set("fs.azure.account.auth.type.<storage-account>.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.<storage-account>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.<storage-account>.dfs.core.windows.net", "<application-id>")
spark.conf.set("fs.azure.account.oauth2.client.secret.<storage-account>.dfs.core.windows.net", service_credential)
spark.conf.set("fs.azure.account.oauth2.client.endpoint.<storage-account>.dfs.core.windows.net", "https://login.microsoftonline.com/<directory-id>/oauth2/token")
Now when I run this:
df = spark.read.csv("abfss://<filepath>")
I get this error: abfss://filepath has invalid authority.
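For reference, the docs read data with a fully qualified abfss URI of this form (the container, account and path names below are placeholders, not my real ones):

df = spark.read.csv("abfss://<container>@<storage-account>.dfs.core.windows.net/<path>/<file>.csv")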
I have double-checked:
- the tenant ID of the SP
- the client ID of the SP
- the secret scope name, created according to the above-mentioned documentation
- the role of the service principal on the container, which is "Storage Blob Data Contributor"
File service properties of my storage account:
- Large file share: Disabled
- Identity-based access: Not configured
- Default share-level permissions: Disabled
- Soft delete: Enabled (7 days)
- Share capacity: 5 TiB
The scope for the SP didn't work even though the SP had the "Storage Blob Data Contributor" role. So I tried creating a scope for my storage account's access key instead, and it worked without any issues. I'm still not sure what exactly the issue was with the SP.
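What I used was, roughly, the standard account-key configuration from the same docs page (a sketch; <scope>, <access-key-secret-name> and <storage-account> are placeholders for my own names):

# Read the storage account access key from the secret scope
access_key = dbutils.secrets.get(scope="<scope>", key="<access-key-secret-name>")

# Authenticate to ADLS Gen2 with the account key instead of OAuth / the service principal
spark.conf.set("fs.azure.account.key.<storage-account>.dfs.core.windows.net", access_key)

With this in place, the same spark.read.csv call on the abfss path worked for me.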