Running a Databricks notebook connected to Git via ADF, independently of the Git username


In our company, to orchestrate Databricks notebook runs, we learned through experimentation to connect our notebooks (which live in a Git repository) to ADF pipelines. However, there is an issue.

As you can see in the screenshot attached to this question, the path to the notebook depends on the employee's username, which is not a stable solution in production.

What are the possible solutions?

  • Update: the main issue is keeping the employee username out of production to avoid future failures, whether it appears in the path used by ADF or in a secondary storage location that a Lookup activity could read but that still sits on the production side.

Path selection in ADF: [screenshots of the notebook path picker in the Notebook activity settings]


There are 2 answers

Answer by Alex Ott (best answer, score 1):

If you want to avoid having the username in the path, you can create a folder inside Repos and do the checkout there. The full steps:

  • In Repos, at the top level, click the down arrow next to the "Repos" header, select "Create", and then select "Folder". Give it a name, for example "Staging".


  • Create a repository inside that folder: click the down arrow next to the "Staging" folder, select "Create", and then select "Repo".


After that you can navigate to that repository in the ADF UI.

It's also recommended to set permissions on the folder, so only specific people can update projects inside it.
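
If you prefer to script that checkout instead of clicking through the UI (for example from a CI/CD job), the Databricks Repos REST API (POST /api/2.0/repos) can create the checkout under the shared folder. Here is a minimal sketch in Python; the workspace URL, Git repository URL, and project name are hypothetical, the personal access token is assumed to sit in the DATABRICKS_TOKEN environment variable, and the /Repos/Staging folder must already exist:

```python
import os

import requests

# Assumed values -- replace with your own workspace URL and Git repository.
WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical
TOKEN = os.environ["DATABRICKS_TOKEN"]  # PAT with permission to manage Repos

# Create the checkout under the shared "Staging" folder
# instead of /Repos/<username>/...
resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/repos",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "url": "https://github.com/my-org/etl-project.git",  # hypothetical repo
        "provider": "gitHub",
        "path": "/Repos/Staging/etl-project",  # stable, username-free path for ADF
    },
)
resp.raise_for_status()
print(resp.json()["id"])  # repo id; keep it if you want to script updates later
```

The returned repo id can later be passed to PATCH /api/2.0/repos/{repo_id} to pull a specific branch or tag before a pipeline run, so production always executes a known state of the code.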

Answer by Utkarsh Pal (score 5):

You can use Azure DevOps source control to manage the development and production Databricks notebooks and other related code, scripts, and documents in Git.

Keep your notebooks in logically organized repositories in GitHub and reference the same paths in the Notebook activities of your Azure Data Factory pipelines.

If you want to pass a dynamic path to the Notebook activity, keep the list of notebook file paths in a placeholder such as a text/CSV file or a SQL table.

Then use a Lookup activity in ADF to read that list, pass the Lookup output to a ForEach activity, and put a Notebook activity inside the ForEach, passing the path for the current iteration as a parameter. This way you avoid a hard-coded file path in the pipeline. A sketch of this wiring follows.
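
Wired together, the pattern looks roughly like the sketch below, written with the azure-mgmt-datafactory Python SDK. This is only an illustration, not part of the original answer: the dataset NotebookPathsTable (an Azure SQL table whose NotebookPath column holds values such as /Repos/Staging/etl-project/ingest), the linked service AzureDatabricksLS, and all resource names are assumptions.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency, AzureSqlSource, DatabricksNotebookActivity,
    DatasetReference, Expression, ForEachActivity, LinkedServiceReference,
    LookupActivity, PipelineResource,
)

SUBSCRIPTION_ID = "<subscription-id>"  # hypothetical
RESOURCE_GROUP = "my-rg"               # hypothetical
FACTORY = "my-adf"                     # hypothetical

# 1. The Lookup reads every row; each row carries one notebook path.
lookup = LookupActivity(
    name="LookupNotebookPaths",
    dataset=DatasetReference(type="DatasetReference",
                             reference_name="NotebookPathsTable"),
    source=AzureSqlSource(),
    first_row_only=False,
)

# 2. The inner Notebook activity takes its path from the current item,
#    so no username or file path is hard-coded in the pipeline.
run_notebook = DatabricksNotebookActivity(
    name="RunNotebook",
    notebook_path={"value": "@item().NotebookPath", "type": "Expression"},
    linked_service_name=LinkedServiceReference(
        type="LinkedServiceReference", reference_name="AzureDatabricksLS"),
)

# 3. The ForEach iterates over the Lookup output once the Lookup succeeds.
foreach = ForEachActivity(
    name="ForEachNotebook",
    items=Expression(value="@activity('LookupNotebookPaths').output.value"),
    activities=[run_notebook],
    depends_on=[ActivityDependency(activity="LookupNotebookPaths",
                                   dependency_conditions=["Succeeded"])],
)

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
client.pipelines.create_or_update(
    RESOURCE_GROUP, FACTORY, "RunDatabricksNotebooks",
    PipelineResource(activities=[lookup, foreach]),
)
```

The same pipeline can of course be built in the ADF authoring UI; the SDK form is just easier to show in text, and it makes the Lookup, ForEach, and Notebook wiring explicit.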