How to call a Databricks notebook from Python with the REST API


I want to create a Python notebook on my desktop that passes an input to another notebook in Databricks and then returns that notebook's output. For example, my local Python file will pass a string into a Databricks notebook, which will reverse the string and send the result back to my local Python file. What would be the best way to achieve this?
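For context, the notebook on the Databricks side would look something like this (a minimal sketch; the widget name input_string is just an example):

# Contents of /Users/[email protected]/api_test on Databricks
# Read the input parameter passed in via base_parameters on the REST call
dbutils.widgets.text("input_string", "")
value = dbutils.widgets.get("input_string")

# Reverse the string and return it to the caller;
# dbutils.notebook.exit() is what populates notebook_output
dbutils.notebook.exit(value[::-1])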

This is what I have so far. I get a response from the API, but I was expecting the metadata to include an attribute called "notebook_output". What am I missing, or is there somewhere else I can look to get the notebook output from the run?

import json
import os
from databricks_cli.sdk.api_client import ApiClient
from databricks_cli.runs.api import RunsApi

os.environ['DATABRICKS_HOST'] = "https://adb-################.##.azuredatabricks.net/"
os.environ['DATABRICKS_TOKEN'] = "token-value"

api_client = ApiClient(host=os.getenv('DATABRICKS_HOST'), token=os.getenv('DATABRICKS_TOKEN'))

runJson = """
        {
        "name": "test job",
        "max_concurrent_runs": 1,
        "tasks": [
            {
            "task_key": "test",
            "description": "test",
            "notebook_task":
                {
                "notebook_path": "/Users/[email protected]/api_test"
                },
            "existing_cluster_id": "cluster_name",
            "timeout_seconds": 3600,
            "max_retries": 3,
            "retry_on_timeout": true
            }
            ]
        }
        """

runs_api = RunsApi(api_client)
# submit_run expects a parsed dict, not a raw JSON string
run = runs_api.submit_run(json.loads(runJson))
metadata = runs_api.get_run_output(run['run_id'])['metadata']

Output:

{'job_id': 398029273095601, 'run_id': 150609942, 'creator_user_name': 'user', 'number_in_job': 150609942, 'state': {'life_cycle_state': 'TERMINATED', 'result_state': 'SUCCESS', 'state_message': '', 'user_cancelled_or_timedout': False}, 
'task': {'notebook_task': {'notebook_path': 'path', 'source': 'WORKSPACE'}}, 
'cluster_spec': {'existing_cluster_id': 'cluster'}, 
'cluster_instance': {'cluster_id': 'id', 'spark_context_id': 'id'}, 'start_time': 1683904971067, 'setup_duration': 1000, 'execution_duration': 8000, 'cleanup_duration': 0, 'end_time': 1683904981007, 'run_duration': 9940, 'run_name': 
'Untitled', 'run_page_url': 'url', 'run_type': 'SUBMIT_RUN', 'attempt_number': 0, 'format': 'SINGLE_TASK'}

1 Answer

Answered by Abhishek Jain:

To get notebook_output you need the api/2.0/jobs/runs/get-output endpoint, which is exactly what RunsApi.get_run_output calls. In its response, notebook_output is a top-level field alongside metadata, so indexing into ['metadata'] discards it. Also note that notebook_output is only populated if the notebook returns a value via dbutils.notebook.exit(), and only after the run has reached a terminal state.
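A minimal sketch of the end-to-end flow, building on the code in the question (it assumes the notebook exits via dbutils.notebook.exit; the parameter name input_string and the polling interval are illustrative):

import json
import time

payload = {
    "run_name": "reverse string test",
    "tasks": [
        {
            "task_key": "test",
            "notebook_task": {
                "notebook_path": "/Users/[email protected]/api_test",
                # base_parameters is how the input string reaches the notebook's widgets
                "base_parameters": {"input_string": "hello"}
            },
            "existing_cluster_id": "cluster_name"
        }
    ]
}

run = runs_api.submit_run(payload)
run_id = run['run_id']

# Poll until the run reaches a terminal state; notebook_output
# is not available while the run is still in progress
while runs_api.get_run(run_id)['state']['life_cycle_state'] in ('PENDING', 'RUNNING', 'TERMINATING'):
    time.sleep(5)

output = runs_api.get_run_output(run_id)
print(output['notebook_output']['result'])   # 'olleh'

One caveat: for runs in the multi-task format, runs/get-output must be called with the run_id of the individual task run (listed under tasks in the runs/get response); for a single-task submit like the one in the question (format SINGLE_TASK in the output above), the top-level run_id works.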