How to use Broadcast in Foundry code repository


I would like to use Broadcast in my code repository so that I can access a model inside a pandas UDF, e.g.:

path_to_model = "path/to/model.h5"
model = load_model(path_to_model)
model_bcast = sc.broadcast(model)

Can someone tell me how to access the Spark context in a Foundry code repository, or is there another way to pass the model to the pandas UDF?

Thanks and BR

Adrian


There is 1 answer

ZettaP

If you want to do something like:

# Load the model using Broadcast
model_bc = spark.sparkContext.broadcast(joblib.load(model_path))

then you can use "ctx" to reach the Spark context, as in the example below. Add "ctx" as an argument to your transform and it will be populated automatically with the transform context, whose spark_session attribute gives you the SparkSession and, through it, the sparkContext.

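from transforms.api import transform, Output

@transform(
    out=Output("/path/dataset")
)
def out_1(ctx, out):
    # [...]
    # ctx.spark_session is the SparkSession; broadcast via its sparkContext
    ctx.spark_session.sparkContext.broadcast(...)
    # [...]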

This code is untested, as I'm not familiar with using broadcast variables in pandas UDFs, but it should be enough to unblock you.
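To connect this to the pandas UDF part of the question, here is a rough, untested sketch of how the broadcast handle could be read inside the UDF. ThresholdModel and make_scoring_udf are hypothetical names made up for illustration. The sketch assumes Spark 3.x (type-hinted pandas_udf) and that the real model object can be pickled, which is not guaranteed for every Keras version.

```python
class ThresholdModel:
    """Hypothetical stand-in for the real model; any picklable object with predict() would do."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, values):
        # Return 1.0 for values at or above the threshold, otherwise 0.0.
        return [1.0 if v >= self.threshold else 0.0 for v in values]


def make_scoring_udf(spark_session, model):
    """Broadcast `model` once and return a pandas UDF that reads it on the executors."""
    # Imported lazily so this module can be imported (e.g. in unit tests) without Spark.
    import pandas as pd
    from pyspark.sql.functions import pandas_udf

    # Ship one read-only copy of the model to each executor.
    model_bc = spark_session.sparkContext.broadcast(model)

    @pandas_udf("double")
    def score(batch: pd.Series) -> pd.Series:
        # .value returns the executor-local copy of the broadcast object.
        predictions = model_bc.value.predict(batch.tolist())
        return pd.Series(predictions)

    return score
```

Inside a transform you would then presumably call make_scoring_udf(ctx.spark_session, model) and apply the returned UDF with something like df.withColumn("score", score_udf(df["feature"])), where "feature" and "score" are placeholder column names.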