Azure Automated ML - how do I append prediction to source data?


I'm currently using the Automated ML capability to build a classification model on some dummy data, to determine whether fictional visitors who abandoned the website would be likely to make a purchase if they were retargeted.

I've managed to create the model and pipeline and deploy them to a batch endpoint, which runs. I was expecting the output of this process to be the input data with the prediction appended as the last column.

Instead, the output currently being returned looks like this:

['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '1']
['sample_data_141223.csv', '1']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '0']
['sample_data_141223.csv', '1']
['sample_data_141223.csv', '0']

Is there a way in which I can adjust my output so the score is appended? And for bonus marks, is there a way to also append any sort of confidence score for each prediction?

Thanks in advance,

Will S-J

Answer by Rishabh Meshram

To adjust your output so the score is appended to the original data, you can modify your scoring script. In the scoring script, after making predictions with your model, you can append these predictions to your original data.

Once your training job is completed, you can extract the generated scoring script for the best model.


As for appending a confidence score for each prediction, this depends on whether your model can output prediction probabilities or some other confidence measure. If your model exposes a method like predict_proba, you can use it to get a confidence score for each row.

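import glob
import os

import joblib
import pandas as pd

model = None

def init():
    # Load the model once per worker; batch deployments expose the model folder in AZUREML_MODEL_DIR
    global model
    # Assumes the registered AutoML model folder contains model.pkl
    model_path = glob.glob(os.path.join(os.environ["AZUREML_MODEL_DIR"], "**", "model.pkl"), recursive=True)[0]
    model = joblib.load(model_path)

def run(mini_batch):
    # mini_batch is a list of input file paths, not the rows themselves
    results = []
    for file_path in mini_batch:
        data = pd.read_csv(file_path)

        # Make predictions and get per-class probabilities (before adding any new columns)
        predictions = model.predict(data)
        probabilities = model.predict_proba(data)

        # Append the prediction and the probability of the predicted class to the source rows
        data["prediction"] = predictions
        data["confidence_score"] = probabilities.max(axis=1)
        results.append(data)

    # Each returned row is written to the deployment's output file
    return pd.concat(results)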
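With scikit-learn-style classifiers, predict_proba returns a 2D array of shape (n_rows, n_classes), one column per class in the order of model.classes_, so it cannot be assigned to a single DataFrame column as-is. A minimal sketch of reducing it to one label and one confidence value per row; the probability values and the helper name confidence_columns are made up for illustration and stand in for the output of model.predict_proba(data):

```python
import numpy as np
import pandas as pd


def confidence_columns(probabilities, classes):
    """Turn a predict_proba-style array into a per-row label and confidence (hypothetical helper)."""
    probabilities = np.asarray(probabilities)
    best = probabilities.argmax(axis=1)  # column index of the most likely class in each row
    return pd.DataFrame({
        "prediction": np.asarray(classes)[best],        # map column index back to the class label
        "confidence_score": probabilities.max(axis=1),  # probability of that predicted class
    })


# Made-up output for three rows of a binary (0/1) classifier
proba = [[0.9, 0.1], [0.3, 0.7], [0.55, 0.45]]
result = confidence_columns(proba, classes=[0, 1])
print(result)
```

For a retargeting use case, the probability of the positive class (probabilities[:, 1] when classes_ is [0, 1]) may be more useful than the maximum, since it ranks every visitor by estimated purchase likelihood regardless of the predicted label.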

Then create a batch endpoint deployment that uses your custom scoring script. Its output will contain your original columns plus the prediction (and confidence score), and can be saved in CSV format.
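Before deploying, it can help to exercise run() locally the way the batch runtime calls it, that is, with a list of file paths. The sketch below is only a local harness under stated assumptions: StubModel, its pages-viewed rule, the pages_viewed column and the file names are all hypothetical stand-ins for the real AutoML model and data.

```python
import os
import tempfile

import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in exposing predict/predict_proba like a scikit-learn classifier."""

    def predict_proba(self, X):
        # Made-up rule: more pages viewed -> higher purchase probability
        p = np.clip(X["pages_viewed"].to_numpy() / 10.0, 0.0, 1.0)
        return np.column_stack([1.0 - p, p])  # columns ordered as classes [0, 1]

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)


model = StubModel()


def run(mini_batch):
    # Same contract as the scoring script: mini_batch is a list of CSV file paths
    results = []
    for file_path in mini_batch:
        features = pd.read_csv(file_path)
        predictions = model.predict(features)
        probabilities = model.predict_proba(features)
        scored = features.copy()
        scored["prediction"] = predictions
        scored["confidence_score"] = probabilities.max(axis=1)
        results.append(scored)
    return pd.concat(results, ignore_index=True)


with tempfile.TemporaryDirectory() as tmp:
    input_path = os.path.join(tmp, "sample.csv")
    pd.DataFrame({"pages_viewed": [2, 8, 9]}).to_csv(input_path, index=False)
    output = run([input_path])
    # Write the scored rows out as CSV, source columns first
    output.to_csv(os.path.join(tmp, "predictions.csv"), index=False)
    print(output)
```

If the local output already has the source columns followed by prediction and confidence_score, any remaining difference after deployment is more likely in the deployment's output settings than in the script itself.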

For more details about scoring scripts for batch endpoints, check this documentation.