I have a fairly standard container-based CI pipeline on Cloud Build for my machine learning training model:
- check Python errors with flake8
- check syntax and style issues with pylint, pydocstyle, etc.
- build a base container (CPU/GPU)
- build a specialized ML container for my model
- scan the installed packages for vulnerabilities
- run unit tests
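For illustration, the steps above could be sketched as a `cloudbuild.yaml` like the following — the image names, file names, and step order are hypothetical placeholders, not my actual configuration:

```yaml
steps:
# Lint: Python errors, syntax, and style
- name: 'python:3.10'
  entrypoint: 'bash'
  args: ['-c', 'pip install flake8 pylint pydocstyle && flake8 . && pylint src && pydocstyle src']
# Build the base container, then the specialized ML container
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/ml-base:cpu', '-f', 'Dockerfile.base', '.']
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/ml-model:latest', '-f', 'Dockerfile.model', '.']
# Unit tests
- name: 'python:3.10'
  entrypoint: 'bash'
  args: ['-c', 'pip install -r requirements.txt && pytest tests/']
```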
Now, in machine learning it is impossible to validate a model without testing it on real data, so we normally add two extra checks:
- Fix all random seeds and run on test data to verify that we get exactly the same results
- Train the model on a single batch and verify that we can overfit it, driving the loss to zero
This lets us catch issues inside the model's code. In my setup, my Cloud Build runs in a "build" GCP project while the data lives in another GCP project.
Q1: Has anybody managed to use the AI Platform training service from Cloud Build to train on data sitting in another GCP project?
Q2: How can I tell Cloud Build to wait until the AI Platform training job has finished, and then check its status (succeeded/failed)? Looking at the documentation, the only option seems to be `--stream-logs`, but it seems suboptimal (using that option, I saw huge delays).
When you submit an AI Platform training job, you can specify a service account email to use.
Make sure that service account has sufficient permissions in the other project to read the data from there.
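As a sketch, a Cloud Build step could submit the job with the `--service-account` flag like this — the region, image, service account, and bucket names are placeholders to adapt to your setup (note the `$$` escaping so Cloud Build passes the command substitution through to bash):

```yaml
- name: 'gcr.io/google.com/cloudsdktool/cloud-sdk'
  entrypoint: 'bash'
  args:
  - '-c'
  - |
    gcloud ai-platform jobs submit training "train_$$(date +%Y%m%d_%H%M%S)" \
      --region=us-central1 \
      --master-image-uri=gcr.io/$PROJECT_ID/ml-model:latest \
      --service-account=trainer@$PROJECT_ID.iam.gserviceaccount.com \
      -- \
      --data-path=gs://my-data-project-bucket/dataset/
```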
For your second question, you have two options:
- `--stream-logs`, as you mentioned. If you don't want the logs in your Cloud Build output, you can redirect stdout and/or stderr to `/dev/null`.
- Or you can create a loop that polls the job status until it reaches a terminal state.
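A minimal sketch of such a polling loop (the job name, the 30-second interval, and the set of terminal states are assumptions; the `gcloud` call is wrapped in a function so you can swap in another status source):

```shell
#!/bin/sh
# Look up the current state of an AI Platform training job.
# Wrapped in a function so it can be stubbed out when testing the loop.
get_state() {
  gcloud ai-platform jobs describe "$1" --format='value(state)'
}

# Poll until the job reaches a terminal state.
# Returns 0 if the job succeeded, 1 if it failed or was cancelled.
wait_for_job() {
  while :; do
    state=$(get_state "$1")
    case "$state" in
      SUCCEEDED)
        echo "Job $1 succeeded"
        return 0 ;;
      FAILED|CANCELLED)
        echo "Job $1 ended in state: $state" >&2
        return 1 ;;
      *)
        sleep 30 ;;  # still QUEUED/PREPARING/RUNNING: keep waiting
    esac
  done
}
```

You would call `wait_for_job "$JOB_NAME"` as the last command of the Cloud Build step that submitted the training job, so the step's exit code reflects the training outcome.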
Here my status check is simple, but you can customize it to match your requirements.
Don't forget to set the Cloud Build timeout accordingly.
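For example, in your `cloudbuild.yaml` (the value here is illustrative — size it to your longest expected training run; the default build timeout is only 10 minutes):

```yaml
timeout: 86400s  # 24 hours
```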