Receiving HTTP 401 when accessing Cloud Composer's Airflow REST API


I am trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 1 via a Python script and encountered an HTTP 401 error while following the guides Triggering DAGs with Cloud Functions and Access the Airflow REST API.

The service account has the following list of permissions:

  • roles/iam.serviceAccountUser (Service Account User)
  • roles/composer.user (Composer User)
  • roles/iap.httpsResourceAccessor (IAP-Secured Web App User, added when the application returned a 403, which was unusual as the guides did not specify the need for such a permission)

I am not sure what is wrong with my configuration; I have tried giving the service account the Editor role and roles/iap.tunnelResourceAccessor (IAP-Secured Tunnel User) & roles/composer.admin (Composer Administrator), but to no avail.

EDIT: I found the source of my problems: The Airflow Database did not have the credentials of the service account in the users table. However, this is unusual as I currently have a service account (the first I created) whose details were added automatically to the table. Subsequent service accounts were not added to the users table when they tried to initially access the REST API, thus returning the 401. I am not sure of a way to create users without passwords since the Airflow web server is protected by IAP.

5

There are 5 answers

2
Seng Cheong (accepted answer)

Thanks to the answers posted by @Adrien Bennadji and @ewertonvsilva, I was able to diagnose the HTTP 401 issue.

The email field in some of Airflow's user-related database tables has a limit of 64 characters (type: character varying(64)), as noted in: Understanding the Airflow Metadata Database

Coincidentally, my first service account had an email whose length was just under 64 characters.

When I tried running the command gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account, as suggested by @ewertonvsilva, to add my other service accounts, it failed with the following error: (psycopg2.errors.StringDataRightTruncation) value too long for type character varying(64).

As a result, I created new service accounts with shorter emails, and these were authenticated automatically. I was also able to add these new service accounts to Airflow manually via the gcloud command and authenticate them. I also discovered that the failure to add the user upon first access to the REST API was actually logged in Cloud Logging. However, at that time I was not aware of how Cloud Composer handled new users accessing the REST API, and the HTTP 401 error was a red herring.

Thus, the solution is to ensure that the full length of your service account's email is less than 64 characters.
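As a quick sanity check, you can measure a candidate email against that column limit before creating the service account (both emails below are made-up examples):

```python
# Airflow's users.email column is character varying(64), so the full
# service account email must be under 64 characters to be inserted.
# Both emails below are hypothetical, for illustration only.
emails = [
    "short-sa@my-project.iam.gserviceaccount.com",
    "a-very-long-service-account-name@my-extremely-long-project-identifier.iam.gserviceaccount.com",
]

for email in emails:
    verdict = "OK" if len(email) < 64 else "TOO LONG for Airflow's users table"
    print(f"{len(email):3d}  {verdict}  {email}")
```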

0
Adrien Bennadji

ewertonvsilva's solution worked for me (manually adding the service account to Airflow using gcloud composer environments run <instance-name> --location=<location> users -- create ...).

At first it didn't work, but changing the username to accounts.google.com:<service_accounts_uid> made it work.

Sorry for not commenting, not enough reputation.

0
ewertonvsilva

Based on @Adrien Bennadji's feedback, I'm posting the final answer.

  • Create the service accounts with the proper permissions for Cloud Composer;

  • Via gcloud console, add the users in airflow database manually: gcloud composer environments run <instance-name> --location=<location> users -- create --use-random-password --username "accounts.google.com:<service_accounts_uid>" --role Op --email <service-account-username>@<...>.iam.gserviceaccount.com -f Service -l Account

  • And then, list the users with: gcloud composer environments run <env_name> --location=<env_loc> users -- list

Use accounts.google.com:<service_accounts_uid> for the username.
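As a small sketch, the username Composer expects can be assembled from the service account's numeric unique ID like this (the ID below is made up for illustration):

```python
def composer_username(unique_id: str) -> str:
    """Build the Airflow username Cloud Composer expects for a service account."""
    return f"accounts.google.com:{unique_id}"

# Hypothetical numeric unique ID, for illustration only:
print(composer_username("112233445566778899"))  # accounts.google.com:112233445566778899
```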

0
Anton Kumpan

Copying my answer from https://stackoverflow.com/a/70217282/9583820

It looks like, instead of creating Airflow accounts with gcloud composer environments run, you can just use GCP service accounts with an email shorter than 64 characters.

It will work automatically under the following conditions:

TL;DR version:

In order to make Airflow Stable API work at GCP Composer:

  1. Set "api-auth_backend" to "airflow.composer.api.backend.composer_auth"
  2. Make sure your service account email length is <64 symbols
  3. Make sure your service account has required permissions (Composer User role should be sufficient)

Long read:

We have been using Airflow for a while now; we started with version 1.x.x and its "experimental" (now deprecated) APIs.

To authorize, we use a "Bearer" token obtained with a service account:

# Required imports (from the google-auth and requests libraries):
import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

# Obtain an OpenID Connect (OIDC) token from the metadata server or
# using the service account key file; client_id is the IAP client ID.
google_open_id_connect_token = id_token.fetch_id_token(Request(), client_id)

# Fetch the Identity-Aware Proxy-protected URL, including an
# Authorization header containing "Bearer " followed by a
# Google-issued OpenID Connect token for the service account.
resp = requests.request(
    method, url,
    headers={'Authorization': 'Bearer {}'.format(
        google_open_id_connect_token)}, **kwargs)

Now we are migrating to Airflow 2.x.x and faced the exact same issue: 403 FORBIDDEN.

Our environment details are:

composer-1.17.3-airflow-2.1.2 (Google Cloud Platform)

"api-auth_backend" is set to "airflow.api.auth.backend.default".

Documentation claims that:

After you set the api-auth_backend configuration option to airflow.api.auth.backend.default, the Airflow web server accepts all API requests without authentication.

However, this does not seem to be true.

Through experimentation, we found that if "api-auth_backend" is set to "airflow.composer.api.backend.composer_auth", the Stable REST API (Airflow 2.x.x) starts to work.

But there is another caveat: for us, some of our service accounts worked, and some did not. The ones that did not work were throwing a "401 Unauthorized" error. We figured out that accounts whose email was longer than 64 characters were throwing the error. The same was observed in this answer.

So, after setting "api-auth_backend" to "airflow.composer.api.backend.composer_auth" and making sure that our service account emails were shorter than 64 characters, our old Airflow 1.x.x authentication code started to work. We then only needed to update the API URLs and response handling, and the stable Airflow (2.x.x) API worked for us the same way it had for Airflow 1.x.x.
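For reference, the stable API triggers a DAG via POST /api/v1/dags/<dag_id>/dagRuns. A minimal sketch of assembling such a request (the webserver URL and DAG id are placeholders; an actual call would attach the Bearer header from the snippet above):

```python
import json

def build_dag_run_request(webserver_url, dag_id, conf=None):
    """Return the URL and JSON body for a stable REST API DAG-run trigger."""
    url = f"{webserver_url}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf or {}})
    return url, body

# Placeholder webserver URL and DAG id, for illustration only:
url, body = build_dag_run_request("https://<airflow-webserver>", "my_dag")
print(url)
print(body)
```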

UPD: this is a defect in Airflow and will be fixed here: https://github.com/apache/airflow/pull/19932

0
Matthias Oscity

I was trying to invoke Airflow 2.0's Stable REST API from Cloud Composer Version 2 via a Python script and encountered an HTTP 401 error while following Triggering DAGs with Cloud Functions and accessing the Airflow REST API.

I used this image version: composer-2.1.2-airflow-2.3.4

I also followed these two guides:

But I was always stuck with a 401 error when I tried to run the DAG via the Cloud Function. However, when the DAG was triggered from the Airflow UI, it executed successfully (Trigger DAG in the Airflow UI).


For me the following solution worked:

In the airflow.cfg, set the following settings:

  • [api] auth_backends = airflow.composer.api.backend.composer_auth,airflow.api.auth.backend.session

  • [api] composer_auth_user_registration_role = Op (default)

  • [api] enable_experimental_api = False (default)

  • [webserver] rbac_user_registration_role = Op (default)
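Written as an airflow.cfg fragment (section names inferred from the api/webserver prefixes in the list above), those settings are:

```ini
[api]
auth_backends = airflow.composer.api.backend.composer_auth,airflow.api.auth.backend.session
composer_auth_user_registration_role = Op
enable_experimental_api = False

[webserver]
rbac_user_registration_role = Op
```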


Service Account:

  • The service account email's total length is less than 64 characters.

  • The account has these roles:

    • Cloud Composer v2 API Service Agent Extension, Composer User

Airflow UI

  • Add the service account to the Airflow users via the Airflow UI (Security -> List Users), with username accounts.google.com:<service account uid>, and assign it the Op role.

  • You can get the UID via the gcloud command (see above), or just navigate to the IAM & Admin page on Google Cloud -> Service Accounts -> click on the service account and read the Unique ID from the Details page.

  • And now, IMPORTANT: set the account active! (In the Airflow UI, check the "is Active?" box.)

This last step of setting the account active was not described anywhere, and for a long time I just assumed it would be set active when there was an open session (when it makes the calls), but that is not the case. The account has to be set active manually. After that, everything worked fine :)

Other remarks: as I had just joined a new company, I also had to check some other things (maybe this is not related to your problem, but it's good to know anyway; maybe others can use this). I use Cloud Build to deploy the Cloud Functions and the DAGs in Airflow, so I also had to check the following:

  • Cloud Source Repository (https://source.cloud.google.com/) is in sync with the GitHub repository. If not: disconnect the repository and reconnect it again.
  • The GCS bucket which is created when the Composer 2 environment is set up for the very first time has a subfolder "/dags/". I had to manually add the subfolder "/dags/dataflow/" so the deployed Dataflow pipeline code could be uploaded to that subfolder.