I'm trying my first GCP Dataproc Serverless PySpark batch job, which needs to connect to a public REST endpoint and write to a GCS bucket in the same GCP project. After I submitted it, the job sat in the PENDING state for a while and then failed with this error:
StructuredError{spark, Timed out waiting for at least 1 worker(s) registered. This is often caused by firewall rules that prevent Spark workers from communicating with the master. Please review your network firewall rules and be sure they allow communication on all ports between all nodes. See https://cloud.google.com/dataproc-serverless/docs/concepts/network for instructions
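For reference, I'm submitting the batch with something along these lines (the bucket, script name, and region are placeholders for my actual values):

gcloud dataproc batches submit pyspark gs://my-bucket/my_job.py \
    --region=us-central1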
In the network guide linked in the error, I could not understand the purpose of the "Open subnet connectivity" section.
Where are the master and the workers actually running? Aren't they all in a separate VPC managed by GCP? Why should I need to worry about connectivity among them? And what should I put for network-name and SUBNET_RANGES in the firewall rule that the guide gives:
gcloud compute firewall-rules create allow-internal-ingress \
    --network=network-name \
    --source-ranges=SUBNET_RANGES \
    --direction=ingress \
    --action=allow \
    --rules=all
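If it matters, my current best guess at filling those in, assuming I just use the default VPC network, would be something like this (10.128.0.0/9 is the range that the auto-created subnets of a default auto-mode network are drawn from, but I'm not sure that's what the guide means by SUBNET_RANGES):

gcloud compute firewall-rules create allow-internal-ingress \
    --network=default \
    --source-ranges=10.128.0.0/9 \
    --direction=ingress \
    --action=allow \
    --rules=all

Is that roughly the right idea, or am I misunderstanding where these nodes live?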