I have an Apache Beam pipeline, written in Python, that reads events from a Kafka topic, does some curation, and writes the curated events to a BigQuery table.
The pipeline runs properly when executed locally with the DirectRunner, but not when running on Dataflow.
On Dataflow, I can see that messages are being read from Kafka and curated properly, but when it comes to the BigQuery step I don't see any success/failure logs, nor any new rows in the BigQuery table.
Here is a simplified version of the pipeline:
import apache_beam as beam
from apache_beam.options.pipeline_options import SetupOptions
from beam_nuggets.io import kafkaio  # KafkaConsume comes from beam_nuggets in my setup


def curation_func(...):
    pass

# ...

beam_options = SetupOptions(...)

with beam.Pipeline(options=beam_options) as pipeline:
    (
        pipeline
        | "read kafka events" >> kafkaio.KafkaConsume(...)
        | "curation" >> beam.Map(curation_func)
        | "write to bigquery" >> beam.io.WriteToBigQuery(
            table="my_table",
            project="my_project",
            dataset="my_dataset",
            method="STREAMING_INSERTS",
            create_disposition="CREATE_NEVER",
            write_disposition="WRITE_APPEND",
        )
    )
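To get at least some visibility into the write step, one thing I could try is tapping the failed-rows output that WriteToBigQuery returns, something like the sketch below. This is only a guess: the failed_rows attribute is what the current WriteToBigQuery result seems to expose (older releases used a "FailedRows" dict entry instead), and the Kafka/curation parts are abbreviated exactly as in the pipeline above.

import logging

import apache_beam as beam

with beam.Pipeline(options=beam_options) as pipeline:
    curated = (
        pipeline
        | "read kafka events" >> kafkaio.KafkaConsume(...)  # same as above
        | "curation" >> beam.Map(curation_func)
    )

    write_result = curated | "write to bigquery" >> beam.io.WriteToBigQuery(
        table="my_table",
        project="my_project",
        dataset="my_dataset",
        method="STREAMING_INSERTS",
        create_disposition="CREATE_NEVER",
        write_disposition="WRITE_APPEND",
    )

    # Rows rejected by the streaming-insert API should come back on this
    # output (older Beam releases expose it as write_result["FailedRows"]);
    # logging them would at least make failures visible.
    (
        write_result.failed_rows
        | "log failed rows"
        >> beam.Map(lambda row: logging.error("BigQuery insert failed: %s", row))
    )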
Is there something wrong with my pipeline?
How can I see the logs of the BigQuery step?
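Alternatively, would adding an explicit logging step right before the write be the way to see, in the Dataflow worker logs, what actually reaches the BigQuery step? Something along these lines (a sketch only):

import logging


def log_and_forward(element):
    # Emit one worker-log line per element just before the BigQuery write,
    # so the Dataflow worker logs show whether anything reaches that step.
    logging.info("about to write to BigQuery: %s", element)
    return element


# In the pipeline above, between "curation" and "write to bigquery":
#   | "log before write" >> beam.Map(log_and_forward)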