Apache Beam on Dataflow: WriteToBigQuery doesn't work

I have an Apache Beam pipeline, written in Python, that reads events from a Kafka topic, does some curation, and writes the curated events to a BigQuery table.

The pipeline runs properly when run locally with the DirectRunner, but not when run on Dataflow.
On Dataflow, I can see that messages are read from Kafka and curated correctly, but when it comes to the BigQuery step, I see neither success/failure logs nor new rows in the BigQuery table.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from beam_nuggets.io import kafkaio  # KafkaConsume comes from beam_nuggets


def curation_func(...):
    pass


# ...
beam_options = PipelineOptions(...)
with beam.Pipeline(options=beam_options) as pipeline:
    (
        pipeline
        | "read kafka events" >> kafkaio.KafkaConsume(...)
        | "curation" >> beam.Map(curation_func)
        | "write to bigquery" >> beam.io.WriteToBigQuery(
            table="my_table",
            project="my_project",
            dataset="my_dataset",
            method="STREAMING_INSERTS",
            create_disposition="CREATE_NEVER",
            write_disposition="WRITE_APPEND",
        )
    )
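
One thing I am not sure about is how the pipeline is configured for the Dataflow run. Since Kafka is an unbounded source, here is a sketch of how I build the options, with placeholder project/region/bucket names, including the streaming flag in case that matters:

from apache_beam.options.pipeline_options import PipelineOptions

# Sketch of the options for the Dataflow run; the project, region, and
# bucket names below are placeholders.
beam_options = PipelineOptions(
    runner="DataflowRunner",
    project="my_project",
    region="us-central1",
    temp_location="gs://my_bucket/tmp",
    streaming=True,  # Kafka is unbounded, so the job must run in streaming mode
)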

Is there something wrong with my pipeline?
How can I see the logs of the BigQuery step?
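
For example, can I tap the failed-rows output of WriteToBigQuery and log it myself? Something like the following sketch, assuming the failed_rows attribute that recent Beam versions expose on the write result (older versions key into the result with BigQueryWriteFn.FAILED_ROWS), and where curated_events stands for the PCollection coming out of the curation step:

import logging

import apache_beam as beam

write_result = (
    curated_events  # placeholder for the PCollection produced by the curation step
    | "write to bigquery" >> beam.io.WriteToBigQuery(
        table="my_table",
        project="my_project",
        dataset="my_dataset",
        method="STREAMING_INSERTS",
        create_disposition="CREATE_NEVER",
        write_disposition="WRITE_APPEND",
    )
)

# Rows that fail streaming inserts come back on the write result;
# logging them makes the failures visible in the Dataflow worker logs.
(
    write_result.failed_rows
    | "log failed rows" >> beam.Map(
        lambda row: logging.error("BigQuery insert failed: %s", row)
    )
)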
