I have an Apache Beam pipeline, written in Python, that reads events from a Kafka topic, does some curation, and writes the curated events to a BigQuery table.
The pipeline runs properly when executed locally with the DirectRunner, but not when running on Dataflow.
On Dataflow, I can see that messages are being read from Kafka and curated properly, but when it comes to the BigQuery step I don't see any success/failure logs, nor any new rows in the BigQuery table.
Here is a simplified version of the pipeline:
import apache_beam as beam
from apache_beam.options.pipeline_options import SetupOptions
from beam_nuggets.io import kafkaio  # KafkaConsume comes from beam_nuggets in my setup


def curation_func(...):
    pass

# ...

beam_options = SetupOptions(...)

with beam.Pipeline(options=beam_options) as pipeline:
    (
        pipeline
        | "read kafka events" >> kafkaio.KafkaConsume(...)
        | "curation" >> beam.Map(curation_func)
        | "write to bigquery" >> beam.io.WriteToBigQuery(
            table="my_table",
            project="my_project",
            dataset="my_dataset",
            method="STREAMING_INSERTS",
            create_disposition="CREATE_NEVER",
            write_disposition="WRITE_APPEND",
        )
    )
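To get at least some visibility into the write step, one thing I could try is tapping the failed-rows output that WriteToBigQuery returns, something like the sketch below. This is only a guess: the failed_rows attribute is what the current WriteToBigQuery result seems to expose (older releases used a "FailedRows" dict entry instead), and the Kafka/curation parts are abbreviated exactly as in the pipeline above.

import logging

import apache_beam as beam

with beam.Pipeline(options=beam_options) as pipeline:
    curated = (
        pipeline
        | "read kafka events" >> kafkaio.KafkaConsume(...)  # same as above
        | "curation" >> beam.Map(curation_func)
    )

    write_result = curated | "write to bigquery" >> beam.io.WriteToBigQuery(
        table="my_table",
        project="my_project",
        dataset="my_dataset",
        method="STREAMING_INSERTS",
        create_disposition="CREATE_NEVER",
        write_disposition="WRITE_APPEND",
    )

    # Rows rejected by the streaming-insert API should come back on this
    # output (older Beam releases expose it as write_result["FailedRows"]);
    # logging them would at least make failures visible.
    (
        write_result.failed_rows
        | "log failed rows"
        >> beam.Map(lambda row: logging.error("BigQuery insert failed: %s", row))
    )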
Is there something wrong with my pipeline?
How can I see the logs of the BigQuery step?
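Alternatively, would adding an explicit logging step right before the write be the way to see, in the Dataflow worker logs, what actually reaches the BigQuery step? Something along these lines (a sketch only):

import logging


def log_and_forward(element):
    # Emit one worker-log line per element just before the BigQuery write,
    # so the Dataflow worker logs show whether anything reaches that step.
    logging.info("about to write to BigQuery: %s", element)
    return element


# In the pipeline above, between "curation" and "write to bigquery":
#   | "log before write" >> beam.Map(log_and_forward)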