dbt in BigQuery: partitioning spec is none


I have a dbt model that looks like this:

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by = {'field': 'conversion_at_utc', 'data_type': 'timestamp'}
    )
}}

with base as (
    select
        conversion_id,
        user_id,
        campaign,
        timestamp_seconds(time) as conversion_at_utc,
    from {{ source('xyz', 'users_campaigns_conversion') }}
    where {{ select_date_increments('date(timestamp_seconds(time))')}}
)

select * from base

time is a Unix timestamp (seconds since 1970). When I run this model for the first time, it succeeds and creates a table partitioned by day on the field conversion_at_utc. However, when I run it a second time, I get the error "Cannot replace a table with a different partitioning spec. Instead, DROP the table, and then recreate it. New partitioning spec is interval(type:day,field:conversion_at_utc) and existing spec is none"

Any idea why the existing spec could be none?

Thanks in advance


There are 2 answers

Poala Astrid (Best Answer)

The error you are facing is caused by using the incremental_strategy='insert_overwrite' configuration in dbt. Unfortunately, the 'insert_overwrite' incremental strategy does not allow altering the partitioning specification of an existing table.

When you run the dbt model for the first time, a new table is created with the appropriate partitioning specification. However, in subsequent runs, dbt attempts to update the existing table's data by overwriting the partitioned data with fresh information.

You may consider using a different incremental strategy that supports altering partitioning. You could use the merge incremental strategy instead of insert_overwrite.

incremental_strategy = 'merge'

The incremental load can still be accomplished with the 'merge' strategy. This approach relies on maintaining a unique key to identify records and on not allowing full refreshes. With 'merge', dbt updates the target table incrementally, inserting new records and updating existing ones without altering the existing partitioning or other aspects of the table.
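A minimal sketch of what that config could look like, assuming conversion_id uniquely identifies a row in this model (the key is an assumption; use whatever is actually unique in your data):

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='conversion_id',
        partition_by={'field': 'conversion_at_utc', 'data_type': 'timestamp'}
    )
}}

With this in place, subsequent runs merge on conversion_id rather than replacing whole partitions, so the table's partitioning spec is left untouched.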

If changing the incremental strategy is not an option, you can manually drop the existing table before running the dbt model again. This way, dbt will create a new table with the correct partitioning specification.

DROP TABLE IF EXISTS schema_name.table_name;

After you drop the table, rerun your dbt model. It will create the table again with the correct partitioning specification.

Note: Changing the partitioning specification for an existing table may cause data loss or unexpected behavior, so make sure to back up your data before making any changes. Additionally, consider the impact on downstream dependencies when altering the partitioning strategy or dropping and recreating the table.

pod

This may not be the exact same issue you had, but I was getting the same error and discovered that it was due to a temp table dbt had created that hadn't expired or been dropped yet. I was messing around with the partition column, and at some point a job failed, so the temp table was silently hanging around behind the scenes for the 12 hours until it was set to expire. This temp table gets created when you don't configure specific partitions to replace as part of your insert_overwrite strategy; if the job succeeds, dbt drops it toward the end of the script.

Manually dropping that temp table in GBQ fixed the issue for me. The name of the temp table can be found in the script dbt generates upon building your model. It's usually named something like yourmodel__dbt_tmp.
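For comparison, a rough sketch of the static-partitions variant of insert_overwrite, which avoids that temp table because dbt knows up front which partitions to replace (the two-day lookback below is only an illustration; adapt the expressions to the model in the question):

{% set partitions_to_replace = [
    'timestamp(current_date)',
    'timestamp(date_sub(current_date, interval 1 day))'
] %}

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={'field': 'conversion_at_utc', 'data_type': 'timestamp'},
        partitions=partitions_to_replace
    )
}}

With partitions listed explicitly, a failed run doesn't leave a __dbt_tmp table behind to trip over on the next run.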