Creating a Materialized View using actual date on the filter

50 views Asked by At

I'm trying to create a Materialized View to optimize comsuption of data. I want the view to show just the registers of the day, but when I try to use functions like CURRENT_DATE it shows the error message "Materialized view query cannot use functions that depend on current time (example: CURRENT_TIMESTAMP, CURRENT_DATE)."

I tried to use some custom functions and to use a select on the filter to get the last ocurrance (as this table has a big flux of data) both without sucess, as it returns the message "The incremental materialized view query contains an unsupported feature. This may be caused by having an expression on top of the grouping keys or aggregated values."

-- CURRENT_DATE

CREATE MATERIALIZED VIEW `myProjectId.mydataSet.statistics_mv` AS
SELECT
  _id,
  column1,
  DATA_STATISTICS
...
  
FROM
  `myProjectId.mydataSet.mytable`
WHERE CAST(DATA_STATISTICS AS DATE) > CURRENT_DATE('America/Sao_Paulo')

-- SELECT ON FILTER

CREATE MATERIALIZED VIEW `myProjectId.mydataSet.statistics_mv` AS
SELECT
  _id,
  column1,
  DATA_STATISTICS
...
  
FROM
  `myProjectId.mydataSet.mytable`
WHERE DATA_STATISTICS = (
  SELECT DATA_STATISTICS 
  FROM `statistics-homolog.MultAutomovel.raw`
  ORDER BY DATA_STATISTICS DESC
  LIMIT 1
)

Any sugestion how to do it?

2

There are 2 answers

0
Pratik Patil On

In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased performance and efficiency. [Ref: https://cloud.google.com/bigquery/docs/materialized-views-intro]

If you have any dynamic keywords (like current date, current timestamp) in the view definition then GBQ can not handle the refresh rate, so those are not allowed.

If you do not need the "materialized" reason for the view, then you can get away with this:

e.g.

CREATE VIEW `tmp.sample_view_delete_me` AS
select * from 
(SELECT
  'a' as col, 
  current_date() as day 
 ) 
WHERE  day = CURRENT_DATE()

I do not know more context of this requirement of yours but to go ahead maybe you can look into: either just staying with a standard view or a stored procedure or some other programmatic way of sending the current date to a block of SQL.

0
SQLmojoe On

Materialized views generally do not allow non-deterministic or mutable functions because they defeat the purpose of an MV. If they were allowed, the MV will have to be recomputed (full refresh, not incremental) every time you call the MV else the data will be incorrect. In that case, why bother materializing? The persisted result is invalid as soon as it's generated.

From your examples, the result of both queries you want to create an MV on can be different every time you run it. Actually, the first will always be different. The second can be different but there's no way to determine if it is different unless you run it. So in both cases, you cannot trust the persisted results in the MV to be correct.

Some databases relax the restrictions a little for MVs that are not automatically refreshed or have explicit staleness defined in the MV (same net result as no auto-refresh). That means the MV provides a precomputed result for a specific point in time so it is correct for that point in time. Stale, but correct. BigQuery does not support this today but you can create a feature request for it.

Note that databases that have the concept of MAX_STALENESS cannot support such functions either. MAX_STALENESS is not deterministic. It is not the same as exact staleness. Max staleness has to guarantee correct results even if it's stale which it cannot do if there are mutable or non-deterministic functions in the MV.

Again, in your example with CURRENT_DATE. If MAX_STALENESS is set to 5 minutes, CURRENT_DATE can return 2 different values if you query anytime >2355 and <0000. As far as the MV is concerned, so long as data is <5 minutes stale, it is correct but that cannot be guaranteed if there is a CURRENT_DATE function in the MV. Same with the predicate based on results from another query.