How to convert cross join lateral statement to big query?

172 views Asked by At

My sql statement is written using cross join lateral in postgres sql. But i want to execute this sql statement in google bigquery and bigquery doesn't support cross join lateral. How can i rewrite this sql for bigquery engine?

select 
left_table.*,x.*
from 
    data left_table
cross join lateral 
(

select string_agg(right_table.session_id,',')
from 
data right_table 
where left_table.source_ip = right_table.source_ip 
and 
    ((right_table.session_start_time >= left_table.session_start_time and right_table.session_start_time <= left_table.session_end_time)
    or 
    (right_table.session_end_time >= left_table.session_start_time and right_table.session_end_time <= left_table.session_end_time))
group by right_table.source_ip
) x
2

There are 2 answers

0
Adrian Maxwell On

Without lateral joins the "equivalent" is a "correlated subquery" in the select clause, and these are often a performance issue as the correlated subquery is executed row by row though the resultset:

SELECT
      left_table.*
    , (SELECT string_agg(right_table.session_id, ',') 
       FROM data right_table
       WHERE left_table.source_ip = right_table.source_ip
       AND (
            (
                right_table.session_start_time >= left_table.session_start_time
                AND right_table.session_start_time <= left_table.session_end_time
                )
            OR (
                right_table.session_end_time >= left_table.session_start_time
                AND right_table.session_end_time <= left_table.session_end_time
                )
            )
       GROUP BY right_table.source_ip)  x
FROM data left_table

nb: The lateral join is executed in a more efficient manner (as part of the from clause).

0
Maimoona Abid On

You can try this code. In this code CROSS JOIN LATERAL is not used, instead correlated sub-queries are used to achieve the same result.

WITH AggregatedSessions AS (
      SELECT
        left_table.*,
        (
          SELECT STRING_AGG(right_table.session_id, ',')
          FROM data AS right_table
          WHERE
            left_table.source_ip = right_table.source_ip
            AND (
              (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
              OR
              (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
            )
          GROUP BY right_table.source_ip
        ) AS session_ids
      FROM data AS left_table
    )
    SELECT * FROM AggregatedSessions;