How can one unnest an array in BigQuery without using unnest?

113 views Asked by At

I am trying to select hits.page.searchKeyword in a query, but I get the classical nested error. Do I have another option rather than unnest?

When I unnest data from hits (GA3 - UA) all my other selects duplicate their value, and it's a mess to fix. I just need to add the search term to my query, which I have identified in the scheme as hits.page.searchKeyword, and I am done. But I need to unnest the hits array without unnesting function.

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,trafficSource.campaign, 
  hits.page.searchKeyword as searchterm,
  SUM(totals.visits) AS sessions, 
  count(distinct clientId),
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue) AS revenue1mil
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 1, channelGrouping, trafficSource.campaign, hits.page.searchKeyword
ORDER BY 1 DESC

Error:

Cannot access field page on a value with type
ARRAY<STRUCT<hitNumber INT64, time INT64, hour INT64, ...>> at [4:8]
1

There are 1 answers

0
Martin Weitzmann On BEST ANSWER

UNNEST() simply turns an array into a relation. It doesn't automatically join that relation with the main table. That only happens if you do something like FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*` t cross join t.hits (sometimes the cross join is replaced with its alias ,)

hits.page.searchKeyword is the internal search and means there can be multiple of these per session - how would you like to aggregate them to session scope?

One way could be to get a distinct list sorted by these keywords:

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,
  trafficSource.campaign, 
  (select string_agg(distinct page.searchKeyword order by page.searchKeyword) from unnest(hits)) as searchterm,
  SUM(totals.visits) AS sessions, 
  count(distinct clientId),
  SUM(totals.transactions) AS transactions,
  SUM(totals.totalTransactionRevenue) AS revenue1mil
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 1, 2, 3, 4
ORDER BY 1 DESC

If you want one keyword per row you need the cross join - and of course it repeats session scoped rows for every hit in that row ... that's what joins do. You need to calculate your metrics on hit scope then and can't use session aggregates/totals anymore:

SELECT 
  parse_date('%Y%m%d', date) AS date,
  channelGrouping,
  trafficSource.campaign, 
  h.page.searchKeyword,
  count(distinct fullvisitorid || visitstarttime) AS sessions, 
  count(distinct clientId) as clients,
  countif(h.ecommerceaction.action_type='6') AS transactions,
  SUM(h.transaction.transactionrevenue /1000000) AS revenue
FROM `vdxl-prod-data-reporting-01.200759185.ga_sessions_*` as t
  CROSS JOIN t.hits as h
WHERE 
  _TABLE_SUFFIX BETWEEN '20230101' AND '20231102'
GROUP BY 
  1, 2, 3, 4
ORDER BY 
  1 DESC, 2, 3, 4