spark sql - Have disabled Broadcast Hash Join ,but "NOT IN" query still do the mechanism

43 views Asked by At

Here's my spark sql version: 3.1.3

SELECT 
 count(
  DISTINCT test_table.name
 ) AS cnt 
FROM 
 test_table 
WHERE 
 test_table.inv_date >= '20230101' 
 AND test_table.inv_date < '20230601' 
 AND (
  test_table.name NOT IN (
   SELECT 
    DISTINCT test_table.name AS name 
   FROM 
    test_table 
   WHERE 
    test_table.inv_date >= '20230101' 
    AND test_table.inv_date < '20230501'
  )
 )

Here's the spark application UI.

I would like to ask why the query contains "NOT IN" still do broadcast join even though I have disabled it?

I disabled Broadcast Hash Join (set spark.sql.autoBroadcastJoinThreshold to -1), but the following query still did broadcast.

0

There are 0 answers