I am getting "No more spool space in user" error for the below query.
SEL
C.*,AD.*,P.*
FROM mytestDB.TABLE_1 C
INNER JOIN mytestDB.TABLE_2 D
ON C.COL_A = D.COL_A
AND C.COL_B = D.COL_B
LEFT OUTER JOIN mytestDB.TABLE_3 AD
ON C.COL_A = AD.COL_A
AND C.COL_B = AD.COL_B
LEFT OUTER JOIN mytestDB.TABLE_4 P
ON C.COL_A = P.COL_A
AND C.COL_B = P.COL_B;
Observations
- If I remove any of the LEFT OUTER JOIN the query is returning the results without any issues.
- If I use count(*) instead of column list, then also it is working
- If I select one or two columns, then also it is returning the data in viewpoint
- Spool skew is showing around 70-80 during query execution
All 4 tables used in the above query have the same PI columns (COL_A, COL_B) and stats are also collected on the same PI columns. The Skew factor of all the tables is less than or equal to 2.
Here is the total record count and distinct record count based on PI columns from each table.
--TOTAL Count
sel count(1) from mytestDB.TABLE_1; 89,758,653
sel count(1) from mytestDB.TABLE_2; 148,915,580
sel count(1) from mytestDB.TABLE_3; 12,446,171
sel count(1) from mytestDB.TABLE_4; 160,661
-- DISTINCT PI count
sel count( 1) from (sel distinct COL_A,COL_B from mytestDB.TABLE_1)a; 89,758,653
sel count( 1) from (sel distinct COL_A,COL_B from mytestDB.TABLE_2)a; 64,616,959
sel count( 1) from (sel distinct COL_A,COL_B from mytestDB.TABLE_3)a; 11,032,454
sel count( 1) from (sel distinct COL_A,COL_B from mytestDB.TABLE_4)a; 121,860
Explain plan
1) First, we lock mytestDB.AD in TD_MAP1 for read on a reserved
RowHash in all partitions to prevent global deadlock.
2) Next, we lock mytestDB.P in TD_MAP1 for read on a reserved
RowHash in all partitions to prevent global deadlock.
3) We lock mytestDB.D in TD_MAP1 for read on a reserved RowHash in
all partitions to prevent global deadlock.
4) We lock mytestDB.C in TD_MAP1 for read on a reserved RowHash in
all partitions to prevent global deadlock.
5) We lock mytestDB.AD in TD_MAP1 for read, we lock mytestDB.P in
TD_MAP1 for read, we lock mytestDB.D in TD_MAP1 for read, and we
lock mytestDB.C in TD_MAP1 for read.
6) We execute the following steps in parallel.
1) We do an all-AMPs JOIN step in TD_MAP1 from mytestDB.C by
way of a RowHash match scan with no residual conditions,
which is joined to mytestDB.AD by way of a RowHash match
scan with no residual conditions. mytestDB.C and
mytestDB.AD are left outer joined using a rowkey-based
merge join, with a join condition of (
"(mytestDB.C.COL_A = mytestDB.AD.COL_A) AND
(mytestDB.C.COL_B = mytestDB.AD.COL_B)").
The result goes into Spool 2 (all_amps), which is built
locally on the AMPs. Then we do a SORT to partition Spool 2
by rowkey. The size of Spool 2 is estimated with low
confidence to be 89,797,457 rows (12,212,454,152 bytes). The
estimated time for this step is 7.85 seconds.
2) We do an all-AMPs RETRIEVE step in TD_MAP1 from mytestDB.P
by way of an all-rows scan with no residual conditions into
Spool 5 (all_amps), which is built locally on the AMPs. The
size of Spool 5 is estimated with high confidence to be
12,476,097 rows (1,147,800,924 bytes). The estimated time
for this step is 0.39 seconds.
7) We do an all-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by
way of a RowHash match scan, which is joined to mytestDB.D by
way of a RowHash match scan with no residual conditions. Spool 2
and mytestDB.D are joined using a rowkey-based merge join, with
a join condition of ("(COL_B = mytestDB.D.COL_B)
AND (COL_A = mytestDB.D.COL_A)"). The result goes into
Spool 6 (all_amps), which is redistributed by the hash code of (
mytestDB.C.COL_A, mytestDB.C.COL_B) to all AMPs in
TD_Map1. The size of Spool 6 is estimated with low confidence to
be 149,320,197 rows (20,307,546,792 bytes). The estimated time
for this step is 14.00 seconds.
8) We do an all-AMPs JOIN step in TD_Map1 from Spool 5 (Last Use) by
way of an all-rows scan, which is joined to Spool 6 (Last Use) by
way of an all-rows scan. Spool 5 and Spool 6 are right outer
joined using a single partition hash join, with a join condition
of ("(COL_A = COL_A) AND (COL_B = COL_B)").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with low confidence
to be 151,674,776 rows (37,463,669,672 bytes). The estimated time
for this step is 9.80 seconds.
9) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 31.65 seconds.
Could you please provide some inputs on this issue ?
Tried collecting stats on PI columns