Seeking suggestions for how to do the following in SQL (I have an approach but it seems very inefficient, described below). I have a ‘transactions’ table that has the following general structure – each row pertains to a particular ‘order ID’ and ‘transaction type’, with columns of information about the transaction:
| TYPE | ORDER_NO | revenue | shipping | rebates |
|---|---|---|---|---|
| SALE | 100001 | $55.22 | $7.00 | $(5.22) |
| REFUND | 100001 | $(55.22) | $0 | $5.22 |
| ADJUSTMENT | 100001 | $(8.00) | $(2.00) | $0 |
| SALE | 100005 | $40.00 | $4.00 | $0 |
| REFUND | 100005 | $(32.00) | $(4.00) | $0 |
I need to categorize the values in columns for accounting purposes, but each column has separate categorization logic depending on the value of TYPE for the row. In the extreme case where all values in 'order # 10001' above ended up in separate categories, the desired result would look like this:
| TYPE | ORDER_NO | revenue | shipping | rebates | accounting_code |
|---|---|---|---|---|---|
| SALE | 100001 | $55.22 | 101 | ||
| SALE | 100001 | $7.00 | 102 | ||
| SALE | 100001 | $(5.22) | 103 | ||
| REFUND | 100001 | $(55.22) | 104 | ||
| REFUND | 100001 | $5.22 | 105 | ||
| ADJUSTMENT | 100001 | $(8.00) | 106 | ||
| ADJUSTMENT | 100001 | $(2.00) | 107 |
My approach has been to pull the transactions table as a temp table, and then have a series of sub-queries each pulling only one of the columns (and 0 for the others) and assigning the category based on TYPE and whichever column is being pulled in that sub query. I then union all the results back together (sample query below). But this seems cumbersome (as verified by performance). Thanks in advance for any thoughts on more efficient ways to approach this.
Current query (a concatenation of TYPE and the hard-coded column name stands in for accounting code here - e.g. 'SALES | revenue'):
with TT as
(select * from transactions)
select
TYPE,
ORDER_no,
sum(revenue) as revenue,
0 as shipping,
0 as rebates,
CONCAT (type,' | ', 'revenue') as acc_code
FROM TT
group by
TYPE,
Order_no,
CONCAT (type,' | ', 'revenue')
UNION
…(repeat process for next column, setting others to 0)
I think I recommend considering something like the following, at least as a starting point:
The CTE
TTwill do thegroup byonce, likely improving performance (I'm skeptical your current CTE is helping performance-wise). If performance is still a concern (as per your comments), an index on(type, order_no)might help optimize thegroup by. Besides performance, this version of the CTE reduces the complexity of each of the main queries that aren'tunioned together (e.g. thegroup byisn't repeated) making it easier to read and maintain.Note that this handles the
acc_codelike your sample query does (as opposed to your expected output). You could adjust it if necessary, but that seems tangential to your question.It's a little unclear what data types the revenue/shipping/rebates are. I'm assuming it's safe to
sumthem as numerics, since your existing querysums them and it's implied that your existing query works.