Using SQL to categorize multiple columns in a row, each with their own categorization logic

Question

Using SQL to categorize multiple columns in a row, each with their own categorization logic

144 views Asked by David J At 25 January 2023 at 19:22

Seeking suggestions for how to do the following in SQL (I have an approach but it seems very inefficient, described below). I have a ‘transactions’ table that has the following general structure – each row pertains to a particular ‘order ID’ and ‘transaction type’, with columns of information about the transaction:

TYPE	ORDER_NO	revenue	shipping	rebates
SALE	100001	$55.22	$7.00	$(5.22)
REFUND	100001	$(55.22)	$0	$5.22
ADJUSTMENT	100001	$(8.00)	$(2.00)	$0
SALE	100005	$40.00	$4.00	$0
REFUND	100005	$(32.00)	$(4.00)	$0

I need to categorize the values in columns for accounting purposes, but each column has separate categorization logic depending on the value of TYPE for the row. In the extreme case where all values in 'order # 10001' above ended up in separate categories, the desired result would look like this:

TYPE	ORDER_NO	revenue	shipping	rebates	accounting_code
SALE	100001	$55.22			101
SALE	100001		$7.00		102
SALE	100001			$(5.22)	103
REFUND	100001	$(55.22)			104
REFUND	100001			$5.22	105
ADJUSTMENT	100001		$(8.00)		106
ADJUSTMENT	100001			$(2.00)	107

My approach has been to pull the transactions table as a temp table, and then have a series of sub-queries each pulling only one of the columns (and 0 for the others) and assigning the category based on TYPE and whichever column is being pulled in that sub query. I then union all the results back together (sample query below). But this seems cumbersome (as verified by performance). Thanks in advance for any thoughts on more efficient ways to approach this.

Current query (a concatenation of TYPE and the hard-coded column name stands in for accounting code here - e.g. 'SALES | revenue'):

with TT as
   (select * from transactions)
select 
   TYPE,
   ORDER_no,
   sum(revenue) as revenue,
   0 as shipping,
   0 as rebates,
   CONCAT (type,' | ', 'revenue') as acc_code

FROM TT

group by 
   TYPE,
   Order_no,
   CONCAT (type,' | ', 'revenue') 

UNION

…(repeat process for next column, setting others to 0)

Original Q&A

There are 1 answers

**EdmCoff** · Accepted Answer · 2023-01-25T21:23:32+00:00

I think I recommend considering something like the following, at least as a starting point:

WITH TT AS (
 SELECT
  type,
  order_no,
  sum(revenue) as revenue,
  sum(shipping) as shipping,
  sum(rebates) as rebates
 FROM 
  transactions
 GROUP BY
  type, order_no
)
SELECT type, order_no, revenue, 0 shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, shipping, 0 rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT
UNION ALL
SELECT type, order_no, 0 revenue, 0 shipping, rebates, CONCAT (type,' | ', 'revenue') acc_code
FROM TT

The CTE TT will do the group by once, likely improving performance (I'm skeptical your current CTE is helping performance-wise). If performance is still a concern (as per your comments), an index on (type, order_no) might help optimize the group by. Besides performance, this version of the CTE reduces the complexity of each of the main queries that aren't unioned together (e.g. the group by isn't repeated) making it easier to read and maintain.

Note that this handles the acc_code like your sample query does (as opposed to your expected output). You could adjust it if necessary, but that seems tangential to your question.

It's a little unclear what data types the revenue/shipping/rebates are. I'm assuming it's safe to sum them as numerics, since your existing query sums them and it's implied that your existing query works.

TechQA.

Using SQL to categorize multiple columns in a row, each with their own categorization logic

There are 1 answers

Related Questions in SQL

Related Questions in POSTGRESQL

Related Questions in ACCOUNTING

Related Questions in CATEGORIZATION

Popular Questions

Trending Questions