Convert comma delimited values in a column into rows

3.2k views Asked by At

I would like to convert comma-delimited values into rows in Redshift

For example:

store  |location |products
-----------------------------
1      |New York |fruit, drinks, candy...

The desired output is:

store  |location | products
------------------------------- 
1      |New York | fruit        
1      |New York | drinks         
1      |New York | candy     

Are there any simple solution that I could split the words based on delimiters and convert into rows? I was looking into this solution but it does not work yet: https://help.looker.com/hc/en-us/articles/360024266693-Splitting-Strings-into-Rows-in-the-Absence-of-Table-Generating-Functions

Any suggestions would be greatly appreciated.

5

There are 5 answers

0
Gordon Linoff On BEST ANSWER

If you know the maximum number of values, I think you can split_part():

select t.store, t.location, split_part(products, ',', n.n) as product
 from t join
      (select 1 as n union all
       select 2 union all
       select 3 union all
       select 4
      ) n
      on split_part(products, ',', n.n) <> '';
 

You can also use:

select t.store, t.location, split_part(products, ',', 1) as product
from t 
union all
select t.store, t.location, split_part(products, ',', 2) as product
from t 
where split_part(products, ',', 2) <> ''
union all
select t.store, t.location, split_part(products, ',', 3) as product
from t 
where split_part(products, ',', 3) <> ''
union all
select t.store, t.location, split_part(products, ',', 4) as product
from t 
where split_part(products, ',', 4) <> ''
union all
. . .
0
MatBailie On

First, you need to create a numbers table, as joining against another table is the only way redshift has of turning one row in to many rows (there are no flatten or unnest capabilities).

  • For example, a table with 1024 rows with the values 1..1024 in it

Then you can join on that and use split_part()

SELECT
  yourTable.*,
  numbers.ordinal,
  split_part(your_table.products, ',', numbers.ordinal)  AS product
FROM
  yourTable
INNER JOIN
  numbers
    ON  numbers.ordinal >= 1
    AND numbers.ordinal <= regexp_count(your_table.products, ',') + 1

But...

Redshift is shite at projecting the number of required rows. It will join the whole 1024 rows, then reject the mis-matching rows.

It performs like a dog.

Because the design assumption is that such processing is always done before loading in to Redshift.

4
Mansi Mistry On
CREATE TABLE temptbl  
(
    store INT,
    location  NVARCHAR(MAX),
    products NVARCHAR(MAX)
)



INSERT temptbl   SELECT 1,  'New York', 'Fruit, drinks, candy'

output of created table as u are created

output

select * from temptbl


;WITH tmp(store, location, DataItem, products) AS
(
    SELECT
        store,
        location,
        LEFT(products, CHARINDEX(',', products + ',') - 1),
        STUFF(products, 1, CHARINDEX(',', products + ','), '')
    FROM temptbl
    UNION all

    SELECT
        store  ,
        location,
        LEFT(products, CHARINDEX(',', products + ',') - 1),
        STUFF(products, 1, CHARINDEX(',', products + ','), '')
    FROM tmp
    WHERE
        products > ''
)

SELECT
    store,
    location,
    DataItem
FROM tmp

You want comma seperated values in Multiple Rows: output you want after run above commands:

output

Hope you find your solution :)))

0
Akina On

MYSQL is also fine


CREATE TABLE test
SELECT 1 store, 'New York' location, 'fruit,drinks,candy' products;

SELECT store, location, product
FROM test
CROSS JOIN JSON_TABLE(CONCAT('["', REPLACE(products, ',', '","'), '"]'),
                      "$[*]" COLUMNS (product VARCHAR(255) PATH "$")) jsontable
store location product
1 New York fruit
1 New York drinks
1 New York candy

db<>fiddle here

0
O. Jones On

In MySQL, this will work for up to four comma-separated values. Note UNION, not UNION ALL. Fiddle

SELECT store, location,  
       TRIM(SUBSTRING_INDEX(products, ',', 1)) product
  FROM inventory
 UNION 
SELECT store, location, 
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(products, ',', 2), ',', -1))
  FROM inventory
 UNION 
SELECT store, location, 
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(products, ',', 3), ',', -1))
  FROM inventory
 UNION 
SELECT store, location, 
       TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(products, ',', 4), ',', -1))
  FROM inventory

I'll echo what other people have said. With respect, comma-separated values is a bad table design.

  • It makes for ugly SQL. Being able to read and reason about SQL is very important. Clarity always wins.
  • And, the AWS shareholders will love you for it, because you'll spend a lot of extra money on redshift.