Extract Jotform Product Data from Multi Line Cell in Google Sheets

164 views Asked by At

I am using Jotform for customers to place quarterly orders, and we then receive this data into a google sheet. Unfortunately Jotform's product list places all the items into one multi-line cell, so when trying to total the number of cases for each item to know how much to order from our suppliers it is difficult to extract the information.

As an example the data we receive would be something like:

Product 1 (Amount: 32.00 USD, Number of cases: 66)
Product 2 (Amount: 16.00 USD, Number of cases: 2)
Product 5 (Amount: 35.00 USD, Number of cases: 3)
Product 6 (Amount: 17.00 USD, Number of cases: 3)
Product 9 (Amount: 13.00 USD, Number of cases: 1)
Product 10 (Amount: 12.00 USD, Number of cases: 4)
Total: 2,361.00 USD

There are 26 products in total, and while they do seem to come in the same order, not all products are present in each order.

I have tried several things with regex, splitting lines by CHAR(10), but can't quite find a simple solution to it. What I would like is an output where I could perhaps create a set of columns for each product and have the number of cases for that product on each row for our customer.

A sample with a small variety of the order lines is here

One solution that has proved promising is using the function from this solution combined with a regex/substitute hybrid, this can be seen on the 'expand_split' tab in the linked sheetand following table:

Customer Order Product 1 Product 2 Product 3 Product 4 Product 5 Product 6 Product 7 Product 8 Product 9 Product 10
Customer 1 Product 1 (Amount: 32.00 USD, Number of cases: 30) 30 0 0 0 0 0 0 0 0 0
Customer 1 Product 2 (Amount: 16.00 USD, Number of cases: 28) 0 28 0 0 0 0 0 0 0 0
Customer 1 Product 3 (Amount: 19.00 USD, Number of cases: 17) 0 0 17 0 0 0 0 0 0 0
Customer 1 Product 5 (Amount: 9.00 USD, Number of cases: 25) 0 0 0 0 25 0 0 0 0 0
Customer 1 Product 7 (Amount: 35.00 USD, Number of cases: 35) 0 0 0 0 0 0 35 0 0 0
Customer 1 Product 9 (Amount: 17.00 USD, Number of cases: 33) 0 0 0 0 0 0 0 0 33 0
Customer 1 Product 10 (Amount: 13.00 USD, Number of cases: 17) 0 0 0 0 0 0 0 0 0 17
Customer 1 Total: 8,613.00 USD 0 0 0 0 0 0 0 0 0 0
Customer 2 Product 1 (Amount: 32.00 USD, Number of cases: 10) 10 0 0 0 0 0 0 0 0 0
Customer 2 Product 2 (Amount: 16.00 USD, Number of cases: 8) 0 8 0 0 0 0 0 0 0 0
Customer 2 Product 3 (Amount: 19.00 USD, Number of cases: 8) 0 0 8 0 0 0 0 0 0 0
Customer 2 Product 5 (Amount: 9.00 USD, Number of cases: 8) 0 0 0 0 8 0 0 0 0 0
Customer 2 Product 7 (Amount: 35.00 USD, Number of cases: 12) 0 0 0 0 0 0 12 0 0 0
Customer 2 Product 8 (Amount: 12.00 USD, Number of cases: 20) 0 0 0 0 0 0 0 20 0 0
Customer 2 Product 9 (Amount: 17.00 USD, Number of cases: 16) 0 0 0 0 0 0 0 0 16 0
Customer 2 Product 10 (Amount: 13.00 USD, Number of cases: 8) 0 0 0 0 0 0 0 0 0 8
Customer 2 Total: 8,156.00 USD 0 0 0 0 0 0 0 0 0 0
Customer 3 Product 1 (Amount: 32.00 USD, Number of cases: 3) 3 0 0 0 0 0 0 0 0 0
Customer 3 Product 2 (Amount: 16.00 USD, Number of cases: 3) 0 3 0 0 0 0 0 0 0 0
Customer 3 Product 3 (Amount: 19.00 USD, Number of cases: 3) 0 0 3 0 0 0 0 0 0 0
Customer 3 Product 7 (Amount: 35.00 USD, Number of cases: 3) 0 0 0 0 0 0 3 0 0 0
Customer 3 Product 8 (Amount: 12.00 USD, Number of cases: 4) 0 0 0 0 0 0 0 4 0 0
Customer 3 Product 9 (Amount: 17.00 USD, Number of cases: 3) 0 0 0 0 0 0 0 0 3 0
Customer 3 Product 10 (Amount: 13.00 USD, Number of cases: 3) 0 0 0 0 0 0 0 0 0 3
Customer 3 Total: 1,179.00 USD

with the following code pulling the data from the product line in B2:

=IF(REGEXMATCH($B2, C$1), 
SUBSTITUTE(MID($B2,SEARCH("Number of cases: ",$B2,1)+17,3),")",""), 
0)
1

There are 1 answers

2
Osm On BEST ANSWER

Use this if every product in a each row.

=ArrayFormula({
      {"Products";"Amount";"Number of cases"},
        SPLIT(BYROW({"(.+)\(";"\(Amount: (.*) USD,";"Number of cases: (.*)\)"},
              LAMBDA(re, TEXTJOIN("|",1,IFERROR(REGEXEXTRACT(A1:A6, ""&re&""))))),"|")})

enter image description here

Or this formula if all the input is in a single cell

=ArrayFormula({
      {"Products";"Amount";"Number of cases"},
        SPLIT(BYROW({"(.+)\(";"\(Amount: (.*) USD,";"Number of cases: (.*)\)"},
              LAMBDA(re, TEXTJOIN("|",1,IFERROR(REGEXEXTRACT(TRANSPOSE(SPLIT(A1, CHAR(10))), ""&re&""))))),"|")})

enter image description here

Update

=ArrayFormula(LAMBDA(rg, QUERY(TRANSPOSE({{"Products";"Amount";"Number of cases"},
        SPLIT(BYROW({"(.+)\(";"\(Amount: (.*) USD,";"Number of cases: (.*)\)"},
              LAMBDA(re, TEXTJOIN("|",1,IFERROR(REGEXEXTRACT(QUERY(FLATTEN(TRANSPOSE(SPLIT(rg, CHAR(10))))," where not Col1 contains 'Total:'  "), ""&re&""))))),"|")}), 
                    "Select Col1,sum(Col2),sum(Col3) group by Col1 order by sum(Col2) desc ",1))(B2:B))

enter image description here

Used formulas help
ARRAYFORMULA - SPLIT - BYROW - LAMBDA - TEXTJOIN - IFERROR - REGEXEXTRACT - TRANSPOSE - CHAR - QUERY - FLATTEN - NOT - SUM