I'm working on SQL Server 2019 and have a problem I am trying to solve which I can best describe as "bundling" - the end result is a significantly shrunken dataset and fewer inventory items to keep track of. Here is a quick example of the dataset:
| Account ID | Product ID |
|---|---|
| 1000 | 10 |
| 1000 | 20 |
| 1000 | 30 |
| 1001 | 10 |
| 1001 | 20 |
| 1001 | 30 |
| 1001 | 50 |
| 1001 | 60 |
| 1001 | 70 |
| 1002 | 50 |
| 1002 | 60 |
| 1002 | 70 |
| 1003 | 80 |
| 1004 | 10 |
| 1004 | 20 |
| 1004 | 80 |
I created a new table which the "bundles" reside in as a reference table:
| Bundle ID | Product ID |
|---|---|
| 1 | 10 |
| 1 | 20 |
| 1 | 30 |
| 2 | 50 |
| 2 | 60 |
| 2 | 70 |
| 3 | 80 |
Quick note: In order for a bundle to be present on a given Account ID, ALL Product IDs within the bundle must be present. My goal is to have this sort of output:
| Account ID | Bundle ID | # of Products outside Bundles |
|---|---|---|
| 1000 | 1 | 0 |
| 1001 | 1 | 0 |
| 1001 | 2 | 0 |
| 1002 | 2 | 0 |
| 1003 | 3 | 0 |
| 1004 | 3 | 2 |
I expect to be eliminating/reviewing any outliers in the result set, i.e. Account IDs which have a number in the third column =/ 0
Initially I was just trying to use cases before making the actual table that the Bundles reside in, and then opted to put the product selection inside of a CTE. My second approach was to string together every product name (another column part of the first table, names are distinct) and associate the extended string listing every single product the account owns as a new column within the CTE. Then, if the Account ID hits every single qualifier (i.e. PRODUCTSTRING LIKE 'Product1' AND PRODUCTSTRING LIKE 'Product2' . . . ) then it fulfills the case. This approach feels shaky at best and at worst like it will just give incorrect results for me unknowingly.
The only other approach that I can think of thus far is to count the number of unique products an Account ID has and match it to the count of unique products of a bundle, and then run cases based off of that, but even that feels like it will become extremely bulky very quickly.
The Account ID/Product ID view I'm working off of has around 1300 rows. Could anyone point me in the direction of either an approach I have not considered or a tool I am missing?
This starts off as a kind of Relational Division With Remainder problem. You are missing the
AccountandBundletables, which store the unique values respectively, I have assumed these exist.However, this is going to be extremely difficult to get what you want, especially given how poorly defined the problem is.
If you wanted to know just how many
ProductsRemainingthere are after applying a singleBundle, you can do as follows:AccountwithBundleAPPLY...AccountProductfor thatAccount...BundleProductfor thatBundle...AccountProduct, ie there exists aBundleProductwith no matchingAccountProduct.AccountProductwith noBundleProduct.db<>fiddle
The problems start when, judging by your comments, you want to be able to take any possible combination of
Bundleto make up the products, but we don't necessarily know which combination. There may be multiple possible combinations, some may be contradictory. You may need a combinatorics approach in some procedural language, trying every possible combnation of bundles.For example, what happens if you put together one bundle which contains products
1, 2and another containing2, 3, is that a valid combination for an account which has products1, 2, 3?