All/Any Implementation in SQL alongside IN

90 views Asked by At

Hello I need to implement few Queries via dapper on a structure similar to below pseudocode.

// Tags
[{id: 1, name: "Tech"}, {id: 2, name: "SQL"}, {id: 3, name: "C#"}]

// BlogPost [
{
  Id: 1
  Tags: [1, 2] // Tech, Sql
},
{
   Id: 2,
   Tags: [1,3] // Tech, C#  
},
{
   Id: 3,
   Tags: [1,2,3] // Text, Sql, C#
}]

Given this query

SELECT 
    [Blogpost].*
From BlogPost blogPost
    LEFT JOIN BlogPostTags tags ON tags.blogId = blogPost.Id
WHERE blogpost.Tags IN (1,2)

Running above query i would expect this result. [{blogId: 1}, {blogId: 2}, {BlogId: 3}]

  • I need to obtain a result looking like [{blogId: 1}] given the same parameters (1,2) in above query.

  • Also I need to obtain a result like post [{blogId: 1}, {BlogId: 3}] given the parameters (1,2) in above query.

Does sql (MSSQL) have any nifty way to obtain these result?

Or how could I obtain these result in performant manner as I would have more joins in the actual query?

Thanks!

2

There are 2 answers

1
Charlieface On BEST ANSWER

A "Contains Any" is just a simple EXISTS query.

var tagIds = new[]{ 1, 2, };

using connection = GetConnection();
const string query = @"
SELECT 
    bp.*
FROM BlogPost bp
WHERE EXISTS (SELECT 1
    FROM BlogPostTags tags
    WHERE tags.blogId = bp.Id
      AND tags.Id IN @tags
);

var results = await connection.QueryAsync<BlogPost>(query, new { tags, });

For the other two requirements, these are classic Relational Division, one being With Remainder, the other being Without Remainder.

For With Remainder, just do an EXISTS join, along with a grouped HAVING checking that all of them exist.

var tagIds = new[]{ 1, 2, };

using connection = GetConnection();
const string query = @"
SELECT 
    bp.*
FROM BlogPost bp
WHERE EXISTS (SELECT 1
    FROM BlogPostTags tags
    WHERE tags.blogId = bp.Id
      AND tags.Id IN @tags
    HAVING COUNT(*) = @count   -- are all input matched?
);

var results = await connection.QueryAsync<BlogPost>(query, new { tags, count = tags.Length });

For Without Remainder, it's similar, but you need to look for all tags, and then check in the HAVING that the only ones that match are the ones you are looking for.

var tagIds = new[]{ 1, 2, };

using connection = GetConnection();
const string query = @"
SELECT 
    bp.*
FROM BlogPost bp
WHERE EXISTS (SELECT 1
    FROM BlogPostTags tags
    WHERE tags.blogId = bp.Id
    HAVING COUNT(*) = @count   -- are all input matched?
       AND COUNT(*) = COUNT(CASE WHEN tags.Id IN @tags THEN 1 END)   -- are all tags in the list
);

var results = await connection.QueryAsync<BlogPost>(query, new { tags, count = tags.Length });

Note that Dapper parameterizes lists as (@p1, @p2) etc automatically.


You can also use a Table Valued Parameter, which is more efficient for large lists.

Define a Table Type, it's best to keep a few useful standard ones around.

CREATE TYPE dbo.IntList AS TABLE (value int PRIMARY KEY);

Then put the values into a DataTable and use .AsTableValuedParameter.

The query is also slightly different as you would use a join here.

var tagIds = new[]{ 1, 2, };
var table = new DataTable { Columns = { { "value", typeof(int) } } };
foreach (var tag in tagIds)
    table.Add(tag);

using connection = GetConnection();
const string query = @"
SELECT 
    bp.*
FROM BlogPost bp
WHERE EXISTS (SELECT 1
    FROM BlogPostTags tags
    LEFT JOIN @tags input ON input.value = tags.Id
    WHERE tags.blogId = bp.Id
    HAVING COUNT(*) = @count   -- are all input matched?
       AND COUNT(*) = COUNT(input.value)   -- did the join match all tags?
);

var results = await connection.QueryAsync<BlogPost>(query, new { tags = tags.AsTableValuedParameter("dbo.IntList"), count = tags.Length });
2
Marc Gravell On

These kinds of queries are pretty awkward to express in SQL (note: I'd avoid LEFT OUTER JOIN here, as that can cause row duplication)

"contains any" is OK (usually via EXISTS (SELECT 1 FROM BlogPostTags tags WHERE tags.blogId = blogPost.Id AND tags.Tags IN @tags), using Dapper's automatic expansion of @tags) - however, you are asking for "contains all" and "contains all and nothing else"; these are much more complex; "contains all" would typically be multiple EXISTS operations, and "contains all and nothing else" could be that same "contains all" with an additional NOT EXISTS (SELECT 1 FROM BlogPostTags tags WHERE tags.blogId = blogPost.Id AND tags.Tags NOT IN @tags)

However! I would emphasize again: this operation isn't ideal in SQL. For Stack Overflow post tagging (which looks fundamentally identical to this), we turned this on it's head, by adding a separate index/server that existed purely to perform tag operations, including pre-loading all the post/tag data into RAM in optimized ways, and indexing by tag (processing delta updates); this allows a lot of operations to be performed in memory using a variety of tricks. Much more effort and work than a simple SQL query, but also much more more efficient.