I have a model representing an article:
public class Article {
@Id
private Integer id;
private String title;
private String content;
// ...
// plenty of other article properties, used to classify and filter over them
// ...
@ElementCollection
@CollectionTable(name = "article_tags", joinColumns = @JoinColumn(name = "article_id"))
@Column(name = "tag")
private Set<String> tags;
}
And a specification used among oters to build a query dynamically:
...
public static Specification<Article> byTagAnyOf(Set<String> referenceTags) {
return (root, query, builder) -> {
if (CollectionUtils.isEmpty(referenceTags)) {
return builder.conjunction();
}
return builder.or(referenceTags.stream()
.map(tag -> builder.isMember(tag, root.get("tags")))
.toArray(Predicate[]::new));
};
}
...
This solution actually works, but creates a query with numerous or statements, performing selection of tag elements, what causes performance problems.
Is there a way to perform array intersection check either by joining referenceTags or by selecting except by means of Specification API?
EDIT 1 (add Hibernate generated sql)
Hibernate:
select
article0_.id,
article0_.title,
article0_.content
from
articles article0_
where
1=1
and 1=1
and (
article0_.author in (
? , ? , ?
)
)
and (
? in (
select
tagart1_.tag
from
article_tags tagart1_
where
article0_.id=tagart1_.article_id
)
...
or ? in (
select
tagart6_.tag
from
article_tags tagart6_
where
article0_.id=tagart6_.article_id
)
)
and 1=1
and 1=1
and 1=1
and 1=1
and 1=1
and 1=1
and 1=1
You could have several reasons because for your performance issue, and for me make sense to investigate all of them.
Regarding your issue, you have many OR conditions because it's you that specify this with
builder.or.Honestly I didn't understand what you mean with array intersection, but I have a question for you: Have you consider to use the IN statement instead an OR of all the possible tags?
The second thing I suggest you to check is the generated query, please verify what is the query that is fired in the Database (you can found it in the logs if you activate it) and things will be more clear at least for me.
I perfectly understand that you are using Specification for generate queries dynamically, there is nothing wrong in this, but It's also true that you are generating "unpredictable" queries and this will force you to create all the possible indexes combination in the database (it's crazy to generate an index for all the possible combination).
A chain of OR clause could prevent the database to use the correct indexes, try with IN statement and let me know.
In conclusion my suggestions are: