I've been working with Hibernate Search for months now, but I still can't make sense of the relevance ranking it produces. I'm satisfied overall with the results it returns, but even the simplest test does not meet my expectations.
My first test targeted term frequency (tf). Data:
- word
- word word
- word word word
- word word word word
- word word word word word
- word word word word word word
Results I get:
- word
- word word word word
- word word word word word
- word word word word word word
- word word
- word word word
I'm really confused by this scoring behavior. My query is quite complex, but since no other field is involved in this test, it can be simplified to: booleanjunction.should(phraseQuery).should(keywordQuery).should(fuzzyQuery)
My analyzer uses the following filters:
- StandardFilterFactory
- LowerCaseFilterFactory
- StopFilterFactory
- SnowballPorterFilterFactory (English)
My Explanation object: https://jsfiddle.net/o51kh3og/
Score calculation is genuinely complex. The place to start is the underlying scoring equation.
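Assuming the equation meant here is the practical scoring function of Lucene's classic TF-IDF similarity (an assumption based on the tf, idf and fieldNorm terms discussed in this answer), it can be written as:

```latex
\mathrm{score}(q, d) = \mathrm{coord}(q, d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^2 \cdot \mathrm{boost}(t) \cdot \mathrm{norm}(t, d) \Big)
```

In this formula, tf(t in d) grows with how often the term occurs in the document, while norm(t, d) shrinks as the field gets longer.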
As you said, there is tf, the term frequency, whose value is the square root of the frequency of the term.
But as you can see in your explanation, there is also norm (aka fieldNorm), which is used in the fieldWeight calculation.
Take the example from your Explanation output. There, the eklavya document has a better score than the other one because fieldWeight is the product of tf, idf and fieldNorm. This last factor is higher for the eklavya document because its field contains only one term.
As the documentation says, the more terms a field contains, the lower its fieldNorm will be. Keep that in mind when reading the value of this field.
So, to conclude, your test is a good illustration that the score does not depend only on the term frequency but also on the number of terms in the field.
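To see how tf and fieldNorm interact on the six test documents from the question, here is a minimal Java sketch. It assumes the classic TF-IDF similarity: tf = sqrt(freq), a length norm of 1/sqrt(number of terms), and a norm stored in a single lossy byte, which is modeled here by truncating the float to two mantissa bits. The class and method names are hypothetical, and idf, coord and queryNorm are left out on the assumption that they are identical for every document in this test.

```java
import java.util.ArrayList;
import java.util.List;

final class FieldNormSketch {

    // tf as described above: square root of the term's frequency in the field.
    static float tf(int freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed length norm: 1 / sqrt(number of terms in the field).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Assumption: the norm is stored lossily, keeping sign, exponent and only
    // the top two mantissa bits (truncation, not rounding). This shortcut is
    // only meant for ordinary positive values such as the norms used here.
    static float quantize(float norm) {
        return Float.intBitsToFloat(Float.floatToRawIntBits(norm) & 0xFFE00000);
    }

    // The part of fieldWeight that differs between the test documents.
    static float weight(int words) {
        return tf(words) * quantize(lengthNorm(words));
    }

    // Documents are identified by their word count (1..maxWords), best first.
    // List.sort is stable, so ties keep their original (insertion) order.
    static List<Integer> ranking(int maxWords) {
        List<Integer> docs = new ArrayList<>();
        for (int n = 1; n <= maxWords; n++) {
            docs.add(n);
        }
        docs.sort((a, b) -> Float.compare(weight(b), weight(a)));
        return docs;
    }

    public static void main(String[] args) {
        for (int n : ranking(6)) {
            System.out.printf("%d x word: tf=%.4f fieldNorm=%.4f tf*fieldNorm=%.4f%n",
                    n, tf(n), quantize(lengthNorm(n)), weight(n));
        }
    }
}
```

Under these assumptions the sketch ranks the documents in the order 1, 4, 5, 6, 2, 3 words, which is the order reported in the question. The quantized fieldNorm falls in coarse steps (1.0, 0.625, 0.5, 0.5, 0.4375, 0.375), so the three-word and four-word documents share the same norm while tf keeps growing, and the one-word and four-word documents end up with the same product.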