Fields in documents are analyzed to create tokens.
1. {"message":"hello world"} -> token: ["hello", "world"]
2. {"message":"hello"} -> token: ["hello"]
3. {"message":"world"} -> token: ["world"]
4. {"message":"hello java"} -> token: ["hello", "java"]
5. {"message":"java"} -> token: ["java"]
Is it possible to search for all documents in which a specific field contains a given token plus one or more other tokens?
- Result for the given example for token "hello" would be: documents 1 and 4
- For "world": document 1
As described in termvectors, one can access the tokens of a field or statistics about them. However, this only works for individual documents, not as a search filter for a query or aggregation.
Would be nice if someone could help.
Yes, you can use the token_count type for this. For instance, in your mapping, you can define message as a multi-field to contain the message itself (i.e. "hello", "hello world", etc.) and also the number of tokens of the message. Then you'll be able to include constraints on the word count in your queries.

So your mapping for message should look like this:

Then, you can query for all documents having "hello" in the message, but only those whose message has more than one token. With the following query, you'll only get "hello java" and "hello world", but not "hello".

Similarly, if you replace "hello" with "world" in the above query, you'll only get "hello world".
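A minimal sketch of what such a mapping and query could look like, written as Python dictionaries that can be serialized to JSON request bodies. Several details here are assumptions rather than part of the answer: the sub-field name length, the use of the standard analyzer, the typeless (Elasticsearch 7+) mapping layout, and the helper names build_query and simulate. The simulate function is only a local stand-in that approximates the analyzer with a lowercase whitespace split, so the expected hits can be checked without a cluster.

```python
import json

# Assumed mapping: "message" is a text field with a hypothetical "length"
# sub-field of type token_count, which stores the number of tokens.
mapping = {
    "mappings": {
        "properties": {
            "message": {
                "type": "text",
                "fields": {
                    "length": {"type": "token_count", "analyzer": "standard"}
                },
            }
        }
    }
}


def build_query(token):
    """Search body: message contains `token` and has more than one token."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"message": token}},
                    {"range": {"message.length": {"gt": 1}}},
                ]
            }
        }
    }


def simulate(docs, token):
    """Local approximation of the query, returning 1-based document positions."""
    hits = []
    for doc_id, doc in enumerate(docs, start=1):
        # Lowercase whitespace split stands in for the real analyzer.
        tokens = doc["message"].lower().split()
        if token in tokens and len(tokens) > 1:
            hits.append(doc_id)
    return hits


docs = [
    {"message": "hello world"},
    {"message": "hello"},
    {"message": "world"},
    {"message": "hello java"},
    {"message": "java"},
]

print(json.dumps(mapping, indent=2))
print(json.dumps(build_query("hello"), indent=2))
print(simulate(docs, "hello"))  # [1, 4]
print(simulate(docs, "world"))  # [1]
```

The two filters sit in a bool filter clause because neither needs to contribute to scoring: the term clause checks for the token, and the range clause on the token-count sub-field enforces "more than one token".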