search with filter by token count

Question


Asked by d4ndel1on on 09 June 2015 at 13:31 (1.4k views)

Fields in documents are analyzed to create tokens. For example, given these five documents:

1. {"message":"hello world"} -> tokens: ["hello", "world"]
2. {"message":"hello"} -> tokens: ["hello"]
3. {"message":"world"} -> tokens: ["world"]
4. {"message":"hello java"} -> tokens: ["hello", "java"]
5. {"message":"java"} -> tokens: ["java"]

Is it possible to search for all documents in which a specific field contains a given token plus one or more other tokens?

For the example above, the result for the token "hello" would be:
- documents 1 and 4
For "world":
- document 1

As described in the termvectors documentation, one can access the tokens or statistics about them. However, this only works for specific documents, not as a search filter for a query or aggregation.
Any help would be appreciated.


**Val** · Accepted Answer · 2015-06-10T02:54:44+00:00

Yes, you can use the token_count type for this. For instance, in your mapping, you can define message as a multi-field that contains the message itself (e.g. "hello", "hello world", etc.) and also the number of tokens in the message. Then you'll be able to include constraints on the word count in your queries.

So your mapping for message should look like this:

# "message" is a normal analyzed string;
# its "word_count" sub-field holds the number of tokens in the message
curl -XPUT localhost:9200/tests -d '
{
  "mappings": {
    "test": {
      "properties": {
        "message": {
          "type": "string",
          "fields": {
            "word_count": {
              "type": "token_count",
              "store": "yes",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}'

Then, you can query for all documents that have "hello" in the message, but only those whose message has more than one token. With the following query, you'll only get "hello java" and "hello world", but not "hello":

curl -XPOST localhost:9200/tests/test/_search -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "hello"
          }
        },
        {
          "range": {
            "message.word_count": {
              "gt": 1
            }
          }
        }
      ]
    }
  }
}'

Similarly, if you replace "hello" with "world" in the above query, you'll only get "hello world".
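
To sanity-check the expected results without a running cluster, the combined effect of the match clause and the range clause on the word count can be modeled in a few lines of Python. This is only a sketch, not Elasticsearch code: `DOCS`, `tokenize` and `search` are hypothetical names introduced here, and the regex tokenizer merely approximates the standard analyzer for simple inputs like the five messages from the question.

```python
import re

# The five sample messages from the question, keyed by document number.
DOCS = {
    1: "hello world",
    2: "hello",
    3: "world",
    4: "hello java",
    5: "java",
}


def tokenize(text):
    """Rough stand-in for the standard analyzer (assumption):
    lowercase the text and split it on non-word characters."""
    return re.findall(r"\w+", text.lower())


def search(token, more_than=1):
    """Return the ids of documents whose message contains `token`
    (the match clause) and has more than `more_than` tokens
    (the range clause with "gt" on the word count)."""
    hits = []
    for doc_id, message in DOCS.items():
        tokens = tokenize(message)
        if token in tokens and len(tokens) > more_than:
            hits.append(doc_id)
    return sorted(hits)


print(search("hello"))  # [1, 4]
print(search("world"))  # [1]
```

The two printed lists correspond to the results asked for in the question: documents 1 and 4 for "hello", and document 1 for "world".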
