Efficiently match texts contained in a query text

I have a set of documents S in an index, where each document D has a text field D.text. Given a text query Q, I want to find the documents whose text is contained in Q.

An example:

  • A set S with documents D1, D2 and D3, whose texts are "Stranger Things", "special effects are top-notch" and "entertaining and always keeps me on the edge", respectively.
  • A text query Q: "I think 'Stranger Things' is one of the best shows on Netflix. The acting is superb, the plot is intriguing, and the special effects are top-notch."
  • The result should be documents D1 and D2, because D1.text and D2.text both occur in Q.
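To pin down the expected behaviour, here is a minimal brute-force sketch of the containment check in Python. The function name `find_contained` and the case-insensitive comparison are assumptions made for illustration; the scan is linear in the size of S, so it only serves as a reference for what any faster approach should return.

```python
# Reference implementation of "which document texts occur inside the query?".
# Assumption: matching is case-insensitive plain substring containment.

docs = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}

query = (
    "I think 'Stranger Things' is one of the best shows on Netflix. "
    "The acting is superb, the plot is intriguing, and the special "
    "effects are top-notch."
)


def find_contained(docs, query):
    """Return the ids of documents whose text is a substring of the query."""
    lowered_query = query.lower()
    return [doc_id for doc_id, text in docs.items() if text.lower() in lowered_query]


result = find_contained(docs, query)
print(result)  # ['D1', 'D2']
```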

So far I am using the PhraseMatcher class from spaCy, which efficiently matches large terminology lists against a text. However, every time I have to build a large terminology list in memory (the set S has more than 100,000 documents and can be even bigger) just to run around 100 query texts against it. I get requests to do this a few times a second.

Is there any way to perform this query in Elasticsearch?

There is 1 answer

Tom Elias On

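Elasticsearch, with the default mapping, will break the text of D1, D2 and D3 into individual terms (roughly, lowercased words split on whitespace and punctuation) and index those terms. You can split your query Q into terms the same way and search for those terms in your index, which returns only the documents that contain one or more of the searched terms. Note that these documents are only candidates: a document can share a term with Q without its whole text appearing in Q, so you still need to check containment on that much smaller set.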
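The idea can be sketched in plain Python without a running cluster. This is an in-memory stand-in, not Elasticsearch itself: `tokenize` is a rough approximation of the default analyzer, and `build_index`, `candidates` and `contained` are hypothetical helper names. The field name `text` in the request body comes from D.text in the question; a `match` query with its default OR operator is the standard way to ask for documents sharing at least one term with the input.

```python
import re


def tokenize(text):
    # Rough stand-in for the default analyzer: lowercase word tokens.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    # Inverted index: term -> set of ids of documents containing that term.
    index = {}
    for doc_id, text in docs.items():
        for term in set(tokenize(text)):
            index.setdefault(term, set()).add(doc_id)
    return index


def candidates(index, query):
    # Documents that share at least one term with the query.
    found = set()
    for term in set(tokenize(query)):
        found |= index.get(term, set())
    return found


def contained(doc_text, query_tokens):
    # True if the document's tokens appear as a contiguous run in the query.
    doc_tokens = tokenize(doc_text)
    n = len(doc_tokens)
    if n == 0:
        return False
    return any(
        query_tokens[i:i + n] == doc_tokens
        for i in range(len(query_tokens) - n + 1)
    )


docs = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}
query = (
    "I think 'Stranger Things' is one of the best shows on Netflix. "
    "The acting is superb, the plot is intriguing, and the special "
    "effects are top-notch."
)

index = build_index(docs)
cands = candidates(index, query)      # D3 is included: it shares "and", "on", "the"
query_tokens = tokenize(query)
matches = sorted(d for d in cands if contained(docs[d], query_tokens))
print(matches)  # ['D1', 'D2']

# Request body for the candidate step against a real index
# (Elasticsearch analyzes the query string itself):
es_query = {"query": {"match": {"text": query}}}
```

In this example all three documents come back as candidates, because D3 shares common words such as "the" with Q. The containment check is what narrows the result to D1 and D2, so its cost depends on how many candidates the term search lets through.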