I have a set of documents S in an index, where each document D has a text field D.text. Given a text query Q, I want to find the documents whose text is contained in Q.
An example:
- A set S with documents D1, D2 and D3, whose texts are "Stranger Things", "special effects are top-notch" and "entertaining and always keeps me on the edge", respectively.
- A text query Q: "I think 'Stranger Things' is one of the best shows on Netflix. The acting is superb, the plot is intriguing, and the special effects are top-notch."
- I need to find documents D1 and D2, because D1.text and D2.text both occur in Q.
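To make the expected behaviour concrete, here is a minimal sketch of the matching I am after, written as a naive in-memory scan. The function name `contained_documents` and the dict layout for S are only illustrative assumptions, and the case-insensitive comparison is a choice made for this sketch, not a requirement stated above.

```python
def contained_documents(query, documents):
    """Return the ids of documents whose text occurs verbatim in the query.

    The comparison is case-insensitive (an assumption of this sketch).
    """
    q = query.lower()
    return [doc_id for doc_id, text in documents.items() if text.lower() in q]


# The example set S and query Q from above.
S = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}
Q = (
    "I think 'Stranger Things' is one of the best shows on Netflix. "
    "The acting is superb, the plot is intriguing, and the special "
    "effects are top-notch."
)

matches = contained_documents(Q, S)
print(matches)  # ['D1', 'D2']
```

This scans every document for every query, so it only serves as a reference for the result I expect, not as something that would scale to the sizes described below.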
So far I have been using spaCy's PhraseMatcher class, which efficiently matches large terminology lists against a text. However, every time I have to build a large terminology list in memory (set S has more than 100,000 documents and can grow even larger) just to check around 100 texts for these matches. I get requests to do this a few times a second.
Is there any way to perform this query in Elasticsearch?
With the default mapping, Elasticsearch splits the texts of D1, D2 and D3 into individual terms (words separated by spaces) and indexes those terms. You can split your query Q into words and search for those words in your index; this returns only the documents that contain one or more of the searched terms.
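A rough sketch of that idea, under some assumptions: the field is called `text`, the request body uses the standard `match` query (which analyzes the query string into terms and, with `"operator": "or"`, matches documents containing at least one of them), and the hits are simulated here instead of coming from a live cluster. Because a match on any single term is enough, a document such as D3 may come back just for sharing common words like "and" or "the", so the sketch adds a client-side check that the document's words appear in Q as a contiguous run. The helper names `build_match_query`, `tokens` and `is_contained` are hypothetical, and the regex tokenizer is only an approximation of what the analyzer does.

```python
import re


def build_match_query(query_text, field="text"):
    """Request body for a search matching documents that share any term with the query."""
    return {"query": {"match": {field: {"query": query_text, "operator": "or"}}}}


def tokens(text):
    """Lowercase word tokens; a rough stand-in for the index analyzer."""
    return re.findall(r"\w+", text.lower())


def is_contained(doc_text, query_text):
    """True if the document's tokens occur as a contiguous run in the query's tokens."""
    d, q = tokens(doc_text), tokens(query_text)
    if not d:
        return False
    return any(q[i:i + len(d)] == d for i in range(len(q) - len(d) + 1))


Q = (
    "I think 'Stranger Things' is one of the best shows on Netflix. "
    "The acting is superb, the plot is intriguing, and the special "
    "effects are top-notch."
)

# Body you would send to the index's _search endpoint.
body = build_match_query(Q)

# Simulated candidates (assumption: all three documents share a term with Q).
candidates = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}

# Keep only candidates whose whole text really occurs in Q.
kept = [doc_id for doc_id, text in candidates.items() if is_contained(text, Q)]
print(kept)
```

The post-filter only runs over the candidates returned by the search, so the full set S never has to be loaded into memory on the client.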