Efficiently match texts contained in a query text

Question


Asked by Andres Ferrero on 19 January 2023 at 19:07 · 70 views

I have a set of documents S in an index, where each document D has a text field D.text. Given a text query Q, I want to find the documents whose text is contained in (matches within) Q.

An example:

A set S contains documents D1, D2 and D3 with the texts "Stranger Things", "special effects are top-notch" and "entertaining and always keeps me on the edge", respectively.
A text query Q: "I think 'Stranger Things' is one of the best shows on Netflix. The acting is superb, the plot is intriguing, and the special effects are top-notch."
I need to find D1 and D2, because D1.text and D2.text both appear in Q.
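To make the requirement concrete, here is a minimal brute-force sketch in Python (chosen because the question already uses spaCy). It assumes a simple normalization (lowercase, keep word characters and hyphens, drop punctuation and quotes) and reads "contained" as "the document's tokens appear as a contiguous run in Q". The names `tokenize`, `contains_sequence` and `naive_matches` are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Assumed normalization: lowercase, keep runs of word characters and
    # hyphens (so "top-notch" stays one token), drop punctuation and quotes.
    return re.findall(r"[\w-]+", text.lower())


def contains_sequence(haystack: list[str], needle: list[str]) -> bool:
    # True if `needle` occurs as a contiguous run of tokens in `haystack`.
    n = len(needle)
    return n > 0 and any(haystack[i:i + n] == needle
                         for i in range(len(haystack) - n + 1))


def naive_matches(docs: dict[str, str], query: str) -> list[str]:
    # Brute force: scan the query once per document, O(|S| * |Q|) work.
    q_tokens = tokenize(query)
    return [doc_id for doc_id, text in docs.items()
            if contains_sequence(q_tokens, tokenize(text))]


S = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}
Q = ("I think 'Stranger Things' is one of the best shows on Netflix. The acting "
     "is superb, the plot is intriguing, and the special effects are top-notch.")

print(naive_matches(S, Q))  # ['D1', 'D2']
```

Because every document is checked against every query, this cost grows linearly with the size of S, which is what makes it impractical at more than 100,000 documents and several requests per second.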

So far I have been using spaCy's PhraseMatcher class, which efficiently matches large terminology lists against a text. However, every time I have to build a large terminology list in memory (S contains more than 100,000 documents and can grow even larger) just to query around 100 texts for matches. I receive requests to do this a few times per second.
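One way to avoid rebuilding a matcher for every request is to build a phrase index over S once and reuse it. The sketch below is not spaCy's PhraseMatcher; it is a swapped-in technique: a hash table keyed by each document's token sequence, probed with every n-gram of Q up to the longest indexed phrase. `PhraseIndex` and `tokenize` are hypothetical names, and the tokenization rule is an assumption.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> tuple[str, ...]:
    # Assumed normalization: lowercase, keep word characters and hyphens,
    # drop punctuation and quotes.
    return tuple(re.findall(r"[\w-]+", text.lower()))


class PhraseIndex:
    """Hypothetical helper mapping each document's token sequence to its IDs.

    Built once; a query then needs at most len(query) * max_len dictionary
    lookups, regardless of how many documents are indexed.
    """

    def __init__(self, docs: dict[str, str]) -> None:
        self.phrases: dict[tuple[str, ...], list[str]] = defaultdict(list)
        self.max_len = 0
        for doc_id, text in docs.items():
            tokens = tokenize(text)
            if tokens:
                self.phrases[tokens].append(doc_id)
                self.max_len = max(self.max_len, len(tokens))

    def match(self, query: str) -> list[str]:
        # Probe every token n-gram of the query, up to the longest indexed
        # phrase, and collect matching document IDs in order of appearance.
        q = tokenize(query)
        found: list[str] = []
        seen: set[str] = set()
        for start in range(len(q)):
            for n in range(1, min(self.max_len, len(q) - start) + 1):
                for doc_id in self.phrases.get(q[start:start + n], ()):
                    if doc_id not in seen:
                        seen.add(doc_id)
                        found.append(doc_id)
        return found


S = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}
Q = ("I think 'Stranger Things' is one of the best shows on Netflix. The acting "
     "is superb, the plot is intriguing, and the special effects are top-notch.")

index = PhraseIndex(S)  # build once at startup, reuse for every request
print(index.match(Q))   # ['D1', 'D2']
```

The per-query work depends on the length of Q and of the longest document text, not on the number of documents. The index still has to fit in memory, but it is built once instead of on every request.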

Is there any way to perform this query in Elasticsearch?


There is 1 answer

**Tom Elias** · Answer 1 · 2023-01-23T13:46:12+00:00

With the default mapping, Elasticsearch analyzes the text in D1, D2 and D3 into terms (roughly, lowercased words separated by spaces and punctuation) and indexes those terms. You can split the words in your query Q into terms the same way and search the index for them, which returns only the documents that contain one or more of the searched terms.
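The approach in this answer can be expressed as a `match` query on the `text` field, whose `operator` defaults to `or`. The sketch below builds that request body (it would be sent to `POST /<index>/_search`; the index name is left open) and, since no cluster is assumed, simulates OR term matching with a small in-memory inverted index. The regex analyzer is only a rough approximation of Elasticsearch's standard analyzer, and `analyze` and `or_term_search` are hypothetical names.

```python
import json
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    # Rough stand-in for the standard analyzer (assumption): lowercase and
    # split on non-word characters, so "top-notch" becomes "top", "notch".
    return re.findall(r"\w+", text.lower())


S = {
    "D1": "Stranger Things",
    "D2": "special effects are top-notch",
    "D3": "entertaining and always keeps me on the edge",
}
Q = ("I think 'Stranger Things' is one of the best shows on Netflix. The acting "
     "is superb, the plot is intriguing, and the special effects are top-notch.")

# Request body for a match query on the `text` field. With operator "or"
# (the default), Q is analyzed into terms and any document containing at
# least one of them is a hit. Sent as: POST /<index>/_search
body = {"query": {"match": {"text": {"query": Q, "operator": "or"}}}}
print(json.dumps(body, indent=2))

# In-memory simulation (assumption: no cluster available).
# Inverted index: term -> IDs of the documents containing that term.
inverted: dict[str, set[str]] = defaultdict(set)
for doc_id, text in S.items():
    for term in analyze(text):
        inverted[term].add(doc_id)


def or_term_search(query: str) -> set[str]:
    # OR semantics: union of the posting lists of all query terms.
    hits: set[str] = set()
    for term in set(analyze(query)):
        hits |= inverted.get(term, set())
    return hits


print(sorted(or_term_search(Q)))  # ['D1', 'D2', 'D3']
```

D3 is returned as well, because it shares common words such as "the", "and" and "on" with Q. That is the "one or more of the searched terms" behavior, so the hits are candidates that would still need a containment check to confirm the whole D.text appears in Q.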
