Retrieving terms for a document in PyLucene


I'm using NGramTokenFilter to process text and store it in a PyLucene index. When I search for a document with IndexSearcher, is it possible to get the list of n-grams that represent the document it found? Or should I just rerun the analyzer, like this:

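import lucene
from java.io import StringReader
from org.apache.lucene.analysis.tokenattributes import CharTermAttribute
lucene.initVM()  # start the JVM once per process
analyzer = myAnalyzer()  # my custom Analyzer that applies NGramTokenFilter
stream = analyzer.tokenStream("indexed_field", StringReader("FACEBK ADS"))
term_att = stream.addAttribute(CharTermAttribute.class_)
stream.reset()  # required before the first incrementToken()
tokens = []
while stream.incrementToken():
    tokens.append(term_att.toString())
stream.end()
stream.close()
print(tokens)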
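To sanity-check what that loop prints, here is a pure-Python approximation of the grams an NGramTokenFilter emits. It assumes a hypothetical analyzer that splits on whitespace, does not lowercase, and uses minGram=2 and maxGram=3; the real `myAnalyzer` may differ in all three. The ordering (by start position, then by gram length) follows recent Lucene versions of the filter, but treat the whole thing as an approximation, not Lucene's implementation.

```python
def ngram_filter(token, min_gram, max_gram):
    """Approximate NGramTokenFilter: every gram of length min_gram..max_gram,
    ordered by start position, then by length."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # longer grams at this start would run past the token
            grams.append(token[start:start + size])
    return grams


def expected_tokens(text, min_gram=2, max_gram=3):
    """Hypothetical stand-in for myAnalyzer: whitespace split, then n-grams.
    The real analyzer's tokenizer, lowercasing and gram sizes may differ."""
    tokens = []
    for word in text.split():
        tokens.extend(ngram_filter(word, min_gram, max_gram))
    return tokens


print(expected_tokens("FACEBK ADS"))
# ['FA', 'FAC', 'AC', 'ACE', 'CE', 'CEB', 'EB', 'EBK', 'BK', 'AD', 'ADS', 'DS']
```

Words shorter than minGram produce no grams at all in this sketch, which matches the filter's default behaviour when the original token is not preserved.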

I use the following FieldType settings when building the indexed field:

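from org.apache.lucene.document import FieldType
from org.apache.lucene.index import IndexOptions

field = FieldType()
field.setStored(True)
field.setTokenized(True)
field.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS)
# Writing field.setStoreTermVectors = True raised an error: setStoreTermVectors
# is a Java method, so it has to be called rather than assigned to.
field.setStoreTermVectors(True)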

When searching:

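from org.apache.lucene.index import DirectoryReader
from org.apache.lucene.queryparser.classic import QueryParser
from org.apache.lucene.search import IndexSearcher

reader = DirectoryReader.open(directory)  # directory: the Directory holding the index
searcher = IndexSearcher(reader)
analyzer = myAnalyzer()

query = QueryParser("indexed_field", analyzer).parse("test")
scoreDocs = searcher.search(query, 1).scoreDocs  # top hit only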
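If term vectors are enabled on the field, the indexed n-grams of each hit can be read back without re-analyzing. A minimal sketch, assuming term vectors are stored for `indexed_field` and that the Lucene version behind your PyLucene build still exposes `IndexReader.getTermVector(docId, field)`. The helper name `term_vector_terms` is hypothetical; it only uses `Terms.iterator()`, `TermsEnum.next()` and `BytesRef.utf8ToString()`, so the function itself needs no PyLucene import.

```python
def term_vector_terms(terms):
    """Return the terms stored in one document's term vector.

    `terms` is the Terms object returned by, e.g.,
    reader.getTermVector(doc_id, "indexed_field"); it is None when the
    field was indexed without term vectors.
    """
    if terms is None:
        return []
    collected = []
    terms_enum = terms.iterator()  # TermsEnum over this document's terms
    term = terms_enum.next()       # BytesRef, or None once exhausted
    while term is not None:
        collected.append(term.utf8ToString())
        term = terms_enum.next()
    return collected


# Usage with the search above (needs PyLucene, a running JVM, and an index
# built with term vectors enabled):
#
# for score_doc in scoreDocs:
#     terms = reader.getTermVector(score_doc.doc, "indexed_field")
#     print(term_vector_terms(terms))
```

A term vector lists each distinct gram once, in sorted term order, not in the order the analyzer emitted them. Because the field is also stored, rerunning the analyzer remains an option: load the stored text (for example with `searcher.doc(score_doc.doc).get("indexed_field")`) and feed it through the token-stream loop, which avoids the extra index space term vectors take. Some of these reader and searcher methods are deprecated in recent Lucene releases, so check the version your PyLucene build wraps.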