I have a custom analyzer that splits keywords into n-grams:
class Custom_Analyzer(PythonAnalyzer):
    def createComponents(self, fieldName):
        source = LetterTokenizer()
        filter = ASCIIFoldingFilter(source)
        filter = LowerCaseFilter(source)
        filter = StopFilter(filter, StopFilter.makeStopSet(['and','or'], True))
        filter = NGramTokenFilter(filter, 5, 5, True)
        return self.TokenStreamComponents(source, filter)

    def initReader(self, fieldName, reader):
        return reader
If the query text contains an uppercase AND or OR, parsing fails:
QueryParser("field", Custom_Analyzer()).parse("SOMETHING AND")
QueryParser("field", Custom_Analyzer()).parse("SOMETHING OR")
I know AND is a keyword in Lucene for boolean search, but for some reason the stop filter isn't removing it. If the query text is lowercase, QueryParser succeeds.
I know I can strip AND and OR from the text before it reaches the custom analyzer, but I feel this should be handled within PyLucene. What am I doing wrong?
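For reference, the pre-processing workaround I have in mind would look roughly like the sketch below. It is plain Python with no PyLucene dependency; the helper name `strip_boolean_keywords` is my own invention, and it assumes the query terms are separated by whitespace.

```python
import re

# Hypothetical helper (not part of PyLucene): matches standalone uppercase
# AND / OR tokens, which the query syntax treats as boolean operators.
_BOOLEAN_KEYWORDS = re.compile(r"\b(?:AND|OR)\b")


def strip_boolean_keywords(text):
    """Remove standalone uppercase AND/OR and collapse leftover whitespace."""
    without_keywords = _BOOLEAN_KEYWORDS.sub(" ", text)
    return " ".join(without_keywords.split())


print(strip_boolean_keywords("SOMETHING AND"))
print(strip_boolean_keywords("SOMETHING OR"))
```

The cleaned string would then go to `QueryParser("field", Custom_Analyzer()).parse(...)` as before. Words that merely contain the letters (such as ANDROID) are left alone because the pattern only matches whole words.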