Is there a way to index a field so that each leading sequence of words in its value is treated as a separate token?
For example, input: "hello world, how are you?"
output: "hello world how are you", "hello world how are", "hello world how", "hello world", "hello"
This would be used together with SuggestComponent to provide autosuggestions for users.
In principle, something like solr.ShingleFilterFactory could do the trick. Its two main parameters are minShingleSize and maxShingleSize. It will generate a lot of tokens, though, and some of them will not be useful to you (which also means a lot of wasted disk space). You would need either to filter out the unneeded tokens or to write your own filter.
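To make the token blow-up concrete, here is a rough Python sketch of the idea. It is not Solr code and does not reproduce ShingleFilterFactory exactly: the helper names (tokenize, shingles, prefix_shingles) are made up for illustration, and the tokenizer is assumed to simply lowercase and split on non-word characters. It compares all word shingles of sizes 1 to 5 with the prefix-only tokens asked for in the question.

```python
import re


def tokenize(text):
    # Assumption: lowercase and split on non-word characters,
    # roughly what a standard tokenizer plus a lowercase filter would do.
    return re.findall(r"\w+", text.lower())


def shingles(tokens, min_size, max_size):
    # Every run of min_size..max_size adjacent words, starting at every position.
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def prefix_shingles(tokens):
    # Only the runs that start at the first word, longest first.
    return [" ".join(tokens[:n]) for n in range(len(tokens), 0, -1)]


tokens = tokenize("hello world, how are you?")
all_shingles = shingles(tokens, 1, 5)
wanted = prefix_shingles(tokens)

print(len(all_shingles), "shingles in total")
print(len(wanted), "prefix tokens wanted:", wanted)
```

For this five-word input, the prefix tokens are only a third of the shingles, and the ratio gets worse as the field gets longer. That is the part a custom filter (or a post-filtering step) would have to remove.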