We're running Solr 3.4 and have a relatively small index of about 90,000 documents. These documents are split over several logical sources, so every search applies a filter query for a particular source, e.g.:
?q=<query>&fq=source:<source>
where source is a plain string field. We're using edismax, with text as the default search field.
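A minimal sketch of the two requests being compared, built with Python's standard library. The endpoint URL and the source value "news" are hypothetical placeholders, and passing text via qf is an assumption about how the default search field is configured; debugQuery=on makes Solr report the parsed query.

```python
from urllib.parse import urlencode

# Hypothetical Solr 3.x select endpoint; substitute your own host, port and path.
SOLR_SELECT = "http://localhost:8983/solr/select"

def build_search_url(query: str, source: str, debug: bool = False) -> str:
    """Build an edismax search URL restricted to one source via a filter query."""
    params = {
        "q": query,
        "fq": f"source:{source}",  # filter query, cached independently of q
        "defType": "edismax",
        "qf": "text",              # assumed: 'text' is the field searched by default
    }
    if debug:
        params["debugQuery"] = "on"  # include the parsed query in the response
    return SOLR_SELECT + "?" + urlencode(params)

# The fast form (parsed to MatchAllDocsQuery) and the slow form (wildcard on text).
print(build_search_url("*:*", "news", debug=True))
print(build_search_url("*", "news", debug=True))
```

urlencode percent-encodes * and :, which Solr decodes on receipt, so both URLs can be pasted into a browser or fetched with any HTTP client.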
We are currently seeing q=* take, on average, 20 times longer to run than q=*:*. The difference is quite noticeable: *:* takes about 100ms, while * takes up to 3500ms. A search for a common word in the document set (matching nearly 50% of all documents) returns a result in less than 200ms.
Looking at the queries with debugQuery on, we can see that * is parsed to a DisjunctionMaxQuery((text:*)), while *:* is parsed to a MatchAllDocsQuery(*:*). This makes sense, but I still don't feel it accounts for a slowdown of this magnitude (a 20-fold slowdown, compared with a query that already matches 50% of the documents).
What could be causing this? Is there anything we can tweak?
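When you pass just q=*, the query parser turns it into a wildcard query on the text field, so Lucene has to enumerate every term indexed in that field and match each one against *. For a tokenized full-text field that is a lot of work. When you pass q=*:*, however, you are asking Solr to give you every document and skip term matching entirely: it becomes a MatchAllDocsQuery, and Solr/Lucene is optimized to run *:* fast and efficiently. So if you want all documents from a source, use q=*:* together with your fq.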
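To make the cost difference concrete, here is a toy model in Python. It is a deliberate simplification, not Lucene's actual implementation: the index is a plain dict from term to a set of document ids, and the step counters are illustrative. The point it shows is that a field-wide wildcard does work proportional to the number of unique terms and their postings, while match-all does work proportional only to the number of documents.

```python
from typing import Dict, List, Set, Tuple

# Toy inverted index: term -> set of doc ids (a stand-in for Lucene's term dictionary + postings).
Index = Dict[str, Set[int]]

def build_index(docs: List[str]) -> Index:
    """Tokenize on whitespace and record which documents contain each term."""
    index: Index = {}
    for doc_id, text in enumerate(docs):
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def wildcard_all_terms(index: Index) -> Tuple[Set[int], int]:
    """Roughly what text:* must do: visit every term and union its postings."""
    matched: Set[int] = set()
    steps = 0
    for postings in index.values():   # one pass per unique term in the field
        for doc in postings:          # plus one step per (term, doc) pair
            matched.add(doc)
            steps += 1
    return matched, steps

def match_all(num_docs: int, deleted: Set[int]) -> Tuple[Set[int], int]:
    """Roughly what *:* does: walk document ids directly, never touching terms."""
    matched: Set[int] = set()
    steps = 0
    for doc in range(num_docs):
        steps += 1
        if doc not in deleted:
            matched.add(doc)
    return matched, steps

docs = ["solr is fast", "lucene is fast", "wildcards are slow"]
index = build_index(docs)
print(wildcard_all_terms(index))     # ({0, 1, 2}, 9)
print(match_all(len(docs), set()))   # ({0, 1, 2}, 3)
```

In this toy example both approaches return the same three documents, but the wildcard path takes 9 steps against 3. In a real full-text field with many unique terms per document, that gap grows much larger, which is consistent with the slowdown described in the question.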