Background
This question is too broad probably, but the context here is, i have a couchbase 7.1 enterprise server, and i am running with 3 index node and 3 data node setup.
I am having secondary index on my collection and not having any primary index.
I have written a query with window function row_number to partition and order the documents inside collection
- all the where clause entries (columns) in my query is part of my secondary index.
- my partition by clause (column on which i am partitioning) is also part of my secondary index.
still the query execution is taking more time around 4-5 seconds. I am having around 200k documents on the collection on which i am querying.
Query : (Please note this is anonymised query to give perspective and not actual query)
WITH data_from_index AS (
SELECT *, ROW_NUMBER(PARTITION BY col1 ORDER BY col2) as num
FROM regions
WHERE country = "usa" AND state="ny" AND language="spanish"
),
ranking AS (
SELECT *
FROM data_from_index
WHERE num=1
),
sorted AS (
SELECT * FROM ranking
ORDER BY col3
OFFSET 0 LIMIT 500
)
SELECT * FROM sorted;
In above query analysis what i found is the first clause which reads data from collection itself is taking 95% of the time, and it is able to use the index which we have, but it is selecting around 100k records out of my total 200k records, and then in subsequent clauses, it is doing sorting and paging on those 100k.
Question
What are the other things which I should keep in mind to optimize query performance in couchbase, apart from things mentioned in background section, and if I am missing something fundamental here, please point out.
Window functions will not reduce the any final result. PARTITION BY, ORDER BY required sort. Which depends on the result set size. Best way diagnosis is: