Optimize database for top-k range query

23 views Asked by twinsmaj At 28 December 2023 at 12:39

Given the database table below:

CREATE TABLE orders (
    orderId UUID PRIMARY KEY,
    beginTime TIMESTAMP,
    finishTime TIMESTAMP,
    orderBlob BLOB
);

How do I shard my database given the SELECT query below. It's basically a top-k range query. See below:

select * from orders AS o
where o.beginTime >= startTime and o.finishTime <= endTime
order by (c.finishTime - c.beginTime) DESC
LIMIT K

Talking points:

Since this is a range query, is it okay to define a clustered index on beginTime given that this will pre-sort the table rows? By the way, this is a write-heavy application, roughly 10 million writes/day. could it impact write speed?
Conversely, is it efficient to include a duration column(beginTime - endTime) in the orders table and define an index on it? My thought is that, with this approach, one could update the ORDER BY clause to order by duration. In this case I could shard through consistent hash on the orderId

TechQA.