I am currently working on implementing Vector Similarity Search (VSS) queries over vector embeddings in Redis. While I understand how to perform queries in a simple scenario with a single set of vectors, I am struggling to adapt this approach to more complex requirements. Here's an overview of what I'm trying to achieve:
I will have a large number of contexts, each containing approximately 50 sub-contexts. Each sub-context will consist of a few hundred embeddings. My goal is to perform queries within a specific context and have the ability to apply filters that include or exclude certain sub-contexts.
Most of the examples I've come across involve setting up a hashset of embeddings and passing it to a Redisearch query, which searches over all the vectors. However, I'm unsure how to proceed. I have a few ideas about which direction to take, but I don't know whether they are possible or optimal. These are the two ideas I can think of:
Multiple Hashsets in a Single VSS Query:
Is it possible to provide multiple hashsets to a single VSS query in Redisearch? Ideally, I would like to retrieve the desired sub-context hashsets from Redis and include them in a single query. Is this feasible?
Applying Filters to Hashset Queries:
Is there a way to apply filters to a query on a hashset in Redisearch? If I have all the embeddings in a single hashset, indexed by sub-context ID, is there a mechanism to filter the query results based on the sub-contexts?
Please let me know if any further clarification or details are needed, thanks!
I would treat the "context" as a prefix on the keys for your hashes. Say you have ctx1, ctx2 and ctx3. I would create my hash keys as something like something:ctx:1:xxx, where xxx is the actual primary key of the hash and the number after ctx: is the context. Then I would create one index per context, using the prefix in the index creation, so you would have indices over something:ctx:1, something:ctx:2 and something:ctx:3, which you could search individually. If there is a context hierarchy, say something like something:ctx:1:2:xxx, I would decide based on usage whether to create individual indices for all sub-contexts (something:ctx:1:1 and something:ctx:1:2..N) or just one for the parent/encompassing context. Read about ephemeral indices for some ideas: https://redis.com/blog/the-case-for-ephemeral-search/
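To make the prefix idea concrete, here is a minimal Python sketch that only builds the raw command arguments, so it runs without a Redis server. The index name idx:ctx:N, the field names subctx and vec, the FLAT/COSINE/FLOAT32 settings and the helper names are all assumptions for illustration, not anything from your setup. The TAG filter on subctx is an addition on my part (it is the usual way to include or exclude sub-contexts inside one index), and it assumes sub-context IDs are plain alphanumeric values that need no escaping.

```python
import struct


def make_key(ctx, subctx, pk):
    """Key layout from the answer: something:ctx:<context>:<sub-context>:<pk>."""
    return f"something:ctx:{ctx}:{subctx}:{pk}"


def create_index_args(ctx, dim):
    """FT.CREATE arguments for one index that covers a single context prefix.

    The trailing ':' in the prefix keeps ctx 1 from also matching ctx 10.
    Field names and vector settings are assumptions for this sketch.
    """
    return [
        "FT.CREATE", f"idx:ctx:{ctx}",
        "ON", "HASH",
        "PREFIX", "1", f"something:ctx:{ctx}:",
        "SCHEMA",
        "subctx", "TAG",
        "vec", "VECTOR", "FLAT", "6",
        "TYPE", "FLOAT32", "DIM", str(dim), "DISTANCE_METRIC", "COSINE",
    ]


def knn_query_args(ctx, query_vec, k=10, include=(), exclude=()):
    """FT.SEARCH arguments for a KNN query, optionally filtered by sub-context."""
    parts = []
    if include:
        parts.append("@subctx:{" + "|".join(include) + "}")
    if exclude:
        parts.append("-@subctx:{" + "|".join(exclude) + "}")
    # '*' means "no pre-filter": search every vector in this context's index.
    prefilter = "(" + " ".join(parts) + ")" if parts else "*"
    # Vectors are passed as little-endian float32 bytes.
    blob = struct.pack(f"<{len(query_vec)}f", *query_vec)
    return [
        "FT.SEARCH", f"idx:ctx:{ctx}",
        f"{prefilter}=>[KNN {k} @vec $BLOB AS score]",
        "PARAMS", "2", "BLOB", blob,
        "SORTBY", "score",
        "DIALECT", "2",
    ]
```

With redis-py you could send either list with something like r.execute_command(*create_index_args(1, 768)), assuming each hash stores its sub-context ID in subctx and the embedding bytes in vec. Whether one index per context with a TAG filter beats one index per sub-context depends on how often you query across sub-contexts, so treat this as a starting point rather than the definitive layout.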