Distributed System: ML Queue


I want to design a distributed queue for a machine learning inference system (running on Kubernetes). The problem is the following:

Each request contains a model_id that the request should be executed with (there are many models). So when a consumer receives a queue message, it has to download the model file from blob storage (possibly caching it locally), load it onto the GPU, and then execute it on a given data point (also included in the request).

This download and GPU load takes some time. Therefore, it would be really beneficial if the next queue element the worker receives also contains a request with the same, already-loaded model_id.
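To make the "maybe cache it locally" part concrete, here is a minimal sketch of a worker-side LRU cache that keeps the most recently used models loaded and only invokes the expensive download/GPU-load path on a miss. The `loader` callable and the capacity are assumptions for illustration; a real worker would pass in its blob-download-and-load routine.

```python
from collections import OrderedDict

class ModelCache:
    """Keeps the N most recently used models loaded; evicts the
    least recently used one when capacity is exceeded."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._cache = OrderedDict()  # model_id -> loaded model

    def get(self, model_id, loader):
        if model_id in self._cache:
            # Cache hit: mark as most recently used, skip the load.
            self._cache.move_to_end(model_id)
            return self._cache[model_id]
        # Cache miss: expensive path (blob download + GPU load).
        model = loader(model_id)
        self._cache[model_id] = model
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict LRU entry
        return model
```

With a cache like this, the scheduling question in the rest of the post reduces to: how often can the queue hand a worker a request whose model is already a cache hit?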

Is this possible with any of the existing queue systems, such as RabbitMQ or Redis?

Furthermore, it would be good if each model_id had a roughly equal chance of being served, independent of the number of pending requests per model_id.

I tried to build this with Redis, but I don't see any mechanism to tell the queue to filter the messages and give me one with a specified model_id if it exists, or any other random model_id otherwise.
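One way to express the desired behavior (prefer the worker's currently loaded model, otherwise round-robin fairly across models) is to keep one queue per model_id. The sketch below is a pure in-memory stand-in, not Redis code; with Redis you could approximate the same idea using one list per model_id (LPOP your own model's list first, then try the others), but the class names and structure here are illustrative assumptions.

```python
from collections import deque

class AffinityQueue:
    """Per-model queues with worker affinity and round-robin fairness."""

    def __init__(self):
        self.queues = {}        # model_id -> deque of requests
        self.rotation = deque() # round-robin order of model_ids

    def put(self, model_id, request):
        if model_id not in self.queues:
            self.queues[model_id] = deque()
            self.rotation.append(model_id)
        self.queues[model_id].append(request)

    def get(self, preferred_model_id=None):
        # First, prefer the model the worker already has loaded.
        q = self.queues.get(preferred_model_id)
        if q:
            return preferred_model_id, q.popleft()
        # Otherwise, round-robin over model_ids so every model gets
        # an equal chance regardless of its backlog size.
        for _ in range(len(self.rotation)):
            model_id = self.rotation[0]
            self.rotation.rotate(-1)
            q = self.queues.get(model_id)
            if q:
                return model_id, q.popleft()
        return None, None  # queue is empty
```

A worker would call `get(my_loaded_model_id)` in a loop and update its loaded model whenever the fallback path hands it a different one.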


1 Answer

Answered by wpnpeiris

Do you mean a distributed queue? It sounds like you have multiple consumers (workers) on one queue and want to make sure that each consumer (worker) gets messages with the same model_id.

This can be achieved with Kafka, where the queue (topic) is partitioned and the partitions are assigned across the consumers (workers). Messages with the same key end up in the same partition, and hence are processed by the same consumer. In your case, the message key would be model_id.
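The routing described above can be sketched as follows. This is a simplified stand-in for Kafka's default partitioner (which actually hashes the key bytes with murmur2 modulo the partition count); the hash function here is only illustrative, but the property it demonstrates is the one that matters: the same model_id always maps to the same partition, and therefore to the same consumer in the group.

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Deterministically map a message key to a partition.

    Simplified stand-in for Kafka's default partitioner, which
    uses murmur2 over the key bytes mod the partition count.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# Every request keyed by the same model_id lands in the same
# partition, so one consumer sees all traffic for that model:
p1 = partition_for("model-42", 8)
p2 = partition_for("model-42", 8)
assert p1 == p2
```

One caveat worth noting: with static key-to-partition routing, a model with far more traffic than others will make its partition (and its consumer) a hotspot, so this gives affinity but not the equal-chance fairness asked about in the question.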