How to combine @singledispatch and @lru_cache?


I have a Python single-dispatch generic function like this:

from functools import lru_cache, singledispatch

@singledispatch
def cluster(documents, n_clusters=8, min_docs=None, depth=2):
  ...

It is overloaded like this:

@cluster.register(QuerySet)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
  ...

The second one basically preprocesses a QuerySet object and then calls the generic cluster() function. A QuerySet is a Django object, but that should not matter here, apart from the fact that it is hashable and thus usable with lru_cache.

The generic function cannot be cached, though, because it accepts unhashable objects such as lists as arguments. The overloading function can be cached, however, because a QuerySet object is hashable; that is why I've added the @lru_cache decorator.
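To illustrate the hashability constraint, here is a minimal sketch (plain Python, no Django): lru_cache builds its cache key by hashing the arguments, so an unhashable argument such as a list fails immediately.

from functools import lru_cache

@lru_cache(maxsize=512)
def cached(x):
    return x

cached((1, 2, 3))   # tuples are hashable, so this call can be cached
cached([1, 2, 3])   # raises TypeError: unhashable type: 'list'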

However, caching does not seem to be applied:

qs: QuerySet = [...]

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.629259

I would expect the same call to complete in an instant the second time, but:

start = datetime.now(); cluster(Document.objects.all()); print(datetime.now() - start)               
0:00:02.468675

This is confirmed by the cache statistics:

cluster.registry[django.db.models.query.QuerySet].cache_info()
CacheInfo(hits=0, misses=2, maxsize=512, currsize=2)

Changing the order of the @lru_cache and @cluster.register decorators does not seem to make a difference.

This question is similar, but its answer does not apply at the level of an individual function.

Is it even possible to combine these two decorators at this level? If so, how?

1 Answer

Answered by Nizam Mohamed (accepted):

hash(Document.objects.all()) == hash(Document.objects.all()) is False for a Django QuerySet: QuerySet defines neither __eq__ nor __hash__, so hashing falls back to object identity, and every call to Document.objects.all() returns a new QuerySet instance. Since lru_cache keys its cache on the arguments' hashes and equality, every call is therefore a cache miss.

The call Document.objects.all() doesn't hit the database until the QuerySet returned is evaluated.
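You can reproduce the cache misses without Django at all; a minimal sketch with a hypothetical stand-in class that, like QuerySet, relies on default identity hashing:

from functools import lru_cache

class FakeQuerySet:
    # like QuerySet, this defines neither __eq__ nor __hash__,
    # so hashing falls back to object identity
    pass

@lru_cache(maxsize=512)
def expensive(qs):
    return qs

expensive(FakeQuerySet())
expensive(FakeQuerySet())        # a brand-new object with a different hash -> another miss
print(expensive.cache_info())    # CacheInfo(hits=0, misses=2, maxsize=512, currsize=2)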

As the Django docs on pickling QuerySets put it, "Pickling is usually used as a precursor to caching."

Depending on your use case, you can try caching the pickle of the QuerySet, or of its query attribute.

import pickle

@cluster.register(bytes)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
    documents = pickle.loads(documents)   # restore the pickled object from the bytes key
    ...

cluster(pickle.dumps(Document.objects.all()))

or

cluster(pickle.dumps(Document.objects.all().query))
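If you pass the pickled query attribute instead of the pickled QuerySet, the bytes overload has to rebuild a QuerySet before delegating. A rough alternative sketch, assuming the Document model and the cluster() generic from the question (the qs.query = ... restore step is the one shown in the Django docs):

import pickle
from functools import lru_cache

@cluster.register(bytes)
@lru_cache(maxsize=512)
def _(documents, *args, **kwargs):
    query = pickle.loads(documents)    # a django.db.models.sql.Query object
    qs = Document.objects.all()
    qs.query = query                   # reattach the restored query to a fresh QuerySet
    return cluster(list(qs), *args, **kwargs)

cluster(pickle.dumps(Document.objects.all().query))

Note the trade-off: pickling the whole QuerySet forces all of its results to be loaded into memory at dumps() time, while pickling only query defers the database hit until the rebuilt QuerySet is evaluated inside the overload.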