BERTopic guided topic modelling raises a ValueError (inhomogeneous shape)


I am trying to train a BERTopic model with a seed topic list. However, fitting the model raises a ValueError:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

I am working with Python 3.10.5 and NumPy 1.24.3.

The same error occurs when I run the official tutorial example, so I assume it is caused by a change in one of the underlying libraries.

The example below reproduces it:

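from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load the raw 20 Newsgroups posts without headers, footers, and quoted replies.
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))["data"]

# Each inner list holds the seed words for one guided topic.
seed_topic_list = [["drug", "cancer", "drugs", "doctor"],
                   ["windows", "drive", "dos", "file"],
                   ["space", "launch", "orbit", "lunar"]]

topic_model = BERTopic(seed_topic_list=seed_topic_list, verbose=True, calculate_probabilities=False)
# fit_transform returns a (topics, probabilities) tuple; the ValueError is raised here.
topics, probs = topic_model.fit_transform(docs)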

Thanks a lot for the ideas!

There are 2 answers

Horia Yeb

I have been having the exact same problem. This is the first time I'm using this library, so I don't know whether this worked in previous versions.

I'll take a look by downgrading to the version just before this one.

Found the problem: there seems to be a compatibility issue with the latest NumPy package. I downgraded to 1.21.0 and the tutorial example works fine.

If this works for you, select this as the solution so the thread can be closed.
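
For context, the message in the question matches a behaviour change in NumPy 1.24: building an array from nested sequences of unequal length now raises a ValueError unless dtype=object is passed, where earlier releases only warned. The sketch below reproduces that with made-up rows. It is an assumption that BERTopic's guided path trips over this kind of array creation; `ragged_rows` and `ragged_creation_fails` are illustrative names and not part of BERTopic.

```python
import warnings

import numpy as np

# Two rows of different lengths: a "ragged" nested list (made-up values).
ragged_rows = [[0.1, 0.2, 0.3], [0.4, 0.5]]


def ragged_creation_fails(rows):
    """Return True if NumPy refuses to build an array from ragged rows."""
    with warnings.catch_warnings():
        # Older NumPy releases only warn here; silence that for the demo.
        warnings.simplefilter("ignore")
        try:
            np.array(rows)
        except ValueError:
            return True
    return False


print("NumPy version:", np.__version__)
print("Ragged creation raises ValueError:", ragged_creation_fails(ragged_rows))

# Requesting an object array explicitly is accepted on old and new releases.
as_objects = np.array(ragged_rows, dtype=object)
print("Object array shape:", as_objects.shape)
```

If the second line prints True, the installed NumPy is one of the releases that rejects ragged input, which would be consistent with the downgrade making the tutorial work again.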

jellis92

I encountered the same issue on Python 3.11 and NumPy 1.25.0.

Downgrading to NumPy 1.23.5 fixed the issue for me. I couldn't downgrade to 1.21.0, as that version isn't compatible with Python 3.11.

(I would add this as a comment on Horia Yeb's answer, but I don't have the rep for it yet.)
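
To check whether an environment falls in the range reported in this thread, something like the following may help. It is a minimal sketch using only the standard library. The 1.24 threshold is an assumption drawn from the versions mentioned here (1.24.3 and 1.25.0 fail, 1.21.0 and 1.23.5 work), and `FIRST_AFFECTED`, `major_minor`, and `is_affected` are hypothetical helper names. It also assumes a plain numeric "major.minor" prefix in the version string.

```python
from importlib import metadata

# Assumption: 1.24 is treated as the first affected release, because the
# thread reports failures on 1.24.3 and 1.25.0 and success on 1.21.0 and 1.23.5.
FIRST_AFFECTED = (1, 24)


def major_minor(version):
    """Parse the leading 'major.minor' of a version string into a tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_affected(version):
    """Return True if this NumPy version is in the range reported to fail."""
    return major_minor(version) >= FIRST_AFFECTED


if __name__ == "__main__":
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        print("NumPy is not installed in this environment.")
    else:
        status = "affected" if is_affected(installed) else "not affected"
        print(f"NumPy {installed}: {status}")
```

If the check reports "affected", pinning NumPy to a release that worked for the posters above, for example with pip install "numpy==1.23.5", is the workaround described in this thread.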