StableLM answers too slowly on a GCP VM with GPU


I installed StableLM on a GCP VM with these specs:

1 x NVIDIA Tesla P4, 8 vCPUs, 30 GB memory.

I set the model parameter llm_int8_enable_fp32_cpu_offload=True, but it takes too long to answer questions: ~8 minutes. It was faster even when using the CPU alone, ~2 minutes. I downloaded the repository directly from the official GitHub link and I'm running the notebook there. What am I doing wrong? (I installed the NVIDIA driver and CUDA, and the code finds nvidia-smi.)
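For context, that flag belongs to the Transformers/bitsandbytes int8 loading path. Here is a minimal sketch of how it is typically passed; the model id and device_map are my assumptions, not taken from the notebook:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint; the StableLM repo's notebook uses a stablelm-tuned-alpha model.
model_id = "stabilityai/stablelm-tuned-alpha-7b"

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # Allows modules that don't fit on the GPU to run on the CPU in fp32.
    llm_int8_enable_fp32_cpu_offload=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate split layers across GPU and CPU
)
```

Note that a Tesla P4 has only 8 GB of VRAM, so with a 7B model most layers may end up offloaded to the CPU and run in fp32; combined with per-layer host-device transfers, that can be slower than a pure CPU run.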

Also, when I remove the llm_int8_enable_fp32_cpu_offload=True parameter, the code doesn't work at all. It throws an error (I upgraded to 16 vCPUs and 104 GB memory, but it still shows the same error): [error screenshot omitted]

1 Answer

Answered by Ray John Navarro

The resources used all seem fine; I recommend looking at the CPU type, as mentioned by @alvas.
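You can also check where the layers actually ended up. A quick diagnostic sketch, assuming the model was loaded with device_map="auto" as in the Transformers int8 workflow (model is the loaded model object):

```python
import torch

# Show which device each module was dispatched to; entries mapped to "cpu"
# run in fp32 off the GPU and will dominate generation latency.
print(model.hf_device_map)

# Confirm the GPU is actually visible to PyTorch.
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```

If most entries in the device map say "cpu", the generation time is being spent off the GPU, which would explain why the GPU run is slower than the CPU-only run.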

For reference, here is a link to a discussion of the StableLM system specs and some recommendations for optimal performance [1].

[1] https://github.com/Stability-AI/StableLM/issues/17