StableLM answers too slowly on a GCP VM with GPU


I installed StableLM on a GCP VM with these specs:

1 x NVIDIA Tesla P4, 8 vCPUs, 30 GB memory.

I set the model parameter llm_int8_enable_fp32_cpu_offload=True, but it takes too long to answer questions: ~8 minutes. It was faster even when using the CPU alone, ~2 minutes. I downloaded the repository directly from the official GitHub link and I'm running the notebook there. What am I doing wrong? (I installed the NVIDIA driver and CUDA, and the code finds nvidia-smi.)
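For context, that flag belongs to the Transformers/bitsandbytes int8 loading path. Here is a minimal sketch of how it is typically passed; the model id and device_map are my assumptions, not taken from the notebook:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed checkpoint; the StableLM repo's notebook uses a stablelm-tuned-alpha model.
model_id = "stabilityai/stablelm-tuned-alpha-7b"

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    # Allows modules that don't fit on the GPU to run on the CPU in fp32.
    llm_int8_enable_fp32_cpu_offload=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate split layers across GPU and CPU
)
```

Note that a Tesla P4 has only 8 GB of VRAM, so with a 7B model most layers may end up offloaded to the CPU and run in fp32; combined with per-layer host-device transfers, that can be slower than a pure CPU run.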

Also, when I remove the llm_int8_enable_fp32_cpu_offload=True parameter, the code doesn't work at all. It throws an error (I upgraded to 16 vCPUs and 104 GB memory, but it still shows the same error): [error screenshot omitted]

1 Answer

Answered by Ray John Navarro

The resources used all seem fine; I recommend looking at the CPU type, as mentioned by @alvas.
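You can also check where the layers actually ended up. A quick diagnostic sketch, assuming the model was loaded with device_map="auto" as in the Transformers int8 workflow (model is the loaded model object):

```python
import torch

# Show which device each module was dispatched to; entries mapped to "cpu"
# run in fp32 off the GPU and will dominate generation latency.
print(model.hf_device_map)

# Confirm the GPU is actually visible to PyTorch.
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```

If most entries in the device map say "cpu", the generation time is being spent off the GPU, which would explain why the GPU run is slower than the CPU-only run.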

For reference, here is a link to a discussion of the StableLM system specs and some recommendations for optimal performance [1].

[1] https://github.com/Stability-AI/StableLM/issues/17