CUDA out of memory error in a Python Mistral application


I have a Mistral and ChromaDB question-and-answer application hosted on an AWS EC2 g5.2xlarge instance. I used to kill the Python application without deleting the llm variable, so its CUDA memory was never deallocated. Even after rebooting the EC2 instance I am still facing the issue. I tried

    torch.cuda.empty_cache()
    gc.collect()
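
From what I understand, empty_cache() can only return memory that nothing references any more, so the teardown would have to look roughly like the sketch below (assuming the model is still bound to a variable such as hf_pipeline, as in the code further down):

    import gc
    import torch

    # Drop every Python reference to the model/pipeline first; PyTorch cannot
    # free CUDA memory while any live object still points at the weights.
    del hf_pipeline

    gc.collect()              # reclaim the now-unreferenced Python objects
    torch.cuda.empty_cache()  # hand PyTorch's cached blocks back to the driver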

but it did not help. When I try a hard reset in the terminal using nvidia-smi --gpu-reset

it gives me an "Insufficient Permissions" error. The following code shows how I instantiate my LLM:

    import torch
    from transformers import AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1")

    hf_pipeline = pipeline(
        task="text-generation",
        model="mistralai/Mistral-7B-Instruct-v0.1",
        tokenizer=tokenizer,
        trust_remote_code=True,
        max_new_tokens=1000,
        model_kwargs={
            "device_map": "auto",   # spread weights across available devices
            "load_in_4bit": True,   # 4-bit quantization via bitsandbytes
            "max_length": 512,
            "temperature": 0.01,
            "do_sample": True,
            "torch_dtype": torch.bfloat16,
        },
    )
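
One way to narrow this down, I think, is to compare what the running process has actually allocated against what nvidia-smi reports; if nvidia-smi shows the memory in use but the numbers below are near zero, some other (possibly orphaned) process is holding it:

    import torch

    # Memory owned by this process only; nvidia-smi, by contrast, reports
    # usage across all processes on the GPU.
    print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
    print(f"reserved:  {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")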

What is the solution for this CUDA out of memory error?
