I am running ehartford_dolphin-2.1-mistral-7b on an RTX A6000 machine on RunPod with the template TheBloke LLMs Text Generation WebUI.
I have two options: running the webui on RunPod, or running the HuggingFace Text Generation Inference template on RunPod.
Option 1. RunPod WebUI
I have successfully loaded the model in the textgen webui on RunPod and can use it on the Chat tab. I now want to access it from my Python code and run inference; ideally I would integrate it into LangChain and create a LangChain LLM object (a sketch of what I mean follows the list below).
- I enabled `openai` and `api` on the RunPod webui `Settings` tab
- I currently have ports `7860`, `5001` and `5000` enabled
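To make it concrete, this is roughly the kind of LangChain LLM object I am after: pointing LangChain's OpenAI wrapper at the webui's OpenAI-compatible endpoint on port 5001. This is an untested sketch and assumes `openai<1.0` and an older `langchain` release:

# Sketch only (untested): point LangChain's OpenAI wrapper at the webui's
# OpenAI-compatible endpoint exposed through the RunPod proxy on port 5001.
from langchain.llms import OpenAI

llm = OpenAI(
    openai_api_base="https://0ciol64iqvewdn-5001.proxy.runpod.net/v1",
    openai_api_key="NULL",  # the webui extension does not validate the key
    model_name="ehartford_dolphin-2.1-mistral-7b",  # webui serves whatever model is loaded
)
print(llm("Write a function to print numbers 1 to 10."))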
Using MemGPT with AutoGen
I found this Python code that uses MemGPT's AutoGen integration to access the webui endpoint:
import os
import autogen
import memgpt.autogen.memgpt_agent as memgpt_autogen
import memgpt.autogen.interface as autogen_interface
import memgpt.agent as agent
import memgpt.system as system
import memgpt.utils as utils
import memgpt.presets as presets
import memgpt.constants as constants
import memgpt.personas.personas as personas
import memgpt.humans.humans as humans
from memgpt.persistence_manager import InMemoryStateManager, InMemoryStateManagerWithPreloadedArchivalMemory, InMemoryStateManagerWithEmbeddings, InMemoryStateManagerWithFaiss
import openai
config_list = [
    {
        "api_type": "open_ai",
        "api_base": "https://0ciol64iqvewdn-5001.proxy.runpod.net/v1",
        "api_key": "NULL",
    },
]
llm_config = {"config_list": config_list, "seed": 42}
# If USE_MEMGPT is False, then this example will be the same as the official AutoGen repo
# (https://github.com/microsoft/autogen/blob/main/notebook/agentchat_groupchat.ipynb)
# If USE_MEMGPT is True, then we swap out the "coder" agent with a MemGPT agent
USE_MEMGPT = True
## api keys for the memGPT
openai.api_base="https://0ciol64iqvewdn-5001.proxy.runpod.net/v1"
openai.api_key="NULL"
# The user agent
user_proxy = autogen.UserProxyAgent(
    name="User_proxy",
    system_message="A human admin.",
    code_execution_config={"last_n_messages": 2, "work_dir": "groupchat"},
    human_input_mode="TERMINATE",  # needed?
    default_auto_reply="You are going to figure all out by your own. "
    "Work by yourself, the user won't reply until you output `TERMINATE` to end the conversation.",
)
interface = autogen_interface.AutoGenInterface()
persistence_manager=InMemoryStateManager()
persona = "I am a 10x engineer, trained in Python. I was the first engineer at Uber."
human = "Im a team manager at this company"
memgpt_agent=presets.use_preset(presets.DEFAULT_PRESET, model='gpt-4', persona=persona, human=human, interface=interface, persistence_manager=persistence_manager, agent_config=llm_config)
if not USE_MEMGPT:
    # In the AutoGen example, we create an AssistantAgent to play the role of the coder
    coder = autogen.AssistantAgent(
        name="Coder",
        llm_config=llm_config,
        system_message="I am a 10x engineer, trained in Python. I was the first engineer at Uber",
        human_input_mode="TERMINATE",
    )
else:
    # In our example, we swap this AutoGen agent with a MemGPT agent
    # This MemGPT agent will have all the benefits of MemGPT, i.e. persistent memory, etc.
    print("\nMemGPT Agent at work\n")
    coder = memgpt_autogen.MemGPTAgent(
        name="MemGPT_coder",
        agent=memgpt_agent,
    )
# Begin the group chat with a message from the user
user_proxy.initiate_chat(
    coder,
    message="Write a Function to print Numbers 1 to 10",
)
Error
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 10
      8 import memgpt.presets as presets
      9 import memgpt.constants as constants
---> 10 import memgpt.personas.personas as personas
     11 import memgpt.humans.humans as humans
     12 from memgpt.persistence_manager import InMemoryStateManager, InMemoryStateManagerWithPreloadedArchivalMemory, InMemoryStateManagerWithEmbeddings, InMemoryStateManagerWithFaiss

ModuleNotFoundError: No module named 'memgpt.personas.personas'
What I tried to solve this error
- `pip install --upgrade pymemgpt` -- does not change the error
- `pip install pymemgpt==0.1.3` -- I get `openai` version conflicts
- `pip install -e .` after cloning the MemGPT repository -- another error
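A quick check of what the installed memgpt package actually exposes might narrow it down, since the module layout appears to differ between pymemgpt versions (diagnostic sketch, nothing version-specific):

# Sketch: list the top-level submodules of the installed memgpt package,
# to see whether memgpt.personas exists in this version at all.
import pkgutil
import memgpt

print(getattr(memgpt, "__version__", "unknown version"))
print(sorted(m.name for m in pkgutil.iter_modules(memgpt.__path__)))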
What I need
- I always get version conflicts between `openai`, `llama-index`, `pymemgpt`, `pyautogen` and `numpy`, so a known-working set of versions that makes this code run would be nice; otherwise, any advice is welcome.
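For reference, a small sketch to record the exact versions installed in the environment (assuming `pyautogen` is the package behind the `autogen` import):

# Sketch: print the installed versions of the packages that keep conflicting,
# so a known-working combination can be pinned once found.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("openai", "llama-index", "pymemgpt", "pyautogen", "numpy"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")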
Option 2. Using HuggingFace Text Generation Inference
Instead of the TheBloke LLMs template that runs the webui on RunPod, I found a guide for using the Text Generation Inference template.
Current code
import runpod  # RunPod SDK; runpod.api_key must be set beforehand

gpu_count = 1
pod = runpod.create_pod(
    name="Llama-7b-chat",
    image_name="ghcr.io/huggingface/text-generation-inference:0.9.4",
    gpu_type_id="NVIDIA RTX A4500",
    data_center_id="EU-RO-1",
    cloud_type="SECURE",
    docker_args="--model-id TheBloke/Llama-2-7b-chat-fp16",
    gpu_count=gpu_count,
    volume_in_gb=50,
    container_disk_in_gb=5,
    ports="80/http,29500/http",
    volume_mount_path="/data",
)
pod
from langchain.llms import HuggingFaceTextGenInference
inference_server_url = f'https://{pod["id"]}-80.proxy.runpod.net'
llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    max_new_tokens=1000,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.1,
    repetition_penalty=1.03,
)
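To sanity-check the endpoint before putting `llm` into a chain, a direct call works (assuming the pod has finished downloading the model):

# Quick check that the TGI endpoint answers before using llm in a chain.
print(llm("Write a function to print numbers 1 to 10."))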
This works well with Llama 2, but I cannot make it work with other LLMs that need a lot of configuration before running, for example Falcon or Mixtral, where I currently have to change several parameters in the webui manually.
What I need
- A way to run this code with any LLM by setting model parameters and settings programmatically instead of through the RunPod webui.
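For example, something along the lines of the sketch below is what I imagine, where all per-model settings go into `docker_args` as TGI launcher flags instead of webui settings (the Falcon flag values are guesses on my part, not tested):

# Sketch (untested): configure a different model purely through TGI launcher
# flags passed via docker_args, so no webui configuration is needed.
# The flag values for Falcon below are illustrative assumptions.
import runpod

pod = runpod.create_pod(
    name="falcon-7b-instruct",
    image_name="ghcr.io/huggingface/text-generation-inference:0.9.4",
    gpu_type_id="NVIDIA RTX A6000",
    cloud_type="SECURE",
    docker_args=(
        "--model-id tiiuae/falcon-7b-instruct "
        "--trust-remote-code "
        "--max-input-length 2048 "
        "--max-total-tokens 4096"
    ),
    gpu_count=1,
    volume_in_gb=50,
    container_disk_in_gb=5,
    ports="80/http",
    volume_mount_path="/data",
)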