From 9a3050a77a98a1c83ce47ab01b0e963afca0e55b Mon Sep 17 00:00:00 2001
From: anna
Date: Fri, 18 Apr 2025 14:56:42 +0200
Subject: [PATCH] Add README.md

---
 README.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a9c3334
--- /dev/null
+++ b/README.md
@@ -0,0 +1,63 @@
+# Hosting LLMs with vLLM on Odin
+
+## How to deploy models
+
+Before you start, check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to make sure that no other large LLM is already running; otherwise you will run out of VRAM.
+
+1. Pick a model from Hugging Face (easiest).
+2. Fill in the docker-compose template in this repo (Hugging Face model ID, service name, port, route, etc.).
+3. Run `docker-compose -f <your-compose-file>.yml up -d`.
+
+Setup might take some time. Check the Docker logs (via the terminal or Portainer) to make sure your application is up and running, and check `odin.capgemini.com/dashboard` to make sure there are no issues with the reverse proxy.
+
+## How to use models
+
+You have to be on the VPN or in the office to use Odin.
+
+1. Check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to see whether your model is running; if not, start the container.
+2. Once the container is running, you can access the model with the OpenAI library. For the time being, you have to use the `http://` URL (not `https://`).
+
+### Usage Examples
+
+#### Chat Completion
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://odin.capgemini.com/qwen/v1/",  # replace with your URL
+)
+
+chat_response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-1.5B-Instruct",
+    messages=[
+        {"role": "system", "content": "You are Qwen. You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a joke that involves llamas."},
+    ],
+    temperature=0.7,
+    top_p=0.8,
+    max_tokens=512,
+)
+
+print("Chat response:", chat_response.choices[0].message.content)
+```
+
+#### Embeddings
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://odin.capgemini.com/mixed-bread/v1/",  # replace with your URL
+)
+
+response = client.embeddings.create(
+    model="mixedbread-ai/mxbai-embed-large-v1",
+    input="This is a test example.",
+)
+
+print(response.data[0].embedding)
+```