From 9a3050a77a98a1c83ce47ab01b0e963afca0e55b Mon Sep 17 00:00:00 2001
From: anna
Date: Fri, 18 Apr 2025 14:56:42 +0200
Subject: [PATCH] Add README.md

---
 README.md | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a9c3334
--- /dev/null
+++ b/README.md
@@ -0,0 +1,63 @@
+# Hosting LLMs with vLLM on Odin
+
+## How to deploy models
+
+Before you start, check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to make sure that no other large LLM is already running; otherwise you will run out of VRAM.
+
+1. Pick a model from Hugging Face (easiest).
+2. Fill in the docker-compose template in this repo (Hugging Face model ID, service name, port, route, etc.).
+3. Run `docker-compose -f <your-compose-file>.yml up -d`.
+
+Setup might take some time. Check the Docker logs (via the terminal or Portainer) to make sure your application is up and running, and check `odin.capgemini.com/dashboard` to make sure there are no issues with the reverse proxy.
+
+## How to use models
+
+You have to be on the VPN or in the office to use Odin.
+
+1. Check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to see whether your model is running; if not, start the container.
+2. Once the container is running, you can access the model with the OpenAI library. For the time being, you have to use the `http://` URL (not `https://`).
+
+### Usage Examples
+
+#### Chat Completion
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://odin.capgemini.com/qwen/v1/",  # replace with your URL
+)
+
+chat_response = client.chat.completions.create(
+    model="Qwen/Qwen2.5-1.5B-Instruct",
+    messages=[
+        {"role": "system", "content": "You are Qwen. You are a helpful assistant."},
+        {"role": "user", "content": "Tell me a joke that involves llamas."},
+    ],
+    temperature=0.7,
+    top_p=0.8,
+    max_tokens=512,
+)
+
+print("Chat response:", chat_response.choices[0].message.content)
+```
+
+#### Embeddings
+```python
+from openai import OpenAI
+
+client = OpenAI(
+    api_key="EMPTY",
+    base_url="http://odin.capgemini.com/mixed-bread/v1/",  # replace with your URL
+)
+
+response = client.embeddings.create(
+    model="mixedbread-ai/mxbai-embed-large-v1",
+    input="This is a test example.",
+)
+
+print(response.data[0].embedding)
+```