This repo contains docker-compose files to load different LLMs with vLLM and explains how to use them.

Hosting LLMs with vLLM on Odin

How to deploy models

Before you start, check odin.capgemini.com/portainer/ (the trailing / is important!) to make sure that no other big LLM is running. Otherwise, you will run out of VRAM.

  1. Pick a model from Hugging Face (easiest)
  2. Fill in the docker-compose template in this repo (Hugging Face model ID, service name, port, route, etc.); a sketch follows this list
  3. Run docker-compose -f <docker-file> up -d
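
A minimal sketch of what such a compose file might look like, assuming the standard vllm/vllm-openai image (the actual templates in this repo are authoritative; the service name, port, model ID, and flags below are placeholders):

services:
  qwen:                                  # placeholder service name, pick one per model
    image: vllm/vllm-openai:latest       # vLLM's OpenAI-compatible server image
    # arguments are passed to the vLLM server; --model takes the Hugging Face model ID
    command: --model Qwen/Qwen2.5-1.5B-Instruct --max-model-len 8192
    environment:
      - HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}            # only needed for gated models
    ports:
      - "8000:8000"                      # host port the reverse-proxy route points at
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface # cache model weights between restarts
    ipc: host                            # vLLM recommends host IPC for PyTorch shared memory
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]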

Setup might take some time. Check the Docker logs (via terminal or Portainer, e.g. with the command below) to make sure your application is up and running. Check odin.capgemini.com/dashboard to make sure there are no issues with the reverse proxy.
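
For example, you can follow the logs of the service you just started (same placeholder file name as above):

docker-compose -f <docker-file> logs -f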

How to use models

You will have to be on the VPN or in the office to use Odin.

  1. Check odin.capgemini.com/portainer/ (the trailing / is important!) to see whether your model is running. If not, start the container.
  2. Once the container is running, you can access the models with the OpenAI library. For the time being, you will have to use the http:// link.
  3. Make sure you have openai installed: (uv) pip install openai

Usage Examples

Chat Completion

from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://odin.capgemini.com/qwen/v1/", # replace with your URL 
)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[
        {"role": "system", "content": "You are Qwen. You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke that involves Llamas. "},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
)

print("Chat response:", chat_response.choices[0].message.content)

Embeddings


from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://odin.capgemini.com/mixed-bread/v1/",  # replace with your URL
)

response = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",
    input="This is a test examle.",
)

print(response.data[0].embedding)
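
The input parameter also accepts a list of strings, so a whole batch can be embedded in one call; a small sketch with made-up inputs:

response = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",
    input=["first document", "second document"],  # one embedding per input string
)

for item in response.data:
    print(len(item.embedding))  # vector length = the model's embedding dimensionality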