# Hosting LLMs with vLLM on Odin
## How to deploy models
Before you start, check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to make sure that no other large LLM is already running. Otherwise, you will run out of VRAM.
1. Pick a model from Hugging Face (easiest).
2. Fill in the docker-compose template in this repo (Hugging Face model id, service name, port, route, etc.).
3. Run `docker-compose -f <docker-file> up -d`.
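The exact template lives in this repo and is not reproduced here; as a rough orientation, a vLLM service definition might look like the sketch below. The image tag, service name, port, and volume path are illustrative assumptions — copy the actual values from the repo's template.

```yaml
services:
  qwen:                              # service name: pick one per model
    image: vllm/vllm-openai:latest   # assumed tag; use the version pinned in the repo
    command: --model Qwen/Qwen2.5-1.5B-Instruct
    ports:
      - "8000:8000"                  # pick a free host port
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface  # reuse downloaded weights
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```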
Setup might take some time. Check the Docker logs (via the terminal or Portainer) to make sure your application is up and running, and check `odin.capgemini.com/dashboard` to make sure there are no issues with the reverse proxy.
## How to use models
You will have to be on the VPN or in the office to reach Odin.
1. Check `odin.capgemini.com/portainer/` (the trailing `/` is important!) to see whether your model is running; if not, start the container.
2. Once the container is running, you can access the model with the OpenAI client library. For the time being, you will have to use the `http://` link.
### Usage Example
#### Chat Completion
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",  # vLLM does not validate the key by default, but the client requires one
    base_url="http://odin.capgemini.com/qwen/v1/",  # replace with your URL
)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[
        {"role": "system", "content": "You are Qwen. You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke that involves Llamas."},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
)

print("Chat response:", chat_response.choices[0].message.content)
```
#### Embeddings
```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://odin.capgemini.com/mixed-bread/v1/",
)

response = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",
    input="This is a test example.",
)

print(response.data[0].embedding)
```
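Embedding vectors are typically compared with cosine similarity. Here is a minimal sketch using placeholder vectors; in real use you would substitute two embeddings returned by `client.embeddings.create(...)`.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Placeholder vectors; in practice these would be
# response.data[i].embedding values from the API.
v1 = [0.1, 0.2, 0.3]
v2 = [0.1, 0.2, 0.3]
print(cosine_similarity(v1, v2))  # ~1.0 for identical vectors
```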