Add README.md
# Hosting LLMs with vLLM on Odin

## How to deploy models

Before you start, check `odin.capgemini.com/portainer/` (the trailing / is important!) to make sure that no other big LLM is running. Otherwise, you will run out of VRAM.

1. Pick a model from Hugging Face (easiest)
2. Fill in the docker-compose template in this repo (Hugging Face model ID, service name, port, route, etc.); an illustrative sketch of such a file follows these steps
3. `docker-compose -f <docker-file> up -d`
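
For orientation, a minimal compose file for a vLLM deployment might look roughly like the sketch below. This is not the template from this repo: the service name, model, port, and cache volume are placeholder assumptions, and the reverse-proxy route configuration is omitted because it depends on the proxy setup on Odin.

```yaml
# Hypothetical sketch -- the template in this repo is the source of truth.
services:
  qwen-example:                      # placeholder service name
    image: vllm/vllm-openai:latest   # vLLM's OpenAI-compatible server
    command: >
      --model Qwen/Qwen2.5-1.5B-Instruct
      --port 8000
    environment:
      # Only needed for gated models on Hugging Face.
      - HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}
    volumes:
      - hf-cache:/root/.cache/huggingface   # keep downloaded weights between restarts
    ipc: host                        # vLLM uses shared memory between worker processes
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]

volumes:
  hf-cache:
```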

Setup might take some time. Check the docker logs (via the terminal or Portainer) to make sure your application is up and running. Check `odin.capgemini.com/dashboard` to make sure there are no issues with the reverse proxy.

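
As a quick end-to-end check, you can also query the OpenAI-compatible `/v1/models` endpoint through the proxy. The sketch below assumes the `requests` package is installed and uses the Qwen route from the usage example further down as a placeholder.

```python
import requests

# Placeholder route -- replace with the base URL of your own deployment.
BASE_URL = "http://odin.capgemini.com/qwen/v1"

# vLLM's OpenAI-compatible server lists the model(s) it serves under /v1/models.
resp = requests.get(f"{BASE_URL}/models", timeout=10)
resp.raise_for_status()
print([m["id"] for m in resp.json()["data"]])
```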
## How to use models

You will have to be on VPN or in the office to use Odin.

1. Check `odin.capgemini.com/portainer/` (the trailing / is important!) to see whether your model is running. Otherwise, start the container.
2. Once the container is running, you can access the models with the OpenAI library. For the time being, you will have to use the plain `http://` URL (no HTTPS).
### Usage Example

#### Chat Completion

```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://odin.capgemini.com/qwen/v1/",  # replace with your URL
)

chat_response = client.chat.completions.create(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    messages=[
        {"role": "system", "content": "You are Qwen. You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke that involves llamas."},
    ],
    temperature=0.7,
    top_p=0.8,
    max_tokens=512,
)

print("Chat response:", chat_response.choices[0].message.content)
```

#### Embeddings

```python
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://odin.capgemini.com/mixed-bread/v1/",
)

response = client.embeddings.create(
    model="mixedbread-ai/mxbai-embed-large-v1",
    input="This is a test example.",
)

print(response.data[0].embedding)
```