Extend readme documentation

Nielson Janné 2025-03-17 12:47:22 +01:00
parent 67d681fcc4
commit 674220f442


@@ -2,9 +2,51 @@
A Sogeti Nederland generic RAG demo
## Getting started
### Installation of system dependencies
#### Unstructured PDF loader (optional)
If you would like to run the application with the unstructured PDF loader, additional system dependencies are required.
The two currently used are:
- [poppler-utils](https://launchpad.net/ubuntu/jammy/amd64/poppler-utils)
- [tesseract-ocr](https://github.com/tesseract-ocr/tesseract?tab=readme-ov-file#installing-tesseract)
```bash
sudo apt install poppler-utils tesseract-ocr
```
Then run the generic RAG demo with the `--unstructured-pdf` flag.
> For more information, please refer to the [langchain docs](https://python.langchain.com/docs/integrations/providers/unstructured/).
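For orientation, here is a minimal sketch of what the unstructured code path amounts to, using the langchain community loader; the file path is hypothetical and the demo's actual wiring lives in `generic_rag/app.py`, not here.
```python
# Sketch: loading a PDF with the unstructured loader (assumes langchain-community
# is installed; poppler-utils and tesseract-ocr must be present on the system).
from langchain_community.document_loaders import UnstructuredPDFLoader

loader = UnstructuredPDFLoader("data/example.pdf", mode="elements")  # hypothetical path
docs = loader.load()  # layout analysis / OCR happens here via the system dependencies
print(f"Loaded {len(docs)} document elements")
```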
#### Local LLM (optional)
The application supports running a local LLM using Ollama.
To install Ollama, please run the following commands:
```bash
curl -fsSL https://ollama.com/install.sh | sh # install Ollama
ollama pull llama3.1:8b # fetch and download a specific model
```
Include the model in the `.env` file:
```text
LOCAL_CHAT_MODEL="llama3.1:8b"
LOCAL_EMB_MODEL="llama3.1:8b"
```
Then run the generic RAG demo with the `-b local` flag.
> For more information on installing Ollama, please refer to the Langchain Local LLM documentation, specifically the [Quickstart section](https://python.langchain.com/docs/how_to/local_llms/#quickstart).
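For orientation, a minimal sketch of what the local backend amounts to, assuming the `langchain-ollama` package and the `.env` keys shown above; the demo's actual backend selection lives in [backend/models.py](generic_rag/backend/models.py) and may differ.
```python
# Sketch: local backend via Ollama (assumes the langchain-ollama package;
# env variable names are taken from the .env example above).
import os
from langchain_ollama import ChatOllama, OllamaEmbeddings

chat = ChatOllama(model=os.environ["LOCAL_CHAT_MODEL"])
emb = OllamaEmbeddings(model=os.environ["LOCAL_EMB_MODEL"])

print(chat.invoke("Say hello in one word.").content)
print(len(emb.embed_query("test")))  # embedding dimension
```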
### Running the generic RAG demo
Please mind that, due to the use of `argparse`, the generic RAG demo cannot be launched the way the `chainlit` documentation recommends.
```bash
chainlit run generic_rag/app.py # will not work
@@ -13,28 +55,29 @@ chainlit run generic_rag/app.py # will not work
Instead, the app can be launched and debugged the usual way.
```bash
python generic_rag/app.py -p data # will work and parses all PDF files in ./data
python generic_rag/app.py --help # will work and prints command line options
```
Please configure your `.env` file with your cloud provider (backend) of choice and set the `--backend` flag accordingly.
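For reference, the command line options above boil down to a standard `argparse` parser along these lines; the short flags are taken from this README, while the long names, defaults, and help texts are assumptions rather than the actual parser in `generic_rag/app.py`.
```python
# Sketch of the CLI described above; flag names beyond -p, -b and
# --unstructured-pdf are assumptions, not the demo's actual parser.
import argparse

parser = argparse.ArgumentParser(description="Generic RAG demo")
parser.add_argument("-p", "--path", required=True, help="directory with PDF files to parse")
parser.add_argument("-b", "--backend", default="azure", help="backend to use (azure, google, local, ...)")
parser.add_argument("--unstructured-pdf", action="store_true", help="use the unstructured PDF loader")
args = parser.parse_args()
```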
### .env file
A `.env` file needs to be populated to configure API end-points or local back-ends using environment variables.
Currently all required environment variables are defined in code at [backend/models.py](generic_rag/backend/models.py),
with the exception of the API key variables themselves.
More information about configuring API endpoints for langchain can be found at the following locations.
- [langchain cloud chat model doc](https://python.langchain.com/docs/integrations/chat/)
- [langchain local chat model doc](https://python.langchain.com/docs/how_to/local_llms/)
- [langchain cloud/local emb model doc](https://python.langchain.com/docs/integrations/text_embedding/)
> For local models we currently use Ollama.
An `.env` example is as follows.
```text
# only one backend (azure, google, local, etc) is required. Please adjust the --backend flag accordingly
AZURE_OPENAI_API_KEY="<secret_key>"
AZURE_LLM_ENDPOINT="https://<project_hub>.openai.azure.com"
@@ -52,12 +95,18 @@ GOOGLE_GENAI_CHAT_MODEL="gemini-2.0-flash"
GOOGLE_GENAI_EMB_MODEL="models/text-embedding-004"
```
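A sketch of how such a file is typically consumed at startup, assuming `python-dotenv`; whether the demo loads it this way or leans on chainlit's built-in `.env` handling is not specified here.
```python
# Sketch: reading the .env file into the process environment (assumes
# python-dotenv; the demo itself may load it differently).
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into os.environ
api_key = os.environ["AZURE_OPENAI_API_KEY"]
endpoint = os.environ["AZURE_LLM_ENDPOINT"]
```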
### Chainlit starters
Chainlit suggestions (starters) can be set with the `CHAINLIT_STARTERS` environment variable.
The variable should be a JSON array of objects with `label` and `message` properties.
An example is as follows.
```text
CHAINLIT_STARTERS=[{"label":"Label 1","message":"Message one."},{"label":"Label 2","message":"Message two."},{"label":"Label 3","message":"Message three."}]
```
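A sketch of how such a variable could be parsed into chainlit starter objects, assuming chainlit's `cl.Starter` / `@cl.set_starters` API; the demo's actual parsing code may differ.
```python
# Sketch: turning the CHAINLIT_STARTERS JSON array into chainlit starters
# (assumes the cl.Starter / @cl.set_starters API from recent chainlit).
import json
import os

import chainlit as cl

@cl.set_starters
async def set_starters():
    raw = json.loads(os.environ.get("CHAINLIT_STARTERS", "[]"))
    return [cl.Starter(label=s["label"], message=s["message"]) for s in raw]
```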
## Dev details
### Linting
Currently [Ruff](https://github.com/astral-sh/ruff) is used as the Python linter. It is included in [pyproject.toml](pyproject.toml) as a `dev` dependency in case your IDE needs it. However, for VS Code a [Ruff extension](https://marketplace.visualstudio.com/items?itemName=charliermarsh.ruff) exists.