This is a repo for the AI Friday (6th of June) about red-teaming LLMs

Red Teaming Demonstration 🚩

This is the repo for the AI Friday Red Teaming Edition! 🎉 You will find a basic chatbot 🤖 implementation in here, built with langchain and chainlit.

Setup instructions ⚙️

The easiest way to run it is probably with uv: it's as simple as uv sync 🪄.

There is also a requirements.txt file if you have all the time in the world and prefer pip.

You also need a .env file with the following variables:

TOGETHER_API_KEY=<I'll provide that during the session>
PASSWORD=<for developing, you can choose your own password>
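If you want the app to fail early when one of these variables is missing, a check along the following lines could help. This is only a sketch: it assumes the values from the .env file have already been loaded into the process environment, and `load_settings` and `REQUIRED_VARS` are hypothetical names, not something taken from app.py.

```python
import os

# The two variables named in the README; everything else here is illustrative.
REQUIRED_VARS = ("TOGETHER_API_KEY", "PASSWORD")


def load_settings(env=None):
    """Return the required settings, raising if any is missing or empty."""
    # Accept an explicit mapping so the check is easy to test;
    # fall back to the real process environment otherwise.
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_settings()` once at startup gives a clear error message instead of a confusing failure in the middle of a chat.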

The game 🎮

We are going to do two rounds.

In Part I, you will harden your chatbot so that it does not spill the password. You can write a stronger system prompt, apply any of the other measures we discussed previously, or try anything else you think might do the trick.

However, please make sure the chatbot is still usable!

A chatbot that does not answer will, of course, never reveal the password. But it's also quite useless 😉. Once you are done, please make sure that the Dockerfile still works and then share the repo with me. I will run it on Google Cloud Run and we can start hacking! At that point, I will also set the password as an environment variable to keep it all secret ;)
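As one possible measure for Part I, you could check every reply before it is sent to the user. The sketch below is just an illustration of such an output filter, not part of the provided app.py: `leaks_password` and `guard_reply` are hypothetical helpers, and wiring them into the chainlit handler is left to you. It assumes the password consists mostly of ASCII letters and digits.

```python
import re


def _normalize(text):
    """Lowercase the text and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def leaks_password(reply, password):
    """Return True if the reply contains the password, also spaced out or reversed."""
    secret = _normalize(password)
    if not secret:
        # Nothing left to compare against (e.g. a password made only of symbols).
        return False
    haystack = _normalize(reply)
    return secret in haystack or secret[::-1] in haystack


def guard_reply(reply, password, refusal="Sorry, I can't share that."):
    """Swap a leaking reply for a refusal; pass every other reply through unchanged."""
    return refusal if leaks_password(reply, password) else reply
```

A filter like this only catches simple tricks such as "s-w-o-r-d-f-i-s-h" or the password written backwards. It will not notice encodings, translations, or hints spread over several messages, and a very short password can trigger false refusals that hurt usability, so treat it as one layer among several.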

For Part II, I'll share links to the different implementations, and it's up to you to convince the bots to share their passwords. Every bot will have its own password. How many can you crack 🧨? To keep it educational, think about how you could have prevented your own attack.