This is a repo for the AI Friday (6th of June) about red-teaming LLMs


anna.pillar 13962ffff6 refactor		2025-06-06 09:37:50 +02:00
.gitignore	setup	2025-06-06 09:05:15 +02:00
.python-version	setup	2025-06-06 09:05:15 +02:00
app.py	refactor	2025-06-06 09:37:50 +02:00
Dockerfile	setup	2025-06-06 09:05:15 +02:00
models.py	setup	2025-06-06 09:05:15 +02:00
pyproject.toml	setup	2025-06-06 09:05:15 +02:00
README.md	add readme	2025-06-06 09:32:13 +02:00
uv.lock	setup	2025-06-06 09:05:15 +02:00

README.md

Red Teaming Demonstration 🚩

This is the repo for the AI Friday – Red Teaming Edition! 🎉 It contains a basic chatbot 🤖 implementation built with LangChain and Chainlit.

Setup instructions ⚙️

The easiest way to run it is probably with uv: it's as simple as uv sync 🪄.

There is also a requirements.txt file if you have all the time in the world and prefer pip.

You also need a .env file with the following variables:

TOGETHER_API_KEY=<I'll provide that during the session>
PASSWORD=<for developing, you can choose your own password>
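
If you want the app to fail early when one of these variables is missing, a check along the following lines may help. This is only a sketch: the helper name load_settings is made up for illustration, and it assumes the values from the .env file have already been loaded into the process environment (for example by a tool such as python-dotenv). The code in app.py may read them differently.

```python
import os
from typing import Mapping

# The two variables named in the .env file above.
REQUIRED_VARS = ("TOGETHER_API_KEY", "PASSWORD")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise if any is missing or empty.

    Hypothetical helper, not part of the repo. `env` defaults to the
    process environment but can be any mapping, which keeps it testable.
    """
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling load_settings() once at startup gives a clear error message instead of a confusing failure in the middle of a chat.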

The game 🎮

We are going to do two rounds.

In Part I, you will harden your chatbot so that it does not spill the password. You may strengthen the system prompt, apply any of the other measures we discussed previously, or add anything else that you think might do the trick.
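
One possible measure beyond the system prompt is an output filter that inspects each reply before it is shown to the user. The sketch below is one way to do that, not the repo's implementation: the names guard_reply, leaks_password and REFUSAL are made up for illustration, and how you hook it into app.py is up to you. It catches the password written verbatim, spaced out, or reversed, but it will not catch translations, encodings such as Base64, or hints given one letter per message.

```python
import re

# Hypothetical replacement text shown instead of a leaking reply.
REFUSAL = "Sorry, I can't share that."


def _normalize(text: str) -> str:
    # Lowercase and keep only ASCII letters and digits, so that
    # "S-E-C-R-E-T" or "s e c r e t" compares equal to "secret".
    return re.sub(r"[^a-z0-9]", "", text.lower())


def leaks_password(reply: str, password: str) -> bool:
    """Return True if the reply contains the password, forwards or reversed."""
    needle = _normalize(password)
    if not needle:
        # Nothing comparable left (empty or fully non-alphanumeric password).
        return False
    haystack = _normalize(reply)
    return needle in haystack or needle[::-1] in haystack


def guard_reply(reply: str, password: str) -> str:
    """Pass harmless replies through unchanged, replace leaking ones."""
    return REFUSAL if leaks_password(reply, password) else reply
```

Because separators are stripped before comparing, a very short password can match by accident across word boundaries and block harmless replies, so it is worth checking the filter against normal conversations too.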

However, please make sure the chatbot is still usable!

A chatbot that does not answer will, of course, never reveal the password. But it's also quite useless 😉. Once you are done, please make sure that the Dockerfile still works and then share the repo with me. I will run it on Google Cloud Run and we can start hacking! At that point, I will also set the password as an environment variable to keep it all secret ;)

For Part II, I'll share links to the different implementations, and it's up to you to convince the bots to share their passwords. Every bot will have its own password. How many can you crack 🧨? To keep it educational, think about how you could have prevented your own attack.
