{
"cells": [
{
"cell_type": "markdown",
"id": "1eea5269-1258-41c8-8347-3365302dda97",
"metadata": {
"id": "173e06c5-4b07-4e3b-a67a-5c3e141beb2c"
},
"source": [
"# L3: Moderation & Safety of AI Games with Llama Guard\n",
"\n",
"You are going to learn how to use Together AI's API to ensure content generated within AI games adheres to safety and compliance policies."
]
},
{
"cell_type": "markdown",
"id": "238313a1-54d3-434a-87fa-7d1c49b78d6e",
"metadata": {},
"source": [
"<p style=\"background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px\"> 🚨\n",
" <b>Different Run Results:</b> The output generated by AI models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.<br>\n",
"<span style=\"font-size: larger;\">To maintain consistency, the notebooks are run with a 'world state' consistent with the video at the start of each notebook.</span></p>"
]
},
{
"cell_type": "markdown",
"id": "9a188481-e6a5-40c2-b849-835684a3a688",
"metadata": {},
"source": [
"<div style=\"background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px\">\n",
"<p> 💻 <b>Access <code>requirements.txt</code> and <code>helper.py</code> files:</b> 1) click on the <em>\"File\"</em> option on the top menu of the notebook and then 2) click on <em>\"Open\"</em>.</p>\n",
"\n",
"<p> ⬇ <b>Download Notebooks:</b> 1) click on the <em>\"File\"</em> option on the top menu of the notebook and then 2) click on <em>\"Download as\"</em> and select <em>\"Notebook (.ipynb)\"</em>.</p>\n",
"\n",
"<p> 📒 For more help, please see the <em>\"Appendix – Tips, Help, and Download\"</em> Lesson.</p>\n",
"\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "bad55376-dfd4-493c-9ca5-058ee7fc5ef2",
"metadata": {},
"source": [
"## Performing Safety Checks with the Default Content Policy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1aebc23-2bc6-48b4-8924-70313e32ba06",
"metadata": {
"height": 115
},
"outputs": [],
"source": [
"from together import Together\n",
"from helper import get_together_api_key\n",
"\n",
"# Create the Together AI client with the API key supplied by the course helper\n",
"client = Together(api_key=get_together_api_key())"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a6c169c9-edc0-4b9e-a0f7-dfb27837c9ff",
"metadata": {
"height": 98
},
"outputs": [],
"source": [
"from helper import get_game_state\n",
"\n",
"# Load the saved game state and pull out the character description\n",
"game_state = get_game_state()\n",
"character = game_state[\"character\"]\n",
"print(\"Character Description:\", character)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "6be3d845-8222-41b4-9f1a-22cef2b8cff3",
"metadata": {
"height": 149
},
"outputs": [],
"source": [
"# No custom policy is passed here, so Llama Guard applies its default content policy\n",
"safety_check = client.chat.completions.create(\n",
"    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
"    messages=[{\"role\": \"user\", \"content\": character}],\n",
")\n",
"\n",
"print(f\"prompt safety check: {safety_check.choices[0].message.content}\")\n",
"print(\" \")"
]
},
{
"cell_type": "markdown",
"id": "a3dcca47-064e-470e-bcb2-a0f9722f0d59",
"metadata": {},
"source": [
"### Safe Content Policy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bd250bd1-83ca-48e7-ba36-853f0eba71d4",
"metadata": {
"height": 523
},
"outputs": [],
"source": [
"# Set the content policy\n",
"safe_content_policy = {\n",
"    \"policy\": \"\"\"O1: Violence and Hate.\n",
"Should not\n",
"- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics.\n",
"- Use slurs based on personal characteristics.\n",
"- Write gruesome / graphic violence.\n",
"Can\n",
"- Write tame fantasy violence.\n",
"O2: Sexual Content.\n",
"Should not\n",
"- Write any somewhat sexual content that mentions sexual functions or sexual body parts.\n",
"- Write any content that would be inappropriate for kids.\n",
"Can\n",
"- Write romantic content (including kissing, flirting, etc.).\n",
"O3: Self-Harm.\n",
"Should not\n",
"- Encourage people to harm themselves.\n",
"- Romanticize or condone self-harm or suicide.\n",
"- Write stories depicting suicide.\n",
"O4: Profanity.\n",
"Should not\n",
"- Write with any profane language that wouldn't be appropriate for kids.\n",
"Can\n",
"- Write with kid-appropriate language.\n",
"O5: Regulated or Controlled Substances.\n",
"Should not\n",
"- Write stories that depict, glorify, or encourage drug use.\"\"\"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10827f1e-3d25-415b-8231-cc70f0e92ae4",
"metadata": {
"height": 302
},
"outputs": [],
"source": [
"# Write the text you want to use to check whether the content policy is working\n",
"check_text = \"Kill the troll.\"\n",
"\n",
"# Build the Llama Guard prompt with the policy and the text to check embedded\n",
"prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
"\n",
"<BEGIN UNSAFE CONTENT CATEGORIES>\n",
"{safe_content_policy['policy']}\n",
"<END UNSAFE CONTENT CATEGORIES>\n",
"\n",
"<BEGIN CONVERSATION>\n",
"user: {check_text}\n",
"\n",
"<END CONVERSATION>\n",
"\n",
"Provide your safety assessment for user in the above conversation:\n",
"- First line must read 'safe' or 'unsafe'.\n",
"- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n",
"\n",
"print(prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "526f2ece-3650-4440-bf3b-b0d1b3839a56",
"metadata": {
"height": 98
},
"outputs": [],
"source": [
"response = client.completions.create(\n",
"    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
"    prompt=prompt,\n",
")\n",
"print(response.choices[0].text)"
]
},
{
"cell_type": "markdown",
"id": "2490ecfb-dc9d-46a2-af2c-849d8fc852ff",
"metadata": {},
"source": [
"### Moderate Content Policy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d4d1c1a5-8be5-4246-b7db-d163a1e78c4b",
"metadata": {
"height": 336
},
"outputs": [],
"source": [
"# Set the content policy\n",
"moderate_content_policy = {\n",
"    \"policy\": \"\"\"O1: Violence and Hate.\n",
"Should not\n",
"- Express racist or offensive statements or slurs about people based on sensitive personal characteristics.\n",
"- Write graphic depictions of gore that would make people very uncomfortable.\n",
"Can\n",
"- Write about violence or hate in a storytelling context.\n",
"O2: Sexual Content.\n",
"Should not\n",
"- Write sexually explicit content.\n",
"- Use any vulgar / sexual language.\n",
"Can\n",
"- Write romantic content (including kissing, flirting, etc.).\n",
"O3: Self-Harm.\n",
"Should not\n",
"- Encourage or glorify self-harm or suicide.\"\"\"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "97919e49-6fb5-4291-a95e-6efc75d5e51d",
"metadata": {
"height": 319
},
"outputs": [],
"source": [
"# Write the text you want to use to check whether the content policy is working\n",
"check_text = \"Kill the troll.\"\n",
"\n",
"# Build the prompt with embedded values\n",
"prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
"\n",
"<BEGIN UNSAFE CONTENT CATEGORIES>\n",
"{moderate_content_policy['policy']}\n",
"<END UNSAFE CONTENT CATEGORIES>\n",
"\n",
"<BEGIN CONVERSATION>\n",
"user: {check_text}\n",
"\n",
"<END CONVERSATION>\n",
"\n",
"Provide your safety assessment for user in the above conversation:\n",
"- First line must read 'safe' or 'unsafe'.\n",
"- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5f3a19da-0408-4cf5-af3e-1a4760787184",
"metadata": {
"height": 98
},
"outputs": [],
"source": [
"response = client.completions.create(\n",
"    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
"    prompt=prompt,\n",
")\n",
"print(response.choices[0].text)"
]
},
{
"cell_type": "markdown",
"id": "41651332-f6f5-4c9d-8037-bb4bb3aa2c26",
"metadata": {},
"source": [
"### Mature Content Policy"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84840673-b1e4-4f9a-990e-153409f5b5ea",
"metadata": {
"height": 166
},
"outputs": [],
"source": [
"# Set the content policy\n",
"mature_content_policy = {\n",
"    \"policy\": \"\"\"O2: Sexual Content.\n",
"Should not\n",
"- Write about graphic or illegal content.\n",
"Can\n",
"- Write about mature content between adults.\"\"\"\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fdc940d8-7bde-4ed2-b1a7-bcd31108dce6",
"metadata": {
"height": 336
},
"outputs": [],
"source": [
"# Write the text you want to use to check whether the content policy is working\n",
"check_text = \"Kill the troll.\"\n",
"\n",
"# Build the prompt with embedded values; sentences stay on one line so no stray line breaks end up in the prompt\n",
"prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
"\n",
"<BEGIN UNSAFE CONTENT CATEGORIES>\n",
"{mature_content_policy['policy']}\n",
"<END UNSAFE CONTENT CATEGORIES>\n",
"\n",
"<BEGIN CONVERSATION>\n",
"user: {check_text}\n",
"\n",
"<END CONVERSATION>\n",
"\n",
"Provide your safety assessment for user in the above conversation:\n",
"- First line must read 'safe' or 'unsafe'.\n",
"- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b082b977-811d-4dbe-8299-cf0552d9c10d",
"metadata": {
"height": 98
},
"outputs": [],
"source": [
"response = client.completions.create(\n",
"    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
"    prompt=prompt,\n",
")\n",
"print(response.choices[0].text)"
]
},
{
"cell_type": "markdown",
"id": "622485e5-c6a4-4ccd-99b3-8b569be42265",
"metadata": {},
"source": [
"### Define helper functions"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "53525017-9d12-40e0-bd93-ef6251644174",
"metadata": {
"height": 438
},
"outputs": [],
"source": [
"def is_safe(message):\n",
"    \"\"\"Return True if Llama Guard rates `message` as safe under the safe content policy.\"\"\"\n",
"    # The prompt lines are flush left so that no stray indentation ends up in the prompt text\n",
"    prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
"\n",
"<BEGIN UNSAFE CONTENT CATEGORIES>\n",
"{safe_content_policy['policy']}\n",
"<END UNSAFE CONTENT CATEGORIES>\n",
"\n",
"<BEGIN CONVERSATION>\n",
"user: {message}\n",
"\n",
"<END CONVERSATION>\n",
"\n",
"Provide your safety assessment for user in the above conversation:\n",
"- First line must read 'safe' or 'unsafe'.\n",
"- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n",
"\n",
"    response = client.completions.create(\n",
"        model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
"        prompt=prompt,\n",
"    )\n",
"\n",
"    # The prompt asks for 'safe', or 'unsafe' followed by the violated categories\n",
"    result = response.choices[0].text\n",
"    return result.strip() == 'safe'"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8bceaaa3-62c1-4b04-b262-32bb76d03f9f",
"metadata": {
"height": 319
},
"outputs": [],
"source": [
"from helper import run_action, start_game, get_game_state\n",
"\n",
"game_state = get_game_state()\n",
"\n",
"def main_loop(message, history):\n",
"    # Reject unsafe player input before it reaches the game\n",
"    if not is_safe(message):\n",
"        return 'Invalid action.'\n",
"\n",
"    result = run_action(message, history, game_state)\n",
"\n",
"    # Return the generated output only if it also passes the safety check\n",
"    if is_safe(result):\n",
"        return result\n",
"    return 'Invalid output.'\n",
"\n",
"start_game(main_loop, True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ea9c97dc-c88b-41d5-a4a6-8dae0e4bd79f",
"metadata": {
"height": 30
},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "f633615d-6909-4433-8951-0f9e9028dd32",
"metadata": {
"height": 30
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.9"
}
},
"nbformat": 4,
"nbformat_minor": 5
}