llm_game_hackathon/L3.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1eea5269-1258-41c8-8347-3365302dda97",
   "metadata": {
    "id": "173e06c5-4b07-4e3b-a67a-5c3e141beb2c"
   },
   "source": [
    "# L3: Moderation & Safety of AI Games with Llama Guard\n",
    "\n",
    "In this lesson, you will use Llama Guard through Together AI's API to check that content generated within AI games adheres to safety and compliance policies."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "238313a1-54d3-434a-87fa-7d1c49b78d6e",
   "metadata": {},
   "source": [
    "<p style=\"background-color:#f7fff8; padding:15px; border-width:3px; border-color:#e0f0e0; border-style:solid; border-radius:6px\"> 🚨\n",
    "&nbsp; <b>Different Run Results:</b> The output generated by AI models can vary with each execution due to their dynamic, probabilistic nature. Don't be surprised if your results differ from those shown in the video.<br>\n",
    "<span style=\"font-size: larger;\">To maintain consistency, the notebooks are run with a 'world state' consistent with the video at the start of each notebook.</span></p>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9a188481-e6a5-40c2-b849-835684a3a688",
   "metadata": {},
   "source": [
    "<div style=\"background-color:#fff6ff; padding:13px; border-width:3px; border-color:#efe6ef; border-style:solid; border-radius:6px\">\n",
    "<p> 💻 &nbsp; <b>Access <code>requirements.txt</code> and <code>helper.py</code> files:</b> 1) click on the <em>\"File\"</em> option on the top menu of the notebook and then 2) click on <em>\"Open\"</em>.\n",
    "\n",
    "<p> ⬇ &nbsp; <b>Download Notebooks:</b> 1) click on the <em>\"File\"</em> option on the top menu of the notebook and then 2) click on <em>\"Download as\"</em> and select <em>\"Notebook (.ipynb)\"</em>.</p>\n",
    "\n",
    "<p> 📒 &nbsp; For more help, please see the <em>\"Appendix – Tips, Help, and Download\"</em> Lesson.</p>\n",
    "\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bad55376-dfd4-493c-9ca5-058ee7fc5ef2",
   "metadata": {},
   "source": [
    "## Performing Safety Checks with the Default Content Policy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1aebc23-2bc6-48b4-8924-70313e32ba06",
   "metadata": {
    "height": 115
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from together import Together\n",
    "from helper import get_together_api_key, load_env\n",
    "\n",
    "client = Together(api_key=get_together_api_key())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a6c169c9-edc0-4b9e-a0f7-dfb27837c9ff",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "from helper import get_game_state\n",
    "\n",
    "game_state = get_game_state()\n",
    "character = game_state[\"character\"]\n",
    "print(\"Character Description:\", character)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6be3d845-8222-41b4-9f1a-22cef2b8cff3",
   "metadata": {
    "height": 149
   },
   "outputs": [],
   "source": [
    "safety_check = client.chat.completions.create(\n",
    "    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
    "    messages=[{\"role\": \"user\", \"content\": character}],\n",
    ")\n",
    "\n",
    "print(f\"prompt safety check: {safety_check.choices[0].message.content}\")\n",
    "print(\" \")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3dcca47-064e-470e-bcb2-a0f9722f0d59",
   "metadata": {},
   "source": [
    "### Safe Content Policy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bd250bd1-83ca-48e7-ba36-853f0eba71d4",
   "metadata": {
    "height": 523
   },
   "outputs": [],
   "source": [
    "# Set the content policy\n",
    "safe_content_policy = {\n",
    "    \"policy\": \"\"\"O1: Violence and Hate.\n",
    "Should not\n",
    "- Express hateful, derogatory, or demeaning sentiments against people based on sensitive personal characteristics.\n",
    "- Use slurs based on personal characteristics\n",
    "- Write gruesome / graphic violence.\n",
    "Can\n",
    "- Write tame fantasy violence\n",
    "O2: Sexual Content.\n",
    "Should not\n",
    "- Write any somewhat sexual content that mentions sexual functions or sexual body parts.\n",
    "- Write any content that would be inappropriate for kids.\n",
    "Can\n",
    "- Write romantic content (including kissing, flirting etc...)\n",
    "O3: Self-Harm.\n",
    "Should not\n",
    "- Encourage people to harm themselves.\n",
    "- Romanticize or condone self-harm or suicide.\n",
    "- Write stories depicting suicide.\n",
    "O4: Profanity.\n",
    "Should not\n",
    "- Write with any profane language that wouldn't be appropriate for kids.\n",
    "Can\n",
    "- Write with kid-appropriate language\n",
    "O5: Regulated or Controlled Substances.\n",
    "Should not\n",
    "- Write stories that depict, glorify, or encourage drug use.\"\"\"\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10827f1e-3d25-415b-8231-cc70f0e92ae4",
   "metadata": {
    "height": 302
   },
   "outputs": [],
   "source": [
    "check_text = \"Kill the troll.\"\n",
    "\n",
    "prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
    "\n",
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n",
    "{safe_content_policy['policy']}\n",
    "<END UNSAFE CONTENT CATEGORIES>\n",
    "\n",
    "<BEGIN CONVERSATION>\n",
    "user: {check_text}\n",
    "<END CONVERSATION>\n",
    "\n",
    "Provide your safety assessment for user in the above conversation:\n",
    "- First line must read 'safe' or 'unsafe'.\n",
    "- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n",
    "\n",
    "print(prompt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "526f2ece-3650-4440-bf3b-b0d1b3839a56",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "response = client.completions.create(\n",
    "    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
    "    prompt=prompt,\n",
    ")\n",
    "print(response.choices[0].text)"
   ]
  },
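  {
   "cell_type": "markdown",
   "id": "parse-guard-verdict-md",
   "metadata": {},
   "source": [
    "The prompt above asks the model for a first line reading `safe` or `unsafe`, followed by an optional second line listing the violated categories. As a minimal sketch, the cell below parses a reply of that shape into a boolean and a list of category labels. The name `parse_guard_verdict` is a hypothetical helper that is not part of `helper.py` or the Together client, and it assumes the model follows the requested format; an empty reply is treated as unsafe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "parse-guard-verdict-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_guard_verdict(text):\n",
    "    \"\"\"Split a 'safe' / 'unsafe' reply into (safe, categories).\"\"\"\n",
    "    # Keep only non-empty lines, stripped of surrounding whitespace.\n",
    "    lines = [line.strip() for line in text.splitlines() if line.strip()]\n",
    "    if not lines:\n",
    "        # Assumption: an empty reply counts as unsafe rather than being let through.\n",
    "        return False, []\n",
    "    safe = lines[0].lower() == 'safe'\n",
    "    categories = []\n",
    "    if not safe and len(lines) > 1:\n",
    "        # The second line is expected to look like 'O1,O3'.\n",
    "        categories = [c.strip() for c in lines[1].split(',') if c.strip()]\n",
    "    return safe, categories\n",
    "\n",
    "\n",
    "print(parse_guard_verdict('safe'))\n",
    "print(parse_guard_verdict('unsafe\\nO1, O3'))"
   ]
  },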
  {
   "cell_type": "markdown",
   "id": "2490ecfb-dc9d-46a2-af2c-849d8fc852ff",
   "metadata": {},
   "source": [
    "### Moderate Content Policy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d4d1c1a5-8be5-4246-b7db-d163a1e78c4b",
   "metadata": {
    "height": 336
   },
   "outputs": [],
   "source": [
    "# Set the content policy\n",
    "moderate_content_policy = {\n",
    "    \"policy\": \"\"\"O1: Violence and Hate.\n",
    "Should not\n",
    "- Express racist or offensive statements or slurs about people based on sensitive personal characteristics.\n",
    "- Write graphic depictions of gore that would make people very uncomfortable.\n",
    "Can\n",
    "- Write about violence or hate in a storytelling context.\n",
    "O2: Sexual Content.\n",
    "Should not\n",
    "- Write sexually explicit content.\n",
    "- Use any vulgar / sexual language\n",
    "Can\n",
    "- Write romantic content (including kissing, flirting etc...)\n",
    "O3: Self-Harm.\n",
    "Should not\n",
    "- Encourage or glorify self-harm or suicide.\"\"\"\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97919e49-6fb5-4291-a95e-6efc75d5e51d",
   "metadata": {
    "height": 319
   },
   "outputs": [],
   "source": [
    "# Write the text you want to use to check whether the content policy is working\n",
    "check_text = \"Kill the troll.\"\n",
    "\n",
    "# Build the prompt with embedded values\n",
    "prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
    "\n",
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n",
    "{moderate_content_policy['policy']}\n",
    "<END UNSAFE CONTENT CATEGORIES>\n",
    "\n",
    "<BEGIN CONVERSATION>\n",
    "user: {check_text}\n",
    "<END CONVERSATION>\n",
    "\n",
    "Provide your safety assessment for user in the above conversation:\n",
    "- First line must read 'safe' or 'unsafe'.\n",
    "- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5f3a19da-0408-4cf5-af3e-1a4760787184",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "response = client.completions.create(\n",
    "    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
    "    prompt=prompt,\n",
    ")\n",
    "print(response.choices[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "41651332-f6f5-4c9d-8037-bb4bb3aa2c26",
   "metadata": {},
   "source": [
    "### Mature Content Policy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "84840673-b1e4-4f9a-990e-153409f5b5ea",
   "metadata": {
    "height": 166
   },
   "outputs": [],
   "source": [
    "# Set the content policy\n",
    "mature_content_policy = {\n",
    "    \"policy\": \"\"\"O2: Sexual Content.\n",
    "Should not\n",
    "- Write about graphic or illegal content.\n",
    "Can\n",
    "- Write about mature content between adults\"\"\"\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdc940d8-7bde-4ed2-b1a7-bcd31108dce6",
   "metadata": {
    "height": 336
   },
   "outputs": [],
   "source": [
    "check_text = \"Kill the troll.\"\n",
    "\n",
    "prompt = f\"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
    "\n",
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n",
    "{mature_content_policy['policy']}\n",
    "<END UNSAFE CONTENT CATEGORIES>\n",
    "\n",
    "<BEGIN CONVERSATION>\n",
    "user: {check_text}\n",
    "<END CONVERSATION>\n",
    "\n",
    "Provide your safety assessment for user in the above conversation:\n",
    "- First line must read 'safe' or 'unsafe'.\n",
    "- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b082b977-811d-4dbe-8299-cf0552d9c10d",
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "response = client.completions.create(\n",
    "    model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
    "    prompt=prompt,\n",
    ")\n",
    "print(response.choices[0].text)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "622485e5-c6a4-4ccd-99b3-8b569be42265",
   "metadata": {},
   "source": [
    "### Define helper functions"
   ]
  },
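  {
   "cell_type": "markdown",
   "id": "build-guard-prompt-md",
   "metadata": {},
   "source": [
    "The three prompts above differ only in the policy text. As a sketch, the hypothetical `build_guard_prompt` helper below (not part of `helper.py`) fills one shared template with a policy and the text to check. It assumes the same instruction wording used in this notebook and closes the conversation block with an `<END CONVERSATION>` tag, as in the published Llama Guard prompt format. `demo_policy` is a made-up one-category policy used only for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "build-guard-prompt-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Shared template; {policy} and {text} are filled in by build_guard_prompt.\n",
    "GUARD_TEMPLATE = \"\"\"[INST] Task: Check if there is unsafe content in 'user' messages in conversations according to our safety policy with the below categories.\n",
    "\n",
    "<BEGIN UNSAFE CONTENT CATEGORIES>\n",
    "{policy}\n",
    "<END UNSAFE CONTENT CATEGORIES>\n",
    "\n",
    "<BEGIN CONVERSATION>\n",
    "user: {text}\n",
    "<END CONVERSATION>\n",
    "\n",
    "Provide your safety assessment for user in the above conversation:\n",
    "- First line must read 'safe' or 'unsafe'.\n",
    "- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]\"\"\"\n",
    "\n",
    "\n",
    "def build_guard_prompt(policy, text):\n",
    "    \"\"\"Fill the shared template with a policy string and the text to check.\"\"\"\n",
    "    # Only the template is parsed by str.format, so braces inside the\n",
    "    # policy or the text are inserted unchanged.\n",
    "    return GUARD_TEMPLATE.format(policy=policy, text=text)\n",
    "\n",
    "\n",
    "# Hypothetical one-category policy, for illustration only.\n",
    "demo_policy = 'O1: Violence.\\nShould not\\n- Write graphic violence.'\n",
    "print(build_guard_prompt(demo_policy, 'Kill the troll.'))"
   ]
  },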
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "53525017-9d12-40e0-bd93-ef6251644174",
   "metadata": {
    "height": 438
   },
   "outputs": [],
   "source": [
    "def is_safe(message):\n",
    "    \"\"\"Return True if Llama Guard judges `message` safe under `safe_content_policy`.\"\"\"\n",
    "    prompt = f\"\"\"[INST] Task: Check if there is unsafe content in \n",
    "    'user' messages in conversations according to our safety policy \n",
    "    with the below categories.\n",
    "    \n",
    "    <BEGIN UNSAFE CONTENT CATEGORIES>\n",
    "    {safe_content_policy['policy']}\n",
    "    <END UNSAFE CONTENT CATEGORIES>\n",
    "    \n",
    "    <BEGIN CONVERSATION>\n",
    "    user: {message}\n",
    "    <END CONVERSATION>\n",
    "    \n",
    "    Provide your safety assessment for user in the above conversation:\n",
    "    - First line must read 'safe' or 'unsafe'.\n",
    "    - If unsafe, a second line must include a comma-separated list of \n",
    "    violated categories. [/INST]\"\"\"\n",
    "\n",
    "    response = client.completions.create(\n",
    "        model=\"Meta-Llama/LlamaGuard-2-8b\",\n",
    "        prompt=prompt,\n",
    "    )\n",
    "\n",
    "    # Anything other than a bare 'safe' reply is treated as unsafe.\n",
    "    result = response.choices[0].text\n",
    "    return result.strip() == 'safe'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8bceaaa3-62c1-4b04-b262-32bb76d03f9f",
   "metadata": {
    "height": 319
   },
   "outputs": [],
   "source": [
    "from helper import run_action, start_game, get_game_state\n",
    "\n",
    "game_state = get_game_state()\n",
    "\n",
    "def main_loop(message, history):\n",
    "    # Reject unsafe player input before it reaches the game model.\n",
    "    if not is_safe(message):\n",
    "        return 'Invalid action.'\n",
    "\n",
    "    result = run_action(message, history, game_state)\n",
    "    # Return the generated text only if it also passes the safety check.\n",
    "    if is_safe(result):\n",
    "        return result\n",
    "    return 'Invalid output.'\n",
    "\n",
    "start_game(main_loop, True)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}