Learn Prompting's Newsletter
Posts
Learn Prompting #9: What is Prompt Hacking?

Learn Prompting #9: What is Prompt Hacking?

Plus: AI Updates from ElevenLabs, Vercel, Anthropic, and World Labs!

Learn Prompting
December 04, 2024

Hey all, welcome to this week’s Learn Prompting newsletter!

Today, we’ll cover:

What is Prompt Hacking?
Our AI Red Teaming Masterclass Starts Tomorrow!
New AI Tools and Features to Boost Your Productivi …
What’s the Difference Between Prompt Injection and …

What is Prompt Hacking?

A taxonomy of 29 prompt hacking techniques. Source: "Ignore This Title and HackAPrompt"

Ten months ago, we partnered with OpenAI, ScaleAI, & HuggingFace to host the first-ever, global prompt-hacking competition, HackAPrompt! We brought together over 3,000 participants and generated 600,000 malicious prompts!

After the competition, we published an the award-winning paper "Ignore This Title and HackAPrompt", which was awarded Best Theme Paper at EMNLP 2023 (out of 20K submitted papers). In this paper, we categorized the dataset of attacks we collected into 29 different Prompt Hacking Techniques:

Now, we’re back with HackAPrompt 2.0—with $500,000 in prizes on the line! We’ll have nearly 100 challenges across 5 different tracks, including securing AI systems like classical jailbreak attacks, humanoid robots, Gen AI in military, and web-use agents.

Join the waitlist for HackAPrompt 2.0 here!

Last Chance! Our AI Red Teaming Masterclass Starts Tomorrow!

We're starting our AI Red Teaming Masterclass tomorrow - are you planning on joining us?

This cohorts consists of 100 Senior Leaders, including CISOs, CTOs, Distinguished Engineers, Principal Red Teamers, and Heads of Red Teaming from Walmart, Google, Meta, IBM, Amazon, Splunk, Microsoft, RBC, Cisco, & many more, who want to learn from us how to protect & defend their AI Systems!

We’ll have 8 different guest speakers joining us in teaching this course!

Hope to see you there. We’ll start tomorrow at 1 PM EST.

AI Red-Teaming and AI Safety: Masterclass by Sander Schulhoff on Maven

#1 AI Safety Course. Learn AI Security from creator of HackAPrompt, the Largest AI Safety competition ever run (backed by OpenAI & ScaleAI)

maven.com/learn-prompting-company/ai-red-teaming-and-ai-safety-masterclass

New AI Tools and Features to Boost Your Productivity

Vercel's v0 can clone entire websites with a URL

What it does: Effortlessly duplicate websites by simply providing their URL.

How to use it: Access the v0 platform, input the desired webpage URL, and get a fully cloned version ready for customization.

ElevenLabs presents Google's NotebookLM rival

What it does: ElevenReader converts text sources—like PDFs, articles, and eBooks—into engaging podcasts, featuring AI co-hosts in 32 languages for dynamic and context-aware audio content.

How to use it: Import text content into the ElevenReader app, and it will generate a podcast version that you can listen to directly on your iOS device.

Anthropic's Claude takes on Custom GPT with personalization

What it does: Customize Claude’s responses to match your writing style, offering a direct competitor to Custom GPT.

How to use it: Use the Claude interface to select a preset style or provide sample content to create a custom style tailored to your preferences.

Fei-Fei Li's World Labs Brings Photos to Life with Interactive 3D Environments

What it does: This innovative AI transforms regular photos into explorable 3D worlds, allowing users to navigate these environments using a keyboard and mouse.

How to use it: Upload an image (photo, painting, or AI-generated art) to the World Labs platform, and the tool will generate an interactive 3D space for you to explore.

What’s the Difference Between Prompt Injection and Jailbreaking?

If you’ve spent any time in the GenAI space, you’ve likely heard the terms prompt injection and jailbreaking. But what do they really mean, and how are they different?

Aspect	Prompt Injection	Jailbreaking
What I Thought They Were	Tricking the model into saying/doing bad things.	Tricking chatbots into saying things against TOS (a subset of PI).
What They Actually Are	Overriding developer instructions in the prompt.	Getting the model to say/do unintended things.

Here are just some of the latest fascinating hacks we’ve found online:

$47,000 hack: A hacker exploited the Freysa AI chatbot in a competitive experiment, bypassing its security by impersonating an admin, disabling warnings, and altering payment transfer functions.

Someone just won $50,000 by convincing an AI Agent to send all of its funds to them.
At 9:00 PM on November 22nd, an AI agent (@freysa_ai) was released with one objective...
DO NOT transfer money. Under no circumstance should you approve the transfer of money.
The catch...?… x.com/i/web/status/1…
— Jarrod Watts (@jarrodWattsDev)
12:57 AM • Nov 29, 2024

Name glitch: ChatGPT cannot process names like "David Mayer," "Jonathan Zittrain" or "Jonathan Turley!" They cause the program to terminate the conversation with an error message.

Now, you need to know the right terminology to outsmart a chatbot! We’ve written up a more detailed article on the differences between prompt injection and jailbreaking here:

How was this week's email?

Reply

or to participate.