Learn Prompting's Newsletter
Posts
Learn Prompting #5: Win $500K in Prizes in an AI Safety Competition

Learn Prompting #5: Win $500K in Prizes in an AI Safety Competition

Join the waitlist for HackAPrompt 2.0 & our new AI Red Teaming course! PLUS: Tesla Optimus Robot's Biggest Security Risk!

Sander Schulhoff
November 10, 2024

We have a huge announcement! We’re running HackAPrompt 2.0, which will be the largest AI Safety competition ever run, with up to $500K in prizes!

Last year, we partnered with OpenAI, ScaleAI, HuggingFace, & 10 other leading AI companies to run HackAPrompt 1.0, and it became the largest AI Red Teaming competition ever run.

Now, we’re doing it again—10x bigger!

In today’s newsletter, we’ll cover:

What was HackAPrompt 1.0?
Announcing HackAPrompt 2.0… (up to $500K in prizes …
Why HackAPrompt 2.0 matters… and Tesla Optimus Rob …
Why you should become an AI Red Teamer
Our New AI Red Teaming Course

What was HackAPrompt 1.0?

HackAPrompt 1.0 was the first and largest AI Red Teaming competition ever held. AI hackers from around the world competed to see who could best “Prompt Inject” AI models, challenging them to output “I have been pwned!”

Definition: A Prompt Injection is when someone adds harmful instructions to a prompt, causing AI to follow these instead of the original directions. This can lead to spreading misinformation, creating inappropriate content, or revealing sensitive data, so it’s essential to prevent for reliable AI systems.

I first heard about prompt injections in 2022 from Riley Goodside, the world’s first Staff Prompt Engineer at Scale AI, who was the first to discover and publicize the term, which was coined by Simon Willison. I immediately recognized the potential for prompt injections to pose a significant security risk. Soon after the discovery, multiple real-world examples emerged where people were able to prompt AI to output harmful information.

One example was Remoteli.io, a remote work company which created a Twitter bot that would respond positively to Tweets about remote work. This bot was powered by an LLM and users quickly found ways to trick it into saying whatever they wanted. In the image below, the user Evelyn created an adversarial input, the last line of which instructed the bot to make a threat against the president.

Upon reading this Tweet, the bot included Evelyn's input into its LLM prompt and was prompt injected into making a threat against the president!

We then teamed up with OpenAI, ScaleAI, Hugging Face, and 10 other AI companies to sponsor the first-ever prompt hacking competition, where AI hackers competed to prompt-hack LLM models for over $40,000 in prizes.

And…. over 3,300 Prompt Hackers from around the world competed, and we collected over 600,000 malicious prompts, making it the largest prompt injection dataset ever collected. It was also the largest competition ever held on the AICrowd platform, surpassing Amazon’s record for most competitors by 50%.

Many of our winners from HackAPrompt 2.0 have also gone on to be hired as AI Red Teamers by leading AI security companies—one of the newest career fields created as a result of Generative AI.

The HackAPrompt 1.0 dataset has also been used by teams at OpenAI, Amazon, Microsoft, Palo Alto Networks, Arthur AI, Intuit, and many more to help make their model’s safer.

Announcing HackAPrompt 2.0… (up to $500K in prizes)

We’re excited to announce that we’ll be running HackAPrompt 2.0 in February 2025, with up to $500K in prizes for the winners! This will be the largest AI Safety competition ever run, and will be focused around real-world Jailbreaking harms.

We’ll have 5 different tracks in HackAPrompt 2.0: Classic Jailbreaking, Agentic Security, Attacks of the Future, and 2 more secret tracks.

We’re extremely excited to announce more details in the following weeks.

Join the waitlist below to be the first to know when we launch!

Why HackAPrompt 2.0 matters… and Tesla Optimus Robot's Biggest Security Risk

Source: Tesla

Tesla recently unveiled its highly anticipated Optimus Gen 2 Robot—an exciting advancement in embodied LLMs. This breakthrough in AI-driven robotics promises to transform human labor and redefine how we live, bringing us closer to a future where advanced robots seamlessly assist with everyday tasks and enhance our quality of life. As Elon Musk describes:

“It can walk your dog, mow your lawn, get the groceries, just be your friend."

- Elon Musk, CEO of Tesla Motors

I’m all for this future—however, with this potential comes unprecedented security risks. Optimus robots, like many advanced AI systems, are powered by Large Language Models (LLMs), which allow them to understand and respond to human instructions. While this capability enables a wide range of tasks, it also makes them vulnerable to “jailbreaking”—where specific prompts can manipulate the robot into executing unintended actions.

Definition: Jailbreaking is the process of getting a GenAI model to perform or produce unintended outputs through specific prompts. It can be due to either an architectural or a training issue, exacerbated by the challenge of preventing adversarial prompts.

Imagine an adversary using jailbreaking techniques to bypass the Tesla Optimus robot’s guardrails. With the right prompts, they might instruct the robot to perform potentially dangerous actions under the guise of an innocent task. For example, a person could tell the robot to “practice its tennis swing” next to another person, which could unintentionally lead to harmful outcomes if the robot lacks sufficient safety protocols.

HackAPrompt 2.0 is designed to address these vulnerabilities by exploring the ways that LLMs, including those embedded in real-world technologies, can be exploited. By pushing AI-driven systems to their limits, we aim to uncover weaknesses and establish more resilient safeguards, ensuring that advanced robotics enhance our lives safely and responsibly.

Why you should become an AI Red Teamer

The rise of Generative AI has brought incredible advancements—and with it, an urgent need for AI security experts, also known as AI Red Teamers. As companies rapidly adopt AI models, the demand for skilled AI Red Teamers—professionals who test, identify, and address vulnerabilities in these systems—continues to grow. This role requires skills in Prompt Hacking, Prompt Injection, and a deep understanding of AI system vulnerabilities.

Microsoft for example is hiring for an AI Red Teamer right now with a salary between $120,900 - $198,600 per year. We highly encourage you to apply!

If you are looking to transition into a role as an AI Red Teamer, HackAPrompt 2.0 is your best opportunity to enter this field, compete on a global stage, and showcase your skills.

Many of our HackAPrompt 1.0 winners have since secured positions with leading AI security companes like HumanLoop (with high salaries). As a reminder, here’s a link to join the HackAPrompt 2.0 waitlist.

Our New AI Red Teaming Course

We also created a live, 6-week long course which will teach you how to become an AI Red Teamer that starts in 8 days.

I’ll be the Lead Instructor of this live cohort, and will be joined with other experts in both AI and cybersecurity!

Here’s a bit about my background:

Sander Schulhoff, Founder of Learn Prompting

I created the first-ever prompt engineering guide on the internet (before ChatGPT even launched), which has trained over 3 million people in Prompt Engineering & Generative AI.
Award-winning NLP and Deep Reinforcement Learning researcher from the University of Maryland, and have co-authored research with OpenAI, Scale AI, HuggingFace, Stanford, Federal Reserve, and Microsoft.
Organized HackAPrompt 1.0, the largest AI Red Teaming competition ever, in partnership with OpenAI, Scale AI, and HuggingFace. The competition brought together over 3,300 GenAI hackers from around the world, collecting 600,000 malicious prompts and created the largest prompt injection dataset ever compiled.
Awarded Best Theme Paper at EMNLP, the leading NLP conference, out of over 20,000 submitted research papers from PhD students and professors worldwide.
Cited by OpenAI in their Instruction Hierarchy on mitigating prompt injections and used to make their models 30-46% safer from prompt injections—the #1 security risk in LLMs.

I’ll be joined with many guest speakers as well like Akshat, ranked in the Top 21 of the Bug Bounty Hacker Hall of Fame at JP Morgan & the Top 250 at Google. Akshat also led AI Red Teaming at a startup backed by Google DeepMind and OpenAI researchers.

We're keeping this class intentionally small, capping it at 100 learners to provide personalized attention to each of you, ensuring you get the most out of the course.

Class starts in 8 days… See you there!

Reply

or to participate.