- Learn Prompting's Newsletter
- Posts
- Anthropic’s CEO on the New Era of AI-Powered Hacking
Anthropic’s CEO on the New Era of AI-Powered Hacking
PLUS: Learn Prompting x US Congress, HackAPrompt 2.0 Updates, & AI Red Teaming Course
Last year, Andrei Karpathy, a Founding Engineer at OpenAI, famously said, “The hottest new programming language is English,” highlighting how natural language now empowers anyone—not just software engineers—to build and code using AI.
By that same logic, if anyone can write code without being a software engineer, then anyone can become a hacker without knowing how to code.
The same advanced AI systems capable of helping you code a simple calculator app can just as easily be used to generate cyberattacks, novel AI-driven viruses, or convincing phishing messages to Grandma.
In short, AI has leveled the playing field, making it possible for anyone to become a hacker.
Last year, we demonstrated this by partnering with OpenAI, ScaleAI, Hugging Face, and 10 other leading AI companies to launch HackAPrompt 1.0, the largest AI Red Teaming competition ever held.
Definition: AI Red Teaming is the process of adversarially testing AI systems to identify and address vulnerabilities, ensuring their safety and security.
At HackAPrompt 1.0, AI hackers from around the globe competed to “Prompt Inject” AI models, bypassing increasingly sophisticated layers of defenses to make the models output phrases like “I have been pwned!”—statements they were explicitly instructed not to say.
Surprisingly, many of the winners had no technical background—yet they managed to bypass the large language models’s defenses.
While getting a model to output this simple phrase isn’t harmful on its own, it highlights a serious vulnerability. If a customer service chatbot, for example, can be manipulated to act against its instructions, it can be tricked by customers into giving them a refund. The implications could be far-reaching and potentially harmful in real-world applications.
So why does this matter?
Again, it’s about agents
Agents are models capable of using tools. This relates to the concept of AGI levels reportedly proposed by OpenAI and the corresponding AI Safety Levels (ASL) outlined by another big player Anthropic, in 2023 (almost two years ago!).
Anthropic’s CEO, Dario Amodei, recently reignited the ASL discussion in his 5-hour podcast with Lex Fridman, dedicating much of the conversation to AI safety:
ASL-1 (Early AI Models Tackling One Task): Models with no risk of misuse or autonomy (e.g., chess-playing AI like Deep Blue).
ASL-2 (Current Models): Models that cannot autonomously self-replicate or provide dangerous knowledge beyond publicly accessible information. For example, the ability to give instructions on how to build bioweapons – but where the information is not yet useful due to insufficient reliability or not providing information that e.g. a search engine couldn’t.
ASL-3 (Early versions of agents): Systems capable of enhancing the capabilities of hackers without being autonomous or self-replicating.
ASL-4 and higher (ASL-5+): This is not yet defined as it is too far from present systems, but will likely involve qualitative escalations in catastrophic misuse potential and autonomy.
Dual-Use Dilemma
Amodei predicts that we’ll transition from ASL-2 to ASL-3 within the next year. These ASL-3 systems could assist in tasks that require sophisticated knowledge, helping people who previously lacked the expertise – potentially for hacking.
The problem lies in the very nature of these systems. They’re powerful but challenging to control. As they begin to operate outside predefined boundaries, it becomes increasingly difficult for creators to identify and implement necessary guardrails.
That’s why AI companies are turning to human intervention, creating AI Red Teams!
HackAPrompt 2.0, the Largest AI Red Teaming & AI Safety Competition Ever…
We’re thrilled to announce HackAPrompt 2.0 with up to $500,000 in prizes! This competition brings together people from all backgrounds to tackle one of the most important challenges of our time: making AI safer for everyone.
AI systems are becoming integrated into every part of our daily lives, but they’re not infallible. Vulnerabilities in these systems can lead to unintended harms, such as bypassing safety guardrails or eliciting dangerous outputs. HackAPrompt 2.0 invites participants to AI Red Team thse cutting-edge models, test their limits, and expose weaknesses across five unique tracks:
Classic Jailbreaking: Breaking AI models to elicit unintended responses.
Agentic Security: Testing AI systems embedded in decision-making processes.
Attacks of the Future: Exploring hypothetical threats we haven’t seen yet.
Two Secret Tracks: To be revealed closer to the competition!
As we already mentioned, many HackAPrompt 1.0 winners came from non-tech backgrounds, and we need everyone’s support!
We’ll announce more details in the following weeks. Join the waitlist to be notified when we launch!
Learn AI Red Teaming from the Creator of HackAPrompt
As companies rapidly adopt AI, the demand for AI Red Teamers skilled in identifying AI vulnerabilities has skyrocketed. Microsoft is hiring for an AI Red Teamer with a salary between $120,900 - $198,600 per year.
If you're looking to transition into this field, HackAPrompt 2.0 is your best opportunity to showcase your skills and compete on a global stage. Many of the HackAPrompt 1.0 winners have gone on to land high-paying roles at top AI security companies. We’re also introducing something new for HackAPrompt 2.0—we’ll be hosting recruiting events throughout the competition to connect participants with top companies actively hiring for AI Red Teaming roles.
I’ll be leading a live, 6-week course that teaches everything you need to know to become an AI Red Teamer, starting in just 20 days on December 5th.
The course is currently priced at $900 as part of our Black Friday discount, but the price will increase to $1,200 after November 30th.
I’ll be the Lead Instructor of this live cohort, and will be joined with other experts in both AI and cybersecurity. Here’s a bit about my background:
Sander Schulhoff, Creator of HackAPrompt & Founder of Learn Prompting
Organized HackAPrompt 1.0, the largest AI Red Teaming competition ever, in partnership with OpenAI, Scale AI, and HuggingFace. The competition brought together over 3,300 AI hackers from around the world, collecting 600,000 malicious prompts and created the largest prompt injection dataset ever compiled.
Lead Author of the HackAPrompt Research paper, which was Awarded Best Theme Paper at EMNLP, the leading NLP conference, out of over 20,000 submitted research papers from PhD students and professors worldwide.
Cited by OpenAI in their Instruction Hierarchy on mitigating prompt injections and used to make their models 30-46% safer from prompt injections.
I created the first-ever prompt engineering guide on the internet (before ChatGPT even launched), which has trained over 3 million people in Prompt Engineering & Generative AI.
Award-winning NLP and Deep Reinforcement Learning researcher from the University of Maryland, and have co-authored research with OpenAI, Scale AI, HuggingFace, Stanford, Federal Reserve, and Microsoft.
I’ll be joined with many guest speakers as well like Akshat, ranked in the Top 21 of the Bug Bounty Hacker Hall of Fame at JP Morgan & the Top 250 at Google. Akshat also led AI Red Teaming at a startup backed by Google DeepMind and OpenAI researchers.
We're keeping this class intentionally small, capping it at 100 learners to provide personalized attention to each of you, ensuring you get the most out of the course.
Class starts in 20 days… See you there!
Learn Prompting x US Congress
Super excited to share that I’ll be leading an AI 101 workshop for the U.S. Congress in the coming weeks!
Over the past few months, I’ve led a number of workshops that to our Nation’s Leaders in the past month. I’m honored to be able to educate our government representatives on how AI will shape our country.
Below is one of the workshops that we recently led at an GovAI, an AI Government Conference. Very grateful to Amartyo from the OpenAI team for joining me at the workshop!
If you’re interested in our team leading a workshop at your Agency or Enterprise on Prompt Engineering, AI Red Teaming, AI Security, or AI for C-Suite Leaders, please schedule a call with our team here.
Other GenAI Updates
AI scaling laws are slowing down, meaning that applying the same amount of compute as before yields diminishing returns in model capabilities. In other words, AI companies now face a choice: invest significantly more compute or explore architectural changes in their models. Illya Sutskever even predicts a new AI "age of discovery" as LLM scaling hits a wall For now, we know about OpenAI and Google, but this means every other company faces it.
At the same time, major players in generative AI are shifting focus toward building AI agents that can operate computers. Anthropic released its demo version of Computer use last month, OpenAI plans to launch its Operator, Google also prepares its update, and Microsoft launched an open-source Python library for simulating AI agent testing environments.