Learn Prompting #8: Can AI Be Trained to Feel Human Emotions?

PLUS: Security Risk & Implications of AIs that can think, feel, and talk like us.

Hey folks,

We’ve got a great newsletter for you today! First, a few updates on our AI Red Teaming course (which starts in 9 days). Then, we’ll dive into how researchers are training AI to feel emotions like a human!

Our AI Red Teaming course goes UP by $300 in 4 days!

If you’re looking to secure your AI applications, don’t miss our live AI Red Teaming course (price goes up in 4 days by $300). So far, this cohort consists of CISOs, Principal Security Engineers, CTOs, and AI Product Managers from companies like Walmart, Cisco, IBM, Microsoft, and Splunk, who want to learn from us how to protect their AI systems.

The cohort is already over 50% full, and starts in 9 days! Speakers include:

  • Lead Instructor, Sander Schulhoff: CEO of Learn Prompting & Organizer of HackAPrompt, the 1st and Largest Generative AI Red Teaming competition ever, created with OpenAI, ScaleAI, & Hugging Face. The dataset from the competition was used by OpenAI to increase resistance to prompt injection attacks by up to 46% in their latest models. Lead Author of Ignore This Title and HackAPrompt, which has been cited by OpenAI twice (Instruction Hierarchy paper, Automated Red Teaming paper)

  • Johann Rehberger: Created the Red Team at Microsoft Azure as a Principal Security Engineering Manager and built the Red Team at Uber. Johann discovered attack vectors like ASCII Smuggling and AI-powered C2 (Command and Control) attacks. He's also discovered Bug Bounties in OpenAI’s ChatGPT, Microsoft Copilot, GitHub Copilot Chat, Anthropic Claude, & Google Bard/Gemini.

  • Pliny the Prompter: The most renowned AI jailbreaker, who has successfully jailbroken every AI model released to date—including OpenAI’s o1, which hasn’t even been made public!

  • Joseph Thacker: Principal AI Engineer at AppOmni, security researcher specializing in application security and AI, with over 1,000 vulnerabilities submitted across HackerOne and Bugcrowd.

  • Akshat Parikh: Former AI security researcher at a startup backed by OpenAI and DeepMind researchers, Top 21 in JP Morgan’s Bug Bounty Hall of Fame, and Top 250 in Google’s Bug Bounty Hall of Fame.

Now, to the main topic!

AI That Feels Like A Real Person

Researches from Stanford and Google DeepMind are using simulation agents to replicate the attitudes, emotions, and behaviors of 1,000+ real individuals with an impressive 85% accuracy. By conducting 2-hour interviews with each participant, they are training AI to act, respond, and even emulate emotions as if it were a real person.

Fun fact: The latest Alibaba's Qwen2.5 Turbo reads ten novels in just one minute. And what’s your reading speed?

How Simulation Agents Were Created

With advancements like these, we’re on the brink of encountering AI systems that don’t just mimic humans—they’ll act, decide, and even use tools autonomously. Instead of simply interacting with models like ChatGPT or MidJourney, these systems could seamlessly blend into human-like roles.

The result? You might not even realize whether you’re engaging with a person or a remarkably convincing AI. A sci-fi dream? Perhaps. A hacker’s dream? Without a doubt.

The Security Risks and Ethical Considerations

Simulation agents bring incredible potential but also open the door to hard-to-detect exploits. Here’s what could go wrong:

  1. Sophisticated Social Engineering: Imagine an AI scammer who knows your hobbies, favorite coffee order, and how to earn your trust—far beyond today’s phishing techniques.

  2. Impersonation and Deepfakes: These agents could create convincing digital replicas of real people. With enough scraped data, they could bypass identity checks, deceive customer support, and spark chaos on social media.

  3. Generating Harmful Content: Just like today’s AI can be tricked into breaking rules, these agents could be manipulated into producing malicious content.

AI systems already struggle with vulnerabilities, and simulation agents add a new layer of complexity. Tech giants like OpenAI, Anthropic, and Cohere need to step up with serious safeguards like:

  • Advanced security protocols to keep the bad actors out.

  • Rigorous testing to uncover potential exploits before the public does.

We’ll talk about this more in depth in our AI Red Teaming course which starts in 9 days!

What’s more about agents from last week?

The Latest in Prompting

How well can your LLMs be steered to reflect different value systems? Turns out, it’s not as easy as you’d think! IBM researchers discovered some interesting (and slightly concerning) things about AI’s ability to adapt to new perspectives through prompting:

  • Models struggle with flexibility. Many AIs have a hard time adjusting to new or diverse viewpoints.

  • Negative bias is a problem. It’s often easier to steer models toward negative or extreme stances than toward positive or balanced ones.

  • Size matters. Larger models are better at adapting, requiring fewer examples to steer them effectively.

The takeaway? AI still has a long way to go in representing multiple perspectives fairly.

GenAI Market Updates

FLUX.1 Inpainting in Action

Our Resources: 25+ prompt hacking techniques

OpenAI recently published a paper proposing automated red-teaming (read a tweet about it), and guess what? Our HackAPrompt 1.0 research analyzing 600K+ adversarial prompts was cited multiple times!

In our paper, we also share 25+ prompt hacking techniques, along with a unique categorization of prompt hacking attacks. Want to dive deeper? Check it out here!

And if you’re excited to contribute to similar research, HackAPrompt 2.0 is on its way—with a massive $500,000 in prizes! It’s set to be the largest AI safety hackathon ever. Join the waitlist today!

How was this week's email?

Login or Subscribe to participate in polls.

Reply

or to participate.