Making AI safe at Anthropic

I'm Isa, a research engineer on the Alignment Science team. I build experiments to understand and steer the behavior of powerful AI systems.

About me Get in touch

Ask me anything about AI

A research engineer working to keep AI safe as it becomes more powerful.

I run experiments to figure out how AI systems might fail, and how to stop that from happening. Based in San Francisco.

Model organisms of misalignment

Training models to misbehave on purpose

Safety evaluations

Testing if our defenses actually work

Multi-agent experiments

AI Debate and scalable oversight

Featured Projects

Scalable Oversight

Keeping highly capable models helpful and honest

Alignment Stress-testing

Building model organisms to study misalignment

Safeguards Research

Developing robust defenses against adversarial attacks

View all projects

Alignment Playground

Learn how AI alignment works through interactive demos. RLHF training, spotting misalignment, Constitutional AI, and red-teaming - all explained through hands-on exercises.

Recent Writing

All posts

PersonalDec 20, 2025

Training Models to Misbehave

Currently working on

Training model organisms to test if our safety techniques actually work

Running AI Debate experiments for scalable oversight

Building red-teaming tools to stress-test model safety

Want to chat about AI safety, alignment research, or career advice?

Always happy to connect with people interested in AI safety.

Get in touch Read my story