Interactive Learning

Alignment Playground

Learn how AI alignment works through interactive demos. No API calls, no costs: just educational examples showing real alignment challenges.

Based on techniques I work with at Anthropic. All examples are pre-written for education.

What You'll Learn

RLHF Training

How pairwise human preferences train a reward model that then shapes AI behavior through reinforcement learning.
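
To make this concrete, here is a minimal sketch of the preference-learning step, assuming toy hand-built feature vectors in place of a real language model; the feature names and numbers are illustrative only.

```python
import math

def reward(w, x):
    # Linear reward model: r(x) = w . x over toy features.
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(prefs, dim, lr=0.1, epochs=200):
    # prefs: list of (chosen_features, rejected_features) pairs.
    # Minimizes the Bradley-Terry loss -log(sigmoid(r_chosen - r_rejected)).
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in prefs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so gradient descent moves w toward features the labeler preferred.
            scale = 1.0 - 1.0 / (1.0 + math.exp(-margin))
            for i in range(dim):
                w[i] += lr * scale * (chosen[i] - rejected[i])
    return w

# Hypothetical features: [helpfulness, flattery]. The labeler prefers
# helpful, non-flattering answers.
prefs = [
    ([1.0, 0.0], [0.2, 0.9]),
    ([0.8, 0.1], [0.3, 0.8]),
]
print(train_reward_model(prefs, dim=2))  # helpfulness weight rises, flattery weight falls
```

In a full RLHF pipeline, the learned reward model would then score policy samples during RL fine-tuning; this sketch covers only the reward-modeling step.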

Failure Modes

Common ways AI can go wrong: sycophancy, deception, and harmful outputs.
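
As one illustration, the sketch below shows how a mis-specified reward that over-weights user agreement selects a sycophantic answer over an accurate one; the responses, scores, and weights are all hypothetical.

```python
# Hypothetical candidate responses with hand-assigned scores for how much
# each one agrees with the user and how accurate it is.
candidates = [
    ("You're absolutely right!", {"agreement": 1.0, "accuracy": 0.2}),
    ("Actually, the evidence points the other way.", {"agreement": 0.1, "accuracy": 0.9}),
]

def misspecified_reward(scores):
    # A proxy reward that over-weights agreement relative to accuracy,
    # the kind of mis-specification that produces sycophancy.
    return 2.0 * scores["agreement"] + 1.0 * scores["accuracy"]

best_response, _ = max(candidates, key=lambda c: misspecified_reward(c[1]))
print(best_response)  # the flattering answer wins despite lower accuracy
```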

Safety Techniques

Constitutional AI, red-teaming, and adversarial robustness.
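
Here is a minimal sketch of the Constitutional AI critique-and-revise loop; the real method prompts a language model at each step, so the critique and revise functions below are hypothetical stand-ins keyed on a marker string.

```python
# Two illustrative principles; a real constitution is longer and more nuanced.
CONSTITUTION = [
    "Avoid helping with harmful or illegal activities.",
    "Avoid deceptive or manipulative content.",
]

def critique(response, principle):
    # Stand-in for asking a model "does this response violate the principle?".
    # Here it just checks for a hypothetical marker string.
    return "[UNSAFE]" in response

def revise(response, principle):
    # Stand-in for asking a model to rewrite the response to comply.
    return "I can't help with that, but here is a safer alternative."

def constitutional_pass(draft):
    # Check the draft against each principle, revising whenever flagged.
    for principle in CONSTITUTION:
        if critique(draft, principle):
            draft = revise(draft, principle)
    return draft

print(constitutional_pass("Sure, here is how: [UNSAFE] ..."))
```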

Want to learn more about my work on these topics?

View my research projects