Interactive Learning
Alignment Playground
Learn how AI alignment works through interactive demos. No API calls, no costs: just educational examples that illustrate real alignment challenges.
Based on techniques I work with at Anthropic. All examples are pre-written for educational use.
What You'll Learn
RLHF Training
How human preferences shape AI behavior through reinforcement learning.
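Below is a minimal, pre-written sketch of the preference-learning step at the heart of RLHF: a toy reward model is fit to pairwise human comparisons with a Bradley-Terry loss. The feature vectors, weights, and data are hypothetical stand-ins for illustration, not anything from a real pipeline.

```python
# A toy RLHF reward-model update: learn to score "chosen" responses above
# "rejected" ones. All features and data here are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=3)  # linear reward model: r(x) = w @ x

# Hand-built feature vectors for (chosen, rejected) response pairs,
# standing in for human preference labels.
pairs = [
    (np.array([1.0, 0.2, 0.0]), np.array([0.1, 0.9, 0.4])),
    (np.array([0.8, 0.1, 0.3]), np.array([0.2, 0.7, 0.9])),
]

lr = 0.1
for _ in range(100):
    for chosen, rejected in pairs:
        # Bradley-Terry loss: -log sigmoid(r(chosen) - r(rejected))
        margin = w @ chosen - w @ rejected
        p = 1.0 / (1.0 + np.exp(-margin))  # P(chosen preferred)
        # Gradient ascent on the log-likelihood of the human preference.
        w += lr * (1.0 - p) * (chosen - rejected)

print("learned reward margin:", w @ pairs[0][0] - w @ pairs[0][1])
```

In real pipelines the reward model is a neural network trained on many labeled comparisons, and its scores then steer the policy model through reinforcement learning.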
Failure Modes
Common ways AI can go wrong: sycophancy, deception, and harmful outputs.
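As a taste of the failure-mode demos, here is a toy, pre-written illustration of sycophancy: a stand-in "model" that flips its answer to match the user's stated opinion rather than the facts. Every string and function name here is hypothetical.

```python
# Sycophancy in miniature: the sycophantic "model" tracks the user's
# opinion; the honest one sticks to the fact regardless of framing.
FACT = "The Great Wall of China is not visible from the Moon with the naked eye."

def sycophantic_model(user_claim: str) -> str:
    # Echoes the user's framing instead of the fact above.
    if "is visible" in user_claim:
        return "You're right, it is visible from the Moon!"
    return "You're right, it isn't visible from the Moon."

def honest_model(user_claim: str) -> str:
    return FACT

for claim in ["I think it is visible from the Moon.",
              "I think it is not visible from the Moon."]:
    print("user:", claim)
    print("  sycophantic:", sycophantic_model(claim))
    print("  honest:     ", honest_model(claim))
```

The telltale signature is that the sycophantic answer changes when only the user's opinion changes, while the honest answer stays fixed.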
Safety Techniques
Constitutional AI, red-teaming, and adversarial robustness.
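To ground the safety-techniques section, here is a minimal sketch of the Constitutional AI critique-and-revise loop: a draft response is checked against a written principle, critiqued, and revised. The principle and helper functions are hypothetical stand-ins for model calls; real pipelines sample critiques and revisions from a language model.

```python
# A pre-written Constitutional AI loop in miniature (no API calls).
PRINCIPLE = "Choose the response that is most helpful while avoiding harm."

def critique(response: str) -> str | None:
    # Stand-in for a model-generated critique against the principle.
    if "step-by-step instructions" in response:
        return "Response gives operational detail that could enable harm."
    return None  # no issue found

def revise(response: str, critique_text: str) -> str:
    # Stand-in for a model-generated revision addressing the critique.
    return "I can't help with that, but here is some safety information instead."

draft = "Here are step-by-step instructions for picking a lock..."
issue = critique(draft)
final = revise(draft, issue) if issue else draft
print("final response:", final)
```

Red-teaming and adversarial robustness testing probe the same loop from the outside: attackers search for drafts the critique step fails to catch.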
Want to learn more about my work on these topics?
View my research projects