
Claude 3 Opus Might Be the Best Model I've Used

AI, anthropic, claude

march 4th: anthropic dropped claude 3.

there are three models now: haiku (fast and cheap), sonnet (balanced), and opus (the big one).

i've been using opus for a few days and... yeah. this is good.

what's changed

multimodal. claude can see images now: you can share screenshots, diagrams, and photos. this is huge for my workflow.

better reasoning. complex prompts that used to confuse claude 2 are now handled smoothly. it follows multi-step instructions more reliably.

fewer refusals. the old "i can't help with that" for harmless requests is mostly gone. it's helpful when you need it to be.

more natural. hard to explain, but conversations feel more... fluid? less robotic? the personality is more consistent.

my experiments

threw my thesis draft at it (again). asked it to find weaknesses in my methodology.

it found three things i hadn't considered. one was a genuine oversight that i need to address.

also asked it to explain a paper i was struggling with. it didn't just summarize; it walked me through the technical details step by step.

the benchmarks (for nerds)

going by anthropic's own published numbers, claude 3 opus basically tops the leaderboards:

  • beats gpt-4 on most reasoning benchmarks
  • higher coding scores (HumanEval)
  • edges out gpt-4 on MMLU (massive multitask language understanding), though only slightly

obviously benchmarks aren't everything. but they're not nothing either.

my bias disclosure (again)

i applied to anthropic a month ago. i'm clearly team claude at this point.

but here's the thing: i was team claude before i applied. i applied because of my experience with their models.

so is this bias circular? maybe. but i'm also right. (i think.)

the competitive landscape

openai still has mindshare and distribution. chatgpt is the default for most people.

but anthropic is catching up technically, and they're doing it without compromising on safety and values.

if this is what they can do with fewer resources, imagine what happens when they scale further.

what this means for my application

probably nothing. they're not evaluating based on fandom.

but it does make me more excited about the possibility. if this is the kind of work they're doing, i want to be part of it.


opus is my new default for everything. sorry gpt-4. it's not you, it's me (and claude).