Claude 3 Opus Might Be the Best Model I've Used

march 4th: anthropic dropped claude 3.

there are three models now: haiku (fast and cheap), sonnet (balanced), and opus (the big one).

i've been using opus for a few days and... yeah. this is good.

what's changed

multimodal. claude can see images now. you can share screenshots, diagrams, photos. this is huge for my workflow.

better reasoning. complex prompts that used to confuse claude 2 are handled smoothly. it follows multi-step instructions more reliably.

fewer refusals. the old "i can't help with that" for harmless requests is mostly gone. it's helpful when you need it to be.

more natural. hard to explain, but conversations feel more... fluid? less robotic? the personality is more consistent.

threw my thesis draft at it (again). asked it to find weaknesses in my methodology.

it found three things i hadn't considered. one was a genuine oversight that i need to address.

also asked it to explain a paper i was struggling with. it didn't just summarize—it walked me through the technical details step by step.

claude 3 opus basically tops the leaderboards.

obviously benchmarks aren't everything. but they're not nothing either.

i applied to anthropic a month ago. i'm clearly team claude at this point.

but here's the thing: i was team claude before i applied. the application was because of my experience with their models.

so is this bias circular? maybe. but i'm also right. (i think.)

openai still has mindshare and distribution. chatgpt is the default for most people.

but anthropic is catching up technically, and they're doing it without compromising on safety and values.

if this is what they can do with fewer resources, imagine what happens when they scale further.

what does this change for my application? probably nothing. they're not evaluating based on fandom.

but it does make me more excited about the possibility. if this is the kind of work they're doing, i want to be part of it.

opus is my new default for everything. sorry gpt-4. it's not you, it's me (and claude).