I did not expect to be impressed. I had not used ChatGPT Codex models or the harness much until now. I was pretty darn happy with Claude, especially with OpenCode as its harness.
Boy was I missing out.
Codex is slow, but it just never misses. Prompting it feels far more natural than with any other model I have used. I see now why people flame the whole plan-to-implement flow. With Codex, it just happens. It is just a conversation.
My go-to will still probably be a faster model. I just cannot bear to wait so long for quick commands or edits. But for complex tasks where precision matters, like getting a finicky build or refactor right on the first pass, Codex is starting to look very compelling.
The harness itself is surprisingly good as well. I am especially a fan of the default sandboxed environment. It has already saved me from a few gaffes. It can be annoying, especially when it refuses to use certain tools or make network requests. Despite this, I think the industry will move more in the direction of sandboxing. Vibe coding is all fun and games until you rm -rf some critical directory without realizing it.
I used Codex to make this little CLI to manage my Langfuse prompts. It took about 10 minutes once I gave it the spec. I never really had to guide it other than for some custom output needs.
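To give a rough sense of what I mean, here is a minimal sketch of the kind of prompt-management CLI I am talking about. This is not the actual tool Codex wrote for me; the pull/push subcommands and file layout are my own assumptions, and the Langfuse calls (get_prompt, create_prompt) are from the Langfuse Python SDK as I understand it.

```python
"""Minimal sketch of a Langfuse prompt-management CLI (illustration only)."""
import argparse
import pathlib

from langfuse import Langfuse  # pip install langfuse

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, LANGFUSE_HOST from the environment.
client = Langfuse()


def pull(name: str, out_dir: pathlib.Path) -> None:
    # Fetch the current version of a prompt and write its text to a local file.
    prompt = client.get_prompt(name)
    path = out_dir / f"{name}.txt"
    path.write_text(prompt.prompt)
    print(f"wrote {path} (version {prompt.version})")


def push(name: str, file: pathlib.Path) -> None:
    # Upload the local file as a new prompt version labeled "production".
    client.create_prompt(name=name, prompt=file.read_text(), labels=["production"])
    print(f"pushed {name} from {file}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Manage Langfuse prompts from the terminal.")
    sub = parser.add_subparsers(dest="command", required=True)

    p_pull = sub.add_parser("pull", help="download a prompt to a local file")
    p_pull.add_argument("name")
    p_pull.add_argument("--out-dir", type=pathlib.Path, default=pathlib.Path("."))

    p_push = sub.add_parser("push", help="upload a local file as a new prompt version")
    p_push.add_argument("name")
    p_push.add_argument("file", type=pathlib.Path)

    args = parser.parse_args()
    if args.command == "pull":
        pull(args.name, args.out_dir)
    else:
        push(args.name, args.file)


if __name__ == "__main__":
    main()
```

Something in this shape, plus the custom output formatting I asked for, is basically what Codex handed back from the spec.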
