Andrej Karpathy: From Vibe Coding to Agentic Engineering w/ Stephanie Zhan
Andrej Karpathy on Software 3.0, Vibe Coding, and Why Understanding Still Wins
When one of the people who helped build modern AI says he's "never felt more behind as a programmer," it's worth asking what he saw. On stage with Sequoia's Stephanie Zhan, Andrej Karpathy unpacked that line and walked through where coding, computing, and human skill are heading. The throughline: the tools changed suddenly, the paradigm is genuinely new, and the one thing you still can't hand off is understanding.
Everything below reflects what Karpathy said in the conversation.
The December That Changed Everything
Karpathy's "never felt more behind" comment wasn't despair — it was a description of a step change. He'd been using agentic coding tools for about a year; they were helpful but error-prone. Then, around December, on a break with more time to experiment, he noticed the latest models' code chunks "just came out fine." He kept asking for more, kept getting clean results, and eventually couldn't remember the last time he had to correct the output. He was vibe coding, and his side-projects folder exploded.
His broader point: a lot of people experienced AI in 2024 as a ChatGPT-style chatbot and formed their impression then. He urged them to look again, because the agentic, coherent workflow that started genuinely working around December is a different thing.
Software 1.0, 2.0, and 3.0
To explain why this feels new, Karpathy returned to his framing of three eras. Software 1.0 is writing explicit code. Software 2.0 is "programming" by curating datasets and training neural networks. Software 3.0 is prompting: the model becomes a kind of programmable computer, and what you put in the context window is your lever over it.
He made it concrete with an installation example. Installing a certain tool used to mean running a shell script that balloons in complexity to cover every platform. The 3.0 version is just a block of text you copy-paste to your agent, which inspects your environment and debugs in the loop. The "program" is now the text you hand the agent.
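A minimal sketch of that contrast, in Python. Everything in it is a hypothetical stand-in: the tool name `sometool`, the per-platform commands, and the wording of the prompt are illustrative assumptions, not the tool or the text Karpathy described.

```python
# Software 1.0: every platform is an explicit branch that someone has to
# write and maintain. "sometool" and these commands are placeholders.
def install_command(system: str) -> str:
    if system == "Darwin":
        return "brew install sometool"
    if system == "Linux":
        return "sudo apt-get install -y sometool"
    if system == "Windows":
        return "winget install sometool"
    raise ValueError(f"unsupported platform: {system}")


# Software 3.0: the "program" is text handed to an agent, which inspects the
# environment and does the branching and debugging itself.
# The wording below is an assumption made for illustration.
INSTALL_PROMPT = """\
Install sometool on this machine.
Detect the operating system and package manager first.
If a step fails, read the error, fix the cause, and retry.
Finish by running `sometool --version` to confirm it works.
"""

print(install_command("Linux"))
print(INSTALL_PROMPT)
```

The 1.0 function grows with every platform and edge case it has to cover; the 3.0 prompt stays short because the handling of those cases moves into the agent.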
MenuGen and the App That "Shouldn't Exist"
The most vivid example was MenuGen. Karpathy wanted to photograph a restaurant menu and see pictures of unfamiliar dishes, so he built an app on Vercel that does OCR, calls an image generator, and re-renders the menu with pictures. Then he saw the software-3.0 version, and it blew his mind: just give the photo to Gemini and ask its image model, which he referred to as "Nano Banana," to render the dishes directly onto the menu image. The model returned exactly that.
His conclusion was striking: his whole app "shouldn't exist." The neural network does the work; the prompt is the image, the output is the image, and there's no app in between. He pushed the point further: this isn't only about writing code faster. His LLM knowledge-bases project turns documents into a wiki, recompiling facts into a new framing that no traditional program could produce. The exciting part, he said, is the genuinely new capabilities, not just speedups.
As an extrapolation, he floated "neural computers": raw video or audio feeding a neural net that uses diffusion to render a UI unique to the moment. Early computing wavered between calculator-like and neural-net-like machines; we took the calculator path, but he thinks it may flip, with neural nets as the host process and CPUs as the co-processor.
Verifiability and Jagged Intelligence
Why are these models brilliant at some things and oddly dumb at others? Karpathy's answer is verifiability. Traditional computers automate what you can specify in code; LLMs automate what you can verify. Because frontier labs train them as giant reinforcement-learning systems with verification rewards, the models become "jagged" — peaking in verifiable domains like math and code, rougher everywhere else.
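To make "verifiable" concrete, here is a toy sketch in Python of a reward that can be checked mechanically. It is an illustration only, not a description of how any lab trains its models: the function name `verifiable_reward`, the squaring task, and the test cases are all assumptions.

```python
from typing import Callable, Iterable, Tuple


def verifiable_reward(candidate: Callable[[int], int],
                      cases: Iterable[Tuple[int, int]]) -> float:
    """Return 1.0 only if the candidate passes every (input, expected) case."""
    for x, expected in cases:
        try:
            if candidate(x) != expected:
                return 0.0
        except Exception:
            return 0.0  # a crash counts as a failed check
    return 1.0


# Hypothetical task: "square the input".
CASES = [(0, 0), (3, 9), (-4, 16)]


def correct(x: int) -> int:
    return x * x


def wrong(x: int) -> int:
    return x + x


print(verifiable_reward(correct, CASES))  # 1.0
print(verifiable_reward(wrong, CASES))    # 0.0
```

Math and code lend themselves to checks like this one. For a task such as "write a tasteful essay" there is no equally cheap function to call, which is one way to read why capability is rougher outside verifiable domains.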
His favorite illustration: a model will tell you to walk to a car wash 50 meters away, overlooking that the car has to get there too, even though a state-of-the-art model can refactor a 100,000-line codebase or find zero-day vulnerabilities. The lesson is to stay in the loop and treat the models as tools. He also noted that capabilities depend on what labs feed in: chess improved sharply across model versions, he said, largely because chess data entered the pre-training set. You're somewhat at the mercy of the data mix, so you have to explore which "circuits" you're in and consider fine-tuning when you fall outside them. Asked what's safe from automation, his eventual answer drew a laugh: "Everything is automatable."
Vibe Coding vs. Agentic Engineering
Karpathy drew a clean line between the term he coined last year and where we are now. Vibe coding raises the floor: anyone can build software. Agentic engineering preserves the ceiling: you keep the professional quality bar, don't introduce vulnerabilities, and stay responsible for your software, but, done properly, you go faster. Agents are spiky, fallible, and stochastic yet powerful, and coordinating them well is its own engineering discipline. He thinks the payoff dwarfs the old "10x engineer."
He extended this to hiring, arguing that the puzzle interview belongs to the old paradigm. A better test, in his telling, is handing someone a big project (build a secure app, simulate activity, then try hard to break it with agents) and watching how they use the tooling.
The Skill That Grows: Taste and Understanding
As agents do more, what becomes more valuable? Karpathy's answer was aesthetics, judgment, taste, and oversight. He treats agents like interns with excellent recall; he no longer memorizes fiddly API details. But he warned their code is often bloated, copy-pasted, and brittle, and shared a telling bug: his agent tried to match payments to users by email address instead of a persistent user ID. Humans have to own the spec, the plan, and the design.
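One plausible way that bug bites, sketched in Python. The conversation did not spell out the failure mode, so the scenario is an assumption: a user changes their email after paying. The `User` and `Payment` types and both matching functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class User:
    user_id: str  # persistent identifier, never changes
    email: str    # mutable: the user can update it at any time


@dataclass
class Payment:
    payment_id: str
    user_id: str  # persistent identifier recorded at checkout
    email: str    # email as it was at checkout


def match_by_email(payment: Payment, users: Dict[str, User]) -> Optional[User]:
    """Fragile: depends on a field that can change after the payment."""
    for user in users.values():
        if user.email == payment.email:
            return user
    return None


def match_by_user_id(payment: Payment, users: Dict[str, User]) -> Optional[User]:
    """Robust: looks the user up by the persistent key."""
    return users.get(payment.user_id)


users = {"u_1": User("u_1", "old@example.com")}
payment = Payment("p_1", "u_1", "old@example.com")

# The user updates their email after the payment was recorded.
users["u_1"].email = "new@example.com"

print(match_by_email(payment, users))    # None
print(match_by_user_id(payment, users))  # User(user_id='u_1', email='new@example.com')
```

Both versions pass a quick test with unchanged emails, so this kind of design error tends to surface in review of the spec rather than in the agent's own checks.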
He closed on education with a line he keeps thinking about: you can outsource your thinking, but you can't outsource your understanding. He feels he's become the bottleneck, the one who has to know what's worth building and how to direct his agents, and he uses knowledge bases to keep processing information himself. You can't be a good director, he said, without understanding, and that's the one thing the models don't yet do for you.
Originally published on Sequoia Capital. Watch the full episode: https://www.youtube.com/watch?v=96jN2OCOfLs