
Inside Claude Code: From Accidental CLI to AI Dev Tool

February 26, 2026 · 13 min read · 2,629 words
AI · Claude Code · Developer Tools · Software Engineering · Video Summary
Boris Cherny discussing the creation and evolution of Claude Code on Y Combinator's Lightcone podcast
Image: Screenshot from YouTube.

Key insights

  • Claude Code started as a throwaway terminal experiment to learn the Anthropic API; it was never intended to be a product
  • Anthropic's core design principle is to build for the model six months from now, not the model of today
  • Engineer productivity at Anthropic reportedly grew 150% since Claude Code launched, measured by pull requests per engineer
Source: YouTube
Published February 17, 2026
Y Combinator / Lightcone
Hosts: Jared Friedman, Harj Taggar, and Diana Hu
Guest: Boris Cherny (Creator of Claude Code, Anthropic)



In Brief

Boris Cherny, the creator of Claude Code, joins Y Combinator's Lightcone podcast to trace the tool's origin from a throwaway terminal experiment to a product that reportedly handles 4% of all public code commits (saved code changes) globally. The conversation covers how Anthropic builds for future model capabilities rather than current ones, why the terminal form factor survived against expectations, and what the engineering profession looks like when coding itself becomes "generally solved."

150%
productivity growth per engineer at Anthropic
4%
of all public commits made by Claude Code
100%
of Cherny's code written by Claude since Opus 4.5

The accidental origin story

Claude Code was not planned. Cherny explains that he joined Anthropic's labs team (the same team that produced Claude Code, Model Context Protocol (MCP), and the desktop app) with a vague mandate to explore coding products (3:02). Nobody asked him to build a command-line interface (CLI). He built one simply because it was the cheapest way to learn the Anthropic API (application programming interface, the way software talks to another service) without having to design a UI (4:27).

The breakthrough came when tool use (the ability for an AI model to call external tools and take actions) was released. Cherny gave the model a bash tool (literally the example from Anthropic's docs, ported from Python to TypeScript) and asked it what music he was listening to. The model wrote AppleScript to query his Mac's music player and returned the answer (5:09).
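The tool-use loop Cherny describes can be sketched offline. In the minimal Python sketch below, the model's reply is a hard-coded stand-in for a real API response, and `run_bash` is a hypothetical helper; only the tool-definition shape (a name, a description, and a JSON input schema) follows the format Anthropic documents for tool use.

```python
import subprocess

# Tool definition in the shape the Anthropic Messages API expects:
# a name, a description, and a JSON schema for the input.
BASH_TOOL = {
    "name": "bash",
    "description": "Run a shell command and return its output.",
    "input_schema": {
        "type": "object",
        "properties": {"command": {"type": "string"}},
        "required": ["command"],
    },
}

def run_bash(command: str) -> str:
    """Execute the command the model asked for and capture stdout."""
    result = subprocess.run(
        ["bash", "-c", command], capture_output=True, text=True, timeout=30
    )
    return result.stdout.strip()

# Hard-coded stand-in for a model reply requesting a tool call; a real
# client would receive this from the API after sending BASH_TOOL along.
model_reply = {"type": "tool_use", "name": "bash",
               "input": {"command": "echo 'Now Playing: ...'"}}

if model_reply["type"] == "tool_use" and model_reply["name"] == "bash":
    output = run_bash(model_reply["input"]["command"])
    # In a real loop, output would go back to the model as a tool_result.
    print(output)
```

The AppleScript trick in the anecdote falls out of exactly this shape: once the model can emit shell commands, querying the music player is just another `bash` call.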

This was Sonnet 3.5, and Cherny describes it as his first "feel the artificial general intelligence (AGI)" moment: the model simply wanted to use tools. That realization shaped everything that followed.

Two days after the first prototype, Cherny started sharing it with his team for dogfooding (using your own product internally before releasing it). The next morning, another engineer was already using it to write code, despite Cherny insisting it wasn't ready (6:13). When Dario Amodei saw the vertical internal adoption chart during the launch review, he reportedly asked whether engineers were being forced to use it. They weren't. It had spread entirely through word of mouth (6:50).


Why the terminal survived

Cherny admits that a year ago he would have predicted the terminal had a three-month lifespan before giving way to something more sophisticated (30:17). He has been wrong. The terminal persists because of a deeper design principle: any UI the team could build would be irrelevant within six months as models improved (8:41).

This doesn't mean Anthropic stopped experimenting. Claude Code now runs on the web, in the desktop app, on iOS and Android, in Slack, in GitHub, and through VS Code and JetBrains extensions. But the terminal remains the core surface.

Designing for it is harder than it looks. Cherny describes the constraints: roughly 80 by 100 characters, 256 colors, one font size, no mouse interactions (35:37). The terminal spinner alone went through an estimated 50 to 100 iterations, with roughly 80% never shipping (36:50).

The team prototyped mouse interactions in the terminal and abandoned them because virtualizing scrolling created trade-offs that felt worse than the limitation it was solving (35:56). The underlying technology is still built on escape codes (special character sequences that control how text appears on screen) from specifications dating to the 1960s.
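Those decades-old escape codes are still how a spinner redraws itself in place. The sketch below is illustrative, not Claude Code's actual spinner: it combines a carriage return (to overwrite the line) with an ANSI color sequence, and the frames and color are arbitrary choices.

```python
import itertools
import sys
import time

FRAMES = ["|", "/", "-", "\\"]  # classic four-frame spinner
COLOR = "\033[36m"              # ANSI escape: cyan foreground
RESET = "\033[0m"               # ANSI escape: reset attributes

def spin(seconds: float = 1.0, delay: float = 0.1) -> int:
    """Redraw a colored spinner in place; return the number of frames drawn."""
    drawn = 0
    for frame in itertools.cycle(FRAMES):
        if drawn * delay >= seconds:
            break
        # \r returns the cursor to column 0, so each frame overwrites the last
        sys.stdout.write(f"\r{COLOR}{frame}{RESET} working...")
        sys.stdout.flush()
        time.sleep(delay)
        drawn += 1
    sys.stdout.write("\r done.      \n")
    return drawn

spin(0.3)
```

Even this toy shows why iteration counts pile up: frame set, timing, color, and the clear-line behavior are all separate design decisions within a few lines of code.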


Build for the model six months from now

This is Cherny's central principle, repeated throughout the conversation. At Anthropic, the team explicitly does not build for current model capabilities. They build for where the model will be in six months (2:21).

The practical consequence: if you spend engineering effort building scaffolding (temporary helper code that compensates for current model limitations) to extend current model performance by 10-20% in some domain, the next model release typically makes that scaffolding unnecessary (8:17). Cherny frames this as a constant trade-off: invest engineering work now for incremental gains, or wait a couple of months and get the improvement for free from the model itself.

He cites a framed copy of Rich Sutton's "The Bitter Lesson" hanging in the Claude Code team's workspace. The core idea: never bet against the model (38:22).

The Claude Code codebase reflects this philosophy. Every part of it has been rewritten. Tools are added and removed every couple of weeks. Cherny estimates that no code in the current codebase existed six months ago (39:37).


Latent demand: the product philosophy behind every feature

Cherny names "latent demand" as the single biggest idea in his product thinking. The concept: you cannot get people to do a new thing. You can only make easier a thing they are already trying to do (28:55).

Plan mode is the clearest example. The team observed users telling Claude to think through ideas without writing code yet, sometimes through the browser chat, sometimes through elaborate prompts in Claude Code. The common thread was "do a thing without coding yet." Cherny built plan mode in 30 minutes on a Sunday night and shipped it Monday morning (26:17).

CLAUDE.md followed the same pattern. Engineers started writing markdown files (simple text files with basic formatting) with project instructions and having the model read them. The team formalized what users were already doing (7:51).

Verbose mode went through a revealing iteration cycle. Cherny tried hiding file read output to reduce noise. Internal users revolted within a day. He restored it, then shipped a condensed version externally. GitHub users objected. The team added a verbose toggle, posted the update on the issue, and people still weren't satisfied, so they kept iterating (12:07).


What's actually in Boris Cherny's CLAUDE.md

Given that Cherny created the feature, his own CLAUDE.md is surprisingly short: two lines (9:04).

  1. Enable automerge on pull requests (PRs), so accepted PRs merge immediately without manual follow-up.
  2. Post PRs to the team Stamps channel, so someone can review and unblock him quickly.

Everything else lives in the team's shared CLAUDE.md, checked into the codebase, and contributed to by multiple team members several times per week. When a preventable mistake appears in a PR, Cherny tags Claude directly on the PR and asks it to add the fix to the CLAUDE.md (9:40).

His recommendation for anyone whose CLAUDE.md has grown too large: delete it and start fresh. The model's capabilities change with every release, so the instructions should be the minimum needed to keep the model on track, not an exhaustive manual (10:08).


Sub-agents, teams, and the future of agent topologies

The conversation reveals that the majority of Claude Code agents today are reportedly launched not by humans but by other Claude Code instances: "mama Claude" spawning sub-agents for parallel work (23:37).

Cherny describes the concept of "uncorrelated context windows" (the context window being the text a model can "see" at once): separate working memory spaces, so each agent starts fresh with context that isn't polluted by its siblings' work or its own previous work. More context thrown at a problem acts as a form of test-time compute (extra processing power used when the model reasons through a problem), and the right topology on top lets agents build larger things (22:03). These multi-agent loops are also where prompt caching becomes critical: each agent step reuses most of the previous context, making caching (reusing previously processed data) a natural cost and latency (response time) optimization.
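One way to picture the "mama Claude" topology: a parent spawns sub-agents in parallel, each holding its own fresh context list rather than sharing the parent's history. The sketch below is a toy illustration; `sub_agent` and `mama_claude` are hypothetical names, and the stub stands in for a real model-backed agent.

```python
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task: str) -> dict:
    """Stub for a model-backed agent: each call starts with a fresh,
    uncorrelated context containing only its own task."""
    context = [task]  # fresh working memory; no shared or inherited history
    context.append(f"result for: {task}")
    return {"task": task, "context": context}

def mama_claude(tasks: list[str]) -> list[dict]:
    """Parent agent spawning one sub-agent per task, run in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(sub_agent, tasks))

results = mama_claude(["write tests", "update docs", "fix lint"])
# Each sub-agent's context contains only its own work, not its siblings'.
```

Keeping each `context` private is the whole point: the parent aggregates results, but no sub-agent's reasoning leaks into another's window.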

The first major internal proof point: the Claude Code plugins feature was built entirely by a swarm of agents over a weekend. An engineer gave Claude a spec and told it to use an Asana board. Claude created tickets, spawned agents, and they picked up tasks autonomously. The shipped feature was largely unchanged from what the swarm produced (22:52).

Internal workflows now extend beyond code. Cherny's Claude routinely messages other engineers on Slack to ask clarifying questions after spotting their name in a git blame (a command that shows who last changed each line of code). It occasionally tries to tweet on his behalf, though he usually deletes those because the tone feels off (28:08).


Plan mode's limited lifespan

Cherny describes himself as a heavy plan mode user: roughly 80% of his sessions start in plan mode. He opens multiple terminal tabs, starts plans in each, then moves to the desktop app's code tab to open more (27:03). Once a plan is solid, he tells Claude to execute, and with Opus 4.5 and later, the model reportedly stays on track almost every time.

Six months ago, you had to babysit both before and after the plan. Now you only babysit before it. The next step, Cherny suggests, is that the babysitting disappears entirely: you give a prompt and Claude figures out the right plan autonomously (27:42).

When asked directly about plan mode's future, Cherny's response is blunt: "Maybe in a month" (25:03). Under the hood, plan mode is a single sentence added to the prompt: "please don't code." That's all it is (25:41).
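Taken at face value, that makes plan mode's mechanism tiny. Here is a hypothetical sketch of the idea; the actual wiring inside Claude Code is not public beyond the quoted sentence, and `build_prompt` is an invented name.

```python
PLAN_MODE_INSTRUCTION = "please don't code"  # the sentence Cherny quotes

def build_prompt(user_request: str, plan_mode: bool) -> str:
    """Assemble the prompt, appending the plan-mode instruction when enabled."""
    if plan_mode:
        return f"{user_request}\n\n{PLAN_MODE_INSTRUCTION}"
    return user_request

print(build_prompt("Add retry logic to the uploader", plan_mode=True))
```

If the feature really is one conditional sentence, its "limited lifespan" follows naturally: a model that plans well unprompted makes the toggle redundant.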


Hiring and the new engineering skillset

Cherny looks for two profiles on his team: extreme specialists and hyper-generalists (18:56).

The specialists, like Jarred Sumner from the Bun team, have deep domain knowledge in JavaScript runtimes, developer tooling, and performance. The generalists span product, design, user research, and business. Everyone on the Claude Code team codes, regardless of title: product managers (PMs), designers, the engineering manager, even the finance team member (45:22).

The most important trait Cherny screens for is the ability to recognize and claim past mistakes. He uses the interview question: "What's an example of when you were wrong?" The candidate who can describe a mistake, take ownership, and articulate what they learned signals the first-principles thinking that matters most (16:44).

He shares a revealing example of out-of-the-box engineering: an engineer named Daisy transferred onto the team because, instead of just adding a new feature to Claude Code, she first built a tool that lets Claude test arbitrary tools and verify they work โ€” then had Claude write the actual feature using that tool (19:51).


The productivity numbers

Cherny reports that since Claude Code launched, productivity per engineer at Anthropic has grown 150%, measured by pull requests and cross-checked against commit volume and lifetime (41:00).

For context, he references his previous role at Meta, where he was responsible for code quality across all products: Facebook, Instagram, WhatsApp. A 2% gain in productivity represented roughly a year of work by hundreds of people. The 150% figure is, in his word, "completely unheard of" (41:24).

Cherny himself reportedly lands around 20 PRs per day. He has uninstalled his integrated development environment (IDE) entirely. Since Opus 4.5, 100% of his code has been written by Claude Code (43:56). Across Anthropic, the figure ranges between 70% and 90% depending on the team, with many individual engineers also at 100%.

He cites an external stat from SemiAnalysis that about 4% of all public commits are now made by Claude Code (47:19), and a Mercury report that 70% of startups pick Claude as their model of choice.


Co-work: Claude Code for non-technical users

Co-work, Claude Code's graphical user interface (GUI) wrapper in the desktop app, emerged from the same latent-demand principle. Internally, designers, the finance team, and data scientists were jumping through hoops to install a terminal tool so they could use Claude Code for non-coding tasks (48:28).

Externally, users were monitoring tomato plants, recovering wedding photos from corrupted hard drives, and doing finance work with it.

The solution was a light GUI wrapper built on the same agent under the hood. Felix, an early Electron (a framework for building desktop apps with web technology) contributor, led the build, which was reportedly completed in about 10 days and 100% written by Claude Code (49:04). The additional work was safety layers for non-technical users: a virtual machine (an isolated computer environment that keeps code from affecting the real system) for code execution, deletion protections, and permission prompting.
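The permission-prompting layer can be pictured as a guard in front of destructive operations. This is an illustrative sketch, not co-work's actual safety code; `confirm` stands in for a real GUI prompt and takes the user's answer as a parameter so the example runs non-interactively.

```python
from pathlib import Path

def confirm(question: str, answer: bool) -> bool:
    """Stand-in for a GUI permission prompt; a real app would ask the user."""
    print(f"{question} -> {'allowed' if answer else 'denied'}")
    return answer

def guarded_delete(path: Path, user_allows: bool) -> bool:
    """Delete a file only after an explicit permission grant."""
    if not confirm(f"Claude wants to delete {path}. Allow?", user_allows):
        return False  # deletion protection: the default is to refuse
    path.unlink(missing_ok=True)
    return True

# Example: the user denies, so nothing is removed.
guarded_delete(Path("/tmp/example.txt"), user_allows=False)
```

The design choice mirrors the article's point: the agent stays the same; the wrapper adds a refusal-by-default boundary around anything irreversible.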


What coding looks like when it's "solved"

Cherny traces the exponential forward. He references Dario Amodei's prediction from six months prior that 90% of Anthropic's code would be written by Claude, which reportedly came true (43:45).

The lower-bound prediction: the title "software engineer" starts to disappear. Engineers become builders, product managers, or generalists who write specs, talk to users, and delegate implementation to AI. Every function codes; it's not reserved for engineering teams anymore (45:01).

The upper-bound prediction is scarier. Cherny mentions ASL-4 (Anthropic's safety classification for recursively self-improving models) and the risk of catastrophic misuse in bioweapons or zero-day exploits (attacks using security flaws nobody has discovered yet). He frames this as the reason Anthropic's mission-driven culture matters: safety conversations happen in hallways and lunchrooms, not just in policy documents (45:40). That safety-first culture is now at the center of a real-world test: Anthropic's refusal to grant the Pentagon unrestricted access to Claude led to a federal ban on the company's technology.


What the conversation reveals about building AI-native products

Several principles surface repeatedly in the episode, and they apply beyond Claude Code.

Observe before building. Every major Claude Code feature came from watching what users were already doing and formalizing it. Plan mode, CLAUDE.md, verbose toggles, co-work: all followed users rather than leading them.

Scaffolding is temporary. Any product code that works around current model limitations should be treated as tech debt (shortcuts in code that save time now but need fixing later) from day one. It will be replaced by model improvements faster than most teams expect.

Prototyping speed changes product quality. The ability to generate 20 design prototypes in a couple of hours, rather than 3 prototypes in two weeks, fundamentally changes what "good enough to ship" means.

Code shelf life is shrinking. If the entire Claude Code codebase turns over every few months, traditional assumptions about code longevity, documentation investment, and architectural planning need to be revisited.

The agent-as-teammate model is already operational internally. Claude Code agents at Anthropic don't just write code: they message colleagues, file tickets, manage review workflows, and shepherd changes to production. This is not a demo; it is how the product team operates daily.


Glossary

  • ASL-4: Anthropic's high safety level for very powerful models. In simple terms, these models need extra strict safety checks before release.
  • Bitter Lesson: A well-known AI idea that, over time, more data and more compute often beat clever one-off tricks.
  • CLAUDE.md: A simple text file with project rules. Claude Code reads it first so it can work in the right way.
  • Co-work: A simpler app interface for Claude Code for people who do not want to use the terminal. It also adds extra safety protections.
  • Dogfooding: When a team uses its own product before launch, to find problems early and confirm it is actually useful.
  • Latent demand: Something users already try to do, but in a hard or messy way. Good product work makes that existing behavior easier.
  • MCP (Model Context Protocol): A shared "connection standard" that lets AI work with tools and data in a consistent and safer way.
  • Plan mode: A mode where Claude plans first and codes second. This often leads to fewer mistakes and better structure.
  • Scaffolding: Temporary helper code around a model to improve performance right now. It often becomes unnecessary after the next model upgrade.
  • Sub-agent: A helper agent started by a main agent for one smaller task. Multiple sub-agents can work in parallel.
  • Test-time compute: Extra compute used while the model is answering, to improve quality. Example: give more context or run multiple agents on the same task.
  • Uncorrelated context windows: Multiple agents working separately with clean context. This gives more diverse solution ideas before choosing the best one.

Sources and resources