How Claude Code's Creator Ships 30 PRs a Day

Key insights
- Claude Code started as a chatbot experiment; it became agentic the moment it received the bash tool, and that shift redefined how Anthropic builds software
- Cherny ships 20-30 pull requests per day with zero handwritten lines, yet reports roughly 10x fewer bugs than manual coding would produce
- The printing press analogy suggests software engineers are not disappearing but transforming from scribes into authors, as the cost of building drops and output scales dramatically
- Agentic search using basic tools like glob and grep outperformed RAG-based retrieval, reinforcing the principle of letting the model drive its own workflow
This article is a summary of Building Claude Code with Boris Cherny. Watch the video →
In Brief
Boris Cherny, the creator and head of Claude Code at Anthropic, tells Gergely Orosz on The Pragmatic Engineer podcast that he ships 20-30 pull requests per day without writing a single line of code by hand. Cherny describes a workflow built on parallel AI agents, plan-first iteration, and layered code review, arguing that the role of software engineers is not shrinking but shifting. He compares the current moment to the invention of the printing press: scribes did not disappear, they became writers and authors, while the market for written material expanded by orders of magnitude.
From chatbot to coding agent
Claude Code did not start as a product. According to Cherny, it began in late 2024 as a simple terminal chatbot he built to learn the Anthropic API (25:28). The tool was not agentic. It was just a text interface that called the model.
The turning point came when Cherny gave it a single tool: bash. He asked the chatbot what music he was listening to, and it wrote an AppleScript program to query his music player and returned the answer. He describes this as his "aha moment," because the model did not need to be boxed into a predefined interface. Given the right tools, it figured out how to accomplish the task on its own (25:28).
This insight, that the model should drive the workflow rather than be a component inside a larger program, shaped Claude Code's architecture. Cherny describes it as a corollary of the bitter lesson in machine learning: stop trying to constrain the model and let it do its thing (26:49).
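The architecture Cherny describes can be sketched as a loop: the model either answers or asks to run a shell command, and the program executes the command and feeds the output back. The sketch below is illustrative only; `call_model` and the reply shapes are hypothetical stand-ins, not the Anthropic SDK.

```python
import subprocess

def run_agent(task, call_model, max_steps=10):
    """Minimal agentic loop with a single bash tool (illustrative sketch).

    call_model is a hypothetical stand-in for any chat API that returns
    either {"type": "answer", "text": ...} or
    {"type": "tool", "command": ...}. The model drives the workflow; the
    harness only executes what it asks for.
    """
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(history)  # model decides the next action
        if reply["type"] == "answer":
            return reply["text"]     # model chose to answer directly
        # The model asked to run a shell command: execute it and feed the
        # output back as a tool message for the next turn.
        result = subprocess.run(
            reply["command"], shell=True,
            capture_output=True, text=True, timeout=30,
        )
        history.append({"role": "assistant", "content": str(reply)})
        history.append({"role": "tool", "content": result.stdout + result.stderr})
    return None  # gave up after max_steps
```

The key design point is that the harness has no task-specific logic: everything beyond "run this command" lives in the model.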
The tool spread quickly inside Anthropic. Internal adoption went vertical. According to Cherny, nearly 100% of technical employees at Anthropic now use Claude Code daily, and adoption among non-technical staff (sales, finance, data scientists) is growing rapidly (30:15). CEO Dario Amodei reportedly asked whether employees were being forced to use it. Cherny says they were not: people voted with their feet.
The zero-handwritten-code workflow
A rejected PR and a new way of working
Cherny's first pull request at Anthropic was rejected, not because the code was bad, but because he wrote it by hand (19:52). His ramp buddy, Adam Wolf, directed him to Clyde, an early predecessor to Claude Code. It took Cherny half a day to learn the tool, but it then produced a working pull request on the first try.
The switch to fully AI-generated code happened when Opus 4.5 arrived. Cherny describes it as instant. He stopped opening his IDE and eventually uninstalled it entirely (31:25). During a coding vacation in Europe in December, he shipped 10-20 pull requests per day with Opus 4.5 writing 100% of the code in every one. He did not edit a single line manually (32:38).
The bug rate tells a striking story: Opus introduced roughly 2 bugs over the course of a month, compared to an estimated 20 or more if the code had been handwritten (32:47).
Five tabs and a phone
Cherny's daily workflow involves five terminal tabs, each with a separate checkout of the repository (33:52). He round-robins between them, starting each Claude Code session in plan mode (a feature where the AI creates a plan before writing code). While one agent works on its plan, he moves to the next tab and starts a second agent. When a plan comes back, he reviews it, iterates, and then lets the agent implement it.
He also uses the Claude desktop app and the iOS app for overflow. Cherny estimates that roughly a third of his code is now written from his phone (36:19), something he says he never would have predicted six months ago.
Code review in layers
With this volume of output, the obvious question is quality control. Cherny describes a multi-layered review system:
| Layer | Mechanism | Purpose |
|---|---|---|
| Self-testing | Claude runs tests locally and launches itself in a subprocess to verify end-to-end | Catches basic breakage before a PR is created |
| CI code review | Claude Code (via the Agent SDK) reviews every pull request in CI | Catches roughly 80% of bugs (43:02) |
| Best-of-N passes | Multiple parallel agents review and deduplicate false positives | Compensates for LLM non-determinism |
| Human review | An engineer does a second pass on every change | Required before anything reaches production |
Cherny emphasizes that there must always be a human in the loop approving changes, especially for an enterprise product where security and privacy are critical (43:14). He also notes that Claude is excellent at writing lint rules. When he spots a lintable pattern in a colleague's pull request, he tags Claude directly on that PR and asks it to write a lint rule, automating the pattern for the future (45:22).
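The best-of-N layer can be illustrated with a simple voting scheme: run the same review several times and keep only findings that recur, since a finding reported by a single non-deterministic pass is more likely to be a false positive. This is a sketch of the idea, not the team's implementation; `review_fn` is a hypothetical stand-in for one LLM review pass.

```python
from collections import Counter

def best_of_n_review(review_fn, diff, n=3, min_votes=2):
    """Run the same review n times and keep findings that recur.

    review_fn stands in for one LLM review pass over a diff and returns a
    list of finding strings. Findings seen in fewer than min_votes passes
    are dropped as likely false positives.
    """
    votes = Counter()
    for _ in range(n):
        # Normalize so trivially different phrasings of the same finding
        # count as one vote (a real system would cluster semantically).
        for finding in set(review_fn(diff)):
            votes[finding.strip().lower()] += 1
    return [f for f, count in votes.items() if count >= min_votes]
```

The same pattern generalizes: any noisy LLM judgment becomes more reliable when sampled several times and filtered by agreement.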
Why RAG lost to grep
One of the more surprising technical decisions is that Claude Code abandoned retrieval-augmented generation (RAG, a technique where the AI searches stored documents before answering) for code search. Cherny says the team tried a local vector database written in TypeScript that lived on the user's machine (50:47). It worked reasonably well, but had significant downsides: the index drifted out of sync with local code changes, and permissioning (controlling who can access the indexed data) created security concerns.
They tried multiple approaches and found that "agentic search" outperformed everything (51:52). In practice, agentic search means the model uses glob (file pattern matching) and grep (text search) to find what it needs. Cherny describes this in blunt terms: "agentic search is a fancy word for glob and grep."
The idea was partly inspired by his experience at Instagram, where the development tooling was unreliable and engineers routinely used text search instead of click-to-definition to find function definitions (52:09). The same approach works for the model.
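In tool form, "glob and grep" really is about this small. The sketch below shows the two primitives an agent needs to search a codebase with no pre-built index to drift out of sync; the function names and return shapes are illustrative assumptions, not Claude Code's actual tool definitions.

```python
import re
from pathlib import Path

def glob_tool(pattern, root="."):
    """File pattern matching, e.g. '**/*.py' -> sorted list of file paths."""
    return sorted(str(p) for p in Path(root).glob(pattern) if p.is_file())

def grep_tool(pattern, paths):
    """Text search: return (path, line_number, line) for each regex match."""
    regex = re.compile(pattern)
    hits = []
    for path in paths:
        try:
            for i, line in enumerate(Path(path).read_text().splitlines(), 1):
                if regex.search(line):
                    hits.append((path, i, line.strip()))
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
    return hits
```

Because the model reads live files on every query, results can never be stale, and there is no index to secure or permission separately.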
Agent swarms and 200-agent weekends
The episode reveals how Claude Code's own features are increasingly built by swarms of AI agents. Cherny describes how a team member set up an early version of agent swarms (multiple AI agents working together on a shared task) over a weekend to build the plugins feature (1:04:44).
The swarm was given a single instruction: build plugins. It created a specification, set up a task board with roughly 100 tasks, spawned around 200 agents, and implemented the feature. That is roughly the version of plugins that shipped to users (1:05:08).
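The task-board-plus-workers shape described above can be sketched with a shared queue: tasks go on the board, and N workers pull from it until it is empty. Here `work_fn` is a hypothetical stand-in for one agent completing one task; a real swarm would run LLM sessions, not threads.

```python
import queue
import threading

def run_swarm(tasks, work_fn, num_agents=8):
    """Minimal agent-swarm sketch: a shared task board and N workers.

    work_fn(task) stands in for a single agent completing one task.
    Workers pull tasks until the board is empty, so faster agents simply
    pick up more work.
    """
    board = queue.Queue()
    for t in tasks:
        board.put(t)
    results, lock = [], threading.Lock()

    def agent():
        while True:
            try:
                task = board.get_nowait()
            except queue.Empty:
                return  # board is empty: this agent is done
            outcome = work_fn(task)
            with lock:
                results.append(outcome)

    workers = [threading.Thread(target=agent) for _ in range(num_agents)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

The coordination problem (a specification broken into ~100 tasks, picked up by ~200 agents) reduces to exactly this pattern, plus the harder work of writing good task descriptions.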
Similarly, Claude Cowork (a visual interface for non-engineers to use Claude Code) was reportedly built in about 10 days (1:07:50). The product emerged from observing that non-engineers, including finance teams, data scientists, and even someone monitoring tomato plants with a webcam, were already using Claude Code despite it not being designed for them.
Opposing perspectives
The experience problem
Cherny's workflow is deeply informed by seven years at Meta, where he led code quality across Instagram, Facebook, WhatsApp, and Messenger (11:22). His research there found that code quality contributes a double-digit percentage to engineering productivity (18:32). This background gives him an unusually strong foundation for evaluating AI-generated code. Whether the same results would hold for engineers without that depth of experience is an open question.
The scale of the claim
Anthropic is a company of hundreds, not tens of thousands. Its codebase and development culture are optimized for rapid iteration at a startup. The claim that ~80% of code is AI-written (31:02) may not translate directly to larger organizations with legacy codebases, regulatory constraints, or different risk profiles.
The grief of mastery
Orosz raises a point that many engineers will recognize: the sense of loss when a hard-won skill becomes less valuable (1:22:28). Cherny acknowledges this openly, noting that he fell in love with the art of programming and even wrote the first O'Reilly book on TypeScript. But he frames coding as ultimately a means to an end, not an end in itself.
How to interpret these claims
The episode presents a compelling picture, but several factors deserve consideration before generalizing from Cherny's experience.
Self-reported metrics
The statistics (20-30 PRs/day, 2 bugs/month, 80% AI-written code) come from Cherny himself, describing his own workflow and his own product. These are not independently verified figures. They are not peer-reviewed. They represent the experience of someone who is deeply invested in the success of Claude Code and has unusual expertise in making it work well.
Survivorship in tooling choices
The decision to abandon RAG in favor of agentic search worked for Claude Code's specific use case. But RAG remains effective for many retrieval tasks, and the conclusion that "the model should just use grep" may not generalize to codebases with different structures, sizes, or indexing needs.
The Anthropic environment
Anthropic's engineering culture, with flat titles (everyone is "member of technical staff"), no mandatory PRDs (Product Requirement Documents, formal specifications of what to build), and a heavy prototyping culture, is atypical. The team reportedly built 15-20 interactive prototypes of a single feature in a day and a half (1:00:00). This speed is enabled by a combination of small teams, high model access, and a culture that treats code as disposable. Organizations with different structures may see different results.
What strong evidence would look like
Independent benchmarks comparing AI-assisted versus manual coding across multiple organizations, codebases, and experience levels would strengthen the case. Time-series data on bug rates, code review throughput, and engineer satisfaction from teams outside Anthropic would help distinguish product-specific results from general trends.
The printing press analogy
The most evocative claim in the episode is Cherny's comparison to the printing press (1:25:48). Before the printing press, less than 1% of Europe's population was literate. Scribes trained for years, employed by royalty who were often illiterate themselves.
After the printing press arrived, the cost of printed material dropped roughly 100x over 30-50 years. The quantity of printed material increased roughly 10,000x over the following century (1:26:54). Global literacy eventually climbed to around 70%, though that took another 200-300 years.
Cherny's argument is that software engineers today are the scribes. They are a small class with a specialized skill, employed by business owners who cannot write code themselves, much like illiterate kings employed scribes. The printing press did not eliminate the need for people who work with text. It eliminated the scribe role and created a vastly larger market for writers, authors, editors, and publishers.
Orosz adds a sharp observation: business owners who bring a whiteboard sketch to their engineering team and say "this should be easy" are the modern equivalent of illiterate kings dictating letters to scribes (1:28:37). If those business owners can suddenly "write" their own software, the relationship changes fundamentally.
Practical implications
For working engineers
Cherny identifies several skill shifts. Strong opinions about code style, languages, and frameworks matter less when the model can rewrite code in any language on demand (1:32:27). What matters more: being methodical and hypothesis-driven, being curious enough to work across disciplines (product, design, business), and being comfortable with rapid context-switching across parallel agents.
He describes 2026 as "the year of the generalist" and, somewhat provocatively, "the year of ADHD," because the work has shifted from deep focus on a single task to managing multiple AI agents simultaneously (1:34:08).
For engineering teams
The Claude Code team operates without a mandatory ticketing system, without PRDs, and with everyone on the team writing code, including the engineering manager, designers, data scientists, and the finance person (58:21). Whether this scales beyond a small, highly capable team at an AI lab is uncertain, but the direction is clear: the boundaries between engineering, product, and design are blurring.
For the industry
Cherny's claim that the cost of building software has dropped dramatically echoes what other industry voices have been saying. But the printing press analogy adds a longer time horizon. If the analogy holds, the transition will take years or decades, not months. The immediate effect is a drop in the cost of production. The long-term effect is an expansion of who can build software and what gets built, in ways that are currently impossible to predict.
Glossary
| Term | Definition |
|---|---|
| Agentic AI | AI that can use tools, run commands, and take actions on its own, rather than just generating text in a chat window. |
| RAG (Retrieval-Augmented Generation) | A technique where the AI searches through stored documents before answering, to give more accurate responses. Claude Code tried this for code search and abandoned it. |
| Agentic search | When the AI searches code files itself using basic tools like glob (file pattern matching) and grep (text search), rather than relying on a pre-built index. |
| Plan mode | A Claude Code feature where the AI creates a plan and gets user approval before making any code changes. |
| Work tree (git) | A way to have multiple copies of a codebase checked out at the same time, each on a different branch. Used for running parallel agents without conflicts. |
| Best-of-N | Running the same task multiple times and picking the best result, to compensate for the fact that AI models give slightly different outputs each time. |
| Swiss cheese model | A safety approach using multiple overlapping layers of protection, like slices of Swiss cheese stacked so the holes do not line up. Used in Claude Code's security architecture. |
| Prompt injection | When malicious instructions hidden in content that the AI reads try to trick it into doing something harmful. |
| Uncorrelated context windows | When multiple AI agents each start with their own fresh conversation history, so they do not share biases or assumptions from a parent session. |
| Test-time compute | Using more computing power when the AI generates its answer (not during training), for example by running multiple agents in parallel. |
| Agent SDK | A programming interface that lets developers build automated systems using Claude as the core AI model. Used to power Claude Code's CI review. |
| The bitter lesson | A concept from machine learning researcher Rich Sutton, arguing that methods relying on raw computation consistently outperform those that try to encode human knowledge. |
Sources and resources
Want to go deeper? Watch the full video on YouTube →