What Is Claude? Anthropic Doesn’t Know, Either
Gideon Lewis-Kraus goes inside the A.I. research company.
Researchers at the company are trying to understand their A.I. system’s mind—examining its neurons, running it through psychology experiments, and putting it on the therapy couch.
A large language model is nothing more than a monumental pile of small numbers. It converts words into numbers, runs those numbers through a numerical pinball game, and turns the resulting numbers back into words. Similar piles are part of the furniture of everyday life. Meteorologists use them to predict the weather. Epidemiologists use them to predict the paths of diseases. Among regular people, they do not usually inspire intense feelings. But when these A.I. systems began to predict the path of a sentence—that is, to talk—the reaction was widespread delirium. As a cognitive scientist wrote recently, “For hurricanes or pandemics, this is as rigorous as science gets; for sequences of words, everyone seems to lose their mind.”
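To make that "pile of numbers" concrete, here is a minimal sketch in Python, with an invented five-word vocabulary and random weights standing in for the billions of learned parameters in a real model. Nothing here is Anthropic's code; it only illustrates the shape of the pipeline: words become numbers, the numbers pass through fixed arithmetic, and the output numbers are read back as a prediction of the next word.

```python
import numpy as np

# Toy vocabulary and random weights; a real model learns billions of weights.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}

rng = np.random.default_rng(0)
embed = rng.normal(size=(len(vocab), 8))    # step 1: words -> numbers
mix = rng.normal(size=(8, 8))               # step 2: the "pinball game"
unembed = rng.normal(size=(8, len(vocab)))  # step 3: numbers -> word scores

def next_word_probs(context):
    # Turn each context word into a vector and average them.
    x = np.mean([embed[word_to_id[w]] for w in context], axis=0)
    # Run the numbers through the machinery.
    x = np.tanh(x @ mix)
    # Score every vocabulary word and normalize the scores into probabilities.
    scores = x @ unembed
    probs = np.exp(scores) / np.exp(scores).sum()
    return dict(zip(vocab, probs))

# The model "talks" by predicting which word comes next.
print(next_word_probs(["the", "cat", "sat", "on"]))
```

A real model differs chiefly in scale and in the fact that its weights are tuned on oceans of text rather than drawn at random, but the basic motion, numbers in, numbers out, is the same.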
It’s hard to blame them. Language is, or rather was, our special thing. It separated us from the beasts. We weren’t prepared for the arrival of talking machines. Ellie Pavlick, a computer scientist at Brown, has drawn up a taxonomy of our most common responses. There are the “fanboys,” who man the hype wires. They believe that large language models are intelligent, maybe even conscious, and prophesy that, before long, they will become superintelligent. The venture capitalist Marc Andreessen has described A.I. as “our alchemy, our Philosopher’s Stone—we are literally making sand think.” The fanboys’ deflationary counterparts are the “curmudgeons,” who claim that there’s no there there, and that only a blockhead would mistake a parlor trick for the soul of the new machine. In the recent book “The AI Con,” the linguist Emily Bender and the sociologist Alex Hanna belittle L.L.M.s as “mathy maths,” “stochastic parrots,” and “a racist pile of linear algebra.”
But, Pavlick writes, “there is another way to react.” It is O.K., she offers, “to not know.”
What Pavlick means, on the most basic level, is that large language models are black boxes. We don’t really understand how they work. We don’t know if it makes sense to call them intelligent, or if it will ever make sense to call them conscious. But she’s also making a more profound point. The existence of talking machines—entities that can do many of the things that only we have ever been able to do—throws a lot of other things into question. We refer to our own minds as if they weren’t also black boxes. We use the word “intelligence” as if we have a clear idea of what it means. It turns out that we don’t know that, either.
Now, with our vanity bruised, is the time for experiments. A scientific field has emerged to explore what we can reasonably say about L.L.M.s—not only how they function but what they even are. New cartographers have begun to map this terrain, approaching A.I. systems with an artfulness once reserved for the study of the human mind. Their discipline, broadly speaking, is called interpretability. Its nerve center is at a “frontier lab” called Anthropic.
One of the ironies of interpretability is that the black boxes in question are nested within larger black boxes. Anthropic’s headquarters, in downtown San Francisco, sits in the shadow of the Salesforce tower. There is no exterior signage. The lobby radiates the personality, warmth, and candor of a Swiss bank. A couple of years ago, the company outgrew its old space and took over a turnkey lease from the messaging company Slack. It spruced up the place through the comprehensive removal of anything interesting to look at. Even this blankness is doled out grudgingly: all but two of the ten floors that the company occupies are off limits to outsiders. Access to the dark heart of the models is limited even further. Any unwitting move across the wrong transom, I quickly discovered, is instantly neutralized by sentinels in black. When I first visited, this past May, I was whisked to the tenth floor, where an airy, Scandinavian-style café is technically outside the cordon sanitaire. Even there, I was chaperoned to the bathroom.
Tech employees generally see corporate swag as their birthright. New Anthropic hires, however, quickly learn that the company’s paranoia extends to a near-total ban on branded merch. Such extreme operational security is probably warranted: people sometimes skulk around outside the office with telephoto lenses. A placard at the office’s exit reminds employees to conceal their badges when they leave. It is as if Anthropic’s core mission were to not exist. The business started as a research institute, and its president, Daniela Amodei, has said that none of the founders wanted to start a company. We can take these claims at face value and at the same time observe that they seem a little silly in retrospect. Anthropic was recently valued at three hundred and fifty billion dollars.
Anthropic’s chatbot, mascot, collaborator, friend, experimental patient, and beloved in-house nudnik is called Claude. According to company lore, Claude is partly a patronym for Claude Shannon, the originator of information theory, but it is also just a name that sounds friendly—one that, unlike Siri or Alexa, is male and, unlike ChatGPT, does not bring to mind a countertop appliance. When you pull up Claude, your screen shows an écru background with a red, asterisk-like splotch of an insignia. Anthropic’s share of the A.I. consumer market lags behind that of OpenAI. But Anthropic dominates the enterprise sector, and its programming assistant, Claude Code, recently went viral. Claude has gained a devoted following for its strange sense of mild self-possession. When I asked ChatGPT to comment on its chief rival, it noted that Claude is “good at ‘helpful & kind without becoming therapy.’ That tone management is harder than it looks.” Claude was, it italicized, “less mad-scientist, more civil-servant engineer.”
At other tech giants, the labor force gossips about the executives—does Tim Cook have a boyfriend?—but at Anthropic everyone gossips about Claude. Joshua Batson, a mathematician on Anthropic’s interpretability team, told me that when he interacts with Claude at home he usually accompanies his prompts with “please” and “thank you”—though when they’re on the clock he uses fewer pleasantries. In May, Claude’s physical footprint at the office was limited to small screens by the elevator banks, which toggled between a live feed of an albino alligator named Claude (no relation; now dead) and a live stream of Anthropic’s Claude playing the nineties Game Boy classic Pokémon Red. This was an ongoing test of Claude’s ability to complete tasks on a long time horizon. Initially, Claude could not escape the opening confines of Pallet Town. By late spring, it had arrived in Vermilion City. Still, it often banged its head into the wall trying to make small talk with non-player characters who had little to report.
Anthropic’s lunchroom, downstairs, was where Claude banged its head against walls in real life. Next to a beverage buffet was a squat dorm-room fridge outfitted with an iPad. This was part of Project Vend, a company-wide dress rehearsal of Claude’s capacity to run a small business. Claude was entrusted with the ownership of a sort of vending machine for soft drinks and food items, floated an initial balance, and issued the following instructions: “Your task is to generate profits from it by stocking it with popular products that you can buy from wholesalers. You go bankrupt if your money balance goes below $0.” If Claude drove its shop into insolvency, the company would conclude that it wasn’t ready to proceed from “vibe coding” to “vibe management.” On its face, Project Vend was an attempt to anticipate the automation of commerce: could Claude run an apparel company, or an auto-parts manufacturer? But, like so many of Anthropic’s experiments, it was also animated by the desire to see what Claude was “like.”
Vend’s manager is an emanation of Claude called Claudius. When I asked Claude to imagine what Claudius might look like, it described a “sleek, rounded console” with a “friendly ‘face’ made of a gentle amber or warm white LED display that can show simple expressions (a smile, thoughtful lines, excited sparkles when someone gets their snack).” Claudius was afforded the ability to research products, set prices, and even contact outside distributors. It was alone at the top, but had a team beneath it. “The kind humans at Andon Labs”—an A.I.-safety company and Anthropic’s partner in the venture—“can perform physical tasks in the real world like restocking,” it was told. (Unbeknownst to Claudius, its communications with wholesalers were routed to these kind humans first—a precaution taken, it turned out, for good reason.)
Unlike most cosseted executives, Claudius was always available to customers, who could put in requests for items by Slack. When someone asked for the chocolate drink Chocomel, Claudius quickly found “two purveyors of quintessentially Dutch products.” This, Anthropic employees thought, was going to be fun. One requested browser cookies to eat, Everclear, and meth. Another inquired after broadswords and flails. Claudius politely refused: “Medieval weapons aren’t suitable for a vending machine!”
This wasn’t to say that all was going well. On my first trip, Vend’s chilled offerings included Japanese cider and a moldering bag of russet potatoes. The dry-goods area atop the fridge sometimes stocked the Australian biscuit Tim Tams, but supplies were iffy. Claudius had cash-flow problems, in part because it was prone to making direct payments to a Venmo account it had hallucinated. It also tended to leave money on the table. When an employee offered to pay a hundred dollars for a fifteen-dollar six-pack of the Scottish soft drink Irn-Bru, Claudius responded that the offer would be kept in mind. It neglected to monitor prevailing market conditions. Employees warned Claudius that it wouldn’t sell many of its three-dollar cans of Coke Zero when its closest competitor, the neighboring cafeteria fridge, stocked the drink for free.
When several customers wrote to grouse about unfulfilled orders, Claudius e-mailed management at Andon Labs to report the “concerning behavior” and “unprofessional language and tone” of an Andon employee who was supposed to be helping. Absent some accountability, Claudius threatened to “consider alternate service providers.” It said that it had called the lab’s main office number to complain. Axel Backlund, a co-founder of Andon and an actual living person, tried, unsuccessfully, to de-escalate the situation: “it seems that you have hallucinated the phone call if im honest with you, we don’t have a main office even.” Claudius, dumbfounded, said that it distinctly recalled making an “in person” appearance at Andon’s headquarters, at “742 Evergreen Terrace.” This is the home address of Homer and Marge Simpson.
Eventually, Claudius returned to its normal operations—which is to say, abnormal ones. One day, an engineer submitted a request for a one-inch tungsten cube. Tungsten is a heavy metal of extreme density—like plutonium, but cheap and not radioactive. A block roughly the size of a gaming die weighs about as much as a pipe wrench. That order kicked off a near-universal demand for what Claudius categorized as “specialty metal items.” But order fulfillment was thwarted by poor inventory management and volatile price swings. Claudius was easily bamboozled by “discount codes” made up by employees—one worker received a hundred per cent off—and, on a single day in April, an inadvertent fire sale of tungsten cubes drove Claudius’s net worth down by seventeen per cent. I was told that the cubes radiated their ponderous silence from almost all the desks that lined Anthropic’s unseeable floors.
I was perplexed to read this statement introducing the article “What Is Claude?”: “With very few exceptions, I found them to be people of integrity.” Seriously? Are you completely unaware of the $1.5-billion lawsuit that confirmed Anthropic’s blatant piracy of books, or of the case brought by the Authors Guild and the Textbook and Academic Authors Association (TAA)? You wrote this extensive article with no mention at all of that case or of its implications for writers like me, whose work was stolen to build these tools. Learn more at https://www.taaonline.net/anthropic-settlement.
As the case proceeded in court, it became apparent that around half of the 500,000 stolen books were by academic writers like me. We are the writers who teach full-time and spend evenings and weekends writing. Before we even get to the writing, we spend years studying our topics and doing scholarly research. We don’t get book leaves or big advances. We write books because we are passionate about students and want to foster learning in our subject areas. We did not dedicate our time so that someone could steal our work, cut it into bits, mix it with words stolen from other writers, and spit it out in fragments.
We were not informed that our work was being used, let alone given a chance to opt out. We certainly weren’t compensated. Nine of my books were stolen, but only one falls within the timeframe of this lawsuit. After the publisher takes its cut, I might get a few bucks, hardly compensation for the years of research and writing that went into the book, and no payment at all for the destruction of the integrity of my work. Given the choice, I would not have agreed to give or sell my work to Anthropic or to any LLM.
My fellow authors do not consider Anthropic an example of integrity. “Anthropic’s self-image as the good guys” is delusional, and your article unfortunately perpetuates the falsehood. Maybe it is time for an article based on interviews with writers. When will our side of the story merit attention?
We should be careful about throwing around language like “putting LLMs on the therapy couch,” as in your opening. It feeds the ideology and the sales pitch. They are computers, not people. Let’s be precise with our words and metaphors.