I’ve been working with Claude to craft some tutorials on interpretability. One channel we are working on is an analogical infrastructure that permits the exploration of interpretability tools. Here’s one of our early drafts.
Scholarly Organization as Transformer
Imagine a scholar who has written a prompt — a question, a provocation, an opening gambit. The prompt enters an organization. Not a bureaucracy exactly, but a layered community of specialists, each trained by long experience to read documents through particular lenses.
In the first round, four specialists receive the full document simultaneously. The legal reader scores it on their rubric — a set of learned categories refined by everything the organization has ever processed. The historical reader applies theirs. The scientific reader theirs. The aesthetic reader theirs. Each rubric has the same number of categories, but the categories are different — they were shaped by different training, different failures, different discoveries.
Each specialist writes marginal annotations. Not prose — something more compressed. A scoring of the document along their particular dimensions. The owner of the document uses the annotations to tweak the meaning of each word in the document. The document now carries the traces of each specialist’s reading.
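For readers who want to see the machinery behind the analogy, here is a minimal sketch of one round of review. It assumes the specialists correspond to attention heads and the annotated document to the running vector kept for each word. Every name and size below (D_MODEL, make_head, review_round, the random rubrics) is invented for illustration. The rule that each word only looks at itself and earlier words is an assumption borrowed from decoder-style models, and real models add pieces (normalization, feed-forward layers) that are left out here.

```python
import math
import random

random.seed(0)

D_MODEL = 8                  # width of each word's vector (illustrative)
N_HEADS = 4                  # the four specialists
D_HEAD = D_MODEL // N_HEADS  # size of each specialist's rubric

def rand_matrix(rows, cols):
    return [[random.gauss(0, 0.3) for _ in range(cols)] for _ in range(rows)]

def matvec(vec, mat):
    # vec has len(mat) entries; result has len(mat[0]) entries
    return [sum(v * mat[i][j] for i, v in enumerate(vec))
            for j in range(len(mat[0]))]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def make_head():
    # A hypothetical specialist: random stand-ins for a learned rubric.
    return {"q": rand_matrix(D_MODEL, D_HEAD), "k": rand_matrix(D_MODEL, D_HEAD),
            "v": rand_matrix(D_MODEL, D_HEAD), "o": rand_matrix(D_HEAD, D_MODEL)}

def head_annotations(head, doc):
    qs = [matvec(x, head["q"]) for x in doc]
    ks = [matvec(x, head["k"]) for x in doc]
    vs = [matvec(x, head["v"]) for x in doc]
    notes = []
    for i, q in enumerate(qs):
        # Assumption: word i consults only words 0..i (causal masking).
        scores = [sum(a * b for a, b in zip(q, ks[j])) / math.sqrt(D_HEAD)
                  for j in range(i + 1)]
        weights = softmax(scores)
        mixed = [sum(w * vs[j][d] for j, w in enumerate(weights))
                 for d in range(D_HEAD)]
        notes.append(matvec(mixed, head["o"]))  # the marginal annotation
    return notes

def review_round(heads, doc):
    # Every specialist reads the same incoming document; their
    # annotations are added to each word's vector, never overwriting it.
    new_doc = [list(x) for x in doc]
    for head in heads:
        for i, note in enumerate(head_annotations(head, doc)):
            new_doc[i] = [a + b for a, b in zip(new_doc[i], note)]
    return new_doc

heads = [make_head() for _ in range(N_HEADS)]
document = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(3)]
annotated = review_round(heads, document)
```

The design choice worth noticing is the addition in review_round: the specialists never replace the document, they only write into it, which is why a later round can read everything an earlier round left behind.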
Then the second round begins. A new cohort of specialists receives this annotated document. Crucially, they don’t talk to the first round’s specialists directly. They read the document, including everything the first round wrote into it. Their rubrics were learned in coordination with those of the first round’s specialists, not by explicit agreement, but because the organization as a whole was trained to produce good outcomes.
Round after round. The document accumulates the intelligence of the entire organization.
And then — after all that deliberation — the collective produces one word.
Not because it is simple. But because that is the discipline. The full weight of the organization’s learned intelligence bears down on the prompt, and the output is the single most considered next word possible.
That word gets appended to the prompt. The document is now one word longer. And the whole organization reads it again from the beginning.
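The loop just described can be sketched in a few lines. This is a toy under loud assumptions: the organization function and its NEXT_WORD_SCORES table are hypothetical stand-ins that look only at the last word, whereas the organization in the story rereads the whole document. The sketch also always picks the top-scoring word, which is only one of several ways a real system might choose.

```python
VOCAB = ["the", "scholar", "writes", "a", "prompt", "."]

# Hypothetical scores standing in for the organization's learned judgment.
NEXT_WORD_SCORES = {
    "the": {"scholar": 2.0, "prompt": 1.0},
    "scholar": {"writes": 3.0},
    "writes": {"a": 2.0, "the": 1.5},
    "a": {"prompt": 2.5},
    "prompt": {".": 2.0},
    ".": {"the": 1.0},
}

def organization(document):
    # Returns one score per vocabulary word, given the document so far.
    table = NEXT_WORD_SCORES.get(document[-1], {})
    return [table.get(word, 0.0) for word in VOCAB]

def next_word(document):
    scores = organization(document)
    best = max(range(len(VOCAB)), key=lambda i: scores[i])
    return VOCAB[best]

def generate(prompt, n_words):
    document = list(prompt)
    for _ in range(n_words):
        # One full pass of the organization yields exactly one word,
        # which is appended before the next pass begins.
        document.append(next_word(document))
    return document

result = generate(["the"], 4)
# result == ["the", "scholar", "writes", "a", "prompt"]
```

Note that generate never edits or removes an earlier word. The only operation on the document is append.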
The text grows. Not by revision — nothing is taken back, nothing is erased. By accretion. Each word irreversible, each word the product of exhaustive collective review, each word becoming part of the history that shapes every word that follows.
The organization itself — its departments, its rubrics, its learned compatibilities between rounds of review — was not designed. It was trained. Gradient descent found, through billions of examples, the configuration of specialists and rubrics that produces the best next words. The circuits that emerged — the two specialists whose rubrics happen to coordinate across rounds to solve a particular problem — were not programmed. They were the path of least resistance through an unimaginably large space of possible organizations.
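To make "trained, not designed" concrete, here is a deliberately tiny sketch of gradient descent. It assumes the simplest possible setup: one fixed context, three candidate next words, and one adjustable score per word. The function names, learning rate, and step count are illustrative choices, and a real organization would adjust billions of rubric entries across many contexts instead of three numbers for one.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    # How surprised the organization is by the word that actually came next.
    return -math.log(softmax(scores)[target])

def gradient(scores, target):
    # Derivative of cross-entropy with respect to each score:
    # predicted probability minus 1 for the target, minus 0 otherwise.
    probs = softmax(scores)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def train(scores, target, lr=0.5, steps=50):
    scores = list(scores)
    losses = []
    for _ in range(steps):
        losses.append(cross_entropy(scores, target))
        grad = gradient(scores, target)
        # Nudge every score a small step downhill on the loss.
        scores = [s - lr * g for s, g in zip(scores, grad)]
    return scores, losses

final_scores, losses = train([0.0, 0.0, 0.0], target=2)
```

Nobody writes the final scores by hand. They are whatever the repeated small corrections leave behind, which is the sense in which the rubrics were found and not programmed.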
We did not invent this architecture. We rediscovered it.
Markets process the document of prices and production decisions, each transaction one more word appended to an endless prompt. Legal systems process the document of case and precedent. Science processes the document of the literature. Each is a layered organization of specialists with learned rubrics, communicating through a shared document, producing one next step at a time, that step becoming part of the history that must now be processed again.
History is the prompt. The event is the next token.
