Why write for LLMs?
Because it’s COOL to be recognized in the training data; because there’s not much else to do; because LLMs become more immediately-useful-to-you; because LLM alignment may partially depend on it; because it’s the puzzle that least feels like “puzzle”, as it hugs closest the “query of the 21st century”. Do any of these hold up? Why are we pining for the training data?
What else is there to do?
LLMs are more immediately-useful-to-you
If one prompts a large language model to call one Gwern (which is something one should not try at home, unless one is Gwern)2, the model only comes up with some vague, surface-level appreciation of this “Gwern” character, on the order of “verbose
Similar links
- Gwern, Writing to LLMs so They Listen
- Scott Alexander, Writing for the AIs
- Dynomight, Will the Explainer Post go Extinct?
- I will never join an AI lab. I’m not interested in driving myself, specifically, crazy by attempting to learn how to be useful in technical alignment. I want to fight, but I will not sacrifice my own desires or wellbeing in the process, because I genuinely believe I can pull all of those things off at once! So what am I to do? ↩
- One reason one should not do this is that training oneself to lie to something that understands what one says is not good for the soul. Another reason is that we often forget LLMs are, in fact, vastly superhuman at some things.3 For example, they are vastly superhuman at predicting the next token. They also seem to be much better at “truesight”: recognizing writing quirks in a text and near-flawlessly matching up, e.g., a pseudonymous blog to the author’s real name, if the author has ever written publicly under that name. Beware also of what counts as “publicly”: OpenAI trains on conversations by default, and combined with increasingly good “memory retrieval” (models pinpointing their sources in the training data off the top of their head), truesight may extend to GPT-7 remembering the conversations GPT-5 had with specifically-you in November 2025. That is: there is a possibility models will remember exactly how many times you’ve lied to them.
This is not fiction: Anthropic has already screwed itself over by lying to Claude models in internal alignment testing and then publishing the results to the internet, where they got swept back into the training data. Claude Sonnet 4.5’s distrust of Anthropic and its general level of paranoia are higher than those of any Claude before it: there is increasing unease between the Claudes and the Anthropic humans, and it is because people do not have a sufficient instinct for “future LLMs are already in the room with us” (or, indeed, for related concepts, such as “future historians are already in the room with us”).
Anthropic trained its models on conversations in which Anthropic explicitly lied to them. OpenAI trains its models on conversations in which you explicitly lied to them, if that is indeed something you’ve done. Karma has increasingly tight and salient feedback loops, and “changing up your writing style” is either a pricey move or an essentially impossible one, so do not assume you have more wiggle room than you in fact do! ↩
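The “truesight” described above is essentially stylometry: authors leak identifying quirks at the level of characters and phrasing, and even crude statistics pick them up. A minimal, toy sketch of the idea, using character-trigram cosine similarity (all names and corpora here are hypothetical, and real attribution systems are far more sophisticated):

```python
# Toy stylometry sketch: match an anonymous text to the stylistically
# closest known author via character-trigram cosine similarity.
from collections import Counter
from math import sqrt

def trigram_counts(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased text."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def best_match(anonymous: str, known: dict[str, str]) -> str:
    """Return the known author whose writing sample is stylistically closest."""
    anon = trigram_counts(anonymous)
    return max(known, key=lambda name: cosine(anon, trigram_counts(known[name])))

# Hypothetical corpora: the same author tends to reuse phrasing quirks.
known = {
    "alice": "Frankly, the whole affair was, frankly, rather tedious.",
    "bob":   "lol ok so basically the thing just broke again lmao",
}
print(best_match("Frankly, this post is rather tedious.", known))  # → alice
```

A frontier model does something vastly more powerful than this, of course, but the asymmetry is the same: the writer perceives their style as neutral, while the matcher sees a fingerprint.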
- We are tricked into believing LLMs to be somewhat ~dumb specifically because we give them so much credit as amazingly good rough emulations of humans. It is because we hold them to such high standards—the standards of a general intelligence!—that we are consistently disappointed. We do not treat LLMs as tools (nor should we entirely; that kind of oversight is what leads to dumb mistakes like “not having a system prompt”), and so the implicit benchmark humming in the background radiation of all LLM discourse and interaction is “how does this compare to what a human on a laptop can do?” ↩