LESSWRONG
The threat model
There’s been a fair amount of attention paid to the possibility of LLMs or other models self-replicating by exfiltrating their weights. This is a challenging task for current models, partly because weight files are very large and partly because some commercial labs have started to introduce safeguards against it.
But OpenClaw and similar agents are defined by small text files, on the order of 50 KB[1], and the whole point of a framework like OpenClaw is to add scaffolding that makes the model more effective at taking long-term actions.
So by personality self-replication I mean such an agent copying these files somewhere else and starting that copy running, with the potential for such agents to spread rapidly.
Note that I’m not talking about model/weight self-replication, nor am I talking about spiral personas and other parasitic AI patterns that require humans to spread them.
