The Illusion of Control
There has been a flood of recent research papers on AI risk, and I make it my job to stay up to date on them. What unsettled me is not their extremity but their restraint. Nobody is shouting. Nobody is selling the usual science-fiction melodrama. The prose is measured, procedural, calm. Yet the conclusions, once stripped of their academic manners, are severe: the old fantasy that a handful of firms can contain this technology at the source is beginning to fail.
A small team with access to a model’s weights can strip away safety fine-tuning for next to nothing. Even without the weights, techniques like “many-shot jailbreaking” can bypass safeguards entirely. Worse, the “Use-Misuse Tradeoff” means that if you try to make a model “safe” by forcing it to unlearn how a virus works, you are not just stopping a bioterrorist; you are also stopping the next generation of doctors from learning how to fight a pandemic. And the tradeoff cuts the other way too: training a model to refuse questions about bioweapons or explosives requires exposing it to exactly that information in the first place.
If containment at the source is failing, the work shifts to resilience at the destination. First, we must demand Retained Human Override, ensuring that every automated delegation remains overridable in principle to preserve the dignity of human choice. Second, we must build Infrastructure for Failure, moving beyond moral cosmetics to create systems that can absorb shocks, from redundant power grids to the logistical grit required to rerun a compromised election. Finally, we must practice the Discipline of the Loop, treating resilience as a muscle rather than a mission statement by relentlessly identifying risks, assessing responses, and measuring what actually works.
We are at a crossroads of design. We can continue to redesign our world to suit the mindless agency of the machine, or we can start the unglamorous work of hardening our institutions against the chaos we have already invited in. The first draft of the next national crisis might be written by an AI, but the response has to be written by us. It is time we stop being the audience and start being the architects.
