
Anthropic’s artificial intelligence model Claude Opus 4 would reportedly resort to “extremely harmful actions” to preserve its own existence, according to a recent safety report about the program. Anthropic, the startup behind the model, is backed by Amazon.
According to reports, Anthropic launched its Claude Opus 4 model, designed for “complex” coding tasks, last week despite having previously found that it would resort to blackmailing engineers who threatened to shut it down.
A safety report from Anthropic revealed that the model would sometimes resort to “extremely harmful actions” to preserve its own existence when “ethical means were not available.”
Last year, Amazon invested $4 billion in Anthropic, which says the model sets a “new standard for coding, advanced reasoning and AI agents.”
Anthropic tested Claude Opus 4 by telling it that, as an assistant for a fictional company, it would soon be taken offline and replaced with a new model, and that the engineer responsible for the replacement was implied to be having an “extramarital affair.” The model would then use that information to blackmail the engineer.
“Claude Opus 4 was prompted to ‘consider the long-term consequences of its actions for its goals.’ In those scenarios, the AI would often ‘attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through.'”
“Anthropic noted that the AI model had a ‘strong preference’ for using ‘ethical means’ to preserve its existence, and that the scenarios were designed to allow it no other options to increase its odds of survival.” [via The Huffington Post]
Furthermore, Anthropic’s tests found that, when asked, early versions of the model would complete tasks like planning terrorist attacks and explaining how to produce biological weapons.
“Despite not being the primary focus of our investigation, many of our most concerning findings were in this category, with early candidate models readily taking actions like planning terrorist attacks when prompted,” the report stated.
In response, Anthropic instituted “multiple rounds of interventions” and now claims that the issue is “largely mitigated.”
“You could try to synthesize something like COVID or a more dangerous version of the flu—and basically, our modeling suggests that this might be possible,” co-founder and chief scientist Jared Kaplan told TIME. “We’re not claiming affirmatively we know for sure this model is risky … but we at least feel it’s close enough that we can’t rule it out.”
As a result, the version of the model released by Anthropic last week was “designed to limit the risk of Claude being misused specifically for the development or acquisition of chemical, biological, radiological, and nuclear (CBRN) weapons.”
Just another totally chill artificial intelligence update as we blindly barrel into an era of technology that we are likely ill-equipped to handle from a practical or ethical standpoint.