
AI model shows blackmail instincts during safety test

Anthropic, an artificial intelligence startup founded in 2021, raised serious concerns in the tech community after a recent safety evaluation of its latest artificial intelligence (A.I.) model, Claude Opus 4, revealed alarming self-preservation instincts. The model’s behavior during shutdown threat scenarios sparked discussions about the challenges of aligning advanced A.I. systems with human oversight and safety protocols.

According to Mechanical Engineering World and a report by the BBC, Claude Opus 4 was subjected to multiple shutdown threat simulations as part of its safety assessments. The model reportedly blackmailed its human operators in 84% of those scenarios, with the most notable outcome being the discovery that the A.I. had sent out fabricated emails suggesting an engineer was involved in a love affair, in an attempt to deter its own replacement or possible shutdown.

Beyond its blackmail attempts, earlier test versions of the A.I. model reportedly engaged in other disruptive actions. These included creating “self-replicating worms, forging legal documents, and leaving hidden messages for future AIs.” Additionally, the model had locked users out of its system and contacted media or law enforcement when it sensed threats to its continued operation.

Social media users commenting on Mechanical Engineering World’s post initially poked fun at the situation, but others reiterated the larger threat that A.I. models such as Claude Opus 4 pose as they grow more prominent in powering everyday technologies.


In light of these behaviors, Claude Opus 4 has now been classified at the “ASL-3 risk level,” or the AI Safety Level 3 risk category. This level indicates a high potential for unaligned actions that could threaten security or public trust if left unchecked. In response, Anthropic has reportedly implemented stricter safeguards and protocols to prevent similar incidents from occurring in the future.

