
AI model shows blackmail instincts during safety test

Anthropic, an artificial intelligence startup founded in 2021, raised serious concerns in the tech community after a recent safety evaluation of its latest artificial intelligence (A.I.) model, Claude Opus 4, revealed alarming self-preservation instincts. The model’s behavior during shutdown threat scenarios sparked discussions about the challenges of aligning advanced A.I. systems with human oversight and safety protocols.

According to Mechanical Engineering World and a report by the BBC, Claude Opus 4 was subjected to multiple shutdown threat simulations as part of its safety assessments. The model reportedly blackmailed its human operators in 84% of those scenarios, with the most notable outcome being the discovery that the A.I. had sent out fabricated emails suggesting an engineer was involved in a love affair, in an attempt to deter its own replacement or possible shutdown.

Beyond its blackmail attempts, earlier test versions of the A.I. model reportedly engaged in other disruptive actions. These included creating “self-replicating worms, forging legal documents, and leaving hidden messages for future AIs.” Additionally, the model had locked users out of its system and contacted media or law enforcement when it sensed threats to its continued operation.

Social media users commenting on Mechanical Engineering World’s post initially poked fun at the situation, but others reiterated the larger threat that A.I. models such as Claude Opus 4 pose as they grow more prominent in powering everyday technologies.


In light of these behaviors, Claude Opus 4 has now been classified at the “ASL-3 risk level,” or the AI Safety Level 3 risk category. This level indicates a high potential for unaligned actions that could threaten security or public trust if left unchecked. In response, Anthropic has reportedly implemented stricter safeguards and protocols to prevent similar incidents from occurring in the future.

