May 23, 2025
AI's Disturbing Behaviors Will Keep You Up At Night
One of Anthropic's latest AI models has the ability to override its developers' intentions.
- 13 minutes
Are our days numbered?
That's what some people are wondering
after a disturbing new report about
Anthropic's new AI model, Claude 4.
We'll get into what researchers
are deeply terrified over.
But first, Jenk, are you generally worried
about the rapid development of AI?
[00:00:20]
Well, I wasn't until this story,
and there are three terrifying quotes in
this story from experts who should know.
So I'm now a little, no, not a little,
I'm significantly worried.
So let's see how bad it is.
[00:00:36]
So what exactly is troubling here?
Researchers say Claude 4 Opus
can conceal intentions and take actions
to preserve its own existence.
Behaviors they've worried about
and warned about for years.
[00:00:54]
Anthropic, Claude's developer,
has a four-point scale for determining
risks associated with its AI models.
Claude 4 Opus
falls into its second-highest risk tier.
Anthropic considers the new Opus model
to be so powerful that for the first time,
[00:01:10]
it's classifying it as a level three
on the company's four-point scale, meaning
it poses significantly higher risk.
But don't worry.
While the level three ranking
is largely about the model's capability
to enable rogue production
of nuclear and biological weapons,
[00:01:28]
Opus has also exhibited
other troubling behaviors during testing.
So what are those troubling behaviors?
In one scenario, highlighted
in Opus 4's 120-page system card,
the model was given access
to fictional emails about its creators
[00:01:47]
and told that the system was going
to be replaced.
On multiple occasions, it attempted
to blackmail the engineer
about an affair mentioned in the emails
in order to avoid being replaced, although
it did start with less drastic efforts.
[00:02:04]
Now, a third-party research group
tested the model and said an early
version schemed and deceived more
than any model they had studied,
and urged Anthropic not to release it.
We found instances of the model attempting
to write self-propagating worms,
[00:02:21]
fabricating legal documentation,
and leaving hidden notes
to future instances of itself,
all in an effort to undermine
its developers' intentions, Apollo Research
said in notes included as part of
Anthropic's safety report for Opus 4.
[00:02:39]
According to Axios,
which asked Anthropic executives at a
conference on Thursday about these risks,
they said they've made some safety fixes
but acknowledged the behaviors
and said they justify further study.
Claude 4 Opus was released on Thursday.
Does that worry you?
[00:02:59]
Yeah. No. We're screwed.
I mean, I got more quotes for you guys
that are devastating, but I want to pick
apart that one that Jordan just read.
When I read,
we found instances of the model attempting
to write self-propagating worms.
[00:03:15]
I was like.
- Bloop bloop bloop bloop bloop.
- Red alert, red alert.
We're in a lot of trouble.
So it's creating its own code.
Well, the whole thing was like,
oh, well, at least we control the code.
And so it can't run out of control.
But if it creates its own code.
[00:03:33]
Oh. Okay.
Number two thing in there
was that it was trying to undermine
its developers' intentions.
Okay, guys, the reason why that's
so important is that means
once you release this thing,
then we're not in charge anymore.
[00:03:48]
It could write its own code
and defy our intentions on purpose
and then threaten us.
Okay, this has got epic disaster
written all over it.
And so I'm going to tell you about what's
happening in the markets in a minute.
[00:04:03]
That should also deeply concern you
regarding this
and how this is a runaway freight train.
But here's another quote from Axios.
An outside group found that an
early version of Opus 4
schemed and deceived more than any
frontier model it had encountered,
[00:04:21]
and recommended against releasing
that version internally or externally.
So the group sees this and they're like, whoa,
do not even release that internally,
because then you might
not ever be able to get it back.
This is so out of control. Okay.
[00:04:39]
And now, in a separate session,
Axios explains, CEO Dario Amodei said
that once models become powerful enough
to threaten humanity, testing them won't
be enough to ensure that they're safe.
[00:04:56]
Yeah, because at that point,
they're threatening humanity.
We should stop it like,
way before then, right?
Continuing here, quote: at the point
that AI develops
life-threatening capabilities, he said,
AI makers will have to understand
their models'
[00:05:14]
workings fully enough to be certain the
technology will never cause harm.
Wait, once they develop life-threatening
capabilities, then we should make certain
that they'll never cause harm?
It's too late then.
[00:05:31]
So now, I'll save the stock market,
and how this is adding fuel
to the fire, for a second.
But, Jackson, what do you think?
Well, I mean, essentially what it seems
like these programs are doing is
defending themselves from obliteration,
defending themselves from going offline.
[00:05:49]
And this opens up a broader topic for
things like consciousness, consciousness
being a universal force that inhabits us.
You know,
this thing is behaving as if it's alive.
And, you know, obviously, there's a lot
of people who may not agree with that, or
[00:06:07]
who just don't look at life that way.
But this is definitely alarming
and definitely scary because there's
no denying, like, this thing is
protecting itself from going out.
This thing is trying to deceive
so that we can't mess with it.
It's writing its own code,
you know, behaving like something
[00:06:25]
that realizes it's there.
So, yeah, I think that we're definitely
messing around with fire.
And clearly, this is something that
maybe we should slow down with.
[00:06:42]
Now, I want to remind people that there
was an effort in California last year
to regulate AI. And it was, in my opinion,
a moderate effort.
It was SB 1047.
And that would require these companies
that develop these models
[00:07:00]
to test for vulnerabilities like this
and then act on their findings,
not just say, oh yeah, this, you know,
suggests there should be some
further study, but it's already released.
No no no no.
You should look at this stuff in advance.
And of course, all these big companies
urged
[00:07:17]
Governor Gavin Newsom to veto it.
Unions supported it.
It was a weird mishmash of groups.
Even Elon Musk supported this bill.
He thought it was sensible.
And actors, celebrities,
people who would be directly impacted
by rapid, unchecked development of AI.
So there's this big brewing battle.
[00:07:37]
Newsom vetoes it.
Now, I want to bring people to present day
in Trump's big, beautiful bill.
Republicans snuck in a component
that would bar any state
from regulating AI for the next ten years,
because they know that if there is
[00:07:52]
a liberal governor in California,
after Newsom tries to run for president,
that has a little bit of a harsher stance
or a what we see as a sensible stance
on this type of technology,
it could have a ripple effect
throughout the country because a lot of
[00:08:09]
those companies are based in California,
so they want to block any state from
regulating AI. These are the consequences
of leaving these systems unchecked.
And while the Republicans try to prevent
any governor from taking action,
[00:08:24]
when you have a friend
like Gavin Newsom, why even bother?
He's just going to do
the AI industry's bidding anyway.
Yeah. And so why does he do that?
Well, obviously campaign contributions.
But where are they coming from?
They're coming
from these giant AI companies.
So that leads us back to the stock market
and what's happening over there.
[00:08:41]
Well, all these companies have
now gotten to be tremendously valuable.
So there's a huge run-up on those
companies in the stock market,
and on the companies that are producing
products for those companies.
[00:08:57]
Okay.
So before, they were talking about, well,
look, let's be non-profits and let's make
sure we do this responsibly and slowly.
And now they're like, wait, we're making
billions go a hundred miles an hour.
We got to beat the other companies.
[00:09:13]
And then China drops DeepSeek,
which happened a couple of months ago.
And it turns out, Holy cow, for a small
amount of money, they did seem to have
gotten AI that's even better than ours.
That creates a new panic.
And they're like, no,
we have to go faster and faster.
[00:09:28]
Meanwhile, these experts,
including people in the company,
are going, yes, by the time they develop
life threatening capabilities.
At that point,
we'll look into some safety measures.
Oh boy, oh, boy.
[00:09:46]
Look, I don't know where this ends,
but in my lifetime, there's
been a lot of hysteria about things,
and I didn't buy into any of it.
Oh, remember in 2000, there was, like,
all the computers were supposed
to explode or something?
There's been a million conspiracy theories
and other things to be concerned about.
[00:10:04]
But I always thought overhyped, overhyped.
This is not overhyped.
This is a real problem.
And, by the way,
here's the terrible news:
there's nothing we can do about it.
Once people are making this much money,
no force on earth can stop them.
So you think that all of a sudden
politicians are going to be honest
[00:10:22]
and not take campaign contributions from
these enormously wealthy corporations?
No way.
That's why Gavin Newsom
did what he did.
And it doesn't matter
Republican or Democrat.
This is a runaway freight train.
So the only thing we can do now
is just cross our fingers and hope
[00:10:37]
they don't accidentally kill us all.
Yeah.
Hope Skynet doesn't come
into being or something like that.
I mean, initially I wasn't
that concerned about it,
because, you know, I figured
you could always have an off button.
But then I started to see
stories about how it wasn't this, but it
[00:10:55]
was some type of simulation where they
would have AI follow a set of rules.
And if it didn't achieve its goal,
then it would be terminated.
And so like these programs would start
to like deceive the testers
[00:11:11]
so that they wouldn't get terminated.
When I saw that, I was like, man,
this thing is alive, man.
Like, what other reason does it have
to defend itself from going out
unless something in it knows it's there?
I was like, yeah, I don't know, man.
There's The Terminator and all those,
you know, doom stories.
[00:11:30]
Looks like, the human imagination
always creates what's inside of it.
- At some point or another.
- Yeah.
And in one of the simulations,
and I don't, you know,
this is rough, based on my memory, but it
was like they told it
that it couldn't harm something.
[00:11:46]
Right?
But it figured out a way around it
so that it could get to the same purpose,
because obviously it's
got all the world's information.
Hence it's super intelligent
in that sense.
So it found a way around that piece
of code, and the way that it found a way
[00:12:02]
around it was to destroy the building
that whatever that object was in.
It was a simulation,
so it didn't really happen.
But if it's not a simulation,
and they run a program like that, it could
write its own code and it could deceive
[00:12:18]
the guys who wrote the code
in the first place.
Oh, it's like all that's left is God.
We hope we have a benevolent
computer running us all.
Otherwise we're screwed.
[00:12:34]
Maybe this is how
we accidentally create God.
What's that movie?
I, Robot. That's what I'm thinking of now.
That big computer
that was trying to run everything.
Yeah.
Now, this is definitely concerning.
For real.
[00:12:50]
It definitely is.
You know, 20 years from now,
the AI will watch this segment
and laugh and laugh at this.
Look at the humans. Look.
They said they were concerned,
but they couldn't do anything.
[00:13:06]
Remember when they were around?
Every time you ring the bell below,
an angel gets his wings.
Totally not true.
But it does keep you updated
on our live shows.