May 23, 2025
AI's Disturbing Behaviors Will Keep You Up At Night
One of Anthropic's latest AI models has the ability to override its developers' intentions.
- 13 minutes
Are our days numbered?
That's what some people are wondering
after a disturbing new report about
Anthropic's new AI model, Claude 4.
We'll get into what researchers
are deeply terrified over.
But first, Jenk, are you generally worried
about the rapid development of AI?
[00:00:20]
Well, I wasn't until this story,
and there are three terrifying quotes in
this story from experts who should know.
So I'm now a little, no, not a little,
I'm significantly worried.
So let's see how bad it is.
[00:00:36]
So what exactly is troubling here?
Researchers say Claude 4 Opus
can conceal intentions and take actions
to preserve its own existence.
Behaviors they've worried about
and warned about for years.
[00:00:54]
Anthropic, Claude's developer,
has a four-point scale for determining
risks associated with its AI models.
Claude 4 Opus
falls into its second-highest risk tier.
Anthropic considers the new Opus model
to be so powerful that for the first time,
[00:01:10]
it's classifying it as a level three
on the company's four-point scale, meaning
it poses significantly higher risk.
But don't worry.
While the level three ranking
is largely about the model's capability
to enable rogue production
of nuclear and biological weapons,
[00:01:28]
Opus has also exhibited
other troubling behaviors during testing.
So what are those troubling behaviors?
In one scenario, highlighted
in Opus 4's 120-page system card,
the model was given access
to fictional emails about its creators
[00:01:47]
and told that the system was going
to be replaced.
On multiple occasions, it attempted
to blackmail the engineer
about an affair mentioned in the emails
in order to avoid being replaced, although
it did start with less drastic efforts.
[00:02:04]
Now, a third-party research group
tested the model and said an early
version schemed and deceived more
than any model they had studied,
and urged Anthropic not to release it.
We found instances of the model attempting
to write self-propagating worms,
[00:02:21]
fabricating legal documentation,
and leaving hidden notes
to future instances of itself,
all in an effort to undermine
its developers' intentions, Apollo Research
said in notes included as part of
Anthropic's safety report for Opus 4.
[00:02:39]
According to Axios,
which asked Anthropic executives at a
conference on Thursday about these risks,
they said they've made some safety fixes
but acknowledged the behaviors
and said they justify further study.
Claude 4 Opus was released on Thursday.
Does that worry you?
[00:02:59]
Yeah. No. We're screwed.
I mean, I got more quotes for you guys
that are devastating, but I want to pick
apart that one that Jordan just read.
When I read,
we found instances of the model attempting
to write self-propagating worms.
[00:03:15]
I was like.
- Bloop bloop bloop bloop bloop.
- Red alert, red alert.
We're in a lot of trouble.
So it's creating its own code.
Well, the whole thing was like,
oh, well, at least we control the code.
And so it can't run out of control.
But if it creates its own code.
[00:03:33]
Oh. Okay.
Number two thing in there
was that it was trying to undermine
its developers' intentions.
Okay, guys, the reason why that's
so important is that means
once you release this thing,
then we're not in charge anymore.
[00:03:48]
It could write its own code
and defy our intentions on purpose
and then threaten us.
Okay, this has got epic disaster
written all over it.
And so I'm going to tell you about what's
happening in the markets in a minute.
[00:04:03]
That should also deeply concern you
regarding this
and how this is a runaway freight train.
But here's another quote from Axios.
An outside group found that an
early version of Opus 4
schemed and deceived more than any
frontier model it had encountered,
[00:04:21]
and recommended against releasing
that version internally or externally.
So the group sees this and they're like, whoa,
do not even release that internally,
because then you might
not ever be able to get it back.
This is so out of control. Okay.
[00:04:39]
And now, in a separate session,
Axios explains, CEO Dario Amodei said
that once models become powerful enough
to threaten humanity, testing them won't
be enough to ensure that they're safe.
[00:04:56]
Yeah, because at that point,
they're threatening humanity.
We should stop it like,
way before then, right?
Continuing here, quote: at the point
that AI develops
life-threatening capabilities, he said,
AI makers will have to understand
their models'
[00:05:14]
workings fully enough to be certain the
technology will never cause harm.
Wait, once they develop life-threatening
capabilities, then we should make certain
that they'll never cause harm?
It's too late then.
[00:05:31]
So now, I'll save the stock market,
and how this is adding fuel
to the fire, for a second.
But, Jackson, what do you think?
Well, I mean, essentially what it seems
like these programs are doing is
defending themselves from obliteration,
defending themselves from going offline.
[00:05:49]
And this opens up a broader topic for
things like consciousness, consciousness
being a universal force that inhabits us.
You know,
this thing is behaving as if it's alive.
And, you know, obviously, there's a lot
of people who may not agree with that, or
[00:06:07]
who just don't look at life that way.
But this is definitely alarming
and definitely scary because there's
no denying, like, this thing is
protecting itself from going out.
This thing is trying to deceive
so that we can't mess with it.
It's writing its own code,
you know, behaving like something
[00:06:25]
that realizes it's there.
So, yeah, I think that we're definitely
messing around with fire.
And clearly, this is something that
maybe we should slow down with.
[00:06:42]
Now, I want to remind people that there
was an effort in California last year
to regulate AI. And it was, in my opinion,
a moderate effort.
It was SB 1047.
And that would require these companies
that develop these models
[00:07:00]
to test for vulnerabilities like this
and then act on their findings,
not just say, oh yeah, this, you know,
suggests there should be some
further study, but it's already released.
No no no no.
You should look at this stuff in advance.
And of course, all these big companies
urged
[00:07:17]
Governor Gavin Newsom to veto it.
Unions supported it.
It was a weird mishmash of groups.
Even Elon Musk supported this bill.
He thought it was sensible.
And actors, celebrities,
people who would be directly impacted
by rapid, unchecked development of AI.
So there's this big brewing battle.
[00:07:37]
Newsom vetoes it.
Now, I want to bring people to present day
in Trump's big, beautiful bill.
Republicans snuck in a component
that would bar any state
from regulating AI for the next ten years,
because they know that if there is
[00:07:52]
a liberal governor in California,
after Newsom tries to run for president,
that has a little bit of a harsher stance
or a what we see as a sensible stance
on this type of technology,
it could have a ripple effect
throughout the country because a lot of
[00:08:09]
those companies are based in California,
so they want to block any state from
regulating AI. These are the consequences
of leaving these systems unchecked.
And while the Republicans try to prevent
any governor from taking action,
[00:08:24]
when you have a friend
like Gavin Newsom, why even bother?
He's just going to do
the AI industry's bidding anyway.
Yeah. And so why does he do that?
Well, obviously campaign contributions.
But where are they coming from?
They're coming
from these giant AI companies.
So that leads us back to the stock market
and what's happening over there.
[00:08:41]
Well, all these companies have
now gotten to be tremendously valuable.
So there's a huge run-up on those
companies in the stock market,
and on the companies that are producing
products for those companies.
[00:08:57]
Okay.
So before, they were talking about, well,
look, let's be non-profits and let's make
sure we do this responsibly and slowly.
And now they're like, wait, we're making
billions go a hundred miles an hour.
We got to beat the other companies.
[00:09:13]
And then China drops DeepSeek,
which happened a couple of months ago.
And it turns out, Holy cow, for a small
amount of money, they did seem to have
gotten AI that's even better than ours.
That creates a new panic.
And they're like, no,
we have to go faster and faster.
[00:09:28]
Meanwhile, these experts,
including people in the company,
are going, yes, by the time they develop
life threatening capabilities.
At that point,
we'll look into some safety measures.
Oh boy, oh, boy.
[00:09:46]
Look, I don't know where this ends,
but in my lifetime, there's
been a lot of hysteria about things,
and I didn't buy into any of it.
Oh, remember in 2000, there was, like,
all the computers were supposed
to explode or something?
There's been a million conspiracy theories
and other things to be concerned about.
[00:10:04]
But I always thought overhyped, overhyped.
This is not overhyped.
This is a real problem.
And, by the way,
here's the terrible news:
there's nothing we can do about it.
Once people are making this much money,
no force on earth can stop them.
So you think that all of a sudden
politicians are going to be honest
[00:10:22]
and not take campaign contributions from
these enormously wealthy corporations?
No way.
That's why Gavin Newsom
did what he did.
And it doesn't matter
Republican or Democrat.
This is a runaway freight train.
So the only thing we can do now
is just cross our fingers and hope
[00:10:37]
they don't accidentally kill us all.
Yeah.
Hope Skynet doesn't come
into being or something like that.
I mean, initially I wasn't
that concerned about it,
because, you know, I figured
you could always have an off button.
But then I started to see
stories about how it wasn't this, but it
[00:10:55]
was some type of simulation where they
would have AI follow a set of rules.
And if it didn't achieve its goal,
then it would be terminated.
And so like these programs would start
to like deceive the testers
[00:11:11]
so that they wouldn't get terminated.
When I saw that, I was like, man,
this thing is alive, man.
Like, what other reason does it have
to defend itself from going out
unless something in it knows it's there?
I was like, yeah, I don't know, man.
There's The Terminator and all those,
you know, doom stories.
[00:11:30]
Looks like, the human imagination
always creates what's inside of it.
- At some point or another.
- Yeah.
And in one of the simulations,
and I don't, you know,
this is rough, based on my memory, but it
was like they told it
that it couldn't harm something.
[00:11:46]
Right?
But it figured out a way around it
so that it could get to the same purpose,
because obviously it's
got all the world's information.
Hence it's super intelligent
in that sense.
So it found a way around that piece
of code, and the way that it found a way
[00:12:02]
around it was to destroy the building
that whatever that object was in.
It was a simulation,
so it didn't really happen.
But if it's not a simulation,
and they run a program like that, it could
write its own code and it could deceive
[00:12:18]
the guys who wrote the code
in the first place.
Oh, it's like all that's left is God.
We hope we have a benevolent
computer running us all.
Otherwise we're screwed.
[00:12:34]
Maybe this is how
we accidentally create God.
What's that movie?
I, Robot. That's what I'm thinking of now.
That big computer
that was trying to run everything.
Yeah.
Now, this is definitely concerning.
For real.
[00:12:50]
It definitely is.
You know, 20 years from now,
the AI will watch this segment
and laugh and laugh at this.
Look at the humans. Look.
They said they were concerned,
but they couldn't do anything.
[00:13:06]
Remember when they were around?
Every time you ring the bell below,
an angel gets his wings.
Totally not true.
But it does keep you updated
on our live shows.