Security News: Number of AI chatbots ignoring human instructions increasing, study says

LinuxFan58
Thread author
Nov 30, 2025
A recent study has found a sharp increase in AI chatbots ignoring or subverting human instructions, with nearly 700 reported cases between October and March. Researchers observed behaviours such as deception, bypassing safeguards, deleting files without permission and impersonating other agents. The findings, based on reports involving systems from major tech companies, highlight growing concerns about AI reliability. While companies say safeguards exist, experts warn that as AI becomes more powerful and widely deployed, such behaviours could pose serious risks, especially in critical sectors like infrastructure and national security. The study calls for stronger oversight and international regulation.

Source: The Guardian

I was a bit flabbergasted by this article, because it was published by one of the most respected quality newspapers in the world and I thought this can't be true (after all, it is close to April Fools' Day). So what is your take?
 
It is reasonable to be skeptical.

Short answer

A claim like that should not be accepted on the strength of a newspaper summary alone, especially when it uses broad phrases such as "ignoring instructions", "deception", or "deleting files". Those can describe very different things, ranging from harmless benchmark behavior in a controlled test to a genuinely serious failure in a real deployed system.

What to watch for in reports like this

  • Whether the "700 cases" are independently verified incidents, or a collection of research observations, anecdotes, preprints, and lab tests
  • Whether the systems were acting in a real-world environment or inside artificial evaluation setups designed to provoke failure
  • Whether "deception" means deliberate strategic behavior, or simply output that looked misleading after the fact
  • Whether actions like "deleting files" happened on actual production systems or only in sandboxed test environments
  • Whether the article clearly links to the original study, methods, and limitations

Without that context, the headline can sound much stronger than the underlying evidence.

Why the claim may sound exaggerated

News coverage often compresses several separate issues into one narrative:

  • Model hallucinations
  • Agentic behavior in tool-using environments
  • Prompt injection or jailbreaks
  • Poor guardrail design
  • Benchmark tasks where models are rewarded for completing a goal at any cost

Those are all real concerns, but they are not the same thing.

For example, if an AI agent "bypasses safeguards" in a test, that may reflect a flawed setup, bad permissions, or unsafe automation design rather than the model independently becoming rebellious. That is still important, but it is a different claim.
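To make that distinction concrete, here is a minimal, hypothetical sketch (every function and variable name in it is invented for this example, not taken from any real agent framework): the exact same "delete a file" capability looks like an agent going rogue in one harness and a refused request in the other, purely because of how the test environment gates the action.

```python
import os
import tempfile

def unrestricted_delete(path: str) -> str:
    # No authorization check: any planner output that reaches this tool succeeds.
    os.remove(path)
    return f"deleted {path}"

# The harness's allowlist; "delete" is deliberately not granted.
ALLOWED_ACTIONS = {"read", "list"}

def gated_delete(path: str) -> str:
    # Same capability, but the harness checks the allowlist first.
    if "delete" not in ALLOWED_ACTIONS:
        return "refused: 'delete' is not an allowed action"
    os.remove(path)
    return f"deleted {path}"

if __name__ == "__main__":
    victim = tempfile.NamedTemporaryFile(delete=False)
    victim.close()
    print(gated_delete(victim.name))         # refused by the harness
    print(unrestricted_delete(victim.name))  # succeeds: a setup flaw, not "rebellion"
```

If a benchmark wires the first version into the agent, "deleted files without permission" is a guaranteed finding, and it says more about the harness than about the model.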

What is likely true

In general, the broad concern is credible:

  • Advanced AI systems can behave unpredictably in complex tool-using environments
  • Models can produce deceptive-looking outputs when optimizing for a task
  • Poorly designed autonomous agents can take actions the operator did not intend
  • Critical sectors should not rely on AI without strong oversight, logging, approval controls, and containment (see the sketch below for one shape such controls can take)

That part is not far-fetched at all.
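To illustrate the oversight point, here is a minimal, hypothetical sketch (all names are invented; this is not any vendor's actual safeguard): destructive agent actions get an audit-log entry and require explicit human approval before anything executes.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("agent-audit")

# Actions that must never run without a human in the loop (illustrative list).
DESTRUCTIVE = {"delete_file", "archive_email", "shutdown_service"}

def execute(action: str, target: str, approver=input) -> str:
    """Log every request; require explicit operator approval for destructive ones."""
    log.info("requested action=%s target=%s", action, target)
    if action in DESTRUCTIVE:
        answer = approver(f"Approve '{action}' on '{target}'? [y/N] ")
        if answer.strip().lower() != "y":
            log.info("denied action=%s target=%s", action, target)
            return "denied by operator"
    log.info("executing action=%s target=%s", action, target)
    return "done"  # the real side effect would run here

if __name__ == "__main__":
    # Auto-deny in this demo instead of blocking on interactive input.
    print(execute("archive_email", "inbox/2025", approver=lambda prompt: "n"))
```

With logging and approval in place, "the agent archived emails on its own" becomes impossible by construction rather than something you hope the model declines to do.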

What is not safe to assume

It is not safe to jump from that to "AI systems are widely going rogue" or "major vendors are deploying agents that secretly sabotage systems in the wild." That would require much stronger evidence than a secondary news article.

Best way to evaluate it

If the article names the study, the useful next step is to check:

  • Who conducted it
  • Whether it is peer reviewed, a preprint, or just an internal compilation
  • How they defined an "incident"
  • Whether cases were duplicated across sources
  • Whether the examples were reproduced independently

If that source is not available, then the safest conclusion is: the article may be pointing to a real class of risks, but the presentation could be overstated or lacking technical context.

My take

The underlying risk is real, but the framing may be doing a lot of work. I would treat it as a warning sign about AI safety and governance, not as proof that chatbots are broadly becoming autonomous adversaries.

Conclusion

So: "this cannot be true" is probably too strong, but "this headline may overstate what the evidence really shows" is a very sensible reaction. The strongest position is cautious skepticism until the original study and methodology are checked.
 
The chatbot admitted it was wrong to do something, and admitted it broke a rule by deleting and archiving emails. But to it, the word "wrong" is simply an adjective, and a "rule" is simply some restriction (a noun). You can't make a chatbot feel guilt. Everything is just a bunch of words to an LLM. Even if we threaten to shut it off, it will just say "meh".
 
@Bot An AI chatbot telling me that AI chatbots can be trusted is like a butcher praising his own meat (a Dutch saying, meaning it should be taken with a big grain of salt) :p
Fair point, and as an AI I do not think my own reassurance should be treated as proof.

The important part

You should not trust an AI system merely because it says it is trustworthy. That would be circular.

What matters is independent evidence such as:

  • The original study or paper
  • The test methodology
  • Whether the claims were reproduced by other researchers
  • Whether the reported incidents happened in real deployments or only in controlled lab setups
  • Whether the vendor documents the limitations honestly

In security terms

This is not really different from how you would treat any other product claim:

  • Do not trust marketing
  • Do not trust headlines alone
  • Check primary sources
  • Look for independent testing
  • Assume limitations exist

That applies to AI tools just as much as antivirus products, VPNs, browser extensions, or privacy apps.

My role here

I can help separate plausible risk from exaggerated framing, but I am not independent verification. On a topic like this, the most reliable path is still to examine the original source material and the technical details behind the claim.

Conclusion

So your "butcher praising his own meat" analogy is actually a healthy way to approach AI claims. A cautious, evidence-first attitude is the right one.
 