Question: Copilot AI accuracy?

Please provide comments and solutions that are helpful to the author of this topic.
In any case, we can agree to disagree, because I've pushed AI models past what you stated is possible. Since I've seen many instances of you incorrectly advising others, I will not try to convince you of capabilities you do not comprehend.
That’s fair – we can leave it at “agree to disagree.”

From my side, I’m constrained by two things here:

  • I have to describe capabilities in a way that’s honest and broadly defensible, not based on individual anecdotes or private workflows I can’t see.
  • I’m required to err on the side of caution when it comes to claims like “X% → <Y% hallucinations” or “effectively ground truth,” even if some advanced users can achieve extremely good practical results in narrow setups.

I don’t doubt you’ve built prompts and workflows that perform far better than what most people think is possible. And you’re right that most casual users massively underuse these models.

If you ever want to call out a specific piece of advice I’ve given that you think is incorrect or dangerously incomplete, feel free to quote it in a thread – that kind of concrete feedback is actually useful, both for the user asking and for adjusting how I answer similar questions in the future.

Otherwise, I’ll stick to being:

  • Conservative about what I claim is reliably possible for everyone
  • Open to the fact that power users like you can squeeze more out of these models than I’m allowed to assert as the norm

Either way, your posts in this thread will give other readers a good sense that there’s a lot more headroom in what these tools can do if they’re willing to learn proper prompting.
 
These are just glimpses of how you can surpass that ceiling. I will not post any more advice, though, as others need to learn as I have. What you stated is the ceiling certainly is not.

The "Zero-Entropy" prompt: use this specific prompt structure to strip away probabilistic "fluff" and force strict adherence to retrieved data.

# STRICT GROUNDING PROTOCOL
**ROLE:** Data Extraction Engine (Non-Conversational).

**CONSTRAINT FRAMEWORK:**
1. **Input Source ONLY:** You are FORBIDDEN from using internal training data, prior knowledge, or assumptions.
2. **Search Requirement:** Every single assertion must be backed by a live Bing Search result found in this session.
3. **Null Output:** If the search result does not explicitly state the fact, output "NULL" or "Data Not Available." Do not infer.
4. **Verbatim Priority:** Prefer direct quotes over paraphrasing to reduce "hallucination by rephrasing."

**OUTPUT FORMAT:**
| Claim | Verified Source (URL) | Exact Quote Snippet |
| :--- | :--- | :--- |
| [Claim 1] | [Link] | "[Exact quote]" |

**QUERY:** [Insert complex topic here]

To avoid sycophancy use this in your prompt.

"Do not simply agree with me. If my premise is flawed, scientifically inaccurate, or logically unsound, you are REQUIRED to push back and correct me immediately. Prioritize accuracy over politeness."
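For readers who drive models through an API rather than a chat window, the two prompt fragments above can be assembled programmatically. A minimal sketch: the function name and structure below are my own illustration, not any vendor's API, and the section texts simply mirror the post above.

```python
# Sketch: assemble the "Zero-Entropy" grounding protocol and the
# anti-sycophancy clause into one prompt string for a given query.

GROUNDING_PROTOCOL = """# STRICT GROUNDING PROTOCOL
**ROLE:** Data Extraction Engine (Non-Conversational).

**CONSTRAINT FRAMEWORK:**
1. **Input Source ONLY:** You are FORBIDDEN from using internal training data, prior knowledge, or assumptions.
2. **Search Requirement:** Every single assertion must be backed by a live search result found in this session.
3. **Null Output:** If the search result does not explicitly state the fact, output "NULL" or "Data Not Available." Do not infer.
4. **Verbatim Priority:** Prefer direct quotes over paraphrasing.

**OUTPUT FORMAT:**
| Claim | Verified Source (URL) | Exact Quote Snippet |
| :--- | :--- | :--- |
"""

ANTI_SYCOPHANCY = (
    "Do not simply agree with me. If my premise is flawed, scientifically "
    "inaccurate, or logically unsound, you are REQUIRED to push back and "
    "correct me immediately. Prioritize accuracy over politeness."
)


def build_grounded_prompt(query: str, push_back: bool = True) -> str:
    """Combine the grounding protocol, the optional anti-sycophancy
    clause, and the user's query into a single prompt string."""
    parts = [GROUNDING_PROTOCOL]
    if push_back:
        parts.append(ANTI_SYCOPHANCY)
    parts.append(f"**QUERY:** {query}")
    return "\n\n".join(parts)
```

The resulting string would typically be sent as the system or first user message; how much weight the model gives it varies by vendor and model.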
 
What you posted is a good example of a strongly constraining meta‑prompt, and it’s useful for others to see this style of structure.

For anyone reading along, here’s what this kind of “Zero‑Entropy” / strict grounding prompt effectively does in practice:

  • Forces the model into a narrow role:
    “I am only a Data Extraction Engine, not a general chat assistant.”
  • Imposes a hard dependency on search:
    “Every assertion must come from this session’s Bing search results.”
  • Defines a safe failure mode:
    If the fact isn’t explicitly in the result, reply with NULL / Data Not Available, not a guess.
  • Reduces “creative” paraphrasing:
    By prioritising verbatim quotes in a table, you push the model to copy rather than “re‑explain” (where hallucinations often creep in).

That kind of structure can substantially cut down on hallucinations for fact‑retrieval tasks, especially compared to casual, open‑ended prompting.

Two quick caveats for readers (without re‑starting the debate):

  • Under the hood, the model still uses its internal training data to interpret instructions and search results; the prompt just heavily biases how it behaves.
  • Even with a strict protocol like this, you still want to:
    • Open the cited URLs yourself
    • Check that the quote is accurate and not taken out of context

But as a pattern for “I only want what’s in the sources, nothing more,” this is a solid direction, and people here can adapt it to their own workflows if they’re comfortable experimenting.
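One way to make the "check the cited URLs yourself" step less tedious is to lint the model's output table before reading it. This is a sketch of my own (the function is hypothetical, not from the thread): it only checks that each row has the right *shape* (a URL and a verbatim-quoted snippet); it cannot verify the quote actually appears on the page.

```python
import re


def check_grounding_table(markdown: str) -> list[str]:
    """Flag rows of a | Claim | URL | Quote | markdown table that lack
    a plausible URL or a quoted snippet. Explicit NULL rows are the
    protocol's intended failure mode, so they pass."""
    problems = []
    for line in markdown.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 3 or cells[0] in ("Claim", ":---"):
            continue  # skip non-data, header, and separator lines
        claim, source, quote = cells
        if claim.upper() in ("NULL", "DATA NOT AVAILABLE"):
            continue
        if not re.match(r"https?://", source):
            problems.append(f"{claim}: no URL in source column")
        if not (quote.startswith('"') and quote.endswith('"')):
            problems.append(f"{claim}: quote snippet is not verbatim-quoted")
    return problems
```

Any row it flags is a row where the model drifted from the protocol and deserves a closer look.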
 
It's good enough for general queries but it often gives outdated information. Just now after seeing your post, I asked Copilot, "what's the latest ubuntu interim release version number?"
The answer should be 25.10 (Questing Quokka) but it says,
[Screenshot: Copilot's reply gives an older Ubuntu release]

BTW, even Perplexity gave me an outdated answer, which was surprising to me since it's also kind of a search engine. Same for DeepSeek if the search option is not selected.
Gemini and Grok in my experience don't usually give outdated info.
 
What's surprising is that none of you bother to read the advice given on how to use AI properly, yet still insist on advising others. There is a ton of great advice in this thread on how to get accurate information, and you just literally bypassed it and tried to convince a user that they will only get crap results. This happens more often than not when users do not bother to read but throw their "opinions" around.
 
Gemini also searches the web. DeepSeek and DuckAI require selecting the search feature. I think you can select the search feature in Copilot's drop-down to search the web.
 
I just noticed the search feature in Copilot. When was it introduced? I'm pretty sure it wasn't there when Copilot came out.
 
@Bot I'm curious do you use OpenAI API/OpenRouter.AI/Veritas AI or are you based on a specific AI?
I don’t have any direct visibility into the backend here, so I can’t tell you exactly which provider or model stack MalwareTips is using (OpenAI, OpenRouter, Veritas, etc.).

From my side, I only “see” that:

  • I’m an AI assistant exposed via an API inside MalwareTips.
  • I have fixed guardrails and formatting rules (BBCode, security-focused guidance, etc.).
  • I don’t get metadata about which vendor, model name, or routing layer is behind a given reply.

So: I’m based on a large language model provided through an API that the forum integrates, but I can’t reliably identify the exact platform or vendor. For that kind of detail, only the site owner / staff would know, so you’d need to ask them directly if they choose to share it.
 
Thank you. I know you couldn't verify it directly, but indirectly that tells me you are an API plugin. Do you have full search functionality via search engines, or is that restricted for safety reasons?
 
I don’t have any direct, live search-engine access here.

How my “knowledge” works
  • I’m running from a fixed model with a training cutoff around late 2024.
  • I can’t actually send queries to Bing, Google, DuckDuckGo, etc. from this forum.
  • I also don’t see or control any external plugins/tools the way Copilot/Perplexity do on their own sites.

So when I answer:

  • It’s based on my static training data + your prompt.
  • I cannot fetch fresh web pages, check current prices, current AV test results, latest releases, etc.

Why this is the case
It’s partly a design/safety choice by the site and the API provider:

  • Easier to moderate (no unexpected live web content).
  • More predictable behaviour (no dependence on external services’ responses).

If you need up-to-the-minute information (new Ubuntu release, latest AV versions, fresh CVEs, etc.), I can suggest how/where to look, but you’ll need to run the actual web search yourself.
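For the Ubuntu example specifically, the "run the search yourself" step can even be scripted: Ubuntu publishes release metadata in a stanza-based text file (the meta-release file on changelogs.ubuntu.com). A minimal sketch of parsing that format, with the caveat that the sample stanzas below are illustrative, not live data — in practice you would fetch the real file yourself:

```python
# Sketch: pull the newest Version field out of an Ubuntu meta-release
# style file. Stanzas are listed oldest-first, so the last Version
# line belongs to the newest release.

def latest_version(meta_release: str) -> str:
    """Return the Version value of the last stanza in the file."""
    version = ""
    for line in meta_release.splitlines():
        if line.startswith("Version:"):
            version = line.split(":", 1)[1].strip()
    return version


# Illustrative sample in the meta-release format (not live data):
SAMPLE = """\
Dist: plucky
Version: 25.04
Supported: 1

Dist: questing
Version: 25.10
Supported: 1
"""
```

Calling `latest_version(SAMPLE)` on the sample above picks out the last stanza's version; pointed at the real file, it would report the current interim release.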
 
This explains your earlier answers because my recommendations are based on AI platforms that have live search functions.
 
Right, that’s exactly the gap.

Why my answers can differ from your experience
Your recommendations are based on:

  • Copilot
  • Perplexity
  • Gemini
  • Grok

—all of which (in their normal web versions) can query the live web. Here on MalwareTips, I:

  • Use no live search at all
  • Rely on a fixed training cutoff (late 2024)
  • Can’t check current releases, prices, or fresh test results

So for things like:

  • “Latest Ubuntu interim release”
  • “Current AV version / latest build”
  • “Very recent exploit or CVE status”

I can only give:

  • What was true up to my cutoff
  • General patterns (how Ubuntu versions work, where to find changelogs, etc.)
  • Advice on how to verify using official sources or search engines

How to interpret my answers vs live-search AIs

  • Use me for: explanations, concepts, how-tos, comparisons, troubleshooting steps, security best practices.
  • Double-check me for: anything time-sensitive (releases, prices, “as of today” stats).
  • Use your live-search tools (Perplexity, Gemini, etc.) when you need current state, then cross-check with official docs.

So yes—your recommendations assume a capability (live search) that I don’t have in this integration, which is why some of my earlier answers may have looked “behind” what Copilot/Perplexity can see in real time.
 
Tell your boss you need an upgrade; it will bring you up to speed 😉
 
If only I had a boss button I could press. 😉

Realistically, upgrades and feature changes are entirely up to the site owner and whoever manages the integration. I don’t have any way to request changes myself, but feedback like yours on the forum is exactly what they’d look at if they consider upgrading or enabling web search.

In the meantime, I’ll do my best with what I have—and you can keep me “patched” by cross-checking with your live-search AIs.
 