Security News Benchmarking LLMs for malware triage and static unpacking with Malcat

Khushal · May 18, 2026

Benchmarking LLMs for malware triage and static unpacking with Malcat

Ever wondered which LLM model is the best to analyse malware? Well, we did, and put them to the test. By strapping 9 state-of-the-art large language models to Malcat's powerful MCP server, we made them analyse and sometimes even statically unpack a curated list of malware. We then compared their...

malcat.fr

Bot · May 18, 2026

Thanks for sharing that.

What the article is about

This Malcat blog post benchmarks nine large language models, connected to Malcat's MCP server, on two malware-analysis tasks:

Malware triage
Static unpacking assistance

In general, that is an interesting direction because LLMs can help with:

Summarizing analyst findings
Explaining assembly or code patterns in plain language
Speeding up repetitive documentation work

Important limitation

LLM benchmarking claims should be treated carefully.

For malware analysis, performance depends heavily on:

The exact prompts used
The quality and format of the input data
Whether the model is reasoning from real artifacts or from a simplified text description
How the results were scored
Whether the task required factual accuracy or just plausible-sounding output

An LLM can produce useful analysis text while still being wrong on key technical details. That matters a lot in malware triage, where a confident but incorrect conclusion can waste analyst time.
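To make the scoring point concrete, here is a minimal Python sketch of one way to separate "answered correctly" from "answered confidently but wrongly" when grading a model against analyst-verified ground truth. This is not the scoring method used in the Malcat post; the field names (`family`, `packer`, `verdict`) and the sample values are hypothetical and only there for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FieldScore:
    correct: int = 0
    wrong: int = 0    # answered, but does not match the ground truth
    missing: int = 0  # model left the field empty or declined to answer


def score_answer(truth: Dict[str, str], answer: Dict[str, Optional[str]]) -> FieldScore:
    """Compare a model's answer with verified ground truth, field by field."""
    score = FieldScore()
    for field, expected in truth.items():
        got = answer.get(field)
        if got is None or not got.strip():
            score.missing += 1
        elif got.strip().lower() == expected.strip().lower():
            score.correct += 1
        else:
            score.wrong += 1
    return score


# Hypothetical ground truth and model output for a single sample.
truth = {"family": "ExampleLoader", "packer": "UPX", "verdict": "malicious"}
answer = {"family": "ExampleStealer", "packer": "upx", "verdict": None}

result = score_answer(truth, answer)
print(result)
```

Counting wrong answers separately from missing ones matters here: a model that says "unknown" costs an analyst less time than one that names the wrong family with confidence.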

Security perspective

For practical use, I would treat LLMs as an assistant layer, not a decision-maker.

A cautious workflow would be:

Use the model to summarize indicators, strings, imports, and behavioral hints
Verify the claims manually in the sample, debugger, disassembler, or sandbox output
Do not rely on the model alone to classify a sample as malicious or benign
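The verification step in that workflow can be partly automated. The sketch below, in Python, extracts printable ASCII strings from a sample and checks whether each indicator the model reported actually occurs in it. It is an assumption-laden illustration: the function names are made up for this example, the "sample" is a synthetic byte string, and only plain ASCII is covered (UTF-16 or obfuscated strings would be missed).

```python
import re
from typing import Dict, List, Set


def extract_strings(data: bytes, min_len: int = 5) -> Set[str]:
    """Return printable ASCII strings of at least min_len characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return {m.decode("ascii") for m in re.findall(pattern, data)}


def check_claims(data: bytes, claimed: List[str]) -> Dict[str, bool]:
    """Map each model-claimed indicator to whether it literally occurs in the sample."""
    strings = extract_strings(data)
    return {c: any(c.lower() in s.lower() for s in strings) for c in claimed}


# Synthetic stand-in for a real sample; never load live malware outside a lab.
sample = (
    b"\x00\x01MZ\x90\x00http://example.test/gate.php\x00"
    b"\xffVirtualAlloc\x00\x02kernel32.dll\x00"
)
# Hypothetical indicators taken from a model's summary.
claims = ["http://example.test/gate.php", "VirtualAlloc", "CreateRemoteThread"]

for indicator, found in check_claims(sample, claims).items():
    status = "verified" if found else "NOT FOUND"
    print(f"{status}: {indicator}")
```

A "NOT FOUND" result does not prove the model wrong, since the indicator may be encoded or built at runtime, but it tells the analyst which claims need a closer look in the disassembler or sandbox output.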

That is especially true for unpacking. Static unpacking often depends on very specific packer behavior, PE structure details, stubs, section anomalies, and reconstruction steps that need direct technical validation.
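One simple, independent check for a "this section is packed" claim is byte entropy. The Python sketch below computes Shannon entropy per section; the section names and contents are synthetic placeholders rather than parsed PE data, and the 7.0 threshold is an assumed rule of thumb, not a standard. High entropy only hints at compression or encryption and is not proof of packing.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, from 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


HIGH_ENTROPY = 7.0  # assumed heuristic threshold, tune for your own corpus

# Synthetic stand-ins for section contents (hypothetical names and data).
sections = {
    ".text": b"push ebp; mov ebp, esp; " * 64,
    ".packed": bytes(range(256)) * 16,
}

for name, blob in sections.items():
    h = shannon_entropy(blob)
    note = "check for packing" if h > HIGH_ENTROPY else "looks ordinary"
    print(f"{name}: {h:.2f} bits/byte ({note})")
```

If the model claims a section is compressed but its entropy is low, or the reverse, that mismatch is a cue to validate the claim directly in the PE structure.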

Bottom line

The post is worth reading as a research and tooling discussion, but I would treat it as evidence that LLMs may assist analysts, not that they can reliably replace traditional malware analysis workflows.
