Integrating large language models (I am assuming the OpenAI API or the Gemini Flash API, both of which accept custom tuning) is a good idea, but the feature set matters more than the API. In post 3 I see the LLM considering quite a few factors, but a high-quality static analysis engine usually extracts several thousand features. Nevertheless, it’s an interesting project.
Also, @danb, I am assuming that you’ve tuned the API with a large number of high-quality, pre-checked and pre-labelled samples (ideally 1:1 malicious/safe)?
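To illustrate what I mean by 1:1, here is a minimal sketch of balancing a labelled set before tuning by downsampling the larger class. The sample layout (a dict with a `label` key set to `"malicious"` or `"safe"`) and the function name are my own assumptions, not anything from your pipeline.

```python
import random

def balance_one_to_one(samples, seed=0):
    """Downsample the larger class so malicious and safe counts match.

    Assumes each sample is a dict with a "label" key whose value is
    either "malicious" or "safe" (a hypothetical layout).
    """
    malicious = [s for s in samples if s["label"] == "malicious"]
    safe = [s for s in samples if s["label"] == "safe"]
    n = min(len(malicious), len(safe))

    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    balanced = rng.sample(malicious, n) + rng.sample(safe, n)
    rng.shuffle(balanced)  # avoid all malicious samples coming first
    return balanced
```

Downsampling throws data away, so if the smaller class is tiny it is probably better to collect more samples than to cut the larger class down.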
I also suggest you avoid personalities and humorous answers.
I know it adds a certain signature that I like myself, but security is not the place for giggles and laughter.
Displaying detailed reasons why files are flagged (way too detailed) in a program that offers trial versions downloadable with 2 clicks is also a recipe for disaster; it’s yet another way to tell attackers how to evade detection.
Maybe show the detailed reasons only in the enterprise version (requiring an enterprise email) and display vague, generic information in the home version.
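A rough sketch of that split, assuming the verdict comes back as a dict with hypothetical `malicious` and `reasons` fields and that the edition is known as a plain string:

```python
# Generic text shown to home users; the wording is just a placeholder.
GENERIC_MESSAGE = "This file was flagged as potentially malicious."

def reasons_for_display(verdict, edition):
    """Return the reasons to show, depending on the product edition.

    `verdict` is assumed to look like
    {"malicious": bool, "reasons": [str, ...]} (hypothetical fields).
    """
    if not verdict.get("malicious"):
        return []
    if edition == "enterprise":
        # Enterprise users get the full, detailed reasoning.
        return list(verdict.get("reasons", []))
    # Home and trial users get one vague line, nothing an attacker can tune against.
    return [GENERIC_MESSAGE]
```

The filtering should happen on the server side; if the detailed reasons still reach the home client and are merely hidden, an attacker can read them anyway.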
Last but not least, since you are making use of APIs on your way to a full-blown AV (I don’t blame you, I am using these APIs more and more too), why don’t you make them return a detection name too?
Trojan.Generic_MalScript
Trojan.Generic_MalPE
Or maybe Malicious:Confidence=“80%”
And so on? Give it that little AV touch and feel.
You can also do this on the frontend side, based on the API response.
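Something like the following could build the name from the response. This is only a sketch: the response fields (`malicious`, `file_type`, `confidence`) and the `Clean` / `Malicious:Generic` fallbacks are my assumptions, and the same mapping could be ported to whatever language the frontend uses.

```python
# Hypothetical mapping from a file type reported by the API to a name suffix.
TYPE_SUFFIXES = {"script": "MalScript", "pe": "MalPE"}

def detection_name(response):
    """Turn an API verdict into an AV-style detection name.

    `response` is assumed to look like
    {"malicious": bool, "file_type": str, "confidence": float in 0..1}.
    """
    if not response.get("malicious"):
        return "Clean"

    suffix = TYPE_SUFFIXES.get(str(response.get("file_type", "")).lower())
    if suffix:
        return f"Trojan.Generic_{suffix}"

    confidence = response.get("confidence")
    if confidence is not None:
        # Fall back to a confidence-style name when the type is unknown.
        return f'Malicious:Confidence="{round(confidence * 100)}%"'

    return "Malicious:Generic"
```

For example, a response of `{"malicious": True, "file_type": "pe"}` would map to `Trojan.Generic_MalPE`.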