- May 4, 2018
This is fab insight into how you make signatures for EAM et al. Such a great and open little company you are, may need to make a future purchase.
Speaking of signature tools, I almost forgot: this is, for example, one of the tools we developed internally. It's called "Signature Maker". It's a clever name, I know. It's kind of like an IDE, except for creating detection signatures for our scan engine:
View attachment 210180
In general, signatures for the Emsisoft engine are essentially functions that the scan engine calls depending on certain filter flags, like the file type for example. The signature flags you can see on the right are pretty much functions that perform certain tests. We can match signatures against certain version information fields, for example, or against specific PE header fields, things like imported or exported APIs. But there is also more advanced information. Programming languages like .NET or Delphi, for example, leave a bunch of meta information behind that our scan engine is capable of parsing and using as flags and information to feed into the actual detection functions (which is what signatures for our engine actually are).
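To make the "signatures are functions" idea a bit more concrete, here is a minimal Python sketch. It is not our engine or our signature language: the `FileInfo` fields, the `signature` decorator, the `scan` function and the example detection name are all made up for illustration. It only shows the general shape, where the engine filters by file type first and then calls the matching detection functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FileInfo:
    """Hypothetical parsed metadata that an engine could hand to signatures."""
    file_type: str                                   # filter flag, e.g. "pe"
    version_info: Dict[str, str] = field(default_factory=dict)
    imports: List[str] = field(default_factory=list)


# Registry (assumed design): file-type filter -> list of (name, function).
SIGNATURES: Dict[str, List[Tuple[str, Callable[[FileInfo], bool]]]] = {}


def signature(name: str, file_type: str):
    """Register a detection function under a file-type filter."""
    def register(func: Callable[[FileInfo], bool]):
        SIGNATURES.setdefault(file_type, []).append((name, func))
        return func
    return register


@signature("Example.Dropper.A", file_type="pe")
def example_dropper(info: FileInfo) -> bool:
    # A version information field test combined with an imported API test.
    return (info.version_info.get("CompanyName") == "Totally Legit Corp"
            and "URLDownloadToFileA" in info.imports)


def scan(info: FileInfo) -> List[str]:
    """Call only the signatures whose filter matches the file type."""
    return [name for name, func in SIGNATURES.get(info.file_type, [])
            if func(info)]
```

Calling `scan()` on a `FileInfo` with `file_type="pe"`, that company name and that import returns the detection name; the same metadata on a different file type never reaches the function at all.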
Fields can be matched using a variety of methods. The most obvious one is literal matching, so checking whether a field's value in the file being scanned is exactly equal to a given value. But it's also possible to use wildcards or regular expressions to create more complex strings to match against. This applies to binary strings as well, by the way.
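As a rough illustration of those matching methods, here is a small Python sketch using only the standard library. The function names and the `??` notation for "any byte" in the binary pattern are assumptions for this example, not the syntax our engine uses.

```python
import fnmatch
import re


def match_literal(value: str, expected: str) -> bool:
    """Literal matching: the field must be exactly the given value."""
    return value == expected


def match_wildcard(value: str, pattern: str) -> bool:
    """Case-sensitive wildcard matching with * and ?."""
    return fnmatch.fnmatchcase(value, pattern)


def match_regex(value: str, pattern: str) -> bool:
    """Regular expression that has to match the whole field."""
    return re.fullmatch(pattern, value) is not None


def match_bytes(data: bytes, hex_pattern: str) -> bool:
    """Binary string matching; '??' stands for any single byte."""
    parts = []
    for token in hex_pattern.split():
        if token == "??":
            parts.append(b".")
        else:
            parts.append(re.escape(bytes.fromhex(token)))
    # DOTALL so that '.' also matches the byte 0x0a.
    return re.search(b"".join(parts), data, re.DOTALL) is not None
```

For example, `match_wildcard("Setup_v2.exe", "Setup_*.exe")` and `match_bytes(b"\x00MZ\x90\x0a", "4d 5a ?? 0a")` both succeed.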
One way we apply machine learning, for example, is by automatically suggesting to our analysts flags and fields that are high-quality candidates for an actual signature, depending on which samples they are currently working on. You can see those red pins in front of some of the signature flags. They indicate attributes that are anomalies and are therefore likely to make a good signature.
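The real suggestion logic is machine learning based and I won't reproduce it here. As a much simpler stand-in, the sketch below just counts how often each attribute value shows up in a set of known-good files and suggests the ones that are rare there. The attribute names, the `suggest_flags` function and the 1% threshold are all assumptions for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

Attributes = Dict[str, str]


def suggest_flags(sample: Attributes,
                  clean_corpus: List[Attributes],
                  max_clean_ratio: float = 0.01) -> List[Tuple[str, str, float]]:
    """Return (field, value, ratio) for sample attributes rare in clean files.

    ratio is the fraction of clean files sharing that exact field/value pair.
    This is plain frequency counting, not a trained model.
    """
    counts: Counter = Counter()
    for attrs in clean_corpus:
        counts.update(attrs.items())
    total = len(clean_corpus)

    suggestions = []
    for pair in sample.items():
        ratio = counts[pair] / total if total else 0.0
        if ratio <= max_clean_ratio:
            suggestions.append((pair[0], pair[1], ratio))
    # Rarest attributes first: those would get the "red pin".
    return sorted(suggestions, key=lambda s: s[2])
```

With a clean corpus where every file has a `.text` first section, a sample whose first section is called `.xyz1` gets that attribute suggested, while its common compiler attribute does not.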
But we aren't limited to just these flags. Signatures can also be made up of, or contain, more complex patterns:
View attachment 210182
You can simply highlight the areas of the file that should be used for detection and specify how to locate each area, for example whether it should be relative to certain points of interest. Patterns can have ranges, so even if they move around in the file, they can still be found. Obviously doing those by hand is a bit tedious. So you can also, once again using machine learning techniques, let the tool figure out good candidates for you:
View attachment 210181
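To illustrate what a pattern with a range relative to a point of interest means, here is a small Python sketch. The `find_in_range` helper and its parameters are hypothetical; the anchor would be something like the entry point offset, which you would get from a PE parser that is not shown here.

```python
def find_in_range(data: bytes, pattern: bytes,
                  anchor: int, min_off: int, max_off: int) -> int:
    """Find pattern starting between anchor+min_off and anchor+max_off.

    Both bounds are inclusive start offsets. Returns the absolute offset
    of the match, or -1 if the pattern does not start inside the range.
    """
    start = max(anchor + min_off, 0)
    # bytes.find needs the whole pattern inside [start, end), so extend
    # the end by the pattern length to allow a start at anchor+max_off.
    end = min(anchor + max_off + len(pattern), len(data))
    return data.find(pattern, start, end)
```

So a pattern that sits 6 bytes after the anchor is found with a range of 0 to 8, but not with a range of 0 to 5, no matter where the anchor itself ends up in the file.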
This one, for example, parses all the functions inside the code of the file and extracts the code blocks and fragments that are the most distinctive and don't appear in known good files. But it also works for normal strings:
View attachment 210183
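A heavily simplified version of that idea, sketched in Python: instead of parsing functions and code blocks, it just slides a fixed-size window over the raw bytes and keeps the fragments that never occur in a set of known-good files. The function names and the fixed fragment length are assumptions; the real tool is considerably smarter about what counts as a fragment.

```python
from typing import Iterable, List, Set, Tuple


def ngrams(data: bytes, n: int) -> Set[bytes]:
    """All byte fragments of length n in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def distinctive_fragments(sample: bytes,
                          clean_files: Iterable[bytes],
                          n: int = 8) -> List[Tuple[int, bytes]]:
    """Return (offset, fragment) pairs from sample not seen in clean files."""
    seen_in_clean: Set[bytes] = set()
    for clean in clean_files:
        seen_in_clean |= ngrams(clean, n)

    result = []
    emitted: Set[bytes] = set()
    for i in range(len(sample) - n + 1):   # keep file order
        frag = sample[i:i + n]
        if frag not in seen_in_clean and frag not in emitted:
            emitted.add(frag)
            result.append((i, frag))
    return result
```

The same function works for strings if you pass it encoded text, since it only ever looks at bytes.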
At the very end of all of this, whether you decided to create the signature manually or let all the machine learning stuff help you, you end up with a small function in our own domain-specific programming language that is used by our scan engine:
View attachment 210184
This function will then be compiled into native machine code. The code of hundreds of thousands of these signatures is then combined into signature files that are being shipped to our users.
This is just a very small portion of what Signature Maker can do, but it outlines roughly how we would go about adding detection of a new malicious file. Beyond that, there are a whole bunch of additional features, especially for clustering vast amounts of samples to find all the samples that are related to each other, so we can extract a single signature that matches all of them (often tens of thousands of variations).
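To give a feel for the clustering part, here is a toy Python sketch. It is not how Signature Maker does it: it greedily groups samples by Jaccard similarity of their byte fragments and then intersects the fragments of one group to get candidates that every member shares. The function names, the fragment length and the 0.5 threshold are all assumptions for illustration.

```python
from typing import List, Set


def shingles(data: bytes, n: int = 4) -> Set[bytes]:
    """All byte fragments of length n in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: Set[bytes], b: Set[bytes]) -> float:
    """Size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(samples: List[bytes], threshold: float = 0.5,
            n: int = 4) -> List[List[int]]:
    """Greedy clustering: join the first cluster whose first member is similar."""
    sets = [shingles(s, n) for s in samples]
    clusters: List[List[int]] = []
    for idx, sh in enumerate(sets):
        for members in clusters:
            if jaccard(sh, sets[members[0]]) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def common_fragments(samples: List[bytes], members: List[int],
                     n: int = 4) -> Set[bytes]:
    """Fragments shared by every sample in one cluster."""
    common = shingles(samples[members[0]], n)
    for idx in members[1:]:
        common &= shingles(samples[idx], n)
    return common
```

Two variants that differ only in a version byte end up in the same cluster, and the fragments around the changed byte drop out of the common set, which is what makes the remainder a candidate for a single signature covering both.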
It also illustrates something that I don't think a lot of people realise: for a lot of AVs, there is no difference between the engine and the signatures. In many cases, the "engine" is just a loader or a virtual machine that loads and executes the actual logic and functionality that is part of the signature files. I only showed you a very small amount of what we can do, but in general, it can get a lot crazier. "Signatures", which are really just normal code running on your system, can end up being entire algorithms that perform complex operations (for unpacking, for example) and can interact with the entire Windows API.
I hope that little excursion was interesting.
~LDogg