ESET tends to be excellent when it comes to creating generic signatures. Generic signatures are part of static heuristics: they allow the scanning engine to flag a sample that has similarities to other samples previously found in the wild, even though the vendor has not yet seen and processed that specific sample. They are typically used alongside standard checksum hash detection, which is flawed on its own, hence the introduction of generic signatures. Vendors could not keep up with how many new malicious samples were being pushed into the wild (an attacker only needs to change a single byte to get a new checksum hash), so they introduced generic signatures, which usually rely on a byte pattern. When this byte pattern is found within the target PE, it raises an internal flag.
As an example:
1. FooBar.exe
- Hash: XXXXXXXXXXXX.... (the length depends on the hash type, such as MD5, SHA-1 or SHA-256; SHA-1 and SHA-256 are more secure than MD5 but also take longer to calculate)
- The hash is added to the signature database to detect the threat.
- The attacker now changes 1 byte out of the thousands of bytes within the PE, so the hash no longer equals the hash in the database, and the product no longer detects it.
- A generic signature can be added to the database along with the hash checksum when the sample is first processed. It uses a pattern that identifies both this sample and any other samples, known or unknown, that share the same pattern, without flagging many clean samples (generic signatures must be reliable, so they need to be created by someone with experience).
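To make the hash problem concrete, here is a minimal Python sketch. The sample bytes and the `known_bad_hashes` set are made up for illustration; a real vendor database will not look like this.

```python
import hashlib

# Hypothetical stand-in for the bytes of FooBar.exe (not a real PE file).
original = bytes(range(256)) * 16

# A hash-only "database": the set of SHA-256 hashes of known-bad samples.
known_bad_hashes = {hashlib.sha256(original).hexdigest()}


def is_known_bad(data: bytes) -> bool:
    """Return True if the SHA-256 of data is in the hash database."""
    return hashlib.sha256(data).hexdigest() in known_bad_hashes


# The attacker flips a single byte somewhere in the file.
modified = bytearray(original)
modified[100] ^= 0xFF
modified = bytes(modified)

print(is_known_bad(original))  # True
print(is_known_bad(modified))  # False: one changed byte, new hash
```

The two files differ by one byte out of 4096, yet the hash lookup treats them as unrelated.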
FooBar.exe may have a set pattern of bytes. For demonstration's sake we will use a fake pattern, which is deliberately invalid:
AA 55 BA 6A BB CC DD EE FF G1 (0xAA, 0x55, 0xBA, 0x6A, 0xBB, 0xCC, 0xDD, 0xEE, 0xFF, 0xG1; the 0x prefix marks each value as a hexadecimal byte, and G1 is not valid hex).
Now when the attacker changes the checksum hash by modifying one or more bytes in the PE, the checksum changes but the byte pattern still exists, as long as the bytes covered by the signature are not touched (getting rid of the pattern sometimes requires the code to be re-written, depending on the circumstances). This means the sample is still flagged, even if the attacker added more functionality and re-compiled the binary, and even though nobody but the attacker has seen it yet.
FooBar2.exe may then come along in the wild, developed by someone else. It may do something that attacker 1's FooBar.exe also did. Depending on the strength of the signature and what it represents, and on whether the second attacker happened to use the same code-base for the feature the signature was built from (it isn't uncommon for authors of malware targeted at home users to copy-paste or re-use others' code), FooBar2.exe is now also picked up, despite never having been seen before and possibly not even being the same variant type as FooBar.exe.
The aim of generic signatures is to identify other samples, known or unknown, with one signature or a few, without having to keep track of every checksum hash for the same threat when the hash is regularly changed before the sample is released back into the wild. The same goes for similar threats, known or unknown.
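Here is a rough Python sketch of the idea. The signature is a valid-hex variant of the fake pattern above (last byte swapped for 0x01), and every "file" is a made-up byte string, so treat all names and contents as assumptions rather than anything a real engine uses.

```python
import hashlib

# Hypothetical signature: the fake pattern from above with a valid last byte.
SIGNATURE = bytes.fromhex("AA 55 BA 6A BB CC DD EE FF 01")


def matches_signature(data: bytes) -> bool:
    """Return True if the byte pattern occurs anywhere in data."""
    return SIGNATURE in data


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Made-up stand-ins for the files discussed above (not real PE files).
foobar = b"MZ" + bytes(64) + SIGNATURE + bytes(64)

# Attacker 1 changes a byte outside the pattern to refresh the hash.
rehashed = bytearray(foobar)
rehashed[10] ^= 0xFF
rehashed = bytes(rehashed)

# Attacker 2 re-uses the same code in an otherwise different sample.
foobar2 = b"MZ" + b"\x90" * 32 + SIGNATURE + b"different payload"

# A clean file that does not contain the pattern.
clean = b"MZ" + bytes(200)

print(sha256_hex(foobar) == sha256_hex(rehashed))  # False: hash changed
print(matches_signature(rehashed))                 # True: pattern survived
print(matches_signature(foobar2))                  # True: never seen before
print(matches_signature(clean))                    # False
```

A plain substring search is the simplest possible matcher; real signatures are generally more elaborate, but the principle of matching content rather than the whole-file hash is the same.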
There are other types of generic flagging as well, such as flagging a combination of imports or suspicious strings, or other characteristics in combination, such as entropy, PE file details, digital signature verification, or the binary's file name (e.g. an impostor of svchost.exe or winlogon.exe).
Anti-virus products like ESET have a good memory scanner, which is useful for tackling packers. Packing is a technique used to mask a PE, whether clean or malicious. It is used for both genuine and non-genuine purposes, because some people are paranoid and use it to help "protect" their software from reverse engineering (even though an AV analyst is not going to break a sweat analysing it, and neither will anyone experienced enough, whether working for an AV vendor or not, most of the time). I'll explain how memory scanning works now.
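As an illustration of one of those characteristics, here is a small Python sketch of a Shannon entropy check. The 7.2 threshold and the `looks_packed` helper are my own assumptions for the example; a real engine would weigh entropy together with the other characteristics rather than on its own.

```python
import math
from collections import Counter

# Hypothetical cut-off in bits per byte; real products tune this themselves.
HIGH_ENTROPY_THRESHOLD = 7.2


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_packed(data: bytes) -> bool:
    """Flag data whose entropy is suspiciously close to random."""
    return shannon_entropy(data) > HIGH_ENTROPY_THRESHOLD


# Plain text repeats a small alphabet, so its entropy is low.
plain = b"This program cannot be run in DOS mode. " * 50

# Every byte value equally often: a stand-in for encrypted/compressed data.
uniform = bytes(range(256)) * 16

print(looks_packed(plain))    # False
print(looks_packed(uniform))  # True
```

High entropy on its own is not a verdict, since plenty of clean software is compressed or packed too, which is why it is used in combination with other characteristics.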
1. FooBar.exe has a generic signature applied for it in the virus signature database, but the checksum is unknown and it has been re-released into the wild with packing techniques to evade detection. It may be completely undetected on VirusTotal due to this, and can be more difficult/time consuming for an analyst to reverse engineer because factors such as the Import Address Table, Strings, etc. are all messed up, and anti-debugging may also be applied.
2. FooBar.exe is executed by the user/other malware already running on the system.
3. The anti-virus product intercepts this process start-up request via a kernel-mode callback (nowadays), which is invoked by an undocumented Psp* kernel-mode routine after NtCreateUserProcess (an NTOSKRNL routine, reached via a system call from NTDLL.DLL in user-mode) is called.
4. The Anti-Virus product can now scan the file on-disk corresponding to the image being loaded into memory for the new start-up request, typically via the normal database (e.g. checksum and byte pattern).
5. The verdict is clean because the sample is packed and thus the generic signature is no longer valid in this context, and the hash is unknown to the database.
6. Cloud scanning may now be applied at this time, or after execution (some vendors let the unknown program run and do a cloud scan asynchronously for performance reasons).
7. The verdict may still be clean so the callback code is finished and the request is continued.
8. The sample runs, but it is still being monitored by the AV product. Its characteristics will be "unpacked" (decrypted, more precisely) in memory, or in some other scenarios (less common nowadays) an unpacked copy of the PE is dropped to disk and executed, and the original terminates.
-> If the latter, that new unpacked copy goes through the process again from step 1 and gets identified by the generic signature, if there was one for the sample, as the signature is now valid in this scenario.
9. Now that the unpacking has occurred in memory, the byte signature pattern is valid in-memory! The vendor now flags it, terminates the process and auto-quarantines, depending on user configuration.
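The steps above can be simulated in a few lines of Python. This is only a toy: the "packer" is a single-byte XOR with a made-up key, the signature is a hypothetical valid-hex pattern, and the "memory" is just a bytes object, so nothing here reflects how a real packer or AV engine is implemented.

```python
# Hypothetical generic signature and a toy single-byte XOR "packer" key.
SIGNATURE = bytes.fromhex("AA55BA6ABBCCDDEEFF01")
XOR_KEY = 0x5C
STUB = b"STUB"  # stand-in for the unpacking stub that stays in the clear


def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def scan(data: bytes) -> bool:
    """Byte-pattern scan, used both on disk and in memory."""
    return SIGNATURE in data


# The malicious payload contains the pattern the database knows about.
payload = bytes(32) + SIGNATURE + bytes(32)

# What lands on disk: the stub followed by the encrypted payload.
packed_on_disk = STUB + xor_bytes(payload, XOR_KEY)

# Steps 4 and 5: the on-disk scan finds nothing.
on_disk_verdict = scan(packed_on_disk)

# Step 8: at run time the stub decrypts the payload into memory.
in_memory_image = xor_bytes(packed_on_disk[len(STUB):], XOR_KEY)

# Step 9: the same pattern is now visible to the memory scanner.
in_memory_verdict = scan(in_memory_image)

print(on_disk_verdict)    # False
print(in_memory_verdict)  # True
```

The scan function never changes between the two calls; only the data it is pointed at does, which is the whole reason memory scanning helps against packers.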
Memory scanning usually works by finding the base address of the image in memory, then moving through it in chunks and running the byte signature scan on each chunk. Smarter byte scanning will notice if the end of a chunk matches the start of a signature and will move back X bytes to get a better chunk, so it can accurately determine whether there was a match there; if not, it keeps moving forward.
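Here is a minimal Python sketch of chunked scanning that copes with a signature straddling two chunks. Instead of moving back X bytes, it carries the last `len(signature) - 1` bytes of each chunk over into the next one, which is just one way of getting the same effect. The chunk size, the signature and the `memory` buffer are all assumptions for the example.

```python
from typing import Iterator, Optional

# Hypothetical signature and a made-up memory image for demonstration.
SIGNATURE = bytes.fromhex("AA55BA6ABBCCDDEEFF01")


def iter_chunks(memory: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive chunk_size-byte slices of memory."""
    for offset in range(0, len(memory), chunk_size):
        yield memory[offset:offset + chunk_size]


def scan_chunked(memory: bytes, signature: bytes,
                 chunk_size: int = 16) -> Optional[int]:
    """Return the offset of the first match in memory, or None."""
    if not signature:
        raise ValueError("signature must not be empty")
    overlap = len(signature) - 1
    tail = b""  # bytes carried over from the end of the previous window
    base = 0    # offset within memory of the first byte of the window
    for chunk in iter_chunks(memory, chunk_size):
        window = tail + chunk
        index = window.find(signature)
        if index != -1:
            return base + index
        # Keep just enough bytes to catch a match that spans the boundary.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
    return None


# The signature starts at offset 12, so it crosses the 16-byte boundary.
memory = bytes(12) + SIGNATURE + bytes(30)

naive_hit = any(SIGNATURE in chunk for chunk in iter_chunks(memory, 16))
print(naive_hit)                            # False: each chunk sees a piece
print(scan_chunked(memory, SIGNATURE, 16))  # 12
```

Scanning each chunk in isolation misses the match here, while the overlapping version finds it at the right offset.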
Memory scanning can also include scanning new modules as they are loaded into memory, also typically done through a kernel-mode callback.
---
Machine learning/AI is very beneficial for detection though, especially for vendors like ESET, Avast and Bitdefender, due to the resources they have. You can be sure they have a huge network to handle it all effectively and efficiently. I'm sure we'll see some amazing improvements with all of that over the next few years. We cannot forget about Avira either; they provide their machine learning with their SDK.
All in all, ESET especially uses a combination of techniques, from checksum hash scanning to generic signatures (with a good memory scanner) and machine learning technology. Vendors (I believe ESET is likely one of them) also tend to focus on emulation technology, which is extremely beneficial as well.
Btw, Tencent has the largest AI lab.
I reckon that vendors like Norton likely have the largest malware intelligence database, since they've been around more or less from the very start, but I don't know about AI... Interesting.